Unweaving a Tangled Web With HTMLParser and Lucene
ccharlebois
7-12-2007 8:20 AM
tags: htmlparser, lucene, articles
<div style="margin: 12px 0px; font-family: arial; color: #333333; background: #ffffff; border: solid 4px #e5e5e5; width: 100%; clear: left;"><div class="CM_CTB_Content_Wrap" style="margin: 0px; padding: 0px;background-color: #ffffff;"><div style="border-bottom: solid 1px #dcdcdc; white-space: nowrap; margin-bottom: 8px; background-color: #eeeeee ;background-image: url(http://clipmarks.com/images/source-bg.gif); background-repeat: repeat-x; height: 24px; line-height: 24px; vertical-align: middle; padding-bottom: 4px; color: #666666; font-size: 10px;" ><a href="http://clipmarks.com/clip-to-blog/" title="see clips that are hot right now"><img src="http://content.clipmarks.com/blog_embed/92349b62-aa65-474e-9d76-d0e7b42bb008/08D4ACE7-7FE3-4B25-BFF5-FA6C70E4ED62/" alt="" width="19" height="19" border="0" style="vertical-align: middle; margin: 0px 4px; display: inline; border: none; float:none;" /></a>clipped from <a title="http://javaboutique.internet.com/tutorials/HTMLParser/" href="http://javaboutique.internet.com/tutorials/HTMLParser/" style="font-size: 11px;">javaboutique.internet.com</a></div><blockquote style="text-align: left; padding: 0px 8px; margin: 4px 0px 8px 0px; background: transparent; border: none;" cite="http://javaboutique.internet.com/tutorials/HTMLParser/"><H1>Unweaving a Tangled Web With HTMLParser and Lucene</H1></blockquote><div style="height: 2px; font-size: 2px; background: #dcdcdc; border-bottom: solid 1px #f5f5f5; margin: 2px 4px;"></div><blockquote style="text-align: left; padding: 0px 8px; margin: 4px 0px 8px 0px; background: transparent; border: none;" cite="http://javaboutique.internet.com/tutorials/HTMLParser/"><P> Ever wanted to write a Java program that crawls the web? You know, a program that reads HTML pages, retrieves their links, fetches the linked pages (which contain more links), and so on. Maybe you have also thought about storing the text from those pages for later use, for example so that you can search them for specific information. 
These are the core functions of a search engine such as Google or Yahoo. If you have a web site of your own, you might be interested in having your own search engine. One option is to buy one or use an Open Source search engine, but you might also find it rewarding to write your own! </P></blockquote><div style="height: 2px; font-size: 2px; background: #dcdcdc; border-bottom: solid 1px #f5f5f5; margin: 2px 4px;"></div><blockquote style="text-align: left; padding: 0px 8px; margin: 4px 0px 8px 0px; background: transparent; border: none;" cite="http://javaboutique.internet.com/tutorials/HTMLParser/"><P> In this article I'll show you the basic techniques for building a search engine with two powerful Open Source products: <A href="http://htmlparser.sourceforge.net">HTMLParser</A> and <A href="http://jakarta.apache.org/lucene/docs/index.html">Lucene</A>. </P></blockquote></div><div style="margin: 0px 6px 6px 4px;"><table style="font-size: 11px;border-spacing: 0px;padding: 0px;" cellpadding="0" cellspacing="0" width="100%"><tr><td style="background:transparent;border-width:0px;padding:0px;"> </td><td align="right" style="background:transparent;border-width:0px;padding:0px;width:107px" width="107"><a href="http://clipmarks.com/share/08D4ACE7-7FE3-4B25-BFF5-FA6C70E4ED62/blog/" title="blog or email this clip"><img src="http://content6.clipmarks.com/images/c2b-foot.png" border="0" alt="blog it" width="107" height="17" style="border-width:0px;padding:0px;margin:0px;" /></a></td></tr></table></div></div>
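The crawl-and-search loop described in the clip (read a page, collect its links, fetch those pages, store the text so it can be searched later) can be sketched in plain Java. This is a minimal, JDK-only illustration under loudly stated assumptions, not the article's code and not the HTMLParser or Lucene APIs:

- A regex link extractor stands in for HTMLParser. Regexes are fragile on real HTML, which is exactly why a proper parser is worth using.
- A toy in-memory inverted index stands in for Lucene.
- The "web" is a hypothetical `Map` from URL to HTML, so nothing touches the network.
- The class and method names (`TinySearchEngine`, `crawl`, `search`) are made up for illustration.

```java
import java.util.*;
import java.util.regex.*;

/**
 * Hypothetical, JDK-only sketch of a breadth-first crawler plus inverted index.
 * Assumption: regex link extraction replaces HTMLParser, and a HashMap-based
 * index replaces Lucene. The "web" is an in-memory Map from URL to HTML.
 */
class TinySearchEngine {
    // Crude href extractor; a real HTML parser would walk tag nodes instead.
    private static final Pattern HREF =
        Pattern.compile("<a\\s+[^>]*href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    // Removes tags so that only the visible text is indexed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Toy inverted index: lower-cased term -> sorted set of URLs containing it.
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Breadth-first crawl from seed, indexing at most maxPages existing pages; returns visit order. */
    List<String> crawl(Map<String, String> web, String seed, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            String html = web.get(url);
            if (html == null) continue; // dead link: nothing to fetch or index
            visited.add(url);
            indexText(url, TAG.matcher(html).replaceAll(" "));
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) frontier.add(link); // enqueue each URL only once
            }
        }
        return visited;
    }

    // Splits text into lower-case alphanumeric tokens and records url under each.
    private void indexText(String url, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(url);
            }
        }
    }

    /** Returns the URLs whose text contains the single-word query (case-insensitive). */
    Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Hypothetical three-page site; "/d" is linked but does not exist.
        Map<String, String> web = new HashMap<>();
        web.put("/a", "<html><body>Lucene index <a href=\"/b\">b</a> <a href=\"/c\">c</a></body></html>");
        web.put("/b", "<p>HTMLParser links</p><a href=\"/a\">back</a><a href=\"/d\">gone</a>");
        web.put("/c", "<p>Lucene search</p>");
        TinySearchEngine engine = new TinySearchEngine();
        System.out.println(engine.crawl(web, "/a", 10)); // [/a, /b, /c]
        System.out.println(engine.search("lucene"));     // [/a, /c]
    }
}
```

The `seen` set is what keeps the crawler from looping forever on pages that link back to each other (here `/b` links back to `/a`). The `maxPages` cap bounds the crawl on a real site. A real implementation would replace the regex with HTMLParser's link extraction and the map with a Lucene index, which adds proper text analysis and ranked queries. Those APIs are not shown here.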