Chapter 6: Crawling the Web with Java


McGraw-Hill/Osborne
09.29.2004


Crawler technology is useful in many types of Web-related applications. For example, you might use a crawler to look for broken links in a commercial Web site. You might also use a crawler to find changes to a Web site.

Although Web crawlers are conceptually easy, in that you just follow the links from one site to another, they are a bit challenging to create. One complication is that a list of links to be crawled must be maintained, and this list grows and shrinks as sites are searched. Another complication is the complexity of handling absolute versus relative links.
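The two complications above can be sketched in a few lines of Java. This is a minimal illustration, not the chapter's Search Crawler code: the class name LinkFrontier and its methods are hypothetical, and it assumes the page URL passed in is absolute. It resolves each link against the page it was found on with java.net.URI, so relative and absolute links end up in the same form, and it keeps a queue of links still to crawl alongside a set of links already seen.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: a to-crawl list that grows as links are found
// and shrinks as pages are taken off it for processing.
class LinkFrontier {
    private final Deque<URI> toCrawl = new ArrayDeque<>();
    private final Set<URI> seen = new HashSet<>();

    // Resolves href against the page it appeared on and queues the result
    // if it is an http(s) link not seen before. Returns true if queued.
    boolean add(URI page, String href) {
        // Ignore the fragment: "page.html#top" is the same document as "page.html".
        int hash = href.indexOf('#');
        if (hash >= 0) {
            href = href.substring(0, hash);
        }
        if (href.isEmpty()) {
            return false; // link pointed back into the same page
        }
        URI resolved;
        try {
            // Absolute links come back unchanged; relative ones are joined to the page URL.
            resolved = page.resolve(href).normalize();
        } catch (IllegalArgumentException e) {
            return false; // malformed link, skip it
        }
        String scheme = resolved.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return false; // skip mailto:, javascript:, and similar links
        }
        if (!seen.add(resolved)) {
            return false; // already queued or crawled
        }
        toCrawl.add(resolved);
        return true;
    }

    // Removes and returns the next link to crawl, or null when the list is empty.
    URI next() {
        return toCrawl.poll();
    }

    public static void main(String[] args) {
        LinkFrontier frontier = new LinkFrontier();
        URI page = URI.create("http://example.com/docs/index.html");
        frontier.add(page, "../about.html");      // relative link
        frontier.add(page, "guide.html");         // relative link
        frontier.add(page, "/about.html");        // same target as the first, ignored
        for (URI u = frontier.next(); u != null; u = frontier.next()) {
            System.out.println(u);
        }
    }
}
```

Keeping the seen set separate from the queue is one possible design choice: the queue can shrink to empty while the set still prevents a page from being visited twice.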

Fortunately, Java contains features that make a Web crawler easier to implement. First, Java's support for networking makes downloading Web pages simple. Second, Java's support for regular expressions simplifies finding links. Third, Java's Collections Framework supplies the mechanisms needed to store a list of links.
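As a rough illustration of the second and third points, the sketch below uses java.util.regex to pull link targets out of a page's HTML and collects them in a List. The class name LinkExtractor and the pattern are assumptions made for this example and are not taken from the chapter. A regular expression is not a full HTML parser, so this approach can miss or misread unusual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the href values of anchor tags in an HTML string.
class LinkExtractor {
    // Matches "<a ... href=VALUE", with optional single or double quotes around VALUE.
    // Group 1 captures VALUE. The pattern is illustrative, not exhaustive.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*[\"']?([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the link targets in the order they appear in the page.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/a.html\">A</a> "
                + "<A class='x' HREF='b.html'>B</A> "
                + "<a name=\"n\">no link</a></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The extracted strings may be relative or absolute, so a crawler would still need to resolve each one against the URL of the page it came from before downloading it.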

The Web crawler developed in this chapter from the book The Art of Java, by Herbert Schildt and James Holmes, is called Search Crawler. It crawls the Web, looking for sites that contain strings matching those specified by the user. It displays the URLs of the sites in which matches are found. Although Search Crawler is a useful utility as is, its greatest benefit is found when it is used as a starting point for your own crawler-based applications.

Click here to download this free book chapter.




