Friday, April 18, 2008

Crawling The 'Deep' Web

Interesting and valuable, though I'm surprised it took someone this long to try...

Google Spiders to Start Crawling The 'Deep' Web
For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made.
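
To make the idea concrete, here is a rough Python sketch of the URL-generation step described in the quote. It is only my guess at the general shape, not Google's actual code: the function name `candidate_urls`, the `limit` cap, the example form fields, and the assumption that the form submits via GET are all made up for illustration.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action, text_inputs, choice_inputs, limit=10):
    """Yield up to `limit` GET URLs, one per combination of input values.

    text_inputs:   {field name: candidate words taken from the site}
    choice_inputs: {field name: values listed in the form's HTML}
    (Hypothetical helper; assumes the form uses method="get".)
    """
    fields = {**text_inputs, **choice_inputs}
    names = list(fields)
    # Cartesian product of every field's candidate values.
    combos = product(*(fields[name] for name in names))
    # Cap the number of combinations so a large form does not explode.
    for combo in islice(combos, limit):
        yield action + "?" + urlencode(list(zip(names, combo)))


# Hypothetical form: one text box ("q") and one select menu ("state").
urls = list(candidate_urls(
    "http://example.com/search",
    text_inputs={"q": ["cars", "boats"]},
    choice_inputs={"state": ["CA", "NY"]},
))
for url in urls:
    print(url)
```

The cap matters because the number of combinations grows multiplicatively with each input, so a real crawler would presumably need some way to pick which combinations are worth fetching.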

