Google looks to penetrate ‘Deep Web’ with HTML forms crawling
April 24, 2008
Google’s ever-active search bots, which constantly scour the Web for new pages, have started a novel, more active phase of their indexing work. Alon Halevy and Jayant Madhavan of Google’s crawling and indexing team said the firm has launched an experiment in which its indexing software enters text into web site forms to see what previously undiscovered pages appear.
Halevy and Madhavan explained that over the last few months Google has been exploring some HTML forms in an effort to discover new Web pages and URLs that it otherwise could not find and index for users who search on Google.
The experiment, they said, is part of Google’s broader effort to increase its coverage of the Web. HTML forms have long been regarded as the ‘gateway’ to large volumes of data that lie beyond the normal reach of search engines.
The new indexing practice involves only ‘high quality’ sites and respects the directives in a site’s ‘robots.txt’ file rather than crawling pages that the file disallows. To decide which words to type into a form, the indexing software samples terms from the web page surrounding the form. With this experiment, which also covers forms that use drop-down boxes and select menus, Google has taken one step closer to the Deep Web.
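To make the idea of filling a form with terms taken from its own page more concrete, here is a minimal Python sketch. It is not Google's implementation, which has not been published in this post. The names FormProbe, candidate_urls and SAMPLE, and the example.com address, are hypothetical. The sketch also makes two assumptions of its own: it only looks at forms submitted with GET, and it picks the most frequent words on the page instead of sampling at random, so that the result is repeatable.

```python
import re
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class FormProbe(HTMLParser):
    """Hypothetical helper: records the first GET form and the page's words."""

    def __init__(self):
        super().__init__()
        self.action = None        # action of the first GET form, if any
        self.in_form = False
        self.text_inputs = []     # names of free-text fields
        self.selects = {}         # select name -> list of option values
        self._select = None
        self.words = []           # lowercase words of 4+ letters seen on the page

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and self.action is None:
            # Assumption of this sketch: only GET forms are probed.
            if (a.get("method") or "get").lower() == "get":
                self.action = a.get("action") or ""
                self.in_form = True
        elif self.in_form and tag == "input":
            if (a.get("type") or "text").lower() == "text" and a.get("name"):
                self.text_inputs.append(a["name"])
        elif self.in_form and tag == "select" and a.get("name"):
            self._select = a["name"]
            self.selects[self._select] = []
        elif self.in_form and tag == "option" and self._select:
            if a.get("value"):
                self.selects[self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False
        elif tag == "select":
            self._select = None

    def handle_data(self, data):
        self.words.extend(re.findall(r"[A-Za-z]{4,}", data.lower()))


def candidate_urls(page_url, html, max_terms=3):
    """Build GET URLs by combining page terms with the form's menu values."""
    probe = FormProbe()
    probe.feed(html)
    if probe.action is None:
        return []
    counts = {}
    for word in probe.words:
        counts[word] = counts.get(word, 0) + 1
    # Most frequent words first; the stable sort keeps first-seen order on ties.
    terms = sorted(counts, key=lambda w: -counts[w])[:max_terms]
    fields = [(name, terms) for name in probe.text_inputs]
    fields += [(name, vals) for name, vals in probe.selects.items() if vals]
    if not fields:
        return []
    base = urljoin(page_url, probe.action)
    names = [name for name, _ in fields]
    urls = []
    for combo in product(*(vals for _, vals in fields)):
        urls.append(base + "?" + urlencode(list(zip(names, combo))))
    return urls


SAMPLE = """
<html><body>
<h1>Library catalog</h1>
<p>Search the catalog for books. The catalog lists books and journals.</p>
<form action="/search" method="get">
  <input type="text" name="q">
  <select name="kind">
    <option value="book">Books</option>
    <option value="journal">Journals</option>
  </select>
</form>
</body></html>
"""

if __name__ == "__main__":
    for url in candidate_urls("http://example.com/catalog.html", SAMPLE, max_terms=2):
        print(url)
```

A crawler built along these lines would fetch each candidate URL and keep only the ones that lead to pages it has not seen before. A real system would also need to check robots.txt before fetching anything and to cap the number of combinations, since the count grows multiplicatively with every field.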
Just Search Weblog