Welcome, Unregistered.
You last visited: Today at 08:54 AM






 

Members: 253
Threads: 183
Posts: 454
Online: 5

Newest Member: mofen431


Go Back   Arcades Forum Community > Search Engines > Google

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 04-13-2008, 07:16 PM
Junior Member
 
Join Date: Apr 2008
Posts: 9
Post GOOGLE Bot Can Crawl More Now!!

Google's GoogleBot is now able to crawl forms and they say they can now "scan" Flash and Javascript.

That's big news!!

All those links could be crawlable now that were previously not found due to javascript.

Quote:
Crawling through HTML forms
Friday, April 11, 2008 at 10:50 AM
Written by Jayant Madhavan and Alon Halevy, Crawling and Indexing Team

Google is constantly trying new ideas to improve our coverage of the web. We already do some pretty smart things like scanning JavaScript and Flash to discover links to new web pages, and today, we would like to talk about another new technology we've started experimenting with recently.

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.

Needless to say, this experiment follows good Internet citizenry practices. Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc. We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site.

The web pages we discover in our enhanced crawl do not come at the expense of regular web pages that are already part of the crawl, so this change doesn't reduce PageRank for your other pages. As such it should only increase the exposure of your site in Google. This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.

This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.


This is big news, and will spark a new generation of searching!

More...
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT. The time now is 08:54 AM.

Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
All rights reserved. Arcades-Forum.com. 2008
A vBSkinworks Design