The NOW Web is the cyberspace that contains all those packets of information, or Instants, that are moving around. Think of the Google Chat message you just received, the indication that you now have 10 friends online in Facebook, or the text message that just arrived on your cell phone. Many of them last only for a very short time and then disappear without a trace. Only a minute fraction of them are associated with a hyperlink or Uniform Resource Locator (URL), so the rest can never be crawled.
They are presumably not part of the Information space that Google wishes to catalogue. They would never have the hyperlink information that allows the Google algorithms with their PageRank concept to offer them as relevant answers to queries.
Now the New York Times has pointed out that part of this NOW Web does persist but still cannot be reached by crawlers or spiders. It calls this part the Deep Web, describing it as a ‘Deep Web’ That Google Can’t Grasp.
Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. “The crawlable Web is the tip of the iceberg,” says Anand Rajaraman, co-founder of Kosmix (www.kosmix.com), a Deep Web search start-up. Kosmix has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources.
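To make the crawling idea concrete, here is a minimal sketch of a crawler that only follows hyperlinks. The page names and the `LINKS` table are hypothetical, invented purely for illustration; a real crawler would fetch and parse live pages. The point is simply that a page nothing links to, such as a result that exists only after a database query, is never visited.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["article"],
    "about": [],
    "article": ["home"],
    # Produced only by submitting a query form, so no page links to it.
    "db-result?q=flights": [],
}


def crawl(start, links):
    """Breadth-first traversal that follows hyperlinks from a start page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


crawled = crawl("home", LINKS)
uncrawled = set(LINKS) - crawled

print(sorted(crawled))    # pages reachable by following links
print(sorted(uncrawled))  # pages that exist but are never reached
```

In this toy example the database result stays in `uncrawled` however long the crawler runs, because no trail of hyperlinks leads to it.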
With millions of databases connected to the Web, and endless possible permutations of search terms, there is simply no way for any search engine — no matter how powerful — to sift through every possible combination of data on the fly.
If that is true for the Deep Web, how much more true is it for the very much bigger NOW Web?
This obviously creates problems for Google, given its mission to catalogue all information. However, there are two consolations for Google:
- Searching these non-crawlable cyberspaces represents an enormous challenge.
- Google can certainly achieve all its financial objectives by focusing on the crawlable Web.
It all depends on whether Google will follow Peter Drucker’s advice to Focus, Focus, Focus or whether it wishes to dream an impossible dream.