Deep Web, NOW Web – more headaches for Google

The NOW Web is the cyberspace that contains all those packets of information, or Instants, that are moving around.  Think of that Google Chat message you just received, the indication that you now have 10 Friends online on Facebook, or that text message that just arrived on your cell phone.  Many of them last only for a very short period of time and then disappear without trace.  Only a very small fraction of them are associated with a hyperlink or Uniform Resource Locator (URL), so the rest can never be crawled.

They are presumably not part of the information space that Google wishes to catalogue.  They lack the hyperlink information that allows Google's algorithms, with their PageRank concept, to offer them as relevant answers to queries.
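
To make that concrete, here is a minimal, purely illustrative Python sketch of the power-iteration idea behind PageRank (the toy graph, the 0.85 damping factor and the iteration count are my own assumptions, not Google's actual implementation).  A page that nothing links to, like an Instant in the NOW Web, simply never accumulates any meaningful score.

```python
# Minimal power-iteration sketch of the PageRank idea (illustrative only).
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: three interlinked pages plus one "instant" that nothing links to.
graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "instant": [],
}
print(pagerank(graph))   # "instant" ends up with by far the lowest score
```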

Now the New York Times has pointed out that part of this NOW Web does persist but remains beyond the reach of crawlers or spiders.  It uses the name Deep Web for that content, suggesting that this is a ‘Deep Web’ That Google Can’t Grasp.

Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. “The crawlable Web is the tip of the iceberg,” says Anand Rajaraman, co-founder of Kosmix (www.kosmix.com), a Deep Web search start-up.  Kosmix has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources.

With millions of databases connected to the Web, and endless possible permutations of search terms, there is simply no way for any search engine — no matter how powerful — to sift through every possible combination of data on the fly.
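
For readers who have not met the terms, a crawler really is as simple in principle as that description suggests: fetch a page, harvest its hyperlinks, follow them, repeat.  The sketch below is a bare-bones illustration (it assumes the third-party requests and BeautifulSoup packages, and the seed URL is just a placeholder); it also shows why anything that is not reachable through a chain of links (a database sitting behind a query form, or a vanished Instant) is simply never visited.

```python
# Bare-bones crawler sketch: fetch a page, collect its <a href> links, follow them.
# Anything not reachable by such a chain of hyperlinks is never visited.
from collections import deque
from urllib.parse import urljoin

import requests                   # assumed third-party dependency
from bs4 import BeautifulSoup     # assumed third-party dependency

def crawl(seed_url, max_pages=50):
    seen, queue = {seed_url}, deque([seed_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.add(url)
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue
        for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Placeholder seed URL: replace with a site you are allowed to crawl.
print(len(crawl("https://example.com/")))
```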

If that is true for the Deep Web, how much more true must it be for the very much bigger NOW Web?

This obviously creates problems for Google with its mission to catalogue all information.  However, there are two consolations for Google:

  • Searching these non-crawlable cyberspaces represents an enormous challenge.
  • Google can certainly achieve all its financial objectives by focusing on the crawlable Web.

It all depends on whether Google will follow Peter Drucker’s advice to Focus, Focus, Focus or whether it wishes to dream an impossible dream.

Visitors Bounce

Bounce may be a word that you have not used much in the past.  It is likely to become a hot word in 2009.  We are talking here particularly about the way visitors to web pages eventually move off elsewhere.  The bounce rate for a web page is the percentage of visitors who arrive at that page and then leave the website without viewing any other page.
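
As a concrete illustration of that arithmetic (with entirely made-up numbers, and counting a bounce as a single-page visit):

```python
# Illustrative bounce-rate arithmetic with made-up session data.
# A "bounce" here is a visit that views only the landing page.
sessions = [
    {"landing_page": "/pricing", "pages_viewed": 1},   # bounced
    {"landing_page": "/pricing", "pages_viewed": 4},
    {"landing_page": "/pricing", "pages_viewed": 1},   # bounced
    {"landing_page": "/blog",    "pages_viewed": 2},
]

pricing_visits = [s for s in sessions if s["landing_page"] == "/pricing"]
bounces = [s for s in pricing_visits if s["pages_viewed"] == 1]
bounce_rate = 100.0 * len(bounces) / len(pricing_visits)
print(f"Bounce rate for /pricing: {bounce_rate:.0f}%")   # 2 of 3 visits, about 67%
```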

If they move off to another web page in the same website, then that is normally a confirmation that they are finding something of interest.  It is the very best indicator of visitor engagement.  It may well be far less open to manipulation than the emphasis on hyperlinks that is at the heart of the Google PageRank approach.  That is why I believe the answer to Eric Enge’s question, “Do Search Engines Use Bounce Rate As A Ranking Factor?”, must be in the affirmative.  Google has all the data needed to use this approach, and it must only be a matter of time.

The proportion of visitors who bounce away from any website is a critical measure of performance.  Having a sticky website that holds visitors as they move from page to page gives the best opportunity to achieve whatever objectives the website may have.  The one major exception is all those web pages where someone clicks away and the website gains revenue from the move.  Google is a major partner for such web pages, since the greater part of its revenues comes from AdWords ads (distributed to partner sites through the AdSense program).  Provided visitors move away via such an ad, a high bounce rate here is not a problem.

For all other websites, it is best to consider how to lower that bounce rate.  It is not just a matter of opening links in a new tab, as one person has suggested.

Nor is it just a matter of including links only to other web pages within the same website.  As Matthew Ingram pointed out, even the New York Times has now realized that including links to other websites may be the smart thing to do.

There have been hints for a while now that the New York Times was going to start adding links to third-party content on its front page, and now it appears to have finally happened, with the launch of something called Times Extra. The paper has been doing this for some time now on its technology front page, using links aggregated by BlogRunner — the meme-tracker the company acquired a couple of years ago — as well as through content-syndication agreements with blog networks like GigaOm, VentureBeat and Read/Write Web.

The very best way to make a website sticky is to give visitors what they are looking for.  That is what will bring them back again and again.  Even the New York Times is showing the way.