Deep Web, NOW Web – more headaches for Google

The NOW Web is the cyberspace that contains all those packets of information, or Instants, that are moving around.  Think of that Google Chat message you just received, the indication that you now have 10 Friends online in Facebook, or that text message you just received on your cell phone.  Many of them last only for a very short period of time and then disappear without a trace.  Only a minute fraction of these are associated with a hyperlink or Uniform Resource Locator (URL), so the rest can never be crawled.

They are presumably not part of the Information space that Google wishes to catalogue.  They would never have the hyperlink information that allows the Google algorithms with their PageRank concept to offer them as relevant answers to queries.

Now the New York Times has pointed out that part of this NOW Web does persist but still cannot be reached by crawlers or spiders.  They use the name Deep Web for it and suggest that this is a ‘Deep Web’ That Google Can’t Grasp.

Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. “The crawlable Web is the tip of the iceberg,” says Anand Rajaraman, co-founder of Kosmix (www.kosmix.com), a Deep Web search start-up.  Kosmix has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources.
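To make the crawler idea concrete, here is a minimal sketch of link-following over a toy, in-memory "web". Everything in it is an assumption for illustration: the `TOY_WEB` dictionary, the `example.com` URLs and the `crawl` function are hypothetical and are not taken from Google, Kosmix or the New York Times article. It only shows why a page that nothing links to, such as a result generated by filling in a search form, stays invisible to a crawler.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to the hyperlinks found on that page.
TOY_WEB = {
    "http://example.com/": [
        "http://example.com/about",
        "http://example.com/search",
    ],
    "http://example.com/about": ["http://example.com/"],
    # A search form: its results only exist after someone submits a query.
    "http://example.com/search": [],
    # A Deep Web page: it exists, but no other page links to it.
    "http://example.com/results?q=flights": [],
}


def crawl(seed, web):
    """Breadth-first crawl: return every URL reachable from seed via links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


reachable = crawl("http://example.com/", TOY_WEB)
unreached = sorted(set(TOY_WEB) - reachable)
print(unreached)  # -> ['http://example.com/results?q=flights']
```

The crawler reaches the search form itself but never the results page behind it, because no hyperlink leads there.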

With millions of databases connected to the Web, and endless possible permutations of search terms, there is simply no way for any search engine — no matter how powerful — to sift through every possible combination of data on the fly.

If that is true for the Deep Web, how much more true is it for the very much bigger NOW Web?

This obviously creates problems for Google with its mission to catalogue all information.  However, there are two consolations for Google:

  • Searching these non-crawlable cyberspaces represents an enormous challenge for any search engine, not just for Google.
  • Google can certainly achieve all its financial objectives by focusing on the crawlable Web.

It all depends on whether Google will follow Peter Drucker’s advice to Focus, Focus, Focus or whether they wish to dream an impossible dream.

13 thoughts on “Deep Web, NOW Web – more headaches for Google”

  1. I think you have hit on a point that most will not talk about. The truth is the internet is far more vast than we could ever imagine.

    Take into consideration the amount of information available and knowledge accessible in other nations. Being in the military and serving overseas in Korea, I was tied into information Google directed me to because I was in South Korea. 99.9% of the United States will never see this information.

    One must realize that the web is still a NEW market. After 10 years of widespread popularity, the internet is still very much an infant. Will Google remain the only topic of discussion when it comes to internet browsing over the next 10 years? Maybe, but there will be new players in the market. Businesses must understand this, and be prepared to accommodate this change. The days of SEO could be vanquished in a matter of YEARS.

    As we move along, the internet will become a daily aspect of life. One must attribute the magnitude of reach and exposure the internet provides. One day the internet will become the PRIMARY source of scrutinization and objectivity to ALL day-to-day transactions. The thing to keep in mind is that it may not be Google that makes it possible, but a third party web site that connects with its users and provides feedback in the means necessary for its users. Think outside the box here and realize we are now a CONNECTED world.

    As far as updates that aren’t categorized by Google, what if they were? What impact would that have on you as a business? It’s time to think outside the box and prepare for what is inevitable. The internet WILL influence the majority of CONSUMERS in whether they purchase from you or not. Stop focusing on Google stats and start focusing on customer relations as that is where the internet is heading.

  2. Hello, Barry,

    You may be interested in the emerging invisible web standard called the Internet Search Environment Number. It is not quite ready for prime time yet. You will find isen.org and blog.isen.org useful for background on us and what is up!

    I’m free to answer your questions if you want to cover this in more depth.

    -M@

  3. Thank you for stopping by, Matthew. Your organization certainly is a good indicator of the richness of even the deep web. I wonder who is willing to join the movement.

  4. I wonder if forums and usenet made ‘NOW web’ earlier 🙂 Now it also includes social media sites, twitter, social networking status messages and a lot of things I haven’t even heard about.

    The Deep Web is being explored by people search companies (including wink). I think the onus should be on webmasters to make a crawlable URI for their sites. Why should search engines be guessing these paths?

    For NOW Web, I believe the publishers should start pushing the content towards search engines using a pre-defined protocol. Otherwise the problem would remain unsolved.

  5. Google would have to be very fast to crawl all the dynamic pages on the web these days. The information is changing so fast, and those kinds of services make it really difficult to follow and report to the search engines.

  6. Hi,

    I’m just wondering (seriously though) why Google is so eager to index and control this so-called NOW Web. Could this be a serious step towards live surveillance of Internet users (or of people through the Internet)? You know, just a little bit before “Minority Report”…

  7. I’m not sure they will ever attempt to index the NOW Web since much of it is without URLs. Even trying to cover the smaller Deep Web will be a challenge. They are beginning that, but I’m not sure how worthwhile that effort will be.

  8. Interesting points, Barry. I never really gave much thought to this. I have always made use of what was available, never really considering the untouchable.

    I have also stopped by Matthew’s website. ISEN is a brand new concept to me, but I get the ISBN metaphor without understanding the end user experience for reading databases.

    Deep Web: we have not even started to understand the possibilities. It’s good to step out of your comfort zone once in a while.

  9. I have to admit that Google’s spiders weren’t always as fast as they are now. Back in the day, it could take up to a month or even two to index your site. Today, if you know how, you can get your site indexed in less than 24 hours.

  10. This is near and dear to us as we tackle the deep web of the real estate space. This data is notorious for being hidden behind forms. The reality is that so much content is being created on a daily basis that it is a nearly impossible task to index the NOW Web and be one-size-fits-all, as G is attempting to do. I think there will be small vertical / deep search companies who will reap big wins from applying Drucker’s Focus, Focus, Focus. As users continue to learn how to really search, which is evidenced by the trend toward longer keyword searches, they will find it easier to just use vertical search sites instead.
