Forget We Ever Mentioned A Supplemental Index – Google

It shouldn’t really happen to a nice company like Google. You try to do someone a favor and it blows up in your face.

A few years back, they realized that it would be difficult to give a speedy response to a search query if they kept a single database of all the web pages they were spidering. So they decided to put web pages that were likely to come up more frequently in search queries in their regular database. Other, less popular web pages would go into a supplementary or secondary index. By this means, they could keep cataloguing all the web pages they could find and still deliver fast results to most keyword searches by using the regular index. Technically it was the right solution. The mistake was that they told people about it. At the time, the approach seemed a positive move.
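To picture the idea, here is a purely illustrative sketch of a two-tier lookup; the index contents, names and threshold are invented for illustration and say nothing about how Google actually implemented it:

```python
# Purely illustrative sketch of a two-tier index lookup. The data, names and
# threshold below are invented; this is not Google's implementation.
regular_index = {"blue widgets": ["popular-page-1", "popular-page-2"]}
supplemental_index = {
    "blue widgets": ["obscure-page-7"],
    "rare left-handed widget": ["obscure-page-9"],
}

def search(query, min_results=2):
    # Answer from the small, frequently refreshed regular index when it can
    # satisfy the query on its own...
    results = list(regular_index.get(query, []))
    if len(results) >= min_results:
        return results
    # ...and only fall back to the much larger supplemental store for
    # obscure queries that the regular index cannot answer.
    return results + supplemental_index.get(query, [])

print(search("blue widgets"))             # served from the regular index
print(search("rare left-handed widget"))  # needs the supplemental index
```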

Move on to 2005, and this two-index system began to upset a lot of people. With the explosive growth of the Internet, it had become impossible to put the majority of web pages in the primary index. Since the primary index is spidered more frequently and its web pages are more likely to appear in keyword searches, you can understand why people got upset. Of course, if Google hadn’t mentioned the supplemental index, people would never have known about the problem.

The whole issue has become a can of worms for Google. A great many people were upset. SEO (search engine optimization) experts worked hard to figure out how to keep web pages out of the supplemental index. Google has tried to lower the temperature on this topic by reducing the differences between the two indexes (the regular index and the supplemental index). In mid-December, Yonatan Zunger of the Google Search Quality Team reported on progress.

We improved the crawl frequency and decoupled it from which index a document was stored in, and once these “supplementalization effects” were gone, the “supplemental result” tag itself – which only served to suggest that otherwise good documents were somehow suspect – was eliminated a few months ago. Now we’re coming to the next major milestone in the elimination of the artificial difference between indices: rather than searching some part of our index in more depth for obscure queries, we’re now searching the whole index for every query.

From a user perspective, this means that you’ll be seeing more relevant documents and a much deeper slice of the web, especially for non-English queries. For webmasters, this means that good-quality pages that were less visible in our index are more likely to come up for queries.

You might have hoped that would satisfy searchers. However, Barry Schwartz of SERoundtable felt that the announcement only created more confusion.

I’ll call Google out on this one, and I rarely do.

Google, we need you to stop hiding this index from us. We really need an explanation of what this index does, why a page would be placed in the supplemental index, when Google actually searches it, and in what cases a page in the supplemental index would rank better than a page in the main index.

The confusion over the supplemental index has gone on too long.

Andy Beard was equally concerned that his method of identifying web pages in the supplemental index no longer seemed to be working. Apparently he really would like to know which web pages are still in the supplemental index.

Google may well be upset that people do not seem to be accepting its explanation of the “closeness” of the two indexes. On this one I agree with Google. People seem to be fixated on the notion of the supplemental index, as if it were an important factor in the keyword search algorithms.

It’s interesting to compare this with another Google invention, which used to be a hot topic and is now a yawn for most people doing SEO. That’s the Google Toolbar PageRank indicator. It may well be broken and is possibly only kept around for marketing reasons. It now has almost no connection with how web pages are ranked in keyword searches.

So people, let’s get over it. Forget about that supplemental index and work on the more important things that make a web page memorable, authoritative, trustworthy and ultimately search-engine visible. You’ll get much better rewards for your efforts.

Related:
Supplemental Result in Google – Hell or Help – March 31st, 2007
Google Supplemental Results Index – A Word To The Wise – July 9th, 2006
Google Supplemental Label Out, PageRank Next? – August 1st, 2007

A Small Peephole On Google Supplemental Results

Don’t Worry, Be Happy: Google indexes all web pages.

At the end of July, Google announced that they would no longer identify Web pages in the Supplemental Index. Although such pages have reduced visibility in Google’s keyword search results, Google felt that the Supplemental label was attracting undue attention. In any case, there would be greater efforts to reduce the disparity between the Google regular index and the Google supplemental index.

Danny Sullivan and many others expressed concern about this move. However, there was still a loophole, as Danny mentioned:

At the moment, if you want to force the labels to show up, doing a search for [site:domain/&] is a tip that came out of WebmasterWorld this week, and that still seems to be still working.

(Tip of the Hat to Halfdeck who mentioned this in a Cre8asite Forums discussion on this topic).

It now appears that the loophole has been closed. A Google search for site:www.domain.com/& shows exactly the same number of web pages as the recommended Google search for site:www.domain.com/. In other words, both show the total number of web pages that Google has indexed in both the regular index and the supplemental index.

For the moment, the other trick that Halfdeck mentioned still seems to be working. A search for site:www.domain.com/* still seems to give the number of web pages that are in the regular index. Comparing this with the total number of web pages indicated by the regular site:www.domain.com/ search gives an indication of what percentage of the domain’s web pages are in the regular index. For established websites this is often above 75%. For new websites, on the other hand, less than 20% of web pages seem to be in the regular index, if this test is valid.
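As a back-of-the-envelope sketch of that comparison (the counts below are hypothetical, and the site: tricks themselves may stop working at any time):

```python
# Hypothetical counts returned by the two searches described above.
total_indexed = 1200   # from  site:www.domain.com/   (regular + supplemental)
regular_only = 950     # from  site:www.domain.com/*  (Halfdeck's trick)

supplemental_estimate = total_indexed - regular_only
regular_share = 100.0 * regular_only / total_indexed

print(f"Pages in the regular index: about {regular_share:.0f}%")
print(f"Estimated pages in the supplemental index: {supplemental_estimate}")
```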

Is this a reasonable estimate? If so, how long will Google leave this peephole open? Time will tell.

Related:
Supplemental Result in Google – Hell or Help
Google Supplemental Label Out, PageRank Next?

Should Google Have Smarter Robots?

 
An Open Letter To Matt Cutts On Robots

Dear Mr. Cutts,

I have two purposes in writing. First, I would like to offer my support for the suppression of the Google Supplemental Results label. Second, I would like to offer a simple suggestion that perhaps could reduce the concerns that some webmasters have about the Google Supplemental Index.

On the Supplemental label suppression: although it provided some information, it really was too crude a measure. Anyone who needed such an imprecise signal of weak performance would likely not be very effective in dealing with it. The Supplemental Index was introduced for computational reasons, to provide the best balance between speed of computation and relevancy of results, at least in Google’s estimation. It may appear to separate the sheep from the goats, but that is only a problem if one of your sheep looks too much like a goat.

Where this has turned out to be a problem is with blogs. A recent post by Michael Gray, How WordPress Makes Comments SEO Unfriendly, points out how this can happen. As he said:

I love WordPress I really do. It makes it really easy to publish, however the WordPress developers really need some help sometimes. It seems when there is a choice to make things SE friendly, more often than not they make the worst choice possible.

The big issue he is describing is that blogs produce RSS news feeds as well as blog postings. There is a certain duplication of content between the two, and that duplication can be a trigger to designate web pages as goats. Goats are housed in the Supplemental Index and tend to be less visible in keyword searches. There’s the problem.

... and the solution is ... Like any other competent SEO, my instinctive reaction is that if Google has a problem, then it’s up to me to find the solution. Of course, the natural answer is an appropriate robots.txt file that blocks the Google robots so that they see only one copy of any content.
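As one hedged illustration, the snippet below writes a robots.txt that blocks the usual WordPress feed URLs; the paths are examples only (check your own blog’s URL structure first), and the wildcard rule is understood by Googlebot but not by every crawler:

```python
# Illustrative only: block the common WordPress feed URLs so crawlers see a
# single copy of each post. The paths are examples, not a universal recipe.
robots_txt = """\
User-agent: *
Disallow: /feed/
Disallow: /comments/feed/
Disallow: /*/feed/
"""
# Note: path wildcards (the /*/feed/ rule) are honoured by Googlebot but are
# not part of the original robots.txt standard.

with open("robots.txt", "w") as f:
    f.write(robots_txt)
```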

It then struck me that the blog postings and the RSS news feeds are both prepared for human beings and both have value. If anyone has a problem, should it be all those bloggers or should it be Google? If we assume it is Google’s problem, is there any obvious solution?

Once my mind was thinking in this direction, a possible solution did come to mind. I apologize, Mr. Cutts, if there is an obvious flaw in what I am about to propose but I felt it was worth bringing to your attention.

What triggered my thoughts was a post you wrote in April 2006. You were explaining with pride, quite rightly, that Google with its crawl caching proxy was reducing the load on websites through visits from your spiders. You had the following diagram to explain the functioning:
(Diagram: Googlebots and the crawl caching proxy)
Although different Google services would have different Googlebots, any given one would likely use the cached version of a web page if it were reasonably recent. At the time that sounded like a great idea. Presumably those cached versions would reside in the regular index unless deemed to be goats and assigned to the Supplemental Index. Unless the diagram is misleading, there is no suggestion that different Googlebots would deal with cached versions that were segregated in some way. Of course images are handled in their own database (index), but that is a clear distinction since it deals with non-text content.

If I’m understanding correctly, we now have a somewhat paradoxical situation. The regular Google keyword search deals with standard HTML or equivalent web pages. Google Blogsearch deals only with RSS news feeds. So the two algorithms examine two quite distinct sets of entities. On the other hand, it would seem that all these entities are held in the same database and may be assigned either to the regular index or to the Supplemental Index.

If this is a correct ‘big picture’ view, then that leads to my suggestion on smarter robots. In fact it’s only a small increase in smartness. Since Blogsearch and the regular search deal with quite different entities, why not segregate the work of the robots? Some would deal only with news-feed-type files; others would deal with regular web pages. By keeping them in separate databases, the problem of duplication between feeds and web pages would be avoided.
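A minimal sketch of what that segregation might look like; the content types and store names are assumptions for illustration and say nothing about how the Googlebots are actually organized:

```python
# Minimal sketch of the "smarter robots" suggestion: route crawled documents
# by type so feeds and HTML pages end up in separate stores. The content
# types and store names are assumptions for illustration only.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

feed_store = {}   # would be searched only by a blog-search service
page_store = {}   # would be searched only by the regular web search

def store_document(url, content_type, body):
    if content_type in FEED_TYPES:
        feed_store[url] = body   # feed copy never competes with the HTML page
    else:
        page_store[url] = body

store_document("http://example.com/feed/", "application/rss+xml", "<rss>...</rss>")
store_document("http://example.com/post/", "text/html", "<html>...</html>")
```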

I hope this suggestion is of value. If it is not, then an explanation of the flaw in the argument may help us all understand better how the Googlebots are behaving.

Respectfully submitted,

Barry Welford

Related: Google Supplemental Label Out, PageRank Next?

Google Supplemental Label Out, PageRank Next?

Which item published by Google is:

  • Followed by webmasters with feverish interest
  • Capable of causing intense depression
  • Linked to the visibility of web pages in SERPs (Search Engine Results Pages)
  • Given more attention than it deserves

If you guessed the Toolbar PageRank, as symbolized by that little horizontal thermometer (values from 0 to 10), then you’re right. Clearly, publishing such an item would seem to be of questionable value to Google, given the anguish it creates among clients. You would imagine the powers that be would be deciding on the most appropriate time to cease publication.

Google apparently sees this in a different light, for the moment. They do publish something else that is somewhat similar, although certainly a number two choice as an answer to the question. You may not even be aware of it. In the SERPs you will occasionally see a web page result that has the words ‘Supplemental Result’ added. That label is also an answer to the question. Given the concerns, Google has now taken action, and as of this morning the label no longer appears. That doesn’t mean your web pages are no longer in the Supplemental Index. It just means Google won’t tell you whether they are or not. Danny Sullivan, for one, regrets its disappearance and would still like to be able to get the information in some way.

The big question remains. Does Google have an equal concern about the Toolbar PageRank? By the same logic, it should. There’s even more furious and inappropriate anguish and reporting on that. However, a different answer may apply. Publishing or not publishing the Toolbar PageRank will not be decided by the Google technocrats. More likely it will be decided by the Google marketing group. Like the ‘I’m Feeling Lucky’ button, it’s become part of the Google brand. The Supplemental label had no such redeeming quality. Since you can’t afford to mess with a winning brand, I guess the Toolbar PageRank is safe.

Related:
Supplemental Result in Google – Hell or Help
Google Supplemental Results Index – A Word To The Wise

Supplemental Result in Google – Hell or Help

Note – July 31, 2007

Google has announced that it will no longer show the Supplemental Result label in the SERPs. For details see Google Supplemental Label Out, PageRank Next?.

Most of the content of this post is still relevant although the specific tests to identify which posts are in the Supplemental Index no longer apply. Perhaps Google will heed the calls to identify Supplemental Results via Google Webmaster Central.

Google Hell was an evocative term when Jim Boykin coined it in 2005 to describe the Google Supplemental Results Index. Many webmasters had earlier been devastated by the Florida update, when in some cases their Google-generated website traffic disappeared. Paranoia was everywhere. Google Hell struck a responsive chord, and since then it’s been a term to make webmasters shudder. Even this year, Aaron Wall is still offering good advice on how to escape from that Hell.

It’s a strange term to associate with a company like Google, with its slogan ‘Don’t be evil’. Matt Cutts, one of the best known Google bloggers, has recently suggested that supplemental results are not all that bad and will likely be getting better. Here is what he wrote:

As a reminder, supplemental results aren’t something to be afraid of; I’ve got pages from my site in the supplemental results, for example. A complete software rewrite of the infrastructure for supplemental results launched in Summer o’ 2005, and the supplemental results continue to get fresher. Having URLs in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a URL is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past. The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit). I think going forward, you’ll continue to see the supplemental results get even fresher, and website owners may see more traffic from their supplemental results pages.

Perhaps it’s the time to set the record straight on Google’s supplemental results. People have leapt too quickly to vilify Google for putting some results in their Supplemental Results Index. There is a silver lining to this Google Hell. Let me explain.

The True Nature Of Google’s Supplemental Results

First we should acknowledge that at first sight it is somewhat surprising that Google has split all web pages into two separate databases. This is not a strategic decision but rather a very practical operational decision. Let us remember that Google’s mission here is to produce relevant results fast when people are doing keyword queries. Google is not applying some moral judgement to each web page as it is assigned to one database or the other.

Web pages are being created every second at a staggering rate. This explosive growth rate is now accelerating as blogging and social media such as MySpace and YouTube become more and more popular. It would be an impossible operational process to give the same attention to every new web page that is created. So web pages that are more likely to come up as most relevant for particular keyword queries get more attention. These are registered in the main database. This means that these web pages are spidered on a regular basis to ensure they still exist and that they still have the same content. In a sense Google separates the sheep from the goats.

Web pages that are less likely to feature in responses to keyword queries are registered in the Supplemental Results Index. There may be two reasons for that:

  1. For common keyword queries, there are other web pages that are deemed by Google to be more relevant.
  2. The keyword query for which they are the most relevant web page is very rarely posed by searchers.

Note that the supplemental result web pages do appear if they are the most relevant for a given keyword enquiry. Supplemental results are not permanently hidden or buried.

The Silver Lining Of Supplemental Results


The division of all web pages into two distinct databases is most likely done for operational reasons. Whether we like it or not, that is the way Google has arranged matters. This division does help us understand how Google is assessing an individual web page. With both the Yahoo! and the MSN/Live search engines, it is not possible to know how valuable a given web page may be.

With all three engines, it is relatively easy to check whether a given web page is indexed. If a search is done for a relatively long patch of text from the web page, then the search engine should produce only that web page as the result of the query.
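A small sketch of that check (the snippet text and URL construction are illustrative; the same quoted phrase can be pasted into any of the three engines):

```python
from urllib.parse import quote_plus

# Hypothetical: a long, distinctive sentence copied from the page to check.
snippet = "a relatively long and distinctive sentence copied from the page"

# Wrapping the snippet in quotes asks for an exact-phrase match, which should
# return only the page the text was copied from (if the page is indexed).
query = f'"{snippet}"'
print("https://www.google.com/search?q=" + quote_plus(query))
```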

Is this web page a sheep or a goat?

For Yahoo! and MSN/Live, the result will show whether it is included or not: it’s a simple Yes or No. It gives no indication whether the search engine thinks the page is valuable or worthless. The result for the Google search is much more informative. It may turn out that the web page is shown as a supplemental result. So the web page is less valuable in Google’s view. Either other similar web pages are viewed as more valuable, or it is very rare that a search will be looking for this particular web page.

If the web page is a ‘goat’, improve the quality.

If the web page is rightly assessed as being rarely searched for, then there need be no concern if it turns up in the supplemental index. If the web page should be visible in keyword queries, then being in the supplemental results index is a clear Call to Action. Good advice on what to do is easily found. In short, look to improve the content and improve the inlinks (or in Google-ese the back links) to the web page.

Evaluating A Web Site

Checking a single web page helps in understanding the principles involved, but what is a practical way to evaluate a whole website? This can be done by using the Google site: search. This section describes the way such searches function at the time of writing. Changes are always possible in the way Google handles such searches, and in general there is no announcement when this happens.

If the search is done for site:www.mydomain.com, then this will show a listing of all the web pages indexed by Google. At the foot of the results, you will find the following:

In order to show you the most relevant results, we have omitted some entries very similar to the N already displayed.
If you like, you can repeat the search with the omitted results included.

Clicking on the link to include the omitted results shows all results. This listing shows all the web pages in the main index first, followed by the results in the supplemental results index, which appear on the later search result pages.
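For reference, a small sketch of the queries involved; the domain is hypothetical, and at the time of writing the ‘omitted results’ link amounted to re-running the same search with duplicate filtering turned off (the filter=0 parameter):

```python
from urllib.parse import urlencode

domain = "www.mydomain.com"   # hypothetical domain

# The plain site: search, and the same search with omitted results included.
base = {"q": f"site:{domain}"}
with_omitted = {"q": f"site:{domain}", "filter": "0"}

print("https://www.google.com/search?" + urlencode(base))
print("https://www.google.com/search?" + urlencode(with_omitted))
```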

For example, by this analysis the SMM Strategic Marketing Montreal website currently has 345 web pages in the main index and 8 web pages in the supplemental results index. There is no duplicate-content effect here; presumably the final 8 simply have less PageRank.

As another example, one of our clients with an e-store as part of the website shows the following statistics. There are 397 web pages in the main index and 235 web pages in the supplemental index. It is likely that this occurs because there is a great deal of similarity (duplication) between web pages in the supplemental index and those in the main index.

Such situations are not a cause for alarm. If there are a satisfactory number of web pages in the main index, and if these are suitably visible in keyword queries, then Google will deliver a good volume of traffic to the website.

A More Precise Listing Of Supplemental Index Web Pages

Google searches that report counts, such as the site: search, will usually give slightly varying numbers. The number of web pages in the Supplemental Results Index as calculated in the previous section will usually be an under-estimate. The following method gives a more precise estimate. Again, be aware that this method may cease to work at some time, but it does work at the time of writing. The results are given by a search for the expression shown on the next line:
site:www.mydomain.com *** -dxdxdx(anynonsensetext)

For the SMM website, this search lists 22 web pages in the Supplemental Results Index (as compared with the 8 shown in the previous analysis). For the e-commerce website, this search lists 323 web pages in the Supplemental Results Index (as compared with the 235 shown in the previous analysis). If this count seems too high or if there are important web pages in the list then again this should be a Call to Action for corrective measures.
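For convenience, the expression can be generated for any domain with a hedged helper along these lines (the nonsense term is arbitrary, and the trick may stop working whenever Google changes the site: operator):

```python
def supplemental_query(domain, nonsense="dxdxdx"):
    # "***" plus a negated nonsense term coaxes Google into listing the
    # supplemental-index pages for the domain (valid at the time of writing).
    return f"site:{domain} *** -{nonsense}"

print(supplemental_query("www.mydomain.com"))
# -> site:www.mydomain.com *** -dxdxdx
```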

Conclusion

Web pages registered in the Supplemental Results Index have not been lost for ever in some Google Hell. Indeed they will appear in some keyword queries. However if they are important pages then this should serve as a Call to Action to improve their quality and their inlinks.

Related: Google Supplemental Results Index – A Word To The Wise
View All Your Google Supplemental Index Results