The Irony Of Google Slapping Its Own Wrist Over Chrome Paid Links

The search world is all a-twitter with the news that Google's spam team has downgraded the search rankings of the Google Chrome group's web pages because the group's actions resulted in bloggers being paid to write posts that included links to Google Chrome web pages. That violates the Google Quality Guidelines.



Higher Search Engine Rankings Without A Home Page

Most websites receive the greatest proportion of their visitors from search engines. A high ranking in Search Engine Results Pages (SERPs) is therefore a priority. The factors that determine such rankings are fairly well known by now, and owners of traditional websites with static web pages apply them, usually with a reasonable amount of success.

Blogs have now come on the scene and ‘out of the box’ they seem to do exceptionally well in keyword queries.  There are two reasons for this.

  1. Blogs usually create RSS news feeds, which give an instant alert to the search engines that a new post has been written.
  2. Google attaches considerable weight to the recency of new web pages. 
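Point 1 can be made concrete. The sketch below builds a minimal RSS 2.0 feed with Python's standard-library `xml.etree.ElementTree`. It is an illustration only: the blog title, URLs, post and date are hypothetical placeholders, real blog software such as WordPress generates its own feed, and nothing here shows how quickly any search engine reacts to a feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_rss(title, link, description, posts):
    """Return a minimal RSS 2.0 document as a string.

    `posts` is a list of (title, url, published_datetime) tuples, newest first.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post_title, url, published in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post_title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url  # the permalink doubles as a unique id
        # RSS 2.0 uses RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")

# Hypothetical blog with a single new post.
feed = build_rss(
    "My Blog",
    "http://www.myblog.com/",
    "A hypothetical example blog",
    [("A new post", "http://www.myblog.com/a-new-post.htm",
      datetime(2009, 6, 15, 9, 0, tzinfo=timezone.utc))],
)
print(feed)
```

Each new post becomes a new `item` at the top of the feed, which is what lets feed readers and crawlers notice it without re-reading the whole site.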

This is why blog pages seem to rank very highly in keyword searches, particularly in the early days after they appear.


This has resulted in a feverish interest in having blogs. Without thinking too much about it, many people have set up blogs. Software such as WordPress makes it Oh So Easy... and, surprise, surprise, Google loves blog posts. Sounds like a no-brainer?

Even though the results are impressive, you can do even better. However, you may need to discard some of your preconceptions. Let us explore the nature of a blog and how it performs in search engine keyword searches.

The Typical Blog

If you visit a typical blog at, say, www.myblog.com, you will find a long scrolling Home page displaying several posts, usually with the most recent post at the top. In some cases the full content of each post is shown; in others you get only a short extract with a link to read More. That link takes you to a web page that shows only the single post you are interested in. Some people prefer to arrange their blog this way to avoid duplicating exactly the same content on two separate web pages. Such duplication would mean that the search engine might serve up either web page in a keyword search, even when the single post web page was really more appropriate.

The simple picture below shows what a search engine holds about any blog web page. For example, for the Home page displayed when you visit www.myblog.com, the search engine holds the URL (www.myblog.com), the blog title, the blog meta description, the content of the Home page as of the last crawl, and a list of back links. Back links are the URLs of other web pages that have hyperlinks pointing to this Home page.
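The record described above can be pictured as a simple data structure. This sketch is purely illustrative: the `IndexedPage` class, its field names and the example values are hypothetical and say nothing about how Google or any other search engine actually stores pages.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class IndexedPage:
    """Hypothetical record of what a search engine holds for one URL."""
    url: str
    title: str
    meta_description: str
    content: str  # as captured on the last crawl, which may differ from the live page
    back_links: List[str] = field(default_factory=list)  # URLs of pages linking here

home = IndexedPage(
    url="www.myblog.com",
    title="My Blog",
    meta_description="Thoughts on search and blogging",
    content="Latest post ... previous post ... older post ...",
    back_links=["www.example.org/resources.htm", "www.example.net/blogroll.htm"],
)
print(home.url, len(home.back_links))  # www.myblog.com 2
```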

[Figure: blog structure]

The Google algorithm (and probably other search engine algorithms too) takes those back links into account in determining the importance of this particular Home page. Google uses the term PageRank as a measure of this importance. A large number of back links, particularly from authoritative websites like the New York Times, the BBC or CNN, will give the Home page a higher PageRank. As the search engine spiders (sometimes called crawlers or robots) wander around the Internet, they register all these back link URLs. Since many of them ‘point’ to the Home page, the Home page will have the highest PageRank. New single blog post web pages will have few direct links, so their PageRank is usually not established for weeks or even months.

Even though the single blog post web pages may never get direct links, they can benefit from the internal links within the website. For example, that link to read More on the Home page does confer some PageRank contribution on the single blog post web page. According to what Google has published, a discount (damping) factor applies, so that only about 85% of that PageRank contribution is passed on to the single post web page. So far this is all standard ‘stuff’. Let us now offer some different insights into what is happening.
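For reference, the damping factor comes from the formula in Brin and Page's original paper. Here $PR(A)$ is the PageRank of page $A$, $T_1, \dots, T_n$ are the pages that link to $A$, $C(T_i)$ is the number of links going out of page $T_i$, and $d$ is the damping factor, typically set to 0.85:

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

As a worked example with hypothetical numbers: if the Home page had a raw PageRank of 4 (not a Toolbar score) and carried 10 links, the read More link would pass $0.85 \times 4 / 10 = 0.34$ to the single post page. So the contribution is not only discounted but also shared with every other link on the page.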

Note that we have signaled something different about that content on the Home page with that yellow background.  Unlike a traditional website with static web pages, the content keeps changing as new blog posts are written.   Either an extract or the full content of the latest blog post is added to the top of the web page and the oldest blog post is bumped off the bottom of the web page.  Once two or three blog posts have been added, what the search engine is holding in its database may differ markedly from what is appearing on a visit made today.  Of course if the search engine spider does recrawl the web page, then the content will be updated.  However for most blogs what the search engine is storing for the Home page will be different from what is currently being displayed.
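A toy simulation makes this drift concrete. It assumes, hypothetically, a Home page that shows only the three most recent posts; the post titles and the `home_page` helper are illustrative only.

```python
from typing import List

POSTS_ON_HOME_PAGE = 3  # assumed: the Home page shows only the three latest posts

def home_page(posts: List[str], n: int = POSTS_ON_HOME_PAGE) -> List[str]:
    """Posts visible on the Home page, newest first (the posts list is oldest first)."""
    return list(reversed(posts[-n:]))

posts = ["Post 1", "Post 2", "Post about widgets"]
crawled_snapshot = home_page(posts)  # what the spider stored on its last visit

posts += ["Post 4", "Post 5", "Post 6"]  # written before the next crawl
live_page = home_page(posts)             # what a visitor sees today

print(any("widgets" in p for p in crawled_snapshot))  # True: the index still matches "widgets"
print(any("widgets" in p for p in live_page))         # False: the visitor no longer sees it
```

The index would still answer a "widgets" query with the Home page, even though the live Home page no longer mentions widgets.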

The Typical SERP For A Keyword Query

SERP is the acronym for Search Engine Results Page, which is what the search engine displays when you do a keyword query. The search engine finds content in its database that matches the keywords. If both the Home page content as stored and the individual blog post web page content (if stored) are relevant, the Home page nevertheless ranks higher, so it will very likely be shown first or may be the only entry shown. Although the entry was chosen as relevant, the first line of the SERP entry shows the Title of the whole blog. This is general and probably does not refer to the keywords. Moreover, in developing the explanatory snippet, the search engine can only rely on the Meta Description for the whole blog, which is probably irrelevant, and on the content of the blog Home page as recorded when spidered. The combination of an irrelevant title and a somewhat fuzzy snippet in the SERP is unlikely to attract the searcher's click.

If the searcher does click on the entry, the blog Home page has likely changed by now and the keywords may no longer even be in the current version.  It might be thought that by going to the Cached version of the page, you may be able to see a version that includes the keywords.  However even here a caution is appropriate.  Search engines do not necessarily create a cached version of the web page on every spider visit.  The cached version may then be from an even earlier period.  In such a case, we have the somewhat anomalous situation that the cached version of the web page and the current version of the web page do not show the keywords, but the version that the search engine crawled in between the cached date and the current date did contain the keywords.  This is why the blog Home page was the item shown in the keyword query SERP.

The search engine time cycles for crawling and indexing web pages can occasionally be measured in weeks, so it is not surprising that SERP entries are on occasion completely irrelevant to the keyword query. Is there a way of correcting this situation? It all stems from the fact that the Home page has too much authority. Could this be reduced in some way, and the resulting ‘authority’ that is freed up be spread around among other blog post pages?

The LMNHP approach

No obvious solutions came to mind for ‘flattening out’ the authority profile of a blog. However, a somewhat unorthodox approach seemed of interest, partially fuelled by the blogging approach already used for all the SMM blogs. Frustrated by the typical blog with a number of blog posts all featured on the Home page, we had for some time set up all the SMM blogs to feature only the latest post content on the Home page. In other words, the latest post content appeared, for example, at http://www.staygolinks.com/. If you then clicked on the permalink in the H1 heading of the post content, you were switched to the single post entry at http://www.staygolinks.com/latestpost.htm. This single post web page had only minor differences from the Home page version of the post.

Suddenly the light bulb came on. Why not avoid the traditional Home page entirely and immediately switch (via a 301 redirection) to the single latest blog post web page? Details are given in the LMNHP post. LMNHP is an acronym for Look Mom No Home Page.

What this means is that while this post continues to be the latest, the URL http://www.staygolinks.com/latestpost.htm will be treated as the URL for the blog website. Any back links, either external or internal, will be deemed to apply to that URL. Thinking again about the earlier picture of what the search engines register, all the items are now unchanging. We have a Title and Meta Description that are appropriate precisely for this blog post content, and the content stays the same over the long term. The only difference is that the search engine may have been attempting to access http://www.staygolinks.com/ and is instructed by the 301 redirection to access the single post web page instead. In a sense, the blog has no Home page.
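The redirect itself is simple. The LMNHP post gives the actual details; what follows is only a minimal sketch using Python's standard `http.server` module, assuming the latest post lives at `/latestpost.htm` as in the example above. The `LATEST_POST` setting, the `route` helper and the `NoHomePageHandler` class are hypothetical names, and on a real blog the target would have to be updated each time a new post is published.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

LATEST_POST = "/latestpost.htm"  # hypothetical: must be updated whenever a new post appears

def route(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request path: the Home page always redirects."""
    if path in ("", "/"):
        return 301, LATEST_POST
    return 200, None

class NoHomePageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = route(self.path)
        self.send_response(status)
        if location is not None:
            # 301 Moved Permanently, pointing at the latest single post page.
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Stand-in for the real single post content.
        body = b"<html><body>Single post page</body></html>"
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `HTTPServer(("", 8000), NoHomePageHandler).serve_forever()` and visit http://localhost:8000/. One design point to weigh: browsers may cache a 301 as permanent, so returning visitors could keep landing on an older post after a new one is published.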

Note that the Home page is the only web page that is no longer active. All the other web pages on the blog are active and unchanged, and every back link is assigned to some blog web page or other. It is uncertain how the search engines handle all this, but so far there appear to be no surprises.

SERP Results For The SMM Blogs

It is still early days, but so far the results of keyword queries are extremely gratifying. Entries for new blog post pages are indexed and displayed rapidly. They also appear with high rankings and with relevant titles and descriptions. This undoubtedly gives an SEO boost and also makes it more likely that searchers will click on the item. The biggest boost comes from the fact that the single post page seems to be directly assigned the back links for the domain. For a time, the latest blog post really works as a Home page and is only supplanted when the next blog post is written.

Blog posts have always enjoyed an initial visibility, presumably based on some recency factor.  This is now magnified by a large number of back links that are also assigned to that URL.

A Possible PageRank Benefit

This approach undoubtedly results in a better (more even) distribution of PageRank among web pages.  No one outside Google knows exactly how the algorithms work with PageRank at this time.  The basic view is that the Home page amasses all the incoming back links to the domain.  This link-juice is then distributed among the internal web pages with some discounting applying, perhaps of the order of 15%.

In this new approach, the back links that were all directed to the Home page now go in groups directly to each single post web page as it is issued. It would seem that the discounting factor no longer applies, since these links are now all external. It has also been assumed that external links carry more weight than internal links, so this may be an additional benefit. Since Google is very guarded about its search algorithms, it is unlikely that these surmises can be confirmed or denied.

A Pleasant Surprise In The Tail

The result of this approach, as seen in the SERPs, is that better entries appear, giving searchers a much clearer indication of what they may find. The constancy of content for blog pages fits the Google mission better, since such pages are easier to index correctly and to deliver as relevant results.

One pleasing bonus is revealed in the image below, which shows a search done prior to this current post being added to this blog.  As might be expected, a search for the previous blog title, The Google Tango, did give that blog post as the #1 entry in the SERP.  Clicking on the link took you precisely to that single post web page. 

[Figure: Google search for The Google Tango]

What is intriguing is that Google still shows the domain itself as the URL for that web page, even though that is not the specific hyperlink for the entry.  However that is probably the most useful way of presenting information to the searcher.



SEO Gets Simpler In 2009

During the last 12 months at least, some SEO clients have apparently been paying sizeable fees for SEO (Search Engine Optimization) work that was completely ineffective.  That really is the bottom line on a development that Danny Sullivan describes in a post entitled, PageRank Sculpting Is Dead! Long Live PageRank Sculpting!

Earlier this month, Google’s Matt Cutts sent a shockwave through the advanced SEO community by saying that site owners could no longer perform “PageRank sculpting” using the nofollow tag in the way they’d previously thought.

Google helped advance the notion of using nofollow to flow PageRank. No one was forced to do it; no one is being punished that it might no longer work. But Google did help put it out there, and that’s why it should have spoken up sooner when it took nofollow out as a sculpting tool. Instead, it said nothing about the change that happened sometime from May 2008 or earlier.

As for why Google did not spill the beans earlier:

At first, we figured that site owners or people running tests would notice, but they didn’t. In retrospect, we’ve changed other, larger aspects of how we look at links and people didn’t notice that either, so perhaps that shouldn’t have been such a surprise. So we started to provide other guidance that PageRank sculpting isn’t the best use of time.

Danny Sullivan then provides the following view on how it all now works:

Google itself solely decides how much PageRank will flow to each and every link on a particular page. In general, the more links on a page, the less PageRank each link gets. Google might decide some links don’t deserve credit and give them no PageRank. The use of nofollow doesn’t ‘conserve’ PageRank for other links; it simply prevents those links from getting any PageRank that Google otherwise might have given them.
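The difference between the old sculpting belief and this description can be shown with a little arithmetic. The sketch below assumes, as a simplification, that the PageRank a page passes on is split equally across its links, which is the classic model; Google says it alone decides the actual split. The passable value of 1.0 and the link targets are hypothetical.

```python
from typing import Dict

def shares_as_described(passable: float, links: Dict[str, bool]) -> Dict[str, float]:
    """Split PageRank equally over ALL links; nofollow-ed links get nothing,
    and their share is simply lost rather than redistributed.
    `links` maps each target URL to True if the link is nofollow-ed."""
    per_link = passable / len(links)
    return {url: 0.0 if nofollow else per_link for url, nofollow in links.items()}

def shares_as_once_believed(passable: float, links: Dict[str, bool]) -> Dict[str, float]:
    """The old 'sculpting' assumption: split only among the followed links."""
    followed = [url for url, nofollow in links.items() if not nofollow]
    per_link = passable / len(followed)
    return {url: 0.0 if nofollow else per_link for url, nofollow in links.items()}

links = {"/post-a": False, "/post-b": False, "/login": True, "/contact": True}
believed = shares_as_once_believed(1.0, links)
described = shares_as_described(1.0, links)

print(believed["/post-a"])       # 0.5
print(described["/post-a"])      # 0.25
print(sum(described.values()))   # 0.5: the nofollow-ed shares simply evaporate
```

Under the described model, adding nofollow to `/login` and `/contact` does nothing for `/post-a`; it only removes what those two links would have received.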

I was moved to write this post today by an e-mail message I received from Dan Thies of SEO Fast Start.  He revealed that he was in two minds as to whether he should roll back the Site Structure chapter in SEO Fast Start, and basically go back to what he was teaching in 2006. He has some concerns about what some folks have done with "PageRank sculpting" since he has seen more mistakes than good implementations with that technique.

Dan Thies is the author of a free 97-page ebook, SEO Fast Start 2008, that you can download. Interestingly the preface includes the following:

The 2008 edition is not much different from last year’s – because if you’re not trying to game the search engines, very little changes. Heck, if you had a copy of the November 2001 edition, your site wouldn’t exactly burn down if you followed what I wrote then.

Once more, it seems, the French saying applies: plus ça change, plus c'est la même chose (the more it changes, the more it stays the same). I still believe that the word PageRank is being used in at least two senses in these discussions. The first is the basic PageRank measure that applies to all URLs, as suggested in PageRank Calculation – Null Hypothesis. The second is the modified way in which that basic value is then used within the keyword search algorithms, as Danny Sullivan pointed out above.

Even if PageRank sculpting is no longer the hot topic it was, there are still key issues to address with the most important being to Avoid Duplicate Content Problems.


Never forget that SEO is only part of the answer to getting better business results online. You really have to start with the right strategy and follow through with good website usability and closing the sales. The series of articles on Marketing Right Now is a useful primer on the total process.


PageRank Calculation – Null Hypothesis

Summary

The SEO world continues to be shaken by the hints Matt Cutts offered last week on the nofollow tag and PageRank Sculpting. Many seem alarmed, but perhaps people have made assumptions about PageRank that are not true. It is difficult to prove how things really work, since Google remains cagey about what really happens. Here we offer a Null Hypothesis, a simple explanation that people may wish to consider. It should be replaced by a more complex view only when that view can be proven to be better.

Introduction

Andy Beard is a keen observer of the Google scene and poses the key question following the Matt Cutts revelations last week: Is PageRank Sculpting Dead & Can Comments Kill Your PageRank?

Has Google in one quick swipe removed all benefit of Dynamic Linking (old school term) or PageRank sculpting (when it became “trendy”), and potentially caused massive penalties for sites nofollowing links for user generated content and comments?


Not everyone is so concerned and Andrew Goodman frankly states, PageRank Sculpting is Dead? Good Riddance.

How Is PageRank Calculated

The Google website is strangely obscure on how PageRank is calculated. There are a few explanations in Google Answers (no longer supported) by others on such issues as My Page Rank and Page Rank Definition – Proof of convergence and uniqueness. Phil Craven offers a reasonably straightforward account in Google’s PageRank Explained and how to make the most of it. That includes the following:

Notes:
Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links from a site can be harmful if they link to penalized sites.

The KISS principle (Keep It Simple, Sweetheart)

The problem in trying to estimate how the Google PageRank process works is that PageRank is only one of more than 100 factors involved in how web pages rank in keyword searches. Any attempt to prove a given assumption about PageRank involves a typical statistical analysis in which one tries to infer an explanation from somewhat fuzzy data. That is where the KISS principle comes in. In such a situation, we should perhaps rely on the approach favored by some great minds.

Of two competing theories or explanations, all other things being equal, the simpler one is to be preferred.
William of Occam (Occam’s Razor)
A scientific theory should be as simple as possible, but no simpler.
Albert Einstein
The Null Hypothesis is presumed true until statistical evidence indicates otherwise.
Sir Ronald Fisher

Given the Matt Cutts suggestions that some find perplexing, what is the simplest explanation of the Google PageRank process that could explain what is involved?

The Null Hypothesis

What the PageRank process attempts is mind-boggling. It aims to attach a PageRank value to every web page (URL) in the total space of web pages and their interlinks, and hence to determine the contribution that flows down every hyperlink. It involves an iterative calculation, since the values are interdependent. A page's PageRank can be considered as the probability that a random surfer (human or robot) will be on that page at any given moment, and a link's contribution as the probability that the surfer passes down that link, as compared with all the other links in the total space (graph).
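The random-surfer interpretation can be checked with a small simulation. This sketch is illustrative only: it uses a hypothetical four-page web, the conventional damping factor of 0.85 (the surfer follows a link 85% of the time and otherwise jumps to a random page) and a fixed random seed. It uses the probability form of PageRank, in which the values sum to 1; the share of steps the surfer spends on each page approximates that page's PageRank.

```python
import random
from collections import Counter
from typing import Dict, List

def random_surfer(graph: Dict[str, List[str]], steps: int = 200_000,
                  damping: float = 0.85, seed: int = 1) -> Dict[str, float]:
    """Estimate PageRank as the share of time a random surfer spends on each page."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = Counter()
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        links = graph[page]
        if links and rng.random() < damping:
            page = rng.choice(links)  # follow one of the page's links at random
        else:
            page = rng.choice(pages)  # get bored (or hit a dead end) and jump anywhere
    return {p: visits[p] / steps for p in pages}

# Hypothetical mini-web: every other page links to "home"; nothing links to "other".
web = {
    "home": ["post1", "post2"],
    "post1": ["home"],
    "post2": ["home", "post1"],
    "other": ["home"],
}
estimate = random_surfer(web)
print({p: round(v, 3) for p, v in estimate.items()})
```

Solving the PageRank equations directly for this graph gives roughly home 0.43, post1 0.31, post2 0.22 and other 0.04, which the simulated shares should approach. The page "other" is reached only by random jumps, so its value is close to 0.15 / 4 = 0.0375.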

As Matt Cutts reminded us last week, even if a web page is excluded by its owner using a robots.txt file, it may still come into consideration if it is included in a sitemap file or has a link coming in from another external web page. Having an extra indicator on each link showing whether it passes PageRank would increase the complexity of the data enormously. Given that, we propose the following Null Hypothesis (see the footnote for a definition of this expression). This could of course be abandoned in favor of a more complex explanation if that could be proven statistically with sufficient confidence, or if Google chose to provide a more correct explanation of what is done.

The Null Hypothesis runs as follows. The whole process splits into two phases. The first phase looks at all web pages (URLs) and all associated links to determine the PageRank of each web page and thus the contributions that would flow down each link. There are no exclusions and this calculation process handles all URLs in the total Internet space (graph). Modifiers that website owners may have applied such as robots.txt files or tags such as noindex, nofollow, etc. do not get involved at this stage. For all URLs and links without exception, values such as those illustrated below would be calculated.

[Figure: PageRank Chart]

The second phase of the process involves how these PageRank values are then used within the search algorithms. Here whatever is specified via robots.txt or nofollow tags would apply. The PageRank contribution from nofollow-ed links thus would not be included in the calculation. Also any filtering factors that Google may wish to apply for bad neighborhoods, etc. would only apply in this second phase. The underlying web pages and links would still have a first-phase PageRank calculated but this would in no way influence the second phase results.
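The two phases can be sketched in code. This is a sketch of the Null Hypothesis as stated here, not of anything Google has confirmed: phase one runs an ordinary PageRank power iteration over every link, and phase two reuses those values but ignores the contributions carried by nofollow-ed links. The graph, the `Link` tuple format and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, bool]  # (source URL, target URL, is_nofollow); hypothetical format
D = 0.85  # conventional damping factor

def out_degrees(pages: List[str], links: List[Link]) -> Dict[str, int]:
    """Count every outgoing link, followed or not."""
    counts = {p: 0 for p in pages}
    for src, _, _ in links:
        counts[src] += 1
    return counts

def phase_one(pages: List[str], links: List[Link], iterations: int = 100) -> Dict[str, float]:
    """Plain PageRank by power iteration over ALL links; nofollow flags are ignored.
    Assumes every page has at least one outgoing link (no dead ends)."""
    n = len(pages)
    out = out_degrees(pages, links)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - D) / n for p in pages}
        for src, dst, _ in links:
            new[dst] += D * pr[src] / out[src]
        pr = new
    return pr

def phase_two(pages: List[str], links: List[Link], pr: Dict[str, float]) -> Dict[str, float]:
    """Effective scores: reuse the phase-one values but ignore nofollow-ed contributions."""
    n = len(pages)
    out = out_degrees(pages, links)
    effective = {p: (1 - D) / n for p in pages}
    for src, dst, nofollow in links:
        if not nofollow:
            effective[dst] += D * pr[src] / out[src]
    return effective

pages = ["home", "post", "commenter"]
links = [
    ("home", "post", False),
    ("post", "home", False),
    ("post", "commenter", True),  # e.g. a nofollow-ed comment author link
    ("commenter", "home", False),
]
pr = phase_one(pages, links)
effective = phase_two(pages, links, pr)
for p in pages:
    print(p, round(pr[p], 3), round(effective[p], 3))
```

In this sketch the nofollow-ed link still counts in the out-degree of `post`, so the commenter page keeps its phase-one PageRank while its effective score falls back to the baseline, and no other page gains anything from the nofollow.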

This two-phase approach would seem to square with what we have been hearing recently. It is offered very much as a Null Hypothesis, so if someone has an Alternative Hypothesis, we look forward to hearing it. Over to you.

Implications of this Null Hypothesis

In the meantime, if this explanation is true, some obvious considerations apply. The basic PageRank calculation is determined by the total set of URLs and links. The PageRank value for a URL is never changed by modifications that apply in the second phase. All that happens in the second phase is that some of the PageRank contributions are ignored. So the effective PageRank that a URL has in the second phase is always less than or equal to what it had in the first phase. A URL can be more prominent solely because others become less prominent.

Significant changes can only come about through influences that affect the first phase. These relate to the site architecture rather than the interlinkages. That is after all what Matt Cutts was recommending.

Footnote:
See this link for an explanation of Null Hypothesis.


Forget We Ever Mentioned A Supplemental Index – Google

It shouldn’t really happen to a nice company like Google. You try to do someone a favor and it blows up in your face.

A few years back, they realized that it would be difficult to give a speedy response to a search query if they had a single database of all the web pages they were spidering. So they decided to put web pages that might come up more frequently in search queries in their regular database. Other, less popular web pages they would put in a supplementary or secondary index. By this means, they could keep cataloguing all the web pages they could find and still deliver fast results to most keyword searches by using the regular index. Technically it was the right solution. The mistake was that they told people about it. At the time the approach seemed a positive move.
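The original two-index arrangement can be pictured as a tiered lookup. The sketch below is a hypothetical simplification, not Google's design: it searches the primary index first and consults the supplemental index only when the primary one returns too few results, which is roughly the "obscure query" case. The index contents, the `tiered_search` name and the `min_results` threshold are invented for illustration.

```python
from typing import Dict, List, Set

Index = Dict[str, Set[str]]  # keyword -> set of URLs (hypothetical, greatly simplified)

def tiered_search(query: str, primary: Index, supplemental: Index,
                  min_results: int = 2) -> List[str]:
    """Search the primary index; fall back to the supplemental index only
    when the primary index returns fewer than `min_results` hits."""
    results = sorted(primary.get(query, set()))
    if len(results) < min_results:
        extra = supplemental.get(query, set()) - set(results)
        results += sorted(extra)
    return results

primary = {"seo": {"a.htm", "b.htm"}, "tango": {"c.htm"}}
supplemental = {"seo": {"z.htm"}, "tango": {"d.htm"}}

print(tiered_search("seo", primary, supplemental))    # ['a.htm', 'b.htm']: z.htm is never reached
print(tiered_search("tango", primary, supplemental))  # ['c.htm', 'd.htm']
```

In this simplified picture, a page such as `z.htm` can only surface for queries where the primary index comes up short, which is why webmasters worried about landing in the supplemental tier.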

Move on to 2005, and this two-index system begins to upset a lot of people. With the explosive growth of the Internet, it is impossible to put the majority of web pages in the primary index. Since the primary index is spidered more frequently and its web pages are more likely to appear in keyword searches, you can understand why people got upset. Of course, if Google hadn’t mentioned the supplemental index, people would never have known of this possible problem.

The whole issue has become a can of worms for Google. A great many people were upset. SEO (search engine optimization) experts worked hard to figure out how to keep web pages out of the supplemental index. Google has tried to lower the temperature on this topic by reducing the differences between the two indexes (the regular index and the supplemental index). In mid-December, Yonatan Zunger of the Google Search Quantity Team reported on progress.

We improved the crawl frequency and decoupled it from which index a document was stored in, and once these “supplementalization effects” were gone, the “supplemental result” tag itself-which only served to suggest that otherwise good documents were somehow suspect-was eliminated a few months ago. Now we’re coming to the next major milestone in the elimination of the artificial difference between indices: rather than searching some part of our index in more depth for obscure queries, we’re now searching the whole index for every query.

From a user perspective, this means that you’ll be seeing more relevant documents and a much deeper slice of the web, especially for non-English queries. For webmasters, this means that good-quality pages that were less visible in our index are more likely to come up for queries.

You might have hoped that this would satisfy searchers. However, Barry Schwartz of SERoundtable felt that the announcement only created more confusion.

I’ll call Google out on this one, and I rarely do.

Google, we need you to stop hiding this index from us. We really need an explanation of what this index does, why a page would be placed in the supplemental index. When Google actually searches it? In what examples would a page in the supplemental index rank better than a page in the main index?

The confusion over the supplemental index has gone on too long.

Andy Beard was equally concerned that his method of identifying web pages in the supplemental index no longer seemed to be working. Apparently he really would like to know which web pages are still in the supplemental index.

Google may well be upset that people do not seem to be accepting its explanation of the “closeness” of the two indexes. On this one I agree with Google. People seem to be fixated on the notion of the supplemental index, as if this was an important issue in the keyword search algorithms.

It’s interesting to compare this with another Google invention, which used to be a hot topic and is now a yawn topic for most SEO keyword searchers. That’s the Google Toolbar PageRank indicator. It may well be broken and is possibly only kept around for marketing reasons. It has almost zero connection now with how web pages are ranked in keyword searches.

So people, let’s get over it. Forget about that supplemental index and work on the more important things that make a web page memorable, authoritative, trustworthy and ultimately search-engine visible. You’ll get much better rewards for your efforts.

Related:
Supplemental Result in Google – Hell or Help – March 31st, 2007
Google Supplemental Results Index – A Word To The Wise – July 9th, 2006
Google Supplemental Label Out, PageRank Next? – August 1st, 2007


URLs – Human-Friendly Or Robot-Friendly?

 
To WWW
Or Not
To WWW.

Many websites will find at least half of their traffic comes because someone has done a Google search. Sometimes it’s even higher than that. So if there is a conflict between what human beings prefer and what search engine robots prefer, which should you favour? This puzzle was graphically illustrated by two blog posts that appeared in the last 48 hours.

Today, on the side of the humans, we had, perhaps naturally, Seth Godin. He was discussing URL Hygiene and believes URLs are for humans. He particularly likes the advice given on Aaron Goldman’s goodURLbadURL website. Here are some key points:

URL Best Practices
Do’s

3. Whenever possible, use YourBrandName.com.

7. Use subdomains when driving people deeper than your homepage – e.g. Product.YourBrandName.com.

Don’ts
1. Don’t include www. We know to go to the World Wide Web to find you.

The previous day, Matt Cutts of Google had blogged about Subdomains and Subdirectories. In a sense he speaks for the robots, because Google wants to make sure the robots see what humans see. His advice would encourage web designers to use Subdirectories rather than Subdomains, which runs directly counter to the 7th Do above.

On the much bigger question of whether to WWW or not to WWW, Google does not take a position. The only point it strongly recommends is to be consistent in using one or the other. In fact, using Google Webmaster Tools, you can specify whether you prefer Google to index www.mydomain.com or mydomain.com.

The reason why this is important is that if both exist in the Google index, then each will be less visible than if only one of them were indexed. That visibility is created by other websites that have links to the website in question. A summary measure of this is the so-called PageRank. This is a fundamental factor in Google’s algorithm, which ranks Web pages in keyword searches. If both versions (the WWW version and the non-WWW version) are used indiscriminately, then some links will point to one and other links will point to the other. Standardizing on one ensures the maximum PageRank and thus the maximum visibility in keyword searches for the website.
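The bookkeeping behind standardizing can be shown by counting back links before and after normalizing the host name. This sketch assumes the site has chosen the WWW version as canonical; the domain and the link list are hypothetical, and it models only the counting, not how Google actually consolidates links. On the server side, the usual way to enforce the choice is a 301 redirect from one version to the other.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.mydomain.com"  # assumed choice: the WWW version

def canonicalize(url: str) -> str:
    """Map both mydomain.com and www.mydomain.com onto the single canonical host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ("mydomain.com", "www.mydomain.com"):
        host = CANONICAL_HOST
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

# Hypothetical back links, written four different ways by four linking sites.
back_links = [
    "http://www.mydomain.com/",
    "http://mydomain.com/",
    "http://mydomain.com",
    "http://WWW.mydomain.com/",
]
print(Counter(back_links))                           # four spellings, one link each
print(Counter(canonicalize(u) for u in back_links))  # all four count for one URL
```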

Which is better, the WWW version or the non-WWW version? If you follow Seth Godin and Aaron Goldman, you’ll go with the human-friendly URL and use the non-WWW version. If you’re trying to be friendly to the robots, that’s a tougher question. It all depends on those webmasters out there who may provide links to your website. The WWW version is much the more popular way of handling URLs so many of those links will point to that version. If you want to make sure that more of them get it right, then you’ll join the WWW camp.


Google Tries To Close Pandora's Box

Google’s PageRank is now a flawed concept

When Sergey Brin and Lawrence Page were working on Backrub, which became Google, almost 10 years ago, their PageRank concept seemed reasonable. Links to web pages were almost like votes. If it had only been used in the halls of Academia, perhaps it would have continued to work. However applying it to all the information in the world was really opening a Pandora’s box. With Google’s success, it became a key way of generating visitor traffic to websites. The natural consequence was that everyone tried to develop as many links to their websites as they could. The resulting chaos is not something that is easily reversed.

Despite the Herculean effort required, Google is trying to put the lid back on the box. As Andy Greenberg of Forbes noted last week, its actions are scaring the search experts.

Google, for online businesses, has the impact that Alan Greenspan once had on the financial markets. … Web site administrators for major sites including the Washingtonpost.com, Techcrunch, and Engadget (as well as Forbes.com) found that their “PageRank” – a number that typically reflects the ranking of a site in Google results for key search terms – had dropped precipitously according to the Google Toolbar, a software program that shows Google’s assessment of a website.

The experts are all chiming in on this hot topic. Donna Fontenot and Rand Fishkin both note that hundreds of websites are being affected. Barry Schwartz suggests that it is websites offering paid links to other websites that are being affected. Andy Beard points to other websites that are affected where this explanation does not apply. This PageRank adjustment is not part of a universal PageRank revision. Rather it would seem to be a manual adjustment for some fairly significant websites. The PageRank has been reduced by 2 on the 10-point scale.

The whole exercise would seem to be one where Google cannot win. Even if it were done to try to remove some of the artificialities created by the PageRank concept, it is alienating some of its most staunch supporters. Most often it is being seen as a policy driven by only bottom line considerations, so as to promote its own AdSense program at the expense of competing advertisers.

I think Robert Scoble has it right when he says that Google PageRank's Been Dead for Quite Some Time. Given the distortions that PageRank has created on the Internet, it seems highly likely that links should no longer play a big part in any keyword search algorithm. Google by now has a wealth of information on what makes the most relevant answers for searchers. Perhaps it's time for Google 2.0 as Google moves into its second decade. This could be based on some completely new concept that gives more relevant answers to searchers' queries.

Related:
Official: Selling Paid Links Can Hurt Your PageRank Or Rankings On Google
Google Supplemental Label Out, PageRank Next?
Emperor Google Has No Clothes


Emperor Google Has No Clothes

Are Paid Links Google’s Achilles’ Heel?

We may be mixing metaphors here, but both illustrate what is going on in Search Marketing Land at the moment. There is an intense and heated debate about Google's strong suggestion that paid links should carry a nofollow attribute.

In other words, if you pay to have another website link to your own website, then that link should carry a nofollow attribute. Such links would then have no effect on how web pages rank in Google's keyword search results. Jennifer Laycock has described this as idiocy. Google has even suggested that the Federal Trade Commission (FTC) should rule on how such links are designated, since to Google they are clearly advertising. Dan Thies has somewhat archly welcomed the intervention of the FTC, should it happen. Given the huge economic implications of any decision, such powerful advocacy is not at all unexpected.
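
In HTML, the marking Google asks for is the rel="nofollow" value on a link. The following minimal sketch, using Python's standard html.parser module, shows how a crawler could separate followed links from nofollow links so that only the former count as votes. It assumes a search engine simply ignores nofollow links when scoring; the URLs and the LinkClassifier name are hypothetical, and this is not a description of how Google actually processes such links.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sorts <a href> links into 'followed' and 'nofollow' lists (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)  # e.g. a paid link: not counted as a vote
        else:
            self.followed.append(href)  # an ordinary editorial link


sample = """
<p>Read our <a href="https://example.com/review">editorial review</a>.</p>
<p>Sponsored: <a href="https://advertiser.example/" rel="nofollow">buy now</a></p>
"""
parser = LinkClassifier()
parser.feed(sample)
print("followed:", parser.followed)
print("nofollow:", parser.nofollow)
```

Under this assumption, the paid link still sends visitors to the advertiser but contributes nothing to the advertiser's link-based ranking.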

Standing back from the fray and taking a longer-term perspective, it is not surprising to see what is happening. When Sergey Brin and Lawrence Page were working on Backrub, which became Google, their foundational paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine, clearly presaged the present conflict. The following quote is the key:

Another big difference between the web and traditional well-controlled collections is that there is virtually no control over what people can put on the web. Couple this flexibility to publish anything with the enormous influence of search engines to route traffic and companies which are deliberately manipulating search engines for profit become a serious problem. This problem has not been addressed in traditional closed information retrieval systems. Also, it is interesting to note that metadata efforts have largely failed with web search engines, because any text on the page which is not directly represented to the user is abused to manipulate search engines. There are even numerous companies which specialize in manipulating search engines for profit.

The solution they suggested was to use the set of hyperlinks from other web pages pointing towards any given web page as a measure of the importance of that web page. This is similar to the way academic papers are often evaluated by the number of citations they receive in other academic papers. Their own Backrub paper has a long list of such citations. Perhaps not surprisingly, it does not include one by the inventor of the World Wide Web, Sir Tim Berners-Lee, which had appeared a few months before their own. (Thanks to Ruud Hein for this reference.) Entitled Links and Law, its abstract includes the following:

Normal hypertext links do not of themselves imply that the document linked to is part of, is endorsed by, or endorses, or has related ownership or distribution terms as the document linked from.

Brin and Page were taking a diametrically opposed position.

That was all in 1997. Now, in 2007, we see how the world has evolved. Google is enormously successful as a search engine. It puts a high value on hyperlinks to web pages. Everyone reacts accordingly. With only a minor change, the quotation from their Backrub paper still makes sense:

Also, it is interesting to note that using backlinks as a measure of authority has largely failed with web search engines, because such backlinks can be abused to manipulate search engines. There are even numerous companies which specialize in manipulating search engines for profit.

What is the appropriate solution for search engines now? Jason Calacanis, with his Mahalo, is suggesting that algorithms will not provide the answer and that human judgment must be involved. On the other hand, Google has enormous resources and collects massive amounts of data about actual keyword queries and how searchers react to the answers they receive. As a current thread in the Cre8asite Forums discusses, perhaps Google will ease back on its attempts to remove "spammy" hyperlinks and begin to mine the rich performance data it has on the actual keyword search process. It can't happen soon enough.


A Small Peephole On Google Supplemental Results

Don’t Worry, Be Happy: Google indexes all web pages.

At the end of July, Google announced that it would no longer label web pages that are in the Supplemental Index. Although such pages have reduced visibility in Google's keyword search results, Google felt that the Supplemental label was attracting undue attention. Google also indicated that it would make greater efforts to reduce the disparity between its regular index and its supplemental index.

Danny Sullivan and many others expressed concern about this move. However, there was still a loophole, as Danny mentioned:

At the moment, if you want to force the labels to show up, doing a search for [site:domain/&] is a tip that came out of WebmasterWorld this week, and that still seems to be still working.

(Hat tip to Halfdeck, who mentioned this in a Cre8asite Forums discussion on this topic.)

It now appears that the loophole has been closed. A Google search for site:www.domain.com/& shows exactly the same number of web pages as the recommended Google search for site:www.domain.com/. In other words, both show the total number of web pages that Google has indexed in both the regular index and the supplemental index.

For the moment, the other trick that Halfdeck mentioned seems to be working. A search for site:www.domain.com/* appears to give the number of web pages that are in the regular index. Comparing this with the total number of web pages reported by the standard site:www.domain.com/ search gives an indication of what percentage of the web pages on the domain are in the regular index. For established websites this is often above 75%. For new websites, on the other hand, fewer than 20% of web pages seem to be in the regular index, if this test is valid.
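
The comparison is simple arithmetic. The sketch below assumes you copy the two result counts by hand from Google's results pages; the function name and the counts are hypothetical, and since Google's reported result counts are themselves rough estimates, the percentage should be treated as an indication only.

```python
def regular_index_share(total_indexed, regular_indexed):
    """Estimate the percentage of a site's pages in Google's regular index.

    total_indexed:   count reported for site:www.domain.com/
    regular_indexed: count reported for site:www.domain.com/*
    (Both are hypothetical inputs typed in by hand.)
    """
    if total_indexed <= 0:
        raise ValueError("total_indexed must be positive")
    if not 0 <= regular_indexed <= total_indexed:
        raise ValueError("regular_indexed must be between 0 and total_indexed")
    return 100.0 * regular_indexed / total_indexed


# Hypothetical counts for an established site and a new one.
print(f"{regular_index_share(1200, 960):.0f}%")  # 80%
print(f"{regular_index_share(150, 25):.0f}%")    # 17%
```

Subtracting the result from 100% gives a rough estimate of the share of pages sitting in the supplemental index.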

Is this a reasonable estimate? If so, how long will Google leave this peephole open? Time will tell.

Related:
Supplemental Result in Google – Hell or Help
Google Supplemental Label Out, PageRank Next?


Google Supplemental Label Out, PageRank Next?

Which item published by Google:

  • Is followed by webmasters with feverish interest
  • Can cause intense depression
  • Is linked to the visibility of web pages in SERPs (Search Engine Results Pages)
  • Receives more attention than it deserves

If you guessed the Toolbar PageRank, as symbolized by that little horizontal thermometer (values from 0 to 10), then you're right. Publishing such an item would seem to be of questionable value to Google, given the anguish it creates among its users. You might imagine that the powers that be would be deciding when to stop publishing it.

Google apparently sees this in a different light for the moment. It does publish something else that is somewhat similar, although certainly a second-best answer to the question. You may not even be aware of it. In the SERPs you will occasionally see a web page result that has the words 'Supplemental Result' added. That label is also an answer to the question. Given the concerns, Google has now taken action, and as of this morning the label no longer appears. That doesn't mean your web pages are no longer in the Supplemental Index. It just means Google won't tell you whether they are or not. Danny Sullivan for one regrets its disappearance and would still like to be able to get the information in some way.

The big question remains. Does Google have an equal concern about the Toolbar PageRank? By the same logic, it should. There's even more furious and misplaced anguish and reporting about that. However, a different answer may apply. Whether or not to publish the Toolbar PageRank will probably not be decided by the Google technocrats. More likely it will be decided by the Google Marketing group. Like the 'I'm Feeling Lucky' button, it has become part of the Google brand. The Supplemental label had no such redeeming quality. Since you can't afford to mess with a winning brand, I guess the Toolbar PageRank is safe.

Related:
Supplemental Result in Google – Hell or Help
Google Supplemental Results Index – A Word To The Wise
