SEO Gets Simpler In 2009

For at least the last 12 months, some SEO clients have apparently been paying sizeable fees for SEO (Search Engine Optimization) work that was completely ineffective.  That really is the bottom line on a development that Danny Sullivan describes in a post entitled, PageRank Sculpting Is Dead! Long Live PageRank Sculpting!

Earlier this month, Google’s Matt Cutts sent a shockwave through the advanced SEO community by saying that site owners could no longer perform “PageRank sculpting” using the nofollow tag in the way they’d previously thought.

Google helped advance the notion of using nofollow to flow PageRank. No one was forced to do it; no one is being punished because it might no longer work. But Google did help put it out there, and that’s why it should have spoken up sooner when it took nofollow out as a sculpting tool. Instead, it said nothing about the change, which happened sometime in May 2008 or earlier.

… and here is why Google did not spill the beans earlier:

At first, we figured that site owners or people running tests would notice, but they didn’t. In retrospect, we’ve changed other, larger aspects of how we look at links and people didn’t notice that either, so perhaps that shouldn’t have been such a surprise. So we started to provide other guidance that PageRank sculpting isn’t the best use of time.

Danny Sullivan then provides the following view on how it all now works:

Google itself solely decides how much PageRank will flow to each and every link on a particular page. In general, the more links on a page, the less PageRank each link gets. Google might decide some links don’t deserve credit and give them no PageRank. The use of nofollow doesn’t ‘conserve’ PageRank for other links; it simply prevents those links from getting any PageRank that Google otherwise might have given them.

I was moved to write this post today by an e-mail message I received from Dan Thies of SEO Fast Start.  He revealed that he was in two minds about rolling back the Site Structure chapter in SEO Fast Start and essentially returning to what he was teaching in 2006. He has concerns about what some folks have done with "PageRank sculpting", having seen more mistakes than good implementations of that technique.

Dan Thies is the author of a free 97-page ebook, SEO Fast Start 2008, that you can download. Interestingly the preface includes the following:

The 2008 edition is not much different from last year’s – because if you’re not trying to game the search engines, very little changes. Heck, if you had a copy of the November 2001 edition, your site wouldn’t exactly burn down if you followed what I wrote then.

Once more the French phrase applies, plus ça change, plus c’est la même chose: the more it changes, the more it stays the same.  I still believe that the word PageRank is being used in at least two senses in these discussions.  Underlying everything is the basic PageRank measure that applies to all URLs, as suggested in the PageRank Calculation – Null Hypothesis.  Thereafter that basic value is used, in modified form, within the keyword search algorithms, as Danny Sullivan pointed out above.

Even if PageRank sculpting is no longer the hot topic it was, there are still key issues to address with the most important being to Avoid Duplicate Content Problems.

Other resources you may find helpful are:

  • 7-Minute SEO Guide by Andy Beal
  • SEOmoz Beginner’s Guide to SEO
  • Search Engine Optimization: An Hour a Day (Book via Amazon)

Never forget that SEO is, of course, only part of the answer to getting better business results online.  You really have to start with the right strategy and follow through with good website usability all the way to making the sales. The series of articles on Marketing Right Now is a useful primer on the total process.

Hanging / Dangling Web Pages Can Be PageRank Black Holes

Summary

Search Engine Optimization (SEO) for blogs is often not done effectively and posts rank below where they should be in keyword searches.  One particular problem can be hanging/dangling web pages created by the blogging software coupled with inappropriate use of robots.txt files and tags.  Such hanging web pages can act as sinks or black holes for PageRank, a key factor in the Google search algorithm.  This article provides a simple explanation of the issues involved and appropriate solutions.

Introduction

"You are creating hanging/dangling pages", wrote Andy Beard in a recent comment on a post on Avoiding WordPress Duplicate Content. After an e-mail exchange, I could understand his concern.  It is a potential problem that robots.txt files could create.  As Andy wrote some time back, it is one of the SEO Linking Gotchas Even The Pros Make. 

More recently, Rand Fishkin has pointed out that you should not Accidentally Block Link Juice with Robots.txt.  Rand advised doing the following:

  1. Conserve link juice by using nofollow when linking to a URL that is robots.txt disallowed
  2. If you know that disallowed pages have acquired link juice (particularly from external links), consider using meta noindex, follow instead so they can pass their link juice on to places on your site that need it.

Link juice is just another term for PageRank.  The PageRank value of any web page is an important element in how well it will rank in any keyword search.  It may be only one of over 100 factors, but it is probably the most important in the Google keyword search process. Avoiding the loss of PageRank that a web page could amass is therefore an important task for any SEO.

After doing some research, I found this to be a somewhat more complex issue, requiring an understanding of some weighty articles.  Anyone doing SEO or hiring an SEO consultant should be aware of the potential problem to ensure things are done correctly.  I also realized that there was no simple explanation of the issues, so this post will attempt to rectify that omission.

Research on Hanging / Dangling Web Pages

If you want to do some of your own research, before checking out the later explanations, I found the following useful:

  • Dangling Pages – WebProWorld SEO Forum
  • What Do SEO/SEM People Put In Robots.txt Files? – Shaun Anderson
  • WordPress robots.txt SEO – AskApache Web Development
  • Internal Linking – META nofollow, rel nofollow, robots.txt Confusion thereon – Josh Spaulding

Of course, with search engine algorithms, things are always evolving.  The official word on the Google website gives the following information on rel="nofollow".

How does Google handle nofollowed links?

We don’t follow them. This means that Google does not transfer PageRank or anchor text across these links. Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap. Also, it’s important to note that other search engines may handle nofollow in slightly different ways.

That led to the practice of PageRank sculpting, whereby people try to manage how PageRank is distributed among the web pages of a website.  More recently, Matt Cutts of Google, in a Q&A session at SMX Advanced 2009 in Seattle, WA, provided the current thinking on nofollow, as recorded by Lisa Barone:

Q: It seems like you supported PageRank sculpting a year ago and now it seems like you don’t support it anymore. Why is that and will it become a negative indicator?

A: No, it won’t hurt your site. You can do your links however you want. You can use it to eliminate links to sign in forms and whatnot, but it is a better use of your time to fix your site architecture and fix the problem from the core. Suppose you have 10 links and 5 of them are nofollowed. There is this assumption that the other 5 links get ALL that PageRank and that may not be as true anymore (your leftover PageRank will now “evaporate”, says Matt.). You can’t shunt your PageRank where you want it to go. It’s not a penalty. It’s not going to get you in trouble. However, it’s not as effective. It’s a better use of your time to go make new content and do all the other things. If you’re using nofollow to change how PageRank flows, it’s like a band-aid. It’s better to build your site how you want PageRank to flow from the beginning.

Let us now try to pull all that together in a short number of simple explanations covering the important issues involved.

How PageRank is calculated

Google is not always completely open on what is involved in its search algorithms for obvious reasons.  The algorithms also evolve as the Q&A quote above shows.  The following is a best judgment on what is involved, but if anyone has corrections or modifications to what is shown, they are encouraged to add a comment.

The following diagram illustrates how PageRank is calculated for any web page and how fractions of the PageRank flow to and from linked web pages.  PageRank here is not the value that appears in the ‘thermometer’ in the Google Toolbar, and which goes from 0 to 10.  Instead this PageRank is the mathematical value used in the Google keyword search algorithm.  It is calculated for any web page and represents the probability that a random visitor would visit the given web page as opposed to visiting other web pages.

Here we have multiplied this mathematical value by a huge multiplier to give values that are easier to talk about.  We will use the term "PageRank factor" for this derived number.  The resulting number would normally be a value like 5.6 or 16.2, but here we have simplified yet again by rounding to whole numbers.  This illustrates a typical web page (but with very few links).  Some links are external, involving web pages on other websites (domains).  Some are internal links from web pages on the same website (domain).  The inlinks are hyperlinks on other web pages leading to this web page.  The outlinks are hyperlinks on the given web page leading to other web pages.

PageRank Illustration

What the image illustrates is that the PageRank factor of this web page (16) is determined by the sum of the PageRank factor contributions flowing through the inlinks.  This PageRank factor then flows out via the 4 outlinks with an equal PageRank factor contribution (4) on each link.

You can imagine this particular web page as only one among the whole set of web pages on the Internet.  For the technically inclined, we should mention that these PageRank values are all interdependent, so they are computed iteratively: begin with initial values and repeatedly recalculate until the values converge. A full treatment goes beyond the scope of this article.
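For the curious, the iteration can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Google's actual algorithm: the tiny graph, the damping factor of 0.85, and the fixed iteration count are all assumptions made for the example.

```python
# Minimal, illustrative PageRank iteration (not Google's real algorithm).
# `links` maps each page to the list of pages it links out to.

DAMPING = 0.85  # assumed probability that the random visitor follows a link

def pagerank(links, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal values
    for _ in range(iterations):
        # every page gets the base "random jump" share
        new_rank = {p: (1 - DAMPING) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # split this page's rank equally among its outlinks
                share = DAMPING * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # a dangling page: spread its rank evenly so none is lost
                share = DAMPING * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

Run on a hypothetical three-page graph such as {"A": ["B", "C"], "B": ["C"], "C": ["A"]}, the values settle so that their sum stays at 1, matching the "probability of a random visitor" interpretation above. Note the special handling of a page with no outlinks (a dangling page): without it, that page's rank would simply leak out of the system.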

How a robots.txt file changes the picture

PageRank with robots.txt file Illustration

If a robots.txt file disallows crawl visits to this web page, then search engine spiders that obey the file would record only the values and links shown in this image. The underlying PageRank values are the same whether or not the page is blocked to crawlers by the robots.txt file. The page is still indexed, because there is an external inlink that the Google robots are crawling, and they would also note the outlink going to another domain.  The outlinks to other web pages on the same domain (internal links), however, would not be recorded, so those PageRank contributions are lost.  In this sense the web page has become a sink or black hole for those PageRank contributions.  They can no longer contribute to the PageRank of the other web pages.

Note that the PageRank factor values on the remaining links are the same as they were when the other links were included. Merely saying the links should not be crawled does not necessarily mean they should be assumed not to exist. This is in line with Matt Cutts’s most recent pronouncements.
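For illustration, a robots.txt rule that blocks crawlers in this way is just a Disallow line; the path here is a hypothetical example:

```
User-agent: *
Disallow: /members-area/
```

Remember that this only stops obedient crawlers from fetching the pages; as described above, it does not stop the blocked pages from accumulating, and then trapping, PageRank from links that point at them.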

How nofollow changes the calculation

Even if this web page were not excluded by a robots.txt file, a similar effect is created if all outlinks from the page carry the attribute rel="nofollow".  Again, this assumes that the search engine correctly observes the attribute.  If, on the other hand, the links carry no such attribute (links are followed by default), then the PageRank contribution flows through to all of them.
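Concretely, a nofollowed outlink is just an ordinary anchor tag carrying the rel attribute; the URL below is a placeholder:

```html
<a href="https://example.com/sign-in" rel="nofollow">Sign in</a>
```

There is no corresponding rel="follow" value to add: a plain link without the attribute is followed by default.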

How to get only one web page that counts for any specific content

As Rand Fishkin suggested above, if more than one web page contains the same content, you can use a meta robots tag on all the secondary pages to signal noindex.  Then only the primary web page is in the search database, provided the meta tags are observed.  Coupling this with a follow value in the same meta tag ensures that the PageRank contributions still flow out to the other web pages.
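The noindex, follow combination is declared with a single meta robots tag in the head section of each secondary page, for example:

```html
<meta name="robots" content="noindex, follow">
```

Here noindex keeps the page out of the search results, while follow (the default behaviour anyway) makes explicit that the links on the page should still pass PageRank.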

A better approach according to John Mueller, a Google representative, is to use a Rel=Canonical Tag rather than NoIndex.  Here is how Google describes this canonical tag.

We now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that’s accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.

Apparently Google treats this as a hint rather than a standard, so it is not foolproof. Others see Reasons to use rel=canonical, and reasons not to.
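The canonical tag is likewise a single line in the head section of each duplicate or variant page; the URL below is a placeholder for your preferred version:

```html
<link rel="canonical" href="https://example.com/preferred-page/">
```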

Best Practices

As Matt Cutts recommended, given the woolliness in some of the above, the preferred approach is to develop the website architecture so that duplicate web pages do not arise.  Then one does not have to rely on the canonical tag or the noindex, follow combination.  In this way one avoids the hanging / dangling web pages problem.

The exact methods will depend on the architecture. One very useful approach is to show only an initial excerpt on the blog Home Page with a … more link to the full post as a single web page. For category or tag archive pages, you can show only the titles of items so this again avoids the duplicate content problem. The important thing is to be vigilant and look out for essentially duplicate web pages as revealed by a full website scan using the equivalent of a search engine robot such as Xenu.