Higher Search Engine Rankings Without A Home Page

Most websites receive the greatest proportion of their visitors from search engines. Having a high ranking in Search Engine Results Pages (SERPs) is therefore a priority. The factors that influence rankings are fairly well known by now, and owners of traditional websites with static web pages apply them, usually with a reasonable amount of success.

Blogs have now come on the scene and ‘out of the box’ they seem to do exceptionally well in keyword queries.  There are two reasons for this.

  1. Blogs usually create RSS news feeds, which give an instant alert to the search engines that a new post has been written.
  2. Google attaches considerable weight to how recently a web page was published.

This is why blog pages seem to rank very highly in keyword searches, particularly in the early days after they appear.

This has resulted in a feverish interest in having blogs. Without thinking too much about it, many people have set up blogs. Software such as WordPress makes it oh so easy... and, surprise surprise, Google loves blog posts. It sounds like a no-brainer.

Even though the results are impressive, you can do even better. However, you may need to discard some of your preconceptions. Let us explore the nature of a blog and how it performs in search engine keyword searches.

The Typical Blog

If you visit a typical blog at its website address, say www.myblog.com, you will find a long scrolling Home page with several posts, usually with the most recent post at the top. In some cases the full content of each post is shown; in other cases you get only a short extract with a link to read More. That link takes you to a web page showing only the single post you are interested in. Some people prefer to arrange their blog this way to avoid duplicating exactly the same content on two separate web pages. Such duplication would mean that the search engine might serve up either web page in a keyword search, when the single post web page was really the more appropriate result.

The simple picture below shows what a search engine holds about any blog web page. For the Home page, for example, which is displayed when you visit www.myblog.com, the search engine has the URL (www.myblog.com), the blog title, the blog meta description, the content of the Home page as it was when last crawled, and a list of back links. Back links are the URLs of other web pages that have hyperlinks pointing to this Home page.

[Image: blog structure]
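As a rough mental model of the stored record, it might be pictured as something like the following sketch (the field names are hypothetical and the real index structures are of course far more elaborate):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class IndexedPage:
    """A toy picture of what a search engine might hold for one web page."""
    url: str
    title: str
    meta_description: str
    content_as_crawled: str           # the page text at the time of the last crawl
    back_links: List[str] = field(default_factory=list)  # URLs of pages linking here

home_page = IndexedPage(
    url="http://www.myblog.com/",
    title="My Blog",
    meta_description="A blog about my favourite topics",
    content_as_crawled="...the Home page text as it looked when last spidered...",
    back_links=["http://www.example.com/some-page/"],
)
```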

The Google algorithm (and probably other search engine algorithms too) takes those back links into account in determining the importance of this particular Home page. Google uses the term PageRank as a measure of this importance. A large number of back links, particularly if they come from authoritative websites like the New York Times, the BBC or CNN, means that the Home page has a higher PageRank. As the search engine spiders (sometimes called crawlers or robots) wander around the Internet, they register all these back link URLs. Since most of them ‘point’ to the Home page, the Home page will have the highest PageRank on the site. New single blog post web pages have few direct links, so their PageRank is usually not established for weeks or even months.

Even though the single blog post web pages may never get direct links, they can benefit from the internal links within the website. For example, the link to read More on the Home page confers some PageRank contribution on the single blog post web page. According to what has been published by Google, a discount factor applies, so that only about 85% of that PageRank contribution is passed on to the single post web page. So far this is all standard ‘stuff’.
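As a back-of-the-envelope illustration of that discount (the numbers and the simple even-split model are assumptions made purely for the example):

```python
# Hypothetical figures: the Home page's PageRank share, the number of links
# on it, and a roughly 85% pass-through (i.e. a 15% discount) on each link.
home_pagerank = 16.0
links_on_home_page = 4
pass_through = 0.85

contribution_to_single_post = pass_through * home_pagerank / links_on_home_page
print(contribution_to_single_post)  # 3.4 under these assumed figures
```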

Now let us look at what is different about a blog. We have signaled something special about the Home page content with the yellow background in the picture. Unlike a traditional website with static web pages, that content keeps changing as new blog posts are written. Either an extract or the full content of the latest post is added to the top of the web page, and the oldest post is bumped off the bottom. Once two or three posts have been added, what the search engine is holding in its database may differ markedly from what a visitor sees today. Of course, if the search engine spider recrawls the web page, the stored content is updated. For most blogs, however, what the search engine is storing for the Home page will differ from what is currently being displayed.

The Typical SERP For A Keyword Query

SERP is the acronym for Search Engine Results Page, which is what the search engine displays when you do a keyword query. When someone runs a keyword query, the search engine finds content in its database that matches the keywords. Even if both the stored Home page content and the individual blog post web page (if stored) are relevant, the Home page ranks higher, so it will very likely be shown first or may be the only entry shown. Although the entry was chosen as relevant, what appears in the first line of the SERP entry is the title of the whole blog. This is general and probably does not mention the keywords. Moreover, in developing the explanatory snippet, the search engine can only rely on the meta description for the whole blog, which is probably irrelevant to the query, and on the content of the blog Home page as recorded when it was spidered. The combination of an irrelevant title and a somewhat fuzzy snippet in the SERP is unlikely to attract the click of the searcher.

If the searcher does click on the entry, the blog Home page has likely changed by now and the keywords may no longer even appear in the current version. It might be thought that the Cached version of the page would show a version that includes the keywords. However, even here a caution is appropriate. Search engines do not necessarily create a cached version of the web page on every spider visit, so the cached version may be from an even earlier period. In such a case we have the somewhat anomalous situation that neither the cached version nor the current version of the web page shows the keywords, yet the version the search engine crawled in between the cached date and the current date did contain them. That is why the blog Home page was the item shown in the keyword query SERP.

The search engine time cycles for crawling and indexing web pages can sometimes be measured in weeks, so it is not surprising that entries in SERPs can occasionally be completely irrelevant to the keyword query. Is there a way of correcting this situation? It all stems from the fact that the Home page has too much authority. Could this be reduced in some way, and the ‘authority’ that is freed up be spread around among the other blog post pages?

The LMNHP Approach

In trying to ‘flatten out’ the authority profile of a blog, no obvious solution came to mind. However, a somewhat unorthodox approach seemed of interest, partially fuelled by the blogging arrangement that was already being used for all the SMM blogs. Frustrated by the typical blog with a number of posts all featured on the Home page, the SMM blogs had for some time featured only the latest post content on the Home page. In other words, the latest post content appeared, for example, at http://www.staygolinks.com/. If you then clicked on the permalink in the H1 heading of the post, you were switched to the single post entry at http://www.staygolinks.com/latestpost.htm. This single post web page had only minor differences from the Home page version of the same post.

Suddenly the light bulb came on. Why not avoid the traditional Home page entirely and immediately switch (via a 301 redirection) to the single latest blog post web page? Details are given in the LMNHP post. LMNHP is an acronym for Look Mom No Home Page.

What this means is that while this post continues to be the latest, http://www.staygolinks.com/latestpost.htm is treated as the URL that applies for the blog website. Any back links, whether external or internal, are deemed to apply to that URL. Thinking again about the earlier picture of what the search engines are registering, all items are now unchanging. That means we have a title and meta description that are appropriate precisely for this blog post content, and the content itself stays the same over the long term. The only difference is that the search engine may attempt to access http://www.staygolinks.com/ and is instructed by the 301 redirection to access the single post web page instead. You might consider that, in a sense, the blog has no Home page.
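The SMM blogs handle this through their own blogging setup, but as a minimal sketch of the idea, a 301 redirect from the domain root to the latest single post page might look something like this (the framework choice and the latest-post URL are assumptions for illustration only):

```python
from flask import Flask, redirect

app = Flask(__name__)

# Hypothetical location of the most recent post; in practice this would be
# updated (or looked up) each time a new post is published.
LATEST_POST_URL = "/latestpost.htm"

@app.route("/")
def no_home_page():
    # A permanent (301) redirect tells search engines that the domain root
    # should be treated as the latest single post page, so back links to the
    # domain are credited to that post.
    return redirect(LATEST_POST_URL, code=301)

if __name__ == "__main__":
    app.run()
```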

Note that the traditional Home page is the only web page that is no longer active. All the other web pages on the blog remain active and unchanged. Every back link is assigned to one blog web page or another. It is uncertain exactly how the search engines handle all this, but so far there have been no surprises.

SERP Results For The SMM Blogs

It is still early days, but so far the results of keyword queries are extremely gratifying. Entries for new blog post pages are indexed and displayed rapidly, rank highly, and appear with relevant titles and descriptions. This undoubtedly gives an SEO boost and also makes it more likely that searchers will click on the item. The biggest boost comes from the fact that the single post page seems to be directly assigned the back links for the domain. For a time, the latest blog post really works as a Home page and is only supplanted when the next blog post is written.

Blog posts have always enjoyed an initial visibility, presumably based on some recency factor.  This is now magnified by a large number of back links that are also assigned to that URL.

A Possible PageRank Benefit

This approach should result in a better (more even) distribution of PageRank among web pages, although no one outside Google knows exactly how the algorithms handle PageRank at this time. The basic view is that the Home page amasses all the incoming back links to the domain. This link juice is then distributed among the internal web pages, with some discounting applying, perhaps of the order of 15%.

In this new approach, the back links that would all have been directed to the Home page now go, in groups, directly to each of the single post web pages as they are issued. It would seem that the discounting factor no longer applies, since these links are now all external. It has also been assumed that external links carry more weight than internal links, so this may be an additional benefit. Since Google is very guarded in whatever it says about the search algorithms, it is unlikely that these surmises can be confirmed or denied.

A Pleasant Surprise In The Tail

The result of this approach, as seen in the SERPs, is that better entries appear, giving searchers a much clearer indication of what they will find. The constancy of the content of blog pages also fits the Google mission better, since it is easier to index correctly and to deliver relevant results.

One pleasing bonus is revealed in the image below, which shows a search done before this current post was added to the blog. As might be expected, a search for the title of the previous blog post, The Google Tango, gave that blog post as the #1 entry in the SERP. Clicking on the link took you precisely to that single post web page.

[Image: google tango]

What is intriguing is that Google still shows the domain itself as the URL for that web page, even though that is not the specific hyperlink for the entry.  However that is probably the most useful way of presenting information to the searcher.

Hyperlinks Get Even More Respect

Hyperlinks have never really received the respect they deserve. Without them the World Wide Web would be impossible. The word is now often shortened to link, and it is bandied around without much thought for the mind-opening implications bound up in the word hyperlink itself.

The term “hyperlink” was coined in 1965 (or possibly 1964) by Ted Nelson. The Wikipedia entry describes what he defined:

Hyperlinks are the basic building block of hypertexts. For example, some key words in a wiki such as Wikipedia are highlighted, and provide links to explanations of those words at other pages in the same wiki.
In directed links, the area from which the hyperlink can be activated is called its anchor (or source anchor); its target (or destination anchor) is what the link points to, which may be another location within the same page or document, another page or document, or a specific location within another page or document.

He also coined the word hypertext and the associated word hypermedia.  He bemoaned the fact that the latter had not taken off and instead became what we often call interactive media.

The hyperlink concept is really very powerful. However, Microsoft, as it has done with so many great ideas, did not leverage that power. It is true that files and documents in the Office suite of programs have always had hyperlinking capability. So you will find:

  • Word hyperlinks
  • Excel hyperlinks
  • Powerpoint hyperlinks, and   
  • Outlook hyperlinks

Adobe has also, to an extent, slowed down the wider use of hyperlinks, since it is only recently that you could create a PDF document with active hyperlinks using their software.

Luckily the hyperlink concept is much too powerful to be sidelined by this somewhat lukewarm support.  What really caused the hyperlink concept to take off was the creation of the World Wide Web by Sir Tim Berners-Lee.  No longer would a hyperlink merely connect you with some other point in the same document.  You could now connect with some online website that could be half way round the world.

The other powerful influence was that the two Google founders latched on to the notion that hyperlinks confirmed the popularity or authority of web pages. They then used this concept within their search algorithm. Since for a given web page they were interested in hyperlinks pointing to that page, they used the term backlink instead of hyperlink. If they had only stuck with the term hyperlink, the concept might have gained more general understanding.

The strength of hyperlinks is confirmed by what was written back in 1999. As the Cluetrain Manifesto authors pointed out, almost everyone was hyperlinking, and this was a movement that could not be stopped.

However, employees are getting hyperlinked even as markets are. Companies need to listen carefully to both. Mostly, they need to get out of the way so intranetworked employees can converse directly with internetworked markets.

Corporate firewalls have kept smart employees in and smart markets out. It’s going to cause real pain to tear those walls down. But the result will be a new kind of conversation. And it will be the most exciting conversation business has ever engaged in.

Ten years later, the strength of hyperlinks and the World Wide Web they made possible cannot be denied. Most website owners acknowledge the mutual networking benefits they receive and include hyperlinks to other relevant sites that their visitors may wish to visit. This summer there was even a question of whether the BBC had finally changed policy and was using hyperlinks to external sources. The answer is unclear, but the eventual outcome will undoubtedly include external hyperlinks.

The latest word from Google points to even greater support for the hyperlink concept. The Google Webmaster Central Blog is now encouraging webmasters to include named anchors that define sections of their web pages, and offers tips on how to do this best. This means that a keyword search could actually rank most highly a hyperlink to the point within a document that is deemed most relevant.

As the Official Google Blog explains, the aim is to enable users to get to the information they want faster. Searchers will now find additional links in the result block, which allow users to jump directly to parts of a larger page. This is useful when a user has a specific interest in mind that is almost entirely covered in a single section of a page. Now they can navigate directly to the relevant section instead of scrolling through the page looking for their information.

We generate these deep links completely algorithmically, based on page structure, so they could be displayed for any site (and of course money isn’t involved in any way, so you can’t pay to get these links). There are a few things you can do to increase the chances that they might appear on your pages. First, ensure that long, multi-topic pages on your site are well-structured and broken into distinct logical sections. Second, ensure that each section has an associated anchor with a descriptive name (i.e., not just “Section 2.1”), and that your page includes a “table of contents” which links to the individual anchors. The new in-snippet links only appear for relevant queries, so you won’t see it on the results all the time — only when we think that a link to a section would be highly useful for a particular query.

If you have such web pages, this should ensure greater visibility and higher rankings for sections of your information-packed pages, so this is something to carefully consider. As a small test, you may wish to see how these internal web page links for Therapeutic Riding Associations and for Associations for the Disabled rank in Google searches for those terms. Once indexed, they should rank highly in related searches. Those hyperlinks certainly deserve some serious respect now.
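As an illustration of what Google is asking for, here is a small sketch that builds descriptive named anchors and a linked table of contents for a long, multi-topic page (the section titles and the slug format are assumptions; any sensible, descriptive anchor names would do):

```python
# Hypothetical section titles for a long, information-packed page.
sections = ["Therapeutic Riding Associations", "Associations for the Disabled"]

def anchor_name(title: str) -> str:
    # Descriptive anchor names (not just "Section 2.1"), derived from the titles.
    return title.lower().replace(" ", "-")

# A table of contents whose entries link to the named anchors further down the page.
toc_items = "\n".join(
    f'  <li><a href="#{anchor_name(t)}">{t}</a></li>' for t in sections
)
section_markup = "\n".join(
    f'<h2 id="{anchor_name(t)}">{t}</h2>\n<p>... section content ...</p>'
    for t in sections
)

page_body = f"<ul>\n{toc_items}\n</ul>\n{section_markup}"
print(page_body)
```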

Hanging / Dangling Web Pages Can Be PageRank Black Holes

Summary

Search Engine Optimization (SEO) for blogs is often not done effectively and posts rank below where they should be in keyword searches.  One particular problem can be hanging/dangling web pages created by the blogging software coupled with inappropriate use of robots.txt files and tags.  Such hanging web pages can act as sinks or black holes for PageRank, a key factor in the Google search algorithm.  This article provides a simple explanation of the issues involved and appropriate solutions.

Introduction

"You are creating hanging/dangling pages", wrote Andy Beard in a recent comment on a post on Avoiding WordPress Duplicate Content. After an e-mail exchange, I could understand his concern.  It is a potential problem that robots.txt files could create.  As Andy wrote some time back, it is one of the SEO Linking Gotchas Even The Pros Make. 

More recently, Rand Fishkin has pointed out that you should not Accidentally Block Link Juice with Robots.txt.  Rand advised doing the following:

  1. Conserve link juice by using nofollow when linking to a URL that is robots.txt disallowed
  2. If you know that disallowed pages have acquired link juice (particularly from external links), consider using meta noindex, follow instead so they can pass their link juice on to places on your site that need it.

Link juice is just another term for PageRank. The PageRank value of a web page is an important element in how well it will rank in any keyword search. It may be only one of over 100 factors, but it is probably the most important one in the Google keyword search process. Avoiding the loss of PageRank that a web page could amass is therefore an important task for SEOs.
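To make Rand's two suggestions concrete, here is roughly what the markup looks like (the URLs are hypothetical, and this is only an illustration of the tags themselves):

```python
# Illustrative markup for the two suggestions above; URLs are hypothetical.

# 1. A nofollowed link to a URL that robots.txt already disallows, so that no
#    link juice is spent on a page the crawler is not allowed to fetch anyway.
nofollow_link = '<a href="http://www.example.com/signin" rel="nofollow">Sign in</a>'

# 2. For a page that has acquired link juice, a meta robots tag that keeps the
#    page out of the index while still letting its link juice flow on through
#    the links it contains (used instead of a robots.txt disallow).
noindex_follow_tag = '<meta name="robots" content="noindex, follow">'

print(nofollow_link)
print(noindex_follow_tag)
```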

After doing some research, I found this to be a somewhat more complex issue, requiring an understanding of some weighty articles. Anyone involved in doing SEO or hiring an SEO consultant should be aware of the potential problem to ensure things are done correctly. I also realized that there was no simple explanation of the issues, so this post will attempt to rectify that omission.

Research on Hanging / Dangling Web Pages

If you want to do some research of your own before checking out the explanations below, I found the following useful:

  • Dangling Pages – WebProWorld SEO Forum
  • What Do SEO/SEM People Put In Robots.txt Files? – Shaun Anderson
  • WordPress robots.txt SEO – AskApache Web Development
  • Internal Linking – META nofollow, rel nofollow, robots.txt Confusion thereon – Josh Spaulding

Of course, with search engine algorithms, things are always evolving. The official word on the Google website gives the following information on rel="nofollow":

How does Google handle nofollowed links?

We don’t follow them. This means that Google does not transfer PageRank or anchor text across these links. Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap. Also, it’s important to note that other search engines may handle nofollow in slightly different ways.

That led to the practice of PageRank sculpting, whereby people try to manage how PageRank is distributed among the web pages in a website. More recently, Matt Cutts of Google, in a Q&A session at SMX Advanced 2009 in Seattle, WA, provided the current thinking on nofollow, as recorded by Lisa Barone:

Q: It seems like you supported PageRank sculpting a year ago and now it seems like you don’t support it anymore. Why is that and will it become a negative indicator?

A: No, it won’t hurt your site. You can do your links however you want. You can use it to eliminate links to sign in forms and whatnot, but it is a better use of your time to fix your site architecture and fix the problem from the core. Suppose you have 10 links and 5 of them are nofollowed. There is this assumption that the other 5 links get ALL that PageRank and that may not be as true anymore (your leftover PageRank will now “evaporate”, says Matt.). You can’t shunt your PageRank where you want it to go. It’s not a penalty. It’s not going to get you in trouble. However, it’s not as effective. It’s a better use of your time to go make new content and do all the other things. If you’re using nofollow to change how PageRank flows, it’s like a band-aid. It’s better to build your site how you want PageRank to flow from the beginning.

Let us now try to pull all that together in a few simple explanations covering the important issues involved.

How PageRank is calculated

For obvious reasons, Google is not always completely open about what is involved in its search algorithms. The algorithms also evolve, as the Q&A quote above shows. The following is a best judgment on what is involved, but if anyone has corrections or modifications to what is shown, they are encouraged to add a comment.

The following diagram illustrates how PageRank is calculated for any web page and how fractions of the PageRank flow to and from linked web pages. PageRank here is not the value that appears in the ‘thermometer’ in the Google Toolbar, which goes from 0 to 10. Instead, it is the mathematical value used in the Google keyword search algorithm. It is calculated for every web page and represents the probability that a random visitor would be viewing that web page rather than any other.

Here we have multiplied this mathematical value by a huge multiplier to give values that are easier to talk about. We will use the term PageRank factor for this derived number. The resulting number would normally be a value like 5.6 or 16.2, but here we have simplified again by rounding to whole numbers. The diagram illustrates a typical web page (though with very few links). Some links are external, involving web pages on other websites (domains); some are internal, from web pages on the same website (domain). The inlinks are hyperlinks on other web pages leading to this web page. The outlinks are hyperlinks on the given web page leading to other web pages.

[Image: PageRank Illustration]

What the image illustrates is that the PageRank factor of this web page (16) is determined by the sum of the PageRank factor contributions flowing through the inlinks.  This PageRank factor then flows out via the 4 outlinks with an equal PageRank factor contribution (4) on each link.

You can imagine this particular web page as being only one among the whole set of web pages on the Internet. For the technically inclined, we should mention that these PageRank values are all interdependent, so they are developed by a process of iteration: starting from initial values and repeatedly recalculating until the values settle down. The full mathematics goes beyond the scope of this article.
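For the curious, a toy version of that iteration over a tiny, made-up link graph might look like this (the graph, the starting values and the 0.85 damping factor are all illustrative assumptions, and a real search engine works at an entirely different scale):

```python
# A minimal, illustrative PageRank iteration over a hypothetical three-page graph.
damping = 0.85  # roughly the commonly cited damping factor

links = {  # page -> pages it links out to
    "home": ["post1", "post2"],
    "post1": ["home"],
    "post2": ["home", "post1"],
}

ranks = {page: 1.0 for page in links}  # starting values

for _ in range(50):  # repeatedly recalculate until the values settle down
    new_ranks = {}
    for page in links:
        incoming = sum(
            ranks[other] / len(outlinks)
            for other, outlinks in links.items()
            if page in outlinks
        )
        new_ranks[page] = (1 - damping) + damping * incoming
    ranks = new_ranks

print(ranks)
```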

How a robots.txt file changes the picture

[Image: PageRank with robots.txt file Illustration]

If a robots.txt file disallows this web page for crawl visits by the search engine spiders, then, provided they obey the robots.txt file, they would record the values and links shown in this image. The PageRank values are the same whether or not the web page is blocked to crawlers by the robots.txt file. The record is indexed because there is an external inlink that the Google robots are crawling, and they would also note the outlink going to another domain. The outlinks to other web pages on the same domain (internal links) would not be recorded, so those PageRank contributions are lost. In this sense the web page has become a sink or black hole for those PageRank contributions: they can no longer contribute to the PageRank of the other web pages.

Note that the PageRank factor values on the remaining links are the same as they were when the other links were being included. Merely saying the links should not be crawled does not necessarily mean they should be assumed not to exist. This is in line with Matt Cutts's most recent pronouncements.
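Incidentally, whether a given URL is disallowed can be checked with the same logic an obedient crawler uses. A minimal sketch using the Python standard library (the domain, path and user agent are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file for a hypothetical site.
parser = RobotFileParser("http://www.example.com/robots.txt")
parser.read()

if parser.can_fetch("Googlebot", "http://www.example.com/some-archive-page/"):
    print("Allowed: an obedient crawler may fetch the page and record its links.")
else:
    print("Disallowed: an obedient crawler will not fetch this page at all.")
```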

How nofollow changes the calculation

Even if this web page were not excluded by a robots.txt file, a similar effect is created if all the outlinks from the web page carry the attribute rel="nofollow". Again, this assumes that the search engine correctly observes the attribute. If, on the other hand, the links are left as normal followed links, then the PageRank contributions flow through all of them.
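One way to picture how a crawler might honour the attribute is a link extractor that simply skips any anchor marked rel="nofollow". A small sketch using only the Python standard library (the sample HTML is made up):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects hrefs from anchor tags, ignoring any marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        # Links marked rel="nofollow" pass no PageRank, so they are skipped.
        if "nofollow" in rel:
            return
        if "href" in attrs:
            self.followed_links.append(attrs["href"])

collector = LinkCollector()
collector.feed('<a href="/about">About</a> <a rel="nofollow" href="/signin">Sign in</a>')
print(collector.followed_links)  # ['/about']
```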

How to get only one web page that counts for any specific content

As Rand Fishkin suggested above, if more than one web page contains the same content, you can use a meta tag on all the secondary pages to signal noindex. Then only the primary web page is in the search database, provided the meta tags are being observed. Coupling this with a follow value in the meta tag then ensures that the PageRank contributions still flow out to the other web pages.

A better approach, according to John Mueller, a Google representative, is to use a rel=canonical tag rather than noindex. Here is how Google describes this canonical tag:

We now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that’s accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.

Apparently Google treats this as a hint rather than a directive, so it is not foolproof. Others see Reasons to use rel=canonical, and reasons not to.
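For reference, the canonical tag itself is just one line placed in the head of the secondary page, pointing at the preferred URL (the URL here is hypothetical):

```python
# Placed in the <head> of a duplicate page to name the preferred (canonical) URL.
canonical_tag = '<link rel="canonical" href="http://www.example.com/original-post/">'
print(canonical_tag)
```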

Best Practices

Given the woolliness in some of the above, the preferred approach, as Matt Cutts recommended, is to develop the website architecture so that duplicate web pages do not arise in the first place. Then one does not have to rely on the canonical tag or the noindex, follow combination, and the hanging / dangling web pages problem is avoided.

The exact methods will depend on the architecture. One very useful approach is to show only an initial excerpt on the blog Home page, with a … more link to the full post as a single web page. For category or tag archive pages, you can show only the titles of items, which again avoids the duplicate content problem. The important thing is to be vigilant and watch for essentially duplicate web pages, as revealed by a full website scan using the equivalent of a search engine robot, such as Xenu.
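As a very rough sketch of that kind of vigilance, the following compares a handful of pages and flags any whose content is byte-for-byte identical (the URL list is hypothetical; a real scan with a tool such as Xenu would discover URLs by crawling, and near-duplicates would need a fuzzier comparison):

```python
import hashlib
from urllib.request import urlopen

# Hypothetical pages to compare; a real scan would crawl the whole site.
urls = [
    "http://www.example.com/",
    "http://www.example.com/latest-post/",
    "http://www.example.com/category/news/",
]

seen = {}
for url in urls:
    body = urlopen(url).read()
    digest = hashlib.sha1(body).hexdigest()  # fingerprint of the page content
    if digest in seen:
        print(f"Possible duplicate content: {url} matches {seen[digest]}")
    else:
        seen[digest] = url
```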