Google Duplicate Content And WordPress – An Unresolved Problem

Of all the topics that come up frequently in SEO discussions, duplicate content is at the head of the list. It comes up in two contexts. The first concerns all those scraper sites that spammers set up to create backlinks, filling them with copy stolen from the original, legitimate authors.



Seth Godin Thinks Bigger Is Better

If you think that is a very un-Godin-like title, you’re in for a surprise.  You may have been thinking about Seth Godin’s post on Small Is The New Big.

Small is the new big only when the person running the small thinks big.  Don’t wait. Get small. Think big.



The Marshmallow Challenge

[Image: Marshmallow Challenge, Las Vegas]


The Marshmallow Challenge might seem an unlikely topic for this blog, but it is featured on today’s TED Blog and is both amusing and stimulating. It is a talk by Tom Wujec, a Fellow at Autodesk and the creator of the marshmallow challenge, and it is all about collaboration, innovation and creativity.

He describes it as one of the fastest and most powerful techniques for teams to improve their capacity to generate fresh ideas, build rapport, and master the skill of prototyping – all of which lie at the heart of team innovation.

The challenge is disarmingly simple. Each team gets 20 sticks of spaghetti, one yard of masking tape, one yard of string and one marshmallow, supplied in a paper bag. The rules are straightforward:

  1. Build the Tallest Freestanding Structure: The winning team is the one with the tallest structure, measured from the tabletop surface to the top of the marshmallow. That means the structure cannot be suspended from a higher structure, like a chair, ceiling or chandelier.
  2. The Marshmallow Must be on Top: The entire marshmallow needs to be on the top of the structure. Cutting or eating part of the marshmallow disqualifies the team. 
  3. Use as Much or as Little of the Kit: The team can use as many or as few of the 20 spaghetti sticks, as much or as little of the string or tape. The team cannot use the paper bag as part of their structure.
  4. Break up the Spaghetti, String or Tape: Teams are free to break the spaghetti, cut up the tape and string to create new structures.
  5. The Challenge Lasts 18 Minutes: Teams must not be holding on to the structure when time runs out. Teams holding the structure will be disqualified.

If you explore the Marshmallow Challenge website, you will see some of the creative solutions that teams around the world have come up with.

This post is also a challenge in another way. Like the other SMM blogs, this blog is currently structured according to the LMNHP (Look Mom No Home Page) approach. It is early days, and the key measure of how useful this approach is will be how its posts rank in the SERPs (Search Engine Results Pages).

A search for the Marshmallow Challenge already turns up no fewer than 390,000 items, and today’s TED blog post will encourage many more. Given that competition, it will be interesting to see how high this blog post can rise in a Google search for ‘Marshmallow Challenge’. We will keep you posted.



Higher Search Engine Rankings Without A Home Page

Most websites receive the greatest proportion of their visitors from search engines, so a high ranking in Search Engine Results Pages (SERPs) is a priority. The factors that influence ranking are fairly well known by now, and people with traditional websites made up of static web pages apply them, usually with reasonable success.

Blogs have now come on the scene and ‘out of the box’ they seem to do exceptionally well in keyword queries.  There are two reasons for this.

  1. Blogs usually create RSS news feeds, which give an instant alert to the search engines that a new post has been written.
  2. Google attaches considerable weight to the recency of new web pages. 

This is why blog pages seem to rank very highly in keyword searches, particularly in the early days after they appear.


This has resulted in a feverish interest in having blogs. Without thinking too much about it, many people have set up blogs. Software such as WordPress makes it Oh So Easy... and, surprise, surprise, Google loves blog posts. Sounds like a no-brainer?

Even though the results are impressive, you can do even better. However, you may need to discard some of your preconceptions. Let us explore the nature of a blog and how it performs in search engine keyword queries.

The Typical Blog

If you visit a typical blog, say www.myblog.com, you will find a long, scrolling Home page with several posts, usually with the most recent post at the top. In some cases the full content of each post is shown; in others you get only a short extract with a link to read More. That link takes you to a web page showing only the single post you are interested in. Some people prefer to arrange their blog this way to avoid duplicating exactly the same content on two separate web pages. With full duplication, the search engine might serve up either web page in a keyword search, even when the single post web page was really the more appropriate one.

The simple picture below shows what a search engine holds about any blog web page. For example, for the Home page displayed when you visit www.myblog.com, the search engine holds the URL (www.myblog.com), the blog title, the blog meta description, the content of the Home page as it was when last crawled, and a list of back links. Back links are the URLs of other web pages that have hyperlinks pointing to this Home page.

[Image: blog structure]

The Google algorithm (and probably other search engine algorithms too) takes those back links into account in determining the importance of this particular Home page. Google uses the term PageRank as a measure of this importance. A large number of back links, particularly from authoritative websites like the New York Times, the BBC or CNN, will give the Home page a higher PageRank. As the search engine spiders (sometimes called crawlers or robots) wander around the Internet, they register all these back link URLs. Since many of them ‘point’ to the Home page, the Home page ends up with the highest PageRank. New single blog post web pages have few direct links, so their PageRank is usually not established for weeks or even months.

Even though the single blog post web pages may never get direct links, they can benefit from the internal links within the website. For example, the read More link on the Home page does pass some PageRank on to the single blog post web page. According to what has been published by Google, a discount (damping) factor applies, so that only about 85% of that PageRank contribution is passed on to the single post web page. So far this is all standard ‘stuff’. Let us now offer some different insights into what is happening.
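For readers who want the arithmetic behind that 85% figure, here is one commonly cited form of the PageRank formula, normalized by the total number of pages N. Here d is the damping factor, usually quoted as 0.85, T_1 to T_n are the pages linking to page A, and C(T_i) is the number of outgoing links on page T_i. The Home page PageRank of 0.4 and its 10 outgoing links in the second line are purely illustrative assumptions, not measured values.

```latex
PR(A) = \frac{1 - d}{N} + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}

% Illustrative contribution of a Home page with PR = 0.4 and 10 outgoing links
d \cdot \frac{PR(\text{Home})}{C(\text{Home})} = 0.85 \times \frac{0.4}{10} = 0.034
```

In this form the read More link passes on 85% of the Home page's share per link, and that share shrinks as the Home page links to more posts.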

Note that the yellow background in the picture signals something different about the content on the Home page. Unlike a traditional website with static web pages, this content keeps changing as new blog posts are written. Either an extract or the full content of the latest blog post is added to the top of the web page, and the oldest blog post is bumped off the bottom. Once two or three new posts have been added, what the search engine holds in its database may differ markedly from what a visitor sees today. Of course, if the search engine spider recrawls the web page, the stored content will be updated. However, for most blogs, what the search engine stores for the Home page will differ from what is currently being displayed.
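The gap between what the search engine stored and what a visitor sees today can be illustrated with a short Python sketch. It assumes, purely for illustration, a Home page that shows only the two newest posts (the `per_page` value is hypothetical) and uses titles of posts on this blog as sample content; `home_page` and `mentions` are names invented for this example, not WordPress functions.

```python
def home_page(published, per_page=2):
    """Titles shown on the Home page: the newest `per_page` posts, newest first.

    `published` lists post titles in publication order (oldest first).
    """
    return published[::-1][:per_page]


def mentions(page, keyword):
    """True if any title on the page contains the keyword (case-insensitive)."""
    return any(keyword.lower() in title.lower() for title in page)


published = [
    "The Google Tango",
    "Higher Search Engine Rankings Without A Home Page",
    "The Marshmallow Challenge",
]
# The spider crawls the Home page at this point and stores this snapshot.
crawled_snapshot = home_page(published)

# Two more posts are written before the spider comes back.
published += [
    "Seth Godin Thinks Bigger Is Better",
    "Google Duplicate Content And WordPress – An Unresolved Problem",
]
current_page = home_page(published)

print(mentions(crawled_snapshot, "marshmallow"))  # True: the stored copy matches the query
print(mentions(current_page, "marshmallow"))      # False: a visitor today sees no match
```

The search engine answers the query from `crawled_snapshot`, while the visitor who clicks through lands on `current_page`.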

The Typical SERP For A Keyword Query

SERP is the acronym for Search Engine Results Page, which is what the search engine displays when you do a keyword query. When someone does a keyword query, the search engine finds content in its database that matches the keywords. If both the stored Home page content and the individual blog post web page content (if it has been stored) are relevant, the Home page nevertheless ranks higher, so it will very likely be shown first or may be the only entry shown. Although the entry was chosen as relevant, the first line of the SERP entry is the Title of the whole blog. This is general and probably does not include the keywords. Moreover, in building the explanatory snippet, the search engine can rely only on the Meta Description for the whole blog, which is probably irrelevant, and on the content of the blog Home page as recorded when it was spidered. The combination of an irrelevant title and a somewhat fuzzy snippet is unlikely to attract the searcher's click.

If the searcher does click on the entry, the blog Home page has likely changed by now, and the keywords may no longer even appear in the current version. One might think that the Cached version of the page would show a version that includes the keywords. However, even here caution is needed. Search engines do not necessarily create a cached copy of the web page on every spider visit, so the cached version may come from an even earlier date. In that case we have the somewhat anomalous situation where neither the cached version nor the current version of the web page shows the keywords, yet the version the search engine crawled between the cached date and the current date did contain them. That in-between version is why the blog Home page appeared in the keyword query SERP at all.

The search engine time cycles for crawling and indexing web pages can occasionally be measured in weeks, so it is not surprising that SERP entries are on occasion completely irrelevant to the keyword query. Is there a way of correcting this situation? Does it all stem from the fact that the Home page has too much authority? Could this be reduced in some way, and the ‘authority’ that is freed up be spread among the other blog post pages?

The LMNHP Approach

In trying to ‘flatten out’ the authority profile of a blog, no obvious solutions came to mind. However, a somewhat unorthodox approach seemed of interest, partly fuelled by the blogging approach already used for all the SMM blogs. Frustrated by the typical blog with a number of posts all featured on the Home page, for some time all the SMM blogs had featured only the latest post on the Home page. In other words, the latest post content appeared at, for example, http://www.staygolinks.com/. If you then clicked on the permalink in the H1 heading of the post, you would be switched to the single post entry at http://www.staygolinks.com/latestpost.htm. That single post web page differed only slightly from the Home page version of the same post.

Suddenly the light bulb came on. Why not avoid the traditional Home page entirely and immediately switch (via a 301 redirection) to the latest single blog post web page? Details are given in the LMNHP post. LMNHP is an acronym for Look Mom No Home Page.

What this means is that, while this post remains the latest, http://www.staygolinks.com/latestpost.htm is treated as the URL for the blog website. Any back links, either external or internal, are deemed to apply to that URL. Thinking again about the earlier picture of what the search engines register, all the items are now unchanging. We have a Title and Meta Description that fit precisely this blog post content, and the content itself stays the same over the long term. The only difference is that the search engine may have been trying to access http://www.staygolinks.com/ and is directed by the 301 redirection to the single post web page. In a sense, the blog has no Home page.
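The redirect set-up itself is described in the LMNHP post rather than here, so the following is only a minimal sketch of the idea using Python's standard `http.server` module. It assumes a hypothetical `LATEST_POST_PATH` that would have to be updated each time a new post is published; `route` and `LMNHPHandler` are illustrative names, and a WordPress blog would more likely achieve the same effect through its own configuration or a plugin.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical: path of the newest post. It would have to be updated
# (for example from the blog's database) every time a new post is published.
LATEST_POST_PATH = "/latestpost.htm"


def route(path, latest_post_path=LATEST_POST_PATH):
    """Return (status, location) for a request path.

    Requests for the bare domain are sent on to the latest post with a
    301 (Moved Permanently); every other page is served as normal.
    """
    if path == "/":
        return 301, latest_post_path
    return 200, None


class LMNHPHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = route(self.path)
        self.send_response(status)
        if location is not None:
            # Redirect: no body, just the Location header.
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Stand-in for rendering the requested single post page.
        body = f"Single post page for {self.path}\n".encode("utf-8")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, start `HTTPServer(("localhost", 8000), LMNHPHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and request http://localhost:8000/: the response is a 301 whose Location header points at the latest post, while any other path is served normally.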

Note that the Home page is the only web page that is no longer active. All the other web pages on the blog are active and unchanged, and every back link is assigned to one blog web page or another. It is uncertain how the search engines handle all this, but so far there appear to be no surprises.

SERP Results For The SMM Blogs

It is still early days, but so far the results of keyword queries are extremely gratifying. Entries for new blog post pages are indexed and displayed rapidly. They also rank highly and appear with relevant titles and descriptions. This undoubtedly gives an SEO boost and also makes it more likely that searchers will click on the entry. The biggest boost comes from the fact that the single post page seems to be directly assigned the back links for the domain. For a time, the latest blog post really works as the Home page and is only supplanted when the next blog post is written.

Blog posts have always enjoyed an initial visibility, presumably based on some recency factor.  This is now magnified by a large number of back links that are also assigned to that URL.

A Possible PageRank Benefit

This approach should result in a better (more even) distribution of PageRank among web pages. No one outside Google knows exactly how the algorithms currently work with PageRank. The basic view is that the Home page amasses all the incoming back links to the domain. This link juice is then distributed among the internal web pages with some discounting, perhaps of the order of 15%.

In this new approach, the back links that were all directed to the Home page now go in groups directly to each single post web page as it is published. It would seem that the discounting factor no longer applies, since these links are now all external. It has also been assumed that external links carry more weight than internal links, so this may be an additional benefit. Since Google is very guarded about its search algorithms, it is unlikely that these surmises can be confirmed or denied.
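The surmise can at least be explored in a toy model. The sketch below runs a simplified PageRank power iteration (damping factor d = 0.85, with no handling of pages that lack outgoing links) on two hypothetical link graphs: a traditional blog where three external pages link to the Home page, and the LMNHP arrangement where those same links land on the latest post because the Home page redirects there. The page names and graph shapes are invented for illustration, and real Google rankings depend on far more than this, so the numbers are indicative only.

```python
def pagerank(links, d=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Every page in these
    toy graphs has at least one outgoing link, so dangling pages are not handled.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - d) / n for page in pages}
        for source, targets in links.items():
            share = d * rank[source] / len(targets)  # damped, split across outgoing links
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Traditional blog: external back links point at the Home page,
# which links to both posts; each post links back to the Home page.
traditional = {
    "ext1": ["home"], "ext2": ["home"], "ext3": ["home"],
    "home": ["post1", "post2"],
    "post1": ["home"], "post2": ["home"],
}

# LMNHP: "home" 301-redirects to the latest post (post2), so links to the
# Home page count as links to post2; post2 links back to the older post1.
lmnhp = {
    "ext1": ["post2"], "ext2": ["post2"], "ext3": ["post2"],
    "post1": ["post2"], "post2": ["post1"],
}

for name, graph in [("traditional", traditional), ("LMNHP", lmnhp)]:
    ranks = pagerank(graph)
    print(name, {page: round(score, 3) for page, score in sorted(ranks.items())})
```

In this toy model the latest post scores about 0.48 under LMNHP against about 0.23 in the traditional layout, roughly the 0.47 the Home page held before. Because the Home page drops out of the LMNHP graph, the two runs have different page counts, which is one more reason to read the comparison as illustrative only.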

A Pleasant Surprise In The Tail

The result of this approach, as seen in the SERPs, is that better entries appear, giving searchers a much clearer indication of what they will find. The constancy of content for blog pages also fits the Google mission better, since such pages are easier to index correctly and to deliver as relevant results.

One pleasing bonus is revealed in the image below, which shows a search done before this post was added to the blog. As might be expected, a search for the title of the previous post, The Google Tango, gave that post as the #1 entry in the SERP. Clicking on the link took you precisely to that single post web page.

[Image: Google search results for The Google Tango]

What is intriguing is that Google still shows the domain itself as the URL for that entry, even though that is not the entry's actual hyperlink. However, that is probably the most useful way of presenting the information to the searcher.



The Google Tango

What image does the Google Tango call to mind? Perhaps it was Google co-founder Sergey Brin ordering electric Tango vehicles. Brin and others have been heavily into electric cars recently. Brin has invested in Tesla, and he has ordered three Tangos (all the luxury T600 model, which cost $148,000 each) from Commuter Cars, the manufacturer of the Tango.

[Image: tango, courtesy of Camille Cusumano]

What we had in mind was the other Tango. For those who are not into ballroom dancing, that’s the evocative South American dance with the rhythm Slow, Slow, Quick, Quick, Slow. That seemed an apt description of the speed at which Google carries out its various operations. Of course, Google prides itself on delivering results for complex keyword searches in a fraction of a second.

Google can also react fast to signals that are sent directly to it.  This means that for blogs, indexing of blog posts can be very fast given that RSS news feeds provide an immediate signal when new posts have been added.

That is a process that Google finds very effective.  That is why Google is pushing for a new system that will allow the Google Index to Go Real Time.

Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. The PubSubHubbub (PuSH) real-time syndication protocol could be used by Google for indexing the web instead of crawling the links. PuSH is a syndication system based on the Atom format, whereby a publisher tells the world about a Hub that it will notify every time new content is published. Google would ask every website to declare which Hub it pushes to at the top of each document.
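As a rough illustration of the publisher side of PubSubHubbub, the sketch below uses only Python's standard library to build (but not send) the notification a publisher POSTs to its Hub when new content appears, using the form-encoded `hub.mode=publish` and `hub.url` parameters described in the PubSubHubbub 0.3 specification. It also produces the Atom `<link rel="hub">` element through which a feed declares its Hub. The hub address and feed URL are hypothetical placeholders, and whatever Google eventually asked of websites may well have differed in detail.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import quoteattr

# Hypothetical endpoints, used only for illustration.
HUB_URL = "http://hub.example.com/"
FEED_URL = "http://www.staygolinks.com/feed/"


def hub_link_element(hub_url):
    """Atom <link> element a feed uses to declare the Hub it pushes to."""
    return f'<link rel="hub" href={quoteattr(hub_url)} />'


def build_publish_ping(hub_url, topic_url):
    """Form-encoded POST telling the Hub that `topic_url` has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


print(hub_link_element(HUB_URL))
ping = build_publish_ping(HUB_URL, FEED_URL)
print(ping.get_method(), ping.full_url, ping.data.decode("ascii"))
# urllib.request.urlopen(ping) would actually send it to a real Hub.
```

The Hub, not the publisher, then fetches the updated feed and pushes it to subscribers, which is what would let an indexer learn of new content within seconds instead of waiting for a crawl.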

So much for the Quick, Quick, but why the Slow, Slow for Google? Because some processes operate on a much slower time cycle. Perhaps one of the most extreme is Google Maps. Google can partly blame the map database sources it uses, but some examples are almost ludicrous. The biggest local example is hard to miss. The Golden Ears Bridge across the Fraser River had been in operation for almost 9 months before Tele Atlas updated its map index as of March 31. MapQuest picked it up immediately. At the time of writing, some 12 days later, Google Maps still has not picked it up.

The other area where Slow, Slow applies is the speed at which new web pages not included in RSS news feeds get into the Google index. In some cases, this can be measured in months.  Here the enormous and explosively growing size of the Internet limits what is possible.  Even if a URL to a web page is found, it may be some time before the spiders or crawlers can revisit to fully identify what is located at that URL.

Here Google had to choose whether its index should be Big, Fast or Accurate. In practice, given the dynamics of the Internet, only two of these are attainable at the same time. Google has chosen Big and Accurate, and the result is as fast as they can make it, which is still very slow.

We are now promised that a new process, Google Caffeine, is being slowly rolled out. However, this will probably deal with the way search results are developed rather than the way web pages are added to the index. It seems likely that we must remain satisfied with the Slow, Slow rhythm for the speed at which web pages are included in the index.

Nevertheless, Google offers enough processes that go at the Quick, Quick pace that most of us will continue to be happy with the Google Tango.

