Make Your Website Search Engine Robot-Friendly

Search Engine Robots Read Site Maps Too

In November 2006, the major search engines, for once, agreed on a new Sitemap standard. It sets out the rules for sitemap files that all the major search engines follow.

If you use a program such as GSiteCrawler, you can produce a full listing of all the web pages on your website in an XML file; the standard name for this file is sitemap.xml. The search engines prefer a gzipped version of this file, usually named sitemap.xml.gz, and GSiteCrawler produces both versions. Although Microsoft's MSN/Live has also subscribed to this standard, it has not yet indicated how it intends to implement it. The other major engines have been more helpful.
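GSiteCrawler does this work for you, but the file format itself is simple. The Python sketch below is a minimal illustration of what such a tool produces, assuming you already have a list of page URLs; the function names and the example.com addresses are hypothetical. It builds sitemap.xml using the sitemaps.org namespace, listing only the required loc element for each page, and writes a gzipped copy alongside it (Python 3.8 or later, for the xml_declaration argument).

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (UTF-8 bytes) with one <url><loc> entry per page URL."""
    # Register the default namespace so the output uses xmlns="..." instead of ns0: prefixes.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")
        loc.text = page  # ElementTree escapes characters such as & when serializing
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_sitemaps(urls, basename="sitemap.xml"):
    """Write both sitemap.xml and a gzipped sitemap.xml.gz to the current directory."""
    xml_bytes = build_sitemap(urls)
    with open(basename, "wb") as f:
        f.write(xml_bytes)
    with gzip.open(basename + ".gz", "wb") as f:
        f.write(xml_bytes)
    return xml_bytes
```

For example, calling write_sitemaps(["https://www.example.com/", "https://www.example.com/about.html"]) would write both versions of the file, ready to upload to the root of the (hypothetical) domain.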

A good way to start is via the website for Google's Webmaster Tools. Once you have uploaded your sitemap file to your domain, you can submit it to Google. An advantage of this approach is that Google will then, in due course, evaluate the sitemap file and report any errors in it.
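Google's report only arrives after it has fetched the file, so a quick local check before uploading can catch the most basic mistakes sooner. This Python sketch is an illustration under simple assumptions, not Google's own validator, and check_sitemap is a hypothetical name: it only checks that the file is well-formed XML, that its root is a sitemaps.org urlset element, that every url entry has a non-empty loc, and that the file stays within the protocol's limit of 50,000 URLs per sitemap file.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-file URL limit in the Sitemaps protocol


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means no obvious errors."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != NS + "urlset":
        problems.append(f"unexpected root element {root.tag!r}")
    entries = root.findall(NS + "url")
    if len(entries) > MAX_URLS:
        problems.append(f"{len(entries)} URLs exceeds the {MAX_URLS} limit")
    for i, entry in enumerate(entries, start=1):
        loc = entry.find(NS + "loc")
        # A <url> entry without a usable <loc> tells the engine nothing.
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> entry {i} has no <loc>")
    return problems
```

Passing check_sitemap the bytes of your sitemap.xml and getting back an empty list does not guarantee Google will accept the file; it only rules out the simplest errors.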

The real news came last week, when Google, Yahoo! and Ask announced another way to tell them about the sitemap file: include a reference to its exact URL in the robots.txt file. Every domain should have a robots.txt file, even an empty one. Search engine robots (or spiders) will sometimes visit a domain and check only the robots.txt file, which confirms that the domain is live. If the file is missing, an error is recorded. You can now add an extra line anywhere in the file, say at the bottom, that reads as follows:

Search engine spiders normally check the robots.txt file often, so by doing this you should get the new sitemap file picked up quickly. Ask, Google and Yahoo! all support this robots.txt approach.
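The directive defined by the sitemaps.org protocol is a line of the form Sitemap: followed by the full URL of the sitemap file. As a minimal Python sketch (the function names and the example.com URL are hypothetical), the code below appends such a line at the bottom of an existing robots.txt only if that URL is not already listed, then reads the result back with the standard library's urllib.robotparser (site_maps() needs Python 3.8 or later) to confirm that a robots.txt parser can see it.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt text with 'Sitemap: <url>' appended at the bottom.

    The line is added only if that exact sitemap URL is not already listed;
    an empty robots.txt is a fine starting point.
    """
    lines = robots_txt.splitlines()
    already_listed = any(
        line.strip().lower().startswith("sitemap:")
        and line.split(":", 1)[1].strip() == sitemap_url
        for line in lines
    )
    if not already_listed:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def listed_sitemaps(robots_txt):
    """Parse robots.txt with Python's standard parser and return its Sitemap URLs."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when none are listed
```

The Sitemap line is independent of any User-agent group, which is why appending it at the end of the file is safe.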

If you have just uploaded a sitemap file and want to be sure it is picked up as soon as possible, you can ping the search engines directly. The following hyperlinks are the appropriate way to do this.

Ask: sitemap=
Google: sitemap=
Yahoo: sitemap=

NOTE: The space after ping? should be removed. It is included here to improve the formatting of the blog post.
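Each of these pings is simply an HTTP GET request with your sitemap's URL passed as the sitemap parameter, so it can also be scripted. In this Python sketch, build_ping_url and ping are hypothetical helper names, and https://search-engine.example/ping is a placeholder rather than any engine's real address; substitute the engine's own ping URL. The sketch percent-encodes the sitemap URL so that its own : and / characters cannot be confused with the ping URL's query string. Some engines have since retired their ping services, so a failed request does not necessarily mean your sitemap is at fault.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ping_url(endpoint, sitemap_url):
    """Append sitemap=<percent-encoded sitemap URL> to a ping endpoint.

    `endpoint` is a placeholder: substitute the search engine's own ping address.
    """
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"sitemap": sitemap_url})


def ping(endpoint, sitemap_url, timeout=10):
    """Send the ping as a plain HTTP GET and return the HTTP status code."""
    with urlopen(build_ping_url(endpoint, sitemap_url), timeout=timeout) as resp:
        return resp.status
```

Building the URL in code also sidesteps the stray-space problem mentioned in the note, since nothing is copied from the formatted post.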

This should provide all the information you need on the sitemap file and how to alert the search engine robots that you have one. If there are additional points, hopefully someone will add them in the comments.

What’s new with – Official Google Webmaster Central Blog
Use Your Robots.txt To Publish Your Sitemaps Xml File – Cre8asite Forums Discussion

10 thoughts on “Make Your Website Search Engine Robot-Friendly”

  1. Pingback:
  2. Hi,

    Here is a question for anyone who may know the answer…

    I have a new website with Google ads on it. If I get 10 clicks per day on my ads, does Google favor my site over others in the same category that don't run Google ads?

    I was thinking to some extent they might, so they can make more money.

    Do you know anything about this?

    Thanks, and I look forward to your reply.


  3. That’s a bit off-topic and I think you’d do better to post such a question in a forum such as the Cre8asite Forums, where you’d get a number of opinions.

    My own opinion on this is that what you do with Adwords in no way affects the ranking of your web pages in normal keyword searches. It would only make sense for Google to do so if it could tell everyone it was doing so, which would destroy its credibility.

  4. By the way, submitting sitemap.xml.gz is not correct; sitemap.xml is the appropriate file. I tried all the examples you offered and they all failed until I submitted sitemap.xml.

    Kind regards Still useful


  5. Thanks for that comment, Bill. The .gz version works for me and is recommended by some experts in the field so I don’t know why you had a problem with it.

  6. I created a sitemap and pinged Google two months ago, but I still don't have all my pages indexed. Does PR have anything to do with getting all your pages indexed? I have a PR of 2.

    Also, thanks for the other and Yahoo ping urls!

  7. Sites with a higher PR get crawled and indexed significantly faster than new or lower-PR sites; higher-PR sites have the luxury of getting more attention from Google. It's also worth mentioning that it may take Google several visits to crawl an entire site, so in many cases pages that aren't indexed simply haven't been crawled yet.
