Search Engine Robots Read Site Maps Too
If you use a program such as GSiteCrawler, you can produce a full listing of all the web pages on your website in an XML file. The standard name for this file is sitemap.xml. The search engines prefer a gzipped version of this file, usually named sitemap.xml.gz, and GSiteCrawler produces both versions. Microsoft’s MSN/Live has also subscribed to this standard but has not yet indicated how it intends to implement it. The other major search engines have been more helpful.
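For readers who would rather script this than use a crawler, here is a minimal Python sketch of what such a tool writes out. It assumes you already have the list of page URLs; the function names build_sitemap and write_sitemaps are made up for this illustration and have nothing to do with GSiteCrawler. It writes only the required loc element for each page, using the namespace defined by the sitemaps.org protocol.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as bytes, with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="utf-8")
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


def write_sitemaps(urls, xml_path="sitemap.xml"):
    """Write the plain file and a gzipped copy; return both paths."""
    data = build_sitemap(urls)
    gz_path = xml_path + ".gz"
    with open(xml_path, "wb") as plain_file:
        plain_file.write(data)
    with gzip.open(gz_path, "wb") as gz_file:
        gz_file.write(data)
    return xml_path, gz_path


# Example call (www.example.com is a placeholder domain):
# write_sitemaps(["http://www.example.com/", "http://www.example.com/about.html"])
```

Either file can then be uploaded to the root of your domain.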
A good way to start is via the website for Google’s Webmaster Tools. Once you have uploaded your sitemap file to your domain, you can submit it to Google. An advantage of this approach is that Google will in due course evaluate the sitemap file and report any errors it finds.
The real news came last week, when Google, Yahoo! and Ask indicated that another way to inform them of the sitemap file is to include a reference to its precise URL in the robots.txt file. Every domain should have a robots.txt file, even if it is an empty one. Search engine robots (or spiders) will sometimes visit a domain and check only the robots.txt file, which confirms that the domain is live; without such a file, an error is recorded. You can now add an additional line anywhere in the file, say at the bottom, consisting of Sitemap: followed by the full URL of your sitemap file.
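Under the sitemaps.org protocol, the extra line takes the form Sitemap: followed by the full URL of the sitemap file. If you maintain robots.txt with a script, a small Python sketch along these lines could add the line without duplicating it. The helper name add_sitemap_line and the example.com address are placeholders chosen for this illustration.

```python
def add_sitemap_line(robots_text, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url.

    The text is returned unchanged if the same sitemap is already listed.
    """
    for line in robots_text.splitlines():
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively, the URL exactly.
        if sep and field.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_text
    if robots_text and not robots_text.endswith("\n"):
        robots_text += "\n"
    return robots_text + f"Sitemap: {sitemap_url}\n"


# Example with a placeholder domain:
# add_sitemap_line("User-agent: *\nDisallow:\n",
#                  "http://www.example.com/sitemap.xml.gz")
# appends the line: Sitemap: http://www.example.com/sitemap.xml.gz
```

Starting from an empty robots.txt, the result is a file containing just that one Sitemap line.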
Search engine spiders normally check the robots.txt file frequently, so by doing the above you should quickly get the new sitemap file picked up. Ask, Google and Yahoo! all support this robots.txt approach.
If you have just uploaded a sitemap file and want to be sure that it is picked up as soon as possible, you can ping the search engines directly. The following hyperlinks are the appropriate way to do this.
NOTE: The space after ping? should be removed. It is included here to improve the formatting of the blog post.
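As a rough sketch of how such a ping address is put together: the sitemaps.org protocol describes a request of the form <searchengine_URL>/ping?sitemap= followed by the URL-encoded address of the sitemap file. The Python below only builds that string. The endpoint shown is a placeholder, not a real search engine address, and the function name build_ping_url is made up for this illustration, so check each engine's own documentation for the address it actually accepts.

```python
from urllib.parse import quote


def build_ping_url(ping_endpoint, sitemap_url):
    """Return ping_endpoint with the URL-encoded sitemap address appended."""
    # safe="" makes quote() encode ":" and "/" too, as the protocol asks.
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"


# Placeholder endpoint and domain, for illustration only.
ping_url = build_ping_url(
    "http://searchengine.example/ping",
    "http://www.example.com/sitemap.xml.gz",
)
print(ping_url)
```

Opening the resulting address in a browser, or requesting it from a script, is all a ping amounts to.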
This should provide all the information you need on the sitemap file and how to alert the search engine robots that you have one. If there are additional points, hopefully someone will add them in the comments.
What’s new with Sitemaps.org? – Official Google Webmaster Central Blog
Use Your Robots.txt To Publish Your Sitemaps Xml File – Cre8asite Forums Discussion