How Sitemaps Are Generated

The Sitemap plug-in does not generate sitemaps on its own initiative: it is a web application that responds to certain HTTP requests. It generates a sitemap only in response to an HTTP GET request sent to one of the following four URLs (http://your-publication is a placeholder for the URL of your publication):

http://your-publication/sitemap/archive.xml

The plug-in generates and returns an aggregated sitemap containing the URLs of all the content items belonging to the content types for which you have enabled sitemap generation.

http://your-publication/sitemap/update.xml

The plug-in generates and returns a sitemap containing the URLs of all content items that belong to content types for which you have enabled sitemap generation and that were published in the last 72 hours. However, content items belonging to content types for which you have enabled Google News or Google Video extensions are omitted from this sitemap and are included instead in one of the two sitemaps listed below.

http://your-publication/sitemap/googlenews.xml

The plug-in generates and returns a sitemap containing the URLs of all the content items belonging to content types for which you have enabled the generation of Google News extended sitemaps, and that were published in the last 72 hours.

http://your-publication/sitemap/googlevideo.xml

The plug-in generates and returns a sitemap containing the URLs of all the content items belonging to content types for which you have enabled the generation of Google Video extended sitemaps, and that were published in the last 72 hours.
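As a rough illustration, the four request URLs described above can be built from a publication's base URL and fetched with an ordinary HTTP GET. This is only a sketch: the names sitemap_urls and fetch_sitemap are hypothetical helpers, not part of the plug-in, and http://your-publication is the same placeholder used above. Only the /sitemap/... paths come from this section.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths as listed in this section; the dictionary keys are illustrative labels.
SITEMAP_PATHS = {
    "archive": "sitemap/archive.xml",
    "update": "sitemap/update.xml",
    "googlenews": "sitemap/googlenews.xml",
    "googlevideo": "sitemap/googlevideo.xml",
}


def sitemap_urls(publication_url):
    """Return the four sitemap request URLs for a publication base URL."""
    # Ensure exactly one trailing slash so urljoin appends rather than replaces.
    base = publication_url.rstrip("/") + "/"
    return {name: urljoin(base, path) for name, path in SITEMAP_PATHS.items()}


def fetch_sitemap(url):
    """Send an HTTP GET request (urlopen's default) and return the raw body."""
    with urlopen(url) as response:
        return response.read()
```

For example, sitemap_urls("http://your-publication")["update"] gives the update sitemap URL, which could then be passed to fetch_sitemap.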

Once you have installed and configured the plug-in, nothing happens until a request for one of the above index documents is received. Typically you would:

  1. Submit a request yourself for http://your-publication/sitemap/archive.xml.

  2. Submit requests for all the sitemap URLs listed in the returned index document.

  3. Post the returned sitemap documents to the search engines you want to index your site.

  4. Add the URL http://your-publication/sitemap/update.xml to your robots.txt file.

  5. If you have enabled Google News sitemap extensions for any of your content types, add the URL http://your-publication/sitemap/googlenews.xml to your robots.txt file.

  6. If you have enabled Google Video sitemap extensions for any of your content types, add the URL http://your-publication/sitemap/googlevideo.xml to your robots.txt file.
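Steps 4 to 6 above amount to adding one Sitemap line per URL to robots.txt. The sketch below generates those lines. The function name robots_sitemap_lines and its two flags are assumptions made for illustration; they are not plug-in settings. The "Sitemap: <url>" form is the standard robots.txt sitemap directive.

```python
def robots_sitemap_lines(publication_url, google_news=False, google_video=False):
    """Return the robots.txt Sitemap lines for steps 4 to 6.

    google_news / google_video are hypothetical flags saying whether the
    corresponding sitemap extensions are enabled for any content type.
    """
    base = publication_url.rstrip("/")
    names = ["update"]  # step 4: always listed
    if google_news:
        names.append("googlenews")  # step 5
    if google_video:
        names.append("googlevideo")  # step 6
    return [f"Sitemap: {base}/sitemap/{name}.xml" for name in names]


if __name__ == "__main__":
    # Print the lines to paste into robots.txt (placeholder publication URL).
    for line in robots_sitemap_lines("http://your-publication", google_news=True):
        print(line)
```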

Then, each time a search engine indexer reads your robots.txt file, it will:

  1. Send requests for all the sitemap indexes listed there.

  2. Send requests for all the sitemap URLs listed in the returned index documents.

  3. Send requests for all the recently published content items listed in the returned sitemaps and index them.
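The indexer's second step, reading the URLs listed in a returned document, can be sketched as below. It assumes the document follows the standard sitemaps.org schema, in which each entry carries its URL in a loc element. The function name listed_locations and the sample document (including its example-1.xml and example-2.xml URLs) are invented for illustration and do not show the plug-in's actual output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical index document, made up for this example.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://your-publication/sitemap/example-1.xml</loc></sitemap>
  <sitemap><loc>http://your-publication/sitemap/example-2.xml</loc></sitemap>
</sitemapindex>"""


def listed_locations(xml_text):
    """Return the URLs in every sitemaps.org <loc> element of the document.

    Works for both sitemap index documents and plain sitemaps, because both
    use <loc> in the same namespace.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    ]
```

Calling listed_locations(SAMPLE_INDEX) returns the two example URLs in document order; an indexer would then request each of them in turn.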

If you have enabled Google News or Google Video sitemap extensions for any of your content types, you also need to submit the URLs of your extended sitemap indexes (http://your-publication/sitemap/googlenews.xml and http://your-publication/sitemap/googlevideo.xml) to Google Webmaster Tools. This is necessary because Google does not use the standard robots.txt mechanism for indexing news and videos. For more information, see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668&topic=8476&ctx=topic#2.