How Sitemaps are Generated
The Sitemap plug-in does not actively generate sitemaps: it is simply a webapp that responds to certain HTTP requests. It will generate a sitemap in response to an HTTP GET request directed to one of four specific URLs, for example:
http://your-publication/sitemap/archive.xml - The plug-in generates and returns an aggregated sitemap containing the URLs of all the content items belonging to the content types for which you have enabled sitemap generation.
http://your-publication/sitemap/update.xml - The plug-in generates and returns a sitemap containing the URLs of all the content items that belong to the content types for which you have enabled sitemap generation and that were published in the last 72 hours. If you have enabled Google News or Google Video extensions for any content types, however, the content items of those types are omitted from this sitemap and are instead included in one of the two sitemaps listed below.
http://your-publication/sitemap/googlenews.xml - The plug-in generates and returns a sitemap containing the URLs of all the content items that belong to content types for which you have enabled the generation of Google News extended sitemaps and that were published in the last 72 hours.
http://your-publication/sitemap/googlevideo.xml - The plug-in generates and returns a sitemap containing the URLs of all the content items that belong to content types for which you have enabled the generation of Google Video extended sitemaps and that were published in the last 72 hours.
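The four URLs above can be assembled and requested with a few lines of Python. This is only a minimal sketch: the function names `sitemap_urls` and `request_sitemap` are hypothetical helpers invented for illustration, and `http://your-publication` is the same placeholder used above, to be replaced with your real publication URL.

```python
from urllib.request import urlopen

# The four documents the plug-in responds to, as listed above.
SITEMAP_DOCUMENTS = ("archive.xml", "update.xml", "googlenews.xml", "googlevideo.xml")


def sitemap_urls(publication_base: str) -> dict:
    """Map each document name to its full URL (hypothetical helper)."""
    base = publication_base.rstrip("/")
    return {name: f"{base}/sitemap/{name}" for name in SITEMAP_DOCUMENTS}


def request_sitemap(url: str, timeout: float = 30.0) -> bytes:
    """Send an HTTP GET request and return the raw XML body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder host: replace with your own publication URL before running.
    for name, url in sitemap_urls("http://your-publication").items():
        print(name, "->", url)
```

Nothing is generated until `request_sitemap` (or any other HTTP client) actually sends the GET request.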
Once you have installed and configured the plug-in, nothing happens until a request for one of the above index documents is received. Typically you would:
- Submit a request yourself for http://your-publication/sitemap/archive.xml.
- Submit requests for all the sitemap URLs listed in the returned index document.
- Post the returned sitemap documents to the search engines you want to index your site.
- Add the URL http://your-publication/sitemap/update.xml to your robots.txt file.
- If you have enabled Google News sitemap extensions for any of your content types, add the URL http://your-publication/sitemap/googlenews.xml to your robots.txt file.
- If you have enabled Google Video sitemap extensions for any of your content types, add the URL http://your-publication/sitemap/googlevideo.xml to your robots.txt file.
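The robots.txt entries described in the last three steps could be produced as follows. This sketch assumes that "adding the URL" means using the standard `Sitemap:` directive defined by the sitemaps.org protocol; the function name `robots_sitemap_lines` and its two flags are hypothetical and only mirror the conditions stated above.

```python
def robots_sitemap_lines(publication_base: str,
                         google_news: bool = False,
                         google_video: bool = False) -> list:
    """Return the Sitemap lines to add to robots.txt (hypothetical helper)."""
    base = publication_base.rstrip("/")
    names = ["update.xml"]  # always listed
    if google_news:
        names.append("googlenews.xml")  # only if Google News extensions are enabled
    if google_video:
        names.append("googlevideo.xml")  # only if Google Video extensions are enabled
    # Assumption: the standard "Sitemap: <url>" robots.txt directive is intended.
    return [f"Sitemap: {base}/sitemap/{name}" for name in names]


if __name__ == "__main__":
    # Placeholder host: replace with your own publication URL.
    print("\n".join(robots_sitemap_lines("http://your-publication", google_news=True)))
```

With `google_news=True`, the result is two lines, one for update.xml and one for googlenews.xml.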
Then each time a search engine indexer reads your robots.txt file it will:
- Send requests for all the sitemap indexes listed there.
- Send requests for all the sitemap URLs listed in the returned index documents.
- Send requests for all the recently published content items listed in the returned sitemaps and index them.
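The indexer's walk from index document to sitemaps to content URLs can be sketched roughly as below. It assumes the plug-in's responses follow the sitemaps.org 0.9 schema (`sitemapindex`/`sitemap`/`loc` and `urlset`/`url`/`loc`), which the text above does not state explicitly. The names `locs` and `collect_content_urls` are hypothetical, and `fetch` stands for any function that performs the HTTP GET and returns the response body as bytes.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Assumption: documents use the standard sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def locs(xml_bytes: bytes, entry_tag: str) -> List[str]:
    """Return the <loc> value of every <entry_tag> child of the root element."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip()
            for el in root.findall(f"sm:{entry_tag}/sm:loc", NS)
            if el.text]


def collect_content_urls(index_url: str,
                         fetch: Callable[[str], bytes]) -> List[str]:
    """Follow one sitemap index and gather the content URLs it leads to."""
    content_urls: List[str] = []
    # Step 1: request the index; step 2: request each sitemap it lists.
    for sitemap_url in locs(fetch(index_url), "sitemap"):
        # Step 3: the <url>/<loc> entries are the content items to request and index.
        content_urls.extend(locs(fetch(sitemap_url), "url"))
    return content_urls
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client and lets the traversal be exercised against canned documents.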
If you have enabled Google News or Google Video sitemap extensions for any of your content types, then you also need to submit the URLs of your extended sitemap indexes (http://your-publication/sitemap/googlenews.xml and http://your-publication/sitemap/googlevideo.xml) to Google's Webmaster Tools. This is necessary since Google does not use the standard robots.txt mechanism for indexing news and videos. For more information about this, see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668&topic=8476&ctx=topic#2.