How It Works
The Sitemap plug-in generates sitemaps in the sitemaps.org XML format specified in http://www.sitemaps.org/protocol.html. This is what a (very small) sitemaps.org sitemap document looks like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/incoming/article51.ece</loc>
    <lastmod>2013-05-31T11:51:13+06:00</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/incoming/article42.ece</loc>
    <lastmod>2013-05-31T11:51:23+06:00</lastmod>
  </url>
</urlset>
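As a rough illustration of the format only (this is not the plug-in's own code), the following Python sketch uses the standard library's xml.etree.ElementTree module to serialize URL/last-modified pairs as a urlset document like the one above. The function name build_urlset and its input shape are assumptions made for the example, and it needs Python 3.8 or later for the default_namespace argument.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Serialize (loc, lastmod) pairs as a sitemaps.org <urlset> document.

    build_urlset is a hypothetical helper. Each lastmod must be a
    timezone-aware datetime so that isoformat() emits a W3C Datetime
    value with an offset, such as 2013-05-31T11:51:13+06:00.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    # default_namespace writes the sitemap namespace as the default xmlns,
    # so elements appear without a prefix (requires Python 3.8+).
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    tz = timezone(timedelta(hours=6))
    print(build_urlset([
        ("http://www.example.com/incoming/article51.ece",
         datetime(2013, 5, 31, 11, 51, 13, tzinfo=tz)),
        ("http://www.example.com/incoming/article42.ece",
         datetime(2013, 5, 31, 11, 51, 23, tzinfo=tz)),
    ]))
```

The output is a single unindented line; sitemap consumers do not require pretty-printing.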
The plug-in can generate two basic types of sitemap:
- Aggregated sitemap
  An aggregated sitemap contains the URLs of all selected content items that are in a published state at the time the sitemap is generated. This kind of sitemap is really only intended to be generated once, when a site is first published and you want to ensure that the entire site is indexed. The idea is that you explicitly request generation of the sitemap yourself and then upload it to the search engines you are interested in.
- Update sitemap
  An update sitemap contains only the URLs of content items that have been published recently (by default, over the last 72 hours). The idea is that you publish the URL of this sitemap in your site's robots.txt file so that it can be found by search engine indexers, which periodically visit it and index all the listed URLs. Alternatively, you can control the process yourself by creating an application or cron job that actively posts the sitemap to the search engines you are interested in at regular intervals.
Both types of sitemap have exactly the same structure; the only difference is the number of entries they contain.
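To make the difference between the two types concrete, here is a hypothetical Python sketch of how entries for each might be selected. The ContentItem class and both function names are invented for illustration and say nothing about how the plug-in actually tracks publication state; only the 72-hour default comes from the description above. Because both types share one structure, either resulting list could be handed to the same serializer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ContentItem:
    """Hypothetical stand-in for a content item (not the plug-in's API)."""
    url: str
    published: bool
    published_at: datetime  # timezone-aware publication time


def aggregated_entries(items):
    """Aggregated sitemap: every item currently in a published state."""
    return [item for item in items if item.published]


def update_entries(items, now, window=timedelta(hours=72)):
    """Update sitemap: published items whose publication time lies within the window."""
    cutoff = now - window
    return [item for item in items if item.published and item.published_at >= cutoff]


if __name__ == "__main__":
    now = datetime(2013, 5, 31, 12, 0, tzinfo=timezone.utc)
    items = [
        ContentItem("http://www.example.com/incoming/article51.ece", True, now - timedelta(hours=5)),
        ContentItem("http://www.example.com/incoming/article42.ece", True, now - timedelta(days=30)),
        ContentItem("http://www.example.com/incoming/article7.ece", False, now - timedelta(hours=1)),
    ]
    print([i.url for i in aggregated_entries(items)])  # article51 and article42
    print([i.url for i in update_entries(items, now)])  # article51 only
```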
In order to prevent sitemap documents becoming unmanageably large, the sitemaps.org standard allows sitemaps to be split into multiple documents that are then referenced by a master sitemap index. The Sitemap plug-in makes use of this feature. It generates one sitemap index per Escenic publication, which in turn references one sitemap document for every content type that you choose to include. Here is a small example of a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap/sections.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/news.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/review.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/video.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
</sitemapindex>
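The index structure can be sketched in Python with xml.etree.ElementTree. This is an illustration under stated assumptions, not the plug-in's implementation: build_sitemap_index is a hypothetical name, and the <base_url>/sitemap/<content_type>.xml location pattern is simply copied from the example index rather than taken from any documented rule.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, content_types, generated_at):
    """Return a <sitemapindex> document with one <sitemap> entry per content type.

    Hypothetical helper; the location pattern mirrors the example index
    and is an assumption, not documented plug-in behaviour.
    """
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for content_type in content_types:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        loc = f"{base_url.rstrip('/')}/sitemap/{content_type}.xml"
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        # Every child sitemap gets the index generation time as its lastmod.
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = generated_at.isoformat()
    body = ET.tostring(index, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    generated = datetime(2013, 5, 31, 12, 7, 35, tzinfo=timezone(timedelta(hours=6)))
    print(build_sitemap_index("http://www.example.com",
                              ["sections", "news", "review", "video"], generated))
```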
If the number of content items of a given content type exceeds the entries-per-sitemap value defined in SitemapConfig.properties, the plug-in splits that content type across several sitemap documents, each holding at most that many entries. The number of documents is the item count divided by the entries-per-sitemap value, rounded up. Here is a small example of a sitemap index that contains multiple sitemap documents for a single content type:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap/sections.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/news/1.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/news/2.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/review.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap/video.xml</loc>
    <lastmod>2013-05-31T12:07:35+06:00</lastmod>
  </sitemap>
</sitemapindex>
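The splitting rule can be sketched as a simple chunking function. This is a hypothetical Python illustration: entries_per_sitemap stands in for the value configured in SitemapConfig.properties (whose actual property name is not given here), and the news.xml versus news/1.xml, news/2.xml naming is inferred from the example index. Note that the sitemaps.org protocol itself caps each sitemap file at 50,000 URLs and 50 MB uncompressed, so the configured value should stay within that limit.

```python
import math


def split_into_sitemaps(content_type, urls, entries_per_sitemap):
    """Map sitemap paths to the URLs each document should list.

    entries_per_sitemap is a hypothetical parameter standing in for the
    limit configured in SitemapConfig.properties. Paths follow the
    example index: <type>.xml when one document suffices, otherwise
    <type>/1.xml, <type>/2.xml and so on.
    """
    if entries_per_sitemap < 1:
        raise ValueError("entries_per_sitemap must be at least 1")
    urls = list(urls)
    if len(urls) <= entries_per_sitemap:
        return {f"{content_type}.xml": urls}
    # Number of documents = item count / limit, rounded up.
    count = math.ceil(len(urls) / entries_per_sitemap)
    return {
        f"{content_type}/{n + 1}.xml":
            urls[n * entries_per_sitemap:(n + 1) * entries_per_sitemap]
        for n in range(count)
    }


if __name__ == "__main__":
    urls = [f"http://www.example.com/incoming/article{i}.ece" for i in range(5)]
    for path, chunk in split_into_sitemaps("news", urls, 2).items():
        print(path, len(chunk))  # news/1.xml 2, news/2.xml 2, news/3.xml 1
```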
You choose which content types you want to be included by adding seo:enabled elements to content types in your publication content-type resources (see Editing the content-type Resource).