XML Sitemaps allow webmasters to inform search engines about the pages on their websites that are available for crawling. The sitemap.xml file should include only URLs that you want to be indexed; to exclude URLs, use robots.txt instead.
The XML Sitemap file must be UTF-8 encoded, and you must entity-escape all data values to ensure full compatibility.
| Character | Symbol | Escape Code |
| Ampersand | & | &amp; |
| Single Quote | ' | &apos; |
| Double Quote | " | &quot; |
| Greater Than | > | &gt; |
| Less Than | < | &lt; |
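As a rough illustration of entity escaping, here is a minimal Python sketch that uses the standard library's xml.sax.saxutils.escape. The helper name escape_for_sitemap and the sample URL are made up for this example; the five characters it covers are the ones that need an escape code.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quote characters
# have to be mapped explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value: str) -> str:
    """Entity-escape a data value before writing it into a sitemap (hypothetical helper)."""
    return escape(value, QUOTE_ENTITIES)


# Hypothetical URL containing an ampersand and single quotes.
escaped = escape_for_sitemap("http://www.example.com/view?a=1&b='x'")
print(escaped)  # http://www.example.com/view?a=1&amp;b=&apos;x&apos;
```

If you build the file with an XML library instead of string concatenation, the library normally performs this escaping for you.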
Here’s a sample sitemap containing one URL. All sitemap.xml files must contain the tags <urlset>, <url> and <loc>. The <lastmod>, <changefreq> and <priority> tags are optional:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
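A sitemap like the sample can also be generated programmatically. The sketch below is one possible approach using Python's xml.etree.ElementTree (Python 3.8 or later for the xml_declaration argument); the function name build_sitemap and the dictionary layout of the entries are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return sitemap XML as UTF-8 bytes.

    `entries` is a list of dicts with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys (hypothetical layout).
    """
    # Writing xmlns as a plain attribute declares the default namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree entity-escapes text such as "&" during serialization.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/view?a=1&b=2"},
])
print(sitemap_xml.decode("utf-8"))
```

The second entry shows that the optional tags can simply be left out and that the ampersand in the URL is escaped automatically.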
Each sitemap can contain at most 50,000 URLs and cannot be larger than 10MB (10,485,760 bytes); the current sitemaps.org protocol raises the size limit to 50MB (52,428,800 bytes). You can compress sitemap files with gzip to reduce bandwidth usage, but the size limit applies to the uncompressed file. If you want to list more than 50,000 URLs, you must create multiple sitemap files and list each one in a sitemap index file.
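Splitting a large URL list and producing the index file could look roughly like the Python sketch below. The helper names, the example.com URLs and the sitemap-N.xml.gz naming scheme are all assumptions for illustration, not a required convention.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (UTF-8 bytes) listing each sitemap file's URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical data: 120,001 page URLs need three sitemap files.
pages = [f"http://www.example.com/page-{n}.html" for n in range(120_001)]
chunks = chunk_urls(pages)
names = [f"http://www.example.com/sitemap-{i}.xml.gz"
         for i in range(1, len(chunks) + 1)]
index_xml = build_sitemap_index(names)

# gzip.compress works the same way on the bytes of any sitemap file.
compressed = gzip.compress(index_xml)
print(len(chunks), len(index_xml), len(compressed))
```

Each chunk would then be written out as its own sitemap file under the corresponding name.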
Location of the sitemap.xml file
It is strongly recommended to place your sitemap files in the root directory of your server, because a sitemap's location determines which URLs it can include.
For example, a sitemap file located at http://www.yoursite.com/shop/sitemap.xml can include URLs starting with http://www.yoursite.com/shop/ but cannot include URLs starting with http://www.yoursite.com/blog/.
The same rule applies for protocol (http vs https) or port numbers. Thus, in this example you couldn’t list https://www.yoursite.com/shop/contact.html or http://www.yoursite.com:100/shop/register.html.
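The location rule can be expressed as a small check. The following Python sketch is a simplified interpretation (for instance, it treats an explicit default port such as :80 as different from no port at all), and the function name in_sitemap_scope is made up for this example.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Check whether page_url may be listed in the sitemap at sitemap_url.

    Simplified rule: same protocol, same host and port, and a path that
    starts with the directory containing the sitemap file.
    """
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # "/shop/sitemap.xml" -> "/shop/", "/sitemap.xml" -> "/"
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        sitemap.scheme == page.scheme
        and sitemap.netloc.lower() == page.netloc.lower()  # netloc includes the port
        and page.path.startswith(base_dir)
    )


SITEMAP = "http://www.yoursite.com/shop/sitemap.xml"
for candidate in (
    "http://www.yoursite.com/shop/contact.html",
    "http://www.yoursite.com/blog/post.html",
    "https://www.yoursite.com/shop/contact.html",
    "http://www.yoursite.com:100/shop/register.html",
):
    print(candidate, in_sitemap_scope(SITEMAP, candidate))
```

Only the first candidate passes; the others fail on path, protocol and port respectively.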
Submitting your sitemap
Once your sitemap is created, it’s time to inform search engine crawlers about its existence. In theory, you can do this automatically through your robots.txt file: just add the sitemap auto-discovery directive as follows:
Sitemap: http://www.yoursite.com/sitemap.xml
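To check that the directive can be read back from a robots.txt file, one option is Python's standard urllib.robotparser module (site_maps() requires Python 3.8 or later). This is only a sketch: the robots.txt content below is a hypothetical example parsed from a string rather than fetched from a server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.yoursite.com/sitemap.xml']
```

For a live site, you could call set_url() and read() on the parser instead of passing the lines yourself.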
Alternatively, you can actively notify search engines every time your sitemap.xml file changes, either through their ping URLs or through their webmaster interfaces.
Tino
SEO Programmer