How to Sitemap Your Website – Understanding Google, Yahoo and Bing XML, HTML and GZ Sitemaps
Sitemap Best Practices
What is a site map (or sitemap)? It is a list of the pages of a website, accessible to crawlers or users. It can be either a document in any form used as a planning tool for web design, or a web page that lists the pages on a website, typically organized in a simple hierarchical fashion.
Worldclassmedia recommends creating sitemaps for your site based on the Sitemaps protocol published at sitemaps.org. Google, Bing and Yahoo all support this protocol (this set of rules).
To be more specific, when we refer to sitemaps we mean three main types of sitemap for your site, all of which we recommend:
- The HTML version that actually appears on your site.
- The XML version that you create for search engine submission.
- The .GZ sitemap (a gzip-compressed version of the XML sitemap), which you can also submit to search engines.
Our definitive list of sitemap best practices will feature tips for all three.
Sequence: The best order in which to build these is:
1. Produce an HTML sitemap.
2. Add a link to it in the site's footer.
3. Produce an automatically generated .XML sitemap.
4. Produce an automatically generated gzipped (.GZ) sitemap.
5. Submit both the .XML and the .GZ sitemap to Google, Yahoo and Bing via their webmaster tools.
6. Place links to the .XML and .GZ sitemaps in the robots.txt file.
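The last step, linking your sitemaps from robots.txt, amounts to adding one `Sitemap:` line per file. Below is a minimal sketch in Python (used purely for illustration), assuming hypothetical sitemap URLs on example.com and a permissive `User-agent: *` group. It builds the file's text and then parses it back with the standard library's `urllib.robotparser` (its `site_maps()` method needs Python 3.8+) to confirm the lines are recognised.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sitemap locations; the Sitemap: directive takes absolute URLs.
SITEMAP_URLS = [
    "https://www.example.com/sitemap.xml",
    "https://www.example.com/sitemap.xml.gz",
]


def build_robots_txt(sitemap_urls, disallow=()):
    """Return robots.txt text with one crawl group plus a Sitemap: line per file."""
    lines = ["User-agent: *"]
    # An empty "Disallow:" line means nothing is blocked.
    lines += [f"Disallow: {path}" for path in disallow] or ["Disallow:"]
    lines.append("")  # Sitemap: lines are independent of any User-agent group
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(SITEMAP_URLS)
print(robots_txt)

# Parse the text back with the standard-library parser to confirm that the
# Sitemap: lines are recognised (RobotFileParser.site_maps() needs Python 3.8+).
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())
```

The `disallow` parameter is only there so the same helper can also block private paths if you need to.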
With good software this can be done fairly quickly, in one or two days. It makes crawling and indexing much easier for the search engines and allows more pages to be indexed faster.
An HTML sitemap is an important way to increase crawl rates and the number of indexed pages.
Ideally, we recommend providing a link (usually in your site's footer) to an HTML sitemap, giving both users and search engines an easy way to see the site's structure and crawl it. An HTML sitemap should be generated and posted at least for all directory-level pages. If possible, best practice is an “on-the-fly”, automatically updated sitemap covering all pages (or listings); this can be done with various languages (such as PHP) or with CMS plugins.
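An on-the-fly HTML sitemap is essentially a page rendered from your current list of pages. Here is a minimal sketch in Python rather than PHP, assuming a hypothetical `PAGES` list of (URL, anchor text) pairs that a real site would pull from its CMS or database on each request. It emits plain, crawlable text links and escapes the anchor text so characters such as `&` stay valid HTML.

```python
from html import escape

# Hypothetical page inventory of (URL, anchor text) pairs. In a real site this
# would be queried from the CMS or database on each request, so the sitemap
# stays up to date without manual edits.
PAGES = [
    ("/shoes/", "Shoes"),
    ("/shoes/running/", "Running Shoes"),
    ("/bags/", "Bags & Backpacks"),
]


def render_html_sitemap(pages, title="Sitemap"):
    """Render an HTML sitemap page as a plain list of crawlable text links."""
    items = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(text)}</a></li>'
        for url, text in pages
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body></html>\n"
    )


print(render_html_sitemap(PAGES))
```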
- Place your sitemap at the root of your website directory – its URL should look something like “example.com/sitemap.html”.
- Paginate your sitemap and, if possible, limit it to 100-150 links per page; this still appears to be the sweet spot with the search engines. If you run a larger site, consider creating multiple theme-based sitemaps. Google even states in its Webmaster Guidelines, “If the site map is larger than 100 or so links, you may want to break the site map into separate pages.”
- Separate your links – create multiple theme-based sitemaps (don't mix peanut butter with jelly). Try to separate the links/pages on your site by theme or directory. If you want to present your site as an authority on a given topic, themed sitemaps are a great way not only to help users navigate to that content but also to let the engines know what your site is about. Theme-based sitemaps also create consistent interlinking through similar keywords in the anchor text, which improves the inbound link quality of the pages you link to.
Example: Zappos.com sitemap:
- Organize your sitemap(s) in a logical sequence – this ties in with the themed “silos” described above. It should be easy for users to navigate your site, especially from your sitemap. Guiding users through a logical navigation map is an easy way to help ensure that they engage with your most important content.
- Use the word “sitemap” in the file name – pages can be named “sitemap-1.html”, “sitemap-2.html” and so on. This makes them easy for search engines to recognize.
- Use descriptive, relevant, keyword-rich anchor text – see the Zappos example above. This is very important. Your sitemap is likely to inherit PageRank from the homepage and thus be treated as a somewhat authoritative page. Keyword-rich anchor text from an authoritative (and, more importantly, relevant) page is a great way to improve the inbound link quality of the page being linked to, a key factor in the search engines' ranking algorithms. Try to incorporate keywords into the link text on your sitemap.
- Keep your sitemap up to date – check it for broken links and make sure its HTML is valid.
- Link to the cleanest (canonical, i.e. your preferred) URLs – if you have multiple versions of a webpage, use the canonical tag to identify the preferred URL. Ideally you should have only a single version of any given webpage on your site anyway. Either way, the preferred URL is the one your sitemap should link to.
- Avoid linking to redirected pages – where possible, clean up links to old, redirected pages so that your sitemap points directly to the final destination page.
- Place a link to your sitemap on the homepage – preferably near the top. There are still numerous sites out there that make it difficult for users to find their sitemap. The link to your sitemap should be conspicuous. Many site owners link to their sitemap from their footer as well.
- Make your HTML sitemap a static HTML page of plain links – avoid placing your sitemap in an image, a Flash file, an <iframe> or any other format that search engine crawlers cannot read.
- Avoid hidden text or hidden links – this should be self-explanatory by now, but it can and will get you penalized by the search engines. Don't use a tiny font either; if you need multiple pages for your sitemap, create them. Take the time to plan out your sitemap strategy.
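Linking to canonical URLs is easier to enforce if you read each page's declared `<link rel="canonical">` when building the sitemap. The following is a minimal Python sketch that uses the standard library's `html.parser`. The helper names and the sample page are hypothetical, and it assumes you already have each page's HTML in hand. Relative canonical hrefs are resolved against the page's own URL, and pages without a canonical tag fall back to their own URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.href is None:
            self.href = attrs.get("href")


def sitemap_link_for(page_url, page_html):
    """Return the URL the sitemap should link to: the canonical URL if declared."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    # Resolve relative canonical hrefs against the page's own URL.
    return urljoin(page_url, finder.href) if finder.href else page_url


# Hypothetical duplicate page that declares its preferred version.
page = '<html><head><link rel="canonical" href="/shoes/running/"></head></html>'
print(sitemap_link_for("https://www.example.com/shoes/running/?sort=price", page))
```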
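Several of the points above can be combined when preparing HTML sitemap pages: pointing at final destinations rather than redirects, grouping links by theme, keeping to roughly 100 links per page, and putting “sitemap” in the file name. Here is a minimal Python sketch under a few stated assumptions. The `REDIRECTS` map and the links are hypothetical. A link's "theme" is taken to be its top-level directory, and the `sitemap-<theme>-<n>.html` naming is an illustrative variation on `sitemap-1.html`.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical inputs: the site's links and a map of old URL -> redirect target.
LINKS = [
    "https://www.example.com/shoes/running/",
    "https://www.example.com/shoes/old-sale/",
    "https://www.example.com/bags/",
    "https://www.example.com/bags/totes/",
]
REDIRECTS = {
    "https://www.example.com/shoes/old-sale/": "https://www.example.com/shoes/sale/",
}
LINKS_PER_PAGE = 100  # within the 100-150 links-per-page range


def final_destination(url, redirects, max_hops=10):
    """Follow the redirect map to the final URL, guarding against loops."""
    for _ in range(max_hops):
        if url not in redirects:
            return url
        url = redirects[url]
    raise ValueError(f"redirect chain too long or looping at {url}")


def theme_of(url):
    """Assumption: the first path segment (top-level directory) is the theme."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "home"


def build_sitemap_pages(links, redirects, per_page=LINKS_PER_PAGE):
    """Return {file name: [urls]} for themed, paginated HTML sitemap pages."""
    themes = defaultdict(list)
    seen = set()
    for link in links:
        url = final_destination(link, redirects)
        if url not in seen:  # drop duplicates that redirects can create
            seen.add(url)
            themes[theme_of(url)].append(url)
    pages = {}
    for theme in sorted(themes):
        urls = sorted(themes[theme])
        for start in range(0, len(urls), per_page):
            number = start // per_page + 1
            pages[f"sitemap-{theme}-{number}.html"] = urls[start:start + per_page]
    return pages


for name, urls in build_sitemap_pages(LINKS, REDIRECTS).items():
    print(name, urls)
```

Each resulting list could then be rendered as one HTML sitemap page, with the page name telling both users and crawlers which section it covers.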
Here’s an example of a basic XML Sitemap with a single URL entry that includes an image and a video (for convenience, only a subset of the available video information is shown):
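<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url><loc>http://www.example.com/foo.html</loc>
    <image:image><image:loc>http://example.com/image.jpg</image:loc></image:image>
    <video:video><video:thumbnail_loc>http://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
      <video:player_loc allow_embed="yes" autoplay="ap=1">http://www.example.com/videoplayer.swf?video=123</video:player_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>Get perfectly done steaks every time</video:description></video:video></url></urlset>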
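An entry like the one above can be generated instead of written by hand, which is what the image and video sitemap points below rely on. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` with the sitemaps.org, Google image and Google video namespaces. The `add_url` helper and the sample values are hypothetical, and it omits the optional `player_loc` attributes. It requires Python 3.8+ for `xml_declaration`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Register prefixes so the output uses image:/video: rather than ns0:/ns1:.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)
ET.register_namespace("video", VIDEO_NS)


def add_url(urlset, loc, image_loc=None, video=None):
    """Append one <url> entry, optionally with one image and one video."""
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    if image_loc:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_loc
    if video:
        video_el = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in ("thumbnail_loc", "title", "description", "player_loc"):
            ET.SubElement(video_el, f"{{{VIDEO_NS}}}{tag}").text = video[tag]
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
add_url(
    urlset,
    "http://www.example.com/foo.html",
    image_loc="http://example.com/image.jpg",
    video={
        "thumbnail_loc": "http://www.example.com/thumbs/123.jpg",
        "title": "Grilling steaks for summer",
        "description": "Get perfectly done steaks every time",
        "player_loc": "http://www.example.com/videoplayer.swf?video=123",
    },
)
xml_bytes = ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```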
Once you’ve created your Sitemap, you can submit it to Google using Webmaster Tools (since renamed Google Search Console). Before you do this, you must have added your site to your account.
Other important points:
- Keep each sitemap under 50,000 URLs and 50MB uncompressed (earlier versions of the protocol set the size limit at 10MB) – search engines will report an error if a sitemap is too big. If needed, split your URLs across separate XML sitemap files and list those files in a sitemap index file, so that each sitemap stays small.
- Decide whether to include the modified date – the <lastmod> value is optional, and you may prefer to leave it out if you don't want to tell search engines when your pages were last changed. Consider this especially for evergreen content.
- Help Google tell whether anything has changed – make sure your web server supports the If-Modified-Since HTTP header. This lets your server tell Google whether your content has changed since it last crawled your site (answering 304 Not Modified if it hasn't), which saves you bandwidth and overhead.
- Set the priority and change frequency of your URLs – this is specific to your XML sitemap (the <priority> and <changefreq> elements). It is a good idea to give your HTML sitemap page a higher priority so that the search engines crawl it regularly. That way, when you add links to new pages, the search engines should be able to crawl and index that content sooner rather than later. Keep in mind that these values are only hints: priority is relative to the other URLs on your own site, and Google has said it ignores both values.
- Include information about images – again specific to your XML sitemap, you can use the image sitemap extension to give Google key information about images. For each URL you list in your Sitemap, you can add information about the important images on that page.
- Use Video Sitemaps – do you host your own videos on your site? If so, consider a video sitemap to tell the search engines about your video content.
- Put a link to your sitemaps in the robots.txt file – add a “Sitemap:” line containing the full URL of your sitemap.xml file.
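Supporting If-Modified-Since comes down to one comparison: if the page has not changed since the date the crawler sends, answer 304 and skip the body. Here is a minimal, framework-independent sketch in Python. The `conditional_response` function is hypothetical, and it assumes you know each page's last-modified time as a timezone-aware UTC datetime. It uses `email.utils` to parse and format HTTP dates and ignores headers it cannot parse.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Decide between 200 and 304 for a request that may carry If-Modified-Since.

    last_modified: timezone-aware UTC datetime when the page last changed.
    if_modified_since: raw header value from the crawler's request, if any.
    Returns (status code, response headers).
    """
    # HTTP dates have one-second resolution, so compare whole seconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable header is simply ignored
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # Not Modified: no body needs to be sent
    return 200, headers


page_changed = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(conditional_response(page_changed))
print(conditional_response(page_changed, "Wed, 10 Jan 2024 12:00:00 GMT"))
print(conditional_response(page_changed, "Mon, 01 Jan 2024 00:00:00 GMT"))
```

Most web servers handle this for static files on their own, so logic like this mainly matters for pages generated by your application.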
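The optional per-URL values mentioned above (modified date, change frequency, priority) are just extra child elements of each `<url>`. Here is a minimal Python sketch with `xml.etree.ElementTree`. The `url_entry` helper and the sample pages are hypothetical. It checks values against the ranges allowed by the sitemaps.org protocol: the fixed set of `changefreq` words, and a priority between 0.0 and 1.0. It writes `lastmod` as a W3C date (YYYY-MM-DD) and leaves it out when no date is given.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Values allowed for <changefreq> by the sitemaps.org protocol.
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element; every child except <loc> is optional."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:  # omit it, e.g. for evergreen pages
        # A date gives YYYY-MM-DD; an aware datetime gives a full W3C timestamp.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        if changefreq not in CHANGEFREQS:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(url, "priority").text = str(priority)
    return url


urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
# Hypothetical pages: the HTML sitemap page gets a higher relative priority.
urlset.append(url_entry("https://www.example.com/sitemap.html", changefreq="daily", priority=0.8))
urlset.append(url_entry("https://www.example.com/guide/", lastmod=date(2024, 3, 1)))
print(ET.tostring(urlset, encoding="unicode"))
```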
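Splitting a large site across several sitemap files works alongside a sitemap index that lists them, as the size-limit point above describes. Here is a minimal Python sketch, assuming the current sitemaps.org limits of 50,000 URLs and 50MB uncompressed per file. The product URLs and the `sitemap-<n>.xml` file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # per-file URL limit from the protocol
MAX_BYTES = 50 * 1024 * 1024   # per-file uncompressed size limit (50MB)


def chunk(urls, size=MAX_URLS):
    """Split the URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_xml(urls):
    """Serialize one sitemap file and check that it stays under the byte limit."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    data = ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
    if len(data) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50MB; use smaller chunks")
    return data


def sitemap_index_xml(sitemap_urls):
    """Serialize a sitemap index that points at each individual sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for u in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = u
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120,000 product URLs.
all_urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]
files = {}
for number, urls in enumerate(chunk(all_urls), start=1):
    files[f"sitemap-{number}.xml"] = sitemap_xml(urls)
index = sitemap_index_xml(f"https://www.example.com/{name}" for name in files)
print(sorted(files), len(index))
```

You would then submit the index file (or list it in robots.txt) instead of each individual sitemap.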
Gzipped sitemaps are simply gzip-compressed versions of XML sitemaps. Most search engines accept .GZ files, which gives you another way to submit your URLs to them.
The same rules above apply to .GZ files; in particular, the size limit applies to the uncompressed sitemap, not to the compressed file.
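Producing the .GZ version is a single compression step. Here is a minimal Python sketch using the standard `gzip` module. It assumes the current 50MB uncompressed limit from sitemaps.org, and it uses a tiny hypothetical sitemap and a temporary directory as a stand-in for your web root. It checks the uncompressed size before writing `sitemap.xml.gz`.

```python
import gzip
import tempfile
from pathlib import Path

MAX_UNCOMPRESSED = 50 * 1024 * 1024  # the size limit applies to the uncompressed file


def write_gzipped_sitemap(xml_bytes, path):
    """Check the uncompressed size, then write a gzip-compressed copy."""
    if len(xml_bytes) > MAX_UNCOMPRESSED:
        raise ValueError("sitemap too large: split it before compressing")
    with gzip.open(path, "wb") as f:
        f.write(xml_bytes)


# A tiny hypothetical sitemap; in practice, use the sitemap.xml you generated.
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "  <url><loc>https://www.example.com/</loc></url>\n"
    "</urlset>\n"
).encode("utf-8")

out_dir = Path(tempfile.mkdtemp())  # stand-in for your web root
gz_path = out_dir / "sitemap.xml.gz"
write_gzipped_sitemap(sitemap, gz_path)
print(gz_path, gz_path.stat().st_size, "bytes")
```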