A sitemap.xml tells search engines which pages exist on your site and when they were last updated. This complete guide covers creating the right sitemap, what to include and exclude, how to submit it to Google and Bing, and how to troubleshoot common sitemap errors.
SEO Tip: A properly structured sitemap can significantly speed up Google indexing of new pages. After submitting your sitemap, use PageGuard to verify your pages have the correct SEO meta tags, canonical URLs, and structured data that improve your indexing rate.
A sitemap.xml is an XML-formatted file that serves as a roadmap of your website for search engine crawlers. It lists the URLs you want indexed, along with optional metadata about each URL: when it was last modified, how often it changes, and its relative importance.
The Sitemaps Protocol was originally developed by Google in 2005 and is now supported by all major search engines including Google, Bing, Yahoo, and DuckDuckGo. While search engines can discover pages through link crawling alone, sitemaps give you direct control over which pages get submitted for indexing.
Minimum valid sitemap structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://yourdomain.com/page</loc>
<lastmod>2026-03-01</lastmod>
</url>
</urlset>
The only required element is <loc> — the canonical URL of the page. All other elements are optional.
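To see how this structure looks to a program, here is a minimal sketch that reads a sitemap with Python's standard library and returns each `<loc>` with its optional `<lastmod>`. The function name `read_sitemap` and the `SAMPLE` document are illustrative assumptions, not part of any tool mentioned in this guide.

```python
import xml.etree.ElementTree as ET

# The sitemap protocol namespace, mapped to a short prefix for lookups.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample: the minimal structure shown above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://yourdomain.com/page</loc>
<lastmod>2026-03-01</lastmod>
</url>
</urlset>"""


def read_sitemap(xml_bytes):
    """Return a list of (loc, lastmod) tuples; lastmod is None when absent."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = (url.findtext("sm:loc", default="", namespaces=SITEMAP_NS) or "").strip()
        if not loc:
            # <loc> is the one element every <url> entry must carry.
            raise ValueError("every <url> entry needs a <loc>")
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries
```

Calling `read_sitemap(SAMPLE)` yields one entry for `https://yourdomain.com/page` with its `lastmod` value. Note that every lookup has to be namespace-qualified, which is a common stumbling block when parsing sitemaps.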
Google's official guidance says a sitemap is particularly valuable when a site is large, is new, or has pages with few internal links.
Even for small, well-linked sites, having a sitemap costs nothing and provides useful signals to Google. There is no downside to having one.
Understanding each XML element helps you build an effective sitemap:
| Element | Required | Description |
|---|---|---|
| <loc> | Yes | Absolute URL of the page. Must use HTTPS if available. Must match the canonical URL. |
| <lastmod> | Optional | Date of last modification in W3C Datetime format (for example YYYY-MM-DD). Must be accurate: Google uses lastmod only when it proves consistently reliable and ignores it otherwise. |
| <changefreq> | Optional* | How often the page changes: always, hourly, daily, weekly, monthly, yearly, never. *Largely ignored by Google; Bing may use it. |
| <priority> | Optional* | Relative importance from 0.0 to 1.0 (default 0.5). *Largely ignored by Google. Bing may use it for crawl scheduling. |
Note: Google officially states it ignores <changefreq> and <priority>. Focus on accurate <lastmod> dates instead.
The right approach depends on your website platform:
On WordPress, install Yoast SEO or Rank Math. Both auto-generate and continuously update your sitemap at yourdomain.com/sitemap_index.xml. They create separate sitemaps for posts, pages, categories, tags, and custom post types. Configure which post types to include in the plugin settings.
Shopify automatically generates a sitemap at /sitemap.xml that includes products, collections, pages, and blog posts. WooCommerce with Yoast SEO includes product and product category pages in the sitemap automatically.
For Next.js, use the next-sitemap package. It generates sitemaps at build time or via an API route. Configure next-sitemap.config.js to exclude private pages like /admin/* and /dashboard/*.
Hugo, Jekyll, and Eleventy all support sitemaps, either built in or through a plugin. Hugo generates sitemap.xml automatically. Jekyll uses the jekyll-sitemap gem. Eleventy uses eleventy-plugin-sitemap.
For custom sites or a headless CMS, generate sitemaps programmatically using your CMS API. Fetch all published URLs, filter out private or duplicate pages, and output valid XML. For large sites (50,000+ URLs), use a sitemap index file that references multiple sitemap files.
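The fetch, filter, and output steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the page records (dicts with `url`, `lastmod`, `private`, and `noindex` keys) stand in for whatever your CMS API returns, and `build_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (bytes) from an iterable of page dicts.

    Assumed record shape: {"url": str, "lastmod": str (optional),
    "private": bool (optional), "noindex": bool (optional)}.
    """
    # Serialize the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    seen = set()
    for page in pages:
        url = page["url"]
        # Filter step: skip private pages, noindex pages, and duplicates.
        if page.get("private") or page.get("noindex") or url in seen:
            continue
        seen.add(url)
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if page.get("lastmod"):
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
    # ElementTree escapes characters such as & in text nodes automatically.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Building the tree with an XML library rather than string concatenation means escaping is handled for you, which avoids one of the most common causes of invalid sitemaps.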
Squarespace, Webflow, and Wix auto-generate sitemaps. Squarespace: /sitemap.xml. Webflow: automatically created and submitted. Wix: /sitemap.xml is generated automatically.
Only include pages that you want Google to index and that represent unique, valuable content:
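✓ Include These: indexable pages with unique, valuable content that return HTTP 200 and have a self-referencing canonical.
✗ Exclude These: pages with noindex meta tags, login and admin pages, thank-you and confirmation pages, duplicate paginated pages, filtered URL variants, pages blocked in robots.txt, and redirecting URLs.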
Add Sitemap to robots.txt
Add Sitemap: https://yourdomain.com/sitemap.xml to your robots.txt file. Any crawler that reads your robots.txt will automatically find your sitemap.
Open Google Search Console
Go to search.google.com/search-console and select your property. If you haven't verified your site yet, complete verification first using the HTML meta tag method.
Navigate to Sitemaps
In the left sidebar, under Indexing, click Sitemaps. You'll see a list of any previously submitted sitemaps and their status.
Enter and Submit Your Sitemap URL
In the "Add a new sitemap" field, enter the relative path (just sitemap.xml, not the full URL). Click Submit. GSC will start processing your sitemap immediately.
Submit to Bing Webmaster Tools
Don't forget Bing. Go to bing.com/webmasters, verify your site, and submit your sitemap under Sitemaps. Bing drives 6–10% of search traffic in the US and is the default search engine on Edge and many AI assistants.
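Before submitting, it can help to confirm that your robots.txt really advertises the sitemap, as described in the first step above. The sketch below uses Python's standard `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or newer); the helper name `sitemaps_in_robots` is an assumption for illustration, and the robots.txt content is passed in as text so you can fetch it however you prefer.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt document (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    example = (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "\n"
        "Sitemap: https://yourdomain.com/sitemap.xml\n"
    )
    print(sitemaps_in_robots(example))
```

The `Sitemap:` line is independent of any `User-agent` group, so it can appear anywhere in the file.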
A single sitemap file can contain a maximum of 50,000 URLs and must be under 50MB uncompressed. For large sites, use a sitemap index file that references multiple sitemap files:
Sitemap index structure:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://yourdomain.com/sitemap-pages.xml</loc>
<lastmod>2026-03-01</lastmod>
</sitemap>
<sitemap>
<loc>https://yourdomain.com/sitemap-posts.xml</loc>
<lastmod>2026-03-01</lastmod>
</sitemap>
<sitemap>
<loc>https://yourdomain.com/sitemap-products.xml</loc>
<lastmod>2026-03-01</lastmod>
</sitemap>
</sitemapindex>
Group your sitemaps logically by content type (pages, posts, products, images) or by date range for archives. Submit only the index file to Google Search Console — it will process all referenced sitemaps automatically.
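Splitting a large URL list and writing the matching index can be sketched as follows. This is an illustrative sketch only: `chunk` and `build_index` are hypothetical helper names, and the `sitemap-N.xml` naming in the usage note is an assumption rather than a required convention.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemap protocol


def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of `urls`, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sitemap_urls, lastmod=None):
    """Build a sitemap index (bytes) that references each child sitemap URL."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

For example, 120,000 URLs passed through `chunk` produce three parts (50,000, 50,000, and 20,000 URLs), and `build_index` would then list three child files such as `https://yourdomain.com/sitemap-1.xml`. The sketch only enforces the URL count; the 50MB size limit would need a separate check on the serialized output.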
Extend your sitemap with image and video extensions to make media content eligible for Google's image and video search results:
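<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://yourdomain.com/blog/post</loc>
<image:image>
<!-- Google deprecated image:title and image:caption in 2022; image:loc is the tag it still uses -->
<image:loc>https://yourdomain.com/images/hero.jpg</image:loc>
</image:image>
</url>
</urlset>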
For video content, include video:video extensions with video:thumbnail_loc, video:title, video:description, and video:content_loc or video:player_loc. This makes your videos eligible for video carousels in search results.
News publishers should use the Google News sitemap extension with news:news, news:publication, and news:publication_date to get content included in Google News. Only include articles published within the last 2 days.
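Extension tags live in their own XML namespace, which must be declared on the root element. As a rough sketch, the image extension can be generated with Python's standard library like this; `build_image_sitemap` and its input shape (a dict mapping each page URL to a list of image URLs) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """pages: dict mapping a page URL to a list of image URLs on that page."""
    ET.register_namespace("", SM)        # default namespace, no prefix
    ET.register_namespace("image", IMG)  # serialized as the image: prefix
    urlset = ET.Element(f"{{{SM}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMG}}}image")
            ET.SubElement(image, f"{{{IMG}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

The same pattern should carry over to the video and news extensions by registering their namespaces and adding the tags named above.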
❌ "Couldn't fetch" error in GSC
Google can't access your sitemap file. Check: (1) The sitemap URL is publicly accessible (no login required); (2) Your server returns a 200 status for the sitemap URL; (3) The file isn't blocked in robots.txt; (4) Your server's User-Agent policy doesn't block Googlebot.
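Check (3) can be approximated offline. The sketch below tests whether a robots.txt document blocks a given URL for a given user agent, using Python's standard `urllib.robotparser`. Treat it as a first-pass check: the standard-library parser uses simple prefix matching, so its verdict may differ from Googlebot's for rules that rely on wildcards. The function name `blocked_by_robots` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True when robots_txt disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /sitemap.xml\n"
    # This rule set blocks the sitemap itself, a classic cause of fetch errors.
    print(blocked_by_robots(rules, "https://yourdomain.com/sitemap.xml"))
```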
❌ URLs discovered but not indexed
Google found your pages but chose not to index them. Common causes: thin content, duplicate content, quality issues, noindex tags, incorrect canonicals, or low page authority. Improve content quality, verify no accidental noindex tags, and build internal links to affected pages.
❌ Invalid URL in sitemap
Special characters must be entity-escaped in XML sitemaps: &amp; for &, &lt; for <, &gt; for >, &apos; for apostrophes, and &quot; for double quotes. URLs must use UTF-8 encoding. Spaces and other unsafe characters in URLs should be percent-encoded.
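If you write sitemap XML by hand or with string templates, both steps (percent-encoding, then entity-escaping) have to be applied in that order. A minimal sketch, assuming a helper named `sitemap_safe` and a deliberately simple choice of characters to leave untouched:

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_safe(url):
    """Percent-encode unsafe characters, then entity-escape for XML."""
    # Keep URL delimiters and existing %XX sequences as they are;
    # spaces and non-ASCII characters become UTF-8 percent escapes.
    encoded = quote(url, safe=":/?&=#%+,;@~")
    # escape() handles &, < and >; the extra mapping covers quotes.
    return escape(encoded, {"'": "&apos;", '"': "&quot;"})


if __name__ == "__main__":
    print(sitemap_safe("https://yourdomain.com/search?q=a b&lang=en"))
```

This sketch treats the whole URL as one string, so it is not suitable for internationalized domain names, which need separate handling. When you build the file with an XML library, the entity-escaping step is usually done for you and only the percent-encoding remains.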
❌ noindex URL in sitemap
Including noindex pages in your sitemap is a contradiction and wastes crawl budget. GSC will report these as errors. Audit your sitemap regularly to ensure every included URL is indexable and returns 200.
❌ Sitemap contains redirect URLs
Always use the final destination URL in your sitemap, not the redirecting URL. Update your sitemap whenever you change URL structure and implement 301 redirects. Submitting redirect URLs wastes crawl budget and confuses Google about your canonical URL structure.
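The noindex and redirect problems above can both be caught by a periodic audit. The sketch below covers only the classification step and leaves the HTTP fetching to you: it assumes you already have, for each sitemap URL, the status code, the final URL after redirects, the response headers, and the HTML body. The names `sitemap_issues` and `_RobotsMeta` are hypothetical.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Detect <meta name="robots" content="...noindex..."> in a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots = attr.get("name", "").lower() in ("robots", "googlebot")
        if is_robots and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def sitemap_issues(url, status, final_url, headers, html):
    """Return a list of reasons why `url` should not be in the sitemap."""
    issues = []
    if status != 200:
        issues.append(f"returns {status}")
    if final_url != url:
        issues.append(f"redirects to {final_url}")
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        issues.append("noindex header")
    parser = _RobotsMeta()
    parser.feed(html)
    if parser.noindex:
        issues.append("noindex meta tag")
    return issues
```

An empty list means the URL passed these particular checks; it does not guarantee that Google will index the page.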
❌ Invalid date format for lastmod
Use W3C Datetime format: YYYY-MM-DD (e.g., 2026-03-04) or the full datetime format YYYY-MM-DDThh:mm:ss+00:00. Common mistake: using MM/DD/YYYY or DD-MM-YYYY which causes GSC to report invalid date errors.
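A small validator can catch bad dates before the sitemap is published. This sketch accepts only the two forms named above (a complete date, optionally followed by a time with a timezone) and is intentionally stricter than the full W3C Datetime profile, which also allows year-only and year-month values. The helper names are assumptions.

```python
import re
from datetime import datetime, timezone

# Complete date, optionally followed by a time and a mandatory UTC offset.
_W3C = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def is_valid_lastmod(value):
    """Return True for values such as 2026-03-04 or 2026-03-04T12:30:00+00:00."""
    if not _W3C.match(value):
        return False
    try:
        # Reject impossible calendar dates such as 2026-02-30.
        datetime.strptime(value[:10], "%Y-%m-%d")
    except ValueError:
        return False
    return True


def to_lastmod(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss+00:00 (UTC)."""
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")
```

Generating the value with `to_lastmod` from the real modification timestamp, rather than typing it by hand, avoids the MM/DD/YYYY and DD-MM-YYYY mistakes entirely.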
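Place the sitemap at https://yourdomain.com/sitemap.xml, not in a subfolder, for easy discovery
Add Sitemap: https://yourdomain.com/sitemap.xml to robots.txt so all crawlers find it automatically
Use absolute URLs (https://), not relative paths
List only URLs that match each page's <link rel="canonical"> tag
Large sitemaps can be served compressed, for example with a Content-Encoding: gzip header.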
These two files serve complementary but distinct purposes:
| | sitemap.xml | robots.txt |
|---|---|---|
| Purpose | Tell crawlers which pages exist and should be indexed | Tell crawlers which pages NOT to crawl |
| Function | Discovery aid — helps find and index pages | Access control — restricts crawl access |
| Format | XML with URL list | Plain text with Allow/Disallow directives |
| Location | Any path (commonly /sitemap.xml) | Must be at /robots.txt (root only) |
| Indexing effect | Suggests indexing (not a guarantee) | Disallow prevents crawling but not indexing if linked |
Use both together: robots.txt restricts crawler access to private sections; sitemap.xml promotes your public content pages for indexing.
Submitting your sitemap is the first step — but you also need to ensure the pages in your sitemap are technically sound enough for Google to actually index them. Pages with missing canonical tags, incorrect meta tags, or structured data errors are often deprioritized or skipped by Google's indexer even if they're in your sitemap.
PageGuard scans individual pages and checks the technical SEO signals that influence whether Google indexes them: correct canonical URLs, valid meta tags, structured data, Core Web Vitals, and accessibility compliance — all factors Google uses when deciding indexing priority.
A sitemap.xml is an XML file that lists the URLs on your website you want search engines to discover and index. While Google can find pages through links alone, a sitemap helps important pages get discovered and crawled, especially on new sites, large sites, or pages with few internal links. Even if your site is small and well-linked, having a sitemap costs nothing and is recommended.
The method depends on your platform: WordPress uses the Yoast SEO or Rank Math plugins; Shopify, Squarespace, and Wix generate sitemaps automatically; Next.js uses the next-sitemap package; Hugo has built-in sitemap support, while Jekyll and Eleventy rely on plugins. For custom sites, generate XML programmatically from your URL list and deploy at /sitemap.xml.
Go to Google Search Console → Indexing → Sitemaps → enter 'sitemap.xml' in the Add a new sitemap field → click Submit. Google will process your sitemap and show discovered vs. indexed URL counts. Also add 'Sitemap: https://yourdomain.com/sitemap.xml' to your robots.txt file so all crawlers can find it automatically.
Exclude: pages with noindex meta tags, login and admin pages, thank-you/confirmation pages, duplicate paginated pages, filtered URL variants, pages blocked in robots.txt, and redirect URLs (use the destination URL instead). Only include pages with unique, valuable content that return HTTP 200 and have a canonical pointing to themselves.
Update your sitemap whenever you add, remove, or significantly change important pages. For CMS-based sites, automate sitemap regeneration on publish. Use accurate lastmod dates: Google relies on lastmod only when it proves consistently accurate, so setting every date to today to force re-crawling tends to get the field ignored. Resubmit in Google Search Console after major content additions.