What Should a Sitemap Include?

An XML sitemap is a search engine indexing tool, not a dumping ground for every URL your website can generate. Its job is simple: tell search engines which pages actually matter and should be crawled and indexed.

A sitemap exists to guide crawlers to important, index-worthy URLs—the pages you want showing up in search results. That means pages with real content, real intent, and real value to users.

One of the biggest misconceptions in SEO is thinking that adding more URLs helps performance. It doesn’t. More URLs ≠ better SEO. In fact, bloated sitemaps can make it harder for search engines to understand what’s important on your site.

The Core Purpose of a Sitemap

At its core, a sitemap helps search engines discover and prioritize your most important pages. It removes guesswork and gives crawlers a clear roadmap of where to go.

A well-built sitemap also reinforces your site structure and content hierarchy. It signals which pages sit at the top of your site’s importance ladder and which ones support them.

For larger websites in particular, sitemaps support efficient crawling and indexing. Search engines have limited crawl resources, and a clean sitemap helps ensure that time is spent on pages that actually matter—rather than wasted on junk URLs.

Pages That SHOULD Be Included in a Sitemap

Core Website Pages

These pages form the backbone of your site’s search visibility and should always be included:

  • Homepage
  • Primary service or product pages
  • Key category or pillar pages
  • About, Contact, and other essential brand pages

These URLs define who you are, what you offer, and how users (and search engines) understand your site. If a page represents your business or core offerings, it belongs in your sitemap.

Blog Posts and Resource Content

Your sitemap should also include:

  • Evergreen blog posts
  • High-value guides, FAQs, or educational content
  • Articles intended to rank and attract organic traffic

This is where quality over quantity matters most. If a post is thin, outdated, or not meant to rank, it doesn’t belong. Sitemaps should reflect your strongest content—not every post you’ve ever published.

Index-Worthy Product or Location Pages (If Applicable)

For eCommerce or service-area businesses, include:

  • Product detail pages (not filters or variants)
  • Location-based landing pages designed to rank
  • Pages with unique copy and clear search intent

If a page is meant to attract search traffic and convert users, it belongs in your sitemap. If it exists only for navigation or filtering, it does not.

Pages That Should NOT Be in a Sitemap

Author Archives

For most websites, author pages don’t provide standalone search value. They often repeat information already found elsewhere and add little unique content.

Including them can create duplicate content and thin-value issues, which dilute crawl focus and weaken overall SEO clarity.

Category and Tag Archives

Category and tag pages sometimes feel important—but in most cases, they aren’t.

Unless a category page is intentionally optimized, offers real value, and is meant to rank, it should be excluded. Most category and tag URLs dilute crawl focus and indexing signals by creating near-duplicate collections of the same content.

Low-Value System Pages

These pages should never appear in a sitemap:

  • Login, admin, cart, checkout, and thank-you pages
  • Internal search result pages
  • Paginated archive pages

Including these URLs wastes crawl budget and sends mixed signals to search engines. If a page isn’t meant to rank—or shouldn’t be visible in search at all—it doesn’t belong in your sitemap.

Non-Canonical, Redirected, or Noindexed URLs

Your sitemap should never include URLs that you’ve already told search engines not to index or rank. That includes:

  • URLs with noindex tags
  • Redirected URLs (301 or 302)
  • Non-canonical URLs, such as parameterized URLs and tracking variations

These URLs either point somewhere else, aren’t meant to appear in search results, or represent duplicate versions of the same content. Including them creates confusion and wastes crawl resources. The rule is simple: if it shouldn’t rank, it doesn’t belong in your sitemap.
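
As a rough illustration, the rule above can be written as a small filter. This is a minimal sketch, not a definitive implementation: the Page record and its fields are hypothetical and stand in for whatever your crawler or CMS export provides, and treating every query string as a duplicate is a simplifying assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical crawl record; field names are assumptions for this sketch."""
    url: str         # the URL as requested
    status: int      # HTTP status returned for that URL, without following redirects
    canonical: str   # value of the page's rel="canonical" link
    noindex: bool    # True if the page carries a noindex directive


def belongs_in_sitemap(page: Page) -> bool:
    """Return True only for URLs that are indexable and self-canonical."""
    if page.status != 200:            # drops 301/302 redirects and error pages
        return False
    if page.noindex:                  # already excluded from search
        return False
    if urlsplit(page.url).query:      # parameterized or tracking variation (assumption)
        return False
    return page.canonical == page.url


pages = [
    Page("https://example.com/services/", 200, "https://example.com/services/", False),
    Page("https://example.com/old-page/", 301, "https://example.com/services/", False),
    Page("https://example.com/thank-you/", 200, "https://example.com/thank-you/", True),
    Page("https://example.com/services/?utm_source=mail", 200,
         "https://example.com/services/", False),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)
```

Only the first sample page passes every check. The redirect, the noindexed thank-you page, and the tracking variation are all filtered out.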

Sitemap Metadata That Actually Matters

Last Modified Dates

The lastmod field tells search engines when a page was last meaningfully updated, written as a date such as 2024-03-01. When it is kept accurate, it helps Google decide which pages to re-crawl first.

The key word there is meaningfully. Auto-updating the last modified date every time a page is touched, without real content changes, can teach search engines to distrust the field. If nothing substantial changed, the date shouldn’t either.
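
One way to keep the date honest is to change it only when the page content itself changes. The sketch below assumes you can store a content hash next to each page. The function names are hypothetical, and treating any text change as "meaningful" is a simplification you may want to tighten.

```python
import hashlib
from datetime import date
from typing import Tuple


def content_fingerprint(body_text: str) -> str:
    """Hash the page text after collapsing whitespace, so a re-save alone changes nothing."""
    normalized = " ".join(body_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_lastmod(stored_hash: str, stored_lastmod: date,
                 body_text: str, today: date) -> Tuple[str, date]:
    """Return the (hash, lastmod) pair to store; lastmod moves only if the text changed."""
    new_hash = content_fingerprint(body_text)
    if new_hash == stored_hash:
        return stored_hash, stored_lastmod
    return new_hash, today


old_hash = content_fingerprint("Our services include SEO audits.")
first_published = date(2024, 1, 10)
today = date(2024, 3, 1)

# Re-saved with different whitespace only: the date stays put.
_, resaved_date = next_lastmod(old_hash, first_published,
                               "Our  services include SEO audits.\n", today)

# Real copy change: the date moves forward.
_, edited_date = next_lastmod(old_hash, first_published,
                              "Our services include SEO audits and site migrations.", today)

print(resaved_date.isoformat(), edited_date.isoformat())
```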

Priority and Frequency (When to Ignore Them)

Many sitemap generators still include priority and change frequency (changefreq) fields, but Google’s documentation says it ignores both values.

Instead of micromanaging numbers that don’t move the needle, focus on what actually matters: clean structure, accurate URLs, and a sitemap that reflects your real site hierarchy. That clarity does far more for SEO than tweaking priority values ever will.
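
In practice this means a sitemap entry needs little more than a URL and an accurate date. The following is a minimal sketch using Python's standard library. The build_urlset helper and the sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> with only <loc> and <lastmod>; no priority or changefreq."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_urlset([
    ("https://example.com/", "2024-03-01"),
    ("https://example.com/services/", "2024-02-14"),
])
print(sitemap_xml)
```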

Sitemap Size, Structure, and Organization

URL Limits and File Size Rules

Each sitemap file is limited to 50,000 URLs or 50 MB uncompressed, whichever limit is reached first. If your site grows beyond that, the solution isn’t cramming more in; it’s splitting your sitemap.

Large sites should break sitemaps into logical chunks and reference them through a sitemap index. This keeps files manageable and easier for search engines to process.
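
Splitting can be sketched roughly as follows. The example chunks a URL list at the 50,000-URL limit and builds a sitemap index that points to each child file. The file names are hypothetical, and the sketch checks only the URL count, so the file size limit would need a separate check.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations):
    """Build a <sitemapindex> that references each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


# Made-up product URLs, enough to exceed a single file.
urls = [f"https://example.com/products/{n}/" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical naming scheme for the child sitemap files.
child_files = [
    f"https://example.com/sitemap-products-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_sitemap_index(child_files)

print([len(chunk) for chunk in chunks])
```

Each chunk would then be written out as its own urlset file, and only the index needs to be submitted.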

Using Multiple Sitemaps

Many sites benefit from having more than one sitemap, such as:

  • A main sitemap index
  • A dedicated blog sitemap
  • A product or location sitemap
  • Image or video sitemaps, when rich media is a ranking focus

This approach improves organization and makes it easier to manage growth without bloating a single file.

Best Practices for a Clean, Effective Sitemap

A strong sitemap follows a few simple rules:

  • Include only index-worthy, canonical URLs
  • Keep it auto-updated through your CMS or SEO plugin
  • Make sure sitemap URLs align with your internal linking strategy
  • Validate regularly in Google Search Console
  • Reference the sitemap in your robots.txt file

Think of your sitemap as a curated list of your best pages—not a mirror of everything your CMS can produce.
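
The robots.txt reference is easy to verify with a script. Here is a small sketch using Python's built-in robots.txt parser (the site_maps() method requires Python 3.8 or later). The robots.txt content is a made-up sample. On a live site you would fetch the real file instead.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; the paths and sitemap URL are assumptions.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns None when no Sitemap line is present.
declared_sitemaps = parser.site_maps() or []
print(declared_sitemaps)
```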

Common Sitemap Mistakes That Hurt SEO

Some of the most common sitemap issues come from over-inclusion:

  • Treating the sitemap as a complete inventory of every URL on the site
  • Including thin, duplicate, or archive pages
  • Forgetting to remove old, deleted, or redirected URLs
  • Letting plugins auto-include junk pages by default

Left unchecked, these mistakes dilute crawl focus and reduce indexing efficiency.

How to Audit Your Sitemap

Auditing your sitemap doesn’t need to be complicated:

  • Compare sitemap URLs with indexed pages in Google Search Console
  • Crawl the sitemap using Screaming Frog or a similar tool
  • Flag URLs that shouldn’t be indexed and remove or exclude them

A quick audit can often uncover issues that quietly hold back performance.
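
The first comparison can be sketched in a few lines. The example below assumes you already have the sitemap XML as text and a list of indexed URLs exported from Search Console. Both inputs here are made-up samples, and sitemap_locs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the set of <loc> values found in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}


sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/</loc></url>
  <url><loc>https://example.com/blog/old-post/</loc></url>
</urlset>"""

# Stand-in for an export of indexed pages.
indexed_urls = {
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/tag/news/",
}

in_sitemap = sitemap_locs(sitemap_xml)
submitted_not_indexed = sorted(in_sitemap - indexed_urls)
indexed_not_submitted = sorted(indexed_urls - in_sitemap)

print("In sitemap, not indexed:", submitted_not_indexed)
print("Indexed, not in sitemap:", indexed_not_submitted)
```

URLs in the first list deserve a quality check. URLs in the second list are either missing from the sitemap or should not be indexed at all.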

Final Takeaway: Less Is More

Sitemaps work best when they’re focused and intentional—not bloated. A clean sitemap improves crawl efficiency, indexing accuracy, and long-term SEO health.

Make sitemap reviews part of your regular SEO maintenance, especially after content changes or site updates. Keeping it clean is one of the simplest technical SEO wins available.

Frequently Asked Questions About Sitemap Content

Should category pages be included in a sitemap?

Only if they are intentionally optimized, provide real value, and are meant to rank. Most category pages should be excluded.

Do noindex pages belong in a sitemap?

No. Including noindexed URLs sends mixed signals and wastes crawl resources.

Should author pages be indexed or added to sitemaps?

In most cases, no. Author archives typically create thin or duplicate content.

How often should a sitemap be reviewed?

Any time major content is added, removed, or restructured—and at least periodically as part of routine SEO maintenance.

Is it bad to have too many URLs in a sitemap?

Yes. Bloated sitemaps dilute crawl focus and reduce indexing efficiency.
