Canonical URLs and How Agents Handle Duplicate Content
The same page can often be reached through multiple URLs. With and without www. Over HTTP and HTTPS. With tracking parameters, session IDs, or sort order in the query string. Each URL looks like a different page, but the content is identical. Canonical tags exist to solve this, and agents rely on them heavily.
The duplicate content problem
Consider a product page accessible at all of these URLs:
https://example.com/shoes/blue-trainers
https://www.example.com/shoes/blue-trainers
http://example.com/shoes/blue-trainers
https://example.com/shoes/blue-trainers?ref=homepage
https://example.com/shoes/blue-trainers?utm_source=newsletter
https://example.com/shoes/blue-trainers?sort=price
Without a canonical tag, an agent might visit all six URLs, parse all six pages, and end up with six copies of the same information. That wastes the agent's time and your server's resources. It also creates confusion: if the agent is building a product comparison, does it list the same shoe six times?
How canonical tags work
The canonical tag tells agents (and search engines) which URL is the authoritative version:
<link rel="canonical" href="https://example.com/shoes/blue-trainers" />
Every version of the page includes this tag, pointing to the same canonical URL. When an agent encounters a page with a canonical tag pointing elsewhere, it knows to treat the canonical URL as the source of truth and ignore the variant.
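As a rough illustration of the step described above, here is a minimal Python sketch of how an agent might read the canonical URL out of a fetched page. The class and function names (CanonicalParser, find_canonical) are hypothetical, and real agents may well do this differently. The sketch takes the first canonical tag it finds, which is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonical(html, page_url):
    """Return the absolute canonical URL declared in html, or None if absent."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return None
    # Assumption: take the first tag. Resolve relative hrefs against the
    # URL the page was actually fetched from.
    return urljoin(page_url, parser.canonicals[0])
```

Called with the HTML of https://example.com/shoes/blue-trainers?ref=homepage, the function would return the clean canonical URL if the page declares one, and None otherwise.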
How agents use canonical URLs
Agents use canonical tags in several ways:
Deduplication during crawling. When an agent discovers multiple URLs for the same content (through sitemaps, internal links, or external references), it checks canonical tags to consolidate them. Instead of processing five versions of the same page, it processes one.
Citation accuracy. When an agent references or links back to your content, it uses the canonical URL. This means users following those links arrive at a clean, parameter-free URL rather than one loaded with tracking codes.
Crawl efficiency. Agents have time and resource budgets. Every duplicate page they process is a page of unique content they did not get to. Canonical tags help agents spend their budget on content that matters.
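The deduplication step can be sketched in a few lines of Python. This assumes the agent has already recorded, for each discovered URL, the canonical URL that page declares (or None when the tag is missing). The function name group_by_canonical is hypothetical.

```python
def group_by_canonical(pages):
    """Group discovered URLs under the canonical URL each page declares.

    pages maps a discovered URL to its declared canonical URL,
    or to None when the page has no canonical tag.
    """
    groups = {}
    for url, canonical in pages.items():
        # Assumption: a page without a canonical tag counts as its own canonical
        key = canonical or url
        groups.setdefault(key, []).append(url)
    return groups


# The six variants of the product page, all declaring the same canonical
CANONICAL = "https://example.com/shoes/blue-trainers"
discovered = {
    "https://example.com/shoes/blue-trainers": CANONICAL,
    "https://www.example.com/shoes/blue-trainers": CANONICAL,
    "http://example.com/shoes/blue-trainers": CANONICAL,
    "https://example.com/shoes/blue-trainers?ref=homepage": CANONICAL,
    "https://example.com/shoes/blue-trainers?utm_source=newsletter": CANONICAL,
    "https://example.com/shoes/blue-trainers?sort=price": CANONICAL,
}

groups = group_by_canonical(discovered)
print(len(discovered), "URLs discovered,", len(groups), "page to process")
```

The keys of the resulting dictionary are the URLs worth processing and citing; the values record which variants were folded into each one.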
Common canonical tag mistakes
Self-referencing canonicals that are wrong. Every page should have a canonical tag, even if it points to itself. But make sure the URL in the canonical tag is the exact URL you want agents to use, including the correct protocol, domain, and path.
<!-- Wrong: HTTP instead of HTTPS -->
<link rel="canonical" href="http://example.com/shoes/blue-trainers" />
<!-- Right: matches the preferred URL exactly -->
<link rel="canonical" href="https://example.com/shoes/blue-trainers" />
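A check like this is easy to automate. The Python sketch below flags a canonical URL whose protocol or domain differs from the preferred one. The function name and the default values (https, example.com) are assumptions for illustration; substitute your own preferred origin.

```python
from urllib.parse import urlsplit


def canonical_problems(canonical_url, preferred_scheme="https", preferred_host="example.com"):
    """Return a list of mismatches between a canonical URL and the preferred origin."""
    problems = []
    parts = urlsplit(canonical_url)
    if parts.scheme != preferred_scheme:
        problems.append(f"scheme is {parts.scheme!r}, expected {preferred_scheme!r}")
    # hostname is lower-cased by urlsplit and excludes any port
    if parts.hostname != preferred_host:
        problems.append(f"host is {parts.hostname!r}, expected {preferred_host!r}")
    return problems


print(canonical_problems("http://example.com/shoes/blue-trainers"))
print(canonical_problems("https://example.com/shoes/blue-trainers"))
```

The first call reports the HTTP scheme as a problem; the second returns an empty list.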
Wrong canonicals on paginated content. If your product category page has 10 pages of results, each page should have its own self-referencing canonical URL. Page 2 should not canonicalise to page 1, or agents will treat pages 2 through 10 as duplicates of page 1 and skip them entirely.
<!-- Page 2 of results: canonical points to itself, not page 1 -->
<link rel="canonical" href="https://example.com/shoes?page=2" />
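One way to catch this mistake is to compare the page number in a URL with the page number in its canonical. The Python sketch below assumes pagination uses a page query parameter, as in the example above, and that a URL without the parameter is page 1. Both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def page_number(url, param="page"):
    """Return the page number encoded in url, treating a missing parameter as page 1."""
    values = parse_qs(urlsplit(url).query).get(param)
    if values and values[0].isdigit():
        return int(values[0])
    return 1


def collapses_pagination(page_url, canonical_url, param="page"):
    """True when a paginated page canonicalises to a different page of the series."""
    return page_number(page_url, param) != page_number(canonical_url, param)


# Page 2 canonicalising to the unparameterised first page is flagged
print(collapses_pagination("https://example.com/shoes?page=2", "https://example.com/shoes"))
```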
Canonicals pointing to non-existent pages. If you delete or move a page but forget to update the canonical tags on related pages, agents follow the canonical URL and get a 404. This is worse than no canonical at all, because the tag steers agents away from a working page and toward a dead one.
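Dead canonical targets can be found by requesting each canonical URL and recording the status. The Python sketch below uses only the standard library. The names http_status and broken_canonicals are hypothetical, and the status lookup is passed in as a parameter so it can be swapped for a cached or rate-limited version.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the final HTTP status for a HEAD request to url.

    urlopen follows redirects, so a canonical that redirects to a live
    page reports 200 here. Network failures raise urllib.error.URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def broken_canonicals(canonicals, get_status=http_status):
    """Return {page_url: (canonical_url, status)} for canonicals that do not return 200.

    canonicals maps each page URL to the canonical URL it declares.
    """
    status_cache = {}
    broken = {}
    for page_url, canonical_url in canonicals.items():
        # Many pages share one canonical, so request each target only once
        if canonical_url not in status_cache:
            status_cache[canonical_url] = get_status(canonical_url)
        status = status_cache[canonical_url]
        if status != 200:
            broken[page_url] = (canonical_url, status)
    return broken
```

Some servers do not answer HEAD requests correctly; in that case a GET-based get_status function may be a better fit.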
Multiple canonical tags on the same page. Some CMS configurations or template layering can produce two <link rel="canonical"> tags with different URLs. Agents cannot resolve this conflict predictably. Some may use the first tag and others may disregard both, so do not rely on either behaviour. Make sure each page emits exactly one canonical tag.
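Conflicting tags are simple to detect in rendered HTML. The following Python sketch collects every canonical href on a page and reports them when more than one distinct value appears. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Records the href of each <link rel="canonical"> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def conflicting_canonicals(html):
    """Return the distinct canonical hrefs if the page declares more than one, else []."""
    collector = CanonicalCollector()
    collector.feed(html)
    # dict.fromkeys removes repeats while keeping first-seen order
    distinct = list(dict.fromkeys(collector.hrefs))
    return distinct if len(distinct) > 1 else []
```

A page that repeats the same canonical URL twice is not flagged by this sketch, since the two tags agree; only differing URLs count as a conflict.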
Canonical URLs and hreflang
If your site has international versions with hreflang tags, canonical URLs and hreflang need to work together. The canonical tag on each language version should point to itself (not to the English version), and each hreflang tag should point to the canonical URL of that language variant.
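The two rules in this paragraph can be expressed as a small consistency check. The Python sketch below assumes you have already extracted, for each language variant, its URL, its canonical URL, and its hreflang map. The data layout, the function name, and the /en/ and /de/ URLs are all hypothetical.

```python
def hreflang_problems(variants):
    """Check canonical and hreflang consistency across language variants.

    variants maps a language code to a dict with keys:
      "url"       - the URL of that language version
      "canonical" - the canonical URL that page declares
      "hreflang"  - {language code: URL} as declared on that page
    """
    problems = []
    for lang, page in variants.items():
        # Rule 1: each language version canonicalises to itself
        if page["canonical"] != page["url"]:
            problems.append(f"{lang}: canonical points to {page['canonical']}, not to the page itself")
        # Rule 2: each hreflang target is the canonical URL of that variant
        for target_lang, target_url in page["hreflang"].items():
            target = variants.get(target_lang)
            if target is not None and target_url != target["canonical"]:
                problems.append(f"{lang}: hreflang {target_lang} points to {target_url}, not that variant's canonical")
    return problems


alternates = {
    "en": "https://example.com/en/shoes/blue-trainers",
    "de": "https://example.com/de/shoes/blue-trainers",
}
site = {
    lang: {"url": url, "canonical": url, "hreflang": dict(alternates)}
    for lang, url in alternates.items()
}
print(hreflang_problems(site))
```

With every variant self-referencing, the list of problems is empty. Pointing the German page's canonical at the English URL would produce several entries, one for each rule it breaks.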
Checking your canonicals
Check for canonical tag issues by comparing your sitemap against the pages it lists, and spot-check pages that are not in the sitemap. Every URL in your sitemap should have a canonical tag that matches the sitemap URL exactly. If the sitemap says https://example.com/shoes/blue-trainers but the page's canonical tag says https://www.example.com/shoes/blue-trainers, the two signals conflict and an agent cannot tell which URL you prefer.
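The sitemap comparison can be scripted. The Python sketch below parses a standard sitemap with the standard library and compares each listed URL with the canonical that page declares. It assumes the canonicals have already been collected into a dictionary; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the <loc> values listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def sitemap_conflicts(sitemap_xml, canonicals):
    """Return (sitemap_url, declared_canonical) pairs that do not match.

    canonicals maps a page URL to the canonical URL it declares.
    A page with no recorded canonical is reported with None.
    """
    conflicts = []
    for url in sitemap_urls(sitemap_xml):
        declared = canonicals.get(url)
        if declared != url:
            conflicts.append((url, declared))
    return conflicts
```

The comparison is an exact string match on purpose: a www prefix, a trailing slash, or a different protocol all count as conflicts.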
Canonical tags are a small addition to your page templates, but they save agents significant effort. Get them right and agents can focus on reading your unique content instead of deduplicating your URL structure.