How to structure your HTML so AI agents can parse it
AI agents read HTML differently from browsers. A browser is forgiving and will render almost anything. An agent has to extract meaning, identify interactive elements, and work out the page hierarchy, much as LLMs do when they interpret page layouts. Messy markup makes all of that harder.
Semantic elements are not optional
The most important thing you can do is use the right HTML elements for the job. Agents rely heavily on element semantics to understand what they are looking at.
<!-- Bad: everything is a div -->
<div class="nav">
<div class="nav-item">Home</div>
<div class="nav-item">Products</div>
</div>
<div class="main-content">
<div class="article-title">How to Buy Widgets</div>
<div class="article-body">...</div>
</div>
<!-- Good: semantic elements tell agents what is what -->
<nav aria-label="Main navigation">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/products">Products</a></li>
</ul>
</nav>
<main>
<article>
<h1>How to Buy Widgets</h1>
<p>...</p>
</article>
</main>
The second version tells an agent: here is navigation, here is the main content, here is an article with a heading. The first version is just boxes inside boxes.
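To make the "boxes inside boxes" point concrete, here is a minimal sketch of how a crude agent-side check might score the two versions. It uses Python's standard `html.parser`. The `SEMANTIC_TAGS` set and the `semantic_ratio` function are illustrative assumptions, not an established metric or anything a particular agent is known to use.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: an illustrative selection of meaningful elements, not an official list.
SEMANTIC_TAGS = {
    "header", "nav", "main", "article", "section", "aside", "footer",
    "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li", "a", "p",
}


class TagCounter(HTMLParser):
    """Count how often each element name opens in the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def semantic_ratio(html):
    """Return the share of elements whose tag name carries meaning (0.0 to 1.0)."""
    parser = TagCounter()
    parser.feed(html)
    total = sum(parser.counts.values())
    if total == 0:
        return 0.0
    semantic = sum(n for tag, n in parser.counts.items() if tag in SEMANTIC_TAGS)
    return semantic / total


bad = '<div class="nav"><div class="nav-item">Home</div></div>'
good = '<nav aria-label="Main navigation"><ul><li><a href="/">Home</a></li></ul></nav>'

print(semantic_ratio(bad))   # 0.0
print(semantic_ratio(good))  # 1.0
```

A real agent would do far more than count tags, but the gap between the two scores shows how little the div-only version gives it to work with.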
Heading hierarchy matters
Agents use heading levels to build a mental outline of the page. Skipping levels or using headings for styling breaks this.
<!-- Bad: heading levels are decorative -->
<h1>Our Shop</h1>
<h4>Latest Arrivals</h4> <!-- jumped from h1 to h4 -->
<h2>Footer Links</h2> <!-- h2 chosen for its size, not because this is a top-level section -->
<!-- Good: logical outline -->
<h1>Our Shop</h1>
<h2>Latest Arrivals</h2>
<h3>Electronics</h3>
<h3>Clothing</h3>
<h2>Popular Categories</h2>
An agent processing the good version can tell there are two main sections, with Electronics and Clothing as subsections under Latest Arrivals.
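The outline idea can be sketched in a few lines. The example below collects headings in document order and flags any heading that sits more than one level below the one before it. `HeadingCollector` and `skipped_levels` are hypothetical names for this illustration, and the "no jumps deeper than one level" rule is one reasonable reading of a logical outline, not a formal specification.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every heading, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def skipped_levels(html):
    """Return headings that sit more than one level below the heading before them."""
    collector = HeadingCollector()
    collector.feed(html)
    problems = []
    previous = 0  # treat the page root as level 0, so a page should open with h1
    for level, text in collector.headings:
        if level > previous + 1:
            problems.append((level, text))
        previous = level
    return problems


bad = "<h1>Our Shop</h1><h4>Latest Arrivals</h4><h2>Footer Links</h2>"
good = (
    "<h1>Our Shop</h1><h2>Latest Arrivals</h2><h3>Electronics</h3>"
    "<h3>Clothing</h3><h2>Popular Categories</h2>"
)

print(skipped_levels(bad))   # [(4, 'Latest Arrivals')]
print(skipped_levels(good))  # []
```

Note that this check only catches skipped levels. It cannot tell whether a heading was chosen for its font size, which still needs a human eye.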
Landmark elements create a map
HTML5 landmarks (<header>, <nav>, <main>, <aside>, <footer>) act as a map for agents. They answer the question "where am I on this page?" without requiring the agent to analyse visual layout.
A well-landmarked page looks like this:
<body>
<header>
<nav aria-label="Primary">...</nav>
</header>
<main>
<article>...</article>
<aside aria-label="Related products">...</aside>
</main>
<footer>
<nav aria-label="Footer">...</nav>
</footer>
</body>
An agent can jump straight to <main> to find the primary content, check <nav> for site structure, and ignore <aside> if it is not relevant to the task.
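As a rough illustration of that "jump straight to" behaviour, the sketch below groups a page's text by its nearest enclosing landmark, using `aria-label` to tell the two `<nav>` elements apart. The `LandmarkText` class, the `text_by_landmark` function, and the `tag[label]` key format are assumptions made for this example, and the placeholder text inside each landmark is invented.

```python
from html.parser import HTMLParser

LANDMARKS = {"header", "nav", "main", "aside", "footer"}


class LandmarkText(HTMLParser):
    """Group text by the nearest enclosing landmark element."""

    def __init__(self):
        super().__init__()
        self.text = {}
        self._stack = []  # (tag, display name) for each open landmark

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            label = dict(attrs).get("aria-label")
            name = f"{tag}[{label}]" if label else tag
            self._stack.append((tag, name))

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            self._stack.pop()

    def handle_data(self, data):
        chunk = data.strip()
        if chunk and self._stack:
            self.text.setdefault(self._stack[-1][1], []).append(chunk)


def text_by_landmark(html):
    parser = LandmarkText()
    parser.feed(html)
    return parser.text


page = """
<body>
<header><nav aria-label="Primary">Home Products</nav></header>
<main>
<article>How to Buy Widgets</article>
<aside aria-label="Related products">Widget B</aside>
</main>
<footer><nav aria-label="Footer">Contact</nav></footer>
</body>
"""

regions = text_by_landmark(page)
print(regions["main"])         # ['How to Buy Widgets']
print(regions["nav[Footer]"])  # ['Contact']
```

With the labels in place, the agent can address "the footer navigation" directly. Without them, both `<nav>` elements would collapse into a single `nav` entry.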
Tables need proper markup
If your site displays tabular data, proper table markup helps agents read it correctly. The difference is significant:
<!-- Bad: visual table using divs and CSS grid -->
<div class="table">
<div class="row">
<div class="cell">Name</div>
<div class="cell">Price</div>
</div>
<div class="row">
<div class="cell">Widget A</div>
<div class="cell">12.99</div>
</div>
</div>
<!-- Good: actual table with headers -->
<table>
<thead>
<tr>
<th scope="col">Name</th>
<th scope="col">Price</th>
</tr>
</thead>
<tbody>
<tr>
<td>Widget A</td>
<td>12.99</td>
</tr>
</tbody>
</table>
With the proper table, an agent knows that "12.99" is the price for "Widget A". With the div version, it has to guess based on position.
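Here is a minimal sketch of what "knowing" looks like in practice: pairing each cell with its column header to produce one record per row. `TableReader` and `read_table` are hypothetical names, and the code assumes a simple table with a single header row of `<th>` cells and no `colspan` or `rowspan`.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Turn a simple table into a list of {header: value} records."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._row = None   # cells of the row currently open
        self._cell = None  # "th" or "td" while inside a cell
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell == tag:
            value = "".join(self._text).strip()
            if tag == "th":
                self.headers.append(value)
            elif self._row is not None:
                self._row.append(value)
            self._cell = None
        elif tag == "tr":
            if self._row:  # the header row has no <td> cells, so it is skipped
                self.rows.append(dict(zip(self.headers, self._row)))
            self._row = None


def read_table(html):
    reader = TableReader()
    reader.feed(html)
    return reader.rows


table = """
<table>
<thead><tr><th scope="col">Name</th><th scope="col">Price</th></tr></thead>
<tbody><tr><td>Widget A</td><td>12.99</td></tr></tbody>
</table>
"""

print(read_table(table))  # [{'Name': 'Widget A', 'Price': '12.99'}]
```

The div version offers no equivalent of `<th>`, so the same code would have nothing to key the values on.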
Links need descriptive text
Agents follow links based on their text content. "Click here" and "Read more" tell an agent nothing about the destination.
<!-- Bad -->
<a href="/returns">Click here</a> for our returns policy.
<!-- Good -->
Read our <a href="/returns">returns policy</a> for details.
When agents build a list of available actions on a page, descriptive link text becomes the label. "Click here" repeated six times is useless.
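A small sketch of that action list, under the assumption that an agent labels each link with its visible text. The `GENERIC` set of vague phrases is an illustrative choice, and `LinkCollector` and `vague_links` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumption: a sample of phrases that say nothing about the destination.
GENERIC = {"click here", "read more", "learn more", "here", "more"}


class LinkCollector(HTMLParser):
    """Collect (href, visible text) for every <a> that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def vague_links(html):
    """Return the links whose text is a generic phrase."""
    collector = LinkCollector()
    collector.feed(html)
    return [(href, text) for href, text in collector.links if text.lower() in GENERIC]


bad = '<a href="/returns">Click here</a> for our returns policy.'
good = 'Read our <a href="/returns">returns policy</a> for details.'

print(vague_links(bad))   # [('/returns', 'Click here')]
print(vague_links(good))  # []
```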
Data attributes for agent hints
Sometimes semantic HTML is not enough. You can add data-* attributes to give agents extra context:
<div data-component="product-card" data-product-id="12345">
<h3 data-field="product-name">Wireless Mouse</h3>
<span data-field="price" data-currency="GBP">24.99</span>
<button data-action="add-to-cart">Add to basket</button>
</div>
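These attributes have no effect on how browsers render the page, but they give agents a clear, parseable structure to work with programmatically. The attribute names here are this site's own convention, not a standard, so only agents that know to look for them will benefit.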
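For illustration, this is roughly how a script could read the product card above. `DataFieldReader` and `read_product_card` are hypothetical names, and the code assumes the `data-field` and `data-action` conventions from the snippet, with no element nested inside another element of the same tag name.

```python
from html.parser import HTMLParser


class DataFieldReader(HTMLParser):
    """Read data-field values and data-action names from a fragment."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.actions = []
        self._field = None      # name of the data-field currently open
        self._field_tag = None  # tag that opened it, so we know when it closes
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        field = attrs.get("data-field")
        if field:
            self._field, self._field_tag, self._text = field, tag, []
        action = attrs.get("data-action")
        if action:
            self.actions.append(action)

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._field_tag:
            self.fields[self._field] = "".join(self._text).strip()
            self._field = None
            self._field_tag = None


def read_product_card(html):
    reader = DataFieldReader()
    reader.feed(html)
    return reader.fields, reader.actions


card = """
<div data-component="product-card" data-product-id="12345">
<h3 data-field="product-name">Wireless Mouse</h3>
<span data-field="price" data-currency="GBP">24.99</span>
<button data-action="add-to-cart">Add to basket</button>
</div>
"""

fields, actions = read_product_card(card)
print(fields)   # {'product-name': 'Wireless Mouse', 'price': '24.99'}
print(actions)  # ['add-to-cart']
```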
Quick wins to implement today
Start with these changes, roughly in priority order:
- Replace <div> wrappers with <header>, <nav>, <main>, <footer> where appropriate
- Fix your heading hierarchy so it forms a logical outline
- Add aria-label to distinguish multiple <nav> elements
- Replace "Click here" and "Read more" links with descriptive text
- Use proper <table> markup for tabular data
- Add <label> elements to every form input
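The last item on the list is easy to check mechanically. The sketch below flags form controls that have no `<label for="...">`, no wrapping `<label>`, and no `aria-label`. `LabelAudit` and `unlabelled_inputs` are hypothetical names, the skipped input types are an assumption, and the check ignores other labelling mechanisms such as `aria-labelledby`.

```python
from html.parser import HTMLParser

CONTROLS = {"input", "select", "textarea"}
# Assumption: these input types do not need a visible label.
SKIPPED_TYPES = {"hidden", "submit", "button", "reset"}


class LabelAudit(HTMLParser):
    """Record form controls and the ids that <label for="..."> points at."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()
        self.controls = []  # (id or None, already labelled by wrapper or aria-label)
        self._label_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if attrs.get("for"):
                self.label_targets.add(attrs["for"])
        elif tag in CONTROLS and attrs.get("type") not in SKIPPED_TYPES:
            labelled = self._label_depth > 0 or bool(attrs.get("aria-label"))
            self.controls.append((attrs.get("id"), labelled))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth > 0:
            self._label_depth -= 1


def unlabelled_inputs(html):
    """Return the ids (or None) of controls that no label refers to."""
    audit = LabelAudit()
    audit.feed(html)
    # Resolved after parsing, because a <label for> may follow its control.
    return [
        control_id
        for control_id, labelled in audit.controls
        if not labelled and control_id not in audit.label_targets
    ]


form = (
    '<form><label for="email">Email</label><input id="email" type="email">'
    '<input id="phone" type="tel">'
    '<label>Name <input id="name"></label></form>'
)

print(unlabelled_inputs(form))  # ['phone']
```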
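None of these changes needs to alter visual appearance, although elements such as <ul>, <table>, and headings carry default browser styles that your CSS may have to reset. They all improve agent parsing. Clean, semantic HTML is one of the cheapest and most effective improvements you can make for AI agent compatibility. You can run an audit to see how your site scores today.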