How to structure your HTML so AI agents can parse it

Agent Checker · 4 min read

AI agents parse HTML differently from browsers. A browser is forgiving: it will render almost anything. An agent has to extract meaning, identify interactive elements, and understand the page hierarchy from the markup itself, in much the same way that an LLM has to interpret a page's layout. Messy markup makes all of that harder.

Semantic elements are not optional

The most important thing you can do is use the right HTML elements for the job. Agents rely heavily on element semantics to understand what they are looking at.

<!-- Bad: everything is a div -->
<div class="nav">
  <div class="nav-item">Home</div>
  <div class="nav-item">Products</div>
</div>
<div class="main-content">
  <div class="article-title">How to Buy Widgets</div>
  <div class="article-body">...</div>
</div>

<!-- Good: semantic elements tell agents what is what -->
<nav aria-label="Main navigation">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/products">Products</a></li>
  </ul>
</nav>
<main>
  <article>
    <h1>How to Buy Widgets</h1>
    <p>...</p>
  </article>
</main>

The second version tells an agent: here is navigation, here is the main content, here is an article with a heading. The first version is just boxes inside boxes.

Heading hierarchy matters

Agents use heading levels to build a mental outline of the page. Skipping levels or using headings for styling breaks this.

<!-- Bad: heading levels are decorative -->
<h1>Our Shop</h1>
<h4>Latest Arrivals</h4>  <!-- jumped from h1 to h4 -->
<h2>Footer Links</h2>     <!-- h2 picked for its size, not its place in the outline -->

<!-- Good: logical outline -->
<h1>Our Shop</h1>
<h2>Latest Arrivals</h2>
  <h3>Electronics</h3>
  <h3>Clothing</h3>
<h2>Popular Categories</h2>

An agent processing the good version can tell there are two main sections, with Electronics and Clothing as subsections under Latest Arrivals.
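To make the outline idea concrete, here is a minimal sketch of how an agent-side script might read headings, using Python's standard-library html.parser. The class name HeadingOutline and the helper skipped_levels are hypothetical, and real agents will differ in how they build an outline; the point is only that heading levels are the raw material.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, text) for every h1-h6 element, in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, text)
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def skipped_levels(outline):
    """Return headings that sit more than one level below the heading before them."""
    skipped = []
    previous = 0
    for level, text in outline:
        if level > previous + 1:
            skipped.append((level, text))
        previous = level
    return skipped


# The "bad" example from above: h1 followed directly by h4.
bad = "<h1>Our Shop</h1><h4>Latest Arrivals</h4><h2>Footer Links</h2>"
parser = HeadingOutline()
parser.feed(bad)
print(skipped_levels(parser.outline))  # [(4, 'Latest Arrivals')]
```

Run against the good version, skipped_levels returns an empty list, and the collected levels (1, 2, 3, 3, 2) nest exactly as described above.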

Landmark elements create a map

HTML5 landmarks (<header>, <nav>, <main>, <aside>, <footer>) act as a map for agents. They answer the question "where am I on this page?" without requiring the agent to analyse visual layout.

A well-landmarked page looks like this:

<body>
  <header>
    <nav aria-label="Primary">...</nav>
  </header>
  <main>
    <article>...</article>
    <aside aria-label="Related products">...</aside>
  </main>
  <footer>
    <nav aria-label="Footer">...</nav>
  </footer>
</body>

An agent can jump straight to <main> to find the primary content, check <nav> for site structure, and ignore <aside> if it is not relevant to the task.
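As a rough illustration of that "map", the sketch below walks the page above and records each landmark, its aria-label, and the landmark that contains it. It assumes Python's standard-library html.parser; LandmarkMap is a hypothetical name, and an actual agent may rely on the accessibility tree or another representation instead.

```python
from html.parser import HTMLParser

LANDMARKS = {"header", "nav", "main", "aside", "footer"}


class LandmarkMap(HTMLParser):
    """Records (tag, aria-label, enclosing landmark) for each landmark element."""

    def __init__(self):
        super().__init__()
        self.landmarks = []  # list of (tag, label or None, parent tag or None)
        self._open = []      # stack of landmark tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            label = dict(attrs).get("aria-label")
            parent = self._open[-1] if self._open else None
            self.landmarks.append((tag, label, parent))
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in LANDMARKS and self._open and self._open[-1] == tag:
            self._open.pop()


page = """
<body>
  <header><nav aria-label="Primary">...</nav></header>
  <main>
    <article>...</article>
    <aside aria-label="Related products">...</aside>
  </main>
  <footer><nav aria-label="Footer">...</nav></footer>
</body>
"""

mapper = LandmarkMap()
mapper.feed(page)
for tag, label, parent in mapper.landmarks:
    print(tag, label, parent)
```

The two <nav> elements come out with different labels and different parents, which is what lets a script tell primary navigation from footer navigation without looking at the layout.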

Tables need proper markup

If your site displays tabular data, proper table markup helps agents read it correctly. The difference is significant:

<!-- Bad: visual table using divs and CSS grid -->
<div class="table">
  <div class="row">
    <div class="cell">Name</div>
    <div class="cell">Price</div>
  </div>
  <div class="row">
    <div class="cell">Widget A</div>
    <div class="cell">12.99</div>
  </div>
</div>

<!-- Good: actual table with headers -->
<table>
  <thead>
    <tr>
      <th scope="col">Name</th>
      <th scope="col">Price</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Widget A</td>
      <td>12.99</td>
    </tr>
  </tbody>
</table>

With the proper table, an agent knows that "12.99" is the price for "Widget A", because the cell sits in the column whose header is "Price". With the div version, it has to guess based on position.
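Here is a small sketch of why the header cells matter, again assuming Python's standard-library html.parser. TableRows is a hypothetical helper that handles only simple tables with no rowspan or colspan; it pairs each <td> with the <th> in the same column position.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Reads one simple table into a list of dicts keyed by column header."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._cell = None  # text fragments of the open <th>/<td>, else None
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._cell is not None:
            self.headers.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row:
            # Header rows contain no <td>, so only data rows are stored.
            self.rows.append(dict(zip(self.headers, self._row)))


table = """
<table>
  <thead><tr><th scope="col">Name</th><th scope="col">Price</th></tr></thead>
  <tbody><tr><td>Widget A</td><td>12.99</td></tr></tbody>
</table>
"""

reader = TableRows()
reader.feed(table)
print(reader.rows)  # [{'Name': 'Widget A', 'Price': '12.99'}]
```

Nothing equivalent is possible with the div version without first guessing which row of divs holds the headers.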

Links need descriptive text

Agents follow links based on their text content. "Click here" and "Read more" tell an agent nothing about the destination.

<!-- Bad -->
<a href="/returns">Click here</a> for our returns policy.

<!-- Good -->
Read our <a href="/returns">returns policy</a> for details.

When agents build a list of available actions on a page, descriptive link text becomes the label. "Click here" repeated six times is useless.
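The sketch below shows one way such an action list could be built, and how generic labels can be flagged. It assumes Python's standard-library html.parser; LinkLabels, generic_links, and the GENERIC word list are illustrative choices, not an established standard.

```python
from html.parser import HTMLParser

# Illustrative list of labels that say nothing about the destination.
GENERIC = {"click here", "read more", "here", "more"}


class LinkLabels(HTMLParser):
    """Collects (link text, href) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = None  # text fragments of the open <a>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._text = None


def generic_links(links):
    """Return the links whose text is one of the generic labels."""
    return [(text, href) for text, href in links if text.lower() in GENERIC]


bad = LinkLabels()
bad.feed('<a href="/returns">Click here</a> for our returns policy.')
print(generic_links(bad.links))  # [('Click here', '/returns')]

good = LinkLabels()
good.feed('Read our <a href="/returns">returns policy</a> for details.')
print(good.links)  # [('returns policy', '/returns')]
```

Note that the surrounding sentence never reaches the label: only the text inside the <a> element does, which is why the wording has to live there.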

Data attributes for agent hints

Sometimes semantic HTML is not enough. You can add data-* attributes to give agents extra context:

<div data-component="product-card" data-product-id="12345">
  <h3 data-field="product-name">Wireless Mouse</h3>
  <span data-field="price" data-currency="GBP">24.99</span>
  <button data-action="add-to-cart">Add to basket</button>
</div>

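These attributes do not change how browsers render the page (unless your own CSS or scripts target them), but they give agents a clear, parseable structure they can work with programmatically. The attribute names are your own convention rather than a standard, so keep them consistent across the site.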
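Here is a minimal sketch of how a script could turn the product card above into structured data, assuming Python's standard-library html.parser. ProductCard is a hypothetical class, and the attribute names it looks for (data-field, data-action, data-product-id, data-currency) are simply the ones used in the example; an agent would need to know or discover your naming scheme.

```python
from html.parser import HTMLParser


class ProductCard(HTMLParser):
    """Collects data-field text and data-action values from one product card."""

    def __init__(self):
        super().__init__()
        self.product = {}   # field name -> text or attribute value
        self.actions = []   # values of data-action, in document order
        self._field = None  # (tag, field name) while a data-field element is open
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-product-id" in attrs:
            self.product["id"] = attrs["data-product-id"]
        if "data-currency" in attrs:
            self.product["currency"] = attrs["data-currency"]
        if "data-action" in attrs:
            self.actions.append(attrs["data-action"])
        if "data-field" in attrs:
            self._field = (tag, attrs["data-field"])
            self._text = []

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes a data-field element does not contain another element of the same tag.
        if self._field is not None and tag == self._field[0]:
            self.product[self._field[1]] = "".join(self._text).strip()
            self._field = None


card = """
<div data-component="product-card" data-product-id="12345">
  <h3 data-field="product-name">Wireless Mouse</h3>
  <span data-field="price" data-currency="GBP">24.99</span>
  <button data-action="add-to-cart">Add to basket</button>
</div>
"""

reader = ProductCard()
reader.feed(card)
print(reader.product)
print(reader.actions)
```

The script never inspects class names or visible layout; everything it needs is stated in the attributes.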

Quick wins to implement today

Start with these changes, roughly in priority order:

  1. Replace <div> wrappers with <header>, <nav>, <main>, <footer> where appropriate
  2. Fix your heading hierarchy so it forms a logical outline
  3. Add aria-label to distinguish multiple <nav> elements
  4. Replace "Click here" and "Read more" links with descriptive text
  5. Use proper <table> markup for tabular data
  6. Add <label> elements to every form input
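For the last item, a quick self-check can be sketched as follows, assuming Python's standard-library html.parser. UnlabelledInputs is a hypothetical helper and the form markup is made up for illustration. It treats a control as labelled if it sits inside a <label> or if some <label for="..."> points at its id, and it skips hidden, submit, and button inputs.

```python
from html.parser import HTMLParser


class UnlabelledInputs(HTMLParser):
    """Finds form controls with neither a wrapping <label> nor a matching <label for>."""

    CONTROLS = {"input", "select", "textarea"}
    SKIPPED_TYPES = {"hidden", "submit", "button"}

    def __init__(self):
        super().__init__()
        self.label_targets = set()  # values of <label for="...">
        self.controls = []          # (id or None, inside a <label>?)
        self._label_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if attrs.get("for"):
                self.label_targets.add(attrs["for"])
        elif tag in self.CONTROLS and attrs.get("type") not in self.SKIPPED_TYPES:
            self.controls.append((attrs.get("id"), self._label_depth > 0))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth:
            self._label_depth -= 1

    def unlabelled(self):
        """Return the ids (None if missing) of controls that have no label."""
        return [
            control_id
            for control_id, wrapped in self.controls
            if not wrapped and control_id not in self.label_targets
        ]


form = """
<form>
  <label for="email">Email</label>
  <input id="email" type="email">
  <label>Name <input type="text" name="name"></label>
  <input id="phone" type="tel">
  <input type="hidden" name="token">
</form>
"""

checker = UnlabelledInputs()
checker.feed(form)
print(checker.unlabelled())  # ['phone']
```

The check runs after the whole document has been read, so a <label for="..."> that appears after its input still counts.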

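None of these changes needs to alter your visual design, although swapping elements can bring in browser default styles (list bullets, heading margins, table borders) that a few lines of CSS will reset. They all improve agent parsing. Clean, semantic HTML is among the cheapest and most effective improvements you can make for AI agent compatibility. You can run an audit to see how your site scores today.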