What the Next Generation of Web Agents Will Look Like
Current web agents are capable but limited. They can fill in forms, click through multi-step flows, and extract information from pages. They struggle with complex tasks, fail on unusual interfaces, and cannot handle many real-world complications like payment processing or identity verification.
The next generation is being built right now. Based on what is happening in research labs, open-source projects, and production systems, here is where things are heading.
Faster Reasoning with Smaller Models
Today's web agents typically use large models like GPT-4 or Claude for every decision. That means every action (each click, each form field) requires a round trip to a cloud API, which takes one to five seconds.
The shift toward smaller, specialised models will change this. Instead of a single large model handling everything, agents will use different models for different sub-tasks. A fast, small model handles routine actions like clicking obvious buttons or scrolling. A larger model gets called only for complex decisions like choosing between ambiguous options or recovering from errors.
This is already happening. Some agent frameworks use a lightweight model for page parsing and element identification, then escalate to a more capable model for planning and decision-making. The result is significantly faster task completion at lower cost.
Locally running models are part of this picture too. A small model running on the user's machine can handle basic page analysis with no network round trip and no per-call API cost, calling a cloud model only when local processing is not sufficient.
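The routing idea above can be sketched in a few lines. This is a minimal illustration, not how any particular framework does it: the Action type, the ROUTINE_KINDS set, and the "exactly one candidate element" rule are all assumptions made for the example, and small_model and large_model stand in for whatever model clients an agent actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str        # hypothetical labels, e.g. "click", "scroll", "choose", "recover"
    candidates: int  # how many plausible target elements the page offers


# Assumption for this sketch: these action kinds count as routine.
ROUTINE_KINDS = {"click", "scroll", "type"}


def pick_tier(action: Action) -> str:
    """Return "small" for routine, unambiguous actions, otherwise "large"."""
    if action.kind in ROUTINE_KINDS and action.candidates == 1:
        return "small"
    return "large"


def route(action: Action,
          small_model: Callable[[Action], str],
          large_model: Callable[[Action], str]) -> str:
    """Send the action to the cheaper model unless it needs escalation."""
    model = small_model if pick_tier(action) == "small" else large_model
    return model(action)
```

In practice the escalation rule would be richer (confidence scores, recent failures), but the shape stays the same: a cheap check decides which model pays for the decision.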
Better Planning and Decomposition
Current agents are mostly reactive. They look at the current page and decide what to do next. They do not plan ahead effectively. If booking a flight requires six steps, the agent discovers each step as it encounters it rather than planning the full sequence upfront.
Newer architectures separate planning from execution. A planning module analyses the task, breaks it into sub-goals, and creates an execution plan before the agent touches the browser. The execution module follows the plan, and the planning module adjusts if something unexpected happens.
This matters because planning reduces errors. An agent that knows it needs to select dates before searching will not accidentally trigger a search without dates and then have to recover. An agent that plans to compare three sites before deciding will not commit to the first result it sees.
Research on agent planning, including ReAct, Reflexion, and tree-of-thought approaches, is feeding directly into production agent frameworks. The gap between research capability and production capability is shrinking fast.
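To make the planning and execution split concrete, here is a small sketch under stated assumptions. The Step type, the requires field, and the step names are hypothetical; a real planner would produce the steps from the task description rather than take them as input. The point is only that ordering is settled before the executor touches the browser.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    name: str
    requires: tuple = ()  # names of steps that must finish first


def make_plan(steps: list) -> list:
    """Order steps so each one comes after everything it requires."""
    by_name = {s.name: s for s in steps}
    ordered, done = [], set()

    def visit(step: Step, stack: tuple = ()) -> None:
        if step.name in done:
            return
        if step.name in stack:
            raise ValueError(f"circular dependency at {step.name}")
        for dep in step.requires:
            visit(by_name[dep], stack + (step.name,))
        done.add(step.name)
        ordered.append(step)

    for s in steps:
        visit(s)
    return ordered


def execute(plan: list, run_step: Callable[[str], None]) -> list:
    """Run the plan in order; run_step stands in for the browser-facing executor."""
    completed = []
    for step in plan:
        run_step(step.name)
        completed.append(step.name)
    return completed
```

With a plan like this, a search step that requires date selection can never run first, which is exactly the class of error the planning approach is meant to remove.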
Persistent Personalisation
Current agents start fresh with each task, aside from basic session memory. Future agents will build persistent models of user preferences and behaviour.
An agent that has booked flights for you ten times will know you prefer window seats, that you always pick the earliest morning flight, and that you are willing to pay up to £50 more for a direct route. It will not ask every time. It will just book what you want, and ask only when something is genuinely ambiguous.
This personalisation extends to site-specific knowledge. The agent will remember that website A has a confusing checkout flow with a hidden promo code field, that website B tends to show higher prices if you visit from a cached session, and that website C gives better results if you search from the mobile version.
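A persistent preference store does not need to be elaborate. The sketch below keeps user preferences and per-site notes in a JSON file so they survive between tasks. The PreferenceStore class, its method names, and the file layout are assumptions for illustration; a production agent would also need consent, encryption, and a way to forget.

```python
import json
from pathlib import Path


class PreferenceStore:
    """Hypothetical on-disk memory for user preferences and site-specific notes."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {"prefs": {}, "site_notes": {}}

    def remember(self, key, value):
        """Record a learned preference, e.g. seat -> window."""
        self.data["prefs"][key] = value

    def get(self, key, default=None):
        return self.data["prefs"].get(key, default)

    def note_site(self, site, note):
        """Attach a free-text observation to a site."""
        self.data["site_notes"].setdefault(site, []).append(note)

    def site_notes(self, site):
        return self.data["site_notes"].get(site, [])

    def save(self):
        self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")
```

An agent would load the store at the start of a task, consult it before asking the user anything, and save it once the task completes.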
Multi-Agent Coordination
Single-agent architectures will give way to multi-agent systems for complex tasks. Instead of one agent doing everything, specialised agents will coordinate.
A travel planning task might involve a flight agent, a hotel agent, an itinerary agent, and a budget agent. Each specialises in its domain. The flight agent knows how to search every major booking site efficiently. The hotel agent understands location data and review analysis. The budget agent tracks spending constraints across all bookings and flags when a plan is getting too expensive.
The coordination layer is the hard part: deciding which agent to call when, how to pass context between agents, and how to resolve conflicts when agents disagree. Protocols such as MCP (Model Context Protocol) are laying the infrastructure for this, but the orchestration logic is still an active area of development.
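As a rough sketch of what a coordination layer does, the example below calls each specialised agent in turn, passes a shared context between them, and applies a budget check at the end. Everything here is hypothetical: the Proposal type, the coordinate function, and the idea that each agent is a plain callable. It does not use MCP or any real orchestration library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    agent: str   # which specialised agent produced this
    item: str    # what it wants to book
    cost: float  # price in the user's currency


def coordinate(proposers: list, budget: float) -> dict:
    """Collect one proposal per agent, sharing context, then check the budget.

    Each proposer is a callable that receives the shared context dict,
    so later agents can see what earlier agents already proposed.
    """
    context = {"budget": budget, "accepted": []}
    for propose in proposers:
        proposal = propose(context)
        context["accepted"].append(proposal)
    total = sum(p.cost for p in context["accepted"])
    return {
        "proposals": context["accepted"],
        "total": total,
        "over_budget": total > budget,  # the budget agent's role, reduced to a flag
    }
```

A sequential loop is the simplest possible policy. Deciding call order dynamically and resolving disagreements between agents are the parts this sketch leaves out.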
Predictive Pre-fetching
Current agents are sequential: load page, observe, decide, act, repeat. Future agents will pre-fetch likely next steps in parallel.
If an agent is searching for flights and knows it will need to check prices on three different sites, it can start loading all three sites simultaneously rather than visiting them one at a time. If it is filling out a multi-page form, it can pre-load the next page while the user reviews the current one.
This is technically possible with current browser automation tools. Playwright's multi-context support already allows parallel browsing. What is new is the intelligence layer that predicts which pages will be needed and starts loading them proactively.
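The parallel part is straightforward to sketch with Python's asyncio. In the example below, load_page is a hypothetical coroutine standing in for whatever actually loads a page (for instance, a Playwright page in its own browser context); the prefetch function and its behaviour of silently skipping failed loads are assumptions made for the illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def prefetch(urls: list,
                   load_page: Callable[[str], Awaitable[str]]) -> dict:
    """Load every URL concurrently and return the ones that succeeded.

    load_page is a placeholder for the real page loader. Failed loads
    are dropped so one slow or broken site does not block the others.
    """
    results = await asyncio.gather(
        *(load_page(url) for url in urls),
        return_exceptions=True,  # collect errors instead of cancelling the batch
    )
    return {
        url: result
        for url, result in zip(urls, results)
        if not isinstance(result, Exception)
    }
```

The prediction step, deciding which URLs are worth loading ahead of time, sits in front of this function and is where the new intelligence layer comes in.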
Better Error Recovery
Current agents often get stuck in error loops. They click a button, get an unexpected result, try the same thing again, and repeat. Or they encounter an error state and do not know how to get back to a productive path.
Next-generation agents will handle errors more like experienced humans. They will recognise common error patterns (session timeout, form validation failure, out-of-stock product), have pre-built recovery strategies for each, and know when to abandon a failing approach and try an alternative.
This is partly a model capability improvement and partly a framework improvement. Agent frameworks are building error taxonomies and recovery playbooks that help the model make better decisions when things go wrong.
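An error taxonomy with a recovery playbook can be as simple as a lookup table plus an attempt limit. The sketch below is illustrative only: the ErrorKind values mirror the examples in the text, while the matching phrases, the strategy names, and the max_attempts default are assumptions rather than anything a specific framework defines.

```python
from enum import Enum


class ErrorKind(Enum):
    SESSION_TIMEOUT = "session_timeout"
    VALIDATION = "validation"
    OUT_OF_STOCK = "out_of_stock"
    UNKNOWN = "unknown"


# Hypothetical phrases used to recognise each error pattern.
PATTERNS = [
    ("session expired", ErrorKind.SESSION_TIMEOUT),
    ("timed out", ErrorKind.SESSION_TIMEOUT),
    ("required field", ErrorKind.VALIDATION),
    ("invalid", ErrorKind.VALIDATION),
    ("out of stock", ErrorKind.OUT_OF_STOCK),
    ("no longer available", ErrorKind.OUT_OF_STOCK),
]

# Hypothetical recovery strategy for each kind of error.
PLAYBOOK = {
    ErrorKind.SESSION_TIMEOUT: "log_in_again_and_resume",
    ErrorKind.VALIDATION: "fix_flagged_fields_and_resubmit",
    ErrorKind.OUT_OF_STOCK: "try_alternative_item_or_site",
    ErrorKind.UNKNOWN: "escalate_to_planner",
}


def classify(message: str) -> ErrorKind:
    """Map an on-page error message to a known error kind."""
    text = message.lower()
    for phrase, kind in PATTERNS:
        if phrase in text:
            return kind
    return ErrorKind.UNKNOWN


def next_move(message: str, attempts: int, max_attempts: int = 2) -> str:
    """Pick a recovery strategy, or give up on this path after too many tries."""
    if attempts >= max_attempts:
        return "abandon_and_try_alternative"
    return PLAYBOOK[classify(message)]
```

The attempt limit is what breaks the click, fail, click again loop: after two failed recoveries the agent stops retrying and switches approach.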
What This Means for Websites
The websites that will work best with next-generation agents share several qualities.
Progressive disclosure works well with planning agents. A site that shows a clear multi-step process (Step 1: Choose departure, Step 2: Choose dates, Step 3: Select flight) is easier for an agent to plan around than a site that dumps everything on one page.
Consistent API behaviour supports pre-fetching and caching. If the same query returns the same results within a reasonable time window, agents can cache responses and reduce redundant requests.
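On the agent side, this kind of caching is typically a small time-to-live cache. The sketch below is one minimal way to write it; the TTLCache class, the get_or_fetch method, and the injectable clock are assumptions for the example, and the right time window depends entirely on how quickly the site's data changes.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache query results for a fixed time window (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the cache can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call fetch() and store the result."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

This only helps when the site is consistent: if the same query returns different results seconds apart, a cached answer is simply wrong, which is why consistent API behaviour matters.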
Clear error states feed directly into error recovery. An error message that says "This flight is no longer available at this price" is actionable. A generic "Error occurred" is not.
Structured data becomes even more important as agents get smarter. A well-structured page that an agent can parse in two seconds will generally beat a poorly structured one that needs thirty seconds of visual analysis by a multi-modal model.
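To show why structured pages are cheap to parse, the sketch below pulls embedded JSON-LD out of a page using only Python's standard library. Treating "structured data" as JSON-LD in script tags is an assumption for this example (it is one common format, not the only one), and the JsonLdExtractor class and extract_json_ld function are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD object embedded in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

When a page carries its product name and price this way, the agent gets the facts from one pass over the markup, with no screenshot and no visual model involved.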
The web is not going to stop being a visual medium designed for humans. But it is rapidly becoming a dual-purpose medium that serves both human and machine audiences. Sites that recognise this and run an audit to understand their readiness will have a real advantage as agent capabilities continue to improve.