Agentic AI goes mainstream: why 4 in 10 enterprise apps are expected to ship with agents
In 2025 “AI agents” was a slide. In 2026 it is a budget line. Gartner predicts 40% of enterprise applications will feature integrated task-specific agents by the end of 2026 — up from under 5% in 2025. Here is what changed, what is actually working, and where teams are still getting burned.
What changed in twelve months
Three things lined up at the same time. First, frontier models got reliably good at multi-step planning. Second, the tooling around tool use, structured output, and long context matured. Third, the economics flipped: a competent agent run that cost $4 a year ago now costs cents, because models got cheaper, caching landed everywhere, and the smaller open-weights models are good enough for the routine half of any agent loop.
The result is that the question stopped being “can an agent do this?” and became “is the agent cheaper, faster, or more accurate than what we do today?” That is a procurement question, not a research one.
What “agentic” actually means now
The word has stretched. Three different things sit underneath it, and conflating them is how projects fail.
- Workflows with LLM steps. A pipeline where one or more steps call a model. The control flow is hand-coded. This is what most “agents in production” actually are. They work because the human picked the path.
- Tool-using agents. A model picks tools and arguments inside a bounded loop. Good for structured tasks like “fill in this form from this PDF,” “triage this ticket,” or “run a deterministic sequence of API calls with judgment in between.”
- Open-ended agents. A model chooses what to do next with broad freedom. Browser agents, autonomous coding agents, and computer-use agents live here. These are the demos. They are also the ones that fail over long horizons.
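To make the middle category concrete, here is a minimal sketch of a tool-using agent inside a bounded loop. Everything in it is an assumption for illustration: the tool names, the `Step` shape, and `stub_policy`, which stands in for a real model call. The point is only the shape: a fixed tool list and a hard step budget.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: the agent may only call what is listed here.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_ticket": lambda ticket_id: f"ticket {ticket_id}: login fails after password reset",
    "classify": lambda text: "auth" if "login" in text else "other",
}

@dataclass
class Step:
    tool: str      # name of the tool to call, or "finish"
    argument: str  # single string argument, kept simple for the sketch

def stub_policy(history: list[str]) -> Step:
    """Stand-in for a model call: picks the next step from the results so far."""
    if not history:
        return Step("lookup_ticket", "T-1")
    if len(history) == 1:
        return Step("classify", history[-1])
    return Step("finish", history[-1])

def run_agent(policy: Callable[[list[str]], Step], max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):            # hard bound on the loop
        step = policy(history)
        if step.tool == "finish":
            return step.argument
        if step.tool not in TOOLS:        # reject anything outside the tool list
            raise ValueError(f"unknown tool: {step.tool}")
        history.append(TOOLS[step.tool](step.argument))
    raise RuntimeError("step budget exhausted")

print(run_agent(stub_policy))  # prints "auth"
```

An open-ended agent is roughly this loop with the bounds loosened, which is one way to see why it drifts.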
The shift from one big agent to a team of small ones
Through 2025, the dominant pattern was a single all-purpose agent with a long tool list. That pattern is being replaced. The winning architecture in 2026 is a small orchestrator that routes work to specialized sub-agents (one for retrieval, one for code, one for browsing, one for verification), each with a tight tool set and a narrow prompt. Some practitioners are calling it the microservices revolution for agents.
The reason is not theoretical. Single mega-agents fail because every additional tool widens the action space the model has to reason over. A specialist with three tools is sharper than a generalist with thirty.
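A rough sketch of the orchestrator pattern, under loud assumptions: the specialist names and tool lists are invented, `route` is a keyword matcher standing in for what would normally be a small model call, and each `handle` is a placeholder for a narrow-prompt sub-agent.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SubAgent:
    name: str
    tools: tuple[str, ...]           # tight, per-specialist tool set
    handle: Callable[[str], str]     # placeholder for a narrow-prompt model call

# Hypothetical specialists; none has more than three tools.
SPECIALISTS = {
    "retrieval": SubAgent("retrieval", ("search_wiki",), lambda t: f"[retrieval] {t}"),
    "code": SubAgent("code", ("run_tests", "apply_patch"), lambda t: f"[code] {t}"),
    "verification": SubAgent("verification", ("diff_check",), lambda t: f"[verification] {t}"),
}

def route(task: str) -> str:
    """Hypothetical keyword router; a real one would likely be a small model."""
    lowered = task.lower()
    if "bug" in lowered or "test" in lowered:
        return "code"
    if "verify" in lowered or "check" in lowered:
        return "verification"
    return "retrieval"

def orchestrate(task: str) -> str:
    agent = SPECIALISTS[route(task)]   # the orchestrator only picks, it does no work
    return agent.handle(task)

print(orchestrate("Fix the bug in login"))  # prints "[code] Fix the bug in login"
```

The orchestrator never sees the full tool list. Each specialist reasons over its own two or three tools, which is the narrowing the paragraph above describes.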
Where teams are getting real ROI
- Customer support triage. Routing, drafting, and summarizing, with a human approving sends. Teams report deflecting 30–60% of tier-one volume.
- Internal knowledge agents. Search, summarize, and answer over the company wiki, code, and tickets. The boring win, but the most consistent.
- Coding agents on bounded tasks. Bug fixes with tests, dependency upgrades, lint sweeps, and routine PR review. Less impressive than “autonomous engineer” but more reliable.
- Sales-ops automation. Lead enrichment, meeting notes, CRM hygiene. Easy to scope, easy to measure.
Where they are still failing
The visible failures cluster in three places: long-horizon open-ended agents that drift after twenty steps, multi-agent systems where two agents argue with each other and burn tokens, and anything that touches money, permissions, or production data without a human in the loop.
The teams that ship cleanly tend to do four things: bound the tool list, force structured output, write evals before they write the prompt, and put a human between the agent and any irreversible action.
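Those four habits fit in a few dozen lines. The sketch below is illustrative only: the action names, the JSON shape, and the eval cases are all made up, and no real model is called. It shows a bounded action list, strict parsing of structured output, a human gate on the irreversible action, and eval cases that exist before any prompt does.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_ACTIONS = {"draft_reply", "issue_refund"}   # bounded action list (hypothetical)
IRREVERSIBLE = {"issue_refund"}                      # needs a human sign-off

@dataclass(frozen=True)
class Action:
    name: str
    amount_cents: int

def parse_action(raw: str) -> Action:
    """Force structured output: reject anything that is not the expected JSON shape."""
    data = json.loads(raw)                           # bad JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"name", "amount_cents"}:
        raise ValueError("unexpected fields")
    if data["name"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {data['name']}")
    if type(data["amount_cents"]) is not int or data["amount_cents"] < 0:
        raise ValueError("amount_cents must be a non-negative integer")
    return Action(data["name"], data["amount_cents"])

def execute(action: Action, approved_by: Optional[str] = None) -> str:
    # A human sits between the agent and any irreversible action.
    if action.name in IRREVERSIBLE and approved_by is None:
        return "pending_approval"
    return f"executed:{action.name}"

# Eval cases written first: (raw model output, expected outcome).
EVAL_CASES = [
    ('{"name": "draft_reply", "amount_cents": 0}', "executed:draft_reply"),
    ('{"name": "issue_refund", "amount_cents": 500}', "pending_approval"),
    ('{"name": "delete_account", "amount_cents": 0}', "rejected"),
    ("not json", "rejected"),
]

def run_evals() -> int:
    """Return how many eval cases produce the expected outcome."""
    passed = 0
    for raw, expected in EVAL_CASES:
        try:
            outcome = execute(parse_action(raw))
        except ValueError:
            outcome = "rejected"
        passed += outcome == expected
    return passed

print(run_evals(), "of", len(EVAL_CASES), "eval cases pass")
```

In a real system `parse_action` would sit directly behind the model call, so a malformed or out-of-bounds response fails loudly instead of reaching a tool.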
Cost is now an architectural concern
A year ago, agent cost was a footnote. In 2026 it is a first-class design constraint. The pattern that scales: route easy turns to a small open-weights model, escalate to a frontier model only when confidence drops, and cache aggressively at the prompt and embedding layers. Teams that retrofit cost controls after launch tend to find the agent is 5–10x more expensive than the workflow it replaced. Teams that design for cost from day one usually land at a fraction of the human-process cost.
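The routing pattern can be sketched as follows. The prices, the confidence threshold, and both model functions are stand-ins invented for the example; in particular, the stub decides "confidence" from prompt length, which no real model does. Only `functools.lru_cache` is a real library call, used here as a simple prompt-level cache.

```python
from functools import lru_cache

# All prices and the threshold are made-up illustration values.
SMALL_COST_CENTS = 1
FRONTIER_COST_CENTS = 20
CONFIDENCE_THRESHOLD = 0.8

calls = {"small": 0, "frontier": 0}   # call counters, used to total the spend

def small_model(prompt: str) -> tuple[str, float]:
    """Stub for a small open-weights model: returns (answer, confidence)."""
    calls["small"] += 1
    confident = len(prompt.split()) <= 5          # pretend short prompts are easy
    return f"small:{prompt}", (0.95 if confident else 0.4)

def frontier_model(prompt: str) -> str:
    """Stub for a frontier model call."""
    calls["frontier"] += 1
    return f"frontier:{prompt}"

@lru_cache(maxsize=1024)                          # prompt-level cache
def answer(prompt: str) -> str:
    text, confidence = small_model(prompt)        # always try the cheap model first
    if confidence >= CONFIDENCE_THRESHOLD:
        return text
    return frontier_model(prompt)                 # escalate only on low confidence

def total_cost_cents() -> int:
    return calls["small"] * SMALL_COST_CENTS + calls["frontier"] * FRONTIER_COST_CENTS
```

With these toy numbers, asking "reset my password" twice and one long question once costs 22 cents: two small-model calls, one escalation, and one cache hit that costs nothing. Note that an escalated turn pays for both models, so the threshold is itself a cost parameter worth tuning.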
The new security surface
Every agent with credentials is now an identity that needs the same controls as a human employee — least privilege, audit logs, scoped tokens, kill switches. “Double agent” risk, where a compromised agent acts as an insider, is the boardroom worry of the year. Practical hygiene: separate identities per agent, no shared service accounts, every tool call logged with the prompt and decision that produced it.
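As a rough illustration of that hygiene, the sketch below gives each agent its own identity with scoped permissions and a kill switch, and logs every tool call attempt with the prompt and decision behind it. The class names, field names, and scopes are hypothetical, and a real deployment would back this with an identity provider and durable log storage rather than an in-memory list.

```python
from dataclasses import dataclass, field

@dataclass
class AgentIdentity:
    agent_id: str                  # one identity per agent, never shared
    scopes: frozenset[str]         # least privilege: only the tools this agent needs
    enabled: bool = True           # kill switch

@dataclass
class AuditedToolbox:
    audit_log: list[dict] = field(default_factory=list)

    def call(self, identity: AgentIdentity, tool: str, prompt: str, decision: str) -> bool:
        """Return True if the call is permitted; log the attempt either way."""
        allowed = identity.enabled and tool in identity.scopes
        self.audit_log.append({
            "agent_id": identity.agent_id,
            "tool": tool,
            "prompt": prompt,
            "decision": decision,
            "allowed": allowed,
        })
        return allowed

box = AuditedToolbox()
support = AgentIdentity("support-triage", frozenset({"read_ticket"}))

print(box.call(support, "read_ticket", "summarize T-1", "need ticket text"))   # prints True
print(box.call(support, "issue_refund", "summarize T-1", "customer is upset")) # prints False
support.enabled = False                                                         # flip the kill switch
print(box.call(support, "read_ticket", "summarize T-2", "need ticket text"))   # prints False
```

Denied calls are logged too. For the "double agent" scenario, the attempts an agent was not allowed to make are often the most useful part of the record.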