Weekly Briefing

Weekly AI roundup — May 4, 2026

The week in AI in five minutes. Frontier models converged, the open-weights field caught up, the EU deadline got closer, and there is one thing worth trying before next Monday.

1. The frontier converged

GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro now sit within four points of each other on every composite benchmark. No single model is obviously best anymore. The practical answer for most teams is a router across two providers, not a brand bet. Full breakdown →

2. Chinese open-weights kept climbing

DeepSeek V4, GLM-5.1, Kimi K2.6, and MiniMax M2.7 now hit the agentic-engineering ceiling at less than a third of frontier inference cost. Self-hosting is no longer a quality downgrade for a large share of workloads. Why it matters →

3. The EU AI Act clock is ticking

August 2, 2026 is the high-risk enforcement date. Penalties for breaching high-risk obligations reach €15M or 3% of global annual turnover, whichever is higher. The Act applies to US companies whose AI affects EU users: extraterritorial scope is real. 90-day checklist →

4. Agents in production are scaling fast

The shift from chat to action is no longer hypothetical. Gartner predicts 40% of enterprise applications will feature task-specific agents by the end of 2026, up from less than 5% in 2025. The teams shipping cleanly are the ones building small specialist agents behind a router, not big all-purpose ones. What is working →

5. The coding-agent landscape stabilized

Cursor for flow, Claude Code for hard tasks, Windsurf for value. Most senior engineers now use two tools, not one. Honest comparison →

The one thing to try this week

Set up a model router. Even a 30-line script that picks between two providers based on task type is enough to start. Send 5–10% of your traffic to a second model for one week and look at the disagreements. You will find a workload where the cheaper or smaller model is at least as good. That single change has been the highest-leverage move we have watched teams make in 2026.
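
If you want a starting point, here is a minimal sketch of that script in Python. Everything in it is an assumption for illustration: the task labels, the `Router` class, and the two stub functions standing in for real provider calls are all hypothetical, and "disagreement" is reduced to a plain string mismatch, which is cruder than you would want in practice. Swap the stubs for your own client code.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A provider is any function that takes a prompt and returns a completion.
# Real code would wrap an SDK or HTTP call here; these are hypothetical stubs.
Provider = Callable[[str], str]


def big_model(prompt: str) -> str:
    """Stand-in for the larger, more expensive model."""
    return prompt.upper()


def small_model(prompt: str) -> str:
    """Stand-in for the cheaper model; it gives up on 'hard' prompts."""
    return "?" if "hard" in prompt else prompt.upper()


@dataclass
class Router:
    primary: Provider
    secondary: Provider
    # Task types routed straight to the secondary model (illustrative labels).
    secondary_tasks: frozenset = frozenset({"classify", "summarize"})
    # Share of primary traffic that is also sent to the secondary for comparison.
    shadow_rate: float = 0.05
    rng: random.Random = field(default_factory=random.Random)
    # (prompt, primary answer, secondary answer) for every mismatch seen.
    disagreements: List[Tuple[str, str, str]] = field(default_factory=list)

    def route(self, task_type: str, prompt: str) -> str:
        # Rule 1: known-cheap task types go to the secondary model only.
        if task_type in self.secondary_tasks:
            return self.secondary(prompt)

        # Rule 2: everything else is answered by the primary model.
        answer = self.primary(prompt)

        # Shadow a small sample on the secondary and record mismatches.
        # The caller always receives the primary answer.
        if self.rng.random() < self.shadow_rate:
            shadow = self.secondary(prompt)
            if shadow.strip() != answer.strip():
                self.disagreements.append((prompt, answer, shadow))
        return answer


if __name__ == "__main__":
    router = Router(big_model, small_model, shadow_rate=0.10)
    for p in ["fix typo", "hard refactor", "rename variable"] * 20:
        router.route("code", p)
    print(f"{len(router.disagreements)} disagreements logged")
```

The shadow calls never change what the user sees, so the only cost of the experiment is the extra inference spend. After a week, the task types that are missing from `disagreements` are the candidates to move into `secondary_tasks`.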