Uber Blew Through Its AI Budget in 3 Months — And Every Solo Dev Should Read the Fine Print
Uber's CTO, Praveen Neppalli Naga, said the quiet part out loud this week: the company's 2026 AI budget is already blown, and it's only April. Uber spent $3.4 billion on R&D this year, pushed every engineer onto Claude Code and Cursor, ranked them on internal leaderboards by AI-tool usage, and watched the token meter run faster than finance could model. They're now, in his words, "back to the drawing board."
The first reaction to a story like this is usually some version of "big tech has more money than sense." That reaction is wrong, or at least insufficient. The numbers behind Uber's overspend are structurally the same as the numbers behind your personal Claude Code bill — just scaled. If you've been noticing your AI spend creeping up while telling yourself "but per-token prices keep dropping," this is the post.
What Actually Happened at Uber
The adoption numbers are the part that doesn't usually land. Around 95% of Uber engineers use AI coding tools every month. Roughly 70% of committed code now comes from them. About 11% of live backend updates — the stuff that actually runs in production and routes rides — is written by agents. That is a serious penetration rate, and it did not happen slowly. It happened over the past six months, after Uber explicitly encouraged and measured usage.
Claude Code is the dominant tool internally. Cursor is there too but has plateaued. Uber is now piloting OpenAI's Codex as a second vendor, partly for the capability mix and partly for negotiating leverage.
The overspend is not about wastefulness. It's about a pricing model that punishes the success of the rollout.
The Pricing Model Nobody Modeled Correctly
Anthropic's enterprise Claude Code product is hybrid pricing: per-seat fees plus required usage commitments. Companies pre-commit to a token volume and pay for that capacity whether they use it or not. In theory, that's a hedge for both sides. In practice, you have to forecast token consumption accurately, and nobody is good at that yet.
Here's why nobody is good at it:
Per-seat pricing assumes usage is roughly constant per seat. One developer, some amount of coding, some amount of AI tokens. That was fair for autocomplete-style tools. It is not fair for agents.
An agent running for 45 minutes on a hard task can burn through ten times the tokens of the same developer doing thirty small autocompletes. The seat didn't change. The workload did. And as the agents get better — longer context, more autonomy, more verification steps — per-seat token consumption keeps going up. Uber sized its commitment on Q4 2025 usage patterns. By March, tool capability had pushed actual usage well past the forecast.
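To make the forecasting gap concrete, here's a back-of-envelope sketch. Every number in it is an assumption I made up for illustration — seat count, blended token price, daily usage — not Uber's actual figures. The point is the shape of the error, not the dollar amounts:

```python
# Back-of-envelope: why a commit sized on autocomplete-era usage breaks.
# All numbers below are illustrative assumptions, not Uber's actuals.

ENGINEERS = 5_000           # seats covered by the commitment (assumed)
PRICE_PER_M_TOKENS = 5.00   # blended $/1M tokens (assumed)
WORKDAYS_PER_MONTH = 22

# Autocomplete era: ~30 small completions a day, ~2K tokens each.
autocomplete_daily_tokens = 30 * 2_000

# Agent era: a few long runs a day, ~10x the autocomplete pattern.
agent_daily_tokens = 3 * 200_000

def monthly_cost(tokens_per_day: int) -> float:
    """Monthly token spend across all seats, in dollars."""
    return ENGINEERS * tokens_per_day * WORKDAYS_PER_MONTH / 1e6 * PRICE_PER_M_TOKENS

print(f"forecast (autocomplete era): ${monthly_cost(autocomplete_daily_tokens):,.0f}/mo")
print(f"actual   (agent era):        ${monthly_cost(agent_daily_tokens):,.0f}/mo")
# forecast: ~$33,000/mo; actual: ~$330,000/mo — same seats, 10x the bill
```

Same headcount, same seats, an order of magnitude more spend. That's the forecast error hiding inside "per-seat plus usage commit."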
There's a second dynamic. When your finance team knows you've pre-committed to a token volume, the internal pressure is to use that volume, because paying for unused capacity looks bad. So teams that might have rationed usage instead pushed harder. Usage-commit pricing creates its own consumption ramp.
The Solo Dev Version of This Trap
You are not pre-committing to token volumes. You are on a Max plan, or paying per-call against the API. So this story is about someone else's problem, right?
It is not. The underlying mechanism — "agent capability grows faster than per-token deflation, so your effective cost per month goes up" — applies just as much to a one-person operation. I can point to my own invoices.
My Claude Code bill six months ago, running mostly autocomplete and short agent loops, averaged around $85 a month. My bill this month, running Cowork sessions, MCP servers, longer agentic tasks, and background jobs via routines, is tracking past $240. Per-token prices have come down in that window. My usage pattern shifted because I trust the tool to do more.
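The invoice math fits on one line, which is exactly why it's easy to miss. Both numbers below are assumptions fitted to my own bills — the deflation figure is a rough guess, the usage multiplier is what it takes to reconcile the two invoices:

```python
# Per-token deflation vs. usage growth, fitted to my rough numbers.
old_bill = 85.0
price_deflation = 0.30    # assume per-token prices fell ~30% over six months
usage_multiplier = 4.0    # agent loops, MCP servers, background routines

new_bill = old_bill * (1 - price_deflation) * usage_multiplier
print(f"${new_bill:.0f}/mo")  # ~$238 — the deflation lost to behavior change
```

A 30% price drop can't keep up with a 4x usage shift. It never will.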
On the Max plan specifically: Anthropic quietly tightened rate limits in April. The "unlimited" tier is still unlimited in the marketing copy, but there are soft caps that kick in during long agentic runs, and you will notice them if you're doing serious Claude Code work. The pricing page has not changed. The effective ceiling has.
What I Actually Do About It
Four things, in order of impact.
Treat AI spend like AWS spend. I have a spreadsheet that tags every MCP and every project by which model tier it uses. Not "Claude" — specifically "Haiku 4.5 for this loop, Sonnet 4.6 for these, Opus 4.7 only for planning." The reason this matters is that long-running loops, the kind that do 20 iterations of small tool calls, should never be on Opus. Putting Haiku in those loops roughly 10x'd my cost efficiency overnight.
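Here's roughly what that routing looks like as code. A minimal sketch — the tier names mirror how I tag projects, and the model ID strings are placeholders, not real API identifiers; substitute whatever IDs your account actually exposes:

```python
# Minimal model-tier router. Model ID strings are placeholders.
MODEL_TIERS = {
    "loop":     "claude-haiku",   # 20-iteration tool-call loops: cheapest tier
    "codegen":  "claude-sonnet",  # day-to-day generation and edits
    "planning": "claude-opus",    # rare, high-leverage planning passes only
}

def pick_model(task_kind: str) -> str:
    """Route a task to a model tier; default to the cheap tier, never the expensive one."""
    return MODEL_TIERS.get(task_kind, MODEL_TIERS["loop"])

assert pick_model("planning") == "claude-opus"
assert pick_model("anything-unrecognized") == "claude-haiku"  # fail cheap
```

The design choice that matters is the default: an unclassified task falls through to the cheapest tier, so drift costs you quality on one task instead of money on a thousand.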
Cap per-project budgets. For each project I have a rough monthly ceiling. If it trips, I get a notification (a Claude Code routine that reads my usage dashboard), and I switch that project to a cheaper tier or pause agent runs for the rest of the month. This sounds excessive for a solo operation. It prevents the quiet drift that Uber walked into.
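The check itself is trivial. A sketch, assuming your usage dashboard can export per-project spend as JSON — the file format and the ceilings are my own conventions, not anything Anthropic ships:

```python
import json

# Per-project monthly ceilings in dollars (my own convention).
CEILINGS = {"site-rewrite": 60.0, "etl-agents": 120.0, "experiments": 30.0}

def check_budgets(usage_path: str = "usage_export.json") -> list[str]:
    """Return projects that have tripped their monthly ceiling.

    Assumes a JSON export shaped like {"project": dollars_spent_this_month};
    adapt the parsing to whatever your dashboard actually emits.
    """
    with open(usage_path) as f:
        spend = json.load(f)
    return [p for p, cost in spend.items() if cost >= CEILINGS.get(p, float("inf"))]

if __name__ == "__main__":
    for project in check_budgets():
        print(f"BUDGET TRIPPED: {project} — switch tiers or pause agent runs")
```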
Keep a fallback model. Not every task needs Claude. A lot of routine work — linting, short code transforms, boilerplate — is fine on whatever is cheapest. I keep Gemini 3.1 Pro and Haiku warm and route to them when the task doesn't need Claude's specific strengths. Vendor diversification is a cost strategy, not just a resilience strategy.
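In practice the routing is a fallback chain, cheapest adequate model first. A sketch — the provider functions here are stubs you'd wire to the relevant SDKs, not real client calls:

```python
from typing import Callable

# Stubs: wire each to the actual SDK call for that provider.
def call_haiku(prompt: str) -> str:
    raise NotImplementedError("wire to Anthropic SDK")

def call_gemini(prompt: str) -> str:
    raise NotImplementedError("wire to Google SDK")

def call_sonnet(prompt: str) -> str:
    raise NotImplementedError("wire to Anthropic SDK")

# Ordered by cost: only escalate when a cheaper provider fails.
ROUTINE_WORK_CHAIN: list[Callable[[str], str]] = [call_haiku, call_gemini, call_sonnet]

def run_routine_task(prompt: str) -> str:
    """Try providers in cost order; fall through on rate limits or outages."""
    last_err: Exception | None = None
    for provider in ROUTINE_WORK_CHAIN:
        try:
            return provider(prompt)
        except Exception as err:  # rate limit, outage, capability miss
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```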
Verify the "unlimited" claim before you build on it. If your workflow depends on running agents all day, test it at full tilt for a week before committing. Find the soft caps. Know them. Build your pipeline assuming they exist.
The Bigger Lesson
The Uber story is not "AI is too expensive." Uber's engineers are writing 70% of code with AI; the productivity math almost certainly pencils out even with the overspend. The story is that pricing models built for an autocomplete era do not survive an agent era, and the gap between "per-token cost is dropping" and "my monthly bill is rising" is where solo operators get surprised.
The advantage a solo operator has over Uber here is real: I can switch vendors in an afternoon. I am not locked into a two-year enterprise agreement. If Anthropic's pricing gets genuinely unfavorable, I am a config change away from routing through a different provider. That flexibility is the thing to protect.
The enterprise AI market is going to re-price in the next twelve months — usage-based commits are going to fragment, per-outcome pricing experiments will start, and we'll probably see some form of "agent-hour" SKU. Until that shakes out, assume your monthly spend keeps drifting up while per-token prices fall, plan for it, and instrument for it.
Uber had to go back to the drawing board with a $3.4 billion R&D budget. You have a Notion doc and a spreadsheet. Use them.