AI Agent Costs Are Rising Exponentially — Here's the Actual Math and What a Solo Dev Should Do
Toby Ord's "Are the Costs of AI Agents Also Rising Exponentially?" hit the Hacker News front page on April 18 with 295 points. The core finding is uncomfortable and worth sitting with.
Per-token API costs are dropping roughly 10x a year. LLM inference is one of the fastest-deflating commodities in the history of software. At the same time, the length and complexity of the tasks agents are being asked to perform are growing exponentially. Over the past seven years, the tasks a capable AI agent can handle have grown from "a few seconds of human work" to "a few hours of human work."
Those two curves interact in a way most solo operators have not priced in. The per-token cost is going down. The per-successful-outcome cost can still be going up, sharply, if task length and retries scale faster than per-token deflation. That is the paper's core point, and it matches what actually shows up on my API invoice.
If you have been told "Claude Code costs pennies per commit" and you are running longer, more autonomous agent tasks as the tools get better, this is the post that catches you up on why the bill is surprising you.
The Core Math, Plainly
Imagine an agent task that takes n steps to complete successfully. Each step has a probability p of succeeding and a cost c in tokens. In the simple case, n is small, p is near 1, and the total cost is roughly n times c.
As tasks get more complex, n gets bigger and p drops. Both directions hurt. If an agent needs to retry failed steps, or replan, or verify its own work, the effective cost per successful task is not linear in n. It compounds. If the agent gets halfway through and has to back up, it is not just paying for the forward work. It is paying for the verification, the replan, the new plan's context, and often a full re-execution of earlier steps.
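A toy model makes the compounding concrete. This is my sketch, not Ord's formulation: it compares retrying a failed step in isolation against restarting the whole run whenever anything fails. The function names and the example numbers are illustrative, not from the paper.

```python
def cost_per_step_retry(n: int, p: float, c: float) -> float:
    """Expected cost when each failed step can be retried in isolation.

    A step succeeds with probability p, so it takes 1/p attempts on
    average. Total cost stays linear in n.
    """
    return n * c / p


def cost_full_restart(n: int, p: float, c: float) -> float:
    """Expected cost when any failure forces a restart from step one.

    A full run succeeds with probability p**n, so the agent makes
    1 / p**n attempts on average. Each attempt executes steps until its
    first failure (or completion): (1 - p**n) / (1 - p) steps in
    expectation. Total cost grows exponentially in n.
    """
    if p >= 1.0:
        return n * c  # no failures, no retries
    return c * (1 - p ** n) / ((1 - p) * p ** n)
```

At 20 steps, 95% per-step reliability, and 1 unit per step, isolated retries cost about 21 units while full restarts cost about 36, roughly 1.7x. Stretch the same task to 60 steps and restarts cost over 6x the linear case. That gap is the "it compounds" in concrete numbers.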
Ord's framing — if agent cost per successful outcome scales exponentially with task length while human cost scales linearly, there is a sharp viability boundary — is slightly more abstract than this, but the shape is the same. There is a point where paying a human is cheaper than running the agent, and that point moves around as both costs change.
The deflationary force fighting the exponential is real. Per-token prices have dropped 10x per year for several years running. That is a lot of headroom. But it does not automatically offset a task-length curve that is also compounding.
What This Looks Like On My Actual Bill
A short, well-scoped agent task — "write a failing test for this function, then make it pass" — costs me a few cents on Claude Code. It runs in under two minutes. The token budget is bounded. The success rate is high. This is the regime where "pennies per commit" is basically true.
A medium-complexity agent task — "implement this small feature across four files, write tests, run them, fix failures" — costs me between a dollar and three dollars. It runs for 8 to 15 minutes. The success rate is still good but the retry rate is meaningful, and the planning tokens add up.
A long, open-ended agent task — "here is a feature request, investigate the codebase, propose an approach, implement it, handle edge cases, write tests, update docs" — has cost me anywhere from $6 to $40. It runs for 30 minutes to two hours. When it fails, it fails expensively. When it succeeds, it is genuinely impressive. The cost variance is wild.
The naive prediction was that per-token deflation would keep costs flat or falling over time. The actual pattern is that capability gains pull me toward longer, harder tasks, and those tasks add cost faster than per-token deflation removes it. My monthly Claude Code bill is higher than it was six months ago despite per-token prices being lower.
This is the tension the paper is describing, in concrete dollars.
The Practical Rule Nobody Writes Down
If a single agent run takes more than about ten minutes of wall-clock time, you should budget for more than 10x the cost of a one-minute run. Verification and retries compound non-linearly.
This is the rule I wish someone had told me six months ago. I have broken it a dozen times. Every time I was surprised by the invoice.
The corollary: the cost ceiling for "one autonomous run" is higher than you think. Setting a hard spending cap per agent run — either via the API's budget controls or via a wrapper script — is not paranoia. It is the boring ops work that keeps a solo dev's bill from going feral on a bad day.
I run a hard cap of $5 per individual agent run on most of my workflows. On specific pipelines where I expect long runs, I bump it to $15 with an alert. Anything that would exceed those numbers gets a manual approval prompt. I would rather get paged at 2 AM than wake up to a $400 agent spiral.
The Optimization Everyone Underestimates
Explicit phase boundaries. Not one giant agent run. A sequence of short agent runs, each with a defined input, a defined output, and a human checkpoint between them.
This is the pattern the AGENTS.md / /plan → /implement → /review workflow enforces by convention. It is not magic. It is what happens when you convert a long, open-ended agent task into a pipeline of short, scoped agent tasks with explicit success criteria between stages.
The math is straightforward: a pipeline of five stages, each averaging two minutes, with a 95% success rate per stage, is wildly cheaper and more reliable than one fifteen-minute monolithic run with a 70% success rate. Even though the total token count can be similar, the retry-compounding cost drops sharply because each stage's failure only costs you that stage, not the whole thing.
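Using run minutes as a rough proxy for token cost, that comparison works out like this. The retry model is a deliberate simplification: a failed stage reruns alone, while a failed monolithic run reruns from scratch.

```python
def pipeline_cost(stages: int, p_stage: float, cost_per_stage: float) -> float:
    # each stage retried independently until it passes: 1/p attempts each
    return stages * cost_per_stage / p_stage


def monolith_cost(run_cost: float, p_run: float) -> float:
    # whole run repeated until one attempt succeeds; an upper bound that
    # charges the full run cost even for attempts that die early
    return run_cost / p_run


staged = pipeline_cost(5, 0.95, 2.0)    # five 2-minute stages, 95% each
monolithic = monolith_cost(15.0, 0.70)  # one 15-minute run, 70% success
# staged is about 10.5 minutes of expected compute vs about 21.4 monolithic
```

Roughly half the expected cost for a similar amount of forward work, and the staged version degrades gracefully: a bad stage costs you two minutes, not fifteen.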
This is why the "one big prompt" workflow is where bills go to die, and why the more structured harnesses like Claude Code plus an AGENTS.md file, or Factory's droid model with its phase separation, or a well-written /plan → /implement → /review flow, are quietly saving people money. Not because they use fewer tokens. Because they fail cheaper.
The Counterweight the Paper Understates
The 10x annual decline in per-token cost is a big deal and it is not slowing down. Inference is a commodity, the market is competitive, and the hardware keeps improving. The gap between "what a task cost in 2025" and "what the same task will cost in 2027" is enormous, even if task complexity grows at the same time.
The practical read: some tasks that are economically borderline today will be trivially cheap in eighteen months. You do not need to avoid agent workflows that are a little expensive right now if the value is there. You need to be careful about the compound curves in the specific tasks where the cost runs wild today.
My Actual Setup
A hard per-run cost cap on individual agent invocations, as described above.
An alert on daily total API spend, set at about 2x my baseline. If I trip it, I want to know why before I trip it again.
A sharp preference for phase-separated agent workflows over monolithic autonomous runs. I almost never ask a single Claude Code invocation to do an open-ended task end to end anymore. I break it into planning, implementation, and review, and I pay a human-in-the-loop tax between stages.
Two workflows I stopped automating after the math stopped working. One was a content-generation pipeline where the agent's tendency to replan endlessly drove the per-output cost past what I could charge for the content. The other was a codebase-wide refactor pipeline where the agent's behavior on large repositories was unpredictable enough that the retry cost ate all the savings. In both cases, going back to a shorter, more manual workflow was the right answer. Not everything should be autonomous.
The Honest Middle
Most coverage of AI agent economics lands in one of two camps. "Per-token costs are collapsing, agents will be basically free soon." Or: "Agent costs are exponential, the economics of autonomous work are doomed." Both takes are partly right and mostly wrong.
The solo operator read is the honest middle. Per-token cost drops help you, per-task cost does not automatically help you, and the gap between the two is where your bill surprises live. Structuring agent runs so you stay on the right side of that curve is the real work.
Concretely: short scopes, explicit phases, hard spending caps, and a healthy skepticism of the "one big autonomous run" fantasy that the agent-demo marketing keeps selling.
The math is not scary if you respect it. It is scary if you do not.