AI Got Cheap Fast — My API Costs Dropped 90% in 12 Months

A year ago, if you wanted to build an AI-powered feature into your SaaS, the math was rough. Frontier models charged enough per token that anything beyond a demo would eat your margins alive. Most indie projects that used AI were either losing money on every API call or restricting usage so aggressively that the feature felt useless.

That world is gone. In April 2026, you can get frontier-adjacent AI performance for a fraction of what you paid in 2025. And it changes what's possible for solo builders in a very concrete way.

The Numbers

Let's look at where pricing actually landed across the big three:

Google Gemini Flash-Lite comes in at $0.25 per million input tokens. That's not a typo. A quarter per million tokens. For context, a million tokens is roughly 750,000 words — more than the entire Lord of the Rings trilogy. You can process that for the cost of a gumball.

Claude Sonnet 4.6 — the model I use most — dropped significantly from where Sonnet was priced a year ago. The exact pricing shifts with plans and volume, but the trajectory is unmistakable: down and to the right.

GPT-4o and GPT-4o mini followed the same curve. OpenAI has been aggressive about pricing since competitors started undercutting them.

The pattern is consistent: what cost $15-20 per million tokens in early 2025 now costs $1-3 for comparable quality. And the budget tier — models like Gemini Flash-Lite and Claude Haiku — barely registers on a cost sheet at all.

What This Changes for Indie Products

Here's where it gets interesting for anyone building a product.

AI features are now profitable at small scale. A year ago, if you had 1,000 users each making 10 AI-assisted queries per day, you might be spending $300-500/month on API costs alone. Now that same usage costs $30-50. For a $10/month SaaS, that's the difference between a feature that's a loss leader and one that's profitable from user one.

You can be generous with AI usage limits. The worst version of AI in a product is the heavily rate-limited version — "You've used 3 of your 5 daily AI queries." Users hate it. It makes the feature feel like a demo. With current pricing, most indie products can offer generous or unlimited AI usage without it being a financial crisis.

The tiered model strategy actually works now. You don't have to run Claude Opus for everything. Use a cheap model (Haiku, Flash-Lite) for high-volume, simpler tasks — summarization, categorization, basic Q&A. Reserve the expensive models for complex reasoning where quality genuinely matters. This was theoretically possible before, but the gap between "cheap model" and "good model" was too wide. Now the cheap models are good enough for 80% of use cases.
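A minimal sketch of that routing: a plain dictionary from task type to model tier, defaulting anything unclassified to the premium tier. The model names here are illustrative placeholders, not exact API identifiers.

```python
# Route each task type to the cheapest model that handles it well.
# Model names are illustrative placeholders, not real API model IDs.
CHEAP_MODEL = "gemini-flash-lite"   # summarization, categorization, basic Q&A
PREMIUM_MODEL = "claude-sonnet"     # multi-step reasoning, anything subtle

MODEL_FOR_TASK = {
    "summarize": CHEAP_MODEL,
    "categorize": CHEAP_MODEL,
    "basic_qa": CHEAP_MODEL,
    "complex_reasoning": PREMIUM_MODEL,
}

def pick_model(task_type: str) -> str:
    """Return the model for a task, defaulting to the premium tier
    for anything not explicitly marked as simple."""
    return MODEL_FOR_TASK.get(task_type, PREMIUM_MODEL)
```

Defaulting unknown tasks to the premium model trades a little cost for quality safety; flipping the default is the aggressive version, and which one is right depends on how bad a cheap-model miss is for your users.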

Real Math for a Real Product

Let me make this concrete. Say you're building a SaaS tool with an AI feature that processes user text — summarization, analysis, whatever. Your typical request is about 2,000 input tokens and 500 output tokens.

At 10,000 monthly active users, 10 requests per user per day:

Using Gemini Flash-Lite ($0.25 input / $1.00 output per million tokens):

  • Input: 10,000 users × 10 requests × 2,000 tokens × 30 days = 6B tokens/month = $1,500
  • Output: 10,000 × 10 × 500 × 30 = 1.5B tokens/month = $1,500
  • Total: ~$3,000/month

Using Claude Sonnet 4.6 (roughly $3 input / $15 output per million tokens):

  • Input: 6B tokens = $18,000
  • Output: 1.5B tokens = $22,500
  • Total: ~$40,500/month

Using a frontier reasoning model ($15 input / $60 output per million tokens):

  • Input: 6B tokens = $90,000
  • Output: 1.5B tokens = $90,000
  • Total: ~$180,000/month

That's heavy usage, 3 million requests a month, and the spread is exactly why the tiered strategy matters. At $10/month per user you're taking in $100,000, so Flash-Lite eats 3% of revenue and the feature is comfortably profitable, while running every request through a frontier model would cost more than you bring in. Route the bulk of traffic to the budget tier, reserve the expensive models for the requests that genuinely need them, and the blended cost stays in the single digits as a percentage of revenue. A year ago, even the budget tier would have cost 5-10x more.
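Numbers like these are easy to sanity-check in a few lines. The function below is a generic sketch of the same arithmetic: plug in your own traffic shape and the per-million-token rates from your provider's pricing page.

```python
def monthly_cost(users, requests_per_user_per_day,
                 input_tokens, output_tokens,
                 price_in_per_m, price_out_per_m, days=30):
    """Dollar cost per month for a flat usage model.

    Prices are per million tokens, the convention every
    provider's pricing page uses.
    """
    requests = users * requests_per_user_per_day * days
    cost_in = requests * input_tokens * price_in_per_m / 1_000_000
    cost_out = requests * output_tokens * price_out_per_m / 1_000_000
    return cost_in + cost_out

# A single 2,000-in / 500-out request at Flash-Lite rates:
# 2,000 × $0.25/M + 500 × $1.00/M = $0.001, a tenth of a cent.
per_request = monthly_cost(1, 1, 2_000, 500, 0.25, 1.00, days=1)
```

Working from the per-request cost and multiplying up by your actual request volume is the quickest way to see whether a given model tier fits your price point.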

The Catch

Cheap inference isn't free AI. A few costs that haven't collapsed:

Fine-tuning is still expensive. If you need a custom model trained on your data, you're paying for compute time that hasn't dropped as dramatically as inference pricing.

Embeddings and vector storage add up. If your AI feature involves search over a large dataset, you're paying for embedding generation and vector database hosting. These costs are separate from inference and can surprise you.
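If you're adding retrieval, it's worth estimating the embedding bill separately. A rough sketch, using an assumed ballpark rate of $0.02 per million tokens for a small embedding model (an assumption, not a quote; check your provider's current pricing):

```python
def embedding_index_cost(num_docs, avg_tokens_per_doc,
                         price_per_m_tokens=0.02):
    """One-time dollar cost to embed a corpus.

    price_per_m_tokens is an assumed ballpark for a small
    embedding model; substitute your provider's actual rate.
    """
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens * price_per_m_tokens / 1_000_000

# 100,000 docs at ~800 tokens each is 80M tokens to embed.
index_cost = embedding_index_cost(100_000, 800)
```

The one-time indexing pass is usually tiny; the surprise comes from re-embedding documents every time they change and from the vector database's recurring hosting fee, neither of which this back-of-envelope captures.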

The "good enough" trap. Cheap models tempt you into using AI for everything, even when a simple algorithm would work better. Every API call is a dependency on an external service, a potential point of failure, and latency you're adding to your user's experience. Just because AI is cheap doesn't mean it's always the right tool.

What I'm Doing Differently

The pricing collapse changed how I think about building. A few things I'm doing now that I wouldn't have done a year ago:

I'm more willing to ship AI features as core functionality rather than premium add-ons. When the cost is negligible, there's no reason to gate it behind a higher tier.

I'm routing different tasks to different models based on complexity instead of using one model for everything. The cost savings aren't huge at my current scale, but it's a good habit for when they will be.

I'm less worried about token optimization. A year ago, I spent real effort minimizing prompt length to save on costs. Now I write prompts for clarity and let the model do its job. The extra tokens cost fractions of a cent.

The bottom line: if you had an idea for an AI-powered product but shelved it because the unit economics didn't work, run the numbers again. They probably work now.
