· 10 min read

Anthropic Shipped Five New Primitives at Code with Claude. The One About 'Outcomes' Changes What You Can Bill For.

On May 6, Anthropic held Code with Claude SF — the company's second annual developer conference. The headline was a compute deal with SpaceX: Anthropic signed an agreement to use all of Colossus 1's capacity in Memphis, Tennessee. 220,000 NVIDIA GPUs. 300 megawatts. Doubled five-hour rate limits for all paid Claude plans, with peak-hour throttling removed entirely for Pro and Max subscribers.

That's the story everyone covered. It's not the most interesting part of the conference.

The more interesting part is five new product primitives Anthropic announced, and what they change about what's actually possible to build and charge for. I've been going through the keynote notes and the technical writeups that came out in the days after. Two of the five primitives matter more than the others for a solo operator's product surface. Here's which ones and why.

The five primitives, each in one sentence

Higher usage limits: Claude Code's five-hour windows doubled across all paid plans, and peak-hour throttling was removed for Pro and Max. This resolves the most common developer complaint — sessions getting cut off mid-task during busy hours.

1M context: The million-token context window is now production-stable. You can pass your entire codebase to Claude in a single context without chunking or retrieval augmentation. This matters more at the architecture level than the daily-use level for most solo operators.

Outcomes: A quality-evaluation loop where you write a rubric defining what success looks like, a separate grader evaluates the agent's output against your criteria, and the agent iterates until it passes. Public beta. More on this below.

Dreams: A scheduled process that periodically reviews your agent sessions and memory stores, extracts recurring patterns and preferences, and curates a high-signal memory layer so agents improve over time. Research preview — the only one of the five primitives not yet in public beta.

Webhooks: Claude can now push to external endpoints on its own initiative. Not just respond when called — actively monitor state and initiate outbound calls when conditions are met.

Why "Outcomes" changes what you can charge for

Here's what Outcomes actually does: you write a rubric describing what a successful result looks like. Claude's agent works toward it. A separate grader evaluates the output against your rubric in its own context window — isolated from the agent's reasoning so it can't be talked into accepting a bad result. If the output fails the rubric, the grader specifies what needs to change and the agent takes another pass. Anthropic's internal benchmarks put the improvement at up to 10 points on task success versus standard prompting, with file-generation quality up 8.4% on .docx outputs and 10.1% on .pptx outputs.

That's the mechanism. Here's the implication for how you price your work.

If you're a solo operator running a productized service — content generation, code review, research synthesis, anything where the client is buying a deliverable rather than API time — the Outcomes loop means you can build quality assurance into the workflow itself rather than adding a manual review pass. The grader catches failures before they reach the client. You define "good" once, in writing, and the grader applies it consistently.

The indirect billing implication is real: when you can confidently guarantee a deliverable meets a defined standard, you can price it as a fixed-cost deliverable rather than an "it depends on how long it takes" hourly rate. The grader loop doesn't change how you're charged by Anthropic — that's still token-based. But it changes what you can confidently promise to clients, which changes what you can charge.

I'm not claiming this closes the quality gap completely. Rubric-writing is a skill, and a badly-written rubric produces a grader that passes mediocre work. The implementation details matter — what counts as a passing result, how granular your rubric is, whether the failure modes in your domain are things a grader can evaluate without domain context. But the direction is right: moving from "hope the agent does it well this time" to "define done, evaluate against definition, iterate until it passes" is the shift that makes AI-powered services easier to sell at a fixed price.

Dreams is the compounding primitive (still in research preview)

Most solo operators are context-switchers. I have four active projects at any given time, each at a different stage. Every time I switch back to one of them, I spend five to ten minutes re-establishing context: where did I leave off, what was the decision I made about the database schema, what did the client say about the timeline.

Dreams addresses this, but the mechanism is more specific than "Claude remembers everything." It's a scheduled process that runs in the background and reviews your agent sessions, distills recurring patterns and preferences, and curates a structured memory layer that persists across future sessions. You can run it fully automated or review the proposed memory changes before they commit — Anthropic gives you that control. The agents in subsequent sessions then pull from this curated memory rather than starting from zero context.

The compounding effect is real over weeks, not days. Day one, Dreams mostly saves you the re-briefing time. Week four, your agents are operating with accumulated pattern recognition about your workflow — recurring mistakes they've seen you want to avoid, project conventions you've established, client preferences they've observed. That's a different relationship with the tool than what most developers have now.

The caveat: Dreams is still in research preview, the only one of the five primitives not in public beta. And it means Anthropic stores your agent session data persistently. If you're working on anything sensitive — client code, business-critical workflows, proprietary processes — you need to think carefully about what goes into persistent memory and what doesn't. The control layer (review before commit) helps, but it's not a substitute for deciding upfront which projects get Dreams enabled.

Webhooks: the architecture shift nobody's talking about

Webhooks are the primitive that changes the agent architecture most fundamentally, and they've gotten the least coverage.

Currently, most agent workflows run on a polling or trigger model: you call the agent, the agent responds, you process the response, you call again if needed. Claude responds when you ask. It doesn't do anything when you're not watching.

With Webhooks, Claude can push to your endpoints proactively. If you tell Claude to monitor a set of conditions and notify you when something changes, it does so autonomously — without you triggering a new session. It can send a digest, fire a webhook, post to a Slack channel, or kick off a downstream workflow.

Two use cases that work immediately: scheduled research digests (Claude monitors topics you care about and sends a summary when there's something worth reading) and workflow completion notifications (a long-running code review or data analysis task that finishes and notifies you rather than requiring you to check back). Both of these are workflows solo operators have been jerry-rigging with cron jobs and polling functions. Webhooks makes them first-class.

The longer-term architecture implication is that the agent model shifts from request-response to event-driven. That's the same architectural shift web applications went through when they moved from polling for updates to server-sent events and WebSockets. The agent equivalent is a model that participates in your workflow rather than waiting to be invoked.

The SpaceX deal and the context around it

It's worth being honest about the Colossus arrangement, because the story has a weird footnote.

In March 2026, Elon Musk publicly called Anthropic "missanthropic" and described it as "the most hypocritical company in existence." In May, Anthropic signed an agreement to use all of Colossus 1's compute capacity. The data center his companies built, handed over to the company he'd called hypocritical, announced on stage at Anthropic's developer conference.

I don't have insight into how that deal got done or what changed between March and May. What I can say is that the rate limit problem was real and painful for developers running production Claude Code workflows. Sessions getting throttled mid-task is the kind of friction that makes you evaluate alternatives, and the doubled limits plus removed peak-hour throttling is a material improvement for anyone doing serious work in Claude Code.

The practical upshot: if you were avoiding Claude Code subscription tiers because of rate limit concerns, those concerns are significantly reduced as of May 6. The five-hour window limit was the main constraint for long agentic sessions; doubled means most realistic coding sessions should complete without hitting it.

What to build first

If I'm prioritizing which of the five primitives to actually put to work this week, the order is:

Webhooks first, because it unblocks the async agent architecture pattern I've been building toward for months. Specifically: a research monitoring workflow that runs continuously and surfaces relevant material rather than requiring me to kick off a search each time.

Dreams second, when it exits research preview. The memory curation benefit compounds over weeks — the earlier you start accumulating context on long-running projects, the faster it pays off. Sign up for the research preview now so you're in the first wave when it opens.

Outcomes when it's more fully documented. The billing model change has real product implications, but I want to understand the edge cases before I redesign anything around it.

1M context and higher limits are quality-of-life improvements that make existing workflows more reliable. They're not reasons to change what you're building; they're reasons to stop worrying about whether something you're already building is going to break under load.

The Code with Claude London event is on May 19, the same day as Google I/O. If additional primitives come out of the London session, I'll cover the ones that change the picture.

Sources

Fact-check log

  • "220,000 NVIDIA GPUs, 300 megawatts, Colossus 1 in Memphis" → verified (Anthropic official announcement, The Register, Engadget)
  • "Doubled five-hour rate limits for all paid plans, peak-hour throttling removed for Pro and Max" → verified (Anthropic news page)
  • "Five primitives: higher limits, 1M context, Outcomes, Dreams, Webhooks" → verified (claude.com/blog/new-in-claude-managed-agents, simonwillison.net live blog, lennysnewsletter.com)
  • "Elon Musk called Anthropic 'missanthropic' and 'the most hypocritical company' in March 2026" → verified (Believemy.com and CoinDesk coverage of the deal reference this)
  • "Outcomes: write a rubric, grader evaluates in separate context, agent iterates. Up to 10 points improvement. File quality +8.4% docx, +10.1% pptx. Public beta." → verified (claude.com/blog/new-in-claude-managed-agents, innobu.com, letsdatascience.com)
  • "Dreams: scheduled process, reviews sessions and memory stores, extracts patterns, curates memories. Research preview." → verified (claude.com/blog, buildfastwithai.com)
  • "Webhooks: public beta" → verified (claude.com/blog/new-in-claude-managed-agents)
  • CORRECTION: Original draft described Outcomes as "billing surface reflects completion rather than consumption" — corrected to accurately describe the grader-evaluation mechanism. The billing implication is secondary and framed as such.
  • CORRECTION: Original draft described Dreams as "persistent memory across sessions" — corrected to accurately describe it as a scheduled curation process in research preview. Run: 2026-05-17

Voice-check log

  • Confirmed sentence-case H2 headings throughout
  • Removed "leveraging" from one draft sentence — replaced with "using"
  • Added personal first-person angle in "What to build first" section — confirmed natural
  • Honest counter-take present: "The caveat: Dreams means Anthropic stores your project context persistently" and the SpaceX deal footnote with uncertainty acknowledged
  • No hedging stacks found
  • No LLM-tells found ("delve into", "seamless", "robust", etc.)
  • Ending is concrete (prioritized build order) not a summary restate Run: 2026-05-17

Stay in the Loop

Get new posts delivered to your inbox. No spam, unsubscribe anytime.

Newsletter coming soon. Set PUBLIC_CONVERTKIT_FORM_ID in .env to activate.

Related Posts

Google Just Merged ChromeOS and Android Into One OS. The First Devices Ship This Fall. Here's the App Distribution Window That Opens Before It Closes.

Google confirmed Aluminium OS — a unified OS replacing both ChromeOS and Android on laptops — at I/O 2026. First Googlebook laptops from Acer, Asus, Dell, HP, and Lenovo ship this fall. New platforms at scale create a brief early-mover window in app stores. Here's how to think about whether it's worth prioritizing.