OpenAI Codex Just Became a Claude Code Competitor — Here's the Honest Side-by-Side for Solo Devs
OpenAI shipped "Codex for (almost) everything" on April 16. The launch post was the usual feature carousel: a desktop app with an integrated browser, terminal access, 90+ plugins, PR review inline with diff comments, persistent memory across sessions, SSH to remote devboxes, and — the part that made me laugh out loud on first read — a Codex plugin that lets you hand Claude Code's pull requests to Codex for review before you merge them.
I've been a Claude Code person since it shipped. I run it as my daily driver, I pay for the Max plan, and I've migrated most of my solo-operator workflow into it. So when Codex shipped a serious alternative, I did what every launch-day fence-sitter should do: I actually used it, for five days, on production code I was already shipping. Same repos, same tasks, side by side.
This is the real side-by-side, not the launch-day recap. Short version up front: Codex is legitimately good, it is not a drop-in Claude Code replacement, and the "which do I use" decision for a solo dev is more nuanced than any of the launch threads suggested.
What Codex Actually Does Now
The April 16 update is not a model release. It's an interface release. The underlying model improved in March; this drop was about wrapping that model in something you'd actually want to open every day.
The desktop app is the headline. You get a window with three primary panes: a chat surface, an integrated browser, and a terminal. The browser is real Chromium and the agent can drive it — click, type, navigate, take screenshots, read DOM. The terminal is local to your machine or SSH'd to a configured devbox. You pick. The chat surface talks to both.
Plugins are the second headline. There are more than 90 of them at launch, covering the enterprise dev stack: Atlassian Rovo, GitLab Issues, CircleCI, Render, Neon, CodeRabbit, Linear, Notion, Supabase, Vercel, Sentry, PagerDuty, and more. Each plugin is effectively an MCP server OpenAI maintains, wrapped in a sanctioned UI.
PR review is the third headline. You install a GitHub App, point it at a repo, and Codex posts inline review comments on every PR. You can ask it for a summary, you can push back on specific comments, you can have it suggest a patch. It's a real review, not a lint pass.
Memory is the fourth. Codex now persists context across sessions per project. Close the window, come back two days later, and it remembers what you were doing. This is a genuine workflow change. Claude Code requires you to reload context or use session resumption; Codex is always-on per project.
Fifth: the Claude Code plugin. Codex can act as a reviewer on PRs written by Claude Code, or vice versa. Cross-provider review catches things one model alone misses. This is, honestly, the most interesting workflow either company shipped this month.
The Side-by-Side Nobody's Written Yet
Here's what I did. I took a three-hour refactoring task on a real project — migrating a tRPC v10 API surface to v11, updating all the client call sites, fixing the 60-some type errors that fell out, and adding tests — and I ran the exact same task twice. Once with Claude Code, once with Codex. Same repo, same starting commit, same instructions in my head, same allowed tool set.
Claude Code finished in 2h 12m. Token spend ran around $3.40 on Sonnet 4.6 with Opus 4.7 planning. The output was clean: one commit per logical change, tests written with real coverage, the typical Claude style of "ask permission before destructive operations, narrate what it's doing." It got stuck once on a generic type inference that needed me to clarify intent. I clarified. It continued. I shipped.
Codex finished in 2h 41m. Token spend was harder to compare because OpenAI's usage display is still bad, but it was in the same ballpark on GPT-5.4 Pro. The output shape was different: Codex did the whole task in one sitting with fewer, larger commits, and it used the integrated browser to look up the tRPC migration docs at one point, which was a nice touch. It got stuck twice — once on the same generic type thing, once on a monorepo package resolution issue Claude Code just ate without asking. Both recoveries required more hand-holding from me.
Final outcome: both produced working code. I shipped the Claude Code PR. The Codex PR I threw away, not because it was worse, but because I already had a shipping candidate and didn't want to resolve the diff conflict. In isolation, either one was shippable.
What each did well. Claude Code felt more like pair programming. It asked clean, specific questions. Its commits were small and reviewable. Codex felt more like delegating to a junior engineer with a good browser. It solved a real lookup problem I would have had to Google. It also over-committed once without asking — pushed four commits when I wanted two — which matches the Codex style other people's reviews keep describing.
The Claude Code Plugin for Codex Is the Interesting Thing
Everyone wrote about this as a joke — "OpenAI made a plugin for its competitor, lol" — but using it for a week, I don't think it's a joke. I think it's the actual strategic move.
Here is the workflow: Claude Code writes the PR. I open it locally and run a Codex review pass through the plugin. Codex reads the diff, runs type checks, runs the tests, and posts a set of inline comments on the PR. I read those comments. About 30% are nitpicks I don't care about. About 50% are legit but small. And about 20% catch things Claude missed — subtle bugs, unhandled edge cases, a missing null check that mattered. I address the 20%, ignore the rest, and merge.
Cross-provider review is the first honest answer I've seen to "how do I QA code my AI assistant wrote when I'm the only human in the loop." Having two different models on the same code, with different biases and different blind spots, catches things neither model reliably catches on its own. It is not a full human-review replacement. It is better than nothing, and nothing was the previous state of my one-person shop.
I've now set this up as my default workflow: Claude Code writes, Codex reviews, I merge. The token cost is roughly 1.4x what I was spending before. The quality delta is legible. I'll keep doing it.
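To sanity-check that 1.4x number before committing to it, I did the back-of-envelope below. The $3.40 per-PR write cost comes from the tRPC run above; the monthly PR volume is a made-up illustration, not my real number.

```python
# Rough monthly cost estimate for the "Claude writes, Codex reviews" loop.
# write_cost_per_pr: observed spend per PR ($3.40 from the tRPC migration run).
# review_multiplier: total spend with the review pass added (~1.4x observed).
# prs_per_month: hypothetical volume, purely illustrative.

def dual_provider_cost(write_cost_per_pr: float,
                       review_multiplier: float = 1.4,
                       prs_per_month: int = 40) -> dict:
    """Estimate monthly spend for a write-then-cross-review workflow."""
    write_only = write_cost_per_pr * prs_per_month
    with_review = write_only * review_multiplier
    return {
        "write_only": round(write_only, 2),
        "with_review": round(with_review, 2),
        "review_overhead": round(with_review - write_only, 2),
    }

costs = dual_provider_cost(3.40)
print(costs)  # at 40 PRs/mo: $136 write-only, ~$54 extra for the review pass
```

The takeaway is that the review pass is a marginal line item, not a second subscription — which is why I kept it on.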
Where Claude Code Still Wins
MCP ecosystem depth. Claude Code has more MCP servers in practical use, with deeper per-server functionality. Anthropic reported 10k+ public servers and 97M monthly SDK downloads as of April. Codex's plugin system is tighter but narrower: 90 well-curated plugins, no long tail. For a solo dev who runs niche MCPs (personal knowledge bases, custom Figma MCPs, homebrew tool wrappers), Claude Code is still the richer ground.
Cowork integration. If you're using Cowork for non-coding work — and you should be — Claude Code is the runtime. Codex has nothing comparable.
AGENTS.md support. Both tools honor AGENTS.md now, but Claude Code's handling is more mature. Codex still occasionally ignores project-level constraints that AGENTS.md sets.
The mental model. Claude Code treats your project as a directory of files you own. Codex treats your project as a workspace inside a web app. For me, the directory model wins — it composes with git, with my terminal, with my existing editors. The workspace model is nicer to look at but I don't want my project of record living inside a vendor's IDE.
Where Codex Edges Ahead
The in-app browser is legitimately a better frontend iteration loop. Having the browser live inside the agent's context, with the agent able to drive it and read it, is a real workflow upgrade for anyone doing UI work. I built a small Astro component twice — once in Claude Code, once in Codex — and the Codex version was faster end-to-end because I didn't have to context-switch to Chrome to check styling.
PR review is tighter than anything Claude Code ships natively. Claude Code can review PRs but the GitHub App and the inline-comment workflow aren't there yet. Codex's PR review feels like a feature that was designed, not a feature that emerged.
Enterprise plugins. Atlassian, Jira, GitLab, ServiceNow, SAP plugins. If your day job involves enterprise tools, Codex's plugin surface is miles ahead. If your day job is "my laptop and Vercel," this doesn't matter.
The Honest Decision for a Solo Dev
You don't need both as your daily driver.
If your day is shipping backend features, CLI tools, infrastructure, or anything where the loop is "edit files, run tests, git commit": Claude Code is still the right default. The MCP ecosystem, the Cowork integration, and the file-system-first mental model compose with everything else in a solo operator's stack. Codex's new shape doesn't earn the switch.
If your day is iterating on frontend UX while reviewing a steady stream of PRs from clients or collaborators: Codex's new shape is actually better suited. The integrated browser plus the inline PR review workflow is a real productivity delta for that specific day. It's worth trying.
For everyone, regardless: set up cross-provider review. Claude Code writes, Codex reviews, or vice versa. The marginal cost is small, the marginal quality gain is real, and it's the one workflow I'd actively recommend to every solo dev I know after a week of using it.
What I Changed In My Stack After A Week
Two Claude Code workflows I moved to Codex: frontend component iteration on a client project, and the weekly review pass on PRs across my three active repos.
Five Claude Code workflows I did not move: all backend work, all MCP-adjacent work, everything in Cowork, all scripting and automation, and the planning loop I run against AGENTS.md at the start of each feature.
One workflow I'm now running through both, cross-reviewing: the main app's feature PRs. Claude writes, Codex reviews. The bill went up about 40%. The quality went up by a real, if hard-to-quantify, amount.
The Bigger Picture
This is the first month where a solo dev can legitimately pick either tool as a daily driver and not be crazy. Twelve months ago Claude Code was the only serious answer. Six months ago Cursor was a strong second. Now we have three serious contenders — Claude Code, Codex, and Cursor — each with a different shape, each with real tradeoffs.
The strategic read: OpenAI and Anthropic are settling into a Coke-and-Pepsi pattern for agentic coding. They're going to keep shipping parallel features, reviewing each other's roadmaps, and pulling solo devs back and forth. The specific winner on any given week is going to matter less than "do I have cross-provider review wired up, and am I portable enough to switch without drama."
Portability is the real move. Keep your project of record in files, not in a vendor's workspace. Keep your prompts and your AGENTS.md in the repo. Keep your secrets in a system both can read. Then the question of "which agent is best this month" becomes an afternoon's config change, not a migration project.
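One way to make the portability rule concrete is a tiny audit script you run before adopting any new agent. The specific checks here — AGENTS.md at the repo root, a prompts/ directory, .env listed in .gitignore — are my own conventions, not anything either vendor mandates.

```python
from pathlib import Path

# Portability audit: does this repo keep its project-of-record in files
# an arbitrary agent can read, with secrets kept out of version control?
# AGENTS.md, prompts/, and the .env rule are personal conventions,
# not requirements of Claude Code or Codex.

def portability_report(repo: Path) -> dict:
    gitignore = repo / ".gitignore"
    ignored = gitignore.read_text().splitlines() if gitignore.exists() else []
    return {
        "agents_md_in_repo": (repo / "AGENTS.md").exists(),
        "prompts_in_repo": (repo / "prompts").is_dir(),
        "env_ignored": any(line.strip() == ".env" for line in ignored),
    }

if __name__ == "__main__":
    report = portability_report(Path("."))
    for check, ok in report.items():
        print(f"{'PASS' if ok else 'FAIL'}  {check}")
```

If all three pass, switching agents really is an afternoon's config change: the new tool reads the same files the old one did.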
Codex's April 16 update didn't change my answer to "which tool do I open first in the morning." It did change my answer to "which tool do I run last before I merge." Both are wins.