I spent the past two weeks building an internal dashboard with both Claude Code and Codex. The dashboard pulls from our NocoDB instance, renders fund performance charts, and presents LP-level portfolio breakdowns. The kind of operational tooling we build at Stears every quarter.
Claude Code handled the backend beautifully. State management, API integration, data transformation pipelines, all flawless. Then I asked it to make the thing look good.
Inter font. Purple gradient. Card layout. Safe neutrals.
I've seen this exact output from every Claude Code frontend project I've touched since 2025.
- Codex gets UI tasks right on the first try far more consistently than Claude Code, particularly on layout, spacing, and component styling
- Claude Code burns 3-4x more tokens per task. For iterative frontend work, that cost compounds fast
- The Figma-Codex bidirectional MCP integration is a genuine workflow advantage Claude Code hasn't matched
- METR's updated study (Feb 2026, 800+ tasks) found AI tools now likely provide productivity benefits. But where those benefits land in the stack matters
The consensus in every developer community I follow is clear: Claude Code is the superior coding tool. Blind tests from March 2026 across 500+ Reddit developers show a 67% win rate. SWE-bench scores lead. The 200K context window is enormous. Claude Code now accounts for roughly 4% of all public GitHub commits, according to METR.
I agree with all of that: for backend work, for architectural reasoning, for complex multi-file refactoring.
But nobody is saying the quiet part out loud.
For frontend design, the thing users actually see, Codex has been quietly winning since GPT-5.3 dropped in February.
The "AI slop" problem is a Claude Code problem
If you've built even one frontend with Claude Code, you know the aesthetic fingerprint. Inter font. Rounded cards with purple accents. Generous padding. Safe neutral backgrounds. A blue CTA button with slightly too much border-radius.
Firecrawl's recent analysis of the top Claude Code skills put it bluntly: the default output is technically correct but visually interchangeable with every other AI-generated interface. The AI coding community has a name for this. They call it AI slop.
Anthropic knows it. They maintain an official Frontend Design skill that explicitly bans Inter, Roboto, Arial, and Space Grotesk. The skill forces the model to commit to a specific visual direction (brutalist, editorial, retro-futuristic, whatever the project calls for) before generating a single line of CSS. That skill has over 110,000 weekly installs across Claude Code, Codex, Gemini CLI, Cursor, and Copilot.
You don't get 110,000 weekly installs for a skill that solves a minor problem.
Codex doesn't need that crutch as badly. In a VERTU benchmark comparing GPT-5.3-Codex against Claude Opus 4.6 on a 150K-node React codebase, Codex performed better on UI/UX tasks, with what the researchers described as stronger awareness of recent web design trends and faster iteration on Tailwind and CSS-in-JS. A TensorLake test had Codex producing a complete, working dashboard UI (HTML, CSS, sensible typography defaults, responsive layout) in three minutes and 53 seconds. No manual fixes required.
Quick disclaimer: I'm not arguing that Codex produces beautiful design. It produces current design. There's a difference. But in 2026, current beats beautiful if you're shipping internal tools on a Thursday afternoon.
The token economics of frontend iteration
This is where the comparison gets uncomfortable for Claude Code advocates.
According to Anthropic's own documentation, Claude Code's average API cost runs approximately $6 per developer per day, with 90% of users staying under $12. Those numbers sound reasonable. They also obscure what happens during frontend iteration cycles.
Claude Code doesn't send a single prompt and wait for a response. Each interaction is a multi-turn conversation that carries the accumulated context: system prompt, conversation history, file contents, tool-use tokens from bash commands and file reads. A seemingly simple "edit this file" command can consume between 50,000 and 150,000 tokens in a single API call once the full context window is assembled, according to SitePoint's technical analysis from March 2026. Follow-up messages in the same session append to this context, meaning token consumption per request grows over the course of a session.
For backend work, where you ask once, get a thorough answer, and move on, that model is fine. For frontend work, where you're iterating on hover states, responsive breakpoints, animation timing, and colour tweaks across ten rounds of feedback? You're compounding tokens at a rate that burns through a $20 Pro limit by lunch.
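A toy model makes the compounding visible. The numbers below are illustrative assumptions (a ~50K-token base context, ~8K tokens of new conversation per round, $3 per million input tokens), not measured figures; the point is the shape of the curve, not the exact dollars.

```python
# Illustrative model of context growth across an agentic coding session.
# Every request resends the accumulated context, so billed input tokens
# grow roughly quadratically with the number of iteration rounds.
# All constants are assumptions for illustration, not measured figures.

BASE_CONTEXT = 50_000      # assumed tokens resent with every request
TOKENS_PER_ROUND = 8_000   # assumed new context appended each round
PRICE_PER_MTOK = 3.00      # assumed input price, USD per million tokens

def session_input_tokens(rounds: int) -> int:
    """Total input tokens billed across a session: each request carries
    the base context plus everything from the rounds before it."""
    return sum(BASE_CONTEXT + i * TOKENS_PER_ROUND for i in range(rounds))

for rounds in (1, 5, 10):
    tokens = session_input_tokens(rounds)
    print(f"{rounds:>2} rounds -> {tokens:>9,} input tokens "
          f"(~${tokens * PRICE_PER_MTOK / 1e6:.2f})")
```

One round costs pennies; ten rounds of hover-state and breakpoint tweaks cost roughly seventeen times as much as the first, because the context never shrinks.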
One Redditor in the DEV Community's 500-developer analysis put it memorably: "Claude Code is objectively the better tool. 67% blind test win rate. But a $20 plan that runs out after 12 prompts isn't your daily driver, no matter how good the quality."
The economics matter more than people admit. The Register reported in January 2026 that developers were claiming a roughly 60% reduction in token usage limits. Anthropic attributed this to the expiration of a holiday bonus, but the underlying tension is structural: Claude Code's agentic loop model is fundamentally more expensive per interaction than Codex's leaner approach.
Here's the arithmetic for my team at Stears. Three analysts building internal dashboards. At ~$6 per day per person on the API, that's roughly $360 per month across twenty working days, before anyone hits a complex frontend iteration cycle. If even half their time is frontend work (and for internal tooling, it often is), the token-per-task premium of Claude Code over Codex is a real line item.
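The back-of-envelope version, for the curious. The $6/day figure is from Anthropic's documentation; the 50/50 frontend split and the 3x token multiplier on UI tasks are assumptions drawn from the claims earlier in this piece, not measurements.

```python
# Back-of-envelope for the Stears example.
ANALYSTS = 3
CLAUDE_COST_PER_DAY = 6.0   # USD, Anthropic's documented average
WORKING_DAYS = 20           # assumed working days per month
FRONTEND_SHARE = 0.5        # assumed share of time on frontend work
TOKEN_MULTIPLIER = 3.0      # assumed Claude-vs-Codex token ratio on UI tasks

baseline = ANALYSTS * CLAUDE_COST_PER_DAY * WORKING_DAYS
# If the frontend half ran through Codex at ~1/3 the token burn:
hybrid = baseline * (1 - FRONTEND_SHARE) + baseline * FRONTEND_SHARE / TOKEN_MULTIPLIER

print(f"All-Claude baseline: ${baseline:.0f}/month")  # $360
print(f"Hybrid estimate:     ${hybrid:.0f}/month")    # $240
```

Under those assumptions, routing frontend work through the cheaper tool saves about a third of the monthly spend. Swap in your own ratios; the direction of the result is what matters.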
The Figma round-trip: Codex's genuine workflow advantage
In February 2026, OpenAI and Figma announced a bidirectional MCP integration. The partnership lets Codex pull design context (layouts, component structures, design tokens) directly from Figma files. More significantly, it lets developers push running code back to the Figma canvas as editable layers.
That round-trip matters because it closes the gap between design intent and implementation. Demonstrations showed Codex extracting Figma links and creating prototypes with 80-90% accuracy against the original design system, with support for hot reloading and interactive testing.
Claude Code also has a Figma MCP integration. Figma announced Claude Code-to-Figma in the same week, a clear signal of Figma's AI-agnostic platform strategy. But the workflows are not equivalent. Builder.io's teardown of the Claude Code-Figma pipeline was honest about the limitation: once Claude generates a component, there's no way to iterate on the output visually. You can keep prompting, but each prompt carries the growing context, burns more tokens, and the model can only infer so much from screenshots.
Figma's CEO, Dylan Field, framed the bidirectional integration in terms of escaping "tunnel vision": code-first teams getting stuck iterating on the first version without exploring alternatives. The design canvas gives them room to zoom out.
That's a compelling pitch. But it also reveals the asymmetry. Codex users get a continuous loop: design-to-code-to-canvas-to-code. Claude Code users get a one-way trip with manual checkpoints.
Where Codex is actually winning: the evidence
I want to be specific about this, because "Codex is better at frontend" is a broad claim and I'm making a narrower one. The evidence comes from three independent sources.
A Medium post by Jacob Vendramin, a self-described Claude stan who switched to Codex, captured the pattern precisely: Codex appears to excel most at UI-based tasks compared to Claude, and usually gets it right first try. He also noted that Codex spends significantly more time thinking and reasoning before acting, which means it arrives at a more considered first output. Claude Code, by contrast, gets tasks done faster but often needs a second or third pass on UI work.
Dan Cleary tested GPT-5.3-Codex, Sonnet 4.6, and Gemini 3.1 head-to-head on vibe coding tasks in February 2026. The results were mixed overall. Sonnet won on immersive simulation quality and a ChatGPT clone build. But Codex took the win on backend implementation for a tower defence game. For landing page redesign (pure frontend), the results were closely contested, with a slight nod to Sonnet.
The UX Collective published a detailed workflow comparison by Iasonas Georgiadis. His conclusion: Claude Code excelled at structuring projects, setting up governance files, and defining workflows. Codex CLI specialised in code refinement, improving readability, accessibility, and performance.
The split is clean. Claude Code wins on tasks that require holding a large mental model of the codebase in context. Codex wins on tasks that require speed, visual fidelity, and iteration without burning through limits.
The productivity paradox
METR's randomised controlled trial from July 2025 (16 experienced developers, 246 real tasks) found something that rattled the AI coding community: developers using AI tools took 19% longer to complete tasks while believing they were 20% faster. That's a 39-percentage-point perception-reality gap.
The study has been contested, fairly. The confidence interval was wide. In February 2026, METR published an update with a larger cohort (800+ tasks, 57 developers) that found a much smaller slowdown of roughly 4%, with METR's own conclusion being that AI likely provides productivity benefits in early 2026.
But the original study's most interesting finding wasn't the headline number. It was the observation that developers spent much of their AI time cleaning up generated code. The tasks where AI felt most helpful were often different from the tasks where AI was measurably helpful.
This maps directly to the Claude Code vs Codex frontend question. Claude Code's thorough approach (reading 40 files, building a deep context model, producing a carefully reasoned output) is exactly the workflow that burns tokens and feels productive but might not be faster for a CSS tweak. Codex's leaner approach might be the more honest tool for frontend work, even if it's the "lower quality" one on benchmarks.
The concession: Claude Code is still the better engineer
I should be direct about what I'm not saying.
Claude Code's architectural reasoning is unmatched. On a 150K-node React repository, it maintained a 94% success rate identifying cross-component state bugs. For structural refactoring (extracting billing logic into a separate package without breaking a monolith), Claude Code's understanding of dependency graphs makes it significantly more reliable than Codex. These are not marginal advantages. They are structural ones.
The February 2026 vibe coding comparison by Dan Cleary is worth citing here. When he asked all three models to build a full-featured ChatGPT clone with streaming, cross-thread memory, file upload, and rich formatting, Sonnet 4.6 crushed it. One shot. Working streaming, message history, multimodal input. Codex got something running in the browser but basic message handling didn't work.
That's not a close contest.
So the picture is more nuanced than "Codex beats Claude Code at frontend." It's closer to: Codex beats Claude Code at the routine frontend work that makes up 80% of a typical web dev sprint (layout scaffolding, component styling, responsive tweaks). Claude Code beats Codex at the hard 20% (complex interactive builds, architectural reasoning, and anything that requires holding a large codebase in working memory).
The hybrid workflow that nobody's recommending
Several experienced developers have converged on the same pattern independently. One r/ClaudeCode user described it: "My global CLAUDE.md tells it to send diffs to Gemini and Codex for review before committing. High catch rate." Another formulation from the DEV Community analysis: "2026 power stack: Codex for keystroke, Claude Code for commits."
My version: Codex for the initial frontend scaffold, component styling, and Figma-to-code translation. Claude Code for the data layer, state management, and any refactoring that touches more than two files. Both tools running in parallel terminal windows.
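For concreteness, here is a sketch of what that division of labour might look like as a global instructions file. The filename, headings, and exact wording are mine, not from the Reddit post; treat it as a starting point, not a recipe.

```markdown
# ~/.claude/CLAUDE.md (hypothetical sketch)

## Division of labour
- Frontend scaffolding, component styling, Figma-to-code: hand off to Codex.
- Data layer, state management, any refactor touching more than two files:
  keep in Claude Code.

## Pre-commit review
- Before committing, export the diff (`git diff > /tmp/review.diff`)
  and ask a second model to review it for regressions.
- Do not commit until review flags are resolved.
```

The point is less the specific file than the habit: each tool gets the task class where it has the track record, and the other tool checks the work.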
At $20/month for Codex Plus and $20/month for Claude Code Pro, you're paying $40/month for a workflow that uses each tool where it's strongest. That's still less than a single Claude Code Max subscription at $100/month, and you get more total throughput across both tools than you'd get from either alone.
It's not elegant. But it ships.
The question nobody's asking
The developer community frames this as "which tool is better?" A single answer to a compound question. Every comparison post runs the same playbook: benchmarks table, pricing table, verdict. Pick one.
The more useful framing is: which layer of the stack are you working on right now?
For the backend and the architecture, Claude Code wins on reasoning depth. SWE-bench, OSWorld, the 94% cross-component bug detection rate. These are real, verified advantages that translate directly to fewer production incidents and cleaner codebases.
For the frontend, the thing your users see, click, and judge you by, Codex has been quietly outperforming for months. Better first-try accuracy on UI tasks. Leaner token consumption for iteration-heavy work. A bidirectional Figma integration that closes the design-to-code feedback loop.
My hot take isn't that Codex is a better coding tool. It isn't. My hot take is that in the specific domain where software meets human eyes, Claude Code has a design taste problem that no amount of benchmark scores can paper over. And Codex, the tool everyone dismisses as "the fast, slightly lower-quality one," has been silently eating that lunch.
The most productive developers in 2026 aren't picking sides. They're using both. Claude Code for the architecture. Codex for the interface.
Your frontend is not your benchmark score. And the users clicking through your dashboard at 3pm on a Tuesday don't care which model reasoned harder about your state management patterns.
They care if the button is where they expect it. They care if the typography is readable. They care if it loads fast.
Codex gets that right on the first try. That matters.
