Battle-tested in production. Build on it with confidence.
OpenAI Codex
The most capable autonomous coding agent on benchmarks, with both cloud and local modes, but it's locked inside ChatGPT's pricing and the cloud agent can't be steered mid-task.
Agentic · DevTool · LLM · Infrastructure
openai.com
Our Take
What It Is
OpenAI Codex is a two-part coding agent: a cloud-based system that runs tasks in isolated sandbox environments preloaded with your repository, and an open-source CLI (built in Rust) for local terminal workflows. It's powered by GPT-5.3-Codex, a model trained via reinforcement learning on real-world coding tasks. The model goes beyond writing code into full computer use, and sets industry highs on SWE-Bench Pro, Terminal-Bench, OSWorld, and GDPval.
Why It Matters
The cloud sandbox architecture is meaningfully different from local coding agents. Each task runs in its own isolated container, and you can fire off multiple tasks in parallel. The model is trained to run tests repeatedly until they pass, which closes the write-test-fix loop without human input. GPT-5.4, announced in March 2026, combines Codex capabilities with stronger reasoning, a sign that OpenAI is pushing the boundary on what autonomous coding agents can handle.
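The write-test-fix loop described above can be sketched in a few lines. This is a minimal illustration, not Codex's actual implementation: `RunResult`, `write_test_fix_loop`, `run_tests`, `apply_fix`, and `max_attempts` are all hypothetical names, and the two callbacks stand in for whatever the agent uses to execute the test suite and edit the code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    """Outcome of one test run (hypothetical structure)."""
    passed: bool
    output: str


def write_test_fix_loop(
    run_tests: Callable[[], RunResult],
    apply_fix: Callable[[str], None],
    max_attempts: int = 5,
) -> Optional[int]:
    """Run tests, feed failures back into a fix step, and repeat.

    Returns the attempt number on which the tests passed, or None if
    the attempt budget ran out first.
    """
    for attempt in range(1, max_attempts + 1):
        result = run_tests()
        if result.passed:
            return attempt
        # The failure output is the only signal the fix step receives.
        apply_fix(result.output)
    return None
```

The attempt cap matters in practice: without it, a loop that keeps applying the wrong kind of fix would never terminate, which is the same failure mode as a long task that cannot be steered.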
Key Developments
- Mar 2026: GPT-5.4 announced, combining GPT-5.3-Codex capabilities with stronger reasoning and tool use.
- Feb 2026: Codex macOS desktop app launched; rate limits temporarily doubled for all paid subscribers.
- Feb 2026: GPT-5.3-Codex released with computer-use capability, setting new highs on SWE-Bench Pro and Terminal-Bench.
- Late 2025: GPT-5.2-Codex released with software engineering optimisation via reinforcement learning.
- Mid 2025: Codex cloud agent launched with parallel task execution in sandboxed environments.
What to Watch
The fire-and-forget nature of the cloud agent is the biggest limitation. You can't course-correct it mid-task, so a wrong approach on a 20-minute task wastes all of that time. Watch whether OpenAI adds mid-task intervention. Also track the pricing: Codex requires a ChatGPT subscription ($20-$200/month), and rate limits vary unpredictably (30-150 messages per 5 hours on Plus). The open-source Codex CLI is a hedge worth considering.
Strengths
- Benchmark leader: GPT-5.3-Codex sets industry highs on SWE-Bench Pro and Terminal-Bench for complex, multi-step engineering tasks.
- Cloud sandbox architecture: Each task runs in its own isolated container. Parallel task execution lets you fire off multiple bugs or features simultaneously.
- Open-source CLI: Built in Rust, runs locally, works with your files directly. Different trust model from the cloud version.
- End-to-end task completion: GPT-5.3-Codex can operate a computer, deploy software, and complete multi-step professional workflows beyond pure coding.
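As a rough local analogy for the isolated, parallel task model listed above, the sketch below runs several tasks concurrently, each in its own scratch directory. It is an assumption-laden illustration only: temporary directories stand in for containers, and `run_isolated` and `run_all` are hypothetical helpers, not part of Codex or its CLI.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def run_isolated(task_name: str) -> str:
    """Run one task inside its own scratch directory (a stand-in for a sandbox)."""
    with tempfile.TemporaryDirectory(prefix=f"{task_name}-") as workdir:
        # Each task writes only inside its own directory, so parallel
        # tasks cannot overwrite each other's files.
        result_file = Path(workdir) / "result.txt"
        result_file.write_text(f"{task_name}: done")
        return result_file.read_text()


def run_all(task_names: List[str]) -> List[str]:
    """Dispatch every task at once and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max(1, len(task_names))) as pool:
        return list(pool.map(run_isolated, task_names))
```

Calling `run_all(["fix-login-bug", "add-export"])` returns one result string per task, in the order the tasks were submitted.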
Considerations
- Locked to ChatGPT subscriptions: No standalone Codex plan. Need ChatGPT Plus ($20/mo) for basic access or Pro ($200/mo) for serious usage.
- Fire-and-forget cloud agent: Can't course-correct while it's working. Wrong approach on a 20-minute task means waiting and starting over.
- No image input: The cloud agent can't process screenshots or mockups. Meaningful gap for frontend-heavy work.
- macOS only for desktop app: Windows support planned but unscheduled. CLI works cross-platform.