OpenAI shipped two models in February 2026, and they tell different stories. GPT-5.3-Codex is the next step in their agentic coding line, building on GPT-5.2-Codex with stronger reasoning and a 25% speed bump. Codex-Spark is something entirely new: a smaller model designed from scratch for real-time coding, running at over 1,000 tokens per second on Cerebras hardware.
Both models live inside the OpenAI Codex platform, a cloud-based coding environment where AI agents can read entire repositories, write code, run tests, and iterate on results. The platform works through a CLI tool and VS Code extensions, with everything running in sandboxed environments.
But the real story isn’t just about coding. These releases point to a direction in AI development that matters for any product built on language models, including voice AI agents.
GPT-5.3-Codex: Stronger Reasoning Meets Faster Execution
GPT-5.2-Codex already topped the charts when it launched. It was OpenAI’s first model built specifically for agentic coding, meaning it wasn’t just generating code snippets. It could work through multi-step software engineering tasks like a developer: reading code, understanding architecture, making changes, running tests, and fixing failures.
GPT-5.3-Codex keeps that agentic ability and adds better reasoning on top. The model posted top scores on SWE-Bench Pro, the harder version of the standard SWE-bench that uses more complex, real-world GitHub issues. It also leads on Terminal-Bench 2.0, which tests whether a model can operate effectively in a terminal environment: running commands, interpreting output, and deciding what to do next.
The 25% speed improvement over GPT-5.2-Codex matters more than it sounds. When an agentic coding model works on a task, it’s not making one prediction. It’s running through cycles of reading, reasoning, writing, testing, and revising. Each cycle involves multiple model calls. A 25% speedup applied to every one of those calls adds up across the whole task. A task that took 8 minutes now takes 6. Over hundreds of tasks per day, that’s hours saved.
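To make the arithmetic concrete, here is a back-of-envelope sketch in Python. Every constant in it (calls per task, seconds per call, tasks per day) is an illustrative assumption of ours, not a measured figure for either model.

```python
# Back-of-envelope: what a 25% per-call speedup does to a whole agentic
# task. All constants below are illustrative assumptions.

CALLS_PER_TASK = 40          # assumed read/reason/write/test model calls
OLD_SECONDS_PER_CALL = 12.0  # assumed average latency per call
SPEEDUP = 0.25               # the reported 25% per-call improvement

old_task = CALLS_PER_TASK * OLD_SECONDS_PER_CALL  # 480 s = 8 min
new_task = old_task * (1 - SPEEDUP)               # 360 s = 6 min

tasks_per_day = 100  # assumed daily workload
saved_hours = tasks_per_day * (old_task - new_task) / 3600

print(f"per task: {old_task / 60:.0f} min -> {new_task / 60:.0f} min")
print(f"saved over {tasks_per_day} tasks: {saved_hours:.1f} h")
```

Because the speedup applies uniformly to every call, the saving is linear rather than literally compounding, but with these assumptions it still adds up to more than three hours a day.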
For developers, the practical difference is that Codex can now handle larger refactoring jobs, longer test suites, and more complex multi-file changes without timing out or losing coherence partway through.
Codex-Spark: 1,000 Tokens Per Second and Real-Time Interaction
Codex-Spark is the more surprising release. OpenAI calls it the first model designed for real-time coding, and the numbers back that up.
Most large language models produce output at somewhere between 30 and 150 tokens per second. That’s fast enough to feel responsive for chat, but it creates noticeable delays when generating longer code blocks. You ask for a function, wait a few seconds, and the code streams in.
Codex-Spark runs at over 1,000 tokens per second. At that speed, a 200-line function (roughly 1,500–2,000 tokens) streams in within a couple of seconds. The experience shifts from “waiting for the AI to finish” to “the AI keeps up with your thinking.”
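As a rough sketch of why throughput changes how the tool feels, the helper below converts a decode rate into wall-clock generation time. The tokens-per-line figure is our assumption (code tends to average somewhere near 8–10 tokens per line), and the function name is ours, not part of any Codex API.

```python
def generation_seconds(lines: int, tokens_per_line: float, tokens_per_sec: float) -> float:
    """Wall-clock time to stream a code block at a given decode rate."""
    return lines * tokens_per_line / tokens_per_sec

# A 200-line function, assuming ~9 tokens per line (1,800 tokens total):
for rate in (50, 150, 1000):
    print(f"{rate:>5} tok/s -> {generation_seconds(200, 9, rate):5.1f} s")
```

Under these assumptions the same function takes 36 seconds at 50 tok/s, 12 seconds at 150 tok/s, and under two seconds at 1,000 tok/s, which is the difference between waiting and flow.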
This speed comes from a partnership with Cerebras and their Wafer Scale Engine 3. Unlike traditional GPU clusters where data moves between separate chips, the Cerebras architecture puts everything on a single wafer-scale chip. The result is dramatically lower latency and higher throughput for inference. OpenAI built Codex-Spark specifically to take advantage of this hardware.
At launch, Codex-Spark is a research preview available to ChatGPT Pro users. It comes with a 128K context window and handles text only (no image input). The model is smaller than the full GPT-5.3-Codex, so it trades some depth of reasoning for speed. Think of it as the model you’d use for rapid iteration: autocomplete on steroids, quick edits, and interactive pair programming. When you need the model to carefully reason through a complex architectural decision, you’d still reach for the full GPT-5.3-Codex.
How Coding AI Connects to Voice AI
This is a blog about an AI phone assistant, so why cover coding models? Because the tools that build AI products shape how fast those products improve.
Voice agents are complex software systems. A product like Safina involves real-time speech processing, language model inference, text-to-speech synthesis, telephony integration, conversation state management, and dozens of edge cases around accents, background noise, and caller behavior. The architecture behind real-time voice AI has many moving parts.
Building and maintaining that kind of system is hard. When a coding AI can read the entire codebase, understand how components interact, and produce correct changes across multiple files, the development team moves faster. Bug fixes happen in minutes instead of hours. New features get prototyped in a day instead of a week. Test coverage expands because the AI writes the tests too.
This isn’t theoretical. Teams that adopt agentic coding tools often report shipping 2-3x more code per week, with fewer regressions. For AI products specifically, that feedback loop is even more valuable. Every improvement to a voice agent needs testing with real conversations, real edge cases, and real telephony conditions. Faster development means more iterations, and more iterations mean a better product.
There’s also a parallel that goes beyond coding. Codex-Spark’s move to real-time interaction (1,000+ tokens/sec) mirrors exactly what’s happening in voice AI. Phone conversations can’t wait. When a caller asks a question, the AI needs to respond within a few hundred milliseconds or the conversation feels broken. The entire voice AI industry is chasing lower latency, from speech-to-text to text-to-speech to the language model in between.
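The same budget thinking applies on the voice side. The sketch below totals a hypothetical per-turn latency budget; every stage figure is an assumed round number for illustration, not a measurement of Safina or any specific pipeline.

```python
# Hypothetical per-turn latency budget for a phone-based voice agent.
# All stage figures are illustrative assumptions.

BUDGET_MS = 500  # rough point past which a pause starts to feel broken

stages_ms = {
    "speech-to-text (final transcript)": 150,
    "language model (first token)": 200,
    "text-to-speech (first audio)": 100,
    "telephony / network overhead": 50,
}

total = sum(stages_ms.values())
for stage, ms in stages_ms.items():
    print(f"{stage:<34} {ms:>4} ms")
print(f"{'total':<34} {total:>4} ms (budget: {BUDGET_MS} ms)")
```

Shaving the language-model stage, the part that faster inference hardware targets, frees budget for everything else in the pipeline.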
Both fields are converging on the same insight: AI that works in real time is a different product from AI that works in batch. A coding model at 100 tokens/sec is a tool you query. At 1,000 tokens/sec, it’s a collaborator. A voice model at 500ms latency is a robot you talk at. At 200ms latency, it’s a conversation partner.
The Cerebras hardware approach behind Codex-Spark is interesting for voice AI too. If specialized silicon can push language model inference to 1,000+ tokens per second for coding, similar hardware-software co-design could push voice AI latency even lower. We’re not there yet for phone conversations, but the direction is clear.
For a broader look at how AI model improvements affect business communication, see our comparison of AI voice solutions and our analysis of what the latest model releases mean for businesses.
Sources
- Introducing GPT-5.3-Codex - OpenAI
- Introducing GPT-5.3-Codex-Spark - OpenAI
- OpenAI Codex - OpenAI