OpenAI GPT-5.3-Codex and Codex-Spark: Real-Time Coding AI [2026]

OpenAI's GPT-5.3-Codex brings stronger reasoning to agentic coding. Codex-Spark hits 1000+ tokens/sec for real-time interaction. What it means for AI development.

David Schemm

OpenAI shipped two models in February 2026, and they tell different stories. GPT-5.3-Codex is the next step in their agentic coding line, building on GPT-5.2-Codex with stronger reasoning and a 25% speed bump. Codex-Spark is something entirely new: a smaller model designed from scratch for real-time coding, running at over 1,000 tokens per second on Cerebras hardware.

Both models live inside the OpenAI Codex platform, a cloud-based coding environment where AI agents can read entire repositories, write code, run tests, and iterate on results. The platform works through a CLI tool and VS Code extensions, with everything running in sandboxed environments.

But the real story isn’t just about coding. These releases point to a direction in AI development that matters for any product built on language models, including voice AI agents.

GPT-5.3-Codex: Stronger Reasoning Meets Faster Execution

GPT-5.2-Codex already topped the charts when it launched. It was OpenAI’s first model built specifically for agentic coding, meaning it wasn’t just generating code snippets. It could work through multi-step software engineering tasks like a developer: reading code, understanding architecture, making changes, running tests, and fixing failures.

GPT-5.3-Codex keeps that agentic ability and adds better reasoning on top. The model posted top scores on SWE-Bench Pro, the harder version of the standard SWE-bench that uses more complex, real-world GitHub issues. It also leads on Terminal-Bench 2.0, which tests whether a model can operate effectively in a terminal environment, running commands, interpreting output, and deciding what to do next.

The 25% speed improvement over GPT-5.2-Codex matters more than it sounds. When an agentic coding model works on a task, it’s not making one prediction. It’s running through cycles of reading, reasoning, writing, testing, and revising. Each cycle involves multiple model calls. A 25% speedup across all of those calls compounds. A task that took 8 minutes now takes closer to 6.5. Over hundreds of tasks per day, that’s hours saved.
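To make the compounding concrete, here is a back-of-envelope sketch. The call count and per-call time are made-up illustrative numbers, not measured figures:

```python
# Rough sketch (hypothetical numbers): how a per-call speedup adds up
# across the many sequential model calls in one agentic coding task.

def task_time(calls: int, seconds_per_call: float) -> float:
    """Total model time for a task made of sequential model calls."""
    return calls * seconds_per_call

baseline = task_time(calls=40, seconds_per_call=12.0)        # 480 s = 8 min
faster = task_time(calls=40, seconds_per_call=12.0 / 1.25)   # 25% faster per call

print(f"baseline: {baseline / 60:.1f} min, faster: {faster / 60:.1f} min")
# baseline: 8.0 min, faster: 6.4 min
```

The per-call saving looks small, but because an agentic task chains dozens of calls back to back, the whole run shortens by the same fraction.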

For developers, the practical difference is that Codex can now handle larger refactoring jobs, longer test suites, and more complex multi-file changes without timing out or losing coherence partway through.

Codex-Spark: 1,000 Tokens Per Second and Real-Time Interaction

Codex-Spark is the more surprising release. OpenAI calls it the first model designed for real-time coding, and the numbers back that up.

Most large language models produce output at somewhere between 30 and 150 tokens per second. That’s fast enough to feel responsive for chat, but it creates noticeable delays when generating longer code blocks. You ask for a function, wait a few seconds, and the code streams in.

Codex-Spark runs at over 1,000 tokens per second. At that speed, a 200-line function streams in within a couple of seconds. The experience shifts from “waiting for the AI to finish” to “the AI keeps up with your thinking.”
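A quick sketch of the arithmetic, assuming roughly 10 tokens per line of code (token counts vary by language and style):

```python
# Back-of-envelope: perceived wait for a ~200-line function (~2,000 tokens,
# assuming ~10 tokens per line) at different generation speeds.

def stream_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream a completion of the given length at a given rate."""
    return tokens / tokens_per_second

for rate in (50, 150, 1000):
    print(f"{rate:>5} tok/s -> {stream_seconds(2000, rate):.1f} s")
#    50 tok/s -> 40.0 s
#   150 tok/s -> 13.3 s
#  1000 tok/s -> 2.0 s
```

Going from the typical 50-150 tokens per second to 1,000 turns a coffee-break wait into something close to instantaneous.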

This speed comes from a partnership with Cerebras and their Wafer Scale Engine 3. Unlike traditional GPU clusters where data moves between separate chips, the Cerebras architecture puts everything on a single wafer-scale chip. The result is dramatically lower latency and higher throughput for inference. OpenAI built Codex-Spark specifically to take advantage of this hardware.

At launch, Codex-Spark is a research preview available to ChatGPT Pro users. It comes with a 128K context window and handles text only (no image input). The model is smaller than full GPT-5.3-Codex, so it trades some depth of reasoning for speed. Think of it as the model you’d use for rapid iteration, autocomplete on steroids, quick edits, and interactive pair programming. When you need the model to carefully reason through a complex architectural decision, you’d still reach for full GPT-5.3-Codex.
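One way this split could look in practice is a simple router that sends quick edits to the fast model and deeper work to the full model. The model identifiers and the `pick_model` helper below are illustrative assumptions for a sketch, not a documented OpenAI API:

```python
# Hypothetical routing between a fast, lighter model and a
# deeper-reasoning model. Names are illustrative, not real API identifiers.

FAST_MODEL = "codex-spark"     # low latency, lighter reasoning
DEEP_MODEL = "gpt-5.3-codex"   # slower, stronger multi-step reasoning

def pick_model(task: str) -> str:
    """Route quick, local edits to the fast model; everything else goes deep."""
    quick_keywords = ("rename", "autocomplete", "format", "quick fix")
    if any(keyword in task.lower() for keyword in quick_keywords):
        return FAST_MODEL
    return DEEP_MODEL

print(pick_model("Rename this variable across the file"))   # codex-spark
print(pick_model("Redesign the persistence layer"))         # gpt-5.3-codex
```

Real routing logic would likely weigh context size and task history rather than keywords, but the tradeoff is the same: speed for iteration, depth for architecture.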

How Coding AI Connects to Voice AI

This is a blog about an AI phone assistant, so why cover coding models? Because the tools that build AI products shape how fast those products improve.

Voice agents are complex software systems. A product like Safina involves real-time speech processing, language model inference, text-to-speech synthesis, telephony integration, conversation state management, and dozens of edge cases around accents, background noise, and caller behavior. The architecture behind real-time voice AI has many moving parts.

Building and maintaining that kind of system is hard. When a coding AI can read the entire codebase, understand how components interact, and produce correct changes across multiple files, the development team moves faster. Bug fixes happen in minutes instead of hours. New features get prototyped in a day instead of a week. Test coverage expands because the AI writes the tests too.

This isn’t theoretical. Teams that adopt agentic coding tools report shipping 2-3x more code per week, with fewer regressions. For AI products specifically, that feedback loop is even more valuable. Every improvement to a voice agent needs testing with real conversations, real edge cases, and real telephony conditions. Faster development means more iterations, and more iterations mean a better product.

There’s also a parallel that goes beyond coding. Codex-Spark’s move to real-time interaction (1,000+ tokens/sec) mirrors exactly what’s happening in voice AI. Phone conversations can’t wait. When a caller asks a question, the AI needs to respond within a few hundred milliseconds or the conversation feels broken. The entire voice AI industry is chasing lower latency, from speech-to-text to text-to-speech to the language model in between.
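As a rough illustration, a voice agent’s response time can be thought of as a sum of per-stage latencies, each of which has to be squeezed to keep the total within a few hundred milliseconds. The stage numbers below are hypothetical, not Safina’s actual pipeline measurements:

```python
# Illustrative latency budget for one turn of a real-time voice agent.
# All figures are hypothetical round numbers, not measured values.

budget_ms = {
    "speech_to_text": 100,        # transcribe the caller's utterance
    "llm_first_token": 150,       # language model starts responding
    "text_to_speech_start": 80,   # first audible audio is synthesized
    "network_and_telephony": 70,  # transport overhead on the phone line
}

total = sum(budget_ms.values())
print(f"total: {total} ms")  # total: 400 ms
```

Shaving any single stage only helps so much; like the agentic coding loop, the stages are sequential, so every component has to get faster for the conversation to feel natural.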

Both fields are converging on the same insight: AI that works in real-time is a different product than AI that works in batch. A coding model at 100 tokens/sec is a tool you query. At 1,000 tokens/sec, it’s a collaborator. A voice model at 500ms latency is a robot you talk at. At 200ms latency, it’s a conversation partner.

The Cerebras hardware approach behind Codex-Spark is interesting for voice AI too. If specialized silicon can push language model inference to 1,000+ tokens per second for coding, similar hardware-software co-design could push voice AI latency even lower. We’re not there yet for phone conversations, but the direction is clear.

For a broader look at how AI model improvements affect business communication, see our comparison of AI voice solutions and our analysis of what the latest model releases mean for businesses.
