OpenAI unveils ultra-fast “Codex-Spark” model for real-time coding
Written by Joseph Nordqvist · February 13, 2026 at 2:42 AM UTC
3 min read
OpenAI has launched a research preview of GPT-5.3-Codex-Spark, a new version of its Codex coding model designed to respond quickly enough for real-time software work.[1]
The company says the model is optimized for interactive tasks such as quick edits, refactoring, and rapid iteration, and that it can produce more than 1,000 tokens per second when served on ultra-low-latency hardware.
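For scale, a throughput figure like this translates directly into wall-clock time. A minimal sketch of the arithmetic, using illustrative token counts (the example sizes below are assumptions, not figures from OpenAI):

```python
# Illustrative arithmetic: what a sustained 1,000 tokens/second means in practice.
# The token counts below are hypothetical examples, not measured workloads.

TOKENS_PER_SECOND = 1_000

def generation_time(num_tokens: int, tps: float = TOKENS_PER_SECOND) -> float:
    """Seconds needed to stream `num_tokens` at a steady `tps` rate."""
    return num_tokens / tps

# A ~300-token refactoring diff would stream in roughly a third of a second:
print(generation_time(300))  # 0.3
```

At that rate, most quick-edit responses finish faster than a developer can read them, which is the regime OpenAI is targeting with "real-time" coding.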
OpenAI has emphasized that Spark is a “research preview,” meaning capabilities, access, and limits may change as it gathers feedback and usage data.
At launch, OpenAI said Codex-Spark supports text-only input with a 128k context window, and plans to expand the “ultra-fast” line with larger models, longer context, and multimodal input over time.
Prior to the official announcement, OpenAI CEO Sam Altman posted on X: "We have a special thing launching to Codex users on the Pro plan later today. It sparks joy for me."

In its announcement, OpenAI described Codex-Spark as a smaller model built specifically for a tight back-and-forth workflow, where delays can break concentration.
OpenAI said the system benefits from changes across the serving pipeline, including reductions in “time-to-first-token” and per-token overhead, alongside a shift toward persistent connections for lower-latency interactions.
The company positioned Spark as complementary to longer-running Codex work, where an agent can spend more time reasoning through bigger tasks.
Codex-Spark is also the first public product tied to OpenAI’s recent partnership with Cerebras, a chip company known for wafer-scale processors.
Cerebras said Codex-Spark runs on its Wafer Scale Engine 3 systems and that the model is rolling out as a research preview to ChatGPT Pro users across the Codex app, the Codex CLI, and a VS Code extension, with API access beginning with a limited set of partners.[2]
OpenAI and Cerebras announced their broader relationship in January, describing it as an effort to add ultra-low latency compute to OpenAI’s platform.[3]
The Spark release follows OpenAI’s introduction of the Codex app for macOS earlier this month, which the company describes as a hub for working with coding agents across different environments, including the CLI and IDE integrations.
The release comes as AI coding tools become more common in professional software development, with companies competing to make assistants not only more capable, but also faster and more “live” in how they collaborate with developers.
AI performance is increasingly tied to infrastructure and hardware choices, not only model training. OpenAI’s use of a Cerebras-based serving tier signals that “fast enough to feel instant” may require specialized deployment paths, especially as users expect agents to operate continuously inside their tooling.
Written by
Joseph Nordqvist
Joseph founded AI News Home in 2026. He studied marketing and later completed a postgraduate program in AI and machine learning (business applications) at UT Austin’s McCombs School of Business. He is now pursuing an MSc in Computer Science at the University of York.
This article was written by the AI News Home editorial team with the assistance of AI-powered research and drafting tools. All analysis, conclusions, and editorial decisions were made by human editors. Read our Editorial Guidelines.
References
1.
2. Introducing OpenAI GPT-5.3-Codex-Spark Powered by Cerebras — James Wang, Cerebras, February 12, 2026
3. OpenAI Partners with Cerebras to Bring High-Speed Inference to the Mainstream — Andrew Feldman, Cerebras, January 16, 2026