OpenAI unveils ultra-fast “Codex-Spark” model for real-time coding
Written by Joseph Nordqvist · February 13, 2026 at 2:42 AM UTC
3 min read
OpenAI has launched a research preview of GPT-5.3-Codex-Spark, a new version of its Codex coding model designed to respond quickly enough for real-time software work.[1]
The company says the model is optimized for interactive tasks such as quick edits, refactoring, and rapid iteration, and that it can produce more than 1,000 tokens per second when served on ultra-low-latency hardware.
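For scale, a throughput figure like this translates directly into wall-clock time. A minimal sketch of the arithmetic, using illustrative token counts (the example sizes below are assumptions, not figures from OpenAI):

```python
# Illustrative arithmetic: what a sustained 1,000 tokens/second means in practice.
# The token counts below are hypothetical examples, not measured workloads.

TOKENS_PER_SECOND = 1_000

def generation_time(num_tokens: int, tps: float = TOKENS_PER_SECOND) -> float:
    """Seconds needed to stream `num_tokens` at a steady `tps` rate."""
    return num_tokens / tps

# A ~300-token refactoring diff would stream in roughly a third of a second:
print(generation_time(300))  # 0.3
```

At that rate, most quick-edit responses finish faster than a developer can read them, which is the regime OpenAI is targeting with "real-time" coding.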
OpenAI has emphasized that Spark is a “research preview,” meaning capabilities, access, and limits may change as it gathers feedback and usage data.
At launch, OpenAI said Codex-Spark supports text-only input with a 128k context window, and plans to expand the “ultra-fast” line with larger models, longer context, and multimodal input over time.
Prior to the official announcement, OpenAI CEO Sam Altman posted on X: "We have a special thing launching to Codex users on the Pro plan later today. It sparks joy for me."

In its announcement, OpenAI described Codex-Spark as a smaller model built specifically for a tight back-and-forth workflow, where delays can break concentration.
OpenAI said the system benefits from changes across the serving pipeline, including reductions in “time-to-first-token” and per-token overhead, alongside a shift toward persistent connections for lower-latency interactions.
The company positioned Spark as complementary to longer-running Codex work, where an agent can spend more time reasoning through bigger tasks.
Codex-Spark is also the first public product tied to OpenAI’s recent partnership with Cerebras, a chip company known for wafer-scale processors.
Cerebras said Codex-Spark runs on its Wafer Scale Engine 3 systems and that the model is rolling out as a research preview to ChatGPT Pro users across the Codex app, the Codex CLI, and a VS Code extension, with API access beginning with a limited set of partners.[2]
OpenAI and Cerebras announced their broader relationship in January, describing it as an effort to add ultra-low latency compute to OpenAI’s platform.[3]
The Spark release follows OpenAI’s introduction of the Codex app for macOS earlier this month, which the company describes as a hub for working with coding agents across different environments, including the CLI and IDE integrations.
The release comes as AI coding tools become more common in professional software development, with companies competing to make assistants not only more capable, but also faster and more “live” in how they collaborate with developers.
AI performance is increasingly tied to infrastructure and hardware choices, not only model training. OpenAI’s use of a Cerebras-based serving tier signals that “fast enough to feel instant” may require specialized deployment paths, especially as users expect agents to operate continuously inside their tooling.
Written by
Joseph Nordqvist
Joseph founded AI News Home in 2026. He studied marketing and later completed a postgraduate program in AI and machine learning (business applications) at UT Austin’s McCombs School of Business. He is now pursuing an MSc in Computer Science at the University of York.
This article was written by the AI News Home editorial team with the assistance of AI-powered research and drafting tools. All analysis, conclusions, and editorial decisions were made by human editors. Read our Editorial Guidelines.
References
1.
2. Introducing OpenAI GPT-5.3-Codex-Spark Powered by Cerebras — James Wang, Cerebras, February 12, 2026
3. OpenAI Partners with Cerebras to Bring High-Speed Inference to the Mainstream — Andrew Feldman, Cerebras, January 16, 2026