GPT-5.4 arrives with native computer use, 1M context, and a new tool search capability

OpenAI released GPT-5.4 today, billing it as its most capable model to date for professional work, coding, and automated tasks. Two days earlier, it replaced ChatGPT's everyday model with GPT-5.3 Instant, responding to widespread complaints about the previous version's condescending tone.

All benchmark results below come from OpenAI's own testing and have not been independently verified.

GPT-5.4: what's new and what it means

GPT-5.4 is available in ChatGPT as GPT-5.4 Thinking, in the API, and in Codex.

[Figure: OpenAI's benchmark comparison for GPT-5.4, published March 5, 2026. The table shows GPT-5.4 Thinking and GPT-5.4 Pro results alongside GPT-5.3 Codex, GPT-5.2 Thinking, Claude Opus 4.6, and Gemini 3.1 Pro across eight evaluations including OSWorld-Verified, GDPval, BrowseComp, SWE-Bench Pro, GPQA Diamond, FrontierMath, and Toolathlon. All models were run at maximum available reasoning effort. Anthropic and Google figures are grayed out.]

OpenAI's central claim for GPT-5.4 is that it is better at doing real work: not just answering questions, but completing multi-step professional tasks such as building spreadsheets, drafting documents, and using software.

On a benchmark that asked models to produce actual work products across 44 occupations, OpenAI reports that GPT-5.4 matched or exceeded human professionals in 83% of comparisons, up from 71% for the previous model. The same benchmark puts Claude Opus 4.6 at 78%, though that figure, notably, also comes from OpenAI's testing.

The other major addition is native computer use. GPT-5.4 is the first general-purpose OpenAI model that can directly operate a computer (clicking, typing, navigating desktop apps and browsers) without relying on a separate specialized model to handle that layer. OpenAI reports a 75% success rate on a desktop navigation benchmark, which it says surpasses both GPT-5.2 (47.3%) and a human baseline (72.4%) from the original benchmark paper. Anthropic's Claude Opus 4.6 scores 72.7% on the same test, per OpenAI's figures.

In ChatGPT, GPT-5.4 Thinking now outlines its approach before it starts working on a longer task, and users can interrupt mid-response to redirect it. These changes are available now on web and Android, with iOS coming soon. The model also includes “experimental” support for up to 1 million tokens of context in the API and Codex, matching context windows offered by Anthropic and Google.

OpenAI also says GPT-5.4 produces fewer errors: individual factual claims are 33% less likely to be false compared to GPT-5.2, and full responses are 18% less likely to contain any errors. These figures come from internal evaluations.
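Both percentages read as relative reductions, not absolute error rates, so what they mean in practice depends on a baseline the announcement does not supply. As a rough illustration only, the sketch below applies the two reported reductions to hypothetical GPT-5.2 baseline rates. The baseline values and the function name are assumptions made for the example, not figures from OpenAI.

```python
def reduced_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.33 means '33% less likely') to a baseline rate."""
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical GPT-5.2 baselines; the article gives no actual baseline rates.
baseline_claim_error = 0.10      # assumed share of individual claims that are false
baseline_response_error = 0.30   # assumed share of responses with at least one error

claim_error = reduced_rate(baseline_claim_error, 0.33)
response_error = reduced_rate(baseline_response_error, 0.18)

print(f"claims false:         {baseline_claim_error:.1%} -> {claim_error:.1%}")
print(f"responses with error: {baseline_response_error:.1%} -> {response_error:.1%}")
```

Under these assumed baselines the absolute improvement is a few percentage points; a lower real baseline would make the absolute change smaller still.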

One of the first external data points comes from CodeRabbit, an AI code review tool that said it had early access to GPT-5.4 and tested it against its own evaluation harness. The company reported the model performed well on agentic coding and review tasks, and observed that it uses tokens more efficiently than GPT-5.3-Codex on simple tasks, though it noted that token usage rises on more complex ones as the model's thinking traces grow longer.

That trade-off adds some texture to OpenAI's efficiency claims, which don't distinguish between task types. CodeRabbit has a commercial relationship with the AI developer ecosystem, so its findings are best read as an early signal rather than independent validation.

Availability: GPT-5.4 Thinking is rolling out today to ChatGPT Plus, Team, and Pro users, replacing GPT-5.2 Thinking. GPT-5.4 Pro is available on Pro and Enterprise plans. Free tier users also get GPT-5.4, but only when auto-routed. GPT-5.2 Thinking will be retired June 5, 2026. In the API, the model IDs are gpt-5.4 for GPT-5.4 and gpt-5.4-pro for GPT-5.4 Pro. API pricing is as follows:

Model	Input	Output
gpt-5.2	$1.75 / M tokens	$14 / M tokens
gpt-5.4	$2.50 / M tokens	$15 / M tokens
gpt-5.2-pro	$21 / M tokens	$168 / M tokens
gpt-5.4-pro	$30 / M tokens	$180 / M tokens
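To see what the price change means for a given workload, the rates in the table above can be plugged into a simple per-request estimate. This is a minimal sketch that assumes flat per-token rates exactly as listed; it ignores any caching discounts or long-context pricing, which the article does not describe. The function name and the example token counts are hypothetical.

```python
# Per-million-token API prices in USD (input, output), copied from the table above.
PRICES = {
    "gpt-5.2": (1.75, 14.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.2-pro": (21.00, 168.00),
    "gpt-5.4-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: a long prompt with a comparatively short answer.
input_tokens, output_tokens = 200_000, 20_000
old = request_cost("gpt-5.2", input_tokens, output_tokens)
new = request_cost("gpt-5.4", input_tokens, output_tokens)

print(f"gpt-5.2: ${old:.2f}")
print(f"gpt-5.4: ${new:.2f}")
print(f"change:  {100 * (new / old - 1):.1f}%")
```

Because input prices rose proportionally more than output prices (about 43% versus about 7% for the non-Pro models), input-heavy workloads see the larger increase.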

Faster coding

In Codex, with /fast mode enabled, OpenAI says GPT-5.4 runs 1.5x faster with the same intelligence and reasoning.

GPT-5.3 Instant: fixing the "cringe"

The update to ChatGPT's everyday model, the one most users interact with, was driven almost entirely by user frustration.

GPT-5.2 Instant had a habit of responding to ordinary questions as if the person asking were in some kind of distress, offering unsolicited reassurances and reminders to calm down even when the user was just looking for information. Users described feeling condescended to, and the complaints spread widely enough that OpenAI acknowledged them directly; its own announcement tweet said the new model "reduces the cringe."

GPT-5.3 Instant is designed to skip the preamble and answer the question. It also handles web search results better, synthesizing what it finds with its own knowledge rather than returning loose lists of links. OpenAI says it's better at creative writing too, producing prose that conveys emotion through specific detail rather than just stating it.

Whether these changes hold up in day-to-day use is something users can now judge for themselves: GPT-5.3 Instant is available to all ChatGPT users and in the API as gpt-5.3-chat-latest. GPT-5.2 Instant will be retired June 3, 2026.
