Claude's 1M context window is now generally available with no long-context pricing premium

Written by Joseph Nordqvist · March 14, 2026 at 12:27 AM UTC

5 min read

Anthropic has moved the 1 million token context window for Claude Opus 4.6 and Sonnet 4.6 from beta to general availability, and eliminated the long-context pricing premium entirely. A 900,000-token request is now billed at the same per-token rate as a 9,000-token one.

The announcement, made on 13 March 2026, represents two changes rolled into one. First, the 1M context window no longer requires a beta header — requests exceeding 200,000 tokens work automatically on the API. Second, and more consequentially for developers, the tiered pricing structure that applied during the beta period has been removed.

The pricing change, explained

During the beta, any API request exceeding 200,000 input tokens triggered premium rates: 2x on input tokens and 1.5x on output tokens. For Opus 4.6, that meant $10 per million input tokens and $37.50 per million output tokens once the threshold was crossed. The premium applied to all tokens in the request, not just those above 200K, creating a sharp cost cliff at the boundary.

That pricing structure is now gone. Standard rates apply across the full 1M window: $5/$25 per million tokens for Opus 4.6 and $3/$15 for Sonnet 4.6, regardless of request size. For developers who were already working with long contexts, this halves input costs and cuts output costs by a third.
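To see the size of the cliff the old structure created, here is a back-of-the-envelope comparison of the two schemes using the Opus 4.6 rates quoted above. The function names are illustrative, not part of any SDK:

```python
# Opus 4.6 rates from the article, in dollars per million tokens.
STANDARD_IN, STANDARD_OUT = 5.00, 25.00
PREMIUM_IN, PREMIUM_OUT = 10.00, 37.50  # beta-era 2x / 1.5x long-context rates

def beta_cost(input_tokens: int, output_tokens: int) -> float:
    """Beta pricing: premium rates on ALL tokens once input exceeds 200K."""
    if input_tokens > 200_000:
        in_rate, out_rate = PREMIUM_IN, PREMIUM_OUT
    else:
        in_rate, out_rate = STANDARD_IN, STANDARD_OUT
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

def ga_cost(input_tokens: int, output_tokens: int) -> float:
    """GA pricing: standard rates across the full 1M window."""
    return input_tokens / 1e6 * STANDARD_IN + output_tokens / 1e6 * STANDARD_OUT

# One token over the threshold used to nearly double the bill for the
# whole request; under GA pricing the cost curve is smooth.
print(f"{beta_cost(200_000, 4_000):.2f}")  # just under the cliff
print(f"{beta_cost(200_001, 4_000):.2f}")  # one token over, beta pricing
print(f"{ga_cost(200_001, 4_000):.2f}")    # same request, GA pricing
```

On a 900K-input / 50K-output request, the beta scheme billed $10.88 against $5.75 at GA rates — the halved input cost and one-third output reduction described above.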

Existing code that includes the beta header will continue to function — Anthropic has maintained backward compatibility by simply ignoring the header rather than returning an error.
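In practice that means no client-side migration is required. A minimal sketch of the headers involved — the beta header value shown is the one Anthropic documented for the earlier 1M-context beta, and the API key is a placeholder:

```python
# Beta-era requests opted in to 1M context via an `anthropic-beta` header.
# Post-GA, the server ignores this header instead of rejecting the request,
# so old code keeps working unchanged.
headers = {
    "x-api-key": "sk-ant-...",                   # placeholder, not a real key
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
    "anthropic-beta": "context-1m-2025-08-07",   # now a no-op, safe to leave in
}

# Dropping it is optional cleanup, not a required migration step.
headers.pop("anthropic-beta", None)
print("anthropic-beta" in headers)  # False
```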

What else changed

Beyond the pricing, two other updates shipped alongside the GA announcement.

Media limits have increased sixfold. Requests can now include up to 600 images or PDF pages, up from the previous limit of 100. This expanded media limit is available on the Claude Platform, Microsoft Foundry, and Google Cloud's Vertex AI, with Amazon Bedrock support coming soon.
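For illustration, a client-side sketch of what the new cap means when packing scanned pages into a single request. The content-block shape follows the Messages API's base64 image format; the cap check itself is local bookkeeping, and the helper name is made up:

```python
import base64

MAX_MEDIA_ITEMS = 600  # GA per-request media limit, up from 100

def build_image_blocks(pages: list[bytes]) -> list[dict]:
    """Wrap raw PNG page bytes as Messages API image content blocks,
    refusing to exceed the per-request media cap."""
    if len(pages) > MAX_MEDIA_ITEMS:
        raise ValueError(
            f"{len(pages)} media items exceeds the {MAX_MEDIA_ITEMS}-item cap"
        )
    return [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(page).decode("ascii"),
            },
        }
        for page in pages
    ]

# A 600-page scan now fits in one request instead of six.
blocks = build_image_blocks([b"\x89PNG fake page"] * 600)
print(len(blocks))  # 600
```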

For Claude Code users on Max, Team, and Enterprise plans, Opus 4.6 sessions now default to the full 1M context window automatically. Anthropic says this means fewer compactions — the process by which the system summarises older context to free up space — and more of the conversation kept intact. During the beta, using the 1M context window in Claude Code consumed extra usage.

Long-context recall benchmarks

A large context window is only useful if the model can still retrieve and reason over information placed throughout it. Anthropic cites two benchmarks to support the claim that its models hold up at scale.

Opus 4.6 scores 78.3% on MRCR v2, a needle-in-a-haystack style benchmark that hides multiple pieces of key information within a 1M token context and asks the model to find all of them. Sonnet 4.6 scores 68.4% on GraphWalks BFS at the same context length. Both are reported as the highest scores among frontier models at full context length, per Anthropic's own measurements.

For comparison, Anthropic previously reported that Sonnet 4.5 scored just 18.5% on the 8-needle 1M variant of MRCR v2, making the Opus 4.6 result a significant generational improvement. (The Opus 4.6 launch announcement in February cited 76%, which the Opus 4.6 system card identifies as the max-effort thinking score; 78.3% is the same model's score with a 64k thinking budget. The GA blog uses the higher figure.) These are Anthropic's self-reported figures and have not been independently verified by a third party at the time of writing.

It is worth noting that benchmark recall does not automatically translate to equivalent performance on real-world synthesis tasks. Princeton NLP's HELMET benchmark (ICLR 2025), which evaluated over 50 models from 8K to 128K tokens across seven task categories, found that even frontier models degrade significantly on complex tasks like re-ranking and generation with citations as context length increases — though simpler retrieval tasks held up better. Whether Claude maintains comparable quality on production workloads requiring synthesis across 500K+ tokens remains an open question that needle-in-a-haystack benchmarks do not fully address.

Competitive context

Google's Gemini 2.5 Pro offers a 1M token context window but continues to apply premium pricing above 200K tokens. OpenAI's GPT-5.2, launched in December 2025, supports a 400,000-token context window at $1.75/$14 per million tokens. The newer GPT-5.4 introduced a 1.05M token window but applies a 2x input and 1.5x output premium for prompts exceeding 272K tokens — essentially the same pricing structure Anthropic just dropped. By removing the long-context premium entirely, Anthropic has made a pricing move that neither of its direct competitors has matched at the time of this announcement.

The practical significance will depend on how many developers were actually hitting the 200K threshold regularly. For the subset building agentic systems, processing large codebases, or working with lengthy document sets — the use cases Anthropic highlights in its announcement — the cost reduction is material.

Availability

The 1M context window at standard pricing is available today on the Claude Platform, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. The increased media limit of 600 images or PDF pages is available on all of these except Amazon Bedrock, where it is listed as coming soon. No usage tier restrictions were mentioned in the GA announcement, suggesting the previous tier 4 requirement from the beta period has been lifted. Anthropic's API documentation had not yet been updated to reflect the GA pricing at the time of writing, however, so developers may still see references to beta pricing and tier requirements in the docs.


Written by

Joseph Nordqvist

Joseph founded AI News Home in 2026. He studied marketing and later completed a postgraduate program in AI and machine learning (business applications) at UT Austin’s McCombs School of Business. He is now pursuing an MSc in Computer Science at the University of York.


This article was written by the AI News Home editorial team with the assistance of AI-powered research and drafting tools. All analysis, conclusions, and editorial decisions were made by human editors. Read our Editorial Guidelines

References

  1. Introducing Claude Opus 4.6, Anthropic, February 5, 2026
  2. Introducing GPT-5.2, OpenAI, December 11, 2025
  3. HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly, Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen, ICLR 2025, March 6, 2025
  4. Claude Opus 4.6 system card, Anthropic, February 5, 2026
  5. Claude Sonnet 4.6 system card, Anthropic, February 17, 2026
