Anthropic releases Claude Sonnet 4.6, bringing near-flagship performance to its mid-tier model

Anthropic released Claude Sonnet 4.6 on Tuesday, upgrading its mid-tier AI model with improved coding, computer use, and long-context reasoning capabilities. The company says the model approaches the performance of its flagship Opus line on several benchmarks, while remaining priced at $3 per million input tokens and $15 per million output tokens (unchanged from Sonnet 4.5). ^[1]

The release comes twelve days after the launch of Claude Opus 4.6 on February 5 ^[2], and represents the first Sonnet upgrade since version 4.5 arrived on September 29, 2025 .^[3] Sonnet 4.6 is now the default model for Free and Pro plan users in claude.ai and Claude Cowork.^[1]

YouTube video

YouTube

What changed

Anthropic describes Sonnet 4.6 as "a full upgrade of the model's skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design."

The model features a 1 million token context window in beta. ^[4]

On the coding side, Anthropic reports that early testers in Claude Code preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. Users reported that the model more effectively read context before modifying code and consolidated shared logic rather than duplicating it. ^[1]

Users even preferred Sonnet 4.6 over Claude Opus 4.5 (Anthropic's previous version of its flagship model from November 2025) 59% of the time. Testers rated the model as significantly less prone to overengineering and "laziness," and meaningfully better at instruction following. They reported fewer false claims of success, fewer hallucinations, and more consistent follow-through on multi-step tasks. ^[1]

The model's training data knowledge cutoff is May 2025. Anthropic's system card states that Sonnet 4.6 "was trained on a proprietary mix of publicly available information from the internet up to May 2025". ^[5]

Benchmarks

Anthropic published a benchmark comparison table as an image in its announcement.^[1]

claude sonnet 4.6 benchmark table — Image: Anthropic

On SWE-bench Verified, a standard measure of real-world software engineering ability, Sonnet 4.6 scored 79.6% — close to Opus 4.6's 80.8%. Anthropic's footnotes note that this score was averaged over 10 trials, and that a prompt modification yielded 80.2%. ^[1]

On ARC-AGI-2, a test designed to measure general reasoning, Anthropic's benchmark table shows Sonnet 4.6 at 58.3%. Anthropic's footnotes clarify that this reflects "max effort," while a "high effort" run achieved 60.4%, which is an unusual case where lower effort produced a better result on this particular benchmark.

These benchmarks are self-reported by Anthropic. The company provides methodological footnotes for each benchmark, including configuration details and limitations. Independent verification may produce different results.

Computer use

One of the more notable aspects of Sonnet 4.6 is its progress on computer use, which is the ability of a model to interact with software by clicking, typing, and navigating interfaces, without relying on APIs.

Anthropic first introduced this capability in October 2024, calling it at the time "still experimental — at times cumbersome and error-prone".

The company says its Sonnet models have made "steady gains" on OSWorld across the sixteen months since.

Anthropic's chart in the announcement (image shown above) shows the full trajectory: Sonnet 3.5 scored 14.9%, Sonnet 3.7 reached 28.0%, Sonnet 4 hit 42.2%, Sonnet 4.5 climbed to 61.4% in October 2025, and Sonnet 4.6 now scores 72.5%.

Anthropic notes that scores prior to Sonnet 4.5 were measured on the original OSWorld benchmark, while scores from Sonnet 4.5 onward use OSWorld-Verified, an updated version released in July 2025.^[1]

Anthropic acknowledges that the model "certainly still lags behind the most skilled humans at using computers." However, the company says early users are seeing human-level capability in tasks like navigating complex spreadsheets, filling out multi-step web forms, and pulling information together across multiple browser tabs [1].

Enterprise response

Several companies with early access provided testimonials in Anthropic's official announcement [1].

Box CTO Ben Kus said the model "demonstrated significant improvements, outperforming Claude Sonnet 4.5 in heavy reasoning Q&A by 15 percentage points." Databricks CTO of Neural Networks Hanlin Tang said Sonnet 4.6 "matches Opus 4.6 performance on OfficeQA," which measures how well a model reads enterprise documents, pulls facts, and reasons from them. Pace CEO Jamie Cuffe said the model "hit 94% on our insurance benchmark, making it the highest-performing model we've tested for computer use". ^[1]

Cursor CEO Michael Truell described it as "a notable improvement over Sonnet 4.5 across the board, including long-horizon tasks." GitHub VP of Product Joe Binder said the model is "excelling at complex code fixes, especially when searching across large codebases is essential." Windsurf CEO Jeff Wang said that "for the first time, Sonnet brings frontier-level reasoning in a smaller and more cost-effective form factor". ^[1]

These are company-provided testimonials published in Anthropic's announcement, not independent evaluations.

Ecosystem adoption

Within hours of the announcement, all three major cloud platforms confirmed Sonnet 4.6 availability. Amazon Web Services announced the model on Amazon Bedrock, calling it "Anthropic's most advanced computer use model".

Google Cloud Tech announced availability on Vertex AI.

And Microsoft Azure announced availability in Microsoft Foundry.

The developer tools ecosystem moved just as quickly. GitHub announced that Sonnet 4.6 is generally available and rolling out in GitHub Copilot, saying early testing shows it "excels on agentic coding" and "is particularly successful in search operations".

Cursor confirmed same-day availability, with its own benchmarks showing the model as "a notable improvement over Sonnet 4.5 on longer tasks, but below Opus 4.6 for intelligence," a more measured assessment than the testimonial Cursor CEO Michael Truell provided for Anthropic's blog.

Kiro, Amazon's AI-powered IDE, also added Sonnet 4.6, noting that the model "delivers reasoning closer to Opus and improves token efficiency over Sonnet 4.5".

Warp, the AI-native terminal, added Sonnet 4.6 with extended thinking and a "Max" mode for high-intelligence tasks.

Beyond developer tools, Figma announced Sonnet 4.6 support in Figma Make, its AI design tool.

Notion added the model to its AI model picker.

Perplexity made Sonnet 4.6 available to all Pro and Max subscribers, with a "Thinking" toggle enabled.

The breadth and speed of third-party adoption is notable. All ten announcements were made within hours of Anthropic's blog post on the same day, suggesting coordinated early access and pre-launch integration work across the ecosystem.

Free tier and availability

Anthropic has upgraded its free tier to default to Sonnet 4.6, now including features previously reserved for paid plans: file creation, connectors, skills, and context compaction. The model is available through the Claude API using the identifier claude-sonnet-4-6, as well as through Claude Code, Claude Cowork, and all major cloud platforms.

On the developer platform, Sonnet 4.6 supports both adaptive thinking and extended thinking, with context compaction available in beta. Context compaction automatically summarizes older context as conversations approach limits, increasing effective context length.

Anthropic also announced that its web search and web fetch tools now automatically write and execute code to filter and process search results, keeping only relevant content in context. Several additional developer tools have moved to general availability alongside this release: code execution, memory, programmatic tool calling, tool search, and tool use examples. ^[1]

For Claude in Excel users, the add-in now supports MCP connectors, enabling Claude to pull external data from tools like S&P Global, FactSet, PitchBook, Moody's, LSEG, and Daloopa without leaving the spreadsheet. This is available on Pro, Max, Team, and Enterprise plans. ^[1]

Pricing in context

Sonnet 4.6 remains priced at $3 per million input tokens and $15 per million output tokens. Anthropic's flagship Opus 4.6 is priced at $5 per million input tokens and $25 per million output tokens. Sonnet 4.6 is therefore roughly 60% the cost of the current Opus model.

Anthropic stated in its announcement that "performance that would have previously required reaching for an Opus-class model — including on real-world, economically valuable office tasks — is now available with Sonnet 4.6."

The company also noted that "Opus 4.6 remains the strongest option for tasks that demand the deepest reasoning, such as codebase refactoring, coordinating multiple agents in a workflow, and problems where getting it just right is paramount".

Safety

The company reported that Sonnet 4.6 is a "major improvement" over Sonnet 4.5 in resistance to prompt injection attacks during computer use, and performs similarly to Opus 4.6 in this area.

Business context

The release arrives during a period of rapid growth for Anthropic. On February 12, the company announced a $30 billion Series G funding round at a $380 billion post-money valuation. The round was led by Coatue and Singapore sovereign wealth fund GIC.^[6]

Anthropic reported that its annualized revenue has reached $14 billion, growing more than tenfold annually over the past three years. Customers spending over $100,000 annually on Claude have grown sevenfold in the past year, and the company said it now has more than 500 customers spending at least $1 million annually.

Why this matters

The practical significance of this release lies in the narrowing gap between Anthropic's mid-tier and flagship models. Sonnet 4.6 performs within a few percentage points of Opus 4.6, while costing roughly 60% as much. For businesses deploying AI agents or coding assistants at scale, that changes the cost-performance calculation.

The speed of ecosystem adoption underscores this. All three major cloud platforms, four coding-focused tools, and three productivity platforms announced Sonnet 4.6 integration within hours of launch, suggesting that partners had been testing the model in advance and were ready to ship on day one.

For individual users, the upgrade to the free tier is notable. Features like file creation, connectors, and context compaction (previously available only on paid plans) are now accessible to everyone.

Whether benchmark improvements translate proportionally into real-world gains remains to be tested more broadly by the developer and enterprise community. Cursor's independent assessment, that the model is improved over Sonnet 4.5 but "below Opus 4.6 for intelligence," suggests the reality may be more nuanced than the benchmarks alone convey.

Anthropic releases Claude Sonnet 4.6, bringing near-flagship performance to its mid-tier model

What changed

Benchmarks

Computer use

Enterprise response

Ecosystem adoption

Free tier and availability

Pricing in context

Safety

Business context

Why this matters

References

Claude Code now remembers what it learns between sessions

Google launches Nano Banana 2, bringing pro-level image generation to its Flash model

Anthropic launches Remote Control for Claude Code, enabling mobile access

Google rolls out Lyria 3 music generation in the Gemini app