The pressure test: What February 2026 revealed about AI's safety promises
Written by Joseph Nordqvist | March 2, 2026 at 4:53 AM UTC
In February 2026, Anthropic dismantled the most concrete safety promise it had ever made to the public. The very same month, it was ordered out of every U.S. federal agency by presidential directive and designated a "supply chain risk."
The timing was not coincidental. But the story is more complicated, and ultimately more instructive, than it first appears.
February 2026 was the month when the hardest commitments voluntary AI safety governance had ever produced came under their most direct pressure, and did not hold.
It arrived against a backdrop that had been forming for months. In November 2025, OpenAI's 2024 IRS tax filing became public, revealing that the word "safely" had disappeared from the company's mission statement. Where the mission had once read to build general-purpose artificial intelligence "that safely benefits humanity, unconstrained by a need to generate financial return," it now read simply: "ensure that artificial general intelligence benefits all of humanity."
OpenAI addressed the change when announcing its restructuring, saying the rephrasing was intended to serve the same goal. Critics disagreed. As Alnoor Ebrahim, a professor of international business at Tufts' Fletcher School who first flagged the change, put it: "In my view, these changes explicitly signal that OpenAI is making its profits a higher priority than the safety of its products."
Then came Anthropic.
The RSP: what it was, and what replaced it
Anthropic's case is harder to categorize, and deserves careful examination rather than easy condemnation.
In 2023, Anthropic introduced its Responsible Scaling Policy (RSP), a framework that committed the company never to train or deploy an AI system unless it could guarantee in advance that its safety measures were adequate. This was not vague corporate language. It was a specific, binary, tripwire commitment: if AI capabilities outstripped safety research, development would stop. Full stop. Anthropic's leaders cited the policy repeatedly as evidence that the company would resist the competitive pressures pushing every other lab to ship faster and worry about consequences later.
The policy was revised three times between 2024 and 2025 (adjustments to capability thresholds, security standards, and insider threat provisions) but its core commitment remained intact.
But on February 24, 2026, Anthropic gutted it.
The new Responsible Scaling Policy (version 3.0 [PDF]) replaces hard stops with "Frontier Safety Roadmaps" (public documents outlining planned safety milestones) and "Risk Reports" published every three to six months.
Rather than a binary halt trigger, Anthropic will now delay development only if it believes it holds a significant lead over competitors, or if there is strong evidence that all competitors are already implementing robust safety measures. If competitors are advancing with weaker safeguards, Anthropic says it "will not necessarily delay AI development and deployment." "Rather than being hard commitments," the company acknowledged in its blog post, "these are public goals that we will openly grade our progress towards."
Translation: the unconditional guarantee is gone. What replaces it is transparency: a commitment to publish, not necessarily to stop.
The Pentagon backdrop
Here is where the story becomes genuinely difficult to read cleanly.
On the day Anthropic published its new RSP, Defense Secretary Pete Hegseth met with Anthropic CEO Dario Amodei and issued an ultimatum: drop the company's restrictions and allow the military to use Claude for any lawful purpose, with no ability for Anthropic to object to specific use cases, or face the cancellation of its $200 million Pentagon contract.
Anthropic had drawn two specific red lines: Claude must not be used for mass domestic surveillance of Americans, and it must not be used in fully autonomous weapons systems.
But the red lines themselves were not the only point of contention, and arguably not the most significant one. The core of Hegseth's demand was that Anthropic drop its restrictions and allow Claude to be used for any "lawful" purpose. Anthropic's refusal was a rejection of that framing itself.
By Friday, February 27, those negotiations had collapsed. President Trump posted on Truth Social calling on every federal agency to "immediately cease all use of Anthropic's technology," though he included a six-month phase-out period for agencies, like the DoD, already using Claude. Hegseth designated Anthropic a "supply chain risk to national security," a label typically reserved for foreign adversaries and one Anthropic says it will challenge in court as legally unsound.
OpenAI, within hours, announced it had struck its own deal with the Pentagon.
Now here is the complication. While Anthropic gutted its binding RSP commitment (the one that said development would halt if safety couldn't be guaranteed), it held firm to its core beliefs under enormous government pressure. The company threatened to sue. Amodei published a lengthy public statement. Even employees at rival companies publicly rallied behind the position, with over 700 Google and OpenAI employees signing an open letter titled "We Will Not Be Divided" in protest of the government pressure.
So which is the real Anthropic? The one that walked back its industry-defining safety promise? Or the one that absorbed a federal blacklisting rather than allow its AI to be used in fully autonomous weapons?
The honest answer is: both. And understanding why both can be true at once tells us something important about the state of AI governance.
Why voluntary commitments were always fragile
Anthropic's own explanation for why it gutted the RSP's core commitment is worth taking seriously, even if its critics are right to be skeptical. The company cited three specific forces: a growing "zone of ambiguity" around what capability thresholds actually mean in practice; an increasingly anti-regulatory political climate in Washington; and key requirements at higher RSP levels that were essentially impossible to meet without industry-wide coordination that never materialized.
That last point is the most revealing. When Anthropic introduced the RSP in 2023, part of the theory was that it would encourage competitors to adopt similar policies — a "race to the top" rather than a race to the bottom. Instead, competitors adopted broadly similar frameworks but without the hard halt commitment, and Anthropic found itself in the position of being the only major lab theoretically willing to halt development unilaterally. Its chief science officer, Jared Kaplan, put it bluntly in an interview with Time: "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead."
This is the fundamental structural flaw in voluntary safety commitments, and it has always been there. A commitment that only you are willing to keep is not a constraint on the industry; it is a competitive disadvantage. Reputational consequences exist, and they are not nothing: Anthropic's blacklisting generated significant public criticism, and the open letters from rival employees suggest the industry is watching.
But reputation is a soft constraint. As long as there is no binding regulation, no enforceable standard, no mechanism by which a company that walks back its safety commitments faces consequences beyond bad press, every voluntary commitment is subject to revision the moment the cost of keeping it becomes real.
Chris Painter, the director of policy at AI evaluation nonprofit METR, who reviewed an early draft of Anthropic's new RSP, told Time that the shift signals that the company "believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities." That is a sobering assessment from a sympathetic reviewer.
The transparency defense, examined
Anthropic has offered transparency as compensation for what it gave up in hard commitments. The new RSP commits to publishing detailed risk reports every three to six months and frontier safety roadmaps laying out concrete milestones. The company argues that continuous disclosure is more meaningful than binary tripwires.
There is a genuine argument here, and it would be intellectually dishonest to dismiss it entirely. Hard capability thresholds are difficult to define and easy to game for labs motivated to skirt them. A regime of regular, detailed public disclosure (if it is actually rigorous and independently reviewed) could in theory provide more useful accountability than a binary halt trigger that everyone knows will never actually be pulled.
But the argument only holds if the transparency is genuine, if the review is independent, and if there are consequences for falling short. Do those conditions currently exist? Anthropic grades its own progress. The "third-party review" mechanisms have not yet been triggered for any model, and will carry no enforcement authority even when they are. And critically: a commitment to publish is structurally different from a commitment to stop. Publishing a report that says your safety research is lagging behind your capabilities does not slow your capabilities.
The cynical read is available: that Anthropic has discovered the language of transparency serves the same reputational function as the language of hard commitments, at considerably lower cost. Publish a roadmap. Hold a press call. Explain thoughtfully why the old commitment was actually less rigorous than the new one. What makes this read difficult to dismiss is also what makes it difficult to confirm: the mechanisms Anthropic has built are detailed enough to be genuine, and untested enough that we don't yet know. The argument sounds reasonable. Whether it is cover won't be clear until the first Risk Report lands under real pressure.
What the Pentagon fight actually tells us
The Anthropic-Pentagon standoff is, in an odd way, more instructive than the RSP change.
Here we have a company that walked back a binding commitment under market pressure but refused to drop specific ethical limits under direct government coercion. That is not nothing. The red lines Anthropic held are exactly the kind of limits that, if abandoned, could cause immediate and irreversible harm.
OpenAI's agreement with the Pentagon, announced the same evening, took a different approach.
In an official company post, OpenAI published three red lines: no mass domestic surveillance, no direction of autonomous weapons, and no high-stakes automated decisions such as social credit systems. That is one more prohibition than the two Anthropic had drawn.
Crucially, though, OpenAI's contract does include the "lawful use" framing Anthropic refused; the agreement explicitly permits the Pentagon to use its systems "for all lawful purposes." But OpenAI argues it has layered contractual protections, a cloud-only deployment architecture, and cleared OpenAI personnel in the loop to enforce its red lines within that framing. The company also called on the DoD to offer the same terms to all AI companies, including Anthropic.
Whether OpenAI's layered approach provides stronger guarantees than Anthropic's outright refusal, or whether accepting "lawful use" as the baseline represents the concession Anthropic would not make, is the genuinely contested question the two deals leave open.
What the Pentagon fight makes clear is that the pressure on AI companies to remove ethical constraints is not coming only from market competition. It is now coming from government, with the full weight of federal contracting, legislation, and executive authority behind it. That changes the calculus for every company in this space. The question is no longer just whether voluntary commitments survive market pressure. It's whether they survive state pressure.
What should we make of this?
The honest assessment of February 2026 is this: both companies moved in the direction of weaker hard commitments and stronger disclosure frameworks, under pressure from both market competition and government. In Anthropic's case, that movement happened in the same week the company held firm on specific ethical red lines and paid a significant price for doing so. In OpenAI's case, the removal of safety language from its mission statement coincided with a restructuring that created direct financial incentives to prioritize growth.
Neither of these stories is simple. The RSP was always an imperfect instrument. OpenAI's structural changes include governance provisions that proponents argue preserve meaningful safety oversight. Transparency regimes, if robust and independent, can provide real accountability.
But the direction of travel is clear. The hard commitments are softening. The governance vacuum remains. And the external pressure (from competition, from private capital, and now from government) is pushing in one direction.
The companies that built their identities around safety are now asking us to trust their judgment about when to act on it. That may be the most honest description of where we are. It is not, for the record, what they promised.
Written by Joseph Nordqvist
Joseph founded AI News Home in 2026. He studied marketing and later completed a postgraduate program in AI and machine learning (business applications) at UT Austin’s McCombs School of Business. He is now pursuing an MSc in Computer Science at the University of York.
This article was written by the AI News Home editorial team with the assistance of AI-powered research and drafting tools. All analysis, conclusions, and editorial decisions were made by human editors.
References
- 1. OpenAI has deleted the word ‘safely’ from its mission – and its new structure is a test for whether AI serves society or shareholders — Alnoor Ebrahim, The Conversation, February 13, 2026
- 2. Exclusive: Hegseth gives Anthropic until Friday to back down on AI safeguards — Dave Lawler and Maria Curi, Axios, February 24, 2026
- 3. Statement from Dario Amodei on our discussions with the Department of War — Dario Amodei, Anthropic, February 26, 2026