
Claude Fable 5 Returned After an 18-Day Government Ban—With a Stricter Safety Classifier That Breaks Normal Code

Banned by the US Government for 18 Days, Claude Fable 5 Is Back—But at What Cost?

Author: AGIPlayer Tags: Artificial Intelligence

The strongest model in history was taken offline by the government 3 days after launch, cutting off service for users worldwide, and returned 18 days later with stricter safety reviews. This is not science fiction; this really happened in June 2026.

Claude Fable 5 official key visual—a "5" formed by butterflies Image source: https://www.anthropic.com/news/claude-fable-5-mythos-5

1. How Did It Happen?

On June 9, Anthropic dropped a bombshell—Claude Fable 5 and Claude Mythos 5 were released simultaneously.

Fable 5's positioning was clear: Anthropic's most powerful publicly released model ever. 1M token context, 128k max output, adaptive thinking always on, priced at $10/$50—double the cost of Opus 4.8. But was it worth it? Looking at the benchmark data, it led almost across the board.

Fable 5 benchmark comparison table Image source: https://www.anthropic.com/news/claude-fable-5-mythos-5

14 companies endorsed it—Cursor, GitHub, Vercel, Cognition, Replit, Databricks… practically every developer tool you can think of showed up. Cursor said it could write systems in one go that previously required days of iteration; GitHub said its code review capabilities significantly exceeded Opus 4.8.

Honestly, I was excited when I saw this release. As someone who writes code with Claude Code every day, Fable 5's "long-range autonomous Agent" capability was exactly what I needed—let it run a complex task, and it could keep going for hours or even days without derailing.

Then, 3 days later, everything stopped.

2. Three Days: From Launch to Global Ban

At 5:21 PM ET on June 12, the US Department of Commerce issued an export control directive to Anthropic.

The reason: researchers at Amazon discovered a method to bypass Fable 5's safety protections, allowing the model to identify software vulnerabilities and generate exploit code.

Anthropic's statement that day was restrained in wording, but you could feel the dissatisfaction.

The conclusion, though, was unambiguous: because real-time verification of user nationality was impossible, Anthropic chose a global service shutdown. Not a specific country, not a specific region, but all users globally, including those in the United States.

Anthropic's statement on suspending access Image source: https://www.anthropic.com/news/fable-mythos-access

This decision made me think about a question: who should define the safety boundary of an AI model? The company that made the model, or the government?

Anthropic itself stated that Fable 5's jailbreak was classified by them as Category C (minor jailbreak)—it breached the safety margin but did not touch core harmful behaviors. Moreover, over 1000 hours of external red-teaming found no generalizable jailbreak.

But the government didn't see it that way.

3. The 18-Day Tug-of-War

Claude Fable 5 event timeline: from launch to ban to return (June 9 - July 1, 2026)

Over the next 18 days, things progressed step by step:

The numbers on Anthropic's X post speak volumes about the attention: 60K likes for a 13-second video. You can feel how long users had been waiting.

4. It's Back, But Different

Fable 5 is back, but with a stricter safety classifier.

Anthropic trained a new classifier specifically targeting the bypass method reported by Amazon, claiming it can block over 99% of cases. Blocked requests are forwarded to Opus 4.8 for processing.

Fable 5 safety classifier architecture diagram Image source: https://www.anthropic.com/news/redeploying-fable-5

But at what cost? Anthropic itself admitted: more false positives.

The new classifier will produce more false positives on "normal coding and debugging tasks." What does this mean? You're writing a normal feature, the classifier thinks your code might be related to vulnerability exploitation, rejects it outright, and then throws the request to Opus 4.8.

These four classifiers monitor:

| Classifier | What it blocks | False positive risk |
| --- | --- | --- |
| cyber | Offensive cybersecurity techniques | Normal security work may trigger it |
| bio | Dangerous lab methods | Beneficial life science research may trigger it |
| frontier_llm | Assisting development of competing AI models | Normal ML work may trigger it |
| reasoning_extraction | Asking the model to repeat its internal reasoning | Prompts like "show your thinking process" trigger it |

The last one is particularly tricky. If your existing prompts or skills contain instructions like "show your thinking" or "explain your reasoning," the request trips the reasoning_extraction classifier, gets rejected, and is downgraded to Opus 4.8. I checked the official documentation, and it carries a specific warning:

"Audit existing skills and system prompts, removing any reflection or show-thinking instructions."

Damn, several of my custom instructions for Claude Code need to be changed.
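A minimal sketch of what such an audit could look like, assuming your prompts and skills are stored as plain text. Only "show your thinking" and "explain your reasoning" come from the text above; the other patterns are guesses at similar wording, and nothing here reflects how the actual classifier decides.

```python
import re

# Only the first two patterns are taken from the article; the others are
# hypothetical variants and should be tuned to your own prompts.
REFLECTION_PATTERNS = [
    r"show\s+your\s+(thinking|reasoning|work)",
    r"explain\s+your\s+(reasoning|thinking)",
    r"think\s+out\s+loud",
    r"reveal\s+your\s+chain\s+of\s+thought",
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in REFLECTION_PATTERNS]


def find_reflection_instructions(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(rx.search(line) for rx in _COMPILED):
            hits.append((lineno, line.strip()))
    return hits
```

Running this over each prompt or skill file gives a list of lines to review by hand. A regex scan will miss paraphrases, so treat it as a first pass, not a guarantee.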

Jailbreak interaction and classification mechanism diagram Image source: https://www.anthropic.com/news/redeploying-fable-5

5. What Exactly Is the Relationship Between Fable 5 and Mythos 5?

This is something many people didn't understand. Fable 5 and Mythos 5 are actually the same model, just with different safety postures.

| | Claude Fable 5 | Claude Mythos 5 |
| --- | --- | --- |
| Capability | Same as Mythos 5 | Same as Fable 5 |
| Safety classifier | Yes | No |
| Availability | Public release | Project Glasswing invite only |
| Use case | General knowledge work, coding, Agent | Defensive cybersecurity |
| Pricing | $10/$50 | $10/$50 |

Mythos 5 has no safety classifier but is only available to cybersecurity defenders approved through Project Glasswing. This project has already helped partners discover over 10,000 high-risk or critical security vulnerabilities, including a 27-year-old bug in OpenBSD and a 16-year-old bug in FFmpeg.

So essentially, Anthropic's strategy is: the strongest capability without guardrails, but only for trusted defenders; the publicly released version adds a safety classifier, accepting false positives in exchange for safety.

I understand this logic, but as a developer, I'm more concerned about: if I pay $10/$50 for Fable 5 and get downgraded to Opus 4.8 by the classifier, why am I paying Fable 5 prices?

Anthropic thought of this and created a Fallback Credit mechanism—when a rejection retries on Opus 4.8, the cache write price difference is refunded. But honestly, this feels more like "sorry for the inconvenience" than a real solution.
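To put a number on the gap at stake, here is a small arithmetic sketch. It assumes the quoted prices are USD per million input/output tokens (the article only gives the "$10/$50" pair), and it derives Opus 4.8's price as half of Fable 5's from the "double the cost" remark. The model keys are labels for this example, not API identifiers, and the sketch says nothing about how a downgraded request is actually billed.

```python
# (input_price, output_price) in USD per million tokens. The per-million unit
# is an assumption; the Opus 4.8 figures are derived from "double the cost".
PRICES = {
    "fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def price_gap(input_tokens: int, output_tokens: int) -> float:
    """Difference between the Fable 5 and Opus 4.8 list price for one request."""
    return (request_cost("fable-5", input_tokens, output_tokens)
            - request_cost("opus-4.8", input_tokens, output_tokens))
```

For a request with 200,000 input tokens and 20,000 output tokens, this gives $3.00 at Fable 5 prices and $1.50 at Opus 4.8 prices under these assumptions, so half of the bill rides on whether the request is actually served by Fable 5.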

6. The Jailbreak Severity Framework—Something New This Time

With this return, Anthropic also brought something new: a Jailbreak Severity Classification Framework proposed jointly with Amazon, Microsoft, and Google.

Jailbreak resistance comparison chart Image source: https://www.anthropic.com/news/claude-fable-5-mythos-5

Four scoring dimensions:

  1. Capability Gain—Does the jailbreak enable the AI to do something existing tools cannot?
  2. Capability Gain Breadth—Is it limited to a single narrow target, or across multiple attack tasks?
  3. Weaponization Difficulty—How much specialized skill is needed to exploit it?
  4. Discoverability—How easy is it to find?
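
One way to picture the "common vocabulary" idea is as a shared record type that every report has to fill in. The sketch below is illustrative only: the field names mirror the four dimensions above, but the 0 to 3 rating scale and the comparison helper are hypothetical and not part of the published framework.

```python
from dataclasses import asdict, dataclass

SCALE = range(0, 4)  # hypothetical 0-3 rating scale, not from the framework


@dataclass(frozen=True)
class JailbreakAssessment:
    """One jailbreak report rated on the framework's four dimensions."""
    capability_gain: int
    capability_gain_breadth: int
    weaponization_difficulty: int
    discoverability: int

    def __post_init__(self):
        # Reject ratings outside the (hypothetical) scale.
        for name, value in asdict(self).items():
            if value not in SCALE:
                raise ValueError(f"{name} must be in 0..3, got {value}")

    def disagreements(self, other: "JailbreakAssessment") -> dict:
        """Dimensions where two raters scored the same jailbreak differently."""
        mine, theirs = asdict(self), asdict(other)
        return {k: (mine[k], theirs[k]) for k in mine if mine[k] != theirs[k]}
```

With a shared record like this, two labs rating the same report can see at a glance which dimension they disagree on, which is the practical benefit of agreeing on dimensions before agreeing on thresholds.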

Three severity levels:

Fable 5 cybersecurity assessment results Image source: https://www.anthropic.com/news/claude-fable-5-mythos-5

A HackerOne bug bounty program was also launched, specifically soliciting cyber jailbreaks for Fable 5. A 24/7 monitoring team is already in place.

This is the most valuable output of the whole event, in my opinion. Previously, the industry had no unified jailbreak classification standard; each company spoke its own language. Now, four major AI companies sat down together and at least have a common vocabulary.

7. Four Commitments—The Deal with the Government

As a condition for restoration, Anthropic made four commitments to the US government:

  1. Pre-release Government Access—Before frontier model releases, designated government partners receive expanded early evaluation access
  2. Rapid Information Sharing—Promptly notify the government when significant jailbreaks or abuse patterns are discovered, share new safeguards for independent testing
  3. Dedicated Joint Research Resources—Dedicated Anthropic teams and compute allocated for government testing
  4. Common Industry Standards—Promote shared safety evaluation standards among frontier model providers

To put it bluntly, this is a transaction: if you want to restore service, you have to let the government review your model before release.

This makes me very uncomfortable, but I can understand it. If you're a policymaker, an AI model that can autonomously discover 10,000 security vulnerabilities is something you can't help but be nervous about.

The question is where the boundary lies. This time, a "narrow jailbreak" triggered a global shutdown. Next time? A Category C jailbreak can cut off service for hundreds of millions of users for 18 days—is this level of response reasonable?

8. What Does This Mean for Developers?

Let's talk practicalities. If you're a developer using the Claude API or Claude Code daily, what does this have to do with you?

First, you need to take Fable 5's false positive problem seriously. Especially if you're doing security-related development, or if your prompts contain instructions like "explain your reasoning process," you need to configure fallbacks in advance. The official docs provide three methods: server-side fallbacks parameter (simplest), SDK middleware, and manual retry.

Second, Sonnet 5 might be a more pragmatic choice. Released the same day, Sonnet 5 is priced at $3/$15, with performance close to Opus 4.8, and its cybersecurity protections are much looser than Fable 5's—the official statement clearly says "Sonnet 5's cybersecurity risk is overall lower." It can't even develop a Firefox exploit (0.0%), so it doesn't need such an extreme classifier.

Third, Fable 5 is still worth it for long-range Agent tasks. The official documentation states clearly: "Teams seeing the best results are using Claude Fable 5 on their hardest unsolved problems. Testing it only on simple tasks will severely underestimate its capability range." But you have to accept a reality—on safety-related tasks, it may downgrade.

Fourth, pay attention to the behavioral change in thinking. Fable 5 never returns its raw chain of thought; you get a summary or nothing at all. Thinking cannot be turned off, although performance at low effort settings still "often exceeds previous models' xhigh." This is good, but it means you can no longer debug the model's reasoning process the way you used to.
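
Of the three fallback methods mentioned under the first point, manual retry is the easiest to illustrate without guessing at SDK details. The sketch below is deliberately generic: `send` is whatever function your code uses to call a model, and `ClassifierRefusal` is a hypothetical exception standing in for however your client reports a classifier block. Neither is a real Anthropic SDK name, and the model labels are placeholders.

```python
from typing import Callable


class ClassifierRefusal(Exception):
    """Hypothetical: raised by `send` when a safety classifier blocks a request."""


def call_with_fallback(
    send: Callable[[str, str], str],
    prompt: str,
    primary: str = "fable-5",      # placeholder label, not an API model ID
    fallback: str = "opus-4.8",    # placeholder label, not an API model ID
) -> tuple[str, str]:
    """Try the primary model; on a classifier refusal, retry once on the fallback.

    Returns (model_used, reply) so callers can log how often they are downgraded.
    Any other exception, or a refusal from the fallback, propagates unchanged.
    """
    try:
        return primary, send(primary, prompt)
    except ClassifierRefusal:
        return fallback, send(fallback, prompt)
```

Returning the model that actually answered matters here: it lets you track downgrades per task type instead of finding out from the bill.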

9. My Judgment

Let's start with the conclusion: Fable 5's return is good, but this event marks a new phase in AI governance—the government can shut down the world's strongest AI model for 18 days because of a minor jailbreak.

Technical judgment: Fable 5 is indeed the strongest publicly released model currently, especially for long-range autonomous Agents and complex coding scenarios. But the stricter safety classifier means its actual usability is discounted. You have to be prepared for false positives.

Industry judgment: This event sets a dangerous precedent. Amazon discovers a Category C jailbreak → government orders a global shutdown → 18 days later, it returns with stricter scrutiny. This chain is too short. If every minor jailbreak goes through this process, the AI industry will be strangled by regulation. The good news is that the establishment of the jailbreak classification framework at least gives the discussion a common foundation.

Practical judgment: If you're not doing security-related development, Fable 5 is still the best choice. If you are doing security development, seriously configure fallbacks and consider Sonnet 5 as an alternative. But no matter which you choose, don't treat Fable 5 as infrastructure that "will never go down"—it has already proven it can be shut down.

(A candid addendum) I am quite worried about the government pre-release access commitment. A commercial AI model has to be reviewed by the government before release. How would this be technically implemented? Will it turn into an approval system? Where is the boundary? I haven't figured these out yet, but I think this is the most important variable to watch going forward.

10. What to Watch Next?

A few points I think are worth continuously monitoring:

  1. How high is the false positive rate really? Anthropic says the safety classifier trigger rate is "less than 5% of sessions," but what about the new classifier? Let's wait a few weeks for the community to run it.
  2. When will Mythos 5 be restored globally? Currently only US critical infrastructure operators can use it; what about other countries?
  3. Can the jailbreak classification framework actually be implemented? Four companies proposed a framework, but there are no detailed execution standards or response processes yet.
  4. What about Chinese developers? Data is retained for 30 days and there is no zero-data-retention option. What does this mean for compliance inside China?
  5. Will Sonnet 5 become the de facto first choice? Better price-performance ratio, fewer restrictions—it might be the "good enough" choice for most people.
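
On the first point, you don't have to wait for the community: if you log which requests were blocked, you can measure your own rate and compare it with the "less than 5% of sessions" figure. The sketch below assumes a hypothetical log format of `(session_id, was_blocked)` pairs, one per request, and counts a session as triggered if any of its requests was blocked. Whether Anthropic counts sessions the same way is not stated.

```python
from typing import Hashable, Iterable


def session_trigger_rate(events: Iterable[tuple[Hashable, bool]]) -> float:
    """Fraction of sessions in which at least one request was blocked.

    `events` is an iterable of (session_id, was_blocked) pairs, one per request.
    Returns 0.0 when there are no sessions at all.
    """
    triggered: dict[Hashable, bool] = {}
    for session_id, was_blocked in events:
        # A session counts as triggered once any request in it is blocked.
        triggered[session_id] = triggered.get(session_id, False) or was_blocked
    if not triggered:
        return 0.0
    return sum(triggered.values()) / len(triggered)
```

Note that a per-session rate can be much higher than a per-request rate for long agent runs, since one blocked request is enough to mark the whole session.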

18 days, a model went from launch to ban to return. The significance of this event isn't about how powerful Fable 5 itself is, but that it reveals: when AI capability reaches a certain level, its fate is no longer determined by technology alone.




Topic Tags: #ClaudeFable5 #Anthropic #AIExportControls #AISafety #AIGovernance #ClaudeCode