Claude Fable 5 Returned After an 18-Day Government Ban—With a Stricter Safety Classifier That Breaks Normal Code
A Category C jailbreak—one that doesn't touch core harmful behaviors—can now trigger a global shutdown of frontier AI infrastructure for weeks. Developers paying premium prices for Fable 5 will see normal coding and debugging requests silently downgraded to a weaker model, and any prompt that asks the model to explain its reasoning gets rejected outright.
Anthropic launched Claude Fable 5 on June 9 as its most capable public model, with a 1M-token context window, 128k-token maximum output, always-on adaptive thinking, and pricing of $10/$50 per million input/output tokens. Three days later, the US Commerce Department issued an export control directive after Amazon researchers found a narrow jailbreak that could produce exploit code for known vulnerabilities. Anthropic classified the jailbreak as Category C (minor) and noted that weaker models exhibited the same behavior. However, because it could not verify user nationality in real time, the company shut down access globally.
The model returned on July 1 with a new safety classifier trained to block the reported bypass method in over 99% of cases. Blocked requests fall back to Opus 4.8, and Anthropic introduced a Fallback Credit to refund the cache-write price difference. Four classifiers now monitor cyber, bio, frontier LLM, and reasoning-extraction risks—the last one rejects any prompt that asks the model to show its thinking, forcing developers to strip reflection instructions from their system prompts and skills.
Alongside the redeployment, Anthropic published a jailbreak severity framework co-developed with Amazon, Microsoft, and Google. It also made four commitments to the US government, including pre-release model access for government partners. Sonnet 5 was released the same day at $3/$15, with looser safety restrictions and performance close to Opus 4.8. It offers a pragmatic alternative for developers who hit Fable 5's classifier walls.
Fable 5 and Mythos 5 are the same model with different safety postures, which means the publicly available version is deliberately crippled by classifiers while the uncensored version is reserved for government-vetted defenders—a two-tier access regime baked into the product line.
The reasoning_extraction classifier creates a direct conflict with prompt engineering best practices. Developers who built skills around chain-of-thought transparency must now strip those instructions or have their requests silently downgraded to a weaker model.
Anthropic's Fallback Credit refunds only the cache-write price delta, not the full request cost. Developers therefore still pay a premium for Fable 5 even when their work is routed to Opus 4.8.
The government's trigger for a global shutdown was oral evidence of a narrow, non-generalizable jailbreak—a standard far below what would normally justify infrastructure-level intervention, and one that sets a precedent for future regulatory action against any frontier model.
Sonnet 5's simultaneous release looks less like a coincidence and more like a hedge: a cheaper, less restricted model that keeps developers on the platform if Fable 5's classifiers prove too aggressive for daily work.
The jailbreak severity framework is the most durable outcome of this crisis—four major AI companies now share a common taxonomy, which could accelerate standardized safety evaluations across the industry, but the framework still lacks concrete response thresholds and enforcement mechanisms.
Pre-release government access to frontier models blurs the line between safety review and de facto licensing; if this becomes normalized, model release timelines become subject to political and bureaucratic calendars rather than engineering readiness.