跪拜 Guibai
← All articles
Backend · Frontend

Ponytail Forces AI Coding Agents to Climb a 7-Step 'Laziness Ladder' Before Writing Anything

By 也无风雨也雾晴 ·
Read original on juejin.cn ↗ Google Translate ↗ Alt translation

Agent tooling is splitting into two philosophies: addition (memory, preferences, context) and subtraction (constraints, decision trees, minimalism). Subtraction rules fail visibly—the code breaks—while addition rules fail silently, producing plausible but wrong output based on stale context. Ponytail makes the case that for controlling agent behavior, a decision ladder you can audit is safer than a memory you can't trust.

Summary

Ponytail is not a vague prompt to "write clean code." It is a concrete, 7-step yes/no decision tree that an agent must climb before writing anything. Each rung asks a specific, verifiable question: does the standard library already do this? Can a native HTML element handle it? The ladder runs after the agent has fully understood the codebase, not as a substitute for reading.

The project ships with three intensity levels—lite, full, and ultra—and an architecture that injects one canonical rule file across 16 different agent adapters, from Claude Code hooks to MCP servers. A CI script ensures all static copies stay in sync. Benchmarking on a real FastAPI+React repo shows a 54% reduction in lines of code and a 20% cost drop, with a perfect security score.

Crucially, Ponytail represents a "subtraction rule" for agents. Unlike memory-adding tools where stale preferences silently corrupt output, a subtraction rule fails loudly: the code simply doesn't work, and the fix is obvious. The project also introduces a `ponytail:` comment convention that logs the upper limit of a simplification and the trigger condition for upgrading it later, turning intentional shortcuts into auditable decisions rather than hidden debt.

Takeaways
Ponytail's core is a 7-level decision ladder: skip, reuse, stdlib, native, existing dep, one-liner, minimum code—each rung is a yes/no check, not a matter of taste.
Three intensity levels (lite, full, ultra) let teams dial the constraint from a gentle reminder to YAGNI extremism that questions the requirement itself.
A single canonical rule file (SKILL.md) feeds 16 adapters—Claude Code hooks, MCP servers, static rule copies—with CI enforcing synchronization.
Benchmarks on a FastAPI+React template show -54% lines of code, -22% tokens, -20% cost, -27% time, and 100% security score against a no-skill baseline.
The original benchmark was bugged: the SessionStart hook accidentally injected rules into the baseline arm too. The author publicly corrected it and rebuilt the benchmark.
The `ponytail:` comment convention requires two pieces of information: the current simplification's upper limit and the trigger condition for upgrading it later.
A "no-lazy list" explicitly marks what must never be cut—trust-boundary validation, error handling, security, accessibility—turning gut checks into category checks.
Subtraction rules fail explicitly (code doesn't run); addition rules fail implicitly (code runs but is wrong). This makes subtraction safer for agent system prompts.
Ponytail is a brake, not a steering wheel—it constrains output but doesn't replace domain judgment, and it should not be used in payment or security-critical paths without review.
Five designs can be adopted without installing the tool: a 3-level ladder in AGENTS.md, a `lean:` comment convention, a no-lazy list, multi-level intensity, and code-first output format.
Conclusions

The ladder model solves the "write clean code" problem by replacing an undefined aesthetic with a sequence of grep-able, MDN-checkable yes/no questions. An agent doesn't need taste; it needs a checklist.

Ponytail's real innovation is not the prompt but the architecture: one source of truth, many injection points, and a CI guard that prevents rule drift. Most agent-rule projects skip the synchronization problem entirely.

The `ponytail:` comment convention is a lightweight alternative to Architecture Decision Records. It bakes the "why" and the "when to revisit" directly into the code, where it can't be ignored.

Admitting a benchmark was bugged and rebuilding it from scratch is rare in open source. That honesty makes the -54% figure more credible, not less.

Subtraction rules align with how senior engineers actually work: they eliminate options before they write code. Encoding that elimination as a system prompt turns a cognitive habit into a reproducible constraint.

The anti-pattern of treating "one line if possible" as "swallow errors if possible" reveals a deeper problem: any optimization rule, applied without category exceptions, will optimize away safety. The no-lazy list is the necessary counterweight.

Ponytail's lite mode—where the agent writes normally but mentions a lazier alternative—is a training mechanism. It teaches the agent the ladder without enforcing it, which may produce better long-term behavior than jumping straight to full mode.

Memory-adding projects like Claude Mem suffer from a trust decay problem: you can't tell when a memory is stale. Ponytail's subtraction approach avoids this because its rules are universal ("use the standard library") rather than personal ("I prefer React Query").

Concepts & terms
Decision Ladder
A sequence of yes/no checks an agent must climb before writing code. Each rung asks a specific, verifiable question (e.g., "Does the standard library already do this?") rather than relying on vague notions of code quality.
Subtraction Rule vs. Addition Rule
In agent system prompts, a subtraction rule constrains behavior by removing options (e.g., "don't add new dependencies"), failing visibly when code is insufficient. An addition rule adds context (e.g., memories, preferences), failing silently when that context becomes stale.
YAGNI (You Ain't Gonna Need It)
A principle from Extreme Programming that advises against building features until they are actually needed. Ponytail's ultra mode applies this to the extreme, questioning whether a requirement itself is necessary.
Flag File
A small file on disk (e.g., `~/.claude/.ponytail-active`) that persists state across agent sessions. Ponytail uses it to share the current mode (lite/full/ultra/off) between all hook nodes and sub-agents.
SubagentStart Hook
A lifecycle hook that fires when a main agent spawns a sub-agent to handle a subtask. Without it, the sub-agent runs without the parent's constraints—a common oversight in multi-agent architectures.
No-Lazy List
An explicit, categorized list of things an agent must never simplify, regardless of the ladder level. Examples include trust-boundary input validation, error handling that prevents data loss, and accessibility features.
MCP (Model Context Protocol)
An open protocol that standardizes how AI agents connect to external tools and data sources. Ponytail ships an MCP server so any MCP-compatible agent can query its rules.
Source: juejin.cn ↗ Google Translate ↗ Backup ↗