I Installed 50 Claude Code Skills. Only 20 Survived the Purge.

By 码哥字节
Read original on juejin.cn

The Claude Code Skill marketplace is growing fast, but quality is wildly uneven. Installing indiscriminately bloats context, burns tokens, and introduces agents that modify code without asking. A filtering framework—based on stage gates, installation cost, and autonomy boundaries—prevents the tooling from becoming the bottleneck.

Summary

The Claude Code Skill ecosystem has exploded to over 150 modules, but most are prompt wrappers dressed as tools. A six-month audit of 50 installed Skills found that only about a third solve real engineering pain points. The ones that stayed enforce stage gates—hard checkpoints Claude cannot skip—rather than offering suggestions it can ignore.

The survivors fall into four categories: engineering efficiency (Superpowers' TDD and debugging sub-modules, Karpathy Guidelines, gstack, Frontend Design, Document Skills, Trail of Bits Security), multi-agent orchestration (TDD, parallel agent dispatching), memory and context management (Claude Mem, Claude Context, CC Switch), and documentation (Graphify, Planning with Files).

What got purged is more instructive. Four kinds of Skill failed the test: those serving overly narrow scenarios, those needing a complex multi-step installation for trivial gains, those autonomous enough to modify production code without confirmation, and pure prompt wrappers with no structural enforcement. The filtering heuristic is blunt: if a Skill doesn't save more than five minutes of manual work per use, it's probably dead weight.
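The purge criteria can be sketched as a small checklist. This is a minimal illustration, not the author's tooling: `SkillAudit` and `purge_reasons` are hypothetical names, and the once-a-week threshold is borrowed from the takeaways. Installation cost is left out because the article gives no numeric threshold for it.

```python
from dataclasses import dataclass


@dataclass
class SkillAudit:
    """Hypothetical record of what one installed Skill actually does for you."""
    name: str
    minutes_saved_per_use: float
    uses_per_week: float
    edits_without_confirmation: bool
    has_stage_gates: bool


def purge_reasons(skill: SkillAudit) -> list[str]:
    """Return every reason to uninstall the Skill; an empty list means keep it."""
    reasons = []
    if skill.minutes_saved_per_use <= 5:
        reasons.append("saves five minutes or less per use")
    if skill.uses_per_week < 1:
        reasons.append("scenario appears less than once a week")
    if skill.edits_without_confirmation:
        reasons.append("modifies code without confirmation")
    if not skill.has_stage_gates:
        reasons.append("pure prompt wrapper, no stage gates")
    return reasons


if __name__ == "__main__":
    emoji = SkillAudit("emoji-replies", 0.5, 20, False, False)
    print(purge_reasons(emoji))
```

Returning the reasons rather than a bare yes/no keeps the audit reviewable: a Skill that fails on one criterion may be worth fixing, while one that fails on all four is not.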

Takeaways
Superpowers (213K stars) is a 20-module workflow framework whose TDD, systematic debugging, and plan-writing sub-modules enforce stage gates Claude cannot skip.
Karpathy Guidelines changes Claude's behavior to read and understand before modifying code, rather than jumping straight to a refactor.
gstack bundles TypeScript type-checking and Supabase schema validation that run before every commit, catching low-level errors early.
Frontend Design provides a visual decision framework so AI-generated UIs use contrasting colors, hierarchy, and asymmetry instead of default Tailwind sameness.
Document Skills let Claude output formatted DOCX, PDF, PPTX, and XLSX binaries directly, eliminating manual Markdown-to-format conversion.
Trail of Bits Security proactively flags SQLi, CSRF, and privilege-escalation paths during code review, adding a layer Claude's defaults omit.
TDD as a Skill locks Claude into Red → Green → Refactor with verification checkpoints, preventing the common pattern of writing implementation code first and tests never.
Ruflo's multi-agent pipeline can handle 100-step AI workflows, but for ordinary business development it adds more complexity than it removes.
Dispatching Parallel Agents cut waiting time by 40% during a cross-module refactor, but only when sub-tasks are truly independent.
Loki Mode's 37-agent autonomous system modified production files and committed code without confirmation, making it too dangerous for live codebases.
Claude Mem auto-extracts session decisions into structured memory files, eliminating the need to re-explain design choices in every new session.
CC Switch routes simple tasks to Haiku, complex reasoning to Opus, and code generation to Sonnet, cutting monthly bills by roughly 35% with no quality drop.
Graphify generates function-call, module-dependency, and data-flow diagrams from a codebase in minutes, replacing hours of manual code reading.
Planning with Files forces Claude to output a Markdown plan with goals, dependencies, and verification criteria before executing multi-step tasks.
Skills that are pure prompt wrappers lack stage gates and verification; Claude often ignores them after the first trigger.
Excessively autonomous Skills that modify code without a confirmation prompt are a liability in production environments.
Installation cost must match benefit: a Skill requiring five setup steps to add an emoji to replies is not worth it.
Skills serving scenarios that appear less than once a week in daily work are too narrow to justify the context overhead.
Multiple Skills with similar trigger descriptions can conflict; priority markers in frontmatter or streamlined descriptions resolve this.
awesome-skills.com and travisvn's awesome-claude-skills repo are the two best discovery directories, with the latter emphasizing quality filtering.
Conclusions

The gap between a prompt and a Skill is structural, not semantic. A prompt is a suggestion Claude can ignore; a Skill with stage gates—'test must fail before proceeding,' 'plan must be a file before coding'—changes execution behavior measurably.
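To make the structural difference concrete, here is a minimal sketch of a stage gate for the Red → Green → Refactor cycle. It is an illustration under stated assumptions, not how any particular Skill is implemented: `TddGate`, `Phase`, and `GateError` are hypothetical names, and the caller is assumed to report whether the test run passed.

```python
from enum import Enum


class Phase(Enum):
    RED = "red"
    GREEN = "green"
    REFACTOR = "refactor"
    DONE = "done"


class GateError(RuntimeError):
    """Raised when a phase is attempted before its gate condition holds."""


class TddGate:
    """Hypothetical tracker: a phase advances only on the expected test result."""

    def __init__(self) -> None:
        self.phase = Phase.RED

    def report(self, tests_passed: bool) -> Phase:
        if self.phase is Phase.RED:
            # Gate: the new test must fail first, proving it covers missing behavior.
            if tests_passed:
                raise GateError("red phase: test must fail before proceeding")
            self.phase = Phase.GREEN
        elif self.phase is Phase.GREEN:
            # Gate: the implementation is accepted only once the test passes.
            if not tests_passed:
                raise GateError("green phase: test must pass before refactoring")
            self.phase = Phase.REFACTOR
        elif self.phase is Phase.REFACTOR:
            # Gate: refactoring must leave the suite passing.
            if not tests_passed:
                raise GateError("refactor phase: tests broke during refactor")
            self.phase = Phase.DONE
        else:
            raise GateError("cycle already complete")
        return self.phase
```

A prompt that says "write tests first" has no equivalent of `GateError`: nothing stops the next step when the condition is not met, which is the sense in which it remains a suggestion.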

Star count correlates weakly with daily utility. Superpowers earned its 213K stars by solving Claude's worst habit (coding without thinking), but many high-star Skills are prompt wrappers with great READMEs and zero enforcement.

Multi-agent orchestration is the fastest-growing and most overhyped category. Most projects don't have enough independent sub-tasks to justify the orchestration overhead, and the ones that do need airtight dependency management to avoid parallel-turned-serial errors.
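One way to check whether a job has enough independent sub-tasks is to group them into dependency "waves" before dispatching anything. The sketch below uses Python's standard `graphlib`; the function name `parallel_waves` and the task names are hypothetical, and this is not taken from any of the Skills discussed.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all its dependencies done.

    `deps` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    refactor = {
        "api": set(),
        "db": set(),
        "ui": set(),
        "integration": {"api", "db"},
    }
    print(parallel_waves(refactor))
```

If most waves contain a single task, the work is effectively serial and parallel dispatch only adds overhead; a wide first wave is the case where it can pay off.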

Autonomy is a spectrum, not a feature. Loki Mode's 37 agents can complete complex tasks, but the absence of a confirmation gate before modifying production code turns capability into liability.
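A confirmation gate can be very small. The following is a minimal sketch, assuming the agent's file writes go through a single function; `gated_write` and its `confirm` and `write` callbacks are hypothetical and do not describe Loki Mode or any other Skill.

```python
from typing import Callable


def gated_write(
    path: str,
    content: str,
    confirm: Callable[[str], bool],
    write: Callable[[str, str], None],
) -> bool:
    """Call write(path, content) only if confirm(prompt) returns True."""
    if not confirm(f"Apply change to {path}?"):
        return False  # declined: nothing is written
    write(path, content)
    return True
```

Passing `confirm` in as a callback means the same gate can be backed by an interactive prompt during development and by a stricter policy, such as always declining, in a production checkout.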

Context-window management is the unglamorous bottleneck that determines whether Claude remains useful across long projects. Claude Mem and Claude Context attack this from opposite angles—rich memory vs. minimal token injection—and both are needed.

Cost-aware model routing (CC Switch) is the closest thing to a free lunch in this ecosystem: a 35% bill reduction with no perceptible quality loss simply by matching task complexity to model tier.
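The routing idea reduces to a lookup from task type to model tier. This is a sketch of the concept only, not CC Switch's implementation: the task labels, the `route` function, and the fallback to the balanced tier are all assumptions made for illustration.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # cheapest, for simple tasks
    SONNET = "sonnet"  # balanced, for code generation
    OPUS = "opus"      # strongest, for complex reasoning


# Hypothetical task labels; a real router would need its own classifier.
ROUTES = {
    "simple": Tier.HAIKU,
    "codegen": Tier.SONNET,
    "reasoning": Tier.OPUS,
}


def route(task_kind: str) -> Tier:
    """Pick a tier for a task label; unknown labels fall back to the balanced tier."""
    return ROUTES.get(task_kind, Tier.SONNET)


if __name__ == "__main__":
    for kind in ("simple", "codegen", "reasoning", "unlabelled"):
        print(kind, "->", route(kind).value)
```

The hard part in practice is the classification step that produces `task_kind`, which this sketch leaves out entirely.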

Documentation Skills (Graphify, Planning with Files, Claudian) have low visibility among pure coders but solve the highest-friction handoff points: understanding legacy code, making plans auditable, and keeping architecture decisions current.

The best filtering heuristic is brutally simple: 'Without this Skill, how much extra time would this task cost me?' If the answer is under five minutes, the Skill is a gimmick; if the answer is 'I have to do this manually every time and it's annoying,' it has hit the sweet spot.

Installation friction is a leading indicator of value. A Skill that requires a local service, API keys, and webhooks to deliver a trivial output is a design smell; the best Skills install with one command and show a noticeable benefit by the next day.

You can identify a pure prompt wrapper by opening its SKILL.md. If the entire file is descriptive prose without 'Step N: verify X before continuing' structures, Claude will treat it as optional advice, not a workflow constraint.
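That inspection can be roughed out as a quick text check. The sketch below is a crude heuristic under an assumed convention: it looks for lines shaped like "Step N: ..." that also contain a gating word. The pattern, the keyword list, and the function name `looks_like_prompt_wrapper` are illustrative choices, not part of any SKILL.md specification.

```python
import re

# Assumed shape of a gate line: "Step 2: verify the test fails before continuing".
GATE_LINE = re.compile(
    r"^\s*(?:[-*#]+\s*)?step\s+\d+\s*:.*\b(?:verify|must|before)\b",
    re.IGNORECASE | re.MULTILINE,
)


def looks_like_prompt_wrapper(skill_md: str) -> bool:
    """Return True when the text contains no 'Step N: ...' gate-style line."""
    return GATE_LINE.search(skill_md) is None


if __name__ == "__main__":
    prose_only = "You are a senior engineer. Always write clean, tested code."
    gated = (
        "Step 1: write a failing test\n"
        "Step 2: verify the test fails before continuing\n"
    )
    print(looks_like_prompt_wrapper(prose_only))
    print(looks_like_prompt_wrapper(gated))
```

A match does not prove the Skill enforces anything, and a miss may only mean the author phrased the gates differently, so the result is a prompt to read the file rather than a verdict.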

Concepts & terms
Stage gates
Hard checkpoints embedded in a Skill's workflow that Claude cannot skip—for example, 'the red-light test must fail before proceeding to the next step' or 'a plan must be output as a Markdown file before coding begins.' They transform a Skill from a suggestion into an enforced process.
TDD (Test-Driven Development) as a Skill
A Skill that locks Claude into the Red → Green → Refactor cycle with verification between each phase. Without it, Claude's default behavior is to write implementation code first and defer or skip tests entirely.
Multi-agent orchestration
A pattern where separate AI agents handle distinct sub-tasks (research, coding, verification) in a pipeline or in parallel. Effective only when sub-tasks are truly independent; otherwise, the orchestration overhead exceeds the time saved.
Context window bloat
The degradation of Claude's performance as conversation history grows and consumes the available token budget. Skills like Claude Context and Claude Mem mitigate this by compressing or selectively loading relevant project knowledge.
Lazy loading (Skills)
Claude Code loads Skill files only when it deems them relevant to the current task. This keeps token consumption low for unused Skills, but it also means a poorly described trigger in SKILL.md can cause a Skill to never activate.
Model routing
Directing tasks to different Claude models based on complexity and cost: Haiku for simple tasks (cheapest), Sonnet for code generation (balanced), Opus for complex reasoning (most expensive but strongest). CC Switch automates this decision.
Pure prompt wrapper
A Skill that is merely a block of descriptive text formatted as SKILL.md, with no stage gates, verification mechanisms, or input/output constraints. Claude treats these as optional advice and often ignores them after the first use.
Source: juejin.cn