I Installed 50 Claude Code Skills. Only 20 Survived the Purge.
The Claude Code Skill marketplace is growing fast, but quality is wildly uneven. Installing indiscriminately bloats context, burns tokens, and introduces agents that modify code without asking. A filtering framework—based on stage gates, installation cost, and autonomy boundaries—prevents the tooling from becoming the bottleneck.
The Claude Code Skill ecosystem has exploded to over 150 modules, but most are prompt wrappers dressed as tools. A six-month audit of 50 installed Skills found that only 20 (40%) solve real engineering pain points. The ones that stayed enforce stage gates, hard checkpoints Claude cannot skip, rather than offering suggestions it can ignore.
The survivors fall into four categories: engineering efficiency (Superpowers' TDD and debugging sub-modules, Karpathy Guidelines, gstack, Frontend Design, Document Skills, Trail of Bits Security), multi-agent orchestration (TDD, parallel agent dispatching), memory and context management (Claude Mem, Claude Context, CC Switch), and documentation (Graphify, Planning with Files).
What got purged is more instructive. Four kinds of Skill failed the test: those with overly narrow scenarios, those with complex multi-step installations yielding trivial gains, those with excessive autonomy that modify production code without confirmation, and pure prompt wrappers lacking any structural enforcement. The filtering heuristic is blunt: if a Skill doesn't save more than five minutes of manual work per use, it's probably dead weight.
The gap between a prompt and a Skill is structural, not semantic. A prompt is a suggestion Claude can ignore; a Skill with stage gates—'test must fail before proceeding,' 'plan must be a file before coding'—changes execution behavior measurably.
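To make the structural difference concrete, here is a minimal Python sketch of a stage gate. It is not how any particular Skill is implemented; the names `Gate`, `GateFailed`, and `run_gated`, and the keys in the `state` dictionary, are all hypothetical. The point is only that a gate is a check that blocks the next step, whereas a prompt is text that can be skipped.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gate:
    """A hard checkpoint: the step behind it runs only if check() is True."""
    name: str
    check: Callable[[dict], bool]


class GateFailed(Exception):
    """Raised when a gate's condition is not met."""


def run_gated(steps: list[tuple[Gate, Callable[[dict], None]]], state: dict) -> list[str]:
    """Run each action only after its gate passes; stop at the first failure."""
    passed = []
    for gate, action in steps:
        if not gate.check(state):
            raise GateFailed(f"gate not satisfied: {gate.name}")
        action(state)
        passed.append(gate.name)
    return passed


# Hypothetical workflow mirroring the two gates quoted above.
steps = [
    (Gate("plan must be a file before coding", lambda s: s["plan_written"]),
     lambda s: s.update(coding_started=True)),
    (Gate("test must fail before proceeding", lambda s: s["test_failed"]),
     lambda s: s.update(implementation_started=True)),
]
```

With `state = {"plan_written": True, "test_failed": False}`, the first action runs and the second gate raises `GateFailed`, so the implementation step never starts.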
Star count correlates weakly with daily utility. Superpowers earned its 213K stars by solving Claude's worst habit (coding without thinking), but many high-star Skills are prompt wrappers with great READMEs and zero enforcement.
Multi-agent orchestration is the fastest-growing and most overhyped category. Most projects don't have enough independent sub-tasks to justify the orchestration overhead, and the ones that do need airtight dependency management, or tasks dispatched in parallel turn out to depend on one another and end up running serially, with errors along the way.
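One way to check whether a project has enough independent sub-tasks is to group them into batches that can safely run at the same time. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the function name `parallel_batches` and the task names in the usage note are hypothetical, and no orchestration Skill is claimed to work this way.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches whose members have no unmet dependencies.

    deps maps each task to the set of tasks it depends on.
    A circular dependency raises graphlib.CycleError in prepare().
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches
```

For `{"api": {"schema"}, "ui": {"schema"}, "e2e": {"api", "ui"}}` this yields three batches, and only the middle one (`api`, `ui`) contains more than one task. A chain such as `{"b": {"a"}, "c": {"b"}}` yields three single-task batches, which is the case where orchestration adds overhead without any parallelism.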
Autonomy is a spectrum, not a feature. Loki Mode's 37 agents can complete complex tasks, but the absence of a confirmation gate before modifying production code turns capability into liability.
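A confirmation gate can be very small. The following Python sketch assumes a hypothetical list of protected path patterns (`PROTECTED`) and hypothetical `confirm` and `write` callbacks supplied by the caller; it illustrates the idea and does not describe Loki Mode or any other Skill.

```python
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical patterns for code that must not change without approval.
PROTECTED = ["src/*", "deploy/*"]


def apply_edit(
    path: str,
    new_text: str,
    confirm: Callable[[str], bool],
    write: Callable[[str, str], None],
) -> bool:
    """Write new_text to path, asking for confirmation first on protected paths.

    Returns True if the edit was written, False if it was declined.
    """
    is_protected = any(fnmatchcase(path, pattern) for pattern in PROTECTED)
    if is_protected and not confirm(path):
        return False  # declined: nothing is written
    write(path, new_text)
    return True
```

Edits outside the protected patterns go through without a prompt, so the gate costs nothing on low-risk files and only interrupts where a mistake would be expensive.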
Context-window management is the unglamorous bottleneck that determines whether Claude remains useful across long projects. Claude Mem and Claude Context attack this from opposite angles—rich memory vs. minimal token injection—and both are needed.
Cost-aware model routing (CC Switch) is the closest thing to a free lunch in this ecosystem: a 35% bill reduction with no perceptible quality loss simply by matching task complexity to model tier.
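The routing idea can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the tier names, the prices, the 1 to 10 complexity score, and the thresholds are hypothetical, and this is not CC Switch's actual logic. The savings you see depend entirely on your own task mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_mtok: float  # hypothetical price in USD per million tokens


# Hypothetical tiers and prices, for illustration only.
SMALL = Tier("small", 1.0)
MEDIUM = Tier("medium", 3.0)
LARGE = Tier("large", 15.0)


def route(complexity: int) -> Tier:
    """Map an assumed 1-10 complexity score to the cheapest adequate tier."""
    if complexity <= 3:
        return SMALL
    if complexity <= 7:
        return MEDIUM
    return LARGE


def bill(tasks: list[tuple[int, float]], routed: bool) -> float:
    """Total cost for (complexity, million tokens) pairs.

    With routed=False every task is sent to the largest tier.
    """
    return sum(
        (route(complexity) if routed else LARGE).cost_per_mtok * mtok
        for complexity, mtok in tasks
    )
```

Comparing `bill(tasks, routed=True)` with `bill(tasks, routed=False)` on a log of your own tasks gives a rough estimate of what routing would save before you install anything.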
Documentation Skills (Graphify, Planning with Files, Claudian) have low visibility among pure coders but solve the highest-friction handoff points: understanding legacy code, making plans auditable, and keeping architecture decisions current.
The best filtering heuristic is brutally simple: 'Without this Skill, how much extra time would this task cost me?' Under five minutes is a gimmick; 'I have to do this manually every time and it's annoying' is the sweet spot.
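Applied to a whole install list, the heuristic amounts to a one-line filter. This Python sketch assumes you have a rough estimate of minutes saved per use for each Skill; the function name `purge_candidates` and the Skill names in the usage note are hypothetical.

```python
def purge_candidates(minutes_saved: dict[str, float], threshold: float = 5.0) -> list[str]:
    """Return the Skills whose saving per use does not exceed the threshold."""
    return sorted(
        name for name, saved in minutes_saved.items() if saved <= threshold
    )
```

For example, `purge_candidates({"skill-a": 2.0, "skill-b": 12.0, "skill-c": 5.0})` flags `skill-a` and `skill-c`, since neither saves more than five minutes.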
Installation friction is a leading indicator of value. A Skill that requires a local service, API keys, and webhooks to deliver a trivial output is a design smell; the best Skills are one command and noticeable by the next day.
Pure prompt wrappers are identifiable by opening SKILL.md. If the entire file is descriptive prose without 'Step N: verify X before continuing' structures, Claude will treat it as optional advice, not a workflow constraint.
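This check can be roughly automated. The Python sketch below flags a SKILL.md as a likely prompt wrapper when no line combines a numbered step with gate wording. The two regular expressions are assumptions about how enforcement tends to be phrased; real Skills may word their gates differently, so treat a result as a hint to open the file, not a verdict.

```python
import re

# Assumed shape of a numbered step, e.g. "Step 2:" or "## Step 2."
STEP_RE = re.compile(r"^\s*(?:#+\s*)?step\s+\d+\s*[:.]", re.IGNORECASE)
# Assumed gate wording; extend this list for your own Skills.
GATE_RE = re.compile(
    r"\b(?:verify|must|before (?:continuing|proceeding)|do not proceed)\b",
    re.IGNORECASE,
)


def looks_like_prompt_wrapper(skill_md: str) -> bool:
    """True when no line pairs a numbered step with gate wording."""
    for line in skill_md.splitlines():
        if STEP_RE.match(line) and GATE_RE.search(line):
            return False  # found at least one enforced step
    return True
```

To run it on an installed Skill, read the file and pass its text, for example `looks_like_prompt_wrapper(pathlib.Path("SKILL.md").read_text())`, where the path is whatever location your installation uses.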