Every AI Skill You Add Is a Tax on the Context Window
Context-window economics are the hard ceiling on agent capability. Every Skill description loaded at startup is working memory the model cannot use for reasoning, tool output, or conversation history. Teams scaling past 50 Skills without a lazy-loading or semantic-routing strategy will hit degradation that looks like model failure but is actually a budget problem.
A Skill system's index layer is a global, unconditional tax on the context window. A single description might cost 100 tokens, but at 500 Skills the index consumes 25% of a 200K window before any real work begins. The damage shows up as forgotten context, missed specifications, and sluggish responses on long conversations—not because the model degraded, but because working memory shrank.
Short-term triage rewrites descriptions as 30-word routing triggers with exclusion words to cut false matches by 67%. Mid-term, a tiered index loads only a domain-level summary at startup and expands into full Skill lists on demand, saving 75% of the index overhead at 100 Skills. For 500+ Skills, keyword matching collapses under collision probability and O(n) traversal; a semantic routing engine vectorizes user intent and runs ANN search against a Skill embedding library, while a federation model lets each domain hub operate as an independent microservice.
A real-world audit of 17 Skills cut total token consumption by 40% and dropped the route mismatch rate from 12% to 4% using compression and exclusion intents alone. The companion `skill-token-audit.sh` script produces a health report in one command.
The token-tax problem is architectural, not cosmetic. Compressing descriptions buys time, but the real scaling break comes from changing when Skills are loaded, not how they are described.
Exclusion words are an underused primitive. Most routing systems optimize for recall; adding a deny-list before the match step is a cheap way to raise precision without touching the model.
Anthropic's progressive-disclosure principle and Perplexity's 'Every Skill is a Tax' maxim converge on the same engineering conclusion: the cheapest token is the one never loaded.
The jump from keyword matching to semantic search is not just a scale fix—it changes the failure mode from silent misrouting to graceful degradation with confidence scores.