Every AI Skill You Add Is a Tax on the Context Window
The More AI Skills You Have, the Dumber It Gets? A Survival Guide for Token Context Explosion
Frontend AI Skill System · Architecture Chapter
When the number of Skills grows from 20 to 100+, you'll find the AI becomes sluggish—not because the model has degraded, but because the Skill index has quietly eaten up half of the context window.
Table of Contents
- A Counter-Intuitive Phenomenon
- The Essence of the Token Tax: Every Skill is a Tax
- Quantifying Your Token Budget
- Short-Term Triage: Compressing Your Skill Descriptions
- Mid-Term Solution: Domain-Level Lazy Loading
- Long-Term Solution: Semantic Routing Engine
- In Practice: How I Cut Token Consumption by 40% for 20 Skills
- Summary: A Phased Survival Strategy
A Counter-Intuitive Phenomenon
Let me ask you a question first:
You've equipped your AI with 50 Skills. In theory, it should be more powerful. But why, as conversations deepen, does it start "forgetting" things, missing specifications, and responding slower?
I ran into this problem while managing 20 self-developed Skills. At first, it was just occasional context loss. Then I did a calculation—and found the token consumption far exceeded expectations.
The core contradiction:
More Skills → Thicker index layer → Less context left for actual tasks → The AI gets "dumber"
This isn't a model problem—it's a physical limitation of the context window.
The Essence of the Token Tax: Every Skill is a Tax
A core idea in Perplexity's Skill design philosophy:
"Every Skill is a Tax" — Every Skill you add levies a tax on the entire system's context window.
Where does this tax come from? When an AI Agent starts up, it loads the index information (descriptions, intents, trigger keywords) for every registered Skill. Together, this text makes up the index-layer cost.
Here's a concrete example. A typical Skill description is about 80-120 tokens:
- fe-engineer-pack: Load when user says "技术方案"、"生成组件"、
"review 代码"、"接口联调"、"排查 bug"...
Capability pack: 10 sub-Skills covering full frontend
engineering workflow (S1-S10).
These 100 tokens don't seem like much, but multiplied by the number of Skills, it's no small amount.
Quantifying Your Token Budget
I made an actual estimation table:
| Scale | Index Layer Consumption | % of 200K Window | User Available Space | Perceived Experience |
|---|---|---|---|---|
| 20 Skills | ~2,000 tokens | 1% | Ample, imperceptible | Normal |
| 50 Skills | ~5,000 tokens | 2.5% | Ample | Occasional context loss |
| 100 Skills | ~10,000 tokens | 5% | Acceptable | Long conversations start to lag |
| 300 Skills | ~30,000 tokens | 15% | Tight | Noticeable degradation on complex tasks |
| 500 Skills | ~50,000 tokens | 25% | Critical | User input gets truncated |
| 1000 Skills | ~100,000 tokens | 50% | Collapse | Unusable |
Key insight: A 200K window sounds huge, but the Token Tax is a global tax, a hidden tax, an unconditional tax—even if you never use a particular Skill in this conversation, its description still occupies space.
It's like having 100 reference books on your desk. Each takes up only a little space, but combined, you don't even have room for your laptop.
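The table above is plain arithmetic, so it is easy to rerun for your own numbers. A minimal sketch, assuming the ~100 tokens per description and the 200K window used in the table (`index_overhead` is a name made up for this example):

```python
def index_overhead(skill_count: int, tokens_per_skill: int = 100,
                   window_tokens: int = 200_000) -> tuple[int, float]:
    """Return (index-layer tokens, share of the context window in percent)."""
    index_tokens = skill_count * tokens_per_skill
    return index_tokens, index_tokens * 100 / window_tokens


# Reproduce the rows of the estimation table.
for n in (20, 50, 100, 300, 500, 1000):
    tokens, pct = index_overhead(n)
    print(f"{n:>5} Skills -> ~{tokens:,} tokens ({pct:g}% of the window)")
```

Swap in your own average description length and window size to see where your system sits.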
Short-Term Triage: Compressing Your Skill Descriptions
Applicable stage: Up to ~100 Skills
The most direct optimization: write the description as a routing trigger, not as documentation.
Principle: The 30-Word Router
A good description should satisfy:
- Within 30 words (English; 50 characters for Chinese)
- List trigger words only; don't explain functional details
- Draw clear boundaries: state when the Skill should not trigger
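The length limits above can be checked mechanically. A rough sketch, assuming that English words and Chinese characters are counted separately against their own limit (the article does not say how mixed-language descriptions should be scored, and `check_router_description` is a hypothetical helper):

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")            # common Chinese characters
LATIN_WORD = re.compile(r"[A-Za-z][A-Za-z0-9'-]*")   # "S1-S10" counts as one word


def check_router_description(text: str, max_words: int = 30,
                             max_cjk_chars: int = 50) -> list[str]:
    """Return the list of violated limits; an empty list means the description passes."""
    problems = []
    words = len(LATIN_WORD.findall(text))
    cjk = len(CJK_CHAR.findall(text))
    if words > max_words:
        problems.append(f"{words} English words (limit {max_words})")
    if cjk > max_cjk_chars:
        problems.append(f"{cjk} Chinese characters (limit {max_cjk_chars})")
    return problems


print(check_router_description('Load when: "技术方案"、"生成组件". Frontend engineering pack (S1-S10).'))
```

This only enforces length. Whether a description has clear boundaries still needs a human read.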
Before vs After
Before (Verbose, ~150 tokens):
- fe-engineer-pack: This is a capability pack that contains 10
sub-Skills covering the full frontend engineering workflow
including technical solution design, component generation,
code review, API integration, bug troubleshooting, performance
optimization, documentation, 3D scenes, data dashboards, and
refactoring. Load when user says "技术方案"、"生成组件"、
"review 代码"、"接口联调"、"排查 bug"、"性能优化"、
"技术文档"、"3D 场景"、"数据大屏"、"重构"、"提炼公共组件"
or similar engineering tasks.
After (Router-style, ~60 tokens):
- fe-engineer-pack: Load when: "技术方案"、"生成组件"、
"review 代码"、"接口联调"、"排查 bug"、"性能优化"、
"3D 场景"、"数据大屏"、"重构".
Frontend engineering pack (S1-S10).
Optimization effect: this description shrinks from ~150 to ~60 tokens, a saving of ~60%. Applied across 20 Skills at the typical ~100 tokens each, that is roughly 1,200 tokens saved.
Three Compression Rules
| Rule | Explanation | Example |
|---|---|---|
| Delete function descriptions | The AI doesn't need to know what a Skill can do, only when to trigger it | ❌ "contains 10 sub-Skills covering..." → ✅ Delete |
| Merge synonymous triggers | Keep only the most common trigger for semantically similar phrases | ❌ "生成组件"+"创建组件"+"写组件" → ✅ "生成组件" |
| Add exclusion words | Preventing false triggers is more important than adding more triggers | ✅ "NOT: design review, PRD review" |
The Power of Exclusion Words (exclude_intents)
When the number of Skills reaches 50+, the most common problem isn't that the right Skill fails to trigger, but that the wrong Skill triggers by mistake.
# Collision example: User says "性能优化" (performance optimization)
- fe-engineer-pack ← Wants to trigger this
- fe-base-skill ← Also matched
- meta-hub ← Also matched (because it has the keyword "性能")
Solution: Add exclude_intents to each Skill:
{
  "name": "meta-hub",
  "intents": ["知识管理", "体系管理", "Skill 统计"],
  "exclude_intents": ["写代码", "性能优化", "生成组件"]
}
Principle: Exclusion words take priority over trigger words: exclude first, then match. In my setup this cut the route mismatch rate from ~12% to ~4%, a drop of about 67%.
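The "exclude first, then match" order can be sketched in a few lines. This is an illustration only, assuming plain substring matching; the `Skill` class, the `route` function and the bare "性能" trigger on meta-hub are made up for the example and are not the actual router:

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    intents: list[str]
    exclude_intents: list[str] = field(default_factory=list)


def route(user_input: str, skills: list[Skill]) -> list[str]:
    """Return the names of matching Skills, applying exclusions before triggers."""
    matched = []
    for skill in skills:
        # Exclusions win: drop the Skill before its triggers are even checked.
        if any(phrase in user_input for phrase in skill.exclude_intents):
            continue
        if any(phrase in user_input for phrase in skill.intents):
            matched.append(skill.name)
    return matched


skills = [
    Skill("fe-engineer-pack", ["性能优化", "生成组件"]),
    # The "性能" trigger would collide without the exclusion list.
    Skill("meta-hub", ["知识管理", "性能"], exclude_intents=["写代码", "性能优化", "生成组件"]),
]
print(route("帮我做一下性能优化", skills))
```

With the exclusion list removed, meta-hub would also match on "性能", which is exactly the collision described above.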
Mid-Term Solution: Domain-Level Lazy Loading
Applicable stage: 100 ~ 500 Skills
Once description optimization has hit its limit, the fix has to move to the architecture level: instead of shrinking how many tokens each Skill occupies, reduce the number of Skills loaded at the same time.
Core Idea: Tiered Index
┌──────────────────────────────────────────────────────────────────┐
│ L0: Domain Index (~5 entries, ~500 tokens)                       │  ← Always loaded
│     Frontend / Backend / Product / Design / Meta                 │
├──────────────────────────────────────────────────────────────────┤
│ L1: Intra-domain Skill Index (~20 entries/domain, ~2,000 tokens) │  ← Loaded on demand
│     fe-hub / fe-engineer-pack / fe-test-pack / ...               │
├──────────────────────────────────────────────────────────────────┤
│ L2: Full Skill Content (~5,000 tokens each)                      │  ← Loaded after trigger
│     SKILL.md full text + modules/                                │
└──────────────────────────────────────────────────────────────────┘
Workflow:
- Agent starts → Only loads L0 domain index (~500 tokens)
- User input → Matches domain (e.g., "写组件" → Frontend domain)
- Loads Frontend domain L1 index → Matches specific Skill
- Triggers target Skill → Loads L2 full content
Effect comparison:
| Strategy | Token Consumption at 100 Skills | Savings |
|---|---|---|
| Full load | ~10,000 tokens | Baseline |
| Domain-level lazy load | ~2,500 tokens (L0 + one domain's L1) | 75% |
Implementation
Organize by domain in the registry; the Agent only reads domain-level summaries at startup:
{
  "domains": [
    {
      "key": "frontend",
      "summary": "Frontend development: coding, component generation, Review, testing",
      "skillCount": 12,
      "hub": "fe-hub"
    },
    {
      "key": "product",
      "summary": "Product management: PRD, competitive analysis, prioritization, user stories",
      "skillCount": 3,
      "hub": "pm-hub"
    }
  ]
}
Only when routing hits a specific domain is the full Skill list under that domain expanded and loaded.
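Here is one way the on-demand expansion could look in code. It is a sketch under assumptions: domains are matched by hypothetical trigger keywords (the registry above only stores a `summary`), matching is plain substring search, and `Domain` and `LazyRegistry` are names invented for this example:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Domain:
    key: str
    triggers: list[str]  # L0: short routing hints, always in context
    skills: dict[str, list[str]] = field(default_factory=dict)  # L1: Skill -> triggers


class LazyRegistry:
    def __init__(self, domains: list[Domain]):
        self.domains = domains
        self.loaded_domains: set[str] = set()  # L1 indexes expanded so far

    def route(self, user_input: str) -> Optional[str]:
        for domain in self.domains:
            if not any(t in user_input for t in domain.triggers):
                continue
            self.loaded_domains.add(domain.key)  # expand L1 only on a domain hit
            for skill, triggers in domain.skills.items():
                if any(t in user_input for t in triggers):
                    return skill  # the caller would now load L2 (full SKILL.md)
        return None


registry = LazyRegistry([
    Domain("frontend", ["组件", "前端"], {"fe-engineer-pack": ["写组件", "生成组件"]}),
    Domain("product", ["PRD", "竞品"], {"pm-hub": ["PRD"]}),
])
print(registry.route("帮我写组件"), registry.loaded_domains)
```

The product domain's Skill list never enters the context for this request, which is where the savings come from.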
Anthropic's Progressive Disclosure Mechanism
This idea didn't come out of thin air. Anthropic explicitly states in its Agent design guidelines:
"Progressive Disclosure: Give the Agent the minimum context first, and only progressively expand when needed."
Domain-level lazy loading is the engineering implementation of this principle—turning a "full index" into an "on-demand index."
Long-Term Solution: Semantic Routing Engine
Applicable stage: 500+ Skills
When the number of Skills reaches 500+, keyword matching hits physical limits:
- Collisions become unavoidable: The intent keyword pool for 500 Skills exceeds 5,000 words, and the number of Skill pairs that can collide grows roughly quadratically with the Skill count
- Traversal efficiency collapses: O(n) keyword traversal is no longer acceptable
- "Action at a Distance": Adding one Skill can worsen the routing of another unrelated Skill
Solution: Replace Keyword Traversal with Semantic Search
User input: "这个按钮的 hover 状态颜色不对" (The hover state color of this button is wrong)
↓
Semantic vectorization → [0.23, -0.15, 0.87, ...]
↓
ANN search against Skill semantic library (top-3)
↓
Results: [fe-engineer-pack(0.92), designer-pack(0.78), fe-base-skill(0.65)]
↓
Route to fe-engineer-pack (highest confidence)
Routing no longer relies on precise keyword hits; it works from semantic intent. Even if the user's description shares no words with the trigger words, the request is routed correctly as long as the semantics are close.
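The ranking step can be illustrated without any embedding model. In this sketch the 3-dimensional vectors are made-up stand-ins for real embeddings, and the search is brute-force cosine similarity rather than a true ANN index, which is what you would reach for at 500+ Skills:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], skill_vecs: dict[str, list[float]], k: int = 3):
    """Score every Skill against the query and keep the k best matches."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in skill_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors (invented values); a real system would get these from an embedding model.
skill_vecs = {
    "fe-engineer-pack": [0.9, 0.1, 0.0],
    "designer-pack": [0.6, 0.7, 0.1],
    "fe-base-skill": [0.4, 0.2, 0.8],
    "meta-hub": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded user input
ranked = top_k(query, skill_vecs)
print("route to:", ranked[0][0])
```

A production router would also want a confidence threshold, so that a weak best match falls back to asking the user instead of guessing.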
Skill Federation
Organizational architecture at 500+ Skills:
meta-hub (Federation Coordinator)
├── Maintains only domain-level metadata (5-10 entries)
├── Holds no complete index of any Skill
└── Routes requests to domain hubs
Each domain hub (Federation Member)
├── Independently maintains its domain's Skill registry
├── Independently executes intra-domain routing
└── Independently handles publishing and version management
Each domain hub becomes an independent "microservice", and meta-hub is reduced to a pure routing gateway. This also aligns with the "decentralized governance" principle in microservice architecture.
In Practice: How I Cut Token Consumption by 40% for 20 Skills
Here is the actual optimization I did while managing 17 self-developed Skills.
Step 1: Measure the Baseline
First, figure out how many tokens each Skill's description currently occupies:
# Rough estimation method:
# English word count × 1.3 ≈ token count
# Chinese character count × 2 ≈ token count
# My 17 Skill descriptions accumulated about 2,400 tokens
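The rule of thumb above fits in one small function. This is a rough sketch of that heuristic only: it is not a real tokenizer, the regular expressions are my own approximation of "word" and "Chinese character", and `estimate_tokens` is a hypothetical name:

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough estimate: English words x 1.3 plus Chinese characters x 2."""
    english_words = len(re.findall(r"[A-Za-z][A-Za-z0-9'-]*", text))
    chinese_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    return round(english_words * 1.3 + chinese_chars * 2)


descriptions = {
    "demo-skill": 'Load when: "性能优化"、"生成组件". Frontend engineering pack.',
}
for name, text in descriptions.items():
    print(name, estimate_tokens(text))
print("total:", sum(estimate_tokens(t) for t in descriptions.values()))
```

For exact numbers, count with the tokenizer of the model you actually run; the estimate is only good enough to rank Skills by cost.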
Step 2: Grade by Trigger Frequency
| Frequency | Skill | Strategy |
|---|---|---|
| High (daily triggers) | fe-hub, fe-engineer-pack, meta-hub | Keep full intents, no compression |
| Medium (weekly triggers) | fe-test-pack, content-publish | Streamline description, delete redundant explanations |
| Low (monthly triggers) | fe-developer-distill, skill-scout | Minimal router, keep only core trigger words |
Step 3: Rewrite Low-Frequency Skill Descriptions
# Before (fe-developer-distill, ~120 tokens)
Load when user says "蒸馏编码风格"、"分析编码习惯"、"生成开发者 Skill"、
"提取编码风格"、"开发者画像"、"团队风格基线"、"更新团队蒸馏"、
"团队风格变化"、"团队编码规范更新"、"对比团队风格"、"团队蒸馏持续跟踪".
Distills developer coding DNA from Git history into a personal
Skill profile. Supports team-level continuous evolution (Phase 3/4).
# After (~50 tokens)
Load when: "蒸馏编码风格"、"开发者画像"、"团队风格基线".
Git 编码 DNA 蒸馏 → 个人/团队 Skill 画像.
Step 4: Add exclude_intents to Prevent Collisions
{
  "name": "meta-hub",
  "exclude_intents": ["写代码", "fix bug", "生成组件", "写测试"]
}
Final Results
| Metric | Before Optimization | After Optimization | Change |
|---|---|---|---|
| Total Token Consumption | ~2,400 | ~1,440 | -40% |
| Avg. Skill Consumption | ~141 tokens | ~85 tokens | -40% |
| Route Mismatch Rate | ~12% | ~4% | -67% |
Summary: A Phased Survival Strategy
| Phase | Skill Count | Core Action | Token Effect |
|---|---|---|---|
| Now | 20 | No optimization needed, 1% overhead imperceptible | Baseline |
| P1 | 50 | Compress description + exclude_intents | Save 40% |
| P2 | 100 | Tiered index + domain-level lazy loading | Save 75% |
| P3 | 500+ | Semantic routing + Skill federation + vector search | Save 95% |
One-Sentence Summary
Tokens are the AI's working memory—the more Skills you equip it with, the less space it has to think. The key to optimization isn't making Skills smaller, but making the AI see them only when needed.
Final Words
Token context management is the first major hurdle in scaling an AI Skill system. The core principle for crossing this hurdle is just one:
Load on demand; always pay the tax only for the current task.
If you're also building your own AI Skill system, feel free to share your Skill count and the problems you've encountered in the comments. The next article will cover the "spillover effect" between Skills—why adding a new Skill can make another unrelated Skill dumber.
🛠️ Companion Tool: skill-token-audit.sh (token audit script) outputs a health report for your Skill system with a single command. DM for the full version, or reply "审计" in the comments.
Top 1 from juejin.cn, machine-translated. The original thread is authoritative.