Every AI Skill You Add Is a Tax on the Context Window

The More AI Skills You Have, the Dumber It Gets? A Survival Guide for Token Context Explosion

Frontend AI Skill System · Architecture Chapter

As the number of Skills grows from 20 to 100 and beyond, the AI starts to feel sluggish. The model has not degraded; the Skill index has quietly eaten into the context window, and at 1,000 Skills it takes roughly half of a 200K window.

A Counter-Intuitive Phenomenon
The Essence of the Token Tax: Every Skill is a Tax
Quantifying Your Token Budget
Short-Term Triage: Compressing Your Skill Descriptions
Mid-Term Solution: Domain-Level Lazy Loading
Long-Term Solution: Semantic Routing Engine
In Practice: How I Cut Token Consumption by 40% for 20 Skills
Summary: A Phased Survival Strategy

A Counter-Intuitive Phenomenon

Let me ask you a question first:

You've equipped your AI with 50 Skills. In theory, it should be more powerful. But why, as conversations deepen, does it start "forgetting" things, missing specifications, and responding slower?

I ran into this problem while managing 20 self-developed Skills. At first, it was just occasional context loss. Then I did a calculation—and found the token consumption far exceeded expectations.

The core contradiction:

More Skills → Thicker index layer → Less context left for actual tasks → The AI gets "dumber"

This isn't a model problem—it's a physical limitation of the context window.

The Essence of the Token Tax: Every Skill is a Tax

Perplexity has a core viewpoint in its Skill design philosophy:

"Every Skill is a Tax" — Every Skill you add levies a tax on the entire system's context window.

Where does this tax come from? When an AI Agent starts up, it needs to load the index information (descriptions, intents, trigger keywords) for all registered Skills. All this text combined is the Index layer consumption.

Here's a concrete example. A typical Skill description is about 80-120 tokens:

- fe-engineer-pack: Load when user says "技术方案"、"生成组件"、
  "review 代码"、"接口联调"、"排查 bug"...
  Capability pack: 10 sub-Skills covering full frontend
  engineering workflow (S1-S10).

These 100 tokens don't seem like much, but multiplied by the number of Skills, it's no small amount.

Quantifying Your Token Budget

I put together an estimation table, assuming ~100 tokens per Skill description and a 200K-token window:

Scale	Index Layer Consumption	% of 200K Window	User Available Space	Perceived Experience
20 Skills	~2,000 tokens	1%	Ample, imperceptible	Normal
50 Skills	~5,000 tokens	2.5%	Ample	Occasional context loss
100 Skills	~10,000 tokens	5%	Acceptable	Long conversations start to lag
300 Skills	~30,000 tokens	15%	Tight	Noticeable degradation on complex tasks
500 Skills	~50,000 tokens	25%	Critical	User input gets truncated
1000 Skills	~100,000 tokens	50%	Collapse	Unusable
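
The token and percentage columns above follow from simple multiplication. A minimal sketch, assuming the same round numbers (~100 tokens per Skill description, 200K-token window); the names `TOKENS_PER_SKILL`, `WINDOW_TOKENS`, and `index_cost` are illustrative, not part of any Skill framework:

```python
# Assumption: ~100 tokens per Skill description and a 200K-token window,
# the same round numbers the table uses.
TOKENS_PER_SKILL = 100
WINDOW_TOKENS = 200_000


def index_cost(skill_count: int) -> tuple[int, float]:
    """Return (index-layer tokens, percentage of the window) for a Skill count."""
    tokens = skill_count * TOKENS_PER_SKILL
    return tokens, 100 * tokens / WINDOW_TOKENS


for n in (20, 50, 100, 300, 500, 1000):
    tokens, pct = index_cost(n)
    print(f"{n:>5} Skills  ~{tokens:>7,} tokens  {pct:g}% of window")
```

Swap in your own average description length to see where your system sits.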

Key insight: A 200K window sounds huge, but the Token Tax is a global tax, a hidden tax, an unconditional tax—even if you never use a particular Skill in this conversation, its description still occupies space.

It's like having 100 reference books on your desk. Each takes up only a little space, but combined, you don't even have room for your laptop.

Short-Term Triage: Compressing Your Skill Descriptions

Applicable stage: from now up to ~100 Skills

The most direct optimization: write the description as a routing trigger, not as documentation.

Principle: The 30-Word Router

A good description should satisfy:

Stay within 30 words (English) or 50 characters (Chinese)
List trigger words only; do not explain functional details
Set clear boundaries, so the router also knows when not to trigger

Before vs After

Before (Verbose, ~150 tokens):

- fe-engineer-pack: This is a capability pack that contains 10
  sub-Skills covering the full frontend engineering workflow
  including technical solution design, component generation,
  code review, API integration, bug troubleshooting, performance
  optimization, documentation, 3D scenes, data dashboards, and
  refactoring. Load when user says "技术方案"、"生成组件"、
  "review 代码"、"接口联调"、"排查 bug"、"性能优化"、
  "技术文档"、"3D 场景"、"数据大屏"、"重构"、"提炼公共组件"
  or similar engineering tasks.

After (Router-style, ~60 tokens):

- fe-engineer-pack: Load when: "技术方案"、"生成组件"、
  "review 代码"、"接口联调"、"排查 bug"、"性能优化"、
  "3D 场景"、"数据大屏"、"重构".
  Frontend engineering pack (S1-S10).

Optimization effect: this Skill shrinks by ~60% (from ~150 to ~60 tokens). At the ~100-token average used earlier, a 60% cut across 20 Skills saves ~1,200 tokens.

Three Compression Rules

Rule	Explanation	Example
Delete function descriptions	The AI doesn't need to know what a Skill can do, only when to trigger it	❌ "contains 10 sub-Skills covering..." → ✅ Delete
Merge synonymous triggers	Keep only the most common trigger for semantically similar phrases	❌ "生成组件"+"创建组件"+"写组件" → ✅ "生成组件"
Add exclusion words	Preventing false triggers is more important than adding more triggers	✅ "NOT: design review, PRD review"

The Power of Exclusion Words (exclude_intents)

When the number of Skills reaches 50+, the most common problem is no longer a missed trigger but a false one: a Skill fires when it should not.

# Collision example: User says "性能优化" (performance optimization)

- fe-engineer-pack ← Wants to trigger this
- fe-base-skill ← Also matched
- meta-hub ← Also matched (because it has the keyword "性能")

Solution: Add exclude_intents to each Skill:

{
  "name": "meta-hub",
  "intents": ["知识管理", "体系管理", "Skill 统计"],
  "exclude_intents": ["写代码", "性能优化", "生成组件"]
}

Principle: Exclusion words have higher priority than trigger words. Exclude first, then match. In my own setup, the route mismatch rate dropped by about 67% (from ~12% to ~4%).
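
The "exclude first, then match" order can be sketched in a few lines. This is an illustration under assumptions, not how any particular Agent runtime matches Skills: the `Skill` dataclass, the `route` function, and plain substring matching are all hypothetical, and the loose "性能" trigger on meta-hub is added only to reproduce the collision described above.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    intents: list[str]
    exclude_intents: list[str] = field(default_factory=list)


def route(user_input: str, skills: list[Skill]) -> list[str]:
    """Return names of Skills that match, applying exclusions before triggers."""
    matched = []
    for skill in skills:
        # Exclusion wins: any excluded phrase vetoes the Skill outright.
        if any(phrase in user_input for phrase in skill.exclude_intents):
            continue
        if any(phrase in user_input for phrase in skill.intents):
            matched.append(skill.name)
    return matched


skills = [
    Skill("fe-engineer-pack", ["性能优化", "生成组件"]),
    # Assumption: "性能" is the loose keyword that causes the false trigger.
    Skill("meta-hub", ["知识管理", "性能"],
          exclude_intents=["写代码", "性能优化", "生成组件"]),
]
print(route("帮我做一下性能优化", skills))  # ['fe-engineer-pack']
```

Without the `exclude_intents` entry, meta-hub would also match on "性能" and the router would have to break the tie some other way.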

Mid-Term Solution: Domain-Level Lazy Loading

Applicable stage: 100 ~ 500 Skills

Once description compression has hit its limit, the fix has to move to the architecture level: instead of shrinking how many tokens each Skill occupies, reduce the number of Skills loaded at the same time.

Core Idea: Tiered Index

┌─────────────────────────────────────────────────┐
│ L0: Domain Index (~5 entries, ~500 tokens)        │  ← Always loaded
│   Frontend / Backend / Product / Design / Meta    │
├─────────────────────────────────────────────────┤
│ L1: Intra-domain Skill Index (~20 entries/domain, ~2,000 tokens) │  ← Loaded on demand
│   fe-hub / fe-engineer-pack / fe-test-pack / ... │
├─────────────────────────────────────────────────┤
│ L2: Full Skill Content (~5,000 tokens each)       │  ← Loaded after trigger
│   SKILL.md full text + modules/                   │
└─────────────────────────────────────────────────┘

Workflow:

Agent starts → Only loads L0 domain index (~500 tokens)
User input → Matches domain (e.g., "写组件" → Frontend domain)
Loads Frontend domain L1 index → Matches specific Skill
Triggers target Skill → Loads L2 full content
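
The four steps above can be sketched as a two-stage lookup that also tracks how many index tokens were loaded to reach a decision. The token figures are the round estimates from the diagram, and the trigger tables, `route_tiered`, and substring matching are hypothetical stand-ins for a real registry:

```python
from typing import Optional

# Token figures are the article's round estimates, not measurements.
L0_TOKENS = 500               # domain index, always loaded
L1_TOKENS_PER_DOMAIN = 2_000  # one domain's Skill index, loaded on demand

# Hypothetical keyword tables; a real system would load these from a registry.
DOMAIN_TRIGGERS = {"frontend": ["写组件", "生成组件", "写测试"], "product": ["PRD"]}
SKILL_TRIGGERS = {
    "frontend": {"fe-engineer-pack": ["写组件", "生成组件"], "fe-test-pack": ["写测试"]},
    "product": {"pm-hub": ["PRD"]},
}


def route_tiered(user_input: str) -> tuple[Optional[str], int]:
    """Return (matched Skill or None, index tokens loaded to decide)."""
    loaded = L0_TOKENS  # step 1: only L0 is in context
    # Step 2: match the domain using the L0 index alone.
    domain = next(
        (d for d, words in DOMAIN_TRIGGERS.items()
         if any(w in user_input for w in words)),
        None,
    )
    if domain is None:
        return None, loaded
    loaded += L1_TOKENS_PER_DOMAIN  # step 3: expand just this domain's L1
    skill = next(
        (s for s, words in SKILL_TRIGGERS[domain].items()
         if any(w in user_input for w in words)),
        None,
    )
    # Step 4 (loading the Skill's L2 content) would happen after this returns.
    return skill, loaded


print(route_tiered("帮我写组件"))  # ('fe-engineer-pack', 2500)
```

A request that matches no domain never pays for any L1 index, which is where most of the saving comes from.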

Effect comparison:

Strategy	Token Consumption at 100 Skills	Savings
Full load	~10,000 tokens	Baseline
Domain-level lazy load	~2,500 tokens (L0 + one domain's L1)	75%

Implementation

Organize by domain in the registry; the Agent only reads domain-level summaries at startup:

{
  "domains": [
    {
      "key": "frontend",
      "summary": "Frontend development: coding, component generation, Review, testing",
      "skillCount": 12,
      "hub": "fe-hub"
    },
    {
      "key": "product",
      "summary": "Product management: PRD, competitive analysis, prioritization, user stories",
      "skillCount": 3,
      "hub": "pm-hub"
    }
  ]
}

Only when routing hits a specific domain is the full Skill list under that domain expanded and loaded.
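
As a rough illustration of what "only reads domain-level summaries" could look like, the sketch below renders the always-loaded index from the registry shown above. The one-line-per-domain output format and the `build_l0_index` helper are assumptions made for this example:

```python
import json

# Same structure as the registry above, inlined so the example is self-contained.
REGISTRY_JSON = """
{
  "domains": [
    {"key": "frontend",
     "summary": "Frontend development: coding, component generation, Review, testing",
     "skillCount": 12, "hub": "fe-hub"},
    {"key": "product",
     "summary": "Product management: PRD, competitive analysis, prioritization, user stories",
     "skillCount": 3, "hub": "pm-hub"}
  ]
}
"""


def build_l0_index(registry: dict) -> str:
    """Render one line per domain; no individual Skill appears at this level."""
    lines = [
        f"- {d['key']} ({d['skillCount']} Skills, hub: {d['hub']}): {d['summary']}"
        for d in registry["domains"]
    ]
    return "\n".join(lines)


registry = json.loads(REGISTRY_JSON)
print(build_l0_index(registry))
```

The index grows with the number of domains, not the number of Skills, so adding a thirteenth frontend Skill changes only a counter here.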

Anthropic's Progressive Disclosure Mechanism

This idea didn't come out of thin air. Anthropic explicitly states in its Agent design guidelines:

"Progressive Disclosure: Give the Agent the minimum context first, and only progressively expand when needed."

Domain-level lazy loading is the engineering implementation of this principle—turning a "full index" into an "on-demand index."

Long-Term Solution: Semantic Routing Engine

Applicable stage: 500+ Skills

When the number of Skills reaches 500+, keyword matching runs into hard limits:

Collisions become unavoidable: The intent keyword pool for 500 Skills exceeds 5,000 words, so overlapping triggers are inevitable
Traversal cost grows: An O(n) keyword scan over every Skill on each request is no longer acceptable
"Action at a Distance": Adding one Skill can worsen the routing of another, unrelated Skill

Solution: Replace Keyword Traversal with Semantic Search

User input: "这个按钮的 hover 状态颜色不对" (The hover state color of this button is wrong)
    ↓
Semantic vectorization → [0.23, -0.15, 0.87, ...]
    ↓
ANN search against Skill semantic library (top-3)
    ↓
Results: [fe-engineer-pack(0.92), designer-pack(0.78), fe-base-skill(0.65)]
    ↓
Route to fe-engineer-pack (highest confidence)

Routing no longer relies on exact keyword hits; it works from semantic intent. Even if the user's wording shares no words with the trigger phrases, routing still succeeds as long as the meaning is close.
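
The ranking step can be sketched with plain cosine similarity. Two simplifications are made here and should be read as assumptions: the vectors are hand-written 3-dimensional toys rather than output from an embedding model, and the search is a brute-force scan rather than a true ANN index, which is what you would want at 500+ Skills.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], skill_vecs: dict[str, list[float]], k: int = 3):
    """Brute-force nearest neighbours; an ANN index would replace this scan."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in skill_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors invented for illustration only.
skill_vecs = {
    "fe-engineer-pack": [0.9, 0.1, 0.2],
    "designer-pack": [0.5, 0.8, 0.1],
    "meta-hub": [0.0, 0.1, 0.9],
}
# Stand-in for the embedding of "这个按钮的 hover 状态颜色不对".
query_vec = [0.8, 0.3, 0.1]

ranked = top_k(query_vec, skill_vecs)
print(ranked[0][0])  # fe-engineer-pack
```

In a real system the Skill vectors would be computed once from each description and stored, so only the user input needs to be embedded per request.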

Skill Federation

Organizational architecture at 500+ Skills:

meta-hub (Federation Coordinator)
├── Maintains only domain-level metadata (5-10 entries)
├── Holds no complete index of any Skill
└── Routes requests to domain hubs

Each domain hub (Federation Member)
├── Independently maintains its domain's Skill registry
├── Independently executes intra-domain routing
└── Independently handles publishing and version management

Each domain hub becomes an independent "microservice", and meta-hub is reduced to a pure routing gateway. This matches the "decentralized governance" principle in microservice architecture.

In Practice: How I Cut Token Consumption by 40% for 20 Skills

Here is the actual optimization I did while managing 17 self-developed Skills.

Step 1: Measure the Baseline

First, figure out how many tokens each Skill's description currently occupies:

# Rough estimation method:
# English word count × 1.3 ≈ token count
# Chinese character count × 2 ≈ token count
# My 17 Skill descriptions accumulated about 2,400 tokens
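
The rule of thumb above is easy to turn into a helper. This is only a sketch of that heuristic: `estimate_tokens` is a hypothetical name, punctuation is ignored, and a real tokenizer for your model will give different (and more accurate) counts.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough estimate: English words x 1.3 plus Chinese characters x 2."""
    cjk_chars = re.findall(r"[\u4e00-\u9fff]", text)
    # Hyphenated or apostrophe-joined runs such as "S1-S10" count as one word.
    english_words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
    return round(len(english_words) * 1.3 + len(cjk_chars) * 2)


# Hypothetical description used only to show the call.
descriptions = {
    "fe-engineer-pack": 'Load when: "技术方案"、"生成组件". Frontend engineering pack (S1-S10).',
}
for name, text in descriptions.items():
    print(name, estimate_tokens(text))
```

Summing the result over every registered description gives the baseline to compare against after compression.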

Step 2: Grade by Trigger Frequency

Frequency	Skill	Strategy
High (daily triggers)	fe-hub, fe-engineer-pack, meta-hub	Keep full intents, no compression
Medium (weekly triggers)	fe-test-pack, content-publish	Streamline description, delete redundant explanations
Low (monthly triggers)	fe-developer-distill, skill-scout	Minimal router, keep only core trigger words
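
If you log how often each Skill fires, the grading can be automated. The thresholds below (20 and 4 triggers per 30 days) are assumptions chosen to approximate "daily", "weekly", and "monthly"; tune them to your own usage data.

```python
def grade(triggers_last_30_days: int) -> tuple[str, str]:
    """Map a 30-day trigger count to (frequency tier, compression strategy)."""
    if triggers_last_30_days >= 20:   # roughly daily (assumed threshold)
        return "high", "keep full intents, no compression"
    if triggers_last_30_days >= 4:    # roughly weekly (assumed threshold)
        return "medium", "streamline description"
    return "low", "minimal router, core trigger words only"


# Hypothetical counts for illustration.
for name, count in {"fe-hub": 45, "fe-test-pack": 6, "skill-scout": 1}.items():
    print(name, grade(count))
```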

Step 3: Rewrite Low-Frequency Skill Descriptions

# Before (fe-developer-distill, ~120 tokens)

Load when user says "蒸馏编码风格"、"分析编码习惯"、"生成开发者 Skill"、
"提取编码风格"、"开发者画像"、"团队风格基线"、"更新团队蒸馏"、
"团队风格变化"、"团队编码规范更新"、"对比团队风格"、"团队蒸馏持续跟踪".
Distills developer coding DNA from Git history into a personal
Skill profile. Supports team-level continuous evolution (Phase 3/4).

# After (~50 tokens)

Load when: "蒸馏编码风格"、"开发者画像"、"团队风格基线".
Git 编码 DNA 蒸馏 → 个人/团队 Skill 画像.

Step 4: Add exclude_intents to Prevent Collisions

{
  "name": "meta-hub",
  "exclude_intents": ["写代码", "fix bug", "生成组件", "写测试"]
}

Final Results

Metric	Before Optimization	After Optimization	Change
Total Token Consumption	~2,400	~1,440	-40%
Avg. Skill Consumption	~141 tokens	~85 tokens	-40%
Route Mismatch Rate	~12%	~4%	-67%
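
The Change column is plain percentage change, which is worth recomputing whenever you rerun the audit. A small check using the figures from the table (`pct_change` is an illustrative helper):

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after, rounded to one decimal place."""
    return round(100 * (after - before) / before, 1)


print(pct_change(2400, 1440))  # -40.0  total tokens
print(pct_change(141, 85))     # -39.7  average per Skill, shown as -40% above
print(pct_change(12, 4))       # -66.7  route mismatch rate, shown as -67% above
```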

Summary: A Phased Survival Strategy

Phase	Skill Count	Core Action	Token Effect
Now	20	No optimization needed, 1% overhead imperceptible	Baseline
P1	50	Compress description + exclude_intents	Save 40%
P2	100	Tiered index + domain-level lazy loading	Save 75%
P3	500+	Semantic routing + Skill federation + vector search	Save 95%

One-Sentence Summary

Tokens are the AI's working memory—the more Skills you equip it with, the less space it has to think. The key to optimization isn't making Skills smaller, but making the AI see them only when needed.

Final Words

Token context management is the first major hurdle in scaling an AI Skill system. There is one core principle for clearing it:

Load on demand, and pay the tax only for the current task.

If you're also building your own AI Skill system, feel free to share your Skill count and the problems you've encountered in the comments. The next article will cover the "spillover effect" between Skills—why adding a new Skill can make another unrelated Skill dumber.

🛠️ Companion Tool: skill-token-audit.sh (token audit script) outputs a health report for your Skill system with a single command. DM for the full version, or reply "审计" ("audit") in the comments.

Comments

Top 1 from juejin.cn, machine-translated. The original thread is authoritative.

用户2137119879765

Audit