What Is a Token? Two Analogies That Make It Click

By Coffeeee · Jun 26, 2026

Token consumption directly drives API costs and limits what fits in a model's context window. Understanding tokenization—especially how different languages and text structures consume tokens unevenly—is essential for building efficient, cost-effective AI applications. These analogies make the abstract mechanics of BPE and context windows immediately actionable.

Summary

Tokens are the smallest units AI models process, but their behavior is unintuitive: a single English word can be split into several tokens, while a punctuation mark usually counts as a token of its own. This post uses two vivid analogies to make the concept concrete.

In the eating analogy, each bite is a token, the meal is the input, and the stomach is the context window. A small tomato ("bug") takes one bite; a giant burger ("fix the bug") takes three; a hard-shell crab (Chinese text) takes many. The stomach can only hold so much—exceed it, and the AI "vomits" earlier information, causing apparent amnesia.

The skewer analogy reframes the same idea: the bamboo stick is the context window, each piece of meat is a token. Small pieces (simple English) pack many per stick; larger pieces (complex words or non-English scripts like Chinese) take up more space. Wasteful behaviors—like padding with polite phrases, demanding long outputs, or mixing languages—are like loading a skewer with fat instead of lean meat.

Practical tips emerge naturally: batch inputs, state requirements clearly upfront, avoid repeated regeneration, and start fresh conversations when context runs full. The underlying algorithm is Byte Pair Encoding (BPE), which determines how text gets split into tokens.

Takeaways

— Tokens are the smallest units AI models process; a single word can be split into several tokens, and a punctuation mark usually counts as a token of its own.

— Byte Pair Encoding (BPE) is the algorithm that determines how text is split into tokens.

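— Context windows have fixed token limits that vary by model and version: roughly 4K for the original GPT-3.5 Turbo, 8K for GPT-4 (32K in its extended variant), 128K for GPT-4 Turbo, and 200K for Claude 3.5.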

— Once a conversation exceeds the context window, older tokens are dropped (typically by the application truncating the history), so the model appears to 'forget' earlier information.

— Non-English text such as Chinese typically consumes more tokens than simple English to express the same content.

— Common token-wasting behaviors include padding with polite phrases, demanding long outputs, mixing languages, and repeatedly regenerating the same content.

— Effective token-saving strategies: batch inputs, state requirements clearly upfront, keep language consistent, and start fresh conversations when context runs full.
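
The "batch inputs" tip comes down to simple arithmetic: shared instructions are paid for once per request, so fewer requests means fewer repeated instruction tokens. The sketch below illustrates this with made-up numbers. The function name `total_input_tokens` and every token count are hypothetical, and the estimate ignores output tokens and any per-request overhead a real API may add.

```python
def total_input_tokens(instruction_tokens, item_tokens, batched):
    """Estimate input tokens needed to process several items.

    instruction_tokens: size of the shared instructions (hypothetical figure)
    item_tokens: list of per-item token counts (hypothetical figures)
    batched: True sends all items in one request, False sends one request per item
    """
    if batched:
        # Instructions are sent once, followed by every item.
        return instruction_tokens + sum(item_tokens)
    # Instructions are repeated in front of each item.
    return sum(instruction_tokens + n for n in item_tokens)


items = [120, 95, 140]  # made-up token counts for three inputs
separate = total_input_tokens(200, items, batched=False)  # 3 * 200 + 355 = 955
together = total_input_tokens(200, items, batched=True)   # 200 + 355 = 555
print(separate - together)  # 400 tokens of repeated instructions avoided
```

The saving grows with the number of items and with the length of the shared instructions, which is why stating requirements once and upfront pairs well with batching.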

Conclusions

The eating and skewer analogies reveal that token efficiency is fundamentally about information density—'lean meat' over 'fat'—not just prompt length.

Because Chinese text tends to consume more tokens than English per unit of meaning, developers building multilingual applications face real cost differences between languages.
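
One contributing factor can be shown without any tokenizer library, assuming a byte-level BPE tokenizer that starts from UTF-8 bytes before applying merges. Common Chinese characters take three bytes each in UTF-8, while ASCII letters take one, so Chinese text starts with more base symbols per character. The byte count is only an upper bound on the token count for such a tokenizer, not the token count itself, and the Chinese sample is a rough illustrative rendering of the English one.

```python
def utf8_byte_count(text: str) -> int:
    """Number of UTF-8 bytes, i.e. the starting symbols for a byte-level tokenizer."""
    return len(text.encode("utf-8"))


samples = {"english": "fix the bug", "chinese": "修复错误"}
for name, text in samples.items():
    # Compare visible characters with the bytes a byte-level tokenizer starts from.
    print(name, len(text), "chars,", utf8_byte_count(text), "bytes")
```

Here the four Chinese characters occupy 12 bytes, slightly more than the 11 bytes of the English phrase, even though the text looks much shorter. How many tokens each ends up as depends on which merges the tokenizer learned.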

Context window limits are not just a technical constraint but a design constraint: they force developers to think about information prioritization and conversation structure.

The advice to 'start a new conversation' when context runs full is a practical workaround, but it also highlights a fundamental limitation of current transformer architectures.

Concepts & terms

Token

The smallest unit of text that an AI model processes. A token can be a word, part of a word, or a punctuation mark. Models split input text into tokens using algorithms like BPE.

Byte Pair Encoding (BPE)

A tokenization algorithm that builds its vocabulary by iteratively merging the most frequent adjacent pair of symbols (bytes or characters) in a training corpus. The resulting subword units let models handle rare or unknown words by breaking them into known pieces.
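
The merge loop described above can be sketched in a few lines. This is a toy, character-level version trained on a whitespace-split string. Production tokenizers work on bytes, use far larger corpora, and add pre-tokenization rules that are not shown here. The function names (`most_frequent_pair`, `merge_pair`, `train_bpe`) are illustrative, not taken from any library.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    # Ties go to the pair that was seen first.
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split toy corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, words = train_bpe("bug bug bug fix", num_merges=2)
print(merges)  # [('b', 'u'), ('bu', 'g')]
```

After two merges on this tiny corpus, the frequent word "bug" has collapsed into a single symbol while the rare word "fix" is still three separate characters. That is the same effect the analogies describe: common text packs into fewer tokens than rare text.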

Context window

The maximum number of tokens an AI model can handle in a single request, counting both the input and the generated output. It limits how much text the model can 'see' at once; anything beyond it must be dropped or truncated, so earlier content is effectively forgotten.
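
A minimal sketch of how a chat application might drop older messages to stay within the window. It assumes a keep-the-newest policy and uses a crude word count in place of a real tokenizer. Both `rough_token_count` and `trim_history` are hypothetical helpers, and real products may summarize or pin messages instead.

```python
def rough_token_count(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(messages, max_tokens, reserved_for_reply=0):
    """Keep the newest messages whose combined size fits the budget."""
    budget = max_tokens - reserved_for_reply
    kept = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = rough_token_count(message)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        budget -= cost
    return list(reversed(kept))  # restore chronological order


history = ["my name is Ada", "fix the bug", "add a test please"]
print(trim_history(history, max_tokens=10, reserved_for_reply=2))
# ['fix the bug', 'add a test please']
```

With a budget of 8 words, the oldest message no longer fits, so the model never sees the user's name. This is the "amnesia" the eating analogy describes, and it is why starting a fresh conversation with the key facts restated can work better than continuing a full one.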

Source: juejin.cn