What Is a Token? Two Analogies That Make It Click

When we use AI tools to solve problems or build features, we first care whether the AI finished the task and whether it made mistakes. Right after that, what worries us most is how many tokens we have used. Why? The most direct reason is cost. And if we run out of tokens, we feel as if we can't do anything at all, like when the internet or the power goes out.

Have you ever wondered what a token actually is? How are tokens produced? And why do we keep talking about saving them?

Concept

A token (词元 in Chinese) is the smallest unit of text an AI model works with. When we humans read an article, we take it in word by word, but an AI model is different: its tokenizer first splits the text into fragments. A common word is often a single fragment, while a longer or rarer word may be split into two or three pieces. Each fragment is a token. Punctuation marks usually count as tokens too.
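To make "splitting into fragments" concrete, here is a minimal Python sketch. It assumes a tiny, made-up vocabulary (VOCAB and greedy_tokenize are hypothetical names for illustration) and uses greedy longest-match, the rule WordPiece-style tokenizers apply when encoding. It is not BPE and not any real model's tokenizer, whose vocabularies hold tens of thousands of pieces.

```python
# Hypothetical toy vocabulary. Like GPT-style tokenizers, some pieces carry a
# leading space (" the"), so a space is usually glued to the word after it.
VOCAB = {"un", "believ", "able", "fix", "bug", " the", " bug", "."}


def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary piece that matches."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # No piece matched: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


print(greedy_tokenize("unbelievable", VOCAB))  # ['un', 'believ', 'able']
print(greedy_tokenize("fix the bug.", VOCAB))  # ['fix', ' the', ' bug', '.']
```

One word becomes three tokens, and the period at the end of the sentence is a token of its own.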

To make this concrete, here are two everyday analogies.

Eating

Each bite = one token

Everyone is familiar with eating. Suppose you sit at a dining table full of various foods. How do you eat? Grain by grain? You'd never finish by New Year's. Or whole plates at once? You'd choke. Normally, you eat bite by bite. Here, each bite is a token, and the person eating is the AI.

How many tokens consumed = how many bites a meal took

We usually eat a small tomato in one bite. But hand someone a giant burger, and few people can finish it in one bite; it takes several. AI works the same way: different pieces of text take different numbers of tokens to process.

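How text gets cut into bites is decided by the model's tokenizer. Many models use an algorithm called BPE (Byte Pair Encoding), which starts from single characters (or bytes) and repeatedly merges the pairs that appear together most often, so frequent words end up as a single token.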

Content	Corresponding food	Bites (tokens) needed
bug	small tomato	one bite
fix the bug	giant burger	three bites
修一下这个问题 ("fix this issue")	hard-shell crab	many bites
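The core of BPE fits in a few lines. Below is a minimal Python sketch of the classic training loop on a made-up word-frequency corpus: count adjacent symbol pairs, merge the most frequent pair, repeat. The corpus, the number of merges, and the alphabetical tie-breaking are assumptions for illustration; production tokenizers work on bytes, handle spaces, and learn tens of thousands of merges.

```python
from collections import Counter

Word = tuple[str, ...]  # a word as a sequence of symbols, e.g. ("b", "u", "g")


def count_pairs(words: dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(words: dict[Word, int], pair: tuple[str, str]) -> dict[Word, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: dict[Word, int] = {}
    for symbols, freq in words.items():
        out: list[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word-frequency corpus."""
    words = {tuple(word): freq for word, freq in corpus.items()}  # start from characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the alphabetically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        words = merge_pair(words, best)
        merges.append(best)
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by applying the learned merges in the order they were learned."""
    words = {tuple(word): 1}
    for pair in merges:
        words = merge_pair(words, pair)
    return list(next(iter(words)))


merges = learn_bpe({"bug": 5, "bugs": 3, "debug": 2}, num_merges=3)
print(merges)                    # [('b', 'u'), ('bu', 'g'), ('bug', 's')]
print(encode("bug", merges))     # ['bug']
print(encode("debugs", merges))  # ['d', 'e', 'bugs']
```

After three merges, the frequent word "bug" is a single token (one bite), while the rarer "debugs" still takes three.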

Context window = stomach capacity

Earlier we said that one reason to care about token usage is cost. There is another reason: every model has a context window, which sets the maximum number of tokens it can handle at once, counting both what you send and what it replies. Different models have different window sizes, just as adults have bigger stomachs than children.

GPT-3.5 (original release): about 4,000 tokens
GPT-4: about 8,000 tokens (32,000 for the 32K variant)
GPT-4 Turbo: about 128,000 tokens
Claude 3.5: about 200,000 tokens

So what happens when the window is "stuffed"? Usually some earlier content has to be "vomited" (the oldest messages are dropped or condensed) before anything new can be "eaten." That is when the AI seems to get amnesia: it no longer remembers some of the things you told it earlier.
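The "vomiting" can be sketched in Python. This is a minimal illustration, assuming the simplest strategy of dropping the oldest messages first; real chat applications may instead summarize or truncate differently. The name fit_to_window and the token counts are hypothetical.

```python
# A message is (text, token_count). The counts are assumed to come from the
# model's tokenizer; the numbers below are made up for illustration.
Message = tuple[str, int]


def fit_to_window(messages: list[Message], window: int) -> list[Message]:
    """Drop the oldest messages until the total token count fits in `window`."""
    kept = list(messages)
    total = sum(count for _, count in kept)
    while kept and total > window:
        _, dropped = kept.pop(0)  # "vomit" the oldest message first
        total -= dropped
    return kept


history = [
    ("My name is Li, and I'm using Python 3.12.", 14),
    ("Here is my full log file: <3,000 tokens of output>", 3000),
    ("Why does my script crash?", 6),
]
print(fit_to_window(history, window=3010))
```

The three messages total 3,020 tokens, so the first message is dropped to fit 3,010. The model can still see the log and the question, but it has "forgotten" your name and Python version.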

How to save tokens

Eat in batches: work in several rounds instead of stuffing everything into the stomach at once (like ordering the whole menu in one go).
Eat slowly: let the AI finish with the information it already has before feeding it more (eat too fast and the stomach can't digest).
Order on demand: give the AI only the information it actually needs; avoid the "there's room anyway" mindset (the buffet mentality of piling up as much as possible because you've already paid).
Finish in one go: state your requirements clearly up front to cut down on do-overs; don't chew the same dish twice (making the AI process the same content repeatedly).
Taste first, then take: decide what you need before uploading the matching content; don't grab a bowl of everything (uploading a pile of files when most of them are irrelevant).
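Why do do-overs cost so much? A rough Python sketch, assuming every request re-sends the whole conversation so far (as stateless chat APIs do) and ignoring any caching: each extra round makes the model chew through the entire history again. The name total_tokens_processed and the 300-token turn size are hypothetical.

```python
def total_tokens_processed(turn_tokens: list[int]) -> int:
    """Rough total of tokens processed when every turn re-sends the whole conversation.

    turn_tokens[i] is the number of new tokens that turn i adds to the conversation.
    """
    total = 0
    history = 0
    for new_tokens in turn_tokens:
        history += new_tokens  # the conversation grows by this turn
        total += history       # and the whole history is processed again
    return total


print(total_tokens_processed([300]))            # 300: one clear request
print(total_tokens_processed([300, 300, 300]))  # 1800: the same request plus two do-overs
```

Under these assumptions, two do-overs cost six times as much as getting it right the first time, not three times, because the history keeps growing.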

Skewers

Here's another example. As the weather warms up, one of the best things on a hot day is sitting at a street stall with a few skewers of grilled meat and a cold beer. If you're an AI enthusiast, the skewer in your hand may look familiar: every bamboo stick has a fixed length, just like a context window.

Each piece of meat = one token

If the bamboo stick is the context window, then the pieces of meat on it are tokens. Look over the skewers on the display rack and you'll notice:

Small pieces, like snail meat, fit many to a skewer. These are like simple English words, which usually take one token each.
Slightly larger pieces, like tenderloin or chicken hearts, fit only a few per skewer. These are like longer, less common English words such as "unbelievable," which may be split into several tokens.
Big pieces, like ribs, fit only two per skewer. These are like text in many non-English languages, such as Chinese or Korean, which often takes more tokens to say the same thing.
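One reason for the "ribs" effect can be checked directly. Byte-level BPE tokenizers (used by GPT-2 and later OpenAI models) start from UTF-8 bytes, and in UTF-8 each Chinese character or Korean Hangul syllable takes 3 bytes while an English letter takes 1. Unless the tokenizer has learned merges for those byte sequences, such text needs more tokens. The helper byte_profile below is a hypothetical name, and byte counts are a proxy, not actual token counts.

```python
def byte_profile(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for a piece of text."""
    return len(text), len(text.encode("utf-8"))


samples = {
    "English": "fix the bug",
    "Chinese": "修一下这个问题",  # "fix this issue"
    "Korean": "버그 수정",        # "bug fix"
}
for language, text in samples.items():
    chars, size = byte_profile(text)
    print(f"{language}: {chars} characters, {size} UTF-8 bytes")
```

The English phrase is 11 bytes for 11 characters, while the 7 Chinese characters already take 21 bytes before any merging happens.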

Common phenomena and how to avoid them

Wasteful behavior	Token-saving approach
Skewer full of "fat" (polite talk, nonsense, repeated content)	String more "lean meat" (core information), remove "fat"
Asking for a fully packed skewer every time (long outputs)	Order "small skewers": "brief answer," "list only 3 points"
Having the chef grill the same skewer again and again (making the AI regenerate)	State requirements clearly at once to cut down on regrilling
Mixing meat from different languages on one skewer (switching between Chinese and English)	Keep the language consistent; mixing can take up more skewer space
Continuously adding new meat in the conversation, breaking the skewer (exceeding context)	Start a new conversation in time, switch to a new skewer

Summary

This article used two everyday analogies to explain what a token is and how it relates to the context window, then summarized some ways to save tokens and some habits worth correcting. I hope it helps.