跪拜 Guibai

What Is a Token? Two Analogies That Make It Click

When we use AI tools to solve problems or build features, we care whether the AI completed the task and whether it made mistakes. Beyond that, the thing we watch most closely is how many tokens we have used. Why? The most obvious reason is cost. And when we run out of tokens, it feels like we can't do anything, just like when the internet or the power goes out.


Have you ever wondered what a token is, how tokens are generated, and why we always talk about saving them?

Concept

A token (词元 in Chinese) is the smallest unit of text that an AI model processes. When we humans read an article, we take it in word by word, but AI works differently: it splits each sentence into fragments, and a single word may be split into two or three pieces. Each fragment is a token. Punctuation marks usually count as tokens too.


Two everyday analogies make this easier to picture.

Eating

Each bite = one token

Everyone is familiar with eating. Suppose you sit at a dining table full of various foods. How do you eat? Grain by grain? You'd never finish by New Year's. Or whole plates at once? You'd choke. Normally, you eat bite by bite. Here, each bite is a token, and the person eating is the AI.

How many tokens consumed = how many bites a meal took

We normally eat a small tomato in one bite. But if you're given a giant burger, few people can finish it in one bite—it takes several. AI is the same: processing different texts consumes different numbers of tokens.


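How text gets split is decided by a tokenization algorithm. A widely used one is BPE (Byte Pair Encoding), which builds its vocabulary by repeatedly merging the most frequent pair of adjacent symbols, so common words end up as a single token while rare ones break into several.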

Content | Corresponding food | Bites needed
bug | small tomato | one bite
fix the bug | giant burger | three bites
修一下这个问题 (Chinese for "fix this problem") | hard-shell crab | many bites
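
To see why a common word costs one bite while a rare one costs several, here is a minimal sketch of the BPE idea in Python. It is a toy, not the tokenizer of any real model: the corpus is invented, the function names (learn_merges, apply_merge, tokenize) are made up for illustration, and it merges characters rather than bytes. Real tokenizers learn tens of thousands of merges from huge corpora, so their token counts will differ from the ones here.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Fuse every adjacent occurrence of `pair` into a single symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Every word starts out split into single characters.
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count each adjacent symbol pair, weighted by how often the word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        updated = Counter()
        for symbols, freq in words.items():
            updated[tuple(apply_merge(symbols, best))] += freq
        words = updated
    return merges


def tokenize(word, merges):
    """Split a word into characters, then replay the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical mini-corpus: "bug" is frequent, "fix" is rare.
corpus = ["bug", "bug", "bug", "bugs", "debug", "fix"]
merges = learn_merges(corpus, num_merges=2)
print(tokenize("bug", merges))    # ['bug']
print(tokenize("debug", merges))  # ['d', 'e', 'bug']
print(tokenize("fix", merges))    # ['f', 'i', 'x']
```

In this toy corpus "bug" shows up often enough to become a single token after two merges, while "fix" stays in three pieces only because the corpus barely contains it. A real tokenizer trained on real text would most likely treat it differently.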

Context window = stomach capacity

Earlier we mentioned that one reason we care about token usage is cost. There's another reason: every model has a context window of fixed size, which caps the number of tokens it can process at once. Different AI models have different window sizes, just like adults have larger stomachs than children.

So what happens when the window is "stuffed"? The AI typically has to "vomit" some of the oldest content before it can "eat" new information. That is why the AI sometimes seems to have amnesia: it can no longer remember some of the things you told it earlier.
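
The "vomit first, then eat" behavior can be sketched in a few lines of Python. This is only one simple strategy (keep the newest messages that fit), and real products may summarize or truncate in other ways. The names count_tokens and fit_to_window are hypothetical, and the word count below is a crude stand-in for a real tokenizer.

```python
from collections import deque


def count_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined size fits in the window."""
    kept = deque()
    used = 0
    # Walk backwards from the newest message and stop once the window is full.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.appendleft(message)
        used += cost
    return list(kept)


history = [
    "my name is Ada",
    "I like grilled skewers",
    "what is a token",
    "what is my name",
]
# With room for only 8 "tokens", the two oldest messages fall out of the window.
print(fit_to_window(history, max_tokens=8))
```

With a window of 8, the message that contained the name is gone by the time the last question arrives, which is exactly the amnesia described above.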

How to save tokens

Skewers

Here's another example. As the weather gets warmer, one of the most comfortable things on a hot day is sitting at a street stall, ordering a few skewers of meat, and drinking some cold beer. As an AI enthusiast, looking at the skewer in your hand, you might find it very familiar. Each bamboo stick has a fixed length, just like a context window.

Each piece of meat = one token

If the bamboo stick is the context window, then the pieces of meat on the stick are tokens. Look over the skewers on the stall's rack and a few wasteful habits stand out:

Common phenomena and how to avoid them

Wasteful behavior | Token-saving approach
Skewer full of "fat" (polite talk, nonsense, repeated content) | String more "lean meat" (core information), remove "fat"
Every skewer required to be packed full (long output) | Request "small skewers": "brief answer," "list 3 points only"
Having the chef grill the same skewer repeatedly (making AI regenerate) | State requirements clearly at once, reduce regrilling
Mixing meat from different languages on the same skewer (mixing Chinese and English) | Keep language consistent; mixing takes more skewer space
Continuously adding new meat in the conversation, breaking the skewer (exceeding context) | Start a new conversation in time, switch to a new skewer
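
To get a feel for how much "fat" a polite, padded prompt carries, here is a rough comparison in Python. The counter is an assumption for illustration: it counts words and punctuation marks, which only approximates what a real tokenizer would report, and both prompts are invented examples.

```python
import re


def rough_token_count(text):
    """Approximate a token count as words plus individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# "Fat" skewer: greetings and padding around a one-line request.
verbose = (
    "Hello! I hope you are doing well. Could you please, if it is not too "
    "much trouble, explain what a token is? Thank you so much!"
)
# "Lean" skewer: the same request plus an explicit limit on the answer length.
concise = "Explain what a token is. List 3 points only."

print(rough_token_count(verbose), rough_token_count(concise))
```

The lean prompt asks for the same thing in roughly a third of the pieces, and its "List 3 points only" also keeps the answer short, which saves output tokens as well.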

Summary

Two everyday analogies, eating and skewers, showed what a token is and how it relates to the context window. We also went over some common wasteful habits and how to save tokens instead. Hope it helps.