The Six Abstractions That Make Claude Code an Agent Runtime
Series Foreword:
I stumbled upon a GitHub project that analyzes the leaked source code of Claude Code.
I've seen many Claude Code source code analysis articles before, but they were either too deep into the code specifics or had too much AI-generated content, making the pacing feel off.
However, this repo gave me a lot of inspiration and ideas, so I want to write a complete series based on this GitHub project.
I will keep a tight rein on every diagram and aim for a rigorous structure and a clear hierarchy, so that everyone comes away with a deeper understanding of Claude Code from an architectural perspective.
▲ Image source: Screenshot of the Claude Code from Source homepage cover
This cover image is full of irony.
First, the "NO' REILLY" in the top left corner is a parody of the publisher O'REILLY, as if saying nothing is real—it's a spoof.
Alejandro Balderas himself is an engineer at Anthropic, and in the tech community, he was indeed deeply involved in the development of Claude Code. But Alejandro Balderas co-authored this book with AI, which is full of double entendre.
In the center of the cover, a crab holds a .map cover, referring to the incident of Claude Code leaking .map files. And this map can indeed be considered a .map file map.
(Purely my own random analysis, just take it for fun)
Main Text Begins
Let's start the main text. First, think about a question: What exactly is Claude Code?
A traditional CLI is essentially a function—a single command that performs an operation and gets a deterministic output.
Take grep: running it never decides on its own to run sed afterwards. Or take curl: after downloading something, it does not decide what to do next based on what it downloaded.
Then Agentic CLI emerged.
Agentic CLI accepts descriptions in human natural language and decides which tools to use based on natural language prompts. It calls these tools in a specific order according to the situation, obtains results, and then loops until the task is completed or the user stops it.
As a result, more and more developers are moving from the traditional CLI approach to the Agentic CLI.
▲ Traditional CLI is a linear pipeline; Agentic CLI is a feedback loop formed around model decisions.
From this, we can define Agentic CLI:
Agentic CLI is not a fixed sequence of instructions, but a loop revolving around a large language model, where the model generates the next instruction at runtime.
Claude Code is a TypeScript monolithic application that turns the terminal into a complete development environment powered by Claude. Claude Code is Anthropic's production-grade implementation of this idea.
This first chapter covers the six abstractions that make up the mental model of Claude Code.
Six Core Abstractions
Claude Code is built upon six core abstractions.
▲ Image source: Screenshot of the Chapter 1 interaction diagram from Claude Code from Source
These six abstractions are the Query Loop, Tool System, Tasks, State, Memory, and Hooks.
Besides these, things like hundreds of tool functions, the terminal renderer, the Vim emulator, and the cost tracker essentially serve these six abstractions.
Let me explain each one below.
1. Query Loop: The Core of the Entire System
The first abstraction is the Query Loop. The Query Loop is located in query.ts, roughly 1700 lines of code. It is an asynchronous generator and the core of the entire system.
So, the core of Claude Code is running round by round:
Call model -> Receive streaming response -> Collect tool calls -> Execute tools -> Append tool results back to context -> Then continue to the next round.
▲ The Query Loop is Claude Code's unified processing loop: different entry points converge into the same query(), which executes tasks while continuously outputting results to the outside.
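The round-by-round flow above can be sketched as an async generator. This is only a minimal sketch under stated assumptions, not the code in query.ts: `fakeModel`, `runTool`, `collect`, and all of the event and message shapes are hypothetical stand-ins invented for illustration.

```typescript
// Hypothetical event and message shapes, for illustration only.
type ToolCall = { id: string; name: string; input: string };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type LoopEvent =
  | { type: "text"; text: string }
  | { type: "tool_result"; name: string; output: string }
  | { type: "done"; stopReason: "end_turn" | "max_turns" };

// Stand-in for the model API: asks for one file read, then finishes.
async function fakeModel(context: Message[]): Promise<ModelReply> {
  const sawToolResult = context.some((m) => m.role === "tool");
  return sawToolResult
    ? { text: "The file looks fine.", toolCalls: [] }
    : { text: "Let me read the file.", toolCalls: [{ id: "1", name: "Read", input: "login.ts" }] };
}

// Stand-in for real tool execution.
async function runTool(call: ToolCall): Promise<string> {
  return `contents of ${call.input}`;
}

async function* query(input: string, maxTurns = 5): AsyncGenerator<LoopEvent> {
  const context: Message[] = [{ role: "user", content: input }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await fakeModel(context); // call the model
    context.push({ role: "assistant", content: reply.text });
    yield { type: "text", text: reply.text };
    if (reply.toolCalls.length === 0) {
      // No tools requested: the loop ends.
      yield { type: "done", stopReason: "end_turn" };
      return;
    }
    for (const call of reply.toolCalls) {
      const output = await runTool(call); // execute the tool
      context.push({ role: "tool", content: output }); // append the result to context
      yield { type: "tool_result", name: call.name, output };
    }
  }
  yield { type: "done", stopReason: "max_turns" };
}

// Helper: drain the generator into an array.
async function collect(input: string): Promise<LoopEvent[]> {
  const events: LoopEvent[] = [];
  for await (const event of query(input)) events.push(event);
  return events;
}
```

Any entry point (a REPL, an SDK call, a sub-agent) could consume the same generator; only what it does with each event would differ.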
One thing to note here: The Query Loop is not just the internal implementation of the REPL.
What is a REPL? REPL stands for Read-Eval-Print Loop. It is an interactive command environment.
The ordinary REPL, SDK calls, sub-agents, and headless mode (--print) all go through this Query Loop.
Claude Code did not write a separate set of Agent logic for each of these entry points. Instead, they all converge into the same query() loop, and each caller uses for await to consume, piece by piece, the events the loop produces from model output and tool calls.
In pseudocode, it looks something like this:
for await (const event of query(input)) {
  render(event)
}
To use a real-life analogy: it's like a conveyor line in a parcel sorting center, where a robotic arm sorts parcels round by round as they arrive, packs them into boxes, and sends them on.
The Agent calls query() to process the user's input, and query() produces raw events as it runs. These raw events come from the large model and cannot be returned to the user directly. The Agent needs to render them, wrapping them in a layer of Message events, before returning them to the user.
I actually ran claude -p to see what the specific Message events look like, which will help you understand better (data has been sanitized):
{"type":"system","subtype":"init","cwd":"<redacted>","session_id":"<redacted>","tools":["Read","Edit","Bash","..."],"model":"<model>"}
{"type":"system","subtype":"status","status":"requesting"}
{"type":"stream_event","event":{"type":"message_start"}}
{"type":"stream_event","event":{"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello!"}}}
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"Hello!"}]}}
{"type":"stream_event","event":{"type":"message_delta","delta":{"stop_reason":"end_turn"}}}
{"type":"result","subtype":"success","result":"Hello!","stop_reason":"end_turn"}
The opening system/init indicates that Claude Code has started initializing the session;
system/status requesting indicates that Claude Code has started requesting the model;
stream_event/message_start indicates that the model has started returning a streaming response;
stream_event/content_block_delta indicates that the model is continuously outputting;
assistant is the assistant Message compiled by the SDK;
stream_event/message_delta is a message-level update;
The final result/success indicates that the entire query() execution is complete.
The entire loop is an async generator.
What does that mean?
Simply put, it doesn't run everything at once and then return the result. Instead, it continuously produces new events as it runs.
(These events can be a piece of text output by the model, a tool call, a tool execution result, or the final stop event.)
This design offers several benefits:
First, the output rhythm is controllable. The Agent continuously produces things during execution: model text, tool calls, tool execution results, state changes. With ordinary event callbacks, the loop would keep pushing new events no matter how fast the outside consumer processes them.
If the user's terminal cannot keep up, messages could easily back up.
Second, when the user needs to interrupt, it can respond promptly.
Suppose the user presses Ctrl+C, or the external caller cancels the request.
At that moment, the model call might not be finished, tools might still be running, and subtasks might still be executing.
With a bunch of callbacks flying around, a problem could easily arise: on the surface everything appears to have stopped, but work is actually still executing in the background.
In the end, it becomes unclear whether the stop was a user cancellation, a tool call failure, or a system exception.
The advantage of an async generator is that there is a single, clear stop signal to act on.
If the user wants to cancel the execution, the Agent can wrap up along this execution chain: stop producing further events, notify running tasks to interrupt, and mark the final stop reason as user cancellation.
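The cancellation path described above can be illustrated with a standard `AbortController`. This is a sketch under assumptions, not Claude Code's implementation: `cancellableQuery`, `runAndCancelAfter`, and the `user_cancelled` value are hypothetical names chosen for the example.

```typescript
type AgentEvent =
  | { type: "text"; text: string }
  | { type: "result"; stopReason: "end_turn" | "user_cancelled" };

// Hypothetical loop: emits up to `steps` chunks unless the signal is aborted.
async function* cancellableQuery(
  steps: number,
  signal: AbortSignal,
  log: string[],
): AsyncGenerator<AgentEvent> {
  try {
    for (let i = 0; i < steps; i++) {
      if (signal.aborted) {
        // One well-defined place decides the stop reason.
        yield { type: "result", stopReason: "user_cancelled" };
        return;
      }
      yield { type: "text", text: `chunk ${i}` };
    }
    yield { type: "result", stopReason: "end_turn" };
  } finally {
    // Runs on normal completion, on cancellation, and when the consumer stops early.
    log.push("cleanup");
  }
}

async function runAndCancelAfter(n: number): Promise<{ events: AgentEvent[]; log: string[] }> {
  const controller = new AbortController();
  const events: AgentEvent[] = [];
  const log: string[] = [];
  for await (const event of cancellableQuery(10, controller.signal, log)) {
    events.push(event);
    if (events.length === n) controller.abort(); // e.g. the user pressed Ctrl+C
  }
  return { events, log };
}
```

Because the generator is the only producer, cleanup lives in one `finally` block and the last event states why the run ended.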
Third, and most importantly, it clearly states the reason for stopping.
After the task stops, the final result Message carries a stop_reason.
The caller can decide the next step directly from this value.
▲ Claude Code Stop Reasons
It can clearly indicate why it stopped, rather than just telling the user, "The task stopped."
2. Tool System
The second abstraction is the Tool System, with source code in Tool.ts, tools.ts, services/tools/.
Tools are what the Agent can actually do in the computer world.
For example, our daily actions like reading files, running shell commands, editing code, and searching the web are all tool calls.
That sounds simple, but behind it is a complete and fairly complex tool system.
Each tool in Claude Code implements a rich set of interfaces covering aspects like identity, mode, execution, permissions, and rendering.
Here, I need to introduce two concepts to you: one is the Tool Executor, and the other is the Streaming Scheduler.
The Tool Executor is relatively easy to understand; it is responsible for executing tools, such as Read, Write, Bash, Grep, etc.
When running tools, the Tool Executor divides tool calls into those that must run serially and those that can run concurrently.
For example, reading files can usually be done in parallel, but writing files or running commands that modify state cannot be arbitrarily parallelized.
The Streaming Scheduler is more like an optimization mechanism or an execution strategy within the Tool Executor.
It cares about: Can certain tools be started in advance before the model has fully outputted?
For instance, if the model just outputs a Read call, and if Read is concurrency-safe, the Streaming Scheduler can immediately start it, and the Agent can go read the file first.
At this point, the model is still generating output, but the file read may already have finished.
▲ The Tool System can receive model output while simultaneously judging which tools can be used directly.
Claude Code intertwines tool execution and model streaming output.
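To make the interplay between streaming and tool execution concrete, here is a simplified sketch. It assumes that only `Read` and `Grep` are concurrency-safe and that every other tool waits until the stream has ended; `execTool`, `modelStream`, and `schedule` are hypothetical names, and the real scheduler is certainly more involved.

```typescript
type ToolUse = { id: number; name: string };

// Assumption for this sketch: only these tools are safe to run in parallel.
const CONCURRENCY_SAFE = new Set(["Read", "Grep"]);

// Stand-in for real tool execution; records the moment each tool starts.
async function execTool(call: ToolUse, started: string[]): Promise<string> {
  started.push(call.name);
  return `${call.name}#${call.id} ok`;
}

// Simulated model stream that emits tool calls one at a time.
async function* modelStream(calls: ToolUse[], started: string[]): AsyncGenerator<ToolUse> {
  for (const call of calls) yield call;
  started.push("<stream end>");
}

async function schedule(calls: ToolUse[]): Promise<{ results: string[]; started: string[] }> {
  const started: string[] = [];
  const slots: Promise<string>[] = [];
  const deferred: { index: number; call: ToolUse }[] = [];

  for await (const call of modelStream(calls, started)) {
    if (CONCURRENCY_SAFE.has(call.name)) {
      slots.push(execTool(call, started)); // start now, do not await yet
    } else {
      deferred.push({ index: slots.length, call });
      slots.push(Promise.resolve("")); // placeholder, filled in below
    }
  }
  // State-changing tools run one at a time, after the stream has finished.
  for (const { index, call } of deferred) {
    slots[index] = Promise.resolve(await execTool(call, started));
  }
  // Results come back in the order the model requested them.
  return { results: await Promise.all(slots), started };
}
```

For the calls `Read`, `Edit`, `Grep`, the `started` trace shows both reads beginning before the stream ends, while `Edit` only starts afterwards.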
3. Tasks: Background Tasks and Sub-agents
The third abstraction is Tasks. The source code is in Task.ts and the tasks/ directory.
Task is mainly a background execution unit used to host sub-agents.
Each sub-agent has a state machine with the following states:
pending -> running -> completed | failed | killed
That is: waiting, running, and then one of completed, failed, or killed.
▲ A sub-agent is a new Query Loop started by AgentTool. It has its own message history, tool set, and permission mode.
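The state machine above can be encoded as a small transition table. This is an illustrative sketch: the `BackgroundTask` class and `TRANSITIONS` table are hypothetical, and the sketch assumes that only the transitions shown in the diagram are legal.

```typescript
type TaskStatus = "pending" | "running" | "completed" | "failed" | "killed";

// Allowed transitions, following pending -> running -> completed | failed | killed.
const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ["running"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

class BackgroundTask {
  id: string;
  status: TaskStatus = "pending";

  constructor(id: string) {
    this.id = id;
  }

  // Move to the next state, rejecting anything the table does not allow.
  transition(next: TaskStatus): void {
    if (!TRANSITIONS[this.status].includes(next)) {
      throw new Error(`illegal transition ${this.status} -> ${next}`);
    }
    this.status = next;
  }

  // Terminal states have no outgoing transitions.
  get isTerminal(): boolean {
    return TRANSITIONS[this.status].length === 0;
  }
}
```

Keeping the legal transitions in one table means a finished task cannot be restarted by accident.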
The key point is AgentTool.
When Claude Code forks a sub-agent, the Agent doing the forking is called the parent Agent. The sub-agent runs the same query loop as the parent: it calls query() to start a new loop of its own.
This new query loop has its own context, its own tool set, and its own permission mode, so it is also a small Agent.
This forking method gives Claude Code recursive capability:
One Agent can delegate to another Agent, and that Agent can continue to delegate further down.
But there is also a certain danger here, because once a sub-agent can make its own decisions, run its own commands, and modify its own files, the system could potentially spiral out of control.
This is why the permission system has a very important bubble mode, which is discussed in the Permission System section below.
It means: when a sub-agent encounters a dangerous action, it cannot approve it itself; it needs to report it upwards for the parent Agent or the user to decide.
This is a very important red line in multi-Agent systems.
4. State: Two Layers of State
The fourth abstraction is State.
Claude Code has two layers of state.
The first layer is a mutable singleton STATE.
It stores session-level basic state, such as the current working directory, model configuration, cost tracking, session ID, totaling about 80 fields.
What does session-level mean? Each time you open Claude Code in a terminal window, that is one session.
The session records the following states (a portion):
Which directory is currently running
Which model is currently being used
What is the session id for this session
How much money has been spent
How many tokens have been used
What is the current permission mode
When Claude Code starts, it puts this information into STATE:
STATE.cwd = current working directory
STATE.sessionId = this session ID
STATE.model = current model
STATE.permissionMode = current permission mode
If changes need to be made later, you can just modify this object directly.
▲ The first layer STATE is more like a session-level runtime registration form: initialized at startup, directly modified during runtime, and read by system modules as needed.
The second layer is the UI's interface state, which tracks changes such as:
A new message has arrived
The input mode has changed
Waiting for user approval of a tool call
The progress bar has updated
The model is outputting
After these states change, the UI must change accordingly.
In the React ecosystem, there is a state management library called Zustand. A Zustand-style store drives these interface state changes.
The pseudocode is as follows:
import { create } from "zustand"

const useStore = create((set, get) => ({
  messages: [],
  inputMode: "normal",
  // Append a message without mutating the previous array.
  addMessage: (msg) =>
    set((state) => ({
      messages: [...state.messages, msg],
    })),
  setInputMode: (mode) =>
    set({ inputMode: mode }),
}))
It's very simple: state is updated through set() and read through get(), and the UI can subscribe to these changes.
▲ Write via set(), read via get(), and notify the UI to re-render after state changes.
The interface state here uses a reactive design. The reason is relatively simple: changes in UI state are real-time.
But not all states should be reactive.
The session-level STATE described above is not reactive, whereas UI state that must change in real time needs a reactive design.
5. Memory: Cross-Session Context
The fifth abstraction is Memory, located under the memdir/ path.
Memory is the Agent's persistent context between sessions.
The original text says Claude Code's memory has three layers, and Claude's official documentation also says there are three layers.
But I think there should be four layers; the last one is actually team-level.
- User-level: ~/.claude/CLAUDE.md, globally effective for Claude.
- Project-level: CLAUDE.md in the repository, globally effective for the project.
- Directory/Module-level: CLAUDE.md in the business module path, globally effective for the module (the official documentation categorizes this layer under the project level).
- Team-level: Implemented through symbolic links, generally not directly maintained by ordinary developers.
▲ Memory writes long-term useful information into Markdown files. At the start of a session, relevant content is filtered and then fed into the Query Loop.
At the beginning of each session, the system scans these memory files, parses the frontmatter, and then lets the LLM determine which memories are relevant to the current conversation before entering the Query Loop.
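The scan, parse, and filter steps can be sketched as follows. Two things here are assumptions rather than facts about Claude Code: the `description` frontmatter field is hypothetical, and simple keyword overlap stands in for the LLM relevance check. The parser handles only flat `key: value` lines, not full YAML.

```typescript
type MemoryFile = { path: string; meta: Record<string, string>; body: string };

// Parse a leading `---` ... `---` block of simple `key: value` lines.
function parseMemory(path: string, raw: string): MemoryFile {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(raw);
  if (!match) return { path, meta: {}, body: raw };
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { path, meta, body: raw.slice(match[0].length) };
}

// Stand-in for the LLM relevance check: keep files whose description
// shares at least one word with the user's prompt.
function selectRelevant(files: MemoryFile[], prompt: string): MemoryFile[] {
  const words = new Set(prompt.toLowerCase().split(/\W+/).filter(Boolean));
  return files.filter((f) =>
    (f.meta["description"] ?? "")
      .toLowerCase()
      .split(/\W+/)
      .some((w) => words.has(w)),
  );
}
```

Only the selected files would then be added to the context before the Query Loop starts.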
Project conventions, architectural decisions, debugging history, and personal preferences are all suitable for being solidified into Memory. Markdown files can be opened, edited, and easily version-controlled.
I now have a thought: perhaps the best form of Agent memory is maintainable Memory.md.
6. Hooks: Lifecycle Interceptors
The sixth abstraction is Hooks, located under the hooks/, utils/hooks/ paths.
Hooks are user-defined interceptors for the entire lifecycle of Claude Code.
The original text says Claude Code's hooks come in 4 execution types and can fire on 27 different events.
The 4 types are shell commands, one-time LLM prompts, multi-turn Agent conversations, and HTTP webhooks.
If you are familiar with the Java Spring framework, these Hooks are very similar to the AOP design concept in Spring.
But they are not exactly the same. Spring AOP intercepts Java method calls, whereas Claude Code Hooks intercept Claude Code's predefined lifecycle events, such as before/after tool calls, prompt submission, session end, etc.
▲ Similarities and differences between Hooks and Spring AOP.
Hooks can do many things: prevent tool execution, modify input, inject additional context, and even directly block the entire query loop.
More interestingly, the permission system itself is partially implemented through hooks.
For example, the PreToolUse hook can reject a tool call before the interactive permission prompt appears.
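The idea of a hook rejecting a tool call before any permission prompt can be sketched with an in-process registry. This is a simplification for illustration: `HookRegistry`, `callTool`, and the decision shapes are hypothetical, and the sketch models hooks as plain functions rather than the shell commands, prompts, or webhooks mentioned above.

```typescript
type HookEvent = "PreToolUse" | "PostToolUse";
type ToolRequest = { tool: string; input: string };
type HookDecision = { action: "continue" } | { action: "deny"; reason: string };
type Hook = (req: ToolRequest) => HookDecision;

class HookRegistry {
  private hooks = new Map<HookEvent, Hook[]>();

  on(event: HookEvent, hook: Hook): void {
    const list = this.hooks.get(event) ?? [];
    list.push(hook);
    this.hooks.set(event, list);
  }

  // Run hooks in registration order; the first "deny" wins.
  run(event: HookEvent, req: ToolRequest): HookDecision {
    for (const hook of this.hooks.get(event) ?? []) {
      const decision = hook(req);
      if (decision.action === "deny") return decision;
    }
    return { action: "continue" };
  }
}

function callTool(registry: HookRegistry, req: ToolRequest): string {
  const decision = registry.run("PreToolUse", req);
  if (decision.action === "deny") return `blocked: ${decision.reason}`;
  // A permission prompt and the real tool execution would follow here.
  return `ran ${req.tool}`;
}

// Example policy: refuse recursive deletes before the user is ever asked.
const registry = new HookRegistry();
registry.on("PreToolUse", (req) =>
  req.tool === "Bash" && req.input.includes("rm -rf")
    ? { action: "deny", reason: "recursive delete is not allowed" }
    : { action: "continue" },
);
```

With this policy, a `Bash` call containing `rm -rf` is blocked while other calls proceed to the normal permission flow.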
A Complete Claude Code Request
Below is a complete Claude Code request, starting from when the user sends a message request.
User input: "Add error handling to the login function", then presses Enter.
▲ Golden Path dynamic demo: A request enters the Query Loop from the REPL, goes through model streaming response and tool execution, and then returns to the terminal output.
The complete static effect is as follows.
▲ Golden Path static complete path, for easy reference.
This path roughly goes like this:
The user enters a task in the Claude Code terminal.
The REPL passes the message to the Query Loop.
The Query Loop calls the model API.
The model streams back content and tool calls.
If the model needs to read files, modify files, or run commands, it hands them over to the Streaming Scheduler.
The Tool System then executes the specific actions.
The results of the tool execution are deposited as session context.
The Query Loop calls the model again with the new context.
Until the model no longer requests tools, or external conditions cause it to stop.
There are three points to note during the entire execution process.
First, the query loop is a generator, not a callback chain. A callback chain is a function that takes a whole set of callback functions.
The pseudocode for a callback chain is as follows:
runAgent(input, {
  onText(text) {},
  onToolCall(tool) {},
  onToolResult(result) {},
  onDone(reason) {},
  onError(error) {},
})
The biggest drawback of a callback chain is that the outermost runAgent method has no execution control over the callback functions inside.
What does that mean? It means that how the inner functions execute cannot be controlled by the outer runAgent; it only serves the role of being notified when execution is complete.
The query loop is different; it uses for await to pull messages from inside.
for await (const msg of query(input)) {
  render(msg)
}
The outer loop takes one message each time. After the outer loop finishes processing this one message, it continues to process the next one.
A callback chain is "pushed from the inside."
A generator is "pulled from the outside."
That is to say, the biggest difference between a callback chain and a generator is the ownership of control.
In the query loop structure, this means the consumption speed of the terminal UI will affect the generation speed.
This design is somewhat like the idea behind TCP sliding windows: if the receiver cannot keep up, the sender cannot keep sending without limit.
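The pull-based behavior is easy to observe with a toy producer and a deliberately slow consumer. In this sketch, `producer`, `slowRender`, and `pullDemo` are hypothetical names, and the 5 ms delay simply stands in for slow terminal rendering.

```typescript
// Producer: records each event at the moment it is generated.
async function* producer(count: number, trace: string[]): AsyncGenerator<number> {
  for (let i = 0; i < count; i++) {
    trace.push(`produce ${i}`);
    yield i;
  }
}

// Stand-in for slow terminal rendering.
async function slowRender(value: number, trace: string[]): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  trace.push(`render ${value}`);
}

async function pullDemo(): Promise<string[]> {
  const trace: string[] = [];
  for await (const value of producer(3, trace)) {
    await slowRender(value, trace); // the producer stays paused while this runs
  }
  return trace;
}
```

The trace strictly alternates between produce and render steps: the generator does not get ahead of the consumer, because its body only resumes when the next value is requested.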
Second, Claude Code does not necessarily wait for the model to finish speaking a whole sentence before starting to execute tools.
The conventional approach is like this:
The model outputs a complete reply. The Agent reads the reply, finds the tool calls inside, and then starts executing the tools. After the tool execution is complete, the results are handed back to the model.
▲ Normal serial method: The model first outputs completely, the system then discovers the tool calls, and only then starts executing the tools.
Claude Code does not do this. Claude Code has a StreamingToolExecutor, which does not foolishly wait for the model's entire reply to finish before calling the corresponding tools.
As soon as it sees a tool is concurrency-safe, it can execute it first, such as Read and Grep.
While the model is still generating, the file might have already been read. The original text calls this speculative execution.
However, this approach also has a cost: some work might have to be re-run, wasting tokens and compute.
If later model output invalidates a result that was produced early, that result might have to be discarded. Although this happens relatively infrequently, it cannot be ignored.
This is Claude Code using potentially wasted computing power to trade for a reduction in overall latency.
Third, the entire loop is re-entrant.
When the model calls a tool, the execution result is added to the context of the current window. The loop then calls the model again with the updated context; the model may request another tool, and that result is written back into the context as well. For example:
User asks a question
-> Model judges: I need to read a file
-> Tool reads the file
-> Read content is put back into context
-> Model looks at this content again
-> Model judges: I need to modify the file
-> Tool modifies the file
-> Modification result is put back into context
-> Model looks at the result again
-> Model judges: Can end
-> Final reply to the user
▲ Agent runtime loop: The model looks at the context to decide the next step, and tool results are written back into the context.
Permission System
Claude Code can execute any shell command on your machine.
It can modify files, spawn child processes, make network requests, and alter git history.
So, without a permission system to control the Agent, this would be a disaster.
I often see many domestic developers directly running with Bypass permissions enabled, saying things like, "You're already using AI, are you still afraid it might mess things up?"
But as far as I know, many developers abroad do not easily enable this permission. I've seen developers at OpenAI's press conference only using acceptEdits.
Claude Code has a total of seven permission modes, roughly from most permissive to most conservative:
(Note: these are permission modes at the source code level, not the modes you can switch between in Claude Code)
| Mode | Meaning |
|---|---|
| bypassPermissions | Allow all, no checks, mainly for internal or testing use |
| dontAsk | Allow all, but still log |
| auto | Use a lightweight LLM classifier to judge whether to allow or deny |
| acceptEdits | File edits are automatically approved, other operations still prompt |
| default | Standard interactive mode, user confirms each critical action |
| plan | Read-only mode, all write operations are prohibited |
| bubble | Sub-agent does not decide itself, escalates permissions to the parent |
When a tool call requires permission, the resolution process follows a strict flow:
▲ Claude Code Permission Resolution Dynamic Demo: Hooks, tool self-checks, and permission modes participate in the decision in sequence.
The four strategies bypass/dontAsk/acceptEdits/plan are hardcoded static strategies. default requires a human to make each confirmation. bubble escalates every time, letting the parent Agent decide.
The permission mode that needs explanation here is auto.
Before making a judgment, auto mode makes an additional call to a lightweight LLM, letting this LLM judge whether it conforms to the user's original intent.
So, auto essentially adds a layer of automatic approval between full manual confirmation and completely open permissions.
If the user asks it to fix a bug, reading files, running tests, and modifying related files might be reasonable.
But if it suddenly wants to delete a directory or modify SSH configurations, it should stop and wait for user confirmation.
It is also critical that sub-agents default to bubble mode.
Bubble means bubbling up. Think of bubbles in water floating to the surface. The bubble mode works the same way, and the surface is the parent Agent.
A sub-agent cannot approve its own dangerous actions. It must report to the upper-level Agent, and the upper-level Agent decides, based on its own permissions, whether to ask the user.
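The mode step of permission resolution, including bubbling, can be sketched as a single function. This is a deliberately simplified model: it ignores hooks, tool self-checks, and logging, it assumes reads are always low-risk, and `classify` is a hypothetical stand-in for the lightweight LLM classifier. The `Agent`, `Action`, and `Verdict` types are invented for the example.

```typescript
type PermissionMode =
  | "bypassPermissions" | "dontAsk" | "auto" | "acceptEdits"
  | "default" | "plan" | "bubble";
type Action = { tool: string; kind: "read" | "edit" | "other" };
type Verdict = "allow" | "deny" | "ask_user";
type Agent = { name: string; mode: PermissionMode; parent?: Agent };

// Stand-in for the lightweight LLM classifier used by auto mode.
function classify(action: Action): Verdict {
  return action.kind === "other" ? "ask_user" : "allow";
}

function resolvePermission(agent: Agent, action: Action): Verdict {
  switch (agent.mode) {
    case "bypassPermissions":
    case "dontAsk":
      return "allow";
    case "plan":
      // Read-only: every write is refused.
      return action.kind === "read" ? "allow" : "deny";
    case "acceptEdits":
      return action.kind === "other" ? "ask_user" : "allow";
    case "auto":
      return classify(action);
    case "bubble":
      // A sub-agent never approves for itself; the parent's mode decides.
      return agent.parent ? resolvePermission(agent.parent, action) : "ask_user";
    case "default":
      return action.kind === "read" ? "allow" : "ask_user";
  }
}
```

A sub-agent in bubble mode therefore inherits whatever its parent would decide, and an orphaned bubble agent falls back to asking the user.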
Multi-Provider Architecture
Claude Code is a multi-provider architecture.
Claude Code can access Claude through four different infrastructure paths: Direct API, AWS Bedrock, Google Vertex AI, and Foundry.
But these differences are transparent to the rest of the system. The rest of the system neither knows nor cares about the multi-provider setup.

▲ Multi-Provider Architecture Pattern
The Anthropic SDK has created adapter wrappers for different cloud vendors. These wrappers expose the same set of interfaces externally.
getAnthropicClient() is a factory function. This factory function reads environment variables and configuration, decides which provider to use currently, and then builds the corresponding client.
After construction, callModel() and other callers will only treat it as a generic Anthropic client.
This is very much like the Factory Pattern + Adapter Pattern.
The Factory Pattern solves which Provider to create at startup.
The Adapter Pattern solves the problem that after the Client is created, the outer layer calls it using the same set of interfaces.
In addition, when callModel() chooses the caller, it actually also uses the Strategy Pattern.
However, the Query Loop does not care whether you are going through Direct API or Bedrock. The Provider selection is completed at startup, and the result is directly stored in STATE.
The subsequent Agent Loop, Tool System, and Permission System will not care what the provider is. Separation of concerns.
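The factory-plus-adapter arrangement can be sketched in a few lines. Everything named here is hypothetical: `makeAdapter` returns a fake client rather than wrapping a vendor SDK, `getClient` is a stand-in for the factory described above, and the `USE_*` keys are placeholder configuration names, not real environment variables.

```typescript
type Provider = "direct" | "bedrock" | "vertex" | "foundry";

// The one interface the rest of the system sees.
interface ModelClient {
  readonly provider: Provider;
  send(prompt: string): Promise<string>;
}

// Hypothetical adapter factory; a real adapter would wrap a vendor SDK client.
function makeAdapter(provider: Provider): ModelClient {
  return {
    provider,
    send: async (prompt) => `[${provider}] reply to: ${prompt}`,
  };
}

// Factory: inspect configuration once and build the matching client.
// The keys below are placeholders for this sketch.
function getClient(env: Record<string, string | undefined>): ModelClient {
  if (env["USE_BEDROCK"] === "1") return makeAdapter("bedrock");
  if (env["USE_VERTEX"] === "1") return makeAdapter("vertex");
  if (env["USE_FOUNDRY"] === "1") return makeAdapter("foundry");
  return makeAdapter("direct");
}

// Caller code never branches on the provider.
async function callModel(client: ModelClient, prompt: string): Promise<string> {
  return client.send(prompt);
}
```

The provider decision is made once in the factory, so swapping providers changes configuration only and leaves every caller untouched.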
Build System
This section discusses the build system.
Claude Code is both an internal tool at Anthropic and a public npm package.
These two use the same codebase and control what content is included through compile-time feature flags.
const module = feature("SOME_FLAG")
  ? require("./some/internal/module")
  : null
The feature() here comes from bun:bundle, which is Bun's built-in packaging API.
At build time, each feature flag is resolved into a boolean literal.
If the flag is false, the bundler will delete the entire corresponding require() segment.
After removal, the module will not be loaded, will not enter the bundle, and will not be published.
That is to say, Claude Code does not simply rely on runtime judgments to hide internal features; it cuts off certain paths at build time.
But the irony is right here.
The source map released in the early npm package contained sourcesContent.
This field contains the original TypeScript source code.
That is to say, feature flags indeed cut off the runtime code, but the source map still retained the source code content.
This directly led to the Claude Code source code being exposed.
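To see why `sourcesContent` matters, it helps to look at the shape of a source map. The sketch below uses the standard Source Map v3 fields `sources` and `sourcesContent`; the helper names `embeddedSources` and `stripSourcesContent` and the sample file paths are made up for illustration.

```typescript
// Subset of the Source Map v3 format that matters here.
type SourceMapV3 = {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
  mappings: string;
};

// List the original files whose full text is embedded in the map.
function embeddedSources(mapJson: string): string[] {
  const map = JSON.parse(mapJson) as SourceMapV3;
  const contents = map.sourcesContent ?? [];
  return map.sources.filter((_, i) => typeof contents[i] === "string");
}

// Return a copy of the map with the embedded source text removed.
function stripSourcesContent(mapJson: string): string {
  const map = JSON.parse(mapJson) as SourceMapV3;
  delete map.sourcesContent;
  return JSON.stringify(map);
}

// Made-up sample map: one file with embedded text, one without.
const sampleMap = JSON.stringify({
  version: 3,
  sources: ["src/query.ts", "src/vendor.js"],
  sourcesContent: ["export const x = 1;", null],
  mappings: "",
});
```

Running a check like `embeddedSources` over published `.map` files before release would flag any file whose original text is about to ship with the package.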
How These Components Connect
So, returning to the six components we saw at the beginning, these six components have close interrelationships.
Memory is fed into the Query Loop as part of the system prompt.
The Query Loop drives tool execution.
Tool results return to the Query Loop as messages.
Tasks are recursive Query Loops, just with isolated contexts.
Hooks intercept the Query Loop at defined positions.
State is read and written by all modules, while the reactive design subscribes to UI state in real-time.
The cycle between the Query Loop and the Tool System is the most central characteristic of this system.
The cycle continues until the model no longer generates tool calls, or until an external constraint such as the token budget, the maximum number of turns, or user cancellation terminates it.
This is the essence of an Agent.
So now we can answer the question posed at the beginning of the main text: What exactly is Claude Code?
It is an Agent runtime running in the terminal.
The model is just the brain.
Tools are the hands and feet.
Permissions are the braking system.
State is the nervous system.
Memory is long-term experience.
Hooks are engineering discipline.
The Query Loop is the heartbeat.
So... what do you think it is?
In subsequent chapters, I will expand along a complete Claude Code request to introduce it to everyone.
So, the first chapter serves as an overview.
Next, I will start Chapter 2: The Startup Process.
References:
- Claude Code from Source, Chapter 1: The Architecture of an AI Agent https://claude-code-from-source.com/ch01-architecture/
- Claude Code from Source Homepage https://claude-code-from-source.com/
- Anthropic Claude Code Docs: Overview https://docs.anthropic.com/en/docs/claude-code/overview