
The Six Abstractions That Make Claude Code an Agent Runtime

By cxuanAI · Jun 29, 2026

Source: juejin.cn

Understanding these six abstractions gives developers a blueprint for building their own agentic systems. The async-generator Query Loop, speculative tool execution, and the bubble permission mode for sub-agents are concrete design patterns that solve real control, latency, and safety problems any multi-agent runtime will hit.

Summary

An analysis of a leaked source map shows Claude Code's architecture to be a TypeScript monolith organized around six abstractions. The Query Loop, an async generator, is the heartbeat: it calls the model, streams responses, executes tools, and feeds results back into context until a stop condition is met. This generator design gives the outer UI pull-based control over the event stream, enabling clean cancellation with explicit stop reasons.

The Tool System interleaves execution with model streaming through a speculative scheduler that launches concurrency-safe tools like Read before the model finishes outputting, trading occasional wasted tokens for lower latency. Tasks fork recursive sub-agents that run their own Query Loops with isolated context, while a bubble permission mode prevents sub-agents from self-approving dangerous operations. State is split into a mutable session singleton and a reactive Zustand-driven UI layer. Memory persists cross-session context as scannable Markdown files, and Hooks intercept 27 lifecycle events, running as shell commands, one-shot LLM prompts, multi-turn agent conversations, or HTTP webhooks.

Seven permission modes range from bypass to plan (read-only), with auto mode calling a lightweight LLM to judge whether an action matches user intent. The multi-provider architecture abstracts the Direct API, Bedrock, Vertex AI, and Foundry behind a factory-built client. Compile-time feature flags strip internal-only code, though an early source map that included sourcesContent exposed the full TypeScript source.

Takeaways

— Claude Code is a single TypeScript application, not a collection of microservices.

— The Query Loop is an async generator in query.ts (~1700 lines) that all entry points—REPL, SDK, sub-agents, headless mode—converge into.

— Using for-await on the generator gives the outer UI pull-based control; the terminal's render speed can throttle event production, similar to TCP flow control.

— Explicit stop_reason values (end_turn, user_cancel, token_budget, max_turns) make termination unambiguous.

— The StreamingToolExecutor speculatively launches concurrency-safe tools like Read and Grep before the model finishes its full response, reducing latency at the cost of occasional wasted tokens.

— Sub-agents are recursive Query Loops forked via AgentTool, each with its own context, tool set, and permission mode.

— The bubble permission mode forces sub-agents to escalate dangerous actions to the parent Agent instead of self-approving.

— State has two layers: a mutable session singleton (~80 fields) and a reactive Zustand store for UI state.

— Memory persists as scannable CLAUDE.md files at user, project, directory, and team levels, parsed for frontmatter and relevance-filtered by an LLM at session start.

— Hooks trigger on 27 events across four execution types—shell commands, one-shot LLM prompts, multi-turn agent conversations, and HTTP webhooks—and can block the Query Loop entirely.

— Seven permission modes exist at the source level; auto mode calls a separate lightweight LLM to judge whether a tool call matches user intent.

— The multi-provider layer uses a factory function (getAnthropicClient) to abstract Direct API, Bedrock, Vertex AI, and Foundry behind a uniform interface.

— Compile-time feature flags via Bun's bundler strip internal-only modules, but an early npm source map included sourcesContent, exposing the full TypeScript source.

Conclusions

An async generator is a deliberate inversion of control: callbacks push events from the inside, but a generator lets the outer loop pull, which makes cancellation and backpressure straightforward.

Speculative tool execution is a bet that the model won't contradict itself mid-stream—a latency optimization that accepts wasted compute as a cost of doing business.

Forking sub-agents through the same Query Loop rather than a separate runtime keeps the architecture simple but makes the bubble permission mode non-negotiable for safety.

Splitting state into a dumb mutable singleton and a reactive UI store avoids the overhead of making everything observable; only what the user sees needs real-time updates.

Memory as filesystem Markdown is underrated: it's version-controllable, editable by humans, and trivially scannable by an LLM at session start without a vector database.
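To make the "trivially scannable" point concrete, here is a minimal sketch of reading one memory file. The file layout (a "---" delimited frontmatter block with simple key: value lines) and the names MemoryDoc and parseMemoryFile are assumptions for illustration, not taken from the Claude Code source. The LLM relevance filter mentioned in the article is not modeled.

```typescript
// Hypothetical memory file format: optional "---" delimited frontmatter
// with simple "key: value" lines, followed by a Markdown body.
interface MemoryDoc {
  meta: Record<string, string>;
  body: string;
}

function parseMemoryFile(text: string): MemoryDoc {
  const lines = text.split("\n");
  const meta: Record<string, string> = {};
  // No frontmatter at all: the whole file is the body.
  if (lines[0]?.trim() !== "---") return { meta, body: text };
  const end = lines.indexOf("---", 1);
  if (end === -1) return { meta, body: text }; // unterminated frontmatter
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return { meta, body: lines.slice(end + 1).join("\n").trim() };
}
```

Because the result is plain data, a session could collect the meta fields from every file cheaply and only hand the bodies of relevant files to the model.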

Hooks that can block the entire Query Loop turn lifecycle events into policy enforcement points—permission checks, input validation, and context injection all run through the same mechanism.
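A rough sketch of a blocking hook, under stated assumptions: the event name "before_tool", the HookOutcome shape, and the registry functions are all hypothetical, and every hook here is a plain async function rather than one of the four execution types the article lists.

```typescript
// Hypothetical hook contract; the event name used below is made up.
type HookOutcome = { block: boolean; reason?: string };
type LifecycleHook = (payload: string) => Promise<HookOutcome>;

const hookRegistry = new Map<string, LifecycleHook[]>();

function registerHook(event: string, hook: LifecycleHook): void {
  hookRegistry.set(event, [...(hookRegistry.get(event) ?? []), hook]);
}

// Run hooks in registration order; the first one that blocks wins,
// and the caller is expected to stop the loop when block is true.
async function runHooks(event: string, payload: string): Promise<HookOutcome> {
  for (const hook of hookRegistry.get(event) ?? []) {
    const outcome = await hook(payload);
    if (outcome.block) return outcome;
  }
  return { block: false };
}

// A policy expressed as a hook: refuse any command containing "sudo".
registerHook("before_tool", async (command) =>
  command.includes("sudo")
    ? { block: true, reason: "sudo is not allowed" }
    : { block: false },
);
```

Permission checks, input validation, and context injection would all fit this one shape, which is the point the paragraph above makes.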

The auto permission mode offloads intent-matching to a second LLM, which is a pragmatic middle ground between full manual approval and bypass, but introduces a second point of failure and token cost.
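One way to picture that trade-off is the sketch below. The classify parameter stands in for the call to the lightweight model, and the fail-closed fallback (ask the user when the classifier errors) is a design assumption of this example, not something the article states.

```typescript
// Hypothetical sketch: "classify" stands in for the second, lightweight LLM.
type AutoVerdict = "allow" | "ask_user";
type IntentClassifier = (userIntent: string, toolCall: string) => Promise<boolean>;

async function autoModeCheck(
  userIntent: string,
  toolCall: string,
  classify: IntentClassifier,
): Promise<AutoVerdict> {
  try {
    const matchesIntent = await classify(userIntent, toolCall);
    return matchesIntent ? "allow" : "ask_user";
  } catch {
    // The classifier is a second point of failure, so this sketch fails
    // closed: if it errors out, fall back to asking the user.
    return "ask_user";
  }
}
```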

Feature flags that strip code at build time are effective until a source map leaks; the security boundary was the bundler, not the runtime.

Concepts & terms

Query Loop

An async generator that repeatedly calls the model, collects streaming responses and tool calls, executes tools, appends results to context, and loops until a stop condition. It is the central execution primitive that all Claude Code entry points converge into.
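The loop described here can be sketched in a few lines. This is a simplified illustration, not the code in query.ts: the type names, the callModel and runTool parameters, and the event shapes are all assumptions, and only two of the stop reasons mentioned in the article are modeled.

```typescript
// Hypothetical types; none of these names come from the Claude Code source.
type StopReason = "end_turn" | "user_cancel" | "token_budget" | "max_turns";
type ToolCall = { tool: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type LoopEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_result"; tool: string; output: string }
  | { kind: "stop"; reason: StopReason };

async function* queryLoop(
  callModel: (context: string[]) => Promise<ModelTurn>,
  runTool: (call: ToolCall) => Promise<string>,
  maxTurns: number,
): AsyncGenerator<LoopEvent> {
  const context: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(context);
    context.push(reply.text);
    yield { kind: "text", text: reply.text };
    if (reply.toolCalls.length === 0) {
      // No tools requested: the model is done with this request.
      yield { kind: "stop", reason: "end_turn" };
      return;
    }
    for (const call of reply.toolCalls) {
      const output = await runTool(call);
      context.push(output); // tool results feed the next model call
      yield { kind: "tool_result", tool: call.tool, output };
    }
  }
  yield { kind: "stop", reason: "max_turns" };
}
```

Any caller, whether a REPL or a sub-agent, would consume this with for await...of and read the final stop event to learn why the loop ended.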

Streaming Scheduler (StreamingToolExecutor)

A mechanism that begins executing concurrency-safe tools (e.g., Read, Grep) as soon as the model streams their invocation, without waiting for the full response to complete. This speculative execution trades occasional wasted tokens for lower end-to-end latency.
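A minimal sketch of the idea, assuming tool calls arrive one at a time from a stream: calls to tools in a concurrency-safe set start immediately, while everything else waits until the stream has ended. The names StreamedCall, CONCURRENCY_SAFE, and executeWhileStreaming are hypothetical, and discarding speculative results that the model later invalidates is not modeled.

```typescript
// Hypothetical sketch; the tool names come from the article, the rest is assumed.
type StreamedCall = { id: number; tool: string; input: string };
const CONCURRENCY_SAFE = new Set(["Read", "Grep"]);

async function executeWhileStreaming(
  stream: AsyncIterable<StreamedCall>,
  runTool: (call: StreamedCall) => Promise<string>,
): Promise<Map<number, string>> {
  const inFlight: Promise<[number, string]>[] = [];
  const deferred: StreamedCall[] = [];
  for await (const call of stream) {
    if (CONCURRENCY_SAFE.has(call.tool)) {
      // Start now, while the model is still producing output.
      inFlight.push(runTool(call).then((out): [number, string] => [call.id, out]));
    } else {
      deferred.push(call); // side-effecting tools wait for the full response
    }
  }
  const results = new Map<number, string>(await Promise.all(inFlight));
  for (const call of deferred) {
    results.set(call.id, await runTool(call)); // run one at a time, in order
  }
  return results;
}
```

The latency win comes from overlapping the safe tools with the rest of the model's output; the cost is that some of that early work may turn out to be unnecessary.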

AgentTool

The tool that forks a sub-agent by starting a new Query Loop with its own isolated context, tool set, and permission mode. It enables recursive agent delegation.

Bubble Permission Mode

A permission mode for sub-agents where dangerous actions cannot be self-approved; instead, the request bubbles up to the parent Agent, which decides whether to escalate to the user.
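The escalation path can be modeled as a walk up the agent tree. This is an illustrative sketch only: AgentNode, requestPermission, and the approve callback are hypothetical names, and the step where the parent forwards the question to the user is reduced to the parent's own approve function.

```typescript
// Hypothetical model of permission escalation; names are illustrative only.
type BubbleDecision = "allow" | "deny";

interface AgentNode {
  label: string;
  mode: "bubble" | "default";
  parent?: AgentNode;
  approve: (action: string) => BubbleDecision; // consulted only if this agent may decide
}

// Walk up the tree until reaching an agent that is allowed to decide.
function requestPermission(
  agent: AgentNode,
  action: string,
): { decidedBy: string; decision: BubbleDecision } {
  let current = agent;
  while (current.mode === "bubble" && current.parent) {
    current = current.parent; // a bubble-mode agent never approves for itself
  }
  return { decidedBy: current.label, decision: current.approve(action) };
}

const rootAgent: AgentNode = {
  label: "root",
  mode: "default",
  approve: (action) => (action.startsWith("rm ") ? "deny" : "allow"),
};
const subAgent: AgentNode = {
  label: "sub",
  mode: "bubble",
  parent: rootAgent,
  approve: () => "allow", // would approve everything if it were ever asked
};
```

Here requestPermission(subAgent, "rm -rf build") is decided by the root agent, so the sub-agent's permissive approve function is never consulted.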

Async Generator

A function that yields values over time and is consumed with for await...of, giving the consumer pull-based control. In Claude Code, this inverts the callback model so the terminal UI can throttle event processing and handle cancellation cleanly.
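The pull-based behavior is a property of the language itself and can be seen in a small standalone example. The names eventTrace, produceEvents, and consumeTwo are made up for this sketch; the producer is an endless counter rather than anything resembling the real event stream.

```typescript
const eventTrace: string[] = [];

async function* produceEvents(): AsyncGenerator<number> {
  try {
    for (let i = 1; ; i++) {
      eventTrace.push(`produced ${i}`);
      yield i; // execution pauses here until the consumer asks for more
    }
  } finally {
    eventTrace.push("cleanup"); // runs when the consumer stops early
  }
}

async function consumeTwo(): Promise<void> {
  for await (const value of produceEvents()) {
    eventTrace.push(`rendered ${value}`);
    if (value === 2) break; // leaving the loop calls the generator's return()
  }
}
```

After consumeTwo() resolves, eventTrace shows production and rendering strictly alternating, no third event ever produced, and the cleanup step run once. A slow consumer therefore slows the producer, and cancellation needs no extra signalling.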

Zustand

A lightweight React state-management library used for Claude Code's reactive UI layer. It provides simple get/set semantics and notifies components to re-render when state changes.
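The two-layer split might look roughly like the sketch below. To keep it self-contained, createMiniStore is a hand-written stand-in that imitates the getState / setState / subscribe shape of a Zustand store; it is not Zustand, and the field names are invented for illustration.

```typescript
// Layer 1: plain mutable session state. No observers, just fields.
const sessionState = { turnCount: 0, totalTokens: 0 };

// Layer 2: a tiny observable store, standing in for Zustand in this sketch.
function createMiniStore<T extends object>(initial: T) {
  let state = initial;
  const listeners = new Set<(s: T) => void>();
  return {
    getState: () => state,
    setState: (patch: Partial<T>) => {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener(state)); // notify the UI
    },
    subscribe: (listener: (s: T) => void) => {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

const uiStore = createMiniStore({ statusLine: "idle" });
```

Writes to sessionState are ordinary assignments with no notification cost, while only changes that go through uiStore.setState reach subscribers.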

Feature Flags (Compile-time)

Boolean flags resolved at build time by Bun's bundler. When false, the corresponding require() calls and modules are stripped from the bundle entirely, preventing internal-only code from shipping in the public npm package.
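The stripping works because the flag is a constant by the time the bundler optimizes the code. In the sketch below, INTERNAL_BUILD is a hypothetical flag, hard-coded so the snippet runs on its own; the assumption is that a real build would substitute the literal through the bundler's define-style replacement, and InternalDebugTool is an invented name.

```typescript
// Assumption: at build time the bundler replaces INTERNAL_BUILD with a
// literal. It is hard-coded here so the snippet runs as written.
const INTERNAL_BUILD: boolean = false;

function loadTools(): string[] {
  const tools = ["Read", "Grep"];
  if (INTERNAL_BUILD) {
    // With the flag fixed to false, a minifying bundler can drop this
    // branch, along with any module that only this branch pulls in.
    tools.push("InternalDebugTool"); // hypothetical internal-only tool
  }
  return tools;
}
```

The dead branch disappears from the shipped bundle, but not from the original source, which is why a source map that embeds that source undoes the protection.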
