Claude Code Gets a Browser UI with Real-Time Tool Approval
1 Foreword
Hello, I'm alien. In the previous article, "Please Stop Only Using Claude Code for Coding! Unlock Its Full Potential?", we introduced some use cases for Claude Code beyond coding. Today we continue with Claude Code and look at how to use it conveniently inside a browser. I'll also share some creative ideas built on Claude Code.
What can you gain from this document?
- A new approach to running Claude Code beyond the terminal;
- Introduction to and practical use of claude-agent-sdk;
- How to make the Claude Code workflow controllable.
When using Claude Code the traditional way, we may run into the following pain points:
Scenario 1: Unfriendly for non-developer users
We all know that Claude Code is essentially an agent. Besides writing code, an agent can also take on process-oriented tasks in our daily work. For non-developers, though, the learning curve for Claude Code can be much steeper than chatting with a Doubao bot: there is skills management, MCP management, model switching, and so on.
So, can we move Claude Code into the browser and give it a friendlier interface than the terminal?
Scenario 2: Claude Code typically runs in the terminal as a CLI and is driven by terminal commands. This makes it hard to intercept what Claude Code returns, especially when we want to build a custom workflow on top of the Claude Code agent, or wrap and post-process its results.
Today we'll address both issues: how to move Claude Code into the browser, and how to gain effective control over its startup, process, and output.
2 Basic Implementation Approach
2.1 Basic Schematic
Claude Code itself runs on the Node.js runtime. To use it flexibly you could drive the CLI directly, and for remote scenarios you can give it a stable sandbox through solutions like Docker. Let's look at the design principle:
- Create a Node service and make canUseTool asynchronous: push approval requests to the browser, and feed the decision back in once a human or bot has decided;
- Connect Node and the browser via WebSocket; HTTP streaming responses can also be used for this;
- Node receives query information from the web side, processes the query, and passes it to the Agent SDK;
- The Agent SDK handles the Claude Code child process logic;
An end-to-end interaction should look like this:
A very critical part of the entire process is claude-agent-sdk. Let's learn about it.
2.2 claude-agent-sdk
@anthropic-ai/claude-agent-sdk is an official library from Anthropic that treats Claude Code, the CLI, as a high-level object that a JS program can drive.
Unlike @anthropic-ai/sdk, claude-agent-sdk encapsulates the following on top of it:
- Claude Code's complete toolset (Read / Edit / Write / Bash / Grep / Glob / WebFetch ...)
- Loading and parsing of Skills / MCP / CLAUDE.md
- Multi-turn session persistence (session_id + --resume equivalent semantics)
- Permission approval (canUseTool async callback)
- Streaming events (stream_event transparently passes original Anthropic events);
This way, you can operate Claude Code through an API instead of typing into the CLI, and still pick up the existing Claude Code configuration, along with the skills and plugins registered with Claude Code.
Let's look at the basic usage:
import { query } from '@anthropic-ai/claude-agent-sdk';

const stream = query({
  prompt: 'Create a new file hello.txt in the current directory',
  options: {
    cwd: '/path/to/project',
    settingSources: ['user', 'project', 'local'],
    includePartialMessages: true,
    permissionMode: 'default',
    canUseTool: async (toolName, input, ctx) => { /* ... */ },
  },
});

for await (const msg of stream) {
  // msg is one of the SDKMessage subtypes
  console.log(msg);
}
The core fields of options are:
- cwd (string): the working directory;
- settingSources (('user' | 'project' | 'local')[]): which configuration sources to load, for example settingSources: ['user', 'project', 'local']. Each value maps to one location:
| Value | What it reads |
|---|---|
| 'user' | ~/.claude/ global configuration |
| 'project' | .claude/ under cwd |
| 'local' | .claude/settings.local.json |
- includePartialMessages (true): the key to character-by-character streaming;
- permissionMode ('default'): the permission mode;
- canUseTool: the async permission callback;
- abortController (AbortController): interrupts the current turn;
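The canUseTool body is elided in the basic usage above, so here is a minimal sketch of what such a callback might look like. It assumes a hypothetical policy (auto-approve a few read-only tools, ask a human for everything else) and a hypothetical askHuman function; only the two return shapes, { behavior: 'allow', updatedInput } and { behavior: 'deny', message }, come from this article.

```javascript
// Hypothetical allowlist; the tool names are taken from the toolset listed earlier.
const READ_ONLY_TOOLS = new Set(['Read', 'Grep', 'Glob']);

// Builds a callback with the same call shape as the canUseTool option.
// askHuman is a hypothetical async function that resolves to 'allow' or 'deny'.
function createCanUseTool(askHuman) {
  return async function canUseTool(toolName, input, ctx) {
    if (READ_ONLY_TOOLS.has(toolName)) {
      // Read-only tools are approved without interrupting the user.
      return { behavior: 'allow', updatedInput: input };
    }
    const decision = await askHuman(toolName, input);
    return decision === 'allow'
      ? { behavior: 'allow', updatedInput: input }
      : { behavior: 'deny', message: `Denied by policy: ${toolName}` };
  };
}

// Usage: a stand-in "human" that denies everything it is asked about.
const canUseTool = createCanUseTool(async () => 'deny');
canUseTool('Bash', { command: 'rm -rf build' }, {}).then((result) => console.log(result));
```

Splitting the policy from the question of who answers keeps the callback easy to test; in the browser setup described in this article, askHuman would be the part that pushes a frame over the WebSocket and waits for the reply.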
After introducing claude-agent-sdk, let's look at the brief implementation logic:
A brief explanation here:
- For the model, MiniMax-M3 was used. The overall experience was quite good; it is basically sufficient for writing code and for everyday process-oriented work;
- To demonstrate intercepting Claude Code, I keep a simple record of the duration and tokens of each conversation round. Disclaimer upfront: the token tracking might not be entirely accurate;
2.3 Brief Implementation
2.3.1 Overall Flow
The path of a single user_message:
- Browser input → user_message frame → Node
- query({ prompt, options }) starts an SDK session → spawns the Claude Code child process
- SDKMessage streamed back → normalize() normalization → frames pushed to the browser
- text_delta frames rendered character by character
- Dangerous tool triggers canUseTool → permission_request frame pushed to the browser
- User decision → permission_response reinjected → Promise resolves
- Turn ends → turn_result + done frames → record sessionId for the next continuation
2.3.2 Node-side Code Snippet (core/session.js)
This file does two things: drives Claude Code + pushes approvals to the browser and waits for reinjection. Everything else is detail.
import { query } from '@anthropic-ai/claude-agent-sdk';

class CodeSession {
  constructor({ cwd, onFrame }) {
    this.cwd = cwd;
    this.onFrame = onFrame; // Callback for returning frames, injected by server.js as ws.send
    this.sessionId = null; // Stored after the first turn, used for subsequent resumes
    this._permIdSeq = 0; // Auto-increment ID for approval requests, used for reconciliation
    this._pendingPerms = new Map(); // id -> resolve function, retrieved when approval is reinjected
    this._abort = null; // AbortController for the current turn
  }

  async send(prompt) {
    this._abort = new AbortController();
    const options = {
      cwd: this.cwd,
      // Crucial: Without this field, the SDK won't read ~/.claude/, and all your global skills/MCP will be gone
      settingSources: ['user', 'project', 'local'],
      includePartialMessages: true, // Without this, character-by-character streaming is impossible; you'll only get whole blocks
      permissionMode: 'default', // Dangerous tools go through the callback below
      canUseTool: this._canUseTool.bind(this),
      abortController: this._abort, // The SDK option takes the controller itself, not its signal
    };
    if (this.sessionId) options.resume = this.sessionId; // Resume from the second turn onwards
    try {
      const stream = query({ prompt, options });
      for await (const msg of stream) {
        // Only record on first occurrence; subsequent messages won't overwrite
        if (msg.session_id && !this.sessionId) this.sessionId = msg.session_id;
        for (const frame of normalize(msg)) this.onFrame(frame);
      }
      this.onFrame({ type: 'done', sessionId: this.sessionId });
    } catch (err) {
      // User clicking stop ≠ error. The frontend needs a "Cancelled" prompt, not an error popup
      if (this._abort.signal.aborted) {
        this.onFrame({ type: 'done', aborted: true });
      } else {
        this.onFrame({ type: 'error', message: err.message });
      }
    }
  }

  stop() { this._abort?.abort(); }

  resolvePermission(id, decision) {
    // Entry point for browser reinjection, retrieves the corresponding resolve function from the Map
    this._pendingPerms.get(id)?.(decision);
  }

  // The most convoluted part of the entire code, but the logic isn't complex:
  // The SDK calls this function expecting a Promise (async), so we create one on the spot, push it to the browser, and wait for reinjection
  _canUseTool(toolName, input, ctx) {
    const id = ++this._permIdSeq;
    this.onFrame({ type: 'permission_request', id, toolName, input });
    return new Promise((resolve) => {
      let timer;
      const finish = (decision) => {
        clearTimeout(timer); // Don't leave a 5-minute timer behind once the decision is in
        this._pendingPerms.delete(id);
        // Note: The allow branch must return updatedInput,
        // The SDK runtime uses Zod for strict validation; only providing behavior will cause an error
        resolve(
          decision === 'allow'
            ? { behavior: 'allow', updatedInput: input }
            : { behavior: 'deny', message: 'User denied in browser' }
        );
      };
      this._pendingPerms.set(id, finish);
      // If no one responds within 5 minutes, treat as deny to prevent the Promise from never settling if the browser hangs
      timer = setTimeout(() => finish('deny'), 5 * 60 * 1000);
    });
  }
}
// Normalization: Flatten the SDK's multiple event types into a few frame types easy for the frontend to handle
// Without this layer, the frontend would need to recognize 20+ types of SDK internal events, making modifications difficult
function normalize(msg) {
  if (msg.type === 'system' && msg.subtype === 'init') {
    return [{ type: 'system_init', model: msg.model, tools: msg.tools, mcpServers: msg.mcp_servers }];
  }
  // Streaming text: Only pick this one type of delta; other stream_event types are discarded
  if (msg.type === 'stream_event' && msg.event?.delta?.type === 'text_delta') {
    return [{ type: 'text_delta', text: msg.event.delta.text }];
  }
  // Text blocks in assistant messages have already been pushed above; here we only extract tool calls
  if (msg.type === 'assistant') {
    return msg.message.content
      .filter((b) => b.type === 'tool_use')
      .map((b) => ({ type: 'tool_use', id: b.id, name: b.name, input: b.input }));
  }
  if (msg.type === 'user') {
    // Guard: content can be a plain string rather than an array of blocks
    const blocks = Array.isArray(msg.message.content) ? msg.message.content : [];
    return blocks
      .filter((b) => b.type === 'tool_result')
      .map((b) => ({ type: 'tool_result', id: b.tool_use_id, content: b.content, isError: Boolean(b.is_error) }));
  }
  if (msg.type === 'result') {
    return [{
      type: 'turn_result',
      cost: msg.total_cost_usd,
      usage: msg.usage,
      duration: msg.duration_ms, // Milliseconds; the frontend reads frame.duration
      numTurns: msg.num_turns,
    }];
  }
  return []; // All other control events are not forwarded to avoid frontend noise
}
A few points worth noting:
- Not passing settingSources, or passing an empty array, means the session cannot access global skills/MCP. This is the key switch behind the "zero configuration" experience of claude code for web.
- includePartialMessages: true is the lifeline of streaming. Turn it off and you only get complete blocks, so character-by-character rendering can only be faked.
- Inside canUseTool, the allow branch must return updatedInput. The SDK runtime's Zod schema is stricter than the .d.ts, so coding from the type definitions alone will hit this pitfall.
- The 5-minute timeout prevents the SDK's internal Promise from hanging forever if the browser is closed. It is not a nice-to-have; it is a must.
- normalize() is the key to decoupling: the frontend only recognizes a handful of frame types and is unaware of the SDK's internal protocol. If the SDK version changes, MCP is added, or the streaming strategy is altered in the future, the frontend doesn't need to change.
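The timeout point can be taken one step further. The following is a minimal sketch, not the project's code, of a standalone registry for pending approvals: it clears each timer once a decision arrives and can deny everything at once when the browser disconnects. The helper name createPendingApprovals and its methods are hypothetical.

```javascript
// Hypothetical helper: tracks approvals that are waiting for a browser decision.
function createPendingApprovals(timeoutMs) {
  const pending = new Map(); // id -> { settle, timer }
  let seq = 0;

  // Registers a new approval, calls notify(id), and returns a Promise of 'allow' | 'deny'.
  function request(notify) {
    const id = ++seq;
    const promise = new Promise((resolve) => {
      const settle = (decision) => {
        const entry = pending.get(id);
        if (!entry) return; // Already settled; ignore late or duplicate answers
        clearTimeout(entry.timer);
        pending.delete(id);
        resolve(decision);
      };
      // No answer within timeoutMs counts as a deny.
      const timer = setTimeout(() => settle('deny'), timeoutMs);
      pending.set(id, { settle, timer });
    });
    notify(id); // For example: push a permission_request frame to the browser
    return promise;
  }

  const resolve = (id, decision) => pending.get(id)?.settle(decision);

  // Denies everything still waiting, for example when the WebSocket closes.
  const denyAll = () => {
    for (const { settle } of [...pending.values()]) settle('deny');
  };

  return { request, resolve, denyAll, size: () => pending.size };
}
```

Calling denyAll() from the connection's close handler would let an in-flight turn fail fast instead of waiting out the full timeout, which is one possible design choice rather than something the SDK requires.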
2.3.3 Frontend Code Snippet (web/app.js)
The frontend has just one dispatch: receive a frame, switch to the corresponding rendering function. There is no business logic; all "judgments" are done on the Node side.
const ws = new WebSocket('ws://127.0.0.1:1717');

ws.onmessage = (e) => {
  const frame = JSON.parse(e.data);
  switch (frame.type) {
    case 'system_init':
      // Comes once when the session is first established, telling the frontend the current model and available tools
      renderStatus(`Model: ${frame.model} · ${frame.tools.length} tools`);
      break;
    case 'text_delta':
      // Core of character-by-character streaming: append to the current assistant bubble whenever new text arrives
      appendToCurrentBubble(frame.text);
      break;
    case 'tool_use':
      // Claude wants to use a tool; create a collapsible card placeholder, waiting for the result to be filled in
      createToolCard(frame.id, frame.name, frame.input);
      break;
    case 'tool_result':
      // Tool execution finished; fill the result back into the card created earlier
      fillToolCard(frame.id, frame.content);
      break;
    case 'permission_request':
      // Dangerous operation: show a bar, wait for user click; once clicked, send the decision back to Node
      showApprovalBar(frame, (decision) => {
        ws.send(JSON.stringify({
          type: 'permission_response',
          id: frame.id,
          decision, // 'allow' | 'deny'
        }));
      });
      break;
    case 'turn_result':
      // Bill for the entire turn: how much it cost and how long it took
      renderMeta({ cost: frame.cost, duration: frame.duration });
      break;
    case 'done':
      unlockInput();
      if (frame.aborted) showBanner('Cancelled');
      break;
    case 'error':
      showError(frame.message);
      unlockInput();
      break;
  }
};

sendBtn.onclick = () => {
  const text = inputEl.value.trim();
  if (!text) return;
  lockInput();
  ws.send(JSON.stringify({ type: 'user_message', text }));
};

stopBtn.onclick = () => ws.send(JSON.stringify({ type: 'stop' }));
A few points worth noting:
- This file has almost no state; it's just a switch. All state machines for "what is Claude doing / should the input box be locked / should it render" live on the Node side. The frontend is only responsible for presentation.
- The handling of permission_request is the only asynchronous wait; everything else is rendered immediately upon receiving a frame. Here the UI waits for the user's click, and only then is the response sent back.
- When actually implementing text_delta, you need buffering and safety boundaries. Don't just do innerHTML += text upon receipt; a half-received backtick will drive the Markdown parser crazy. This code is omitted here; see app.js in the project.
- The aborted field determines whether done means "user actively stopped" or "normal end". The UI needs to differentiate (show Cancelled vs. simply unlock the input box).
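Since the buffering code is omitted here, the following is one possible sketch of the idea, not what app.js in the project actually does. It assumes two simple rules: hold back a trailing run of backticks (it may be half of a marker), and temporarily close a code fence that is still open. The names createDeltaBuffer, renderable, and final are hypothetical.

```javascript
// The Markdown code-fence marker (three backticks), built with repeat() so the example is easy to embed.
const FENCE = '`'.repeat(3);

// Hypothetical helper: accumulates text_delta chunks and returns text that is safe to hand to a Markdown renderer.
function createDeltaBuffer() {
  let raw = '';
  return {
    // Call once per text_delta frame.
    push(chunk) { raw += chunk; },

    // Text to render right now.
    renderable() {
      // Hold back trailing backticks: they may be the first half of a fence or inline-code marker.
      let text = raw.replace(/`+$/, '');
      // Count lines that open or close a fence; an odd count means a fence is still open.
      const fenceCount = text.split('\n').filter((line) => line.startsWith(FENCE)).length;
      if (fenceCount % 2 === 1) text += '\n' + FENCE; // Close it for this render only
      return text;
    },

    // Call on the done frame: nothing more will arrive, so return everything as received.
    final() { return raw; },
  };
}
```

The output of renderable() would still need to go through a Markdown renderer and an HTML sanitizer (or textContent for plain text) before it touches the DOM; that part is left out of the sketch.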
2.3.4 Protocol Frames (Bidirectional)
Protocol design principle: Each message does only one thing. The browser sends 3 types (Node only cares about "what you want" + "what you decided" + "you want to stop"), and Node sends 8 types (the browser needs to perceive key nodes in the lifecycle).
Browser → Node (3 types):
| Type | Fields | Purpose |
|---|---|---|
| user_message | text | User input |
| permission_response | id, decision | Approval reinjection |
| stop | (none) | Interrupt current turn |
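On the Node side, these three frame types map almost one-to-one onto the session methods shown in 2.3.2. A minimal, transport-agnostic sketch of that dispatch might look like the following; handleClientFrame and the reply callback are hypothetical names, and the real server.js may be organized differently.

```javascript
// Routes one raw browser message to the matching session method.
// `session` is any object with send / resolvePermission / stop, such as CodeSession.
// `reply` is a hypothetical callback that pushes a frame back to the browser.
function handleClientFrame(session, raw, reply) {
  let frame;
  try {
    frame = JSON.parse(raw);
  } catch {
    reply({ type: 'error', message: 'Invalid JSON' });
    return;
  }
  switch (frame?.type) {
    case 'user_message':
      session.send(frame.text); // Starts a turn; frames flow back through the session's onFrame
      break;
    case 'permission_response':
      session.resolvePermission(frame.id, frame.decision);
      break;
    case 'stop':
      session.stop();
      break;
    default:
      reply({ type: 'error', message: `Unknown frame type: ${frame?.type}` });
  }
}
```

With a WebSocket server, this function would be called from the connection's message handler, with reply wrapping JSON.stringify and the socket's send.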
Node → Browser (8 types):
| Type | Fields | Purpose |
|---|---|---|
| system_init | model, tools, mcpServers | Session initialization (once per connection) |
| text_delta | text | Streaming text increment (pushed N times per turn) |
| tool_use | id, name, input | Tool call start |
| tool_result | id, content, isError | Tool execution result |
| permission_request | id, toolName, input | Request approval |
| turn_result | cost, usage, duration, numTurns | Bill for the entire turn |
| done | sessionId, aborted? | Turn end |
| error | message | Exception |
The protocol itself has no schema file; field conventions rely on the code. The cost of adding new frames is very low (just add a case). Changing field names will break the other side — so once field names are set, don't change them casually.
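Because there is no schema file, a renamed field only shows up as a silent rendering bug. One lightweight guard, sketched here under assumptions, is a table of required fields per frame type that both sides can check in development. Which fields count as required is my own reading of the tables above (mcpServers, isError, duration, numTurns, sessionId, and aborted are treated as optional), and validateFrame is a hypothetical name.

```javascript
// Required fields per frame type, based on the protocol tables above.
// Treating the remaining fields as optional is an assumption of this sketch.
const FRAME_FIELDS = new Map([
  // Browser -> Node
  ['user_message', ['text']],
  ['permission_response', ['id', 'decision']],
  ['stop', []],
  // Node -> Browser
  ['system_init', ['model', 'tools']],
  ['text_delta', ['text']],
  ['tool_use', ['id', 'name', 'input']],
  ['tool_result', ['id', 'content']],
  ['permission_request', ['id', 'toolName', 'input']],
  ['turn_result', ['cost', 'usage']],
  ['done', []],
  ['error', ['message']],
]);

// Returns a list of problems; an empty list means the frame looks fine.
function validateFrame(frame) {
  const required = FRAME_FIELDS.get(frame?.type);
  if (!required) return [`unknown frame type: ${frame?.type}`];
  return required
    .filter((field) => !(field in frame))
    .map((field) => `${frame.type}: missing field "${field}"`);
}

// Usage: a tool_use frame that lost two fields after a rename.
console.log(validateFrame({ type: 'tool_use', id: 'toolu_1' }));
```

This keeps the cost of adding a frame low (one more Map entry) while making a field rename fail loudly on whichever side still uses the old name.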
3 Demo of the Completed Effect:
4 Summary
Code repository address: https://github.com/GoodLuckAlien/claude-code-for-web
At this point, Claude Code has transformed from a cold terminal command into a window in the browser that can be pushed open at will.
Looking back at this pipeline, it's actually not that mysterious: Node is the heart, the browser is the face, WebSocket is the blood vessel, and the protocol frames are the nerve endings. We didn't reinvent the Agent; we just took the three things originally buried in the terminal black box ("startup / approval / output") and broke them into frames that can be observed, intercepted, and orchestrated.
Technology is often like this: the real difficulty has never been building something that can run, but making something that runs "usable". Putting the CLI into the Web is a small step, but what it pries open is the possibility for Agents to reach ordinary users and enter real workflows.
If this article gave you some new ideas, feel free to like / bookmark it. Your support is my confidence to continue playing with Claude Code in creative ways 🚀.