
Claude Code Gets a Browser UI with Real-Time Tool Approval

1 Foreword

Hello, I'm alien. In the previous article "Please Stop Only Using Claude Code for Coding! Unlock Its Full Potential?", we introduced some use cases for Claude Code in other scenarios. Today, following the Claude Code line of thinking, let's talk about how to conveniently use Claude Code inside a browser. I'll also share some creative ideas based on Claude Code.

What can you gain from this document?

When using traditional Claude Code, we might encounter the following troublesome scenarios:

Scenario 1: Unfriendly for non-developer users

We all know that Claude Code is essentially an agent. Besides writing code, an agent can also take over process-oriented tasks in our daily work. For users who are not developers, however, the learning curve for Claude Code is much steeper than chatting with a bot such as Doubao: they have to deal with skills management, MCP management, model switching, and so on.

So, can we move Claude Code into the browser, creating a more user-friendly terminal interaction interface?

Scenario 2: Claude Code typically runs in the terminal as a CLI and is driven by terminal commands. This makes it difficult to intervene in the results it returns, especially when we want to build a custom workflow on top of the Claude Code agent, or wrap and post-process its output.

be92acd4b5ba2028da191e2acba7eb44.png

Today, we'll address these two issues and see how to move Claude Code into the browser, and potentially achieve effective control over the startup, process, and output results of Claude Code.

2 Basic Implementation Approach

2.1 Basic Schematic

Claude Code itself is based on the Node.js runtime environment. To use Claude Code more flexibly, you could directly use the CLI method. For remote scenarios, you can provide a stable sandbox environment for Claude Code through solutions like Docker. Let's look at the design principle:

0355f3b3a4f70085cbeb41c1a2a5319e.png

An end-to-end interaction should look like this:

fa2478d820a3be412df47cdc4340730a.png

A very critical part of the entire process is claude-agent-sdk. Let's learn about it.

2.2 claude-agent-sdk

@anthropic-ai/claude-agent-sdk is an official library from Anthropic that treats "Claude Code, the CLI, as a high-level object that can be driven by a JS program."

Unlike @anthropic-ai/sdk, which is a client for the model API, claude-agent-sdk wraps the Claude Code CLI itself.

This way, you can drive the Claude Code CLI through an API and use Claude Code more conveniently. The SDK can also read the Claude Code configuration on the current machine, as well as the skills and plugins registered with Claude Code.

Let's look at the basic usage:

import { query } from '@anthropic-ai/claude-agent-sdk';

const stream = query({
  prompt: 'Create a new file hello.txt in the current directory',
  options: {
    cwd: '/path/to/project',
    settingSources: ['user', 'project', 'local'],
    includePartialMessages: true,
    permissionMode: 'default',
    canUseTool: async (toolName, input, ctx) => { /* ... */ },
  },
});

for await (const msg of stream) {
  // msg is one of the SDKMessage subtypes
  console.log(msg);
}

Let's look at a core option, settingSources. Each value controls which configuration is loaded:

| Value | What it reads |
| --- | --- |
| 'user' | ~/.claude/ global configuration |
| 'project' | .claude/ under cwd |
| 'local' | .claude/settings.local.json |

After introducing claude-agent-sdk, let's look at the brief implementation logic:

f23c9d10cefe05244cf2560cc9eb8f81.png

A brief explanation here:

2.3 Brief Implementation

2.3.1 Overall Flow

0355f3b3a4f70085cbeb41c1a2a5319e.png

The path of a single user_message through the system:

  1. Browser input → user_message frame → Node
  2. query({ prompt, options }) starts SDK session → spawns Claude Code child process
  3. SDKMessage streamed back → normalize() flattens it into frames → frames pushed to browser
  4. text_delta frames rendered character by character
  5. Dangerous tool triggers canUseTool → permission_request frame pushed to browser
  6. User decision → permission_response reinjected → Promise resolves
  7. Turn ends → turn_result + done frames → record sessionId for next continuation

2.3.2 Node-side Code Snippet (core/session.js)

This file does two things: drives Claude Code + pushes approvals to the browser and waits for reinjection. Everything else is detail.

import { query } from '@anthropic-ai/claude-agent-sdk';

class CodeSession {
  constructor({ cwd, onFrame }) {
    this.cwd = cwd;
    this.onFrame = onFrame;        // Callback for returning frames, injected by server.js as ws.send
    this.sessionId = null;         // Stored after the first turn, used for subsequent resumes
    this._permIdSeq = 0;           // Auto-increment ID for approval requests, used for reconciliation
    this._pendingPerms = new Map(); // id -> resolve function, retrieved when approval is reinjected
    this._abort = null;            // AbortController for the current turn
  }

  async send(prompt) {
    this._abort = new AbortController();
    const options = {
      cwd: this.cwd,
      // Crucial: Without this field, the SDK won't read ~/.claude/, and all your global skills/MCP will be gone
      settingSources: ['user', 'project', 'local'],
      includePartialMessages: true,   // Without this, character-by-character streaming is impossible; you'll only get whole blocks
      permissionMode: 'default',      // Dangerous tools go through the callback below
      canUseTool: this._canUseTool.bind(this),
      abortController: this._abort,   // The SDK option takes the AbortController itself, not its signal
    };
    if (this.sessionId) options.resume = this.sessionId;  // Resume from the second turn onwards

    try {
      const stream = query({ prompt, options });
      for await (const msg of stream) {
        // Only record on first occurrence; subsequent messages won't overwrite
        if (msg.session_id && !this.sessionId) this.sessionId = msg.session_id;
        for (const frame of normalize(msg)) this.onFrame(frame);
      }
      this.onFrame({ type: 'done', sessionId: this.sessionId });
    } catch (err) {
      // User clicking stop ≠ error. The frontend needs a "Cancelled" prompt, not an error popup
      if (this._abort.signal.aborted) {
        this.onFrame({ type: 'done', aborted: true });
      } else {
        this.onFrame({ type: 'error', message: err.message });
      }
    }
  }

  stop() { this._abort?.abort(); }

  resolvePermission(id, decision) {
    // Entry point for browser reinjection, retrieves the corresponding resolve function from the Map
    this._pendingPerms.get(id)?.(decision);
  }

  // The most convoluted part of the entire code, but the logic isn't complex:
  // The SDK calls this function expecting a Promise (async), so we create one on the spot, push it to the browser, and wait for reinjection
  _canUseTool(toolName, input, ctx) {
    const id = ++this._permIdSeq;
    this.onFrame({ type: 'permission_request', id, toolName, input });

    return new Promise((resolve) => {
      let timer = null;
      const finish = (decision) => {
        if (!this._pendingPerms.delete(id)) return;  // Already settled by an earlier answer or by the timeout
        clearTimeout(timer);                         // Don't leave a 5-minute timer behind once a decision arrives
        // Note: The allow branch must return updatedInput,
        // The SDK runtime uses Zod for strict validation; only providing behavior will cause an error
        resolve(
          decision === 'allow'
            ? { behavior: 'allow', updatedInput: input }
            : { behavior: 'deny', message: 'User denied in browser' }
        );
      };
      this._pendingPerms.set(id, finish);
      // If no one responds within 5 minutes, treat as deny to prevent the Promise from never settling if the browser hangs
      timer = setTimeout(() => finish('deny'), 5 * 60 * 1000);
    });
  }
}

// Normalization: flatten the SDK's many event types into a few frame types that are easy for the frontend to handle
// Without this layer, the frontend would need to recognize 20+ kinds of internal SDK events, which makes changes difficult
// Field names follow the protocol table in 2.3.4
function normalize(msg) {
  if (msg.type === 'system' && msg.subtype === 'init') {
    return [{ type: 'system_init', model: msg.model, tools: msg.tools, mcpServers: msg.mcp_servers }];
  }
  // Streaming text: only pick this one type of delta; other stream_event types are discarded
  if (msg.type === 'stream_event' && msg.event?.delta?.type === 'text_delta') {
    return [{ type: 'text_delta', text: msg.event.delta.text }];
  }
  // Text blocks in assistant messages have already been pushed above; here we only extract tool calls
  if (msg.type === 'assistant') {
    return msg.message.content
      .filter(b => b.type === 'tool_use')
      .map(b => ({ type: 'tool_use', id: b.id, name: b.name, input: b.input }));
  }
  if (msg.type === 'user') {
    // content may be a plain string rather than an array of blocks; only arrays carry tool_result blocks
    const blocks = Array.isArray(msg.message.content) ? msg.message.content : [];
    return blocks
      .filter(b => b.type === 'tool_result')
      .map(b => ({ type: 'tool_result', id: b.tool_use_id, content: b.content, isError: Boolean(b.is_error) }));
  }
  if (msg.type === 'result') {
    return [{
      type: 'turn_result',
      cost: msg.total_cost_usd,
      usage: msg.usage,
      duration: msg.duration_ms,
      numTurns: msg.num_turns,
    }];
  }
  return [];  // All other control events are not forwarded, to avoid frontend noise
}
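
The approval round trip described in the comments of _canUseTool can also be looked at in isolation. The following is a minimal sketch, not the repository's code: ApprovalGate and its method names are hypothetical, and the timeout default simply mirrors the 5 minutes used above. It parks one Promise per request and settles it exactly once, either from the browser's answer or from the timeout.

```javascript
// Hypothetical helper (not from the repository): one parked Promise per approval request.
class ApprovalGate {
  constructor(timeoutMs = 5 * 60 * 1000) {
    this.timeoutMs = timeoutMs;
    this.seq = 0;
    this.pending = new Map(); // id -> finish(decision)
  }

  // Sends a permission_request frame and returns a Promise for the SDK-style result.
  request(sendFrame, toolName, input) {
    const id = ++this.seq;
    return new Promise((resolve) => {
      let timer = null;
      const finish = (decision) => {
        if (!this.pending.delete(id)) return; // already settled, ignore
        clearTimeout(timer);
        resolve(
          decision === 'allow'
            ? { behavior: 'allow', updatedInput: input }
            : { behavior: 'deny', message: 'User denied in browser' }
        );
      };
      this.pending.set(id, finish);
      timer = setTimeout(() => finish('deny'), this.timeoutMs);
      sendFrame({ type: 'permission_request', id, toolName, input });
    });
  }

  // Called when a permission_response frame arrives. Returns false for unknown ids.
  resolve(id, decision) {
    const finish = this.pending.get(id);
    if (finish) finish(decision);
    return Boolean(finish);
  }
}
```

Keeping the Map keyed by id is what lets several approvals be outstanding at once: each answer from the browser finds its own Promise, and a late or duplicate answer is simply ignored.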

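A few points worth noting, all of them marked in the comments above:

  1. settingSources must be set explicitly; without it the SDK does not read ~/.claude/, so global skills and MCP servers are not loaded.
  2. includePartialMessages: true is what makes character-by-character streaming possible.
  3. The allow branch of canUseTool must return updatedInput along with behavior.
  4. An approval that receives no answer within 5 minutes is treated as deny.
  5. A user-initiated stop is reported as done with aborted: true, not as error.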

2.3.3 Frontend Code Snippet (web/app.js)

The frontend has just one dispatch: receive a frame, switch to the corresponding rendering function. There is no business logic; all "judgments" are done on the Node side.

const ws = new WebSocket('ws://127.0.0.1:1717');

ws.onmessage = (e) => {
  const frame = JSON.parse(e.data);
  switch (frame.type) {
    case 'system_init':
      // Comes once when the session is first established, telling the frontend the current model and available tools
      renderStatus(`Model: ${frame.model} · ${frame.tools.length} tools`);
      break;
    case 'text_delta':
      // Core of character-by-character streaming: append to the current assistant bubble whenever new text arrives
      appendToCurrentBubble(frame.text);
      break;
    case 'tool_use':
      // Claude wants to use a tool; create a collapsible card placeholder, waiting for the result to be filled in
      createToolCard(frame.id, frame.name, frame.input);
      break;
    case 'tool_result':
      // Tool execution finished; fill the result back into the card created earlier
      fillToolCard(frame.id, frame.content);
      break;
    case 'permission_request':
      // Dangerous operation: show a bar, wait for user click; once clicked, send the decision back to Node
      showApprovalBar(frame, (decision) => {
        ws.send(JSON.stringify({
          type: 'permission_response',
          id: frame.id,
          decision,                // 'allow' | 'deny'
        }));
      });
      break;
    case 'turn_result':
      // Bill for the entire turn: how much money spent, how many tokens used
      renderMeta({ cost: frame.cost, usage: frame.usage });
      break;
    case 'done':
      unlockInput();
      if (frame.aborted) showBanner('Cancelled');
      break;
    case 'error':
      showError(frame.message);
      unlockInput();
      break;
  }
};

sendBtn.onclick = () => {
  const text = inputEl.value.trim();
  if (!text) return;
  lockInput();
  ws.send(JSON.stringify({ type: 'user_message', text }));
};

stopBtn.onclick = () => ws.send(JSON.stringify({ type: 'stop' }));
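
Because the frontend only dispatches on frame.type, its state handling can be written as a pure function and exercised without a DOM. A minimal sketch follows; applyFrame and the shape of the state object are assumptions made for illustration and are not part of the repository.

```javascript
// Hypothetical pure reducer mirroring the switch above: state in, frame in, new state out.
function applyFrame(state, frame) {
  switch (frame.type) {
    case 'text_delta':
      // Streaming text is appended to the current assistant bubble.
      return { ...state, bubble: state.bubble + frame.text };
    case 'tool_use':
      // A card is created with no result yet.
      return {
        ...state,
        tools: { ...state.tools, [frame.id]: { name: frame.name, result: null } },
      };
    case 'tool_result': {
      const card = state.tools[frame.id];
      if (!card) return state; // result for a card we never created: ignore
      return {
        ...state,
        tools: { ...state.tools, [frame.id]: { ...card, result: frame.content } },
      };
    }
    case 'done':
    case 'error':
      // Either way the turn is over and the input can be unlocked.
      return { ...state, locked: false };
    default:
      return state;
  }
}

const initialState = { bubble: '', tools: {}, locked: true };
const frames = [
  { type: 'text_delta', text: 'Hel' },
  { type: 'text_delta', text: 'lo' },
  { type: 'tool_use', id: 't1', name: 'Write', input: {} },
  { type: 'tool_result', id: 't1', content: 'ok' },
  { type: 'done' },
];
const finalState = frames.reduce(applyFrame, initialState);
console.log(finalState.bubble); // prints "Hello"
```

The rendering functions in the real page would then only need to draw whatever the state contains, which keeps the "no business logic in the browser" rule easy to check.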

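A few points worth noting, all of them visible in the snippet above:

  1. The frontend makes no decisions of its own; it only maps each frame type to a rendering function.
  2. The input is locked when a message is sent and unlocked only on done or error.
  3. permission_response echoes the id of the permission_request, so Node can find the pending Promise.
  4. A cancelled turn arrives as done with aborted set, and is shown as a banner rather than an error.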

2.3.4 Protocol Frames (Bidirectional)

Protocol design principle: each message does exactly one thing. The browser sends 3 types (Node only needs to know what you want, what you decided, and that you want to stop), and Node sends 8 types (the browser needs to observe the key points in a turn's lifecycle).

Browser → Node (3 types):

| Type | Fields | Purpose |
| --- | --- | --- |
| user_message | text | User input |
| permission_response | id, decision | Approval reinjection |
| stop | (none) | Interrupt current turn |
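
On the Node side, these three frame types map onto the three public methods of CodeSession. The article does not show server.js, so the following is only a sketch of how that routing might look: handleClientFrame is a hypothetical name, and session is assumed to be any object exposing send, resolvePermission, and stop as in the snippet from 2.3.2.

```javascript
// Hypothetical router for browser → Node frames. `raw` is the WebSocket message text.
function handleClientFrame(session, raw) {
  let frame;
  try {
    frame = JSON.parse(raw);
  } catch {
    return { ok: false, reason: 'invalid JSON' };
  }
  if (frame === null || typeof frame !== 'object') {
    return { ok: false, reason: 'not an object' };
  }
  switch (frame.type) {
    case 'user_message':
      if (typeof frame.text !== 'string' || frame.text.trim() === '') {
        return { ok: false, reason: 'empty text' };
      }
      session.send(frame.text); // starts a turn; frames flow back through onFrame
      return { ok: true };
    case 'permission_response':
      if (frame.decision !== 'allow' && frame.decision !== 'deny') {
        return { ok: false, reason: 'bad decision' };
      }
      session.resolvePermission(frame.id, frame.decision);
      return { ok: true };
    case 'stop':
      session.stop();
      return { ok: true };
    default:
      return { ok: false, reason: `unknown type: ${frame.type}` };
  }
}
```

Validating decision here means a malformed frame is rejected at the edge instead of being passed into the session object.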

Node → Browser (8 types):

| Type | Fields | Purpose |
| --- | --- | --- |
| system_init | model, tools, mcpServers | Session initialization (once per connection) |
| text_delta | text | Streaming text increment (pushed N times per turn) |
| tool_use | id, name, input | Tool call start |
| tool_result | id, content, isError | Tool execution result |
| permission_request | id, toolName, input | Request approval |
| turn_result | cost, usage, duration, numTurns | Bill for the entire turn |
| done | sessionId, aborted? | Turn end |
| error | message | Exception |

The protocol itself has no schema file; field conventions rely on the code. The cost of adding new frames is very low (just add a case). Changing field names will break the other side — so once field names are set, don't change them casually.
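
Since the field names are a contract without a schema file, one low-cost safeguard is a table of required fields that is checked before a frame is sent. The sketch below is an assumption about how such a check might look, not something the repository provides: REQUIRED_FIELDS and checkFrame are hypothetical names, and the lists cover only fields that appear in both the table and the code snippets of this article.

```javascript
// Hypothetical field table for Node → browser frames (optional fields are left out).
const REQUIRED_FIELDS = {
  system_init: ['model', 'tools'],
  text_delta: ['text'],
  tool_use: ['id', 'name', 'input'],
  tool_result: ['id', 'content'],
  permission_request: ['id', 'toolName', 'input'],
  turn_result: ['cost', 'usage'],
  done: [],
  error: ['message'],
};

// Returns a list of problems; an empty list means the frame matches the table.
function checkFrame(frame) {
  const required = REQUIRED_FIELDS[frame?.type];
  if (!required) return [`unknown frame type: ${frame?.type}`];
  return required
    .filter((field) => !(field in frame))
    .map((field) => `${frame.type}: missing "${field}"`);
}

console.log(checkFrame({ type: 'text_delta', text: 'hi' }).length); // prints 0
console.log(checkFrame({ type: 'tool_use', id: 't1' }).length);     // prints 2
```

Calling checkFrame inside onFrame during development would surface a renamed field on the Node side before the browser silently renders undefined.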

3 Demo of the Finished Result

ezgif-4b2348b4197f2003.gif

4 Summary

Code repository address: https://github.com/GoodLuckAlien/claude-code-for-web

Writing up to this point, Claude Code has transformed from a cold terminal command into a window in the browser that can be pushed open at will.

Looking back at the whole chain, it's actually not that mysterious: Node is the heart, the browser is the face, WebSocket is the blood vessel, and the protocol frames are the nerve endings. We didn't reinvent the agent; we just took the three things originally buried in the terminal black box (startup, approval, and output) and broke them into frames that can be observed, intercepted, and orchestrated.

Technology is often like this: the real difficulty has never been building something that runs, but making something that runs usable. Putting the CLI into the web is a small step, but what it opens up is the possibility for agents to reach ordinary users and enter real workflows.

If this article gave you some new ideas, feel free to like / bookmark it. Your support is my confidence to continue playing with Claude Code in creative ways 🚀.