Streaming AI Agent Terminal Output to the Browser with SSE

Stream & SSE 101 — Receiving AI Agent Terminal Output in Real-Time on the Web

Foreword

Recently, I've been working on integrating an AI Agent into a web frontend. The requirement is straightforward: a user triggers an AI Agent (like Claude Code, Codex, etc.) on a webpage, and then sees its output stream in real-time, just like watching a terminal—characters appearing line by line, rather than waiting for it to finish and returning everything at once.

The core technologies for this scenario are Stream and SSE (Server-Sent Events). I spent over a week on this and ran into quite a few pitfalls; here's a summary.

Why Not WebSocket

When people hear "real-time communication," many immediately think of WebSocket. But WebSocket is too heavy—it's full-duplex, meaning the client and server can send messages to each other simultaneously. For an AI Agent's output scenario, we only need one-way server-to-client push. WebSocket is overkill.

SSE is naturally designed for this kind of scenario:

| Feature | SSE | WebSocket |
| --- | --- | --- |
| Communication Direction | Server → Client (one-way) | Bidirectional |
| Protocol | HTTP | ws:// |
| Auto-Reconnect | Built-in | Manual implementation required |
| Browser Support | All modern browsers | All modern browsers |
| Proxy/Firewall Traversal | Good (HTTP-based) | Occasionally blocked |
| Complexity | Low | High |

In a nutshell: If you only need the server to push data, use SSE; if you need bidirectional communication, use WebSocket.

What is a Stream

Before discussing SSE, let's clarify the concept of "Stream." A Stream is essentially a chunked data transfer pattern.

Traditional HTTP requests follow a "request-response" model: the client sends a request, waits, the server finishes processing, and returns the entire response body at once. For AI Agent tasks that can take tens of seconds or even minutes, the user experience is terrible—you stare at a spinning loading indicator with no idea what's happening.

The Stream approach: the server doesn't hoard data; it sends a bit as soon as it's generated. The client receives a chunk and renders it immediately, just like watching terminal output.

Traditional: Request ──────────────────────────────> Full Response
Stream:     Request ──> chunk1 ──> chunk2 ──> chunk3 ──> ... ──> [DONE]

Backend Stream

Taking Node.js as an example, the backend pushes data chunk by chunk using ReadableStream or the framework's built-in stream capabilities:

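// Express example
const express = require('express');
const { spawn } = require('child_process');

const app = express();
app.use(express.json()); // Parse JSON bodies so req.body.prompt is available

app.post('/api/agent/run', async (req, res) => {
  // Key: Set SSE response headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const agent = spawn('claude', ['--print', req.body.prompt]);
  // Decode as UTF-8 so multi-byte characters split across chunks are not garbled
  agent.stdout.setEncoding('utf8');

  agent.stdout.on('data', (chunk) => {
    // Push each chunk of output to the client in SSE format
    res.write(`data: ${JSON.stringify({ text: chunk })}\n\n`);
  });

  agent.on('close', () => {
    res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
    res.end();
  });
});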

Note the use of res.write() instead of res.send()—the former is streaming writes, the latter is a one-time send.
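One caveat worth knowing about res.write(): it returns false when the response's internal buffer is full, which can happen if the Agent produces output faster than the client reads it. Below is a minimal sketch of respecting that signal, assuming standard Node.js stream semantics. The helper name writeChunk is hypothetical and not part of the original code.

```javascript
const { once } = require('node:events');

// Hypothetical helper: write one SSE event and wait if the buffer is full.
// `res` can be any Node.js Writable, including an Express response.
async function writeChunk(res, text) {
  const ok = res.write(`data: ${JSON.stringify({ text })}\n\n`);
  if (!ok) {
    // write() returned false: pause until the stream signals it has drained
    await once(res, 'drain');
  }
}
```

In the handler above, awaiting a helper like this inside a loop over the Agent's output would slow the producer down rather than letting memory grow without bound.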

SSE Protocol

SSE (Server-Sent Events) is a one-way push protocol built on top of HTTP. Its data format is very simple:

data: {"text": "Hello"}

data: {"text": " World"}

data: {"done": true}

There are only three rules:

  1. Each data line starts with data:
  2. The event content follows the colon
  3. An event ends with a blank line, i.e. two consecutive newline characters \n\n

That's it. No handshake, no frames, no binary—pure text.

Additional SSE Fields

Besides data, SSE supports several optional fields:

id: 42
event: message
retry: 3000
data: {"text": "Hello"}

| Field | Purpose |
| --- | --- |
| data | Event data, supports multiple lines (one data: per line) |
| event | Event type, clients can handle differently based on this |
| id | Event ID, used for resuming from breakpoints |
| retry | Reconnection wait time (milliseconds) |
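
The fields above can be produced by a small serializer. The following is a minimal sketch, not a definitive implementation; formatSseEvent is a hypothetical helper name. It emits the optional fields when present and splits a multi-line payload into one data: line per line, as the table describes.

```javascript
// Hypothetical helper: serialize one event into the SSE wire format.
function formatSseEvent({ id, event, retry, data }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // A payload containing newlines must be sent as one data: line per line
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event
  return lines.join('\n') + '\n\n';
}
```

For example, formatSseEvent({ id: 42, data: 'line one\nline two' }) returns the string "id: 42\ndata: line one\ndata: line two\n\n". JSON payloads produced by JSON.stringify never contain raw newlines, which is why the handlers in this article can get away with a single data: line.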

Architecture Design

Below is a diagram showing the architecture for the entire AI Agent streaming output:

┌─────────────────────────────────────────────────────────────────────┐
│                            Browser                                  │
│                                                                     │
│  ┌───────────────┐   POST /api/agent/run   ┌──────────────────────┐ │
│  │   UI Layer    │ ───────────────────────► │ fetch + Readable    │ │
│  │   (React)     │                          │ Stream (SSE parser) │ │
│  └───────────────┘                          └──────────┬──────────┘ │
│          │                                             │            │
│          │ render chunk by chunk                       │ HTTP       │
│          ▼                                             ▼            │
│  ┌───────────────┐                          ┌─────────────────────┐ │
│  │   Terminal    │ ◄─────────────────────── │ EventSource /       │ │
│  │   xterm.js    │   text: "Hello\n"        │ fetch SSE client    │ │
│  └───────────────┘   text: "World\n"        └─────────────────────┘ │
│                     done: true                                      │
└──────────────────────────────────────┬──────────────────────────────┘
                                       │ HTTP (SSE)
                                       │ Content-Type: text/event-stream
                                       ▼
┌──────────────────────────────────────┴──────────────────────────────┐
│                        Nginx / Reverse Proxy                        │
│              proxy_buffering off;  chunked_transfer_encoding on;    │
└──────────────────────────────────────┬──────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Backend Server (Node.js)                      │
│                                                                     │
│  ┌──────────────────┐    spawn / API call    ┌───────────────────┐  │
│  │ SSE Route Handler│ ─────────────────────► │ AI Agent Process  │  │
│  │ Set SSE headers  │                        │ Claude Code       │  │
│  │ Push chunk by    │ ◄─── stdout.on('data') │ Codex / Others    │  │
│  └──────────────────┘                        └───────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Session Manager                                               │  │
│  │ - Maintain each Agent session state                           │  │
│  │ - Track Last-Event-ID for resume support                      │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Data flow:

  1. The user initiates a request from the frontend (POST prompt)
  2. The backend spawns the AI Agent process and establishes an SSE long connection
  3. Every time the Agent produces a line of output, the backend wraps it as an SSE event and pushes it
  4. The frontend receives it chunk by chunk and renders it in real-time to the terminal component

Resumable Delivery: Last-Event-ID

SSE has a very practical mechanism—resumable delivery. When a network hiccup causes a connection interruption, the client doesn't need to start from scratch; it can continue receiving from where it left off.

This is the role of Last-Event-ID. It's not a query parameter you manage by hand but an HTTP request header: when EventSource reconnects, it automatically sends the id of the last event it received:

How It Works

Server sends:
  id: 1
  data: {"text": "Hello"}

  id: 2
  data: {"text": " World"}

  id: 3
  data: {"text": "!"}

        ── Network disconnected ──

Client auto-reconnects, request header includes:
  Last-Event-ID: 3    ← Tells the server "I received id=3"

Server continues pushing from id=4:
  id: 4
  data: {"text": " Done!"}

Backend Implementation

app.get('/api/agent/stream', (req, res) => {
  // Get the last event ID the client received
  const lastId = req.headers['last-event-id'];

  // SSE response headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // sessionStore is an application-level helper (not shown here); getAgent is
  // assumed to return the Agent process already running for this session
  const sessionId = req.query.sessionId;
  const agent = sessionStore.getAgent(sessionId);
  const history = sessionStore.getHistory(sessionId, lastId);

  // First, replay the events the client missed (everything after lastId)
  for (const event of history) {
    res.write(`id: ${event.id}\ndata: ${JSON.stringify(event.data)}\n\n`);
  }

  // Then continue forwarding new Agent output
  const onData = (chunk) => {
    const eventId = sessionStore.nextId(sessionId);
    res.write(
      `id: ${eventId}\ndata: ${JSON.stringify({ text: chunk.toString() })}\n\n`,
    );
  };
  agent.stdout.on('data', onData);

  // Stop forwarding to this response once the client disconnects
  req.on('close', () => agent.stdout.off('data', onData));
});
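
The handler above relies on a sessionStore that the article does not show. As a rough illustration of what such a store might look like, here is an in-memory sketch. Everything in it is an assumption: createSessionStore and append are hypothetical names, ids are assumed to be increasing integers, and a real deployment would likely need persistence and eviction.

```javascript
// Hypothetical in-memory store keeping every event of every session.
function createSessionStore() {
  const sessions = new Map(); // sessionId -> array of { id, data }

  function eventsFor(sessionId) {
    if (!sessions.has(sessionId)) sessions.set(sessionId, []);
    return sessions.get(sessionId);
  }

  return {
    // Record an event and return it with an increasing numeric id
    append(sessionId, data) {
      const events = eventsFor(sessionId);
      const event = { id: events.length + 1, data };
      events.push(event);
      return event;
    },
    // Return every event after lastId; with no lastId, return the full history
    getHistory(sessionId, lastId) {
      const after = Number(lastId) || 0;
      return eventsFor(sessionId).filter((event) => event.id > after);
    },
  };
}
```

Recording events in the store as the Agent produces them, rather than only while a client is connected, is what makes replay possible: output generated during the disconnect still gets an id and ends up in the history.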

Frontend Implementation

Native EventSource automatically handles Last-Event-ID with no extra code. If using fetch + ReadableStream, you need to implement it manually:

let lastEventId = null;

async function connectWithResume(sessionId) {
  const headers = { 'Content-Type': 'application/json' };
  if (lastEventId) {
    headers['Last-Event-ID'] = lastEventId; // Manually include the breakpoint ID
  }

  const response = await fetch(`/api/agent/stream?sessionId=${sessionId}`, {
    method: 'GET',
    headers,
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop();

    for (const event of events) {
      const idMatch = event.match(/^id:\s*(.+)$/m);
      const dataMatch = event.match(/^data:\s*(.*)$/m);
      if (idMatch) lastEventId = idMatch[1]; // Record the latest ID
      if (dataMatch) {
        const data = JSON.parse(dataMatch[1]);
        if (data.done) return; // Agent finished, nothing more to render
        appendToTerminal(data.text);
      }
    }
  }
}

Note: Last-Event-ID is only automatically carried in EventSource. With fetch, you need to manually parse the id field from the response and add it to the request header upon reconnection.

How the Frontend Receives Data

Method 1: EventSource (Native API)

Browsers natively provide the EventSource API, which is extremely simple to use:

const source = new EventSource('/api/agent/stream?id=123');

source.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.done) {
    console.log('Agent execution finished');
    source.close();
    return;
  }
  // Append to terminal UI in real-time
  appendToTerminal(data.text);
};

source.onerror = (err) => {
  console.error('SSE connection error', err);
};

But EventSource has a fatal flaw: it only supports GET requests. AI Agents typically need to POST a prompt, which is awkward.

Method 2: fetch + ReadableStream (Recommended)

For SSE scenarios requiring POST requests, using fetch with ReadableStream is a more flexible approach:

async function runAgent(prompt) {
  const response = await fetch('/api/agent/run', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });

    // Split SSE events by \n\n
    const events = buffer.split('\n\n');
    buffer = events.pop(); // The last segment might be incomplete, save for next time

    for (const event of events) {
      const match = event.match(/^data:\s*(.*)$/m);
      if (!match) continue;

      const data = JSON.parse(match[1]);
      if (data.done) {
        console.log('Agent execution finished');
        return;
      }
      appendToTerminal(data.text);
    }
  }
}

Core idea:

  1. fetch gets the response.body (a ReadableStream)
  2. Use getReader() to read chunk by chunk
  3. Use TextDecoder to convert binary to text
  4. Split by \n\n to parse SSE events one by one
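
The four steps above can be isolated into a reusable parser. The sketch below is one possible shape, with createSseParser as a hypothetical name. Compared with the single-regex approach, it also joins multi-line data fields, skips comment lines such as heartbeats, and tolerates CRLF line endings. It deliberately ignores the retry field and lone CR line endings.

```javascript
// Hypothetical incremental parser: feed it decoded text, get back complete events.
function createSseParser() {
  let buffer = '';

  return function feed(text) {
    // Normalize CRLF so that events always end with "\n\n"
    buffer = (buffer + text).replace(/\r\n/g, '\n');
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop(); // possibly incomplete tail, kept for the next call

    const events = [];
    for (const block of blocks) {
      const event = { id: undefined, event: 'message', data: [] };
      for (const line of block.split('\n')) {
        if (line === '' || line.startsWith(':')) continue; // comment line
        const colon = line.indexOf(':');
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? '' : line.slice(colon + 1);
        if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
        if (field === 'data') event.data.push(value);
        else if (field === 'id') event.id = value;
        else if (field === 'event') event.event = value;
      }
      // Blocks without any data line (for example pure comments) yield no event
      if (event.data.length > 0) {
        events.push({ ...event, data: event.data.join('\n') });
      }
    }
    return events;
  };
}
```

Inside the read loop, the call would look like `for (const ev of feed(decoder.decode(value, { stream: true }))) { ... }`, with JSON.parse(ev.data) applied to each returned event.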

Method 3: Third-Party Libraries

If you don't want to parse the SSE format yourself, you can use an off-the-shelf library:

npm install @microsoft/fetch-event-source
import { fetchEventSource } from '@microsoft/fetch-event-source';

await fetchEventSource('/api/agent/run', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt }),

  onmessage(ev) {
    const data = JSON.parse(ev.data);
    if (data.done) return;
    appendToTerminal(data.text);
  },

  onerror(err) {
    console.error('SSE error', err);
  },

  onclose() {
    console.log('Connection closed');
  },
});

The advantage of this Microsoft library: it supports POST, automatic reconnection, and custom headers, making it much more useful than native EventSource.

Frontend Rendering Solutions

Once you have the streaming data, how do you render it? Claude Code's output contains ANSI escape sequences (colors, cursor movements, progress bars, etc.), not plain text, so the choice of rendering solution is critical.

Solution 1: xterm.js (Recommended)

xterm.js is the de facto standard for frontend terminal rendering; VS Code's built-in terminal uses it. It fully supports ANSI escape sequences and can reproduce Claude Code's terminal output 1:1.

npm install @xterm/xterm @xterm/addon-fit @xterm/addon-web-links
import { Terminal } from '@xterm/xterm';
import { FitAddon } from '@xterm/addon-fit';
import { WebLinksAddon } from '@xterm/addon-web-links';
import '@xterm/xterm/css/xterm.css';

const term = new Terminal({
  theme: {
    background: '#1e1e1e',
    foreground: '#d4d4d4',
    cursor: '#d4d4d4',
  },
  fontSize: 14,
  fontFamily: "'Fira Code', 'Menlo', monospace",
  cursorBlink: true,
  scrollback: 10000,
});

const fitAddon = new FitAddon();
term.loadAddon(fitAddon);
term.loadAddon(new WebLinksAddon()); // Auto-detect links
term.open(document.getElementById('terminal'));
fitAddon.fit();

// Receive SSE data, write directly to terminal
function onSSEMessage(data) {
  if (data.done) {
    term.writeln('\r\n\x1b[32m✓ Agent execution finished\x1b[0m');
    return;
  }
  // xterm.js natively supports ANSI escape sequences, just write directly
  term.write(data.text);
}

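Advantages: full ANSI support, including colors, cursor movement, and progress-bar overwrites, in the same renderer that VS Code's terminal uses. Disadvantages: a larger bundle (~200KB).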

Solution 2: ansi_up (Lightweight)

If you don't want to introduce a full terminal emulator, ansi_up can convert ANSI escape sequences to HTML for rendering with regular DOM elements:

npm install ansi_up
import AnsiUp from 'ansi_up';

const ansiUp = new AnsiUp();
const output = document.getElementById('output');

function appendToTerminal(text) {
  const html = ansiUp.ansi_to_html(text);
  output.innerHTML += html;
  output.scrollTop = output.scrollHeight;
}

Advantages: Lightweight (~10KB), suitable for simple scenarios needing only color rendering. Disadvantages: Does not support complex ANSI operations like cursor movement or progress bar overwriting.

Solution Comparison

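| Feature | xterm.js | ansi_up | Plain <pre> |
| --- | --- | --- | --- |
| ANSI Colors | Yes | Yes | No |
| Cursor Movement / Clear Screen | Yes | No | No |
| Progress Bar Overwrite Refresh | Yes | No | No |
| Text Selection/Copy | Yes | Yes | Yes |
| Bundle Size | ~200KB | ~10KB | 0 |
| Use Case | Full terminal experience | Simple colored output | Plain text |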

Conclusion: For rendering Claude Code output, xterm.js is the first choice. Its ANSI compatibility is the best and can fully reproduce the terminal experience. If you're just displaying simple colored logs, ansi_up is sufficient.

Streaming Markdown Rendering (For Conversational AI Output)

If the frontend isn't for a pure terminal display but a conversational UI like ChatGPT, you need a renderer that can handle streaming Markdown. Besides terminal streams, Claude Code's output often contains Markdown-formatted content (code blocks, file diffs, lists, etc.).

A traditional renderer such as react-markdown struggles with unclosed code blocks, incomplete tables, and similar artifacts of streamed AI output. Streamdown (by Vercel, 5k+ stars) is designed specifically for this pain point:

npm install streamdown @streamdown/code @streamdown/math @streamdown/mermaid @streamdown/cjk
import { useChat } from '@ai-sdk/react';
import { Streamdown } from 'streamdown';
import { code } from '@streamdown/code';
import { mermaid } from '@streamdown/mermaid';
import { math } from '@streamdown/math';
import { cjk } from '@streamdown/cjk';
import 'katex/dist/katex.min.css';
import 'streamdown/styles.css';

export default function Chat() {
  const { messages, status } = useChat();

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          {message.role === 'user' ? 'User: ' : 'AI: '}
          {message.parts.map((part, index) =>
            part.type === 'text' ? (
              <Streamdown
                key={index}
                plugins={{ code, mermaid, math, cjk }}
                isAnimating={status === 'streaming'}>
                {part.text}
              </Streamdown>
            ) : null,
          )}
        </div>
      ))}
    </div>
  );
}

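Streamdown's core advantages: it renders incomplete Markdown gracefully while the stream is still arriving (unclosed code blocks, partial tables), has Shiki code highlighting built in, and supports math, Mermaid, and CJK text through the plugins loaded above.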

Code Highlighting Library Comparison (Streamdown has Shiki built-in; the following is for reference in standalone usage scenarios):

| Library | Bundle Size | Language Support | Features |
| --- | --- | --- | --- |
| Shiki | ~500KB | 100+ | VS Code equivalent syntax highlighting, best results |
| Prism.js | ~20KB | 270+ | Lightweight, rich plugins |

Common UI Component Summary

| Scenario | Recommended Solution | Representative Project |
| --- | --- | --- |
| Full Terminal Experience | xterm.js (14k stars) | VS Code Terminal, Claude Code Web |
| Simple Colored Logs | ansi_up | Lightweight log panel |
| Conversational AI Output | Streamdown (5k+ stars) | ChatGPT, Claude Web |
| File Diff Display | react-diff-viewer | GitHub PR, Code Review tools |
| Mind Map | react-markmap | Real-time mind map preview |

Keeping Long Connections Alive: Heartbeat Keep-Alive

SSE is essentially an HTTP long connection. In theory, it stays open as long as neither side actively closes it. But in real-world production, proxy layers (Nginx, CDN, load balancers) usually have an idle timeout mechanism—if no data is transmitted for a period, they will actively disconnect.

For example, Nginx's default proxy_read_timeout is 60 seconds, meaning if no data flows through for 60 seconds, the connection gets cut. AI Agents sometimes think for a long time (e.g., Claude Code analyzing a large file), during which there might be no output for tens of seconds, putting the connection at risk.

Solution: Server-Side Heartbeat

The server periodically sends an SSE comment (a line starting with :), telling the proxy layer "the connection is still alive." The SSE protocol specifies that lines starting with : are comments and are automatically ignored by the client:

app.post('/api/agent/run', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Heartbeat: send a comment every 15 seconds to prevent proxy layer timeout disconnection
  const heartbeat = setInterval(() => {
    res.write(': heartbeat\n\n');
  }, 15000);

  const agent = spawn('claude', ['--print', req.body.prompt]);

  agent.stdout.on('data', (chunk) => {
    res.write(`data: ${JSON.stringify({ text: chunk.toString() })}\n\n`);
  });

  agent.on('close', () => {
    res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
    clearInterval(heartbeat); // Clean up heartbeat
    res.end();
  });

  // Also clean up when the client disconnects
  req.on('close', () => {
    clearInterval(heartbeat);
    agent.kill();
  });
});

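Key points: the heartbeat interval (15 seconds here) must stay below the proxy's idle timeout (60 seconds for Nginx by default); the client ignores comment lines, so no frontend change is needed; and the timer must be cleared both when the Agent exits and when the client disconnects.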

No Need to Poll and Re-establish Connection

Some might ask: Do we need to re-send a request every minute with a since parameter?

No. SSE itself is a persistent connection. As long as the heartbeat keep-alive is done correctly, the connection will remain open. Only when the connection actually breaks (network failure, proxy timeout, etc.) do you need to reconnect—at that point, use Last-Event-ID to resume from the breakpoint, no need to start from scratch.

Pitfalls Encountered

1. Nginx Proxy Buffering

SSE's biggest enemy is proxy layer buffering. Nginx buffers responses by default, causing the client not to receive real-time data. Solution:

location /api/agent/ {
  proxy_pass http://backend;
  proxy_buffering off;          # Disable buffering
  proxy_cache off;              # Disable caching
  proxy_read_timeout 300s;      # Long connection timeout
  chunked_transfer_encoding on;
}
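
When editing the Nginx configuration is not an option, buffering can also be switched off per response from the application side: Nginx honors an X-Accel-Buffering: no response header unless that has been disabled with proxy_ignore_headers. A minimal sketch follows; setSseHeaders is a hypothetical helper name, and the other three headers are the ones used throughout this article.

```javascript
// Hypothetical helper: set SSE headers plus a per-response hint for Nginx.
function setSseHeaders(res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  // Nginx reads this response header and skips buffering for this response
  res.setHeader('X-Accel-Buffering', 'no');
}
```

Calling setSseHeaders(res) at the top of each SSE route keeps the header set in one place. Other proxies and CDNs have their own buffering controls, so this only addresses Nginx.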

2. Connection Limit

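Under HTTP/1.1, browsers limit concurrent connections to the same domain (usually 6). If multiple Agent sessions are open simultaneously, that limit can be exhausted. Solutions: serve the API over HTTP/2, which multiplexes many streams over a single connection, or carry several Agent sessions over one SSE connection and tell them apart with the event field.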

3. Large Output Memory Issues

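AI Agents sometimes output large amounts of content (e.g., reading an entire file). If the frontend keeps appending to the DOM, the page will get increasingly laggy. Suggestions: cap the retained history (for example with xterm.js's scrollback option, set to 10000 in the earlier example) and batch incoming chunks into fewer rendering updates.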
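
One way to reduce per-chunk rendering cost is to coalesce chunks and hand them to the renderer at a fixed cadence. The sketch below assumes a write callback such as (text) => term.write(text) or the appendToTerminal function used earlier; createBatchedWriter and the 50 ms default are illustrative choices, not something the original specifies.

```javascript
// Hypothetical helper: collect chunks and pass them to `write` at most once per interval.
function createBatchedWriter(write, intervalMs = 50) {
  let pending = '';
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending === '') return;
    const text = pending;
    pending = '';
    write(text);
  }

  function push(text) {
    pending += text;
    // Schedule a single flush for the whole batch instead of one write per chunk
    if (timer === null) timer = setTimeout(flush, intervalMs);
  }

  return { push, flush };
}
```

In the SSE handlers, writer.push(data.text) would replace the direct render call, and writer.flush() would be called once when the done event arrives so that the last batch is not delayed.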

Summary

| Scenario | Recommended Solution |
| --- | --- |
| SSE requiring only GET | Native EventSource |
| SSE requiring POST (e.g., AI Agent) | fetch + ReadableStream or @microsoft/fetch-event-source |
| Need bidirectional communication | WebSocket |
| Need full terminal experience | xterm.js |
| Conversational AI output (streaming Markdown) | Streamdown |

The Stream + SSE combo works excellently in AI Agent scenarios: lightweight, one-way, auto-reconnect, good compatibility. Compared to WebSocket's complex handshake and bidirectional communication, SSE is tailor-made for the "server pushes data" scenario.