Streaming AI Agent Terminal Output to the Browser with SSE
Stream & SSE 101 — Receiving AI Agent Terminal Output in Real-Time on the Web
Foreword
Recently, I've been working on integrating an AI Agent into a web frontend. The requirement is straightforward: a user triggers an AI Agent (like Claude Code, Codex, etc.) on a webpage and then watches its output stream in real time, just like watching a terminal. Text appears as it is produced, instead of arriving all at once after the agent finishes.
The core technologies for this scenario are Stream and SSE (Server-Sent Events). I spent over a week on this and hit quite a few pitfalls; here's a summary.
Why Not WebSocket
When people hear "real-time communication," many immediately think of WebSocket. But WebSocket is too heavy—it's full-duplex, meaning the client and server can send messages to each other simultaneously. For an AI Agent's output scenario, we only need one-way server-to-client push. WebSocket is overkill.
SSE is naturally designed for this kind of scenario:
| Feature | SSE | WebSocket |
|---|---|---|
| Communication Direction | Server → Client (one-way) | Bidirectional |
| Protocol | HTTP | WebSocket (ws:// or wss://, established via an HTTP Upgrade) |
| Auto-Reconnect | Built-in | Manual implementation required |
| Browser Support | All modern browsers | All modern browsers |
| Proxy/Firewall Traversal | Good (HTTP-based) | Occasionally blocked |
| Complexity | Low | High |
In a nutshell: If you only need the server to push data, use SSE; if you need bidirectional communication, use WebSocket.
What is a Stream
Before discussing SSE, let's clarify the concept of "Stream." A Stream is essentially a chunked data transfer pattern.
Traditional HTTP requests follow a "request-response" model: the client sends a request, waits, the server finishes processing, and returns the entire response body at once. For AI Agent tasks that can take tens of seconds or even minutes, the user experience is terrible—you stare at a spinning loading indicator with no idea what's happening.
The Stream approach: the server doesn't hoard data; it sends a bit as soon as it's generated. The client receives a chunk and renders it immediately, just like watching terminal output.
Traditional: Request ──────────────────────────────> Full Response
Stream: Request ──> chunk1 ──> chunk2 ──> chunk3 ──> ... ──> [DONE]
Backend Stream
Taking Node.js as an example, the backend pushes data chunk by chunk using ReadableStream or the framework's built-in stream capabilities:
// Express example
const express = require('express');
const { spawn } = require('child_process');

const app = express();
app.use(express.json()); // needed so that req.body.prompt is populated

app.post('/api/agent/run', (req, res) => {
  // Key: set SSE response headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const agent = spawn('claude', ['--print', req.body.prompt]);
  // Decode as UTF-8 so multi-byte characters are not split across chunks
  agent.stdout.setEncoding('utf8');

  agent.stdout.on('data', (chunk) => {
    // Push each stdout chunk (not necessarily a whole line) as one SSE event
    res.write(`data: ${JSON.stringify({ text: chunk })}\n\n`);
  });

  agent.on('close', () => {
    res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
    res.end();
  });
});
Note the use of res.write() instead of res.send()—the former is streaming writes, the latter is a one-time send.
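One caveat: stdout 'data' chunks are arbitrary slices of the output, not lines, so a chunk can end in the middle of a line. If you want one SSE event per line, the chunks need to be buffered first. Here is a minimal sketch of such a buffer; createLineSplitter is a hypothetical helper name, not part of Node.js or Express:

```javascript
// Hypothetical helper: turns arbitrary text chunks into complete lines.
function createLineSplitter(onLine) {
  let pending = ''; // text received after the last newline seen so far
  return {
    push(text) {
      pending += text;
      const parts = pending.split('\n');
      pending = parts.pop(); // the last piece has no newline yet, keep it
      for (const line of parts) onLine(line);
    },
    flush() {
      // Emit whatever is left when the process exits without a final newline
      if (pending !== '') onLine(pending);
      pending = '';
    },
  };
}

// Usage: feed it chunks that do not line up with line boundaries.
const lines = [];
const splitter = createLineSplitter((line) => lines.push(line));
splitter.push('Hel');
splitter.push('lo\nWor');
splitter.push('ld\n');
splitter.flush();
console.log(lines); // [ 'Hello', 'World' ]
```

In the handler you would call splitter.push from the stdout 'data' callback and splitter.flush from the 'close' callback. If the output goes straight into a terminal emulator, forwarding the raw chunks is usually fine and line buffering is not needed.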
SSE Protocol
SSE (Server-Sent Events) is a one-way push protocol built on top of HTTP. Its data format is very simple:
data: {"text": "Hello"}

data: {"text": " World"}

data: {"done": true}
There are only three rules:
- Each event starts with data:
- The event content follows the colon
- An event ends with two newline characters (\n\n)
That's it. No handshake, no frames, no binary—pure text.
Additional SSE Fields
Besides data, SSE supports several optional fields:
id: 42
event: message
retry: 3000
data: {"text": "Hello"}
| Field | Purpose |
|---|---|
| data | Event data, supports multiple lines (one data: per line) |
| event | Event type, clients can handle differently based on this |
| id | Event ID, used for resuming from breakpoints |
| retry | Reconnection wait time (milliseconds) |
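To make the field rules concrete, here is a minimal sketch of a serializer that builds one event from these fields. formatSSE is a hypothetical helper name; the one detail worth noting is that a payload containing newlines must be split into several data: lines, otherwise the embedded newline would cut the field short.

```javascript
// Hypothetical helper: serialize one event into the SSE wire format.
function formatSSE({ data, event, id, retry }) {
  let out = '';
  if (event !== undefined) out += `event: ${event}\n`;
  if (id !== undefined) out += `id: ${id}\n`;
  if (retry !== undefined) out += `retry: ${retry}\n`;
  // Each line of the payload gets its own "data:" prefix
  for (const line of String(data).split('\n')) {
    out += `data: ${line}\n`;
  }
  return out + '\n'; // a blank line terminates the event
}

const wire = formatSSE({
  id: 42,
  event: 'message',
  retry: 3000,
  data: 'line one\nline two',
});
console.log(JSON.stringify(wire));
```

Payloads passed through JSON.stringify never contain raw newlines, so the multi-line case mostly matters when sending plain text.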
Architecture Design
Below is a diagram showing the architecture for the entire AI Agent streaming output:
┌─────────────────────────────────────────────────────────────────────┐
│ Browser │
│ │
│ ┌───────────────┐ POST /api/agent/run ┌──────────────────────┐ │
│ │ UI Layer │ ───────────────────────► │ fetch + Readable │ │
│ │ (React) │ │ Stream (SSE parser) │ │
│ └───────────────┘ └──────────┬──────────┘ │
│ │ │ │
│ │ render chunk by chunk │ HTTP │
│ ▼ ▼ │
│ ┌───────────────┐ ┌─────────────────────┐ │
│ │ Terminal │ ◄─────────────────────── │ EventSource / │ │
│ │ xterm.js │ text: "Hello\n" │ fetch SSE client │ │
│ └───────────────┘ text: "World\n" └─────────────────────┘ │
│ done: true │
└──────────────────────────────────────┬──────────────────────────────┘
│ HTTP (SSE)
│ Content-Type: text/event-stream
▼
┌──────────────────────────────────────┴──────────────────────────────┐
│ Nginx / Reverse Proxy │
│ proxy_buffering off; chunked_transfer_encoding on; │
└──────────────────────────────────────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Backend Server (Node.js) │
│ │
│ ┌──────────────────┐ spawn / API call ┌───────────────────┐ │
│ │ SSE Route Handler│ ─────────────────────► │ AI Agent Process │ │
│ │ Set SSE headers │ │ Claude Code │ │
│ │ Push chunk by │ ◄─── stdout.on('data') │ Codex / Others │ │
│ └──────────────────┘ └───────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ Session Manager │ │
│ │ - Maintain each Agent session state │ │
│ │ - Track Last-Event-ID for resume support │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Data flow:
- The user initiates a request from the frontend (POST prompt)
- The backend spawns the AI Agent process and establishes an SSE long connection
- Every time the Agent produces a line of output, the backend wraps it as an SSE event and pushes it
- The frontend receives it chunk by chunk and renders it in real-time to the terminal component
Resumable Delivery: Last-Event-ID
SSE has a very practical mechanism—resumable delivery. When a network hiccup causes a connection interruption, the client doesn't need to start from scratch; it can continue receiving from where it left off.
This is the role of Last-Event-ID. It's not a query parameter you add yourself but an HTTP request header that EventSource sends automatically when it reconnects, as part of the reconnection mechanism built into the SSE protocol:
How It Works
Server sends:
id: 1
data: {"text": "Hello"}
id: 2
data: {"text": " World"}
id: 3
data: {"text": "!"}
── Network disconnected ──
Client auto-reconnects, request header includes:
Last-Event-ID: 3 ← Tells the server "I received id=3"
Server continues pushing from id=4:
id: 4
data: {"text": " Done!"}
Backend Implementation
app.get('/api/agent/stream', (req, res) => {
  // Get the last event ID the client received (absent on the first connection)
  const lastId = req.headers['last-event-id'];

  // SSE response headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Resume from the breakpoint (historical events are recovered from the session)
  const sessionId = req.query.sessionId;
  const history = sessionStore.getHistory(sessionId, lastId);

  // First, resend the events the client missed (those after lastId)
  for (const event of history) {
    res.write(`id: ${event.id}\ndata: ${JSON.stringify(event.data)}\n\n`);
  }

  // Then continue forwarding new Agent output.
  // sessionStore.getAgent is an assumed helper that returns the running process.
  const agent = sessionStore.getAgent(sessionId);
  const onData = (chunk) => {
    const eventId = sessionStore.nextId(sessionId);
    res.write(
      `id: ${eventId}\ndata: ${JSON.stringify({ text: chunk.toString() })}\n\n`,
    );
  };
  agent.stdout.on('data', onData);

  // Detach the listener when this client goes away, or listeners pile up on every reconnect
  res.on('close', () => agent.stdout.off('data', onData));
});
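The handler above relies on a sessionStore that is not shown. Here is a minimal in-memory sketch of what it could look like. The class name and the append method are assumptions for illustration, and this version assigns the ID when an event is recorded instead of exposing a separate nextId; a real deployment would likely also need persistence and cleanup of finished sessions.

```javascript
// Hypothetical in-memory store for per-session event history.
class SessionStore {
  constructor() {
    this.sessions = new Map(); // sessionId -> array of { id, data }
  }

  // Record an event and return it with its newly assigned numeric ID.
  append(sessionId, data) {
    const events = this.sessions.get(sessionId) ?? [];
    const event = { id: events.length + 1, data };
    events.push(event);
    this.sessions.set(sessionId, events);
    return event;
  }

  // Events the client has not seen yet: everything after lastId.
  getHistory(sessionId, lastId) {
    const events = this.sessions.get(sessionId) ?? [];
    const last = Number(lastId) || 0; // missing or malformed header: replay all
    return events.filter((event) => event.id > last);
  }
}

const store = new SessionStore();
store.append('s1', { text: 'Hello' });
store.append('s1', { text: ' World' });
store.append('s1', { text: '!' });
console.log(store.getHistory('s1', '2').map((e) => e.id)); // [ 3 ]
```

The important property is that every event pushed to a client is also recorded, so a reconnecting client can be replayed exactly the events it missed.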
Frontend Implementation
Native EventSource automatically handles Last-Event-ID with no extra code. If using fetch + ReadableStream, you need to implement it manually:
let lastEventId = null;
async function connectWithResume(sessionId) {
const headers = { 'Content-Type': 'application/json' };
if (lastEventId) {
headers['Last-Event-ID'] = lastEventId; // Manually include the breakpoint ID
}
const response = await fetch(`/api/agent/stream?sessionId=${sessionId}`, {
method: 'GET',
headers,
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop();
for (const event of events) {
const idMatch = event.match(/^id:\s*(.+)$/m);
const dataMatch = event.match(/^data:\s*(.*)$/m);
if (idMatch) lastEventId = idMatch[1]; // Record the latest ID
if (dataMatch) {
const data = JSON.parse(dataMatch[1]);
appendToTerminal(data.text);
}
}
}
}
Note: Last-Event-ID is only carried automatically by EventSource. With fetch, you need to manually parse the id field from the response and add it to the request header upon reconnection.
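With fetch there is also no built-in retry, so the reconnection loop itself has to be written by hand. Here is a rough sketch using exponential backoff. Both function names are hypothetical, and it assumes a connect function (for example, a variant of connectWithResume) that resolves to true when the stream finished normally and throws or resolves to false when the connection dropped.

```javascript
// Hypothetical helper: delay before reconnect attempt N (0-based), capped at maxMs.
function reconnectDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retries `connect` until it reports a normal finish.
async function runWithReconnect(connect, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      if (await connect()) return true; // stream finished normally
    } catch (err) {
      console.error('stream interrupted', err);
    }
    // Wait a little longer after each failure before trying again
    await new Promise((resolve) => setTimeout(resolve, reconnectDelay(attempt)));
  }
  return false; // gave up after maxAttempts
}

console.log([0, 1, 2, 5].map((n) => reconnectDelay(n))); // [ 1000, 2000, 4000, 30000 ]
```

Because lastEventId lives outside the connect function, each retry automatically resumes from the last event that was received.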
How the Frontend Receives Data
Method 1: EventSource (Native API)
Browsers natively provide the EventSource API, which is extremely simple to use:
const source = new EventSource('/api/agent/stream?sessionId=123');
source.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.done) {
console.log('Agent execution finished');
source.close();
return;
}
// Append to terminal UI in real-time
appendToTerminal(data.text);
};
source.onerror = (err) => {
console.error('SSE connection error', err);
};
But EventSource has a fatal flaw: it only supports GET requests. AI Agents typically need to POST a prompt, which is awkward.
Method 2: fetch + ReadableStream (Recommended)
For SSE scenarios requiring POST requests, using fetch with ReadableStream is a more flexible approach:
async function runAgent(prompt) {
const response = await fetch('/api/agent/run', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt }),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// Split SSE events by \n\n
const events = buffer.split('\n\n');
buffer = events.pop(); // The last segment might be incomplete, save for next time
for (const event of events) {
const match = event.match(/^data:\s*(.*)$/m);
if (!match) continue;
const data = JSON.parse(match[1]);
if (data.done) {
console.log('Agent execution finished');
return;
}
appendToTerminal(data.text);
}
}
}
Core idea:
- fetch gets the response.body (a ReadableStream)
- Use getReader() to read chunk by chunk
- Use TextDecoder to convert binary to text
- Split by \n\n to parse SSE events one by one
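The single regex used above is enough when every event is one data: line of JSON, but it mishandles heartbeat comments, events with several data: lines, and servers that send \r\n line endings. Here is a sketch of a slightly more tolerant parser for one event block; parseSSEEvent is a hypothetical helper, and it covers only the data, event and id fields:

```javascript
// Hypothetical helper: parse one event block (the text between blank lines).
function parseSSEEvent(block) {
  const event = { event: 'message', data: '', id: undefined };
  const dataLines = [];
  for (const line of block.split(/\r\n|\r|\n/)) {
    if (line === '' || line.startsWith(':')) continue; // skip comments such as ": heartbeat"
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one optional leading space
    if (field === 'data') dataLines.push(value);
    else if (field === 'event') event.event = value;
    else if (field === 'id') event.id = value;
  }
  if (dataLines.length === 0) return null; // no data lines: nothing to dispatch
  event.data = dataLines.join('\n'); // multiple data lines are joined with newlines
  return event;
}

console.log(parseSSEEvent(': heartbeat')); // null
console.log(parseSSEEvent('id: 7\ndata: {"text":"a"}\ndata: more'));
```

In the read loop, each block produced by the split would go through this function, and null results (heartbeats) would simply be skipped.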
Method 3: Third-Party Libraries
If you don't want to parse the SSE format yourself, you can use an off-the-shelf library:
npm install @microsoft/fetch-event-source
import { fetchEventSource } from '@microsoft/fetch-event-source';
await fetchEventSource('/api/agent/run', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt }),
onmessage(ev) {
const data = JSON.parse(ev.data);
if (data.done) return;
appendToTerminal(data.text);
},
onerror(err) {
console.error('SSE error', err);
},
onclose() {
console.log('Connection closed');
},
});
The advantage of this Microsoft library: it supports POST, automatic reconnection, and custom headers, making it much more useful than native EventSource.
Frontend Rendering Solutions
Once you have the streaming data, how do you render it? Claude Code's output contains ANSI escape sequences (colors, cursor movements, progress bars, etc.), not plain text, so the choice of rendering solution is critical.
Solution 1: xterm.js (Recommended)
xterm.js is the de facto standard for frontend terminal rendering; VS Code's built-in terminal uses it. It fully supports ANSI escape sequences and can reproduce Claude Code's terminal output 1:1.
npm install @xterm/xterm @xterm/addon-fit @xterm/addon-web-links
import { Terminal } from '@xterm/xterm';
import { FitAddon } from '@xterm/addon-fit';
import { WebLinksAddon } from '@xterm/addon-web-links';
import '@xterm/xterm/css/xterm.css';
const term = new Terminal({
theme: {
background: '#1e1e1e',
foreground: '#d4d4d4',
cursor: '#d4d4d4',
},
fontSize: 14,
fontFamily: "'Fira Code', 'Menlo', monospace",
cursorBlink: true,
scrollback: 10000,
});
const fitAddon = new FitAddon();
term.loadAddon(fitAddon);
term.loadAddon(new WebLinksAddon()); // Auto-detect links
term.open(document.getElementById('terminal'));
fitAddon.fit();
// Receive SSE data, write directly to terminal
function onSSEMessage(data) {
if (data.done) {
term.writeln('\r\n\x1b[32m✓ Agent execution finished\x1b[0m');
return;
}
// xterm.js natively supports ANSI escape sequences, just write directly
term.write(data.text);
}
Advantages:
- Full support for ANSI colors, cursor movement, screen clearing, etc.
- Supports text selection, copy/paste
- Excellent performance, no lag with large output
- Active community, rich plugins
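One detail to watch for: when the agent's stdout is a pipe rather than a pseudo-terminal, lines usually end with a bare \n. By default xterm.js treats \n as a line feed only, so the cursor moves down without returning to the first column and the output drifts to the right like a staircase. Here is a minimal sketch of a normalizing step; toTerminalNewlines is a hypothetical helper name:

```javascript
// Hypothetical helper: make sure every newline is a carriage return + line feed.
function toTerminalNewlines(text) {
  // "\r?\n" matches both a bare "\n" and an existing "\r\n",
  // so lines that are already correct are left unchanged.
  return text.replace(/\r?\n/g, '\r\n');
}

console.log(JSON.stringify(toTerminalNewlines('a\nb\r\nc'))); // "a\r\nb\r\nc"
```

You would call term.write(toTerminalNewlines(data.text)) instead of writing the raw text. Alternatively, constructing the Terminal with the convertEol: true option should have a similar effect without extra code.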
Solution 2: ansi_up (Lightweight)
If you don't want to introduce a full terminal emulator, ansi_up can convert ANSI escape sequences to HTML for rendering with regular DOM elements:
npm install ansi_up
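// ansi_up v6 and later use a named export; v5 and earlier used a default export
import { AnsiUp } from 'ansi_up';
const ansiUp = new AnsiUp();
const output = document.getElementById('output');
function appendToTerminal(text) {
  const html = ansiUp.ansi_to_html(text);
  // Append without re-parsing everything already rendered (innerHTML += does that)
  output.insertAdjacentHTML('beforeend', html);
  output.scrollTop = output.scrollHeight;
}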
Advantages: Lightweight (~10KB), suitable for simple scenarios needing only color rendering. Disadvantages: Does not support complex ANSI operations like cursor movement or progress bar overwriting.
Solution Comparison
| Feature | xterm.js | ansi_up | Plain <pre> |
|---|---|---|---|
| ANSI Colors | ✅ | ✅ | ❌ |
| Cursor Movement / Clear Screen | ✅ | ❌ | ❌ |
| Progress Bar Overwrite Refresh | ✅ | ❌ | ❌ |
| Text Selection/Copy | ✅ | ✅ | ✅ |
| Bundle Size | ~200KB | ~10KB | 0 |
| Use Case | Full terminal experience | Simple colored output | Plain text |
Conclusion: For rendering Claude Code output, xterm.js is the first choice. Its ANSI compatibility is the best and can fully reproduce the terminal experience. If you're just displaying simple colored logs, ansi_up is sufficient.
Streaming Markdown Rendering (For Conversational AI Output)
If the frontend isn't for a pure terminal display but a conversational UI like ChatGPT, you need a renderer that can handle streaming Markdown. Besides terminal streams, Claude Code's output often contains Markdown-formatted content (code blocks, file diffs, lists, etc.).
A conventional renderer such as react-markdown does not cope well with the unclosed code blocks, incomplete tables, and other partial syntax that appear in the middle of an AI stream. Streamdown (by Vercel, 5k+ stars) is designed specifically to solve this pain point:
npm install streamdown @streamdown/code @streamdown/math @streamdown/mermaid @streamdown/cjk
import { useChat } from '@ai-sdk/react';
import { Streamdown } from 'streamdown';
import { code } from '@streamdown/code';
import { mermaid } from '@streamdown/mermaid';
import { math } from '@streamdown/math';
import { cjk } from '@streamdown/cjk';
import 'katex/dist/katex.min.css';
import 'streamdown/styles.css';
export default function Chat() {
const { messages, status } = useChat();
return (
<div>
{messages.map((message) => (
<div key={message.id}>
{message.role === 'user' ? 'User: ' : 'AI: '}
{message.parts.map((part, index) =>
part.type === 'text' ? (
<Streamdown
key={index}
plugins={{ code, mermaid, math, cjk }}
isAnimating={status === 'streaming'}>
{part.text}
</Streamdown>
) : null,
)}
</div>
))}
</div>
);
}
Streamdown's core advantages:
- Designed specifically for AI streaming output, gracefully handles unclosed Markdown blocks
- Built-in Shiki code highlighting, KaTeX math formulas, Mermaid diagrams
- Supports CJK typography optimization
- Plugin-based architecture, import on demand, tree-shakeable
- Built-in security hardening (rehype-harden) to prevent XSS
Code Highlighting Library Comparison (Streamdown has Shiki built-in; the following is for reference in standalone usage scenarios):
| Library | Bundle Size | Language Support | Features |
|---|---|---|---|
| Shiki | ~500KB | 100+ | VS Code equivalent syntax highlighting, best results |
| Prism.js | ~20KB | 270+ | Lightweight, rich plugins |
Common UI Component Summary
| Scenario | Recommended Solution | Representative Project |
|---|---|---|
| Full Terminal Experience | xterm.js (14k stars) | VS Code Terminal, Claude Code Web |
| Simple Colored Logs | ansi_up | Lightweight log panel |
| Conversational AI Output | Streamdown (5k+ stars) | ChatGPT, Claude Web |
| File Diff Display | react-diff-viewer | GitHub PR, Code Review tools |
| Mind Map | react-markmap | Real-time mind map preview |
Keeping Long Connections Alive: Heartbeat Keep-Alive
SSE is essentially an HTTP long connection. In theory, it stays open as long as neither side actively closes it. But in real-world production, proxy layers (Nginx, CDN, load balancers) usually have an idle timeout mechanism—if no data is transmitted for a period, they will actively disconnect.
For example, Nginx's default proxy_read_timeout is 60 seconds, meaning if no data flows through for 60 seconds, the connection gets cut. AI Agents sometimes think for a long time (e.g., Claude Code analyzing a large file), during which there might be no output for tens of seconds, putting the connection at risk.
Solution: Server-Side Heartbeat
The server periodically sends an SSE comment (a line starting with :), telling the proxy layer "the connection is still alive." The SSE protocol specifies that lines starting with : are comments and are automatically ignored by the client:
app.post('/api/agent/run', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Heartbeat: send a comment every 15 seconds to prevent proxy layer timeout disconnection
  const heartbeat = setInterval(() => {
    res.write(': heartbeat\n\n');
  }, 15000);

  const agent = spawn('claude', ['--print', req.body.prompt]);

  agent.stdout.on('data', (chunk) => {
    res.write(`data: ${JSON.stringify({ text: chunk.toString() })}\n\n`);
  });

  agent.on('close', () => {
    clearInterval(heartbeat); // Clean up heartbeat
    res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
    res.end();
  });

  // Also clean up when the client disconnects. Listen on res rather than req:
  // since Node.js 16, req's 'close' signals that the request has completed,
  // not that the underlying socket has closed.
  res.on('close', () => {
    clearInterval(heartbeat);
    agent.kill();
  });
});
Key points:
- Recommended heartbeat interval: 15-30 seconds. Too short wastes bandwidth; too long and the gap may exceed the proxy's idle timeout.
- Use the : heartbeat\n\n format; the client ignores it automatically, so business logic is unaffected.
- When the Agent finishes or the client disconnects, you must clear the setInterval timer; otherwise it leaks.
No Need to Poll and Re-establish Connection
Some might ask: Do we need to re-send a request every minute with a since parameter?
No. SSE itself is a persistent connection. As long as the heartbeat keep-alive is done correctly, the connection will remain open. Only when the connection actually breaks (network failure, proxy timeout, etc.) do you need to reconnect—at that point, use Last-Event-ID to resume from the breakpoint, no need to start from scratch.
Pitfalls Encountered
1. Nginx Proxy Buffering
SSE's biggest enemy is proxy layer buffering. Nginx buffers responses by default, causing the client not to receive real-time data. Solution:
location /api/agent/ {
proxy_pass http://backend;
proxy_buffering off; # Disable buffering
proxy_cache off; # Disable caching
proxy_read_timeout 300s; # Long connection timeout
chunked_transfer_encoding on;
}
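If you cannot edit the Nginx configuration, the application can also ask Nginx to skip buffering for a single response by sending the X-Accel-Buffering: no response header. Here is a minimal sketch; setSSEHeaders is a hypothetical helper name, and the fake response object exists only so the snippet runs without a server:

```javascript
// Hypothetical helper: apply SSE headers to a Node/Express-style response.
function setSSEHeaders(res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  // Asks Nginx not to buffer this particular response
  res.setHeader('X-Accel-Buffering', 'no');
}

// Stand-in response object that just records the headers it is given.
const headers = {};
const fakeRes = { setHeader: (name, value) => { headers[name] = value; } };
setSSEHeaders(fakeRes);
console.log(headers['X-Accel-Buffering']); // no
```

Other proxies and CDNs have their own buffering rules, so this header should be treated as an Nginx-specific aid rather than a general fix.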
2. Connection Limit
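Under HTTP/1.1, browsers limit the number of concurrent connections to the same domain (usually 6). Each SSE stream holds one connection open, so opening several Agent sessions at once can exhaust the limit. Solutions:
- Upgrade to HTTP/2 (streams are multiplexed over a single connection, so the six-connection limit no longer applies; the cap becomes the negotiated number of concurrent streams, which is typically much higher)
- Or use WebSocket instead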
3. Large Output Memory Issues
AI Agents sometimes output large amounts of content (e.g., reading an entire file). If the frontend keeps appending to the DOM, the page will get increasingly laggy. Suggestions:
- Limit the maximum number of lines, truncate the head when exceeded
- Use virtual scrolling to only render the visible area
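The first suggestion can be sketched as a small bounded buffer that drops the oldest lines once a limit is reached. BoundedLog is a hypothetical class name, and for simplicity it assumes each appended chunk ends on a line boundary:

```javascript
// Hypothetical helper: keep only the most recent maxLines lines.
class BoundedLog {
  constructor(maxLines) {
    this.maxLines = maxLines;
    this.lines = [];
    this.dropped = 0; // how many old lines were discarded so far
  }

  append(text) {
    this.lines.push(...text.split('\n'));
    const overflow = this.lines.length - this.maxLines;
    if (overflow > 0) {
      this.lines.splice(0, overflow); // truncate the head
      this.dropped += overflow;
    }
  }
}

const log = new BoundedLog(3);
log.append('1\n2');
log.append('3\n4\n5');
console.log(log.lines, log.dropped); // [ '3', '4', '5' ] 2
```

The rendering layer then draws log.lines instead of an ever-growing DOM. With xterm.js the scrollback option already bounds the buffer in a similar way, so this mainly applies to hand-rolled DOM output.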
Summary
| Scenario | Recommended Solution |
|---|---|
| SSE requiring only GET | Native EventSource |
| SSE requiring POST (e.g., AI Agent) | fetch + ReadableStream or @microsoft/fetch-event-source |
| Need bidirectional communication | WebSocket |
| Need full terminal experience | xterm.js |
| Conversational AI output (streaming Markdown) | Streamdown |
The Stream + SSE combo works excellently in AI Agent scenarios: lightweight, one-way, auto-reconnect, good compatibility. Compared to WebSocket's complex handshake and bidirectional communication, SSE is tailor-made for the "server pushes data" scenario.