Frontend · JavaScript · Artificial Intelligence

Streaming AI Agent Terminal Output to the Browser with SSE

By anOnion
Original article: juejin.cn

As coding agents move from local CLIs into collaborative web tools, the ability to stream their raw terminal output—ANSI escapes and all—becomes a baseline UX requirement. SSE is the lighter, HTTP-native alternative to WebSocket for this one-way push, and pairing it with xterm.js avoids the fragile work of parsing and re-rendering terminal control sequences.

Summary

A practical architecture for piping AI agent output into a browser uses SSE (Server-Sent Events) over a single long-lived HTTP connection, avoiding the overhead of WebSocket for one-way streams. The backend spawns a CLI agent such as Claude Code, wraps each stdout chunk as an SSE event, and pushes it to the frontend. On the client, xterm.js interprets the full set of ANSI escape sequences (colors, cursor movements, progress bars) so the output appears as it would in a native terminal.
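The "wrap each stdout chunk as an SSE event" step can be sketched as follows. This is a minimal illustration, not the article's code: `SSE_HEADERS` and `formatSseEvent` are hypothetical names, the three headers are the ones SSE endpoints conventionally set, and JSON-encoding the chunk is one assumed way to keep raw newlines and ESC bytes from colliding with SSE's own line-based framing.

```javascript
// Response headers an SSE endpoint conventionally sets (assumed set of three).
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "Connection": "keep-alive",
};

// Serialize one stdout chunk as a single SSE event (hypothetical helper).
// The chunk is JSON-encoded so newlines and escape bytes inside the terminal
// output cannot be mistaken for SSE line or event delimiters.
function formatSseEvent(id, chunk) {
  const payload = JSON.stringify({ chunk });
  return `id: ${id}\ndata: ${payload}\n\n`;
}

// Example: a green "ok" followed by CRLF, as a CLI might print it.
const frame = formatSseEvent(7, "\x1b[32mok\x1b[0m\r\n");
```

On the receiving side, `JSON.parse(event.data).chunk` recovers the original bytes, which can then be passed to the terminal renderer. Base64 would be an alternative encoding if chunks may split multi-byte characters.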

SSE's built-in `Last-Event-ID` mechanism handles disconnects: the reconnecting client reports the last event it received, and a server that keeps recent events can resume from that point. A server-side heartbeat (a comment line every 15–30 seconds) prevents idle timeouts in Nginx or CDN proxies from killing quiet connections. For conversational UIs that output streaming Markdown instead of raw terminal escapes, Streamdown handles unclosed code blocks and incomplete tables that break traditional Markdown parsers.
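Resumption only works if the server remembers what it already sent. The sketch below shows one possible shape for that state plus a heartbeat timer. `ReplayBuffer`, `startHeartbeat`, the capacity of 1000 events, and the 20-second default are all illustrative assumptions rather than details from the article.

```javascript
// Keeps the most recent events so a reconnecting client can catch up.
class ReplayBuffer {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.events = []; // [{ id, data }], oldest first
    this.nextId = 1;
  }

  // Store a chunk and return the event with its assigned numeric id.
  append(data) {
    const event = { id: this.nextId++, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events the client has not seen, given its Last-Event-ID header value.
  // A missing or unparsable value means "send everything still buffered".
  since(lastEventId) {
    const last = Number.parseInt(lastEventId ?? "", 10);
    if (Number.isNaN(last)) return [...this.events];
    return this.events.filter((e) => e.id > last);
  }
}

// SSE comment frame; clients ignore lines that start with ":".
const HEARTBEAT_FRAME = ": heartbeat\n\n";

// Write a heartbeat on a fixed interval; returns a function that stops it.
function startHeartbeat(write, intervalMs = 20000) {
  const timer = setInterval(() => write(HEARTBEAT_FRAME), intervalMs);
  return () => clearInterval(timer);
}
```

In a Node HTTP handler, the header would typically be read as `req.headers["last-event-id"]`, the result of `since(...)` written first, and the stop function called when the response closes.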

The piece also catalogs the common production pitfalls: Nginx proxy buffering must be disabled, HTTP/1.1's six-connection limit can bottleneck multiple agent sessions, and large outputs demand virtual scrolling or line truncation to avoid DOM bloat.
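Line truncation, the simpler of the two output-size mitigations, can be sketched as a fixed-capacity buffer. `BoundedLineBuffer` and its default of 5000 lines are hypothetical; the sketch assumes output has already been reduced to plain lines. When xterm.js does the rendering, its `scrollback` option plays a comparable role.

```javascript
// Fixed-capacity line buffer: appending past the limit drops the oldest lines.
class BoundedLineBuffer {
  constructor(maxLines = 5000) {
    this.maxLines = maxLines;
    this.lines = [];   // completed lines, oldest first
    this.partial = ""; // text after the last newline, not yet a full line
    this.dropped = 0;  // how many lines have been discarded so far
  }

  // Append a chunk that may start or end in the middle of a line.
  push(chunk) {
    const parts = (this.partial + chunk).split("\n");
    this.partial = parts.pop(); // last piece is incomplete (or empty)
    this.lines.push(...parts);
    const overflow = this.lines.length - this.maxLines;
    if (overflow > 0) {
      this.lines.splice(0, overflow);
      this.dropped += overflow;
    }
  }
}
```

Tracking `dropped` lets the UI show a notice such as "N earlier lines hidden" instead of silently losing output.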

Takeaways
SSE is a one-way, HTTP-based push protocol that needs only three response headers and a `data:` prefix per event, making it simpler than WebSocket for server-to-client streams.
Native `EventSource` supports only GET requests; for POST-based agent triggers, use `fetch` with a `ReadableStream` or Microsoft's `@microsoft/fetch-event-source` library.
xterm.js is a full terminal emulator that handles ANSI escape sequences beyond color (cursor movement, progress bars, screen clearing), which makes it the correct choice for CLI agent output.
SSE's `Last-Event-ID` header lets a reconnecting client resume after a disconnect without replaying the entire stream, provided the server tags events with `id` and can replay what was missed.
A server-side heartbeat sending SSE comment lines (`: heartbeat\n\n`) every 15–30 seconds keeps the connection alive through Nginx and CDN idle timeouts.
Nginx must have `proxy_buffering off`, `proxy_cache off`, and an extended `proxy_read_timeout` for SSE streams to deliver chunks in real time.
Under HTTP/1.1, browsers cap concurrent connections at six per origin, which can become a bottleneck when several agent sessions each hold an SSE stream open.
Streamdown handles streaming Markdown with unclosed code blocks and incomplete tables, problems that break standard `react-markdown` in conversational AI UIs.
Large agent outputs require virtual scrolling or line-count truncation to prevent DOM memory pressure on the frontend.
Conclusions

SSE's simplicity is its strongest asset: three headers, a `data:` prefix, and double-newline delimiters. That minimal surface area means fewer bugs and easier debugging compared to WebSocket frames and handshakes.

The `EventSource` API's GET-only limitation is a real constraint for agent workflows that need to POST prompts, which is why `fetch`-based SSE parsing has become the practical default despite requiring manual buffer management.
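The "manual buffer management" mentioned above comes from the fact that network chunks do not align with event boundaries. A minimal incremental parser might look like this. `SseParser` is a hypothetical name, and the sketch assumes LF line endings and ignores the `retry` field.

```javascript
// Incremental SSE parser: feed it decoded text chunks, get complete events.
class SseParser {
  constructor() {
    this.buffer = "";
  }

  // Returns the events completed by this chunk. An incomplete trailing
  // event stays in the buffer until a later chunk finishes it.
  feed(text) {
    this.buffer += text;
    const events = [];
    let end;
    while ((end = this.buffer.indexOf("\n\n")) !== -1) {
      const block = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      const event = { id: undefined, event: "message", data: [] };
      for (const line of block.split("\n")) {
        if (line === "" || line.startsWith(":")) continue; // comment or heartbeat
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1); // one optional space
        if (field === "data") event.data.push(value);
        else if (field === "id") event.id = value;
        else if (field === "event") event.event = value;
      }
      // Blocks without any data line (e.g. heartbeats) produce no event.
      if (event.data.length > 0) {
        events.push({ ...event, data: event.data.join("\n") });
      }
    }
    return events;
  }
}
```

In the browser, the text chunks would typically come from `response.body.pipeThrough(new TextDecoderStream()).getReader()`, with each `value` read from the reader passed to `feed`.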

Choosing xterm.js over a lightweight ANSI-to-HTML converter is not about preference—it's about correctness. CLI agents use cursor manipulation and overwrite sequences that ansi_up simply discards, producing garbled output.
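A toy comparison makes the correctness argument concrete. Both functions are illustrative stand-ins: `stripControls` is a generic "delete the control codes" approach (not ansi_up's actual implementation), and `renderLine` emulates only two behaviors of a real terminal, carriage return and erase-to-end-of-line, on a single line.

```javascript
// Naive approach: delete escape sequences and carriage returns, keep the text.
function stripControls(output) {
  return output.replace(/\x1b\[[0-9;?]*[A-Za-z]/g, "").replace(/\r/g, "");
}

// Minimal single-line emulation: "\r" moves the cursor to column 0 and
// "ESC[K" erases from the cursor to the end of the line.
function renderLine(output) {
  const cells = [];
  let cursor = 0;
  for (let i = 0; i < output.length; i++) {
    if (output[i] === "\r") {
      cursor = 0;
    } else if (output.startsWith("\x1b[K", i)) {
      cells.length = cursor; // drop everything at and after the cursor
      i += 2;                // skip the remaining "[K"
    } else {
      cells[cursor++] = output[i];
    }
  }
  return cells.join("");
}

// A progress line that repeatedly overwrites itself.
const progress = "Downloading 10%\rDownloading 100%\rDone\x1b[K";
console.log(stripControls(progress)); // Downloading 10%Downloading 100%Done
console.log(renderLine(progress));    // Done
```

A real agent session adds cursor addressing, scroll regions, and alternate screens on top of this, which is the part a full emulator such as xterm.js takes care of.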

Streamdown's existence as a Vercel-backed project signals that streaming Markdown rendering is now a distinct problem space, separate from static Markdown, driven largely by the proliferation of LLM-generated text in UIs.

The connection-limit problem under HTTP/1.1 is a quiet architectural constraint that will push multi-agent web apps toward HTTP/2 or WebSocket, not because SSE is wrong, but because browser connection pooling forces the issue.

Concepts & terms
SSE (Server-Sent Events)
An HTTP-based protocol for one-way server-to-client streaming. The server sets `Content-Type: text/event-stream` and writes events as `data: <payload>\n\n`. Clients receive chunks in real time, and the browser's `EventSource` reconnects automatically, sending `Last-Event-ID` so the server can resume the stream.
ANSI escape sequences
In-band control codes embedded in terminal output that handle colors, cursor positioning, screen clearing, and progress bars. CLI tools like Claude Code emit these sequences; rendering them correctly requires a full terminal emulator like xterm.js, not a simple ANSI-to-HTML converter.
Last-Event-ID
An SSE mechanism for resumable streams. The server assigns an `id` to each event; if the connection drops, an `EventSource` client automatically reconnects and sends the last received `id` in the `Last-Event-ID` HTTP header, allowing a server that has kept the missed events to resume from that point.
Streamdown
A Vercel-built React component for rendering streaming Markdown from LLMs. Unlike `react-markdown`, it gracefully handles partially received code blocks, incomplete tables, and unterminated syntax that occur mid-stream.