Streaming AI Agent Terminal Output to the Browser with SSE
As coding agents move from local CLIs into collaborative web tools, the ability to stream their raw terminal output—ANSI escapes and all—becomes a baseline UX requirement. SSE is the lighter, HTTP-native alternative to WebSocket for this one-way push, and pairing it with xterm.js avoids the fragile work of parsing and re-rendering terminal control sequences.
A practical architecture for piping AI agent output into a browser uses Server-Sent Events (SSE) over a single long-lived HTTP connection, which avoids WebSocket's handshake and framing overhead for a one-way stream. The backend spawns a CLI agent such as Claude Code, wraps each stdout chunk as an SSE event, and pushes it to the frontend. On the client, xterm.js interprets the full range of ANSI escape sequences (colors, cursor movements, progress bars), so the output looks exactly as it would in a native terminal.
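The "wrap each stdout chunk as an SSE event" step can be sketched as follows. This is a minimal illustration, not the article's implementation: the helper names `formatSseEvent` and `decodeSsePayload` and the `{ chunk }` payload shape are assumptions. The chunk is JSON-encoded because raw terminal output contains `\r` and `\n`, which SSE treats as line terminators, so sending it verbatim on a `data:` line would corrupt it.

```typescript
// Hypothetical helper: frames one stdout chunk as a single SSE event.
function formatSseEvent(id: number, chunk: string): string {
  // JSON.stringify escapes \n, \r and ESC (\x1b), so the payload
  // always fits on exactly one `data:` line.
  const payload = JSON.stringify({ chunk });
  // An event is a set of `field: value` lines terminated by a blank line.
  return `id: ${id}\ndata: ${payload}\n\n`;
}

// Hypothetical client-side counterpart: recover the original chunk
// from the text that followed `data: `.
function decodeSsePayload(data: string): string {
  return (JSON.parse(data) as { chunk: string }).chunk;
}
```

On the server, each chunk read from the agent's stdout would be passed through `formatSseEvent` and written to the response; on the client, the decoded chunk is what gets handed to the terminal.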
SSE's built-in `Last-Event-ID` mechanism handles disconnects: on reconnect, the browser sends the ID of the last event it received, and a server that tags events with `id:` and retains recent output can resume from that point. A server-side heartbeat (a comment line every 15–30 seconds) keeps Nginx or a CDN from timing out and closing an idle connection. For conversational UIs that stream Markdown rather than raw terminal escapes, Streamdown handles the unclosed code blocks and incomplete tables that break traditional Markdown parsers.
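Resuming from `Last-Event-ID` only works if the server remembers what it has already sent. One possible shape for that is a bounded in-memory buffer, sketched here under stated assumptions: the `ReplayBuffer` class, its default capacity, and the `HEARTBEAT` constant are illustrative names, not part of any library.

```typescript
interface BufferedEvent {
  id: number;
  data: string;
}

// Hypothetical bounded replay buffer for one agent session.
class ReplayBuffer {
  private events: BufferedEvent[] = [];
  private nextId = 1;
  private readonly capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  // Store a payload and return the event (with its id) to send to clients.
  append(data: string): BufferedEvent {
    const event: BufferedEvent = { id: this.nextId++, data };
    this.events.push(event);
    // Drop the oldest event once the buffer is over capacity.
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events the client has not seen, given the value of the
  // `Last-Event-ID` request header (undefined on a first connection).
  since(lastEventId: string | undefined): BufferedEvent[] {
    const last = Number.parseInt(lastEventId ?? "", 10);
    if (Number.isNaN(last)) return [...this.events];
    return this.events.filter((e) => e.id > last);
  }
}

// An SSE comment line: it starts with a colon, clients ignore it,
// but the bytes keep intermediaries from seeing the connection as idle.
const HEARTBEAT = ": ping\n\n";
```

On a reconnect, the handler would first write everything returned by `since(...)` and then continue with live output; the heartbeat would be written on a timer (for example with `setInterval`) at the 15–30 second cadence mentioned above.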
The piece also catalogs the common production pitfalls: Nginx proxy buffering must be disabled so events are not held back, browsers' limit of roughly six concurrent HTTP/1.1 connections per origin can bottleneck multiple agent sessions, and large outputs demand virtual scrolling or line truncation to avoid DOM bloat.
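For the buffering pitfall, one commonly used option is to have the application opt out per response instead of (or in addition to) setting `proxy_buffering off;` in the Nginx configuration. The sketch below assumes the usual SSE header set; the function name `sseHeaders` is hypothetical, and which three headers the article has in mind is an assumption here.

```typescript
// Hypothetical helper returning response headers for an SSE endpoint.
function sseHeaders(): Record<string, string> {
  return {
    // Marks the body as an event stream.
    "Content-Type": "text/event-stream",
    // Keeps caches from storing or replaying the stream.
    "Cache-Control": "no-cache",
    // Relevant for HTTP/1.1 only; HTTP/2 does not use this header.
    "Connection": "keep-alive",
    // Nginx-specific: asks the proxy not to buffer this response.
    "X-Accel-Buffering": "no",
  };
}
```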
SSE's simplicity is its strongest asset: three headers, a `data:` prefix, and double-newline delimiters. That minimal surface area means fewer bugs and easier debugging compared to WebSocket frames and handshakes.
The `EventSource` API's GET-only limitation is a real constraint for agent workflows that need to POST prompts, which is why `fetch`-based SSE parsing has become the practical default, even though it requires manual buffer management and hand-written reconnection logic.
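The "manual buffer management" amounts to holding on to partial text until a blank line completes an event. A minimal sketch follows, assuming the server emits LF-only line endings and that the endpoint accepts a JSON body of the form `{ prompt }`; the names `SseParser`, `parseBlock`, and `streamAgent`, and that request shape, are all hypothetical.

```typescript
interface SseMessage {
  id?: string;
  event?: string;
  data: string;
}

// Parse one complete event block (the text between blank lines).
function parseBlock(block: string): SseMessage | null {
  const dataLines: string[] = [];
  const message: SseMessage = { data: "" };
  for (const line of block.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // comment or heartbeat
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is dropped
    if (field === "data") dataLines.push(value);
    else if (field === "id") message.id = value;
    else if (field === "event") message.event = value;
  }
  if (dataLines.length === 0) return null; // nothing to dispatch
  message.data = dataLines.join("\n");
  return message;
}

// Incremental parser: feed it decoded text, get back completed events.
class SseParser {
  private buffer = "";

  feed(text: string): SseMessage[] {
    this.buffer += text;
    const messages: SseMessage[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const message = parseBlock(this.buffer.slice(0, boundary));
      this.buffer = this.buffer.slice(boundary + 2); // keep the unfinished tail
      if (message) messages.push(message);
      boundary = this.buffer.indexOf("\n\n");
    }
    return messages;
  }
}

// Usage sketch: POST a prompt and stream the response through the parser.
async function streamAgent(
  url: string,
  prompt: string,
  onData: (data: string) => void,
): Promise<void> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`stream failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const parser = new SseParser();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters split across chunks intact.
    for (const message of parser.feed(decoder.decode(value, { stream: true }))) {
      onData(message.data);
    }
  }
}
```

Unlike `EventSource`, this loop ends when the connection drops; a production client would wrap it in retry logic and send the last seen `id` back to the server itself.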
Choosing xterm.js over a lightweight ANSI-to-HTML converter is a matter of correctness, not preference. CLI agents rely on cursor movement and overwrite sequences that a color-focused converter such as ansi_up simply discards, which garbles the output.
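A toy example makes the difference concrete. Neither function below is how ansi_up or xterm.js is implemented: `stripControls` is a stand-in for any converter that drops non-color sequences, and `renderLine` is a deliberately tiny model of a terminal line that honors only carriage return and the "erase to end of line" sequence (`ESC [ K`).

```typescript
// Stand-in for a converter that discards cursor and erase sequences:
// remove CSI sequences and carriage returns, keep the visible text.
function stripControls(output: string): string {
  return output.replace(/\x1b\[[0-9;?]*[A-Za-z]/g, "").replace(/\r/g, "");
}

// Toy single-line terminal model: honors `\r` and `ESC [ K` only.
function renderLine(output: string): string {
  const cells: string[] = [];
  let cursor = 0;
  let i = 0;
  while (i < output.length) {
    if (output.charAt(i) === "\r") {
      cursor = 0; // carriage return: move the cursor to column 0
      i += 1;
    } else if (output.startsWith("\x1b[K", i)) {
      cells.length = cursor; // erase from the cursor to the end of the line
      i += 3;
    } else {
      cells[cursor] = output.charAt(i); // overwrite whatever is in this cell
      cursor += 1;
      i += 1;
    }
  }
  return cells.join("");
}

// A progress indicator that redraws itself in place, then clears the line.
const progress = "Downloading 10%\rDownloading 100%\r\x1b[KDone";
console.log(stripControls(progress)); // Downloading 10%Downloading 100%Done
console.log(renderLine(progress)); // Done
```

With xterm.js there is no need to model any of this by hand: each chunk is passed to the terminal's `write` method and the emulator applies the sequences itself.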
Streamdown's existence as a Vercel-backed project signals that streaming Markdown rendering is now a distinct problem space, separate from static Markdown, driven largely by the proliferation of LLM-generated text in UIs.
The connection-limit problem under HTTP/1.1 is a quiet architectural constraint that will push multi-agent web apps toward HTTP/2, which multiplexes many streams over one connection, or toward WebSocket. The reason is not that SSE is wrong, but that the browser's per-origin connection cap forces the issue.