WebSocket vs. SSE: A No-Library Guide to Streaming Protocols for AI Apps

By 海尔智慧家技术平台

Most AI apps that stream tokens, real-time dashboards, and live notification systems depend on one of these two protocols. Knowing how to implement them from scratch, rather than only wiring up a library, gives developers the ability to debug, optimize, and customize streaming for production workloads.

Summary

As AI chatbots, real-time data dashboards, and collaborative tools become standard, streaming data is no longer optional. This guide strips away third-party libraries to show exactly how WebSocket and SSE work under the hood, using nothing but Node.js's built-in http module.

For WebSocket, the walkthrough covers the HTTP upgrade handshake, the binary frame structure (FIN, opcode, mask, payload length), and the critical unmasking step that recovers client data. The code sends the 101 Switching Protocols response, parses frames bit by bit, and sends back text frames, all without ws or Socket.IO.
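
The handshake step can be sketched as follows. This is a minimal illustration, not the original article's code: the function names are hypothetical, and it uses Node's built-in crypto module for the SHA-1 hash alongside http.

```javascript
const crypto = require("crypto");

// Fixed GUID defined by RFC 6455; it is appended to the client's key.
const WS_MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + magic string)).
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WS_MAGIC)
    .digest("base64");
}

// Build the raw 101 response that completes the upgrade.
// The trailing empty strings produce the blank line that ends the headers.
function buildHandshakeResponse(secWebSocketKey) {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}

// Possible wiring with Node's http server (assumed setup, shown as a comment):
// server.on("upgrade", (req, socket) => {
//   socket.write(buildHandshakeResponse(req.headers["sec-websocket-key"]));
// });
```

With the sample key from RFC 6455, dGhlIHNhbXBsZSBub25jZQ==, computeAcceptKey returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, which makes a convenient self-check.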

For SSE, the implementation is simpler: set the Content-Type to text/event-stream, push data lines with the 'data:' prefix, end each event with a blank line, and let the browser's EventSource API handle reconnection automatically. The guide also covers Last-Event-ID for resumable streams and the retry field for custom reconnect intervals.
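
The SSE wire format described above can be sketched as a small serializer. The field names (data, id, event, retry) come from the SSE format itself; the function name and the shape of its argument are assumptions made for illustration.

```javascript
// Serialize one event in the text/event-stream format.
// Hypothetical helper: only `data` is required, the other fields are optional.
function formatSseEvent({ data, id, event, retry }) {
  const lines = [];
  if (retry !== undefined) lines.push(`retry: ${retry}`); // reconnect delay in ms
  if (id !== undefined) lines.push(`id: ${id}`);          // echoed back as Last-Event-ID
  if (event !== undefined) lines.push(`event: ${event}`); // custom event name
  // A payload containing newlines needs one data: line per line of text.
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // The blank line (two consecutive newlines) terminates the event.
  return lines.join("\n") + "\n\n";
}

// Possible wiring with Node's http module (assumed setup, shown as a comment):
// res.writeHead(200, {
//   "Content-Type": "text/event-stream",
//   "Cache-Control": "no-cache",
// });
// res.write(formatSseEvent({ id: 1, data: "first token" }));
```

Splitting multi-line payloads matters because a bare newline inside a data: value would otherwise end the field early and drop the rest of the message.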

The comparison table makes the tradeoffs clear: WebSocket for full-duplex chat or gaming, SSE for one-way notifications or log feeds where simplicity and auto-reconnect matter more.

Takeaways
WebSocket requires an HTTP upgrade handshake with a Sec-WebSocket-Key and a fixed magic string (258EAFA5-E914-47DA-95CA-C5AB0DC85B11) to generate the Sec-WebSocket-Accept response.
WebSocket frames have a 2-byte minimum header: FIN bit, opcode (0x1 for text), MASK bit, and 7-bit payload length. Client-to-server frames must be masked; server-to-client frames are not.
Masking uses a 4-byte XOR key carried in the frame header. If the server skips the unmasking step, client data reads as garbled bytes.
SSE uses standard HTTP with Content-Type: text/event-stream and Cache-Control: no-cache. Each message is one or more 'data:' lines terminated by a blank line.
SSE's EventSource API automatically reconnects on drop. The server can use the Last-Event-ID header and 'id:' fields to resume from the last delivered message.
WebSocket supports binary and text frames; SSE is text-only.
SSE has no built-in bidirectional communication — the client must use separate HTTP requests to send data back.
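
The frame layout and masking rules in the takeaways above can be sketched like this. It is a simplified illustration with hypothetical function names: it handles only single frames whose payload fits the 7-bit length field (125 bytes or fewer) and assumes the whole frame is already in the buffer.

```javascript
// Decode one frame with a payload of at most 125 bytes.
function decodeSmallFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;    // top bit of byte 0
  const opcode = buf[0] & 0x0f;         // low 4 bits of byte 0 (0x1 = text)
  const masked = (buf[1] & 0x80) !== 0; // top bit of byte 1
  const length = buf[1] & 0x7f;         // low 7 bits of byte 1
  if (length > 125) {
    throw new Error("extended payload lengths are not handled in this sketch");
  }
  let offset = 2;
  const payload = Buffer.alloc(length);
  if (masked) {
    // Client frames: the 4-byte masking key follows the 2-byte header.
    const key = buf.subarray(offset, offset + 4);
    offset += 4;
    for (let i = 0; i < length; i++) {
      payload[i] = buf[offset + i] ^ key[i % 4]; // XOR with the rotating key
    }
  } else {
    buf.copy(payload, 0, offset, offset + length);
  }
  return { fin, opcode, masked, payload };
}

// Encode an unmasked server-to-client text frame (FIN = 1, opcode = 0x1).
function encodeTextFrame(text) {
  const payload = Buffer.from(text, "utf8");
  if (payload.length > 125) {
    throw new Error("extended payload lengths are not handled in this sketch");
  }
  return Buffer.concat([Buffer.from([0x81, payload.length]), payload]);
}
```

RFC 6455 gives a masked "Hello" frame as the bytes 81 85 37 fa 21 3d 7f 9f 4d 51 58, which is a handy input for checking the XOR loop.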
Conclusions

Implementing WebSocket from scratch reveals just how much complexity libraries like ws abstract away — especially the bit-level frame parsing and XOR masking that are easy to get wrong.

SSE reconnection is built into the browser (EventSource), while a WebSocket client needs hand-written reconnect and heartbeat logic. That is a strong argument for choosing SSE when the use case is one-way.

Many production 'WebSocket' services could actually be SSE services, saving development and debugging effort without sacrificing real-time feel.

The masking requirement for client-to-server WebSocket frames exists to prevent cache poisoning attacks on intermediaries — a security detail that's invisible when using a library.

Node's http module is surprisingly capable for both protocols, but for WebSocket it only delivers the upgrade request and the raw socket. Framing is left to the application, and a minimal parser still has to add support for fragmentation (multi-frame messages) and extended payload lengths (over 125 bytes).
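
The two gaps named above, extended payload lengths and fragmentation, might be filled along these lines. Both helpers are hypothetical sketches: readPayloadLength assumes the full header is already buffered, and the assembler assumes control frames (opcode 0x8 and above) are handled before frames reach it.

```javascript
// Read the payload length and the offset where the masking key
// (or, for unmasked frames, the payload) begins.
function readPayloadLength(buf) {
  const len7 = buf[1] & 0x7f;
  if (len7 < 126) return { length: len7, offset: 2 };
  // 126: the next 2 bytes hold a 16-bit big-endian length.
  if (len7 === 126) return { length: buf.readUInt16BE(2), offset: 4 };
  // 127: the next 8 bytes hold a 64-bit big-endian length.
  const big = buf.readBigUInt64BE(2);
  if (big > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError("payload too large for this sketch");
  }
  return { length: Number(big), offset: 10 };
}

// Reassemble a fragmented message from decoded data frames
// ({ fin, opcode, payload }). The first frame carries the real opcode,
// later frames use opcode 0x0 (continuation), and FIN marks the last one.
function createAssembler() {
  let opcode = null;
  let parts = [];
  return function push(frame) {
    if (frame.opcode !== 0x0) {
      opcode = frame.opcode; // start of a new message
      parts = [];
    } else if (opcode === null) {
      throw new Error("continuation frame without a starting frame");
    }
    parts.push(frame.payload);
    if (!frame.fin) return null; // message not complete yet
    const message = { opcode, payload: Buffer.concat(parts) };
    opcode = null;
    parts = [];
    return message;
  };
}
```

A production parser would also cap the accepted length before allocating memory, since the 64-bit field lets a client announce a payload far larger than any server should buffer.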

Concepts & terms
WebSocket Frame
The smallest unit of data in a WebSocket connection. It includes a header with control bits (FIN, opcode, MASK) and a payload. A single message can span multiple frames.
Masking Key
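A 4-byte random value that the client XORs with the payload data before sending. The server XORs the payload with the same key to recover the original bytes. Masking is not encryption, since the key travels in the frame header; it keeps attacker-controlled payload bytes from appearing on the wire in a predictable form that an intermediary could mistake for an HTTP request.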
SSE (Server-Sent Events)
A standard that lets a server push text data to a browser over a single HTTP connection. The browser's EventSource API handles reconnection automatically.
Last-Event-ID
An HTTP header sent by the client on reconnection, telling the server the ID of the last received event. The server can then resume the stream from that point.
101 Switching Protocols
The HTTP status code the server returns to confirm a WebSocket upgrade. After this, the connection switches from HTTP to the WebSocket protocol.
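
The Last-Event-ID entry above suggests a simple resume strategy on the server. The sketch below assumes a hypothetical in-memory history array of { id, data } objects with increasing numeric ids; a real service would bound that history or keep it in a store.

```javascript
// Pick the events a reconnecting client has not seen yet.
// `lastEventIdHeader` is the raw Last-Event-ID header value, or undefined
// on a first connection.
function eventsToReplay(history, lastEventIdHeader) {
  const lastId = Number.parseInt(lastEventIdHeader, 10);
  // No header, or an unparseable one: replay everything that is retained.
  if (Number.isNaN(lastId)) return history.slice();
  return history.filter((event) => event.id > lastId);
}

// Possible wiring inside an http request handler (assumed setup):
// const missed = eventsToReplay(history, req.headers["last-event-id"]);
// for (const e of missed) res.write(`id: ${e.id}\ndata: ${e.data}\n\n`);
```

Node lowercases incoming header names, which is why the wiring comment reads the value from last-event-id.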
Source: juejin.cn