WebSocket vs. SSE: A No-Library Guide to Streaming Protocols for AI Apps
Preface
As AI applications spread, scenarios such as interaction with large models, real-time data analysis, and AI-driven collaboration tools have made streaming transmission increasingly common. Unlike the traditional request-response model, which returns the complete result in one piece, streaming pushes data in segments as it becomes available. This reduces perceived latency and improves the user experience. The word-by-word replies of AI chatbots, live subtitles for voice transcription, and real-time alerts from smart monitoring all rely on it.
WebSocket and SSE are the two mainstream ways to implement streaming: WebSocket covers full-duplex communication, and SSE covers one-way server push. Both start from HTTP, so they are widely compatible and straightforward to deploy. Understanding how the two protocols work underneath, and how to implement them natively, is the foundation for building AI streaming applications. This article uses no third-party libraries. It relies only on the Node.js http module (plus the built-in crypto module for the handshake hash) and walks through protocol principles, frame structure, code, and the underlying specifications, as a reference for development and solution design.
1. WebSocket Protocol: Full-Duplex Communication Implementation
1.1 Core Protocol Principles (Simplified Explanation)
The core value of WebSocket is that it breaks out of HTTP's "one question, one answer" model and establishes a persistent two-way channel between client and server, which suits real-time scenarios. It can be understood in three steps:
- Handshake Upgrade: "Switching Channels" from HTTP. When a client wants to open a WebSocket connection, it first sends a special HTTP request that essentially says "I want to switch to the WebSocket channel." The request must carry two key headers: `Upgrade: websocket` (naming the protocol to switch to) and `Connection: Upgrade` (marking the request as a connection-level upgrade). It also carries a random `Sec-WebSocket-Key`. If the server agrees, it returns a 101 status code (meaning "switching protocols") together with a `Sec-WebSocket-Accept` value derived from that key. From then on, both parties stop using HTTP rules and use WebSocket rules.
- Frame Format Communication: Efficient "Package Delivery". After the switch, data no longer carries cumbersome HTTP headers. Instead, it is packaged into "frames" (similar to express parcels). Each frame has clear identifiers: an opcode (telling the other party whether this is text, binary, a close request, and so on), a mask bit (data sent by the client must be masked; this protects proxies and other intermediaries from cache-poisoning attacks and is not encryption), the data length, and the content. This greatly reduces transmission overhead and suits high-frequency real-time data. Document source: the frame structure is defined in RFC 6455, Section 5 (Data Framing), which details the fields, bit meanings, and transmission rules and is the authoritative basis for the frame principles.
- Keeping the Connection Alive: Heartbeat "Keep-Alive". TCP connections that stay idle for a long time may be dropped by firewalls, NAT devices, or proxies. WebSocket uses a Ping/Pong heartbeat to keep the connection alive: typically the server sends a Ping frame, and the client must reply with a Pong frame, which proves the connection is still healthy and prevents it from being closed as idle.
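The handshake step can be illustrated with a small sketch of how the server derives the `Sec-WebSocket-Accept` value from the client's key. The helper name `computeAcceptKey` is made up for this example; the GUID and the sample key/result pair come from RFC 6455, Section 1.3.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455; every server appends it to the client's key.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// computeAcceptKey is an illustrative helper name, not a Node.js API.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID) // key + GUID, concatenated as strings
    .digest('base64');                 // SHA-1 digest, base64-encoded
}

// Sample key from RFC 6455, Section 1.3.
console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

The client repeats the same computation and aborts the connection if the values differ, which shows that the server really understood the WebSocket handshake rather than echoing an ordinary HTTP response.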
WebSocket is an application-layer protocol that provides full-duplex (simultaneous bidirectional) communication over a single TCP connection. It addresses the one-directional, short-lived nature of the HTTP request-response model and suits real-time chat, real-time collaboration, and similar scenarios. Its core mechanisms are:
- Handshake Upgrade: The client initiates a protocol upgrade with an HTTP request whose headers include `Upgrade: websocket`, `Connection: Upgrade`, `Sec-WebSocket-Key`, and `Sec-WebSocket-Version`. The server responds with a 101 Switching Protocols status code, completing the switch from HTTP to WebSocket.
- Frame Format Communication: After the handshake succeeds, both parties exchange data in WebSocket frames. A frame contains an opcode (text, binary, close, and so on), a mask bit (data sent from the client to the server must be masked), the data length, and the payload, so HTTP headers are not repeated and overhead stays low. Supplementary note: a frame is the smallest communication unit of WebSocket. A message can consist of a single frame or of multiple frames (fragmented transmission). The frame structure follows RFC 6455; the layout and field meanings are given later in this section.
- Keeping the Connection Alive: Heartbeat detection uses Ping/Pong frames to prevent the idle TCP connection from being dropped by intermediate devices (such as firewalls), which keeps communication stable.
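As a rough illustration of the heartbeat mechanism, the sketch below builds unmasked Ping and Pong control frames as a server would send them. `buildControlFrame` and the opcode constants are names chosen for this example; the 125-byte limit on control-frame payloads comes from RFC 6455.

```javascript
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xA;

// buildControlFrame is a hypothetical helper: it builds an unmasked
// (server-to-client) control frame with FIN = 1.
function buildControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError('Control frame payloads are limited to 125 bytes');
  }
  // Byte 1: FIN bit + opcode. Byte 2: MASK = 0 + 7-bit payload length.
  const header = Buffer.from([0x80 | opcode, payload.length]);
  return Buffer.concat([header, payload]);
}

const ping = buildControlFrame(OPCODE_PING, Buffer.from('Hello'));
console.log(ping.toString('hex')); // prints "890548656c6c6f"
```

A server would typically write such a Ping frame on a timer and close the socket if no Pong (opcode 0xA) with the same payload arrives within a timeout of its choosing.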
WebSocket Frame Structure Diagram (Corresponding to RFC 6455 Standard)
A frame consists of a frame header (at least 2 bytes) followed by the payload data (the actual transmitted content). The fields are laid out bit by bit, and each one maps to a step in the frame-parsing code. The layout, described in text form:
Standard Frame Structure (Byte-level Breakdown):
Byte 1: 1 bit (FIN) + 3 bits (RSV1-RSV3) + 4 bits (opcode)
Byte 2: 1 bit (MASK) + 7 bits (Payload length)
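Optional fields: 2 or 8 bytes (Extended payload length, only present when the 7-bit length is 126 or 127) + 4 bytes (Masking-key, only present when MASK = 1, that is, on frames sent by the client) + Payload data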
Field Meanings (Corresponding to Code Parsing Logic):
- FIN (1 bit): Marks whether this is the final frame of a message. 1 means the final (or only) frame, 0 means more fragments follow. Corresponds to `const fin = (buffer[0] & 0x80) === 0x80` (testing the most significant bit of the first byte).
- RSV1-RSV3 (1 bit each): Reserved fields. They must be 0 unless an extension has been negotiated. Not processed in the code for now.
- opcode (4 bits): Frame type identifier. Core values: 0x0 (continuation frame), 0x1 (text frame), 0x2 (binary frame), 0x8 (close frame), 0x9 (Ping frame), 0xA (Pong frame). Corresponds to `const opcode = buffer[0] & 0x0F` (extracting the lower 4 bits).
- MASK (1 bit): Marks whether the payload is masked. Frames sent from the client to the server must set this to 1 (masking is mandatory, but it is obfuscation, not encryption), while frames sent from the server to the client set it to 0. Corresponds to `const hasMask = (buffer[1] & 0x80) === 0x80`.
- Payload length (7 bits): The length of the payload data. There are three cases: 0-125 is the length itself; 126 means the next 2 bytes hold the length; 127 means the next 8 bytes hold the length (the example code only handles short payloads of 0-125 bytes). Corresponds to `let payloadLen = buffer[1] & 0x7F` (extracting the lower 7 bits).
- Masking-key (4 bytes): Only present when MASK = 1. It is used to unmask the payload: byte i of the payload is XORed with byte i mod 4 of the key. If this step is skipped, the received text is garbled.
Reference Diagram Source: In addition to the original diagram in RFC 6455, you can refer to the visualization diagram of MDN WebSocket data frame format for an easier understanding of the field relationships.
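The field list above notes that the example code only handles 7-bit lengths. As a sketch of how the other two length forms could be parsed, the hypothetical `parseFrame` helper below reads the 16-bit and 64-bit extended lengths as well. It assumes the buffer holds exactly one complete frame, which real TCP streams do not guarantee.

```javascript
// parseFrame is an illustrative helper; it assumes `buffer` holds one complete frame.
function parseFrame(buffer) {
  const fin = (buffer[0] & 0x80) !== 0;
  const opcode = buffer[0] & 0x0f;
  const masked = (buffer[1] & 0x80) !== 0;
  let payloadLen = buffer[1] & 0x7f;
  let offset = 2;

  if (payloadLen === 126) {
    payloadLen = buffer.readUInt16BE(offset); // 16-bit big-endian length
    offset += 2;
  } else if (payloadLen === 127) {
    // 64-bit big-endian length; Number() is only exact up to 2^53 - 1.
    payloadLen = Number(buffer.readBigUInt64BE(offset));
    offset += 8;
  }

  let maskKey = null;
  if (masked) {
    maskKey = buffer.subarray(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so that unmasking does not modify the caller's buffer.
  const payload = Buffer.from(buffer.subarray(offset, offset + payloadLen));
  if (maskKey) {
    for (let i = 0; i < payload.length; i++) {
      payload[i] ^= maskKey[i % 4]; // XOR with the key byte at index i mod 4
    }
  }
  return { fin, opcode, masked, payload };
}

// Masked "Hello" text frame from RFC 6455, Section 5.7.
const sample = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
console.log(parseFrame(sample).payload.toString('utf8')); // prints "Hello"
```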
1.2 Implementing WebSocket with Node.js http Module
The native Node.js http module can directly capture protocol upgrade requests and complete WebSocket handshake, frame parsing, and data transmission through custom logic. This is the core way to understand the WebSocket protocol. The following native implementation breaks down the code logic corresponding to each principle, without relying on any third-party libraries, getting straight to the essence of the protocol.
1.3 Native Implementation Breakdown (Principles Corresponding to Code)
The following code implements the handshake upgrade and the sending and receiving of short text frames, including unmasking of client payloads (without that step, received text is garbled). Each step is annotated with the principle and frame field it corresponds to. It is a teaching implementation: it assumes that each 'data' event carries exactly one complete frame and handles only payloads of up to 125 bytes, so fragmentation, extended lengths, Ping/Pong, close frames, and reassembly of TCP chunks still need to be added for production use.
const http = require('http');
const crypto = require('crypto');

// Create an HTTP server (the WebSocket handshake is an HTTP request, so an HTTP server must run first)
const server = http.createServer((req, res) => {
  res.writeHead(200);
  res.end('Non-WebSocket request');
});

// Listen for the 'upgrade' event: corresponds to the [Handshake Upgrade] principle
// Node.js emits this event when a request asks for a protocol upgrade
server.on('upgrade', (req, socket, head) => {
  // 1. Verify the upgrade request (the Upgrade value is case-insensitive, and the key is required)
  const secWebSocketKey = req.headers['sec-websocket-key']; // Client's random key
  if ((req.headers.upgrade || '').toLowerCase() !== 'websocket' || !secWebSocketKey) {
    socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
    return;
  }

  // 2. Generate the handshake response value
  // (Principle: proves that the server understood the WebSocket handshake; it is not authentication)
  const magicString = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'; // Fixed GUID from RFC 6455
  const hash = crypto.createHash('sha1')
    .update(secWebSocketKey + magicString) // Concatenate the key with the fixed string
    .digest('base64'); // The client verifies this value

  // 3. Send the 101 response to complete the handshake upgrade (Principle: HTTP switches to WebSocket)
  const responseHeaders = [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket', // Confirm upgrade to WebSocket
    'Connection: Upgrade', // Confirm the connection-level upgrade
    `Sec-WebSocket-Accept: ${hash}`, // The connection is established if the client verifies this value
    '', // Two empty entries produce the blank line ("\r\n\r\n") that ends the header block
    ''
  ];
  socket.write(responseHeaders.join('\r\n'));

  // 4. Listen for socket data and parse WebSocket frames (corresponds to the [Frame Format Communication] principle)
  // Simplification: each 'data' event is assumed to contain exactly one complete frame
  socket.on('data', (buffer) => {
    if (buffer.length < 2) return; // Not even a complete frame header
    const fin = (buffer[0] & 0x80) === 0x80;
    const opcode = buffer[0] & 0x0F;
    const hasMask = (buffer[1] & 0x80) === 0x80;
    const payloadLen = buffer[1] & 0x7F;
    let payloadStart = 2; // Default data start position (after the 2-byte frame header)

    // This example only handles short payloads; 126 and 127 signal extended lengths
    if (payloadLen > 125) {
      console.warn('Extended payload lengths are not handled in this example');
      return;
    }

    // Step 1: Extract the masking key (frames from the client always carry one)
    let maskKey = null;
    if (hasMask) {
      maskKey = buffer.subarray(payloadStart, payloadStart + 4);
      payloadStart += 4; // Skip the 4-byte masking key
    }

    // Step 2: Unmask the payload (XOR each byte with the key byte at index i % 4)
    const payloadBuffer = Buffer.from(buffer.subarray(payloadStart, payloadStart + payloadLen));
    if (maskKey) {
      for (let i = 0; i < payloadBuffer.length; i++) {
        payloadBuffer[i] ^= maskKey[i % 4];
      }
    }
    const payload = payloadBuffer.toString('utf8');

    // Only process complete text frames
    if (opcode === 0x1 && fin) {
      console.log('Received:', payload);
      // Build a response frame to send back (server-to-client frames are not masked)
      const payloadBytes = Buffer.from(payload, 'utf8'); // The length field counts bytes, not characters
      const responseBuffer = Buffer.alloc(2 + payloadBytes.length);
      responseBuffer[0] = 0x81; // FIN = 1, opcode = 0x1 (text)
      responseBuffer[1] = payloadBytes.length; // MASK = 0, 7-bit length (at most 125 here)
      payloadBytes.copy(responseBuffer, 2);
      socket.write(responseBuffer);
    }
  });

  // Connection close and error handling to avoid resource leaks
  socket.on('close', () => {
    console.log('WebSocket connection closed');
  });
  socket.on('error', (err) => {
    console.error('WebSocket error:', err);
  });
});

server.listen(8080, () => {
  console.log('WebSocket server running on ws://localhost:8080');
});
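The echo logic above writes the payload length into a single byte, which only works for payloads of up to 125 bytes. A possible encoder that also covers the 16-bit and 64-bit length forms might look like the sketch below; `encodeTextFrame` is a name invented for this example, and it produces unmasked, unfragmented frames only.

```javascript
// encodeTextFrame is a hypothetical helper: it builds one unmasked,
// unfragmented text frame (FIN = 1, opcode = 0x1) for server-to-client use.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8'); // length is counted in bytes
  const len = payload.length;
  let header;
  if (len <= 125) {
    header = Buffer.from([0x81, len]);       // length fits in 7 bits
  } else if (len <= 0xffff) {
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;                          // marker: 16-bit length follows
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;                          // marker: 64-bit length follows
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame('Hello').toString('hex')); // prints "810548656c6c6f"
```

In the server, `socket.write(encodeTextFrame(payload))` could then replace the manual construction of the response buffer.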
Client test (browser console):
const ws = new WebSocket('ws://localhost:8080');
ws.onmessage = (e) => console.log('Received:', e.data); // Receive server response
ws.onopen = () => {
  console.log('Connected');
  ws.send('Hello WebSocket'); // Send only after the connection is open, otherwise send() throws
};
1.4 WebSocket Documentation and Protocol Standards
- Protocol Specification: RFC 6455 - The WebSocket Protocol (IETF standard; core sections: Section 4 Opening Handshake and Section 5 Data Framing, the most authoritative basis for the frame principles and implementation).
- Browser API Specification: WebSockets Standard (WHATWG).
- MDN Guide: WebSocket API - Web API | MDN
2. SSE Protocol: Server-to-Client One-Way Push
2.1 Core Protocol Principles (Simplified Explanation)
SSE is a lightweight communication method in which the server sends data one way and the client only receives it. It is as if the server opened a real-time broadcast channel for the client, which suits scenarios that need no client feedback (such as notifications and market data). The core logic is simpler than WebSocket and is built on a long-lived HTTP connection:
- One-Way Communication: A "Read-Only Pipeline". The client sends a single GET request. After receiving it, the server does not close the connection but keeps pushing data to the client over it. The client cannot send data back through this connection; if feedback is needed, it must send a separate HTTP request.
- Fixed Data Format: The "Rules for the Server to Send Messages". The data pushed by the server must meet two requirements: the `Content-Type` response header must be `text/event-stream` (telling the client that this is an SSE stream), and each message must have the form "data: content\n\n" (the blank line marks the end of a message). Custom event names and message IDs are also supported.
- Automatic Reconnection: Client "Self-Healing After Disconnection". If the connection drops unexpectedly (for example, on a server restart), the client's EventSource API retries automatically (the default interval is chosen by the browser, commonly around 3 seconds). If the server sends message IDs, the client reports the last ID it received in the `Last-Event-ID` request header when it reconnects, so the server can continue from the point of disconnection and achieve resumable transmission.
- Lightweight, No Upgrade: Based on the Native HTTP Stack. No protocol switch like WebSocket's is needed. SSE fully reuses HTTP, which makes it simple to implement and low in overhead. It suits one-way push scenarios that do not need extreme performance but do need to ship quickly.
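To make the message format concrete, here is a minimal sketch of a formatter for one SSE message. `formatSseEvent` is a name invented for this example. It emits one `data:` line per line of payload, because a raw newline inside a single `data:` line would otherwise end the field early.

```javascript
// formatSseEvent is an illustrative helper, not part of any Node.js API.
function formatSseEvent({ id, event, retry, data }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`); // reconnection delay in ms
  // A payload that contains newlines needs one "data:" line per line of text.
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  return lines.join('\n') + '\n\n'; // the blank line terminates the message
}

console.log(JSON.stringify(formatSseEvent({ id: 7, event: 'token', data: 'Hello\nworld' })));
// prints "id: 7\nevent: token\ndata: Hello\ndata: world\n\n"
```

A server would pass the returned string to `res.write()` on a response whose `Content-Type` is `text/event-stream`.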
Server-Sent Events (SSE) is a one-way communication mechanism based on HTTP that only supports pushing data from the server to the client. It suits real-time notifications, market data updates, and other scenarios that need no client feedback. Its core features are:
- One-Way Communication: Over a long-lived HTTP connection, the client issues a single GET request and the server keeps the connection open to push data continuously. The client cannot send data to the server over this connection (for bidirectional needs, it can be complemented with ordinary HTTP requests).
- Data Format: The response must have the content type `text/event-stream`. Each message consists of one or more lines that start with `data:` and ends with `\n\n`. Optional fields carry the event name (`event:`), the message ID (`id:`), and the retry time (`retry:`).
- Automatic Reconnection: The client (EventSource API) reconnects automatically after the connection drops (the default interval is chosen by the browser, commonly around 3 seconds). The `retry:` field customizes the reconnection interval in milliseconds.
- Lightweight: No protocol upgrade is required. SSE runs on the existing HTTP stack, which makes it simpler to implement than WebSocket.
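The field rules above can also be read from the receiving side. The sketch below is a simplified parser for event-stream text, roughly mirroring what EventSource does internally. `parseSseEvents` is a hypothetical name, and the function assumes its input contains only complete messages, so it does not buffer partial chunks.

```javascript
// parseSseEvents is a hypothetical helper; it assumes `text` holds complete messages only.
function parseSseEvents(text) {
  const events = [];
  let lastEventId = '';
  let retry;            // stays undefined unless a valid retry: field is seen
  let eventName = '';
  let dataLines = [];

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === '') {
      // A blank line dispatches the message, provided at least one data line was seen.
      if (dataLines.length > 0) {
        events.push({ event: eventName || 'message', id: lastEventId, data: dataLines.join('\n') });
      }
      eventName = '';
      dataLines = [];
      continue;
    }
    if (line.startsWith(':')) continue; // comment line, often used as a keep-alive
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // one leading space is dropped
    if (field === 'data') dataLines.push(value);
    else if (field === 'event') eventName = value;
    else if (field === 'id') lastEventId = value;      // persists across messages
    else if (field === 'retry' && /^\d+$/.test(value)) retry = Number(value);
  }
  return { events, lastEventId, retry };
}

console.log(JSON.stringify(parseSseEvents('id: 1\ndata: a\ndata: b\n\n').events));
// prints [{"event":"message","id":"1","data":"a\nb"}]
```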
2.2 Implementing SSE with Node.js http Module
SSE does not require third-party libraries and can be directly implemented using the Node.js http module. The core is to set the correct response headers and continuously push formatted data.
const http = require('http');

const server = http.createServer((req, res) => {
  // Only handle requests to the /sse path as the SSE connection entry point
  if (req.url === '/sse') {
    // Step 1: Set the core SSE response headers (corresponds to the "Fixed Data Format" principle)
    res.writeHead(200, {
      'Content-Type': 'text/event-stream', // Must be set to this type for the client to recognize it as SSE
      'Cache-Control': 'no-cache', // Disable caching so that stale events are not replayed
      'Connection': 'keep-alive', // Keep the HTTP long connection, do not close immediately
      'Access-Control-Allow-Origin': '*' // Cross-origin support (restrict domains as needed in actual projects)
    });

    // Step 2: Handle resumable transmission (corresponds to the "Automatic Reconnection" principle)
    // When EventSource reconnects, it sends the Last-Event-ID header with the ID of the last message it received
    const lastEventId = Number.parseInt(req.headers['last-event-id'], 10);
    console.log('Last Event ID:', req.headers['last-event-id'] || '(none)');
    let eventId = (Number.isNaN(lastEventId) ? 0 : lastEventId) + 1; // Continue numbering after the last received ID

    // Step 3: Push messages at regular intervals (simulating real-time data, embodying "one-way continuous push")
    const interval = setInterval(() => {
      const data = {
        time: new Date().toISOString(),
        content: `SSE message #${eventId}`
      };
      // Build the SSE message: id (optional) + data (required) + a blank line that ends the message
      const message = `id: ${eventId}\n` + `data: ${JSON.stringify(data)}\n\n`;
      res.write(message); // Push the message to the client
      eventId++;

      // Simulate connection closure (optional, can be based on business logic in actual scenarios)
      if (eventId > 10) {
        clearInterval(interval);
        res.write('event: close\ndata: Connection closed\n\n'); // Custom close event
        res.end();
      }
    }, 1000);

    // Step 4: Clean up when the connection ends, whether the client left or the server called end()
    res.on('close', () => {
      clearInterval(interval);
      console.log('SSE connection closed');
    });
  } else {
    // Non-SSE request, return a test page (including client-side EventSource logic)
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(`
<!DOCTYPE html>
<html>
<body>
<div id="messages"></div>
<script>
  // Append text as a paragraph; textContent avoids interpreting the data as HTML
  const append = (text) => {
    const p = document.createElement('p');
    p.textContent = text;
    document.getElementById('messages').appendChild(p);
  };
  const eventSource = new EventSource('/sse');
  eventSource.onmessage = (e) => append(e.data);
  eventSource.addEventListener('close', () => {
    append('Connection closed by server');
    eventSource.close(); // Without this, EventSource would reconnect automatically
  });
</script>
</body>
</html>
`);
  }
});

server.listen(8081, () => {
  console.log('SSE server running on http://localhost:8081');
});
Test method: Visit http://localhost:8081, and you can see a message pushed by the server every second. The connection will automatically close after 10 messages.
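The example server only continues the ID counter after a reconnection; it does not resend anything the client missed. One way to get closer to resumable transmission is to keep recent events in memory and replay those newer than `Last-Event-ID`. The sketch below illustrates the idea. The `EventHistory` class, its method names, and the buffer size are assumptions made for this example.

```javascript
// EventHistory is a hypothetical class: a bounded in-memory log of recent events.
class EventHistory {
  constructor(limit = 100) {
    this.limit = limit;
    this.events = []; // ascending by numeric id
    this.nextId = 1;
  }

  // Record an event and return it together with its assigned id.
  add(data) {
    const entry = { id: this.nextId++, data };
    this.events.push(entry);
    if (this.events.length > this.limit) this.events.shift(); // drop the oldest
    return entry;
  }

  // Events the client has not seen, given the raw Last-Event-ID header value.
  since(lastEventIdHeader) {
    const lastId = Number.parseInt(lastEventIdHeader, 10);
    if (Number.isNaN(lastId)) return [...this.events]; // first connection: replay all
    return this.events.filter((e) => e.id > lastId);
  }
}

const history = new EventHistory(3);
['a', 'b', 'c', 'd'].forEach((d) => history.add(d));
console.log(history.since('2').map((e) => e.id)); // prints [ 3, 4 ]
```

On each new `/sse` request, the server could first write the entries returned by `history.since(req.headers['last-event-id'])` and then continue with live events. Events older than the buffer are lost, so the limit has to match how long clients may stay disconnected.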
2.3 SSE Documentation and Protocol Standards
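- Protocol Specification: HTML Living Standard, "Server-sent events" section (WHATWG; it superseded the earlier W3C Server-Sent Events specification and defines the event stream format, the EventSource interface, and the reconnection mechanism). SSE itself is not defined by an IETF RFC; RFC 8895 is an ALTO extension that uses SSE as its transport.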
- MDN Guide: MDN Server-Sent Events (Detailed explanation of the client-side EventSource API and message format).
3. WebSocket vs. SSE: Comparison and Applicable Scenarios
| Feature | WebSocket | SSE |
|---|---|---|
| Communication Direction | Full-duplex (bidirectional) | One-way (server → client) |
| Protocol Basis | HTTP handshake upgrade to independent protocol | HTTP long connection, no protocol upgrade |
| Reconnection Mechanism | Requires manual implementation (e.g., heartbeat detection) | Client EventSource automatic reconnection |
| Data Format | Binary/text frames, flexible and efficient | Text only (text/event-stream) |
| Applicable Scenarios | Real-time chat, collaborative editing, gaming | Real-time notifications, market data push, log streams |
4. Notes
- WebSocket Cross-Origin: Browsers do not apply the same-origin policy to WebSocket connections, so the server should check the `Origin` request header during the handshake and reject origins it does not trust. The check can also be done in a reverse proxy such as Nginx.
- SSE Caching Issue: `Cache-Control: no-cache` must be set, otherwise the client or an intermediary may cache the pushed data.
- Production Environment Optimization: WebSocket services need to plan for many concurrent long-lived connections (in production, a maintained library such as `ws` is usually preferable to a hand-written implementation), and SSE should limit the duration of a single connection and clear its timers on close to avoid resource leaks.
- Compatibility: WebSocket is supported by all modern browsers. SSE is not supported in IE (an EventSource polyfill can fill the gap).
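As a sketch of the Origin check mentioned in the notes, the helper below compares the handshake's `Origin` header against an allowlist. `ALLOWED_ORIGINS`, `isAllowedOrigin`, and the example domain are placeholders, and rejecting requests without an `Origin` header is a policy choice of this sketch rather than a protocol requirement.

```javascript
// ALLOWED_ORIGINS and isAllowedOrigin are hypothetical names for illustration.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

function isAllowedOrigin(req, allowed = ALLOWED_ORIGINS) {
  const origin = req.headers.origin;
  // Non-browser clients may omit Origin; this sketch rejects them as well.
  return typeof origin === 'string' && allowed.has(origin);
}

// Intended use inside the 'upgrade' handler, before the 101 response is written:
// if (!isAllowedOrigin(req)) {
//   socket.end('HTTP/1.1 403 Forbidden\r\n\r\n');
//   return;
// }

console.log(isAllowedOrigin({ headers: { origin: 'https://app.example.com' } })); // prints true
console.log(isAllowedOrigin({ headers: { origin: 'https://evil.example' } }));    // prints false
```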
Team Introduction
"Smart Home Technology Platform - Application Software Framework Development" is primarily responsible for the research and development of design tools, including marketing design tools, home appliance VR design and display, water, electricity, HVAC, and pre-design capabilities. It researches and develops material libraries, builds home furnishing material libraries, integrates unit libraries, full-category product libraries, design plan libraries, and production process models, creating AI design capabilities based on unit types and styles for rapid generation of quantity takeoffs and quotations. It also develops the store designer center and project center, including designer management capabilities and project manager management capabilities. It achieves full lifecycle management of scenarios and provides business opportunity management tools for industries such as water, air, and kitchen, thereby realizing a B-end to C-end full-process system centered on scenarios.