Architecture · Frontend · Backend

From Single Server to Multi-Server: A Full-Stack Developer's Production Architecture Upgrade

By Ausra无忧
Read the original on juejin.cn.

This is a real-world blueprint for scaling a Node.js application from a single server to a production-grade, multi-server architecture. It takes a pragmatic, cloud-native approach that developers facing similar growth pains can adapt directly, relying on managed services to eliminate single points of failure and simplify operations.

Summary

As daily active users of an online education platform grew from a few hundred to more than 5,000, its single 4-core, 16GB server, running a Node.js + Nest.js + MySQL + Redis stack, began to buckle. API response times ballooned from 200ms to more than 2 seconds, memory warnings became routine, and WebSocket connections kept dropping, degrading the user experience.

The developer's response was a full architecture overhaul on Volcengine, centered on the principle of separating compute from storage. The new setup deploys two cloud servers across different availability zones (A and B) behind a public load balancer. MySQL and Redis were moved off the application servers entirely and into managed, highly available cloud services.
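With MySQL and Redis moved off the application servers, each server has to be told where the managed services live. Below is a minimal sketch, assuming the endpoints are injected as environment variables; the variable names (`DB_HOST`, `DB_PORT`, `REDIS_HOST`, `REDIS_PORT`) and the `loadStorageConfig` helper are hypothetical, not taken from the original post.

```typescript
// Sketch: both application servers load the same managed-service endpoints
// from the environment instead of assuming MySQL/Redis run on localhost.
// All variable names here are hypothetical.
type Env = Record<string, string | undefined>;

interface StorageConfig {
  dbHost: string;    // assumed: private endpoint of the database proxy
  dbPort: number;
  redisHost: string; // assumed: private endpoint of the managed Redis instance
  redisPort: number;
}

const LOOPBACK = new Set(["localhost", "127.0.0.1", "::1"]);

function requireHost(env: Env, key: string): string {
  const value = env[key];
  if (!value) {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  // Compute and storage are separated, so a loopback address is a misconfiguration.
  if (LOOPBACK.has(value)) {
    throw new Error(`${key} points at the local machine: ${value}`);
  }
  return value;
}

function readPort(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (!raw) return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port in ${key}: ${raw}`);
  }
  return port;
}

function loadStorageConfig(env: Env): StorageConfig {
  return {
    dbHost: requireHost(env, "DB_HOST"),
    dbPort: readPort(env, "DB_PORT", 3306),     // MySQL default port
    redisHost: requireHost(env, "REDIS_HOST"),
    redisPort: readPort(env, "REDIS_PORT", 6379), // Redis default port
  };
}
```

On each server this would be called once at startup, e.g. `loadStorageConfig(process.env)`, so that a misconfigured instance fails fast instead of quietly connecting to a leftover local database.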

The MySQL setup is particularly thorough: a primary instance in one zone, a read replica in another for read-heavy workloads like an AI teaching assistant, and a database proxy to automatically handle read/write splitting without any code changes. The post also covers Nginx configuration for serving the frontend and proxying API and WebSocket traffic, DNS setup pointing to the load balancer, and a detailed future roadmap including Dockerization, Kubernetes, HTTPS, monitoring, and performance testing.
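A database proxy makes a routing decision for every statement it receives. The sketch below is a simplified approximation, not the logic of Volcengine's proxy: it assumes one primary and one replica, as in this setup, and `routeStatement`, `Session`, and the `courses` table are hypothetical names. Real proxies parse SQL properly and apply their own, provider-specific rules.

```typescript
type Target = "primary" | "replica";

// Per-connection state the proxy tracks (hypothetical, simplified).
interface Session {
  inTransaction: boolean;
}

// Decide where one SQL statement goes. Keyword matching is a deliberate
// simplification; it ignores comments, CTEs and many other edge cases.
function routeStatement(sql: string, session: Session): Target {
  const text = sql.trim().toUpperCase();
  const keyword = text.split(/\s+/)[0] ?? "";

  if (keyword === "BEGIN" || keyword === "START") {
    session.inTransaction = true;
    return "primary";
  }
  if (keyword === "COMMIT" || keyword === "ROLLBACK") {
    session.inTransaction = false;
    return "primary";
  }
  // Everything inside a transaction stays on the primary so it sees its own writes.
  if (session.inTransaction) return "primary";

  // Locking reads must hit the primary as well.
  const isLockingRead = /\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b/.test(text);
  if (keyword === "SELECT" && !isLockingRead) return "replica";

  // INSERT, UPDATE, DELETE, DDL and anything unrecognised go to the primary.
  return "primary";
}

const session: Session = { inTransaction: false };
console.log(routeStatement("SELECT * FROM courses", session));          // replica
console.log(routeStatement("UPDATE courses SET title = 'x'", session)); // primary
console.log(routeStatement("BEGIN", session));                          // primary
console.log(routeStatement("SELECT * FROM courses", session));          // primary (inside transaction)
console.log(routeStatement("COMMIT", session));                         // primary
```

Pinning transactions to the primary is part of what lets a proxy split traffic without application changes: code that writes and then reads inside a transaction still sees its own writes.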

Takeaways
The architecture upgrade was driven by a user base growing from hundreds to over 5,000 daily active users, causing response times to spike from 200ms to over 2 seconds.
The core design principle was separating compute (application servers) from storage (database and cache).
Two cloud server instances (4 vCPU, 16GB RAM) were deployed in different availability zones for high availability.
A public load balancer distributes traffic between the two servers and performs health checks.
MySQL was moved to a managed service with a primary instance, a read replica in a different zone, and a database proxy for automatic read/write splitting.
Redis was moved to a managed primary-standby instance for high availability.
Nginx on each server handles frontend hosting, API proxying, and WebSocket forwarding.
The developer's future plans include Dockerizing the application, implementing HTTPS, setting up comprehensive monitoring and alerting, and conducting performance stress tests.
Conclusions

The most valuable part of this post isn't the cloud setup itself, but the clear decision-making process: the developer didn't just add servers, they fundamentally decoupled the database and cache from the application tier, which is the correct first step for scaling.
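One reason this step has to come first is easy to demonstrate. Behind a load balancer, two requests from the same user can land on different servers, so state held in one process's memory is invisible to the other. The post does not say how sessions are stored; the simulation below only illustrates the general principle. It is a minimal sketch assuming a simple key-value interface, and `KeyValueStore`, `MapStore`, and `AppInstance` are hypothetical names, with `MapStore` standing in for a shared service such as the managed Redis instance.

```typescript
// Hypothetical key-value interface; in production a Redis client
// would sit behind something like this.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// In-memory stand-in. Sharing one MapStore between app instances models an
// external store; giving each instance its own models per-process memory.
class MapStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
}

class AppInstance {
  private readonly store: KeyValueStore;
  constructor(store: KeyValueStore) {
    this.store = store;
  }
  login(sessionId: string, userId: string): void {
    this.store.set(`session:${sessionId}`, userId);
  }
  whoAmI(sessionId: string): string | undefined {
    return this.store.get(`session:${sessionId}`);
  }
}

// Two consecutive requests from the same user, spread across both
// instances the way a round-robin load balancer might send them.
function simulate(a: AppInstance, b: AppInstance): string | undefined {
  a.login("session-abc", "user-42"); // request 1 reaches server A
  return b.whoAmI("session-abc");    // request 2 reaches server B
}

const sharedStore = new MapStore();
const separated = simulate(new AppInstance(sharedStore), new AppInstance(sharedStore));
const local = simulate(new AppInstance(new MapStore()), new AppInstance(new MapStore()));
console.log(separated, local); // prints: user-42 undefined
```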

The use of a database proxy for automatic read/write splitting is a smart, low-effort optimization. It offloads read traffic without requiring any application-level changes, a pattern many teams overlook.

The detailed 'future plans' section reveals a mature operational mindset. The developer isn't just building for today; they are planning for Dockerization, monitoring, cost alerts, and disaster recovery drills, which is the difference between a hobby project and a production service.

The choice to use a managed database service (RDS) rather than self-hosting MySQL on the servers is a strong signal. It trades a small amount of control for significant gains in operational simplicity, backup management, and high availability.

The fact that the developer initially struggled with VPC/subnet configuration and internal network connectivity is a relatable and important reminder that networking fundamentals are often the trickiest part of cloud migrations.

Concepts & terms
Compute-Storage Separation
An architectural pattern where the servers that run your application code (compute) are distinct from the infrastructure that stores your data (storage, like databases and caches). This allows each layer to be scaled independently and improves resilience.
Read Replica
A copy of a primary database that can serve read-only queries. It's used to offload read traffic from the primary database, improving performance for read-heavy applications. The primary database still handles all writes (INSERT, UPDATE, DELETE).
Database Proxy
A service that sits between an application and a database cluster. It provides a single connection endpoint and can automatically route queries to the correct node (e.g., sending SELECTs to a read replica and writes to the primary), enabling read/write splitting without application code changes.
Load Balancer Health Check
A mechanism where a load balancer periodically sends requests (e.g., to a `/health` endpoint) to backend servers. If a server fails to respond correctly, the load balancer stops sending it traffic, ensuring users only reach healthy instances.
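Many load balancers add hysteresis to this check: a backend is removed only after several consecutive failed probes and restored only after several consecutive successes, so a single slow response does not flap it in and out of rotation. Below is a minimal sketch of that bookkeeping; the `Backend` class, the `HealthPolicy` shape, and the thresholds (3 failures, 2 successes) are hypothetical, and the real values are whatever the load balancer's health-check settings specify.

```typescript
interface HealthPolicy {
  unhealthyThreshold: number; // consecutive failures before removal
  healthyThreshold: number;   // consecutive successes before re-adding
}

// Tracks one backend's health from the results of periodic probes
// (e.g. HTTP GET /health). Probing itself is not modeled here.
class Backend {
  readonly address: string;
  healthy = true;
  private failures = 0;
  private successes = 0;

  constructor(address: string) {
    this.address = address;
  }

  // Record the outcome of one health probe and update the state.
  record(probeOk: boolean, policy: HealthPolicy): void {
    if (probeOk) {
      this.successes += 1;
      this.failures = 0;
      if (!this.healthy && this.successes >= policy.healthyThreshold) {
        this.healthy = true;
      }
    } else {
      this.failures += 1;
      this.successes = 0;
      if (this.healthy && this.failures >= policy.unhealthyThreshold) {
        this.healthy = false;
      }
    }
  }
}

// Only healthy backends are eligible to receive traffic.
function eligible(backends: Backend[]): string[] {
  return backends.filter((b) => b.healthy).map((b) => b.address);
}

const policy: HealthPolicy = { unhealthyThreshold: 3, healthyThreshold: 2 };
const serverA = new Backend("server-a");
const serverB = new Backend("server-b");

// Server B fails three probes in a row (say, its Node.js process hung).
for (let i = 0; i < 3; i++) {
  serverA.record(true, policy);
  serverB.record(false, policy);
}
console.log(eligible([serverA, serverB])); // [ 'server-a' ]
```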