Concurrency Models — Node.js vs WSGI vs Threads vs Go
Understanding how different server runtimes handle concurrent requests is a common senior interview topic.
The Core Problem: Handling 10,000 Concurrent Connections
Each HTTP connection is a file descriptor (socket).
A modern OS can handle millions of open file descriptors.
Problem: how does your server handle 10k simultaneous requests?
Option 1: One thread per request → 10k threads → memory explosion, context switching
Option 2: Thread pool → bounded threads, requests queue
Option 3: Async I/O / Event Loop → one thread, never blocks
Option 4: Lightweight coroutines → green threads / goroutines
Thread-Per-Request Model (Apache, Java Tomcat, Rails + Puma)
┌─────────┐  request  ┌─────────────┐  spawn/assign   ┌──────────────┐
│ Client  │ ─────────→│   Server    │ ──────────────→ │   Thread 1   │ ← handles req1
└─────────┘           └─────────────┘                 ├──────────────┤
                                                      │   Thread 2   │ ← handles req2
                                                      ├──────────────┤
                                                      │   Thread 3   │ ← handles req3
                                                      └──────────────┘
Each thread:
- Stack: 512KB - 8MB per thread
- Context switch: ~5µs of CPU time
- 10,000 threads = ~5-80GB of address space reserved just for stacks (10k × 512KB ≈ 5GB, 10k × 8MB ≈ 80GB)
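The stack figures above are simple multiplication. A minimal sketch of the arithmetic, assuming every thread reserves its full stack (in practice the OS usually commits stack pages lazily, so resident memory tends to be lower than the reservation). The helper name `stack_memory_gb` is made up for this illustration.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def stack_memory_gb(threads: int, stack_bytes: int) -> float:
    """Total stack reservation in GiB if every thread gets stack_bytes."""
    return threads * stack_bytes / GB

low = stack_memory_gb(10_000, 512 * KB)   # smallest stack size listed above
high = stack_memory_gb(10_000, 8 * MB)    # largest stack size listed above
print(f"10,000 threads reserve between {low:.1f} and {high:.1f} GiB of stack")
```

Either end of the range is far more than the work the threads do while they sit waiting on I/O, which is the motivation for the other three options.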
Thread pool (fixed size):
Pool of N threads. Incoming requests queue. When thread finishes → picks next.
Tomcat default: 200 threads (maxThreads); the JVM itself sets no such limit
Max concurrent = pool size
Requests beyond pool size queue or get rejected (503)
// Java/Spring Boot: thread-per-request by default
@RestController
public class OrderController {
    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable Long id) {
        // Each request runs in its own thread from the thread pool
        Order order = db.findById(id); // BLOCKS the thread while waiting for DB
        return order;
    }
}
// Thread is idle during DB wait: not using CPU, but still occupying stack memory
WSGI: Python's Synchronous Server Interface
WSGI = Web Server Gateway Interface (PEP 3333)
Standard interface between Python web apps and web servers.
Request handling:
1. Web server (Gunicorn/uWSGI) receives HTTP request
2. Calls WSGI app as a callable: app(environ, start_response)
3. App returns an iterable of response body chunks
4. Everything is SYNCHRONOUS — no async/await in WSGI
# WSGI app
def application(environ, start_response):
    status = '200 OK'
    headers = [('Content-Type', 'text/plain')]
    start_response(status, headers)
    return [b'Hello World']
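To make steps 2 and 3 concrete, the sketch below plays the server's role and calls a WSGI app by hand. It uses only the standard library (`wsgiref.util.setup_testing_defaults` builds a minimal request environ); `call_wsgi` is a hypothetical helper for this illustration, not part of Gunicorn or uWSGI.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """Same shape as the app above: status and headers first, then an iterable of bytes."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World']

def call_wsgi(app):
    """Play the server's role: build environ, capture the status, join the body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid GET request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body

status, headers, body = call_wsgi(application)
```

The whole exchange is one synchronous function call, which is why a worker cannot pick up another request until the call returns.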
Gunicorn worker types:
sync (default): one thread per worker, blocks on I/O
gthread: threads per worker (thread-per-request within each worker)
gevent: monkey-patches stdlib with greenlets (async via greenlets)
eventlet: similar to gevent
Typical production: gunicorn --workers 4 --worker-class sync
4 sync workers = at most 4 simultaneous requests (one per worker process)
Each worker is a forked Python process
Formula: workers = 2 × CPU_cores + 1
WSGI Concurrency Model:
┌──────────────┐   HTTP   ┌───────────────────────────────────┐
│    nginx     │ ────────→│ Gunicorn (pre-fork model)         │
│   (reverse   │          │                                   │
│    proxy)    │          │ Worker 1 (PID 1234): req 1        │
└──────────────┘          │ Worker 2 (PID 1235): req 2        │
                          │ Worker 3 (PID 1236): req 3        │
                          │ Worker 4 (PID 1237): req 4        │
                          │ req 5: WAITING (all workers busy) │
                          └───────────────────────────────────┘
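The queueing behaviour in the diagram can be imitated with a bounded pool. This is only a sketch: threads stand in for Gunicorn's worker processes, `time.sleep` stands in for a blocking DB or API call, and `run_requests` is a made-up helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_requests(workers: int, requests: int, service_time: float) -> int:
    """Run `requests` blocking jobs on `workers` threads; return peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def handle(_request_id: int) -> None:
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(service_time)  # stands in for a blocking DB or API call
        with lock:
            state["active"] -= 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, range(requests)))  # wait for every job to finish
    return state["peak"]

peak = run_requests(workers=4, requests=5, service_time=0.1)
print(f"peak concurrent requests: {peak}")
```

With 4 workers and 5 requests the peak never exceeds 4: the fifth request sits in the queue until a worker frees up.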
With sync workers:
req doing: db query (10ms) + external API (200ms) + response (1ms)
Worker is BLOCKED for 211ms — not usable by other requests
4 workers → max ~4/0.211 ≈ 19 requests per second
ASGI: Python's Async Interface
ASGI = Asynchronous Server Gateway Interface
Supports async/await, WebSockets, and HTTP/2.
Frameworks: FastAPI, Django 3.1+ (async views), Starlette
async def application(scope, receive, send):
    if scope['type'] == 'http':
        event = await receive()  # an 'http.request' event dict; the body is event['body']
        await send({'type': 'http.response.start', 'status': 200,
                    'headers': [(b'content-type', b'text/plain')]})
        await send({'type': 'http.response.body', 'body': b'Hello'})
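The ASGI call can also be driven by hand, which shows what a server such as uvicorn does around the app. A minimal sketch: `call_asgi` is a hypothetical helper, and the `scope` dict is deliberately simplified (a real server supplies more keys).

```python
import asyncio

async def application(scope, receive, send):
    """Minimal ASGI HTTP app: read one request event, send a two-part response."""
    if scope['type'] == 'http':
        await receive()  # an 'http.request' event dict carrying the body
        await send({
            'type': 'http.response.start',
            'status': 200,
            'headers': [(b'content-type', b'text/plain')],
        })
        await send({'type': 'http.response.body', 'body': b'Hello'})

async def call_asgi(app):
    """Play the server's role with in-memory receive/send callables."""
    sent = []

    async def receive():
        return {'type': 'http.request', 'body': b'', 'more_body': False}

    async def send(event):
        sent.append(event)

    # Simplified scope: only the keys this app looks at, plus a few common ones.
    scope = {'type': 'http', 'method': 'GET', 'path': '/', 'headers': []}
    await app(scope, receive, send)
    return sent

events = asyncio.run(call_asgi(application))
```

Every `await` is a point where the event loop may switch to another request, which is what lets one worker hold many requests in flight.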
Servers: uvicorn (uses uvloop), hypercorn, daphne
With async:
Same 4 workers, but each worker can handle THOUSANDS of concurrent requests
if they're I/O bound (DB, external APIs)
Worker 1 running 500 async coroutines — all suspended waiting for I/O
when one gets its I/O result, it runs briefly, suspends again
Node.js: Single-Threaded Event Loop
┌───────────────────────────────────────────────────────┐
│ Node.js Process                                       │
│                                                       │
│ Single thread runs JS (your code)                     │
│                                                       │
│ Event Loop: poll I/O completions → run callbacks      │
│                                                       │
│ libuv thread pool (4 threads): handles blocking I/O   │
│   fs.readFile, crypto, dns.lookup                     │
│                                                       │
│ OS async I/O: TCP, UDP → no threads needed            │
└───────────────────────────────────────────────────────┘
10,000 HTTP connections:
Each connection = a socket file descriptor (just an integer)
OS monitors all 10k sockets via epoll/kqueue
When data arrives → Node.js callback fires
Your JS code runs briefly, suspends at next I/O → back to event loop
Memory per connection: ~2KB (socket buffer) vs 512KB-8MB (thread stack)
// Node.js HTTP server: handles 10k concurrent connections in one thread
import http from 'http';
import { Pool } from 'pg';
const pool = new Pool({ max: 20 }); // only 20 actual DB connections needed!
const server = http.createServer(async (req, res) => {
  // This callback runs in the single JS thread,
  // but it SUSPENDS during await, letting other requests run
  try {
    const result = await pool.query('SELECT * FROM orders WHERE id = $1', [1]);
    // ↑ suspends here: the event loop handles other requests
    // ↑ the pool has only 20 connections for potentially 10,000 concurrent requests!
    // ↑ other requests wait in the pool's queue until a connection frees up
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result.rows));
  } catch (err) {
    // Without this, a failed query becomes an unhandled promise rejection
    res.writeHead(500, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: err.message }));
  }
});
server.listen(3000);
// This single process handles 10k connections efficiently!
What Blocks Node.js
// ❌ This BLOCKS the event loop: ALL requests stall
app.get('/compute', (req, res) => {
  // Synchronous CPU-intensive work: the event loop cannot process other requests
  const result = fibonacci(45); // naive recursion can take 10+ seconds
  res.json({ result });
});
// ✅ Offload to a worker thread:
import { Worker } from 'worker_threads';
app.get('/compute', (req, res) => {
  const worker = new Worker('./fibonacci-worker.js', {
    workerData: { n: 45 }
  });
  worker.on('message', result => res.json({ result }));
  worker.on('error', err => res.status(500).json({ error: err.message }));
});
// fibonacci-worker.js (a separate file, so it needs its own imports):
import { parentPort, workerData } from 'worker_threads';
const fibonacci = n => (n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2));
parentPort.postMessage(fibonacci(workerData.n));
Go: Goroutines (M:N Threading)
Go uses goroutines: lightweight threads managed by the Go runtime.
OS threads (M): small number (1 per CPU core typically)
Goroutines (N): can have millions; only a few run at a time
Go runtime scheduler:
Maps N goroutines onto M OS threads
When goroutine blocks (I/O, channel, mutex):
Runtime parks it, runs another goroutine on that OS thread
Goroutine stack: starts at 2KB since Go 1.4 (dynamic, grows/shrinks)
OS thread stack: 1-8MB fixed
10,000 concurrent connections in Go:
10,000 goroutines, one per connection
Total stack: 10k × 2KB = 20MB (vs 10k × 1MB threads = 10GB)
Go scheduler runs them on 4-8 OS threads
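When goroutine does I/O → suspended → another goroutine runs
// Go HTTP server: each request gets its own goroutine
package main
import (
	"database/sql"
	"encoding/json"
	"log"
	"net/http"
)
// db must be opened with sql.Open and a registered driver before serving;
// that setup is omitted here.
var db *sql.DB
func handler(w http.ResponseWriter, r *http.Request) {
	// Runs in its own goroutine (lightweight thread)
	// Can use BLOCKING-style I/O: the scheduler suspends the goroutine automatically
	rows, err := db.QueryContext(r.Context(), "SELECT id FROM orders WHERE id = $1", 1)
	// ↑ looks synchronous, but the goroutine is suspended during the DB wait
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer rows.Close()
	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		ids = append(ids, id)
	}
	json.NewEncoder(w).Encode(ids) // encode scanned values; *sql.Rows itself has no JSON form
}
func main() {
	http.HandleFunc("/orders", handler)
	// Each connection → new goroutine → blocking I/O works naturally
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Comparison Table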
Model              | Runtime         | Concurrency Unit | Memory/Unit | I/O Style
───────────────────┼─────────────────┼──────────────────┼─────────────┼─────────────
Thread-per-request | Java/C#/Rails   | OS thread        | 1-8MB       | Blocking
WSGI (sync)        | Python+Gunicorn | Process/Thread   | ~50-100MB   | Blocking
WSGI (gevent)      | Python+gevent   | Greenlet         | ~few KB     | Patched
ASGI               | Python+uvicorn  | Coroutine        | ~few KB     | async/await
Node.js            | V8 + libuv      | Callback/Promise | ~2KB/conn   | async/await
Go                 | Go runtime      | Goroutine        | 2KB+        | Blocking*
Erlang/Elixir      | BEAM VM         | Erlang process   | ~2-3KB      | Message-pass
Rust (tokio)       | tokio async rt  | Future/Task      | ~few KB     | async/await
* Looks blocking, scheduler suspends automatically
When to Choose What
Node.js:
✓ I/O-bound workloads (REST APIs, BFF, proxies)
✓ Real-time (WebSockets, SSE)
✓ JSON-heavy APIs (V8 is fast at JSON)
✗ CPU-intensive computation (use worker threads or separate service)
✗ Heavy parallel computation (Go/Rust more efficient)
Python ASGI (FastAPI):
✓ ML/data science (Python ecosystem — numpy, pandas, torch)
✓ Rapid prototyping
✓ Data pipelines
✗ High-throughput APIs (Python interpreter overhead)
Go:
✓ High-throughput network services
✓ Systems programming (CLI tools, gRPC services)
✓ When you need true parallelism without manual thread management
✗ Smaller web framework ecosystem (JS/Python win here)
Java/JVM:
✓ Enterprise, complex business logic
✓ Reactive (Spring WebFlux) for async
✓ Strong ecosystem (Spring)
✗ Startup time (JVM warmup) — bad for serverless
✗ Memory (JVM overhead)
Serverless (Lambda):
✓ Sporadic traffic, extreme scaling
✓ No server management
✗ Cold starts
✗ Long-running tasks
✗ Stateful connections (WebSockets complicated)
The C10K Problem and Modern Solutions
C10K (10,000 concurrent connections): classic 1999 article by Dan Kegel.
Old approach: one thread per connection.
10k threads × 1MB stack = 10GB RAM. Plus context switch overhead.
Falls apart at scale.
Modern solutions:
1. Async I/O with event loop (Node.js, nginx, Redis)
OS handles I/O multiplexing (epoll on Linux, kqueue on macOS/BSD)
Single thread handles thousands of sockets
2. Lightweight concurrency primitives (Go goroutines, Erlang processes)
Thousands of "threads" with tiny stacks
Runtime scheduler, not OS scheduler
3. Reactor pattern (Java Netty, Vert.x)
Non-blocking NIO, event-driven, callbacks
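Solution 1 can be seen in miniature with Python's standard `selectors` module, which wraps epoll/kqueue/select. The sketch below is an illustration only: one thread watches three sockets and answers whichever becomes readable. `socket.socketpair()` stands in for real client connections, and `serve_ready` is a made-up helper.

```python
import selectors
import socket

def serve_ready(sel: selectors.BaseSelector, expected: int) -> int:
    """Single-threaded loop: wait for readable sockets, echo each message upper-cased."""
    handled = 0
    while handled < expected:
        events = sel.select(timeout=1.0)  # one call watches every registered socket
        if not events:
            break  # nothing became readable in time; stop instead of spinning
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(data.upper())
            handled += 1
    return handled

sel = selectors.DefaultSelector()
# socketpair() gives connected (server_side, client_side) sockets without a network.
pairs = [socket.socketpair() for _ in range(3)]
for server_side, _client in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

for i, (_server_side, client) in enumerate(pairs):
    client.sendall(f"req{i}".encode())

handled = serve_ready(sel, expected=len(pairs))
replies = [client.recv(1024) for _server_side, client in pairs]

sel.close()
for server_side, client in pairs:
    server_side.close()
    client.close()
```

The loop never dedicates a thread to a connection; idle sockets cost only their registration with the selector.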
Today we talk about C10M (10 million connections):
Needed by: live trading, game servers, IoT
Requires: kernel bypass (DPDK), RDMA, custom network stack
Normal backends don't need this
Common Interview Questions
Q: Why is Node.js good for I/O-bound but bad for CPU-bound?
Node.js runs JavaScript on a single thread. For I/O (network, disk), the handler yields while it waits, so the event loop is free to serve other requests. For CPU work (hashing, image processing, compression), the thread runs continuously, blocking all other requests. Solution: offload CPU work to worker threads, child processes, or a separate service.
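The I/O-bound half of that answer can be sketched with Python's asyncio, used here as a stand-in for the Node.js event loop. Assumptions: `asyncio.sleep` plays the database round trip, and an `asyncio.Semaphore` plays a 20-connection pool; the function names are invented for the example.

```python
import asyncio

async def handle_request(pool: asyncio.Semaphore, stats: dict) -> None:
    """One hypothetical request: borrow a 'DB connection', wait on I/O, return it."""
    async with pool:
        stats["in_use"] += 1
        stats["peak"] = max(stats["peak"], stats["in_use"])
        await asyncio.sleep(0.01)  # stands in for the database round trip
        stats["in_use"] -= 1
    stats["done"] += 1

async def main(requests: int, pool_size: int) -> dict:
    pool = asyncio.Semaphore(pool_size)
    stats = {"in_use": 0, "peak": 0, "done": 0}
    # All requests are in flight at once on a single thread.
    await asyncio.gather(*(handle_request(pool, stats) for _ in range(requests)))
    return stats

stats = asyncio.run(main(requests=1000, pool_size=20))
```

One thread carries 1,000 in-flight requests while never using more than 20 pooled connections, because a waiting request costs no CPU.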
Q: What is WSGI? How is it different from ASGI?
WSGI (PEP 3333, originally PEP 333) is Python's synchronous web server interface: app(environ, start_response). Each request blocks a worker thread/process until complete. ASGI is its async successor: async def app(scope, receive, send). ASGI supports async/await, WebSockets, and HTTP/2. FastAPI and Starlette use ASGI; Flask is WSGI-based, and Django supports both (ASGI since 3.0).
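Because a sync worker is held for the full request, throughput has a simple upper bound: workers divided by request time. A rough sketch using the 211 ms request profile from the sync-worker example earlier in these notes; `max_rps` and `workers_needed` are hypothetical helpers, and the 100 RPS target is an arbitrary illustration.

```python
import math

def max_rps(workers: int, request_seconds: float) -> float:
    """Upper bound on throughput when each worker handles one request at a time."""
    return workers / request_seconds

def workers_needed(target_rps: float, request_seconds: float) -> int:
    """Smallest number of one-at-a-time workers that can sustain target_rps."""
    return math.ceil(target_rps * request_seconds)

# DB query 10 ms + external API 200 ms + response 1 ms
request_seconds = (10 + 200 + 1) / 1000

sync_rps = max_rps(workers=4, request_seconds=request_seconds)
needed = workers_needed(target_rps=100, request_seconds=request_seconds)
print(f"4 sync workers: about {sync_rps:.0f} RPS; 100 RPS needs {needed} workers")
```

An ASGI worker is not bound by this formula for I/O-heavy requests, since the waiting time is spent serving other coroutines.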
Q: How does Go handle 10k concurrent connections with only 4 OS threads?
Go uses M:N threading. Goroutines are user-space lightweight threads (stacks start at 2KB). The Go runtime scheduler maps N goroutines onto M OS threads. When a goroutine blocks on I/O, the scheduler parks it and runs another goroutine on that OS thread. You write blocking-style code, but the runtime uses non-blocking I/O underneath.
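The park-and-resume idea can be shown with a toy scheduler. This is a deliberately simplified sketch and not how the Go runtime is implemented: Python generators stand in for goroutines, a `yield` marks the point where real code would block, and everything runs on a single thread (M = 1).

```python
from collections import deque

def worker(name, steps, log):
    """A hypothetical 'goroutine': yields wherever real code would block on I/O."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # park here; the scheduler picks another task

def run(tasks):
    """Round-robin scheduler: resume each task until it parks or finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the next park point
            ready.append(task)  # parked: back on the run queue
        except StopIteration:
            pass                # finished: drop it

log = []
run([worker("a", 2, log), worker("b", 2, log)])
print(log)
```

Each task reads as straight-line code, yet the two interleave, which is the property the answer describes. Go adds multiple OS threads, growable stacks, and an I/O poller on top of the same idea.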
Q: What is the event loop and why can't you block it?
The Node.js event loop is a single-threaded loop that processes I/O callbacks, timers, and promises. When you await fetch(...), Node.js registers a callback and yields — the event loop processes other callbacks. If you run a CPU-heavy loop synchronously, the event loop thread is occupied — no other callbacks can run, so all concurrent requests stall until your loop finishes.
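The stall is easy to reproduce. The sketch below uses Python's asyncio as a stand-in for the Node.js event loop: a heartbeat task ticks every 10 ms while a second handler either blocks with `time.sleep` or yields with `await asyncio.sleep`. All names are invented for the illustration.

```python
import asyncio
import time

async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Record a timestamp every `interval` seconds, like a small request being served."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())

async def blocking_handler(seconds: float) -> None:
    time.sleep(seconds)  # synchronous: the loop cannot run anything else

async def yielding_handler(seconds: float) -> None:
    await asyncio.sleep(seconds)  # suspends: the loop keeps serving others

async def max_gap(handler, seconds: float) -> float:
    """Largest pause between heartbeat ticks while `handler` runs."""
    ticks = [time.monotonic()]
    await asyncio.gather(heartbeat(ticks, 0.01, 10), handler(seconds))
    return max(b - a for a, b in zip(ticks, ticks[1:]))

blocked_gap = asyncio.run(max_gap(blocking_handler, 0.3))
smooth_gap = asyncio.run(max_gap(yielding_handler, 0.3))
print(f"max gap with blocking handler: {blocked_gap:.3f}s, with yielding handler: {smooth_gap:.3f}s")
```

With the blocking handler the heartbeat pauses for roughly the full 0.3 s; with the yielding handler it keeps ticking close to its 10 ms interval.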
Q: How many Node.js processes should you run on a 4-core machine?
One per CPU core, so 4 processes. Use the cluster module or PM2 in cluster mode. Each process gets its own V8 instance and event loop. With the cluster module, the primary process accepts connections and hands them to workers round-robin by default (except on Windows). Alternatively, run 1 process with worker threads for CPU tasks.
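The two sizing rules of thumb in these notes (one Node.js process per core, and 2 × cores + 1 Gunicorn workers) are easy to compute for a given machine. A small sketch with hypothetical helper names; both rules are starting points to tune under load, not guarantees.

```python
import os

def node_processes(cores: int) -> int:
    """One event-loop process per CPU core."""
    return cores

def gunicorn_sync_workers(cores: int) -> int:
    """The 2 x cores + 1 rule of thumb for Gunicorn workers."""
    return 2 * cores + 1

cores = os.cpu_count() or 1  # cpu_count() can return None on some platforms
print(f"{cores} cores: {node_processes(cores)} Node.js processes, "
      f"{gunicorn_sync_workers(cores)} Gunicorn workers")
```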