Understanding why Go/Rust >> Python/Nodejs for CPU-bound tasks!

The Setup

Picture this: you've got a CPU-intensive task—computing the sum of squares for 50 million numbers. You fire up Python with 4 threads, expecting to see your 4-core machine flex its muscles. But something's wrong. Your CPU usage hovers around 25%, and the program takes just as long (or longer) than if you'd used a single thread.

What's happening? You've just met Python's Global Interpreter Lock (GIL), and it's not playing nice with your parallel dreams.

The Experiment

To understand this phenomenon, I built a simple benchmark: compute sum(i*i for i in range(N)) across multiple workers. This is pure CPU-bound work—no I/O, no waiting, just raw computation. The same task, implemented in three languages, tells a revealing story.

Python (threads): The GIL prevents multiple threads from executing Python bytecode simultaneously. Even with 4 threads, your program will mostly use 1 CPU core. You may even see no speedup or a slight slowdown because of thread switching overhead. It's like having four workers, but only one can work at a time—and they're constantly fighting over the same tool.

Go (goroutines): Goroutines are scheduled on OS threads by the Go runtime. The work can run simultaneously on multiple CPU cores. Expect near-linear speedup (e.g., ~4× on 4 cores). Here, all four workers can actually work in parallel.

Rust (threads): Rust spawns OS threads directly (no GIL). Expect near-linear speedup similar to Go. Raw power, no compromises.

The Results

On a 4-core machine, the numbers tell the story:

  • Python (4 threads): ~7.3376s — struggling with the GIL
  • Go (4 goroutines) / Rust (4 threads): ~0.278s / ~0.330s — smooth parallel execution

The absolute numbers depend on your CPU, but the trend is rock-solid: Python won't parallelize pure CPU-bound work with threads; Go and Rust will.

The Takeaway

This isn't about Python being "bad"—it's about choosing the right tool for the job. Python's GIL exists for good reasons (simplified memory management, thread safety), but it comes with a cost for CPU-bound parallelism. When you need true parallel execution, Go's goroutines and Rust's native threads deliver the performance you expect.

The lesson? Know your workload. For I/O-bound tasks, Python threads work great. For CPU-bound tasks, you might want to reach for Go or Rust—or use Python's multiprocessing module, which sidesteps the GIL entirely.

Try It Yourself

This repo contains minimal examples in all three languages. Run ./run_benchmarks.sh and watch the GIL's impact in real-time. Sometimes, seeing is believing.

=========================================
BENCHMARK RESULTS COMPARISON
=========================================
Python: 7.376 seconds
Go:     0.278 seconds
Rust:   0.330 seconds

šŸ† WINNER: Go (0.278s)
=========================================

Comments

Popular posts from this blog

Free Hand Drawing on Google's Map View

India: Union Budget 2025 - Notes