
Why the gateway got rewritten in Rust

Two years of Python latency tax, a runaway autoscaler, and a 3am incident later — the case for moving the hot path off the GIL.

April 11, 2026 · 14 min read · #rust #infra

The Cuecoder API gateway started as a FastAPI service in early 2024. It worked. It was fast to build. And for a year, it was fine.

Then traffic grew. Not virally — steadily. 40 rps became 400. 400 became 4,000. And somewhere around 1,500 rps, the Python GIL stopped being a footnote and started being the bottleneck.

The numbers that prompted the rewrite

At peak:

p50 latency: 18ms (acceptable)
p95 latency: 340ms (bad)
p99 latency: 1,200ms (unacceptable)
Autoscaler thrash: pods spinning up and down every 3 minutes

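The p99 was the tell. In Python’s asyncio, a slow synchronous operation blocks the event loop: every other request on that worker waits until it returns. Even with async/await everywhere, we had a few places where JSON deserialization or route matching took longer than expected. Most requests never hit those paths, which kept p50 respectable, but under load each stall backed up the queue behind it and landed in the tail.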
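A toy model makes the p99 effect concrete. This is a sketch under stated assumptions, not asyncio itself and not Cuecoder’s real traffic: one worker runs handlers strictly one after another (which is what an event loop does while a handler is stuck in synchronous code), a request arrives every 10 ms, most handlers take 1 ms, and every 100th makes a hypothetical 50 ms synchronous call. `simulate_single_loop`, `toy_workload` and all the numbers are made up for illustration.

```rust
/// Nearest-rank percentile over an already-sorted slice.
/// `p` is a whole-number percentile in 1..=100; integer math avoids
/// floating-point surprises exactly at rank boundaries.
fn percentile(sorted: &[u64], p: usize) -> u64 {
    assert!(!sorted.is_empty() && (1..=100).contains(&p));
    let rank = (p * sorted.len() + 99) / 100; // ceil(p * n / 100), 1-based
    sorted[rank - 1]
}

/// One worker that runs handlers strictly one at a time, like an event loop
/// whose handlers do synchronous work. `arrivals_ms[i]` is when request i
/// arrives; `service_ms[i]` is how long its handler holds the worker.
/// Returns each request's latency (finish time minus arrival time).
fn simulate_single_loop(arrivals_ms: &[u64], service_ms: &[u64]) -> Vec<u64> {
    let mut free_at = 0u64; // when the worker can start the next handler
    arrivals_ms
        .iter()
        .zip(service_ms)
        .map(|(&arrive, &work)| {
            let start = arrive.max(free_at); // queue behind whatever is running
            free_at = start + work;
            free_at - arrive
        })
        .collect()
}

/// Hypothetical workload: a request every 10 ms; handlers normally take 1 ms,
/// but every 100th makes a 50 ms synchronous call.
fn toy_workload(n: usize) -> (Vec<u64>, Vec<u64>) {
    let arrivals: Vec<u64> = (0..n as u64).map(|i| i * 10).collect();
    let service: Vec<u64> = (0..n).map(|i| if i % 100 == 0 { 50 } else { 1 }).collect();
    (arrivals, service)
}
```

Feeding `toy_workload(1000)` through `simulate_single_loop` and sorting the result gives p50 = 1 ms, p95 = 5 ms and p99 = 41 ms. Only 1% of requests do slow work, but the five queued behind each one inherit part of its delay, which is why the tail moves long before the median does.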

The 3am incident was a runaway token counter — a synchronous Redis call inside an async handler — that stalled the event loop and caused a cascade. 14 minutes of degraded service. Not great.
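One common way to keep a blocking store call off the request path is to make the handler’s side a non-blocking enqueue and let a dedicated worker do the slow write. The sketch below assumes nothing about the fix Cuecoder actually shipped: `TokenCounter` and `slow_blocking_incr` are hypothetical, the sleep stands in for a synchronous Redis round trip, and plain `std` threads and channels stand in for what an async gateway would more likely do with an async Redis client or Tokio’s `spawn_blocking`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a blocking store call such as a synchronous
/// Redis INCRBY: it sleeps to simulate the round trip, then adds `n`.
fn slow_blocking_incr(total: &mut u64, n: u64) {
    thread::sleep(Duration::from_millis(5));
    *total += n;
}

/// Hypothetical counter handle. `record` is a channel send, so request
/// handlers never wait on the store; a dedicated thread does the slow writes.
struct TokenCounter {
    tx: mpsc::Sender<u64>,
    worker: thread::JoinHandle<u64>,
}

impl TokenCounter {
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<u64>();
        let worker = thread::spawn(move || {
            let mut total = 0u64;
            // recv() blocks only this worker thread; it returns Err once
            // every Sender has been dropped and the queue is drained.
            while let Ok(first) = rx.recv() {
                // Fold anything else already queued into a single write.
                let batch = first + rx.try_iter().sum::<u64>();
                slow_blocking_incr(&mut total, batch);
            }
            total
        });
        TokenCounter { tx, worker }
    }

    /// Request path: an unbounded-channel send, no I/O.
    fn record(&self, tokens: u64) {
        // Only fails if the worker has exited; a real service would log this.
        let _ = self.tx.send(tokens);
    }

    /// Close the channel, wait for pending writes, return the flushed total.
    fn shutdown(self) -> u64 {
        let TokenCounter { tx, worker } = self;
        drop(tx);
        worker.join().expect("counter worker panicked")
    }
}
```

The trade-off: the stored count now lags the requests slightly, and anything still queued is lost if the process dies, so this shape suits metering better than a hard limit that must be checked before a request proceeds.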

Why Rust specifically

We considered Go. Go has better tooling, a larger community, and fewer sharp edges. The reasons we chose Rust:

No GC pauses: Go’s GC is good, but Rust has no GC at all. For a latency-sensitive hot path, that’s a meaningful difference at p99.
WASM target: the CLI (cue) was already in Rust, and sharing types between the gateway and the CLI via WASM was valuable.
Axum: a Rust async web framework, ergonomic enough that the team could ship a working service in a week.

The migration

We didn’t rewrite everything at once. The approach:

Extract the hot path (request parsing, auth, routing, upstream proxy) into a Rust service behind a feature flag.
Shadow 5% of traffic for a week, comparing outputs.
Ramp to 100% over two weeks.
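The rollout above boils down to a per-request routing decision. Here is a minimal sketch, assuming every request carries a stable ID and the feature flag exposes two percentages; `plan_for`, `Plan`, `Response`, `diff`, `shadow_pct` and `rust_pct` are hypothetical names, and FNV-1a is our choice of stable hash, not something the migration is known to have used. Hashing the ID instead of rolling a random number keeps each request’s assignment stable, so raising `rust_pct` only ever moves more buckets onto Rust.

```rust
/// FNV-1a, 64-bit: small and stable, so a given ID lands in the same bucket
/// in every process and deploy (unlike HashMap's randomly seeded hasher).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Map a request ID to a bucket in 0..100.
fn bucket(request_id: &str) -> u64 {
    fnv1a64(request_id.as_bytes()) % 100
}

#[derive(Debug, PartialEq)]
enum Plan {
    /// Python serves the request.
    Legacy,
    /// Python serves it; a copy also goes to Rust and the responses are compared.
    Shadow,
    /// Rust serves the request.
    Rust,
}

/// `shadow_pct` and `rust_pct` are whole percentages (0..=100) from the flag.
fn plan_for(request_id: &str, shadow_pct: u64, rust_pct: u64) -> Plan {
    let b = bucket(request_id);
    if b < rust_pct {
        Plan::Rust
    } else if b < rust_pct.saturating_add(shadow_pct) {
        Plan::Shadow
    } else {
        Plan::Legacy
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Shadow check: the client always gets `primary`; `shadow` is only compared.
/// Returns None when they agree, or a short reason when they differ.
fn diff(primary: &Response, shadow: &Response) -> Option<&'static str> {
    if primary.status != shadow.status {
        Some("status mismatch")
    } else if primary.body != shadow.body {
        Some("body mismatch")
    } else {
        None
    }
}
```

In this sketch the shadow week corresponds to `plan_for(id, 5, 0)` and the ramp to `plan_for(id, 0, p)` with `p` climbing to 100. A real shadow diff would also have to ignore fields that legitimately differ between the two services, such as timestamps or generated IDs.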

Total effort: six weeks of one engineer’s time.

Results

After the migration:

p50 latency: 4ms (−78%)
p95 latency: 22ms (−94%)
p99 latency: 48ms (−96%)
Pod count: stable, no thrash
Memory: 40% reduction
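The latency percentages are relative reductions from the peak numbers earlier in the post, computed as (before − after) / before:

```latex
\text{reduction} = \frac{\text{before} - \text{after}}{\text{before}}
\qquad
\text{p50: } \frac{18 - 4}{18} \approx 77.8\%,\quad
\text{p95: } \frac{340 - 22}{340} \approx 93.5\%,\quad
\text{p99: } \frac{1200 - 48}{1200} = 96\%
```

Rounded to whole percents, these match the −78%, −94% and −96% listed above.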

The autoscaler stopped spinning. The on-call rotation got quieter.

What we’d do differently

The main regret is not migrating the token counter earlier: it was the root cause of the incident, and it was a known issue. The lesson: synchronous operations in async handlers are not tech debt; they’re incidents waiting to happen.

The second regret: Rust’s error-handling surface is large, and the team took two weeks to converge on a consistent error type. Starting with thiserror and anyhow from day one would have saved that time.
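For illustration, here is the kind of single error type a gateway’s hot path might converge on. It is written out by hand with only the standard library so it compiles without dependencies; `thiserror`’s `#[derive(Error)]` with `#[error("...")]` and `#[from]` attributes generates essentially these `Display`, `Error` and `From` impls. `GatewayError`, its variants, the status mapping, `call_upstream` and `proxy` are all hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical gateway error: one enum for the whole request path.
#[derive(Debug)]
enum GatewayError {
    /// The request could not be parsed.
    BadRequest(String),
    /// Missing or invalid credentials.
    Unauthorized,
    /// The upstream call failed; the cause is kept for `source()`.
    Upstream(std::io::Error),
}

impl fmt::Display for GatewayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GatewayError::BadRequest(msg) => write!(f, "bad request: {msg}"),
            GatewayError::Unauthorized => write!(f, "unauthorized"),
            GatewayError::Upstream(e) => write!(f, "upstream error: {e}"),
        }
    }
}

impl Error for GatewayError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            GatewayError::Upstream(e) => Some(e),
            _ => None,
        }
    }
}

/// Lets `?` turn I/O errors from the proxy layer into `GatewayError`.
impl From<std::io::Error> for GatewayError {
    fn from(e: std::io::Error) -> Self {
        GatewayError::Upstream(e)
    }
}

impl GatewayError {
    /// The one place that maps errors to HTTP status codes.
    fn status(&self) -> u16 {
        match self {
            GatewayError::BadRequest(_) => 400,
            GatewayError::Unauthorized => 401,
            GatewayError::Upstream(_) => 502,
        }
    }
}

/// Hypothetical upstream call that always fails, to exercise `?`.
fn call_upstream() -> Result<String, std::io::Error> {
    Err(std::io::Error::new(
        std::io::ErrorKind::ConnectionRefused,
        "connection refused",
    ))
}

fn proxy() -> Result<String, GatewayError> {
    let body = call_upstream()?; // io::Error -> GatewayError::Upstream via From
    Ok(body)
}
```

A common split, and the one the `anyhow` documentation itself suggests, is to use `anyhow` at the binary’s edges (startup, config loading) where an error only needs context and a message, while handlers return a typed enum like this so each failure maps to a status code in one place.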

The rewrite was worth it. Not because Rust is better than Python in the abstract — it often isn’t. But for a latency-sensitive hot path at the edge of a distributed system, the GIL is a liability you can eventually afford to remove.
