Tricky parts of Golang. Preemption in Go

In the Go runtime, the scheduler is the engine that manages how thousands of goroutines are multiplexed onto a limited number of operating system threads. To ensure system responsiveness and prevent “resource hogging,” Go employs a mechanism called preemption.

This article explores the internal mechanics of the Go scheduler, the transition from cooperative to non-cooperative preemption, and real-world case studies from high-scale systems.

Why do we need it?

Imagine you’re running a tight loop that never calls anything else. In a cooperative system, that goroutine would hog the CPU forever. With preemption, the runtime periodically interrupts it, giving other goroutines a chance to run. That’s how Go keeps the “green threads” from starving each other.

When you’ll see it in action

  • CPU‑bound loops: If you write a tight loop that does no I/O or function calls, Go (1.14+) will still preempt it, but only at the next async-safe point, so the pause may be slightly delayed.
  • Blocking I/O: When a goroutine blocks on a network read, the scheduler can preempt it and run another goroutine on the same OS thread.
  • Garbage collection: During a GC pause, all goroutines are preempted so the collector can safely move memory.

A quick Python analogy

Below is a tiny Python snippet that mimics the idea of a scheduler pausing a task:

import time
import threading

def worker():
    for i in range(5):
        print(f"Working {i}")
        # Pretend the scheduler interrupts us here
        time.sleep(0.1)  # This is the cooperative yield

t = threading.Thread(target=worker)
t.start()
t.join()  # wait for the worker to finish

In Python, time.sleep() is the explicit yield. Go, on the other hand, inserts its own yield points automatically, so you don’t have to sprinkle sleep calls everywhere.

The G-M-P Architecture

To understand preemption, one must first understand the three building blocks of the Go scheduler:

  • G (Goroutine): The user-level thread. It contains the stack, instruction pointer, and other metadata.
  • M (Machine): An actual OS thread that executes the code.
  • P (Processor): A logical resource or “context” required to run Go code. The number of P’s defaults to the number of CPU cores (GOMAXPROCS).

For a goroutine (G) to execute, it must be assigned to a processor (P), which is then bound to an OS thread (M).

graph LR
    subgraph Scheduling_Context
    G1[G: Goroutine] --> P[P: Processor Context]
    P --> M[M: OS Thread]
    M --> CPU[Hardware CPU]
    end
    Sysmon[sysmon] -.->|Monitors| M

The Evolution of Preemption: From Cooperative to Asynchronous

Historically, Go’s scheduler was purely cooperative. A goroutine would only yield control at specific “safe-points” inserted by the compiler, such as function calls, channel operations, or system calls.

The Tight Loop Problem (Pre-Go 1.14)

The significant flaw in cooperative multitasking was the “tight loop” vulnerability. A loop that performed heavy computation without function calls could effectively block the scheduler.


package main

import "runtime"

func tightLoop() {
    for {
        // Endless loop with no function calls
    }
}

func main() {
    runtime.GOMAXPROCS(1)
    go tightLoop() // This G will hog the only P
    go func() {
        println("This will never print in Go < 1.14")
    }()
    select {}
}

In earlier versions, the other goroutines would be starved, leading to a system-wide hang or even deadlocks in internal runtime tests such as TestGoroutineParallelism. Furthermore, the Garbage Collector (GC) would be unable to “stop the world,” leading to multi-second latency spikes.

Asynchronous Preemption (Go 1.14+)

To solve this, Go 1.14 introduced non-cooperative, asynchronous preemption. The runtime now uses a background thread called sysmon to monitor goroutines. If a goroutine runs for more than ~10ms, sysmon sends a SIGURG signal to the running thread. This signal interrupts execution and requests preemption; the goroutine is then stopped at the next async-safe point, allowing the runtime to safely save its state and perform a context switch even in long-running hot loops.

What actually happens:

  1. SIGURG interrupts the OS thread
  2. The signal handler:
    • Sets preemption flags (preempt, stack guard poisoning, etc.)
    • Redirects execution only if the current PC is at an async-safe point
  3. If not at a safe point:
    • Execution resumes
    • Preemption happens at the next async-safe point

Real-World Case Studies: War Stories from the Field

Twitch: Eliminating 10-Second GC Pauses

In its early days, Twitch’s interactive chat service suffered from GC pauses that froze the application for tens of seconds. By upgrading through Go 1.5 and 1.6, which refined the scheduler’s interaction with the GC, Twitch reduced these pauses to under 70ms, an improvement that was critical for a real-time service.

Cloudflare: The ARM64 Compiler Bug

At Cloudflare’s massive scale, they discovered a rare race condition in the Go ARM64 compiler triggered by asynchronous preemption. During stack unwinding (often for GC), a goroutine being preempted at the exact moment of a stack pointer adjustment could lead to stack corruption and a fatal panic: “traceback did not unwind completely”. This discovery forced the community to refine how the runtime handles “unsafe points” during signal-based preemption.

Uber: Dynamic GOMAXPROCS

Uber identified that in containerized environments (like Kubernetes), the Go runtime often miscalculates the available CPU cores, leading to excessive context switching. They developed a library to automatically adjust GOMAXPROCS to match the container’s CPU quota, ensuring the scheduler doesn’t spawn more threads than the underlying hardware can actually handle.

Side-by-Side: Go vs. Python Concurrency

While Python is often used for rapid prototyping, its concurrency model lacks the preemptive resilience of Go.

| Feature | Go (1.14+) | Python (asyncio) |
| --- | --- | --- |
| Multitasking type | Asynchronous preemptive | Cooperative |
| Parallelism | True parallelism (across all cores) | Concurrent but not parallel (GIL) |
| Starvation risk | Low; handled by signals | High; tight loops freeze the event loop |
| Communication | Channels (CSP model) | Shared memory / callbacks |

In Python’s asyncio, a single blocking call (like a heavy calculation without an await) will freeze the entire program because it lacks a background monitor like Go’s sysmon to force preemption.

Practical Optimizations and Profiling

To manage preemption in production, we should utilize Go’s built-in observability tools:

  1. pprof: Use go tool pprof to capture CPU profiles. In the resulting flame graphs, a high concentration of time in runtime.gcDrain or runtime.scanobject often indicates that preemption is struggling to stop goroutines for a GC cycle.
  2. Execution Tracing: go tool trace provides a visual timeline of when goroutines are preempted, helping identify if a specific task is being interrupted too frequently (increasing context switch overhead).
  3. Manual Yielding: In performance-critical kernels where 10ms is too long, we can still use runtime.Gosched() to voluntarily yield control.

Practical Tips

Avoid long, call-free loops

for i := 0; i < 1e9; i++ { // no calls, limited preemption opportunities
    // work
}

Before Go 1.14, loops like this could completely starve the scheduler. Since Go 1.14, asynchronous preemption usually prevents total starvation, but preemption still happens only at async-safe points and may be delayed. Fix: insert a lightweight yield or call:

runtime.Gosched()
// or any small function call

Understand where preemption can occur

Function calls, returns, and blocking operations (for example net.Conn.Read) are the most reliable preemption points. Since Go 1.14, the runtime can also preempt goroutines at asynchronous safe points, even inside tight loops without calls, but this is not guaranteed at every instruction. Rule of thumb: if a goroutine does long CPU-bound work, give the scheduler an explicit chance to run others.

Choose concurrency primitives carefully

  • Channels yield naturally when they block (send/receive on an unready channel), making them good scheduling points.
  • Mutexes may also block when contended, but holding a mutex during long CPU work increases contention and latency. Other goroutines cannot make progress until the lock is released—even if preemption occurs. Prefer keeping critical sections small and avoid heavy computation while holding locks.

Profile scheduling behavior

Use pprof to inspect scheduler and runtime metrics. High preemption or long-running goroutines in profiles often indicate tight loops or oversized critical sections that should be refactored.

Protect shared state correctly

A goroutine can be preempted at almost any safe point, including while updating shared memory. Use atomic operations or proper synchronization to prevent subtle data races—never rely on “this code probably won’t be preempted here.”

The Future: Go 1.24 and Beyond

As of 2025, Go 1.24 has further refined the scheduler to reduce CPU overhead by 2-3% through more efficient goroutine parking and wake-up cycles. The runtime continues to evolve toward “fine-grained preemption,” aiming to lower the cost of preemption interrupts across platforms, including non-Unix systems where signal-based delivery is unavailable.

Conclusion

Preemption is not a silver bullet; it is a safety net. While Go 1.14+ protects systems from being frozen by tight loops, the most efficient code remains that which yields naturally. By understanding the G-M-P model and utilizing profiling tools like pprof, we can build concurrent systems that are both fair and extraordinarily fast.
