Tricky parts of Golang. Goroutine Leaks: Patterns, Detection, and Prevention

A goroutine leak is one of the most insidious bugs in Go. Unlike a memory leak in C, nothing crashes. No panic, no OOM killer. Your service just slowly grows in memory and CPU usage over hours or days until it falls over — or someone notices the goroutine count in a dashboard.

This article covers the three root causes, how to find leaks with the tools built into the standard library, and API design patterns that make leaks structurally impossible.

What a Goroutine Leak Is

A goroutine leak occurs when a goroutine is started and then blocks indefinitely, so it never returns and the runtime can never reclaim its stack. Because goroutines are cheap (each starts with roughly 2 KB of stack), teams launch them freely, and even a small persistent leak rate can accumulate to tens of thousands of stuck goroutines in a long-running service.

Each leaked goroutine:

  • Holds its stack memory (at minimum 2KB, often more after growth)
  • Holds any references on that stack, pinning objects on the heap
  • Stays on the runtime's books: the garbage collector still scans its stack, and it shows up in every goroutine count and profile

A service leaking 10 goroutines per request at 1000 req/s accumulates 36 million leaked goroutines per hour. In practice services fail much sooner than that, but the math shows why the rate matters more than the absolute count.
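The arithmetic above can be checked with a few lines of Go. This is only a sketch: the helper names are made up for illustration, and the memory figure assumes every leaked goroutine stays at the initial stack size of about 2 KiB, which is a lower bound.

```go
package main

import "fmt"

// leakedPerHour returns how many goroutines accumulate in one hour for a
// given number of leaks per request and a given request rate per second.
func leakedPerHour(leaksPerRequest, requestsPerSecond int) int {
	const secondsPerHour = 3600
	return leaksPerRequest * requestsPerSecond * secondsPerHour
}

// minStackBytes estimates the stack memory pinned by n leaked goroutines,
// assuming each one never grows past its initial stack.
func minStackBytes(n int) int {
	const initialStack = 2 * 1024 // assumption: ~2 KiB initial stack
	return n * initialStack
}

func main() {
	n := leakedPerHour(10, 1000)
	fmt.Printf("leaked goroutines per hour: %d\n", n) // prints 36000000
	fmt.Printf("minimum stack memory: %.1f GiB\n", float64(minStackBytes(n))/(1<<30))
}
```

Even the lower bound runs to tens of gibibytes, which is why a real service falls over long before the hour is up.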

Root Cause 1: Channel Sends and Receives with No Partner

The simplest leak: a goroutine blocks on a channel operation that nobody ever resolves.

Send abandoned after a timeout

func search(query string) Result {
    ch := make(chan Result)

    go func() {
        ch <- fetchFromDB(query) // blocks until someone receives
    }()

    select {
    case result := <-ch:
        return result
    case <-time.After(100 * time.Millisecond):
        return Result{} // timeout — goroutine is now leaked
    }
}

When the timeout fires, search returns, but the goroutine is still running fetchFromDB. When fetchFromDB eventually finishes, the goroutine tries to send on ch, and because ch is unbuffered and no receiver remains, the send blocks forever. The blocked goroutine holds the last reference to ch, so neither the goroutine nor the channel is ever reclaimed.

Fix — use a buffered channel:

ch := make(chan Result, 1) // buffer of 1: send never blocks when caller is gone

With a buffer of 1, the goroutine can always send its result and exit, even if the caller already returned due to timeout.
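The difference is easy to observe with runtime.NumGoroutine. The sketch below is a demonstration under assumptions, not the article's code: slowFetch is a hypothetical stand-in for fetchFromDB, the timings are arbitrary, and the bufSize parameter exists only so both variants can be compared.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// slowFetch is a hypothetical stand-in for fetchFromDB that takes longer
// than the caller is willing to wait.
func slowFetch(query string) string {
	time.Sleep(50 * time.Millisecond)
	return "result for " + query
}

// search gives up after 10ms. bufSize selects an unbuffered (0) or
// buffered (1) result channel.
func search(query string, bufSize int) string {
	ch := make(chan string, bufSize)
	go func() {
		ch <- slowFetch(query) // with bufSize 0 this blocks forever after a timeout
	}()
	select {
	case r := <-ch:
		return r
	case <-time.After(10 * time.Millisecond):
		return ""
	}
}

// leakedAfter runs n timed-out searches and reports how many extra
// goroutines are still alive once every slowFetch has had time to finish.
func leakedAfter(n, bufSize int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		search("q", bufSize)
	}
	time.Sleep(200 * time.Millisecond) // let every slowFetch complete
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("unbuffered leaked:", leakedAfter(5, 0))
	fmt.Println("buffered leaked:  ", leakedAfter(5, 1))
}
```

With the unbuffered channel, each timed-out call should leave one goroutine behind; with a buffer of 1 the count should return to its starting value.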

Fix — use context:

func search(ctx context.Context, query string) Result {
    ch := make(chan Result, 1)

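    go func() {
        // fetchFromDB runs to completion before the select is evaluated, so
        // cancellation only takes effect once the query has returned. To stop
        // the query itself, fetchFromDB must accept ctx.
        select {
        case ch <- fetchFromDB(query):
        case <-ctx.Done(): // exit cleanly if caller cancels
        }
    }()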

    select {
    case result := <-ch:
        return result
    case <-ctx.Done():
        return Result{}
    }
}
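If the slow operation itself can observe the context, the extra goroutine may not be needed at all. The following is a minimal sketch under that assumption: fetch is a hypothetical context-aware replacement for fetchFromDB, and its latency parameter exists only to simulate a slow query.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetch is a hypothetical context-aware stand-in for fetchFromDB. It
// returns as soon as ctx is done instead of finishing the work.
func fetch(ctx context.Context, query string, latency time.Duration) (string, error) {
	select {
	case <-time.After(latency): // simulated query time
		return "result for " + query, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// search applies the timeout through the context and calls fetch
// synchronously, so there is no goroutine left to leak.
func search(query string, latency, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return fetch(ctx, query, latency)
}

func main() {
	_, err := search("slow", 50*time.Millisecond, 5*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // prints true

	res, _ := search("fast", time.Millisecond, 200*time.Millisecond)
	fmt.Println(res) // prints result for fast
}
```

Real database drivers differ in how they honor cancellation, so whether this shape applies depends on the client library in use.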

Blocked send

func producer(jobs []Job) <-chan Result {
    out := make(chan Result) // unbuffered

    go func() {
        for _, j := range jobs {
            out <- process(j) // blocks if no one is reading
        }
        close(out)
    }()

    return out
}

func firstOnly(jobs []Job) Result {
    results := producer(jobs)
    // Read only one result and return. The producer goroutine is now
    // stuck forever trying to send the second result.
    return <-results
}

Fix: Always drain a channel before abandoning it, or give the producer a cancellation signal to listen to, such as a context or a done channel:

func producer(ctx context.Context, jobs []Job) <-chan Result {
    out := make(chan Result)

    go func() {
        defer close(out)
        for _, j := range jobs {
            select {
            case out <- process(j):
            case <-ctx.Done():
                return // caller cancelled — exit cleanly
            }
        }
    }()

    return out
}
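On the consumer side, the matching half of this pattern is to cancel the context once no more results are wanted. A minimal sketch, with the article's Job and Result types replaced by int and process replaced by squaring so that the example is self-contained:

```go
package main

import (
	"context"
	"fmt"
)

// producer sends the square of each job until the jobs run out or ctx
// is cancelled.
func producer(ctx context.Context, jobs []int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, j := range jobs {
			select {
			case out <- j * j:
			case <-ctx.Done():
				return // caller cancelled: exit cleanly
			}
		}
	}()
	return out
}

// firstResult reads a single value and then cancels the producer.
func firstResult(jobs []int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // releases the producer goroutine when we return
	return <-producer(ctx, jobs)
}

func main() {
	fmt.Println(firstResult([]int{3, 4, 5})) // prints 9
}
```

The deferred cancel is what turns "read one result and leave" from a leak into a clean shutdown.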

Root Cause 2: Forgotten Context

Context cancellation is the standard Go mechanism for telling goroutines to stop. Forgetting to cancel, or forgetting to listen for cancellation, leaves goroutines running long after they should have stopped.

func handleRequest(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    // BUG: cancel never called — context leak (and possible goroutine leak downstream)

    result, err := fetchData(ctx)
    // ...
}

Always defer cancel() immediately after creating a derived context:

func handleRequest(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel() // guaranteed to run

    result, err := fetchData(ctx)
    // ...
}

Forgetting cancel does not always cause a goroutine leak by itself, but it keeps the derived context's timer and its registration with the parent alive until the deadline expires or the parent is cancelled, and it delays propagation of cancellation to any work spawned under that context. go vet will flag uncalled cancel functions.

Goroutines that ignore context

A goroutine that is handed a context but never checks ctx.Done() keeps running after the context is cancelled:

func backgroundWorker(ctx context.Context) {
    for {
        doWork()
        time.Sleep(time.Second) // never checks ctx — ignores cancellation
    }
}

Fix: Check ctx.Done() in every long-running loop:

func backgroundWorker(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            return
        default:
        }

        doWork()

        select {
        case <-ctx.Done():
            return
        case <-time.After(time.Second):
        }
    }
}
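For strictly periodic work, a time.Ticker is a common alternative to the two-select shape above. The sketch below assumes it is acceptable for the first run to happen after one interval rather than immediately; runEvery and demo are illustrative names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls work once per interval until ctx is cancelled and
// returns how many times work ran.
func runEvery(ctx context.Context, interval time.Duration, work func()) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop() // release the ticker when the loop exits

	runs := 0
	for {
		select {
		case <-ctx.Done():
			return runs
		case <-ticker.C:
			work()
			runs++
		}
	}
}

// demo runs the worker for a short, fixed window and reports the run count.
func demo() int {
	ctx, cancel := context.WithTimeout(context.Background(), 55*time.Millisecond)
	defer cancel()
	return runEvery(ctx, 10*time.Millisecond, func() {})
}

func main() {
	fmt.Println("runs before cancellation:", demo())
}
```

A single select covers both the wait and the cancellation check, so there is only one place where the goroutine can block.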

Root Cause 3: Range Over a Channel That Never Closes

func consume(ch <-chan Event) {
    go func() {
        for event := range ch { // blocks forever if ch is never closed
            handle(event)
        }
    }()
}

If the sender never calls close(ch), this goroutine blocks on the channel forever. The fix is a discipline enforced at the producer: whoever creates a channel is responsible for closing it when done. Use defer close(ch) in the producer goroutine to make this unconditional:

func produce(ctx context.Context) <-chan Event {
    ch := make(chan Event)
    go func() {
        defer close(ch) // always close, even on early return
        for {
            select {
            case <-ctx.Done():
                return
            case ch <- nextEvent():
            }
        }
    }()
    return ch
}
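To see the close terminate a range loop, here is a small self-contained sketch. It uses int events and a fixed event count instead of the article's Event type and nextEvent function, and consumeAll is a hypothetical helper that waits for the consumer goroutine to return.

```go
package main

import (
	"fmt"
	"sync"
)

// produce sends n events and then closes the channel. The sender owns
// the close.
func produce(n int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch) // ends the consumer's range loop
		for i := 0; i < n; i++ {
			ch <- i
		}
	}()
	return ch
}

// consumeAll ranges over ch in its own goroutine, waits for that
// goroutine to return, and reports how many events it handled.
func consumeAll(ch <-chan int) int {
	var wg sync.WaitGroup
	count := 0
	wg.Add(1)
	go func() {
		defer wg.Done()
		for range ch {
			count++
		}
	}()
	wg.Wait() // returns only because the producer closed ch
	return count
}

func main() {
	fmt.Println(consumeAll(produce(3))) // prints 3
}
```

Removing the deferred close makes wg.Wait block forever, which is the leak in miniature.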

Detecting Leaks

pprof goroutine profile

The Go runtime tracks every live goroutine, including blocked ones. You can dump them at any time:

import _ "net/http/pprof" // registers /debug/pprof handlers on http.DefaultServeMux

// Then in another goroutine:
http.ListenAndServe(":6060", nil)

# View goroutine count and stacks:
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Or get a text snapshot (quote the URL so the shell leaves the ? alone):
curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'

Look for goroutines stuck at the same stack frame across multiple snapshots. A growing count of goroutines all blocked at chan receive on the same line is a strong leak signal.
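The same profile is available in-process through runtime/pprof, which can help in programs that do not expose an HTTP port. A minimal sketch: goroutineDump is an illustrative helper, and the three stuck receivers in main exist only to give the profile something to show.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"time"
)

// goroutineDump returns the text form of the goroutine profile.
// debug=1 groups identical stacks with a count; debug=2 prints one
// stack per goroutine, including its wait state.
func goroutineDump(debug int) string {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, debug); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	block := make(chan struct{}) // never closed: simulates stuck receivers
	for i := 0; i < 3; i++ {
		go func() { <-block }()
	}
	time.Sleep(10 * time.Millisecond) // give the goroutines time to block

	// The first line of the debug=1 output carries the total count.
	fmt.Println(strings.SplitN(goroutineDump(1), "\n", 2)[0])
	// Count how many goroutines report the "chan receive" wait state.
	fmt.Println(strings.Count(goroutineDump(2), "chan receive"), "goroutines in chan receive")
}
```

Writing such a dump to the log on a signal or an admin command gives the same snapshots to compare over time.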

goleak in tests

goleak by Uber is the most practical tool for catching leaks before they reach production. It checks that no unexpected goroutines remain after a test:

import "go.uber.org/goleak"

func TestSearch(t *testing.T) {
    defer goleak.VerifyNone(t) // fails the test if goroutines are leaked

    result := search(context.Background(), "query")
    _ = result
}

goleak drops into existing tests. Add it to any test that spawns goroutines and it will catch leaks introduced by future changes. To cover a whole package at once, call goleak.VerifyTestMain(m) from TestMain instead.

Runtime metrics

fmt.Println(runtime.NumGoroutine()) // current count

Log or export this as a metric. A monotonically increasing goroutine count is a reliable early warning. In production, expose it via your metrics system (Prometheus, InfluxDB, etc.) and alert on sustained growth.
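What "sustained growth" means is a policy decision. One simple reading is that the count rose between every pair of consecutive samples. The sketch below implements that reading with hypothetical helper names, and the loop in main leaks goroutines on purpose to produce a rising series; a production alert would normally live in the metrics system rather than in application code.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sampleGoroutines records runtime.NumGoroutine n times, interval apart.
func sampleGoroutines(n int, interval time.Duration) []int {
	samples := make([]int, 0, n)
	for i := 0; i < n; i++ {
		samples = append(samples, runtime.NumGoroutine())
		if i < n-1 {
			time.Sleep(interval)
		}
	}
	return samples
}

// sustainedGrowth reports whether the count rose between every pair of
// consecutive samples. It needs at least two samples to say yes.
func sustainedGrowth(samples []int) bool {
	if len(samples) < 2 {
		return false
	}
	for i := 1; i < len(samples); i++ {
		if samples[i] <= samples[i-1] {
			return false
		}
	}
	return true
}

func main() {
	stuck := make(chan struct{}) // never closed: simulates a leak
	go func() {
		for {
			go func() { <-stuck }()
			time.Sleep(5 * time.Millisecond)
		}
	}()
	samples := sampleGoroutines(4, 30*time.Millisecond)
	fmt.Println(samples, "sustained growth:", sustainedGrowth(samples))
}
```

A strict every-sample rule is noisy for bursty services; comparing averages over longer windows is a common refinement.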

Leak-Proof API Design

The most reliable way to prevent leaks is to design APIs that make it impossible for callers to forget to clean up.

Rule 1: Every goroutine needs a shutdown path.

If you start a goroutine in a constructor or Start method, provide a corresponding Stop or cancel mechanism. Document it. Better: use context.Context as the first parameter of any function that spawns goroutines, so cancellation is part of the contract.
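One way to pair a start with a stop is to return a handle whose Stop method both signals the goroutine and waits for it to exit. The Worker type below is a hypothetical illustration of that shape, not an API from any library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Worker is a hypothetical type that owns exactly one background goroutine.
type Worker struct {
	stop     chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

// StartWorker launches the goroutine and returns a handle that can stop it.
func StartWorker(interval time.Duration, work func()) *Worker {
	w := &Worker{stop: make(chan struct{})}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-w.stop:
				return
			case <-ticker.C:
				work()
			}
		}
	}()
	return w
}

// Stop signals the goroutine and waits for it to return. It is safe to
// call more than once.
func (w *Worker) Stop() {
	w.stopOnce.Do(func() { close(w.stop) })
	w.wg.Wait()
}

func main() {
	w := StartWorker(10*time.Millisecond, func() { fmt.Println("tick") })
	time.Sleep(35 * time.Millisecond)
	w.Stop() // once this returns, the goroutine has exited
	fmt.Println("stopped")
}
```

Waiting inside Stop matters: a Stop that only signals leaves the caller unable to tell when the goroutine has actually gone away.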

Rule 2: Never return a channel without also returning a way to stop the producer.

// Fragile: no way to stop the producer
func StreamEvents() <-chan Event

// Better: context controls the producer's lifetime
func StreamEvents(ctx context.Context) <-chan Event

Rule 3: Prefer synchronous APIs and add concurrency at the call site.

A function that does its work synchronously and returns a result is trivially leak-free. Concurrency is the caller’s responsibility, not the library’s. Push goroutines up the call stack toward main, not down into packages.
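A small sketch of that division of labor: Normalize plays the role of a synchronous library function, and normalizeAll is caller-side code that adds concurrency and waits for every goroutine it starts. Both names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Normalize is a hypothetical library function: synchronous, no goroutines.
func Normalize(s string) string {
	return strings.ToUpper(strings.TrimSpace(s))
}

// normalizeAll is caller-side code. It fans out over the inputs and does
// not return until every goroutine it started has finished.
func normalizeAll(inputs []string) []string {
	out := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			out[i] = Normalize(in) // each goroutine writes only its own slot
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(normalizeAll([]string{" a ", "b", " c"})) // prints [A B C]
}
```

Because the goroutines are created and joined in the same function, their lifetime is visible in one place.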

The Lifecycle of a Leak-Free Goroutine

flowchart TD
    A["Goroutine started\n(with context or done channel)"] --> B["Does work"]
    B --> C{Work complete\nor ctx cancelled?}
    C -- "yes" --> D["Goroutine returns\n(runtime frees stack)"]
    C -- "no" --> B

Every goroutine you start should have a clear answer to: what will cause this goroutine to return? If the answer is “nothing in particular,” you have a potential leak.

Summary

| Root cause | Detection signal | Fix |
|---|---|---|
| Blocked channel send/receive | Goroutines stuck at chan send/chan receive | Buffered channels; context cancellation in select |
| Forgotten context cancel | Increasing timeout/deadline counts | defer cancel() immediately after WithTimeout/WithCancel |
| Unchecked context in loop | CPU without progress, goroutine blocked in sleep | Check ctx.Done() in every long-running loop |
| Channel never closed | Goroutines stuck at range chan | defer close(ch) in the producer goroutine |

The common thread in every fix is ownership: someone must be responsible for ending each goroutine, and that responsibility must be encoded in the API — not assumed from documentation.