Understanding the Go Runtime: The Scheduler

I read an article about the Go scheduler that I found to be incredibly enlightening. I’d highly recommend it for anyone writing software in Go.

A few key takeaways for me:

  • Goroutines usually preempt themselves (cooperative preemption). Before Go 1.14, this was actually the only way for goroutines to relinquish their OS thread back to the scheduler.
  • Cooperative preemption happens when a goroutine:
    • calls a function – Most functions begin with a prologue that checks a flag to see if the scheduler wants the goroutine to preempt itself. The check is very cheap, because the prologue already has to run to confirm that the goroutine’s stack is large enough for the function.
    • receives from an empty channel or sends to a full one (i.e. a blocking channel operation).
    • tries to acquire a lock (e.g. a mutex) that is already held, although it sometimes spins briefly on the lock before preempting itself.
    • makes a system call. Most system calls are very fast, though, so the runtime only gives the goroutine’s processor to another thread when the call is expected to block or turns out to run long.
    • does some network I/O
    • explicitly yields by calling runtime.Gosched()
  • For tight loops with no function calls, the scheduler (since Go 1.14) uses “asynchronous preemption” instead. On Linux this involves sending an OS signal (SIGURG) to the thread; the signal handler saves the goroutine’s state and yields to the scheduler. On Windows, the SuspendThread call serves a similar purpose. Preemption is not guaranteed on every attempt (e.g. an atomic operation is in progress, so it’s “unsafe” to preempt at that point), but this kind of preemption is rarely needed anyway. The scheduler also cannot preempt a goroutine that is executing runtime code, a system call, or cgo code, for example.
  • Context switching between goroutines is shockingly fast (50-100 nanoseconds). My understanding is that only 3 values are saved/restored: the program counter, the stack pointer, and the base pointer.
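  • How the scheduler decides which goroutine to run next is worth reading about too. There are coroutine-like optimizations for producer/consumer patterns: as I understand it, a goroutine that was just woken by the running goroutine can be scheduled next on the same thread, so the hand-off behaves much like a direct switch.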

All of these details are yet another reminder that “cgo is not Go.” You have to assume that any cgo call or system call that is executed in your Go program will essentially “eat” an OS thread from the scheduler while it’s running. Although the scheduler can create more threads when it needs to (up to 10,000 by default), creating a thread is much more expensive than creating a goroutine.
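
If you want to look at these numbers in your own program, something like the following might help. It is a rough sketch: the helper names currentThreadLimit and threadsCreated are my own. The runtime has no getter for the thread limit, so the helper calls debug.SetMaxThreads, which returns the previous setting, and then restores that setting straight away. The thread count comes from the built-in "threadcreate" profile.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"runtime/pprof"
)

// currentThreadLimit (hypothetical helper) reads the OS thread limit by
// setting a value and immediately restoring the previous one.
func currentThreadLimit() int {
	prev := debug.SetMaxThreads(10000)
	debug.SetMaxThreads(prev)
	return prev
}

// threadsCreated (hypothetical helper) reports how many OS threads the
// runtime has created so far, according to the "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	fmt.Println("thread limit:", currentThreadLimit())
	fmt.Println("threads created so far:", threadsCreated())
}
```

Exceeding the limit crashes the program, so raising it is a workaround at best; watching the created-thread count is a cheap way to notice that blocking cgo or system calls are piling up.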

Also noteworthy is how Go handles network I/O with the netpoller. Instead of blocking an OS thread on network I/O, the runtime puts the goroutine to sleep and uses epoll (on Linux) or kqueue (on the BSDs and macOS) to wake it up when the socket is ready. This avoids creating lots of OS threads.
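
Here is a rough way to see the netpoller at work. Treat it as a sketch under my own assumptions: parkReaders and threadsCreated are names I made up, the program needs loopback TCP to be available, and the 100 ms pause is an arbitrary wait to let the readers block. It parks one goroutine per connection in Read and compares the number of OS threads created before and after.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"runtime/pprof"
	"sync"
	"time"
)

// threadsCreated (hypothetical helper) counts OS threads created so far.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

// parkReaders (hypothetical helper) opens n loopback TCP connections and
// blocks one goroutine per connection in Read. It returns the thread counts
// measured before and after.
func parkReaders(n int) (before, after int, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()

	// Server side: hold each connection open, never write to it, and close
	// it once the client goes away.
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				defer c.Close()
				io.Copy(io.Discard, c)
			}()
		}
	}()

	var wg sync.WaitGroup
	var conns []net.Conn
	defer func() {
		for _, c := range conns {
			c.Close() // unblocks the matching Read
		}
		wg.Wait()
	}()

	before = threadsCreated()
	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return before, threadsCreated(), err
		}
		conns = append(conns, c)
		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 1)
			c.Read(buf) // blocks: the peer never sends anything
		}()
	}
	time.Sleep(100 * time.Millisecond) // give the readers time to park
	after = threadsCreated()
	return before, after, nil
}

func main() {
	before, after, err := parkReaders(100)
	if err != nil {
		fmt.Println("loopback networking unavailable:", err)
		return
	}
	fmt.Printf("100 readers parked; OS threads created: %d -> %d\n", before, after)
}
```

If every blocked Read pinned its own thread, the second number would grow by at least the number of readers. In my understanding it should grow by only a handful, if at all.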