I wrote this for engineers who need to make many CPUs cooperate concurrently. I start from first principles of cache coherence and the C memory model, then build up atomics and fences that withstand both compiler and hardware reordering. I examine interrupts, preemption, and execution contexts that compete for the same data, then develop a toolbox of synchronization primitives including spinlocks, mutexes, reader-writer locks, RCU, and sequence counters. Each mechanism is presented with invariants, failure modes, progress guarantees, and guidance on when to use it.
I then construct a practical scheduler and its ecosystem. You will see per-CPU runqueues, wakeups, timers, and load balancing for SMP and NUMA, along with affinity, migration, IPIs, TLB shootdowns, deferred work, and futex-based wait-wake paths. I address priority inversion, deadlocks, livelocks, and false sharing, and I include lock-free structures with safe memory reclamation. The goal is simple and measurable: predictable behavior under concurrency, low tail latency, and high throughput in a kernel written in C.