Weird FIFO scheduling trick
When you run programs on Linux, the default scheduler is either the Earliest Eligible Virtual Deadline First (EEVDF) scheduler, available since kernel 6.6, or the Completely Fair Scheduler (CFS) on older kernels. Both aim to be fair by dividing CPU time among all the normal processes on your system, weighted by their nice values. To do that, the scheduler constantly preempts tasks to give other tasks a turn.
But what if you have a task that is so critical you don't want it to be preempted?
The SCHED_FIFO Trick
For that, Linux provides real-time scheduling policies, primarily SCHED_FIFO (First In, First Out)
and SCHED_RR (Round-Robin). These policies operate on a static priority level from 1 (low) to 99
(high). Crucially, real-time threads always have higher priority than normal threads. A normal
SCHED_OTHER process, handled by EEVDF or CFS, effectively has a priority of 0.
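To check these ranges on your own machine, you can ask the kernel directly. The following is a minimal sketch using Python's os module, which wraps sched_get_priority_min and sched_get_priority_max. The helper name priority_range is made up for illustration.

```python
import os

def priority_range(policy):
    """Return the (min, max) static priority the kernel accepts for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)

if __name__ == "__main__":
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
        policy = getattr(os, name)
        lo, hi = priority_range(policy)
        print(f"{name}: priority {lo}..{hi}")
```

On a typical Linux system this should report 0..0 for SCHED_OTHER and 1..99 for the two real-time policies.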
The trick lies in how SCHED_FIFO works. Unlike SCHED_OTHER, SCHED_FIFO is a simple scheduling
algorithm without time slicing.
When your SCHED_FIFO thread becomes runnable, it will immediately preempt any running
SCHED_OTHER process [1]. Once running, the SCHED_FIFO thread will not be preempted by any other
process of the same or lower priority. It doesn't have a "time quantum" that expires. It will only
stop running under three conditions:
- It blocks (e.g., waiting for an I/O request)
- It is preempted by a thread with a higher priority
- It voluntarily gives up the CPU by calling sched_yield
If you launch a process, set its policy to SCHED_FIFO with sched_setscheduler or sched_setattr
at a high enough priority (and you have permission to do so, which usually means CAP_SYS_NICE or a
suitable RLIMIT_RTPRIO), and that process then enters a tight computation loop, it will effectively
own that CPU core and will not be preempted by any normal system process.
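As a rough sketch of the policy switch, Python exposes sched_setscheduler through os.sched_setscheduler. The helper try_set_fifo and the priority value 10 are illustrative choices, not anything prescribed here. The call is expected to fail without the right privileges, so the sketch reports that instead of crashing.

```python
import os

def try_set_fifo(priority, pid=0):
    """Try to switch `pid` (0 = the calling process) to SCHED_FIFO.

    Returns True on success and False if the kernel refuses, for example
    with EPERM when the caller lacks the privilege to use real-time policies.
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True

if __name__ == "__main__":
    if try_set_fifo(10):  # 10 is an arbitrary example priority
        print("running as SCHED_FIFO:", os.sched_getscheduler(0) == os.SCHED_FIFO)
        # Latency-critical work would go here. Afterwards, drop back to the
        # normal policy so the process does not monopolise the core.
        os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
    else:
        print("could not switch to SCHED_FIFO (insufficient privileges?)")
```

Switching back to SCHED_OTHER as soon as the critical section ends is a cautious default: it keeps the window in which a bug could starve the rest of the system as small as possible.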
This is different from SCHED_RR, which basically just adds a time quantum to SCHED_FIFO. If a
SCHED_RR thread runs for its allotted time, it gets moved to the end of the list for its priority,
allowing another thread of the same priority to run.
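The difference is easier to see in a toy model. The sketch below is not kernel code: it simulates a single priority level with made-up task names and abstract work units, where quantum=None stands in for SCHED_FIFO and an integer quantum stands in for SCHED_RR.

```python
from collections import deque

def run_order(tasks, quantum=None):
    """Simulate one priority level and return the (name, units_run) timeline.

    tasks:   list of (name, work_units) in the order they became runnable
    quantum: None models SCHED_FIFO (run until done);
             a positive int models SCHED_RR (rotate after `quantum` units)
    """
    if quantum is not None and quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = remaining if quantum is None else min(quantum, remaining)
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            # Quantum expired: the task goes to the back of the list.
            queue.append((name, remaining))
    return timeline

if __name__ == "__main__":
    tasks = [("A", 3), ("B", 2)]
    print("FIFO:", run_order(tasks))
    print("RR:  ", run_order(tasks, quantum=1))
```

With FIFO, task A runs all three of its units before B gets a turn. With a quantum of 1, A and B alternate until both are finished.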
The Obvious, Giant Warning
Of course, this can be misused.
A nonblocking infinite loop in a SCHED_FIFO thread is a very effective way to freeze your entire
system.
Before kernel 2.6.25, the only way to recover was to have a shell running at an even higher real-time priority so you could log in and kill the runaway process.
Modern kernels have a couple of safety nets. The RLIMIT_RTTIME resource limit sets a ceiling, in
microseconds, on how much CPU time a real-time process can consume without making a blocking
system call. More importantly, the kernel reserves a share of CPU time for non-real-time
processes, controlled by /proc/sys/kernel/sched_rt_period_us and
/proc/sys/kernel/sched_rt_runtime_us. With the default values of 1000000 and 950000, real-time
tasks can use at most 0.95 seconds of every second, which leaves 5% of the CPU time for normal
processes and gives you a window to kill a runaway task.
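To see how much headroom your own system reserves, you can read the two procfs files and do the arithmetic. This is a small sketch; the helper names reserved_fraction and read_rt_throttle are invented for illustration, and it assumes the files are readable, which may not hold inside containers or sandboxes.

```python
from pathlib import Path

PROC = Path("/proc/sys/kernel")

def reserved_fraction(period_us, runtime_us):
    """Fraction of each period left over for non-real-time tasks.

    A negative runtime (the kernel uses -1) disables throttling,
    so nothing is reserved in that case.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        return 0.0
    return max(0.0, (period_us - runtime_us) / period_us)

def read_rt_throttle(proc=PROC):
    """Return (period_us, runtime_us) from procfs, or None if unavailable."""
    try:
        period = int((proc / "sched_rt_period_us").read_text())
        runtime = int((proc / "sched_rt_runtime_us").read_text())
    except (OSError, ValueError):
        return None
    return period, runtime

if __name__ == "__main__":
    settings = read_rt_throttle()
    if settings is None:
        print("real-time throttle settings not readable on this system")
    else:
        period, runtime = settings
        share = reserved_fraction(period, runtime)
        print(f"period={period}us runtime={runtime}us reserved={share:.1%}")
```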
So, while it's a powerful trick for achieving minimal-latency, low-jitter execution in real-time systems or specialized benchmarks, it's also a great way to learn what the "SysRq" key is for.
[1] It will also preempt SCHED_BATCH, SCHED_IDLE, and other real-time threads with a lower priority.