Pascal Zittlau


Weird FIFO scheduling trick


When you run programs on Linux, the scheduler in charge is either Earliest Eligible Virtual Deadline First (EEVDF), available since kernel 6.6, or its predecessor, the Completely Fair Scheduler (CFS). Both aim to be fair by dividing CPU time evenly among all the normal processes on your system, which means they constantly preempt tasks to give other tasks a turn.

But what if you have a task that is so critical you don't want it to be preempted?

The SCHED_FIFO Trick

For that, Linux provides real-time scheduling policies, primarily SCHED_FIFO (First In, First Out) and SCHED_RR (Round-Robin). These policies operate on a static priority level from 1 (low) to 99 (high). Crucially, real-time threads always have higher priority than normal threads: a normal SCHED_OTHER process, handled by EEVDF or CFS, effectively has a static priority of 0.
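To see these ranges on your own machine, here is a minimal Python sketch. It assumes a Linux system where Python's os module exposes the scheduling calls; the helper name priority_range is made up for illustration.

```python
import os


def priority_range(policy):
    """Return (lowest, highest) static priority the kernel accepts for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


if __name__ == "__main__":
    # SCHED_OTHER is the normal policy; SCHED_FIFO and SCHED_RR are real-time.
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
        low, high = priority_range(getattr(os, name))
        print(f"{name}: static priority {low}..{high}")
```

Querying the range instead of hard-coding 1 and 99 is what the POSIX interface is meant for, since other systems may use different bounds.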

The trick lies in how SCHED_FIFO works. Unlike SCHED_OTHER, SCHED_FIFO is a simple scheduling algorithm without time slicing.

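When your SCHED_FIFO thread becomes runnable, it will immediately preempt any running SCHED_OTHER process[1]. Once running, the SCHED_FIFO thread will not be preempted by any other process of the same or lower priority. It doesn't have a "time quantum" that expires. It will only stop running under three conditions: it blocks (for example, on an I/O request), it is preempted by a real-time thread with a higher priority, or it voluntarily gives up the CPU by calling sched_yield.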

If you launch a process, switch it to SCHED_FIFO with sched_setscheduler or sched_setattr at a high enough priority (and you have the permissions to do so), and that process then enters a tight computation loop, it will effectively own that CPU core: no normal system process will get to run there until it stops.
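A rough sketch of that policy switch in Python, using os.sched_setscheduler on the calling process. The function name try_become_fifo and the priority value 10 are illustrative assumptions. The call normally needs root, CAP_SYS_NICE, or a nonzero RLIMIT_RTPRIO, so the sketch treats failure as the expected case and deliberately does not enter a busy loop afterwards.

```python
import os
import sys


def try_become_fifo(priority=10):
    """Try to move the calling process to SCHED_FIFO; return True on success."""
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError as exc:
        # Typically EPERM when the process lacks real-time privileges.
        print(f"could not switch to SCHED_FIFO: {exc}", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    if try_become_fifo():
        print("now running under SCHED_FIFO")
    else:
        print("still running under the normal policy")
```

If you experiment with a tight loop after a successful switch, pin the process to one core and keep a second terminal open, for the reasons given in the warning further down.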

This is different from SCHED_RR, which basically just adds a time quantum to SCHED_FIFO. If a SCHED_RR thread runs for its allotted time, it gets moved to the end of the list for its priority, allowing another thread of the same priority to run.
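The difference is easiest to see in a toy model of a single priority level's run queue. This is a simplified simulation, not kernel code: the function simulate and its abstract "work units" are invented for illustration, and it assumes the tasks never block or yield.

```python
from collections import deque


def simulate(tasks, quantum=None):
    """Toy run queue for one real-time priority level.

    tasks   -- list of (name, work_units) in arrival order
    quantum -- None models SCHED_FIFO (run until done);
               an integer models SCHED_RR (rotate after that many units)
    Returns the list of (name, units_run) slices in execution order.
    """
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = remaining if quantum is None else min(quantum, remaining)
        timeline.append((name, run))
        remaining -= run
        if remaining > 0:
            # Quantum expired: the task goes to the tail of its priority list.
            queue.append((name, remaining))
    return timeline


if __name__ == "__main__":
    tasks = [("A", 3), ("B", 2)]
    print("FIFO:", simulate(tasks))
    print("RR:  ", simulate(tasks, quantum=1))
```

With the two tasks above, the FIFO model runs A for all three units before B gets a turn, while the round-robin model alternates A, B, A, B, A in one-unit slices.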

The Obvious, Giant Warning

Of course, this can be misused.

A nonblocking infinite loop in a SCHED_FIFO thread is a very effective way to freeze your entire system.

Before kernel 2.6.25, the only way to recover was to have a shell running at an even higher real-time priority so you could log in and kill the runaway process.

Modern kernels have a couple of safety nets. The RLIMIT_RTTIME resource limit sets a ceiling, in microseconds, on how much CPU time a real-time process can consume without making a blocking system call. More importantly, the kernel reserves a share of CPU time for non-real-time processes, controlled by /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us. With the default values (a period of 1,000,000 µs and a real-time runtime of 950,000 µs), 5% of the CPU time is left for normal processes, giving you a window to kill a runaway task.
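Both safety nets can be inspected from Python. The sketch below is a hedged illustration: the helper names (reserved_fraction, read_rt_throttle, cap_rt_cpu_time) are made up, it assumes a Linux system where the two /proc files are readable, and it relies on the resource module exposing RLIMIT_RTTIME.

```python
import resource

PROC_DIR = "/proc/sys/kernel"


def reserved_fraction(period_us, runtime_us):
    """Fraction of each period that real-time tasks cannot use."""
    if runtime_us < 0:
        # A runtime of -1 disables throttling: nothing is reserved.
        return 0.0
    return (period_us - runtime_us) / period_us


def read_rt_throttle(proc_dir=PROC_DIR):
    """Read (period_us, runtime_us) from the kernel's sysctl files."""
    values = []
    for name in ("sched_rt_period_us", "sched_rt_runtime_us"):
        with open(f"{proc_dir}/{name}") as f:
            values.append(int(f.read()))
    return tuple(values)


def cap_rt_cpu_time(limit_us):
    """Lower this process's soft RLIMIT_RTTIME, leaving the hard limit alone."""
    _, hard = resource.getrlimit(resource.RLIMIT_RTTIME)
    if hard != resource.RLIM_INFINITY:
        limit_us = min(limit_us, hard)  # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_RTTIME, (limit_us, hard))
    return resource.getrlimit(resource.RLIMIT_RTTIME)[0]


if __name__ == "__main__":
    try:
        period, runtime = read_rt_throttle()
        share = reserved_fraction(period, runtime)
        print(f"period={period} us, runtime={runtime} us, reserved={share:.1%}")
    except OSError as exc:
        print(f"could not read real-time throttling settings: {exc}")
```

Calling cap_rt_cpu_time(500_000) before switching to a real-time policy would, under these assumptions, ask the kernel to signal the process once it has burned half a second of CPU time without blocking.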

So, while it's a powerful trick for achieving minimal-latency, jitter-free execution for real-time systems or specialized benchmarks, it's also a great way to learn what the "SysRq" key is for.


  1. It will also preempt SCHED_BATCH and SCHED_IDLE processes, as well as other real-time threads with a lower priority.