Writing a user-space loader in Zig
In Part 2, we looked at the map. We learned that an ELF file describes Segments that need to be loaded into memory.
Now, we are going to write the code to do it.
We will write a simple user-space loader in Zig. Why Zig? Because it makes manual memory management and alignment math explicit and relatively painless, which is exactly what we need when messing with page tables.
The Skeleton
Our loader is a program that takes one argument: the path to the binary we want to run.
const std = @import("std");
const mem = std.mem;
const elf = std.elf;
const posix = std.posix;
const page_size = 4096;

pub fn main() !void {
    // Parse arguments to find the target binary
    if (std.os.argv.len < 2) return error.NoInput;
    const filename = std.os.argv[1];

    const fd = try posix.open(mem.sliceTo(filename, 0), .{}, 0);
    const file = std.fs.File{ .handle = fd };
    var file_buffer: [page_size]u8 = undefined;
    var file_reader = file.reader(&file_buffer);
    const header = try elf.Header.read(&file_reader.interface);

    // TODO: the actual loading ...
}
Zig's standard library already has an ELF parser, which saves us from manually unpacking bytes. Once we have the header, we can move to the hard part.
The Address Space
Where do we load the code? It depends on the type of the executable, which can be found by
inspecting header.type (the e_type field of the ELF header). This can be one of NONE, REL, EXEC,
DYN, or CORE.
For our purposes only two matter: EXEC, for fixed binaries that have a hardcoded address where
they must be loaded; and DYN, for position-independent code that uses relative addressing and can
be loaded anywhere we want.
If it is a PIE binary, we need to pick a base address. The kernel usually randomizes the base
address for security (address space layout randomization, or ASLR).
But we can just pick a safe spot or ask mmap to pick one for us.
Before we map the individual segments, we calculate the total memory span the program needs (min address to max address) and verify we can reserve that chunk.
// Find boundaries
const minva, const maxva = bounds: {
    var minva: u64 = std.math.maxInt(u64);
    var maxva: u64 = 0;
    var phdrs = header.iterateProgramHeaders(&file_reader);
    while (try phdrs.next()) |phdr| {
        if (phdr.p_type != elf.PT_LOAD) continue;
        minva = @min(minva, phdr.p_vaddr);
        maxva = @max(maxva, phdr.p_vaddr + phdr.p_memsz);
    }
    minva = mem.alignBackward(usize, minva, page_size);
    maxva = mem.alignForward(usize, maxva, page_size);
    break :bounds .{ minva, maxva };
};

// Check that the needed memory region can be allocated as a whole.
const pic = header.type == elf.ET.DYN;
const hint = if (pic) null else @as(?[*]align(page_size) u8, @ptrFromInt(minva));
const base = try posix.mmap(
    hint,
    maxva - minva,
    posix.PROT.READ | posix.PROT.WRITE,
    .{ .TYPE = .PRIVATE, .ANONYMOUS = true, .FIXED_NOREPLACE = !pic },
    -1,
    0,
);
const entry = header.entry + if (pic) @intFromPtr(base.ptr) else 0;
We use mmap here with ANONYMOUS=true to simply reserve a big block of zeroed RAM.
If pic is false (Fixed binary), we pass the required address as a hint and enforce it.
If pic is true (PIE), we pass null and let the kernel decide where to put us.
Also note FIXED_NOREPLACE. This is a relatively new Linux flag (added in Linux 4.17) that says "Try
to map exactly here, but if something is already there, fail instead of overwriting it." This
prevents us from accidentally stomping on our own loader's memory if addresses collide.
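To make the bounds arithmetic concrete, here is a small standalone sketch. The Segment struct and the span function are hypothetical stand-ins for the program headers and the bounds block above, and the segment values are made up to resemble a small static binary.

```zig
const std = @import("std");

// Page size assumed to be 4 KiB, matching the loader above.
const page_size: u64 = 4096;

// Hypothetical, simplified stand-in for a PT_LOAD program header.
const Segment = struct { vaddr: u64, memsz: u64 };

/// Returns the page-aligned [min, max) span covering every segment.
fn span(segments: []const Segment) struct { min: u64, max: u64 } {
    var min: u64 = std.math.maxInt(u64);
    var max: u64 = 0;
    for (segments) |s| {
        min = @min(min, s.vaddr);
        max = @max(max, s.vaddr + s.memsz);
    }
    return .{
        .min = std.mem.alignBackward(u64, min, page_size),
        .max = std.mem.alignForward(u64, max, page_size),
    };
}

pub fn main() void {
    // Made-up segment values for illustration only.
    const segments = [_]Segment{
        .{ .vaddr = 0x400000, .memsz = 0x528 },
        .{ .vaddr = 0x401000, .memsz = 0x1a35 },
        .{ .vaddr = 0x404e10, .memsz = 0x2f8 },
    };
    const s = span(&segments);
    // The last segment ends at 0x405108, which rounds up to 0x406000.
    std.debug.print("reserve 0x{x} bytes at 0x{x}\n", .{ s.max - s.min, s.min });
}
```

With these values the reservation is 0x6000 bytes starting at 0x400000: six pages, even though the segments themselves occupy far less.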
Mapping the Segments
Now we have a blank canvas (base). We iterate through the PT_LOAD segments again and actually
map the file content onto that canvas.
We do this by copying each segment's file contents into the anonymous memory we just reserved, and then tightening the permissions on those pages.
// Map file
var phdrs = header.iterateProgramHeaders(&file_reader);
errdefer posix.munmap(base);
while (try phdrs.next()) |phdr| {
    if (phdr.p_type != elf.PT_LOAD) continue;
    if (phdr.p_memsz == 0) continue;

    const offset = phdr.p_vaddr & (page_size - 1);
    const size = mem.alignForward(usize, phdr.p_memsz + offset, page_size);
    var start = mem.alignBackward(usize, phdr.p_vaddr, page_size);
    const base_for_dyn = if (pic) @intFromPtr(base.ptr) else 0;
    start += base_for_dyn;
    const ptr_base = @as([*]align(page_size) u8, @ptrFromInt(start));
    const ptr = ptr_base[0..size];

    try file_reader.seekTo(phdr.p_offset);
    if (try file_reader.read(ptr[offset..][0..phdr.p_filesz]) != phdr.p_filesz)
        return error.UnfinishedRead;
    try posix.mprotect(ptr, elfToMmapProt(phdr.p_flags));
}
There is some alignment math here. mmap and mprotect only work on page boundaries (4KiB). However,
ELF segments often start at weird offsets like 0x1040. We have to alignBackward to find the start
of the page, cover the whole page, and place the file contents at the right offset inside it. The
ELF spec helps here: for loadable segments, p_vaddr and p_offset must be congruent modulo p_align.
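As a worked example of that alignment math, the sketch below runs the same three computations for a segment starting at 0x1040. The Placement struct, the place function, and the 0x2000 size are hypothetical and only serve the illustration.

```zig
const std = @import("std");

// Page size assumed to be 4 KiB, matching the loader above.
const page_size: usize = 4096;

// Hypothetical helper type: where a segment lands once page-aligned.
const Placement = struct { start: usize, offset: usize, size: usize };

fn place(vaddr: usize, memsz: usize) Placement {
    // Distance from the start of the page to the first byte of the segment.
    const offset = vaddr & (page_size - 1);
    return .{
        .start = std.mem.alignBackward(usize, vaddr, page_size),
        .offset = offset,
        // The in-page offset counts toward the number of pages we must cover.
        .size = std.mem.alignForward(usize, memsz + offset, page_size),
    };
}

pub fn main() void {
    const p = place(0x1040, 0x2000);
    std.debug.print("start=0x{x} offset=0x{x} size=0x{x}\n", .{ p.start, p.offset, p.size });
}
```

For 0x1040 the page starts at 0x1000 and the in-page offset is 0x40. Note that the 0x2000-byte segment needs three pages rather than two, because the leading 0x40 bytes push its end past a page boundary.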
We read the data (file_reader.read) directly into the mapped memory. You might ask: "Why not use
mmap again to map the file directly?" You could! But reading allows us to handle the .bss
section cleanly. Remember from the last post that .bss takes up memory (memsz) but not file
space (filesz). By reading only filesz bytes into a buffer of size memsz (which was already
zeroed by our initial anonymous map), the remaining bytes stay zero. Perfect for .bss.
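The .bss trick can be shown in miniature. This is only a sketch: the byte array stands in for the zeroed anonymous mapping, and loadSegment is a hypothetical helper that plays the role of the read call.

```zig
const std = @import("std");

/// Copies only the file-backed bytes; the tail of `dest` is left untouched.
/// Hypothetical helper standing in for the loader's read into mapped memory.
fn loadSegment(dest: []u8, file_bytes: []const u8) void {
    @memcpy(dest[0..file_bytes.len], file_bytes);
}

pub fn main() void {
    // Stand-in for the zeroed anonymous mapping; memsz is 8 here.
    var segment = [_]u8{0} ** 8;
    // Stand-in for the bytes stored in the file; filesz is 3 here.
    const file_bytes = [_]u8{ 0xAA, 0xBB, 0xCC };

    loadSegment(&segment, &file_bytes);

    // The first filesz bytes come from the file...
    std.debug.assert(std.mem.eql(u8, segment[0..3], &file_bytes));
    // ...and the remaining memsz - filesz bytes are still zero, like .bss.
    std.debug.assert(std.mem.allEqual(u8, segment[3..], 0));
}
```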
Finally, we call mprotect to set the correct permissions (Read, Write, Execute). We need a helper
for that because ELF flags don't match mmap flags 1:1:
/// Converts ELF program header protection flags to mmap protection flags.
fn elfToMmapProt(elf_prot: u64) u32 {
    var result: u32 = posix.PROT.NONE;
    if ((elf_prot & elf.PF_R) != 0) result |= posix.PROT.READ;
    if ((elf_prot & elf.PF_W) != 0) result |= posix.PROT.WRITE;
    if ((elf_prot & elf.PF_X) != 0) result |= posix.PROT.EXEC;
    return result;
}
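To see why a straight copy of the bits would be wrong, here is a standalone sketch with the numeric values spelled out. The constants are redefined locally (with the values from the ELF specification and the Linux headers) so the example does not depend on std.elf or std.posix, and toProt is a hypothetical mirror of the helper above.

```zig
const std = @import("std");

// ELF segment flag bits, as defined by the ELF specification.
const PF_X: u32 = 1;
const PF_W: u32 = 2;
const PF_R: u32 = 4;

// Linux mmap/mprotect protection bits.
const PROT_READ: u32 = 1;
const PROT_WRITE: u32 = 2;
const PROT_EXEC: u32 = 4;

/// Hypothetical mirror of elfToMmapProt using the local constants.
fn toProt(pf: u32) u32 {
    var prot: u32 = 0;
    if ((pf & PF_R) != 0) prot |= PROT_READ;
    if ((pf & PF_W) != 0) prot |= PROT_WRITE;
    if ((pf & PF_X) != 0) prot |= PROT_EXEC;
    return prot;
}

pub fn main() void {
    // Read-only data: PF_R is 4, but PROT_READ is 1.
    std.debug.assert(toProt(PF_R) == 1);
    // Writable data: PF_R | PF_W is 6, PROT_READ | PROT_WRITE is 3.
    std.debug.assert(toProt(PF_R | PF_W) == 3);
    // Code: PF_R | PF_X is 5 on both sides, which hides the mismatch.
    std.debug.assert(toProt(PF_R | PF_X) == 5);
}
```

The read and execute bits are swapped between the two encodings. A typical code segment (read plus execute) happens to map to the same number, so a loader that skipped the conversion could appear to work until it hit a data segment.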
The Stack Shuffle
The binary is in memory. Can we jump to it? No.
The Linux kernel provides arguments (argc, argv, envp) and the Auxiliary Vector (auxv) on
the stack. Currently, our loader's stack looks like this:
[ argc=2 ] [ "loader" ] [ "target_bin" ] [ NULL ] [ env... ] [ NULL ] [ auxv... ]
If we jump to the target now, it will think its name is "loader" and that it has one argument,
"target_bin". We need to shift the stack to remove the loader's own argv[0], and also update the
Auxiliary Vector so that the program can find itself.
First, let's fix the Aux Vector. Zig provides std.os.linux.elf_aux_maybe which points to the
vector the kernel gave us. We can edit it in place:
var i: usize = 0;
const auxv = std.os.linux.elf_aux_maybe.?;
while (auxv[i].a_type != elf.AT_NULL) : (i += 1) {
    auxv[i].a_un.a_val = switch (auxv[i].a_type) {
        elf.AT_PHDR => @intFromPtr(base.ptr) + header.phoff,
        elf.AT_PHENT => header.phentsize,
        elf.AT_PHNUM => header.phnum,
        elf.AT_ENTRY => entry,
        elf.AT_EXECFN => @intFromPtr(std.os.argv[1]),
        else => auxv[i].a_un.a_val,
    };
}
We update AT_PHDR, AT_PHENT, and AT_PHNUM (the location, entry size, and count of the program
headers), AT_ENTRY, and AT_EXECFN (filename) to describe the new binary we just loaded.
Now, for the shuffle. We calculate the size of the entire block (from argv[1] all the way to the
end of the auxv array). Then we use memmove to slide that whole block by one slot (8 bytes) toward
lower addresses, overwriting the old argv[0].
// The stack layout provided by the kernel is:
// argc, argv..., NULL, envp..., NULL, auxv...
// We need to shift this block of memory to remove the loader's own arguments before we jump to
// the new executable.
// The end of the block is one entry past the AT_NULL entry in auxv.
const end_of_auxv = &auxv[i + 1];
const dest_ptr = @as([*]u8, @ptrCast(std.os.argv.ptr));
const src_ptr = @as([*]u8, @ptrCast(&std.os.argv[1]));
const len = @intFromPtr(end_of_auxv) - @intFromPtr(src_ptr);
@memmove(dest_ptr[0..len], src_ptr[0..len]);

// `std.os.argv.ptr` points to the argv pointers. The word just before it is argc and also the
// start of the stack.
const argc: [*]usize = @as([*]usize, @ptrCast(@alignCast(&std.os.argv.ptr[0]))) - 1;
argc[0] = std.os.argv.len - 1;
Finally, we decrement argc so the program knows it has one fewer argument.
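The shuffle is easier to follow on a toy model. The sketch below treats the stack as a plain array of words with made-up placeholder values; dropFirstArg is a hypothetical helper, and it uses std.mem.copyForwards rather than the @memmove builtin from the loader, which is safe here because the destination starts before the source.

```zig
const std = @import("std");

/// Removes argv[0] from a toy stack laid out as:
/// argc, argv..., NULL, envp..., NULL, auxv...
/// Hypothetical helper; the real loader works on the live process stack.
fn dropFirstArg(stack: []usize) void {
    // Slide everything from argv[1] onward down by one slot, over argv[0].
    std.mem.copyForwards(usize, stack[1 .. stack.len - 1], stack[2..]);
    // One fewer argument now.
    stack[0] -= 1;
}

pub fn main() void {
    // argc=2, argv[0], argv[1], NULL, envp[0], NULL, then an AT_NULL pair.
    // The non-zero values are placeholders standing in for pointers.
    var stack = [_]usize{ 2, 0xA0, 0xA1, 0, 0xE0, 0, 0, 0 };

    dropFirstArg(&stack);

    std.debug.assert(stack[0] == 1); // argc
    std.debug.assert(stack[1] == 0xA1); // the target is now argv[0]
    std.debug.assert(stack[2] == 0); // argv terminator
    std.debug.assert(stack[3] == 0xE0); // envp[0]
}
```

The last slot of the array keeps a stale copy of the old final word, which is harmless: everything the target reads ends at the AT_NULL entry.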
The Jump
We are ready. We have the Entry Point (entry) and the clean Stack Pointer (argc).
We cannot simply call this as a function pointer. A normal call pushes a return address onto the stack, and a compiled Zig/C function may adjust the stack further in its "prologue". We need a "clean" jump where the stack pointer is exactly where we want it and nothing extra has been pushed.
For that, we need assembly.
    // We can't just defer because we never return.
    file.close();

    trampoline(entry, argc);
}
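This tiny x86-64 assembly block puts our prepared stack pointer into rsp and then jumps straight
to the entry point. We mark it noreturn and unreachable because once we make that jump, we
aren't coming back. Note that it leaves every other register as it was. The x86-64 System V ABI
expects rdx at process entry to hold either zero or a function pointer to register with atexit,
so a stricter trampoline would also clear rdx.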
fn trampoline(entry: usize, sp: [*]usize) noreturn {
    asm volatile (
        \\ mov %[sp], %%rsp
        \\ jmp *%[entry]
        : // No outputs
        : [entry] "r" (entry),
          [sp] "r" (sp),
        : .{ .rsp = true, .memory = true });
    unreachable;
}
Does it work?
Let's try compiling a static C hello world:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}
Compile it with -static to ensure it doesn't need external libraries:
gcc -static hello.c -o hello
Now run it with our loader:
zig build-exe -fPIE loader.zig
./loader hello
Output: Hello World!
It works! We have successfully replicated the OS loader in user space.
We can even run the Zig compiler itself:
./loader $(which zig) version
For me, this outputs 0.15.1.
However, if you try to run /bin/ls (which is likely dynamically linked):
./loader /bin/ls
Segmentation fault
Why? Because /bin/ls doesn't contain the code for printf. It expects an interpreter (the dynamic
linker) to be loaded alongside it to handle the wiring.
There is also a more subtle failure mode. Try using the Zig compiler to actually build something:
./loader $(which zig) build-exe -fPIE loader.zig
error: unable to find zig installation directory '.../loader': FileNotFound
The loader successfully loaded and started Zig, but Zig exited immediately with an error. Why?
Complex tools often need to find their own "install location" to load standard libraries. They do
this by reading /proc/self/exe, a symlink that the kernel points at the running executable file.
Since the kernel started our loader, /proc/self/exe points to ./loader. Zig searches for its
standard library relative to that path, fails to find it, and gives up.
Two things are left as exercises for the reader ;) or as topics for a future post. The first is
the dynamic linker issue: find the dynamic linker named in the INTERP segment, load it as well,
and let it run first. The second is spoofing /proc/self/exe.
If you are curious, the full source code for the loader (including the INTERP handling) is available here on my Git or, for the Microsoft shills, here is the same code on GitHub.