Pascal Zittlau


Writing a user-space loader in Zig


In Part 2, we looked at the map. We learned that an ELF file describes segments that need to be loaded into memory.

Now, we are going to write the code to do it.

We will write a simple user-space loader in Zig. Why Zig? It makes manual memory management and alignment math explicit and relatively painless, which is exactly what we need when manipulating memory mappings.

The Skeleton

Our loader is a program that takes one argument: the path to the binary we want to run.

```zig
const std = @import("std");
const mem = std.mem;
const elf = std.elf;
const posix = std.posix;
const page_size = 4096;

pub fn main() !void {
    // Parse arguments to find the target binary
    if (std.os.argv.len < 2) return error.NoInput;
    const filename = std.os.argv[1];

    const fd = try posix.open(mem.sliceTo(filename, 0), .{}, 0);
    const file = std.fs.File{ .handle = fd };
    var file_buffer: [page_size]u8 = undefined;
    var file_reader = file.reader(&file_buffer);
    const header = try elf.Header.read(&file_reader.interface);

    // TODO: the actual loading ...
}
```

Zig's standard library already has an ELF parser, which saves us from manually unpacking bytes. Once we have the header, we can move to the hard part.

The Address Space

Where do we load the code? That depends on the type of the executable, which is stored in the e_type field of the ELF header (header.type in Zig's parser). It is one of NONE, REL, EXEC, DYN, or CORE.

Only two types matter for our purposes. EXEC is for fixed binaries, which have a hardcoded address where they must be loaded. DYN is for position-independent code, which uses relative addressing and can therefore be loaded anywhere we want.
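The EXEC/DYN distinction can be shown outside Zig with this minimal C sketch. It uses the glibc `<elf.h>` definitions, which follow the ELF spec's field names. The helper name `elf_placement` and its return convention are hypothetical. Zig's `elf.Header.read` already checks the magic, so the extra checks here only restate what our loader assumes, namely 64-bit x86-64 input. ET_DYN also covers shared libraries, so this check alone does not prove the file is a runnable program.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Hypothetical helper: decide how a 64-bit x86-64 ELF file must be placed.
 * Returns 0 for ET_EXEC (fixed address), 1 for ET_DYN (position independent),
 * and -1 for anything this loader refuses to run. */
int elf_placement(const Elf64_Ehdr *eh)
{
    assert(eh != NULL);
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1; /* missing the \x7fELF magic */
    if (eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_machine != EM_X86_64)
        return -1; /* the jump at the end of the loader is x86-64 only */
    switch (eh->e_type) {
    case ET_EXEC:
        return 0; /* must live at the p_vaddr values it was linked for */
    case ET_DYN:
        return 1; /* PIE: any page-aligned base works */
    default:
        return -1; /* ET_NONE, ET_REL, ET_CORE are not runnable programs */
    }
}
```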

If it is a PIE binary, we need to pick a base address. The kernel usually randomizes the base address for security (address space layout randomization, or ASLR). But we can just pick a safe spot or ask mmap to pick one for us.

Before we map the individual segments, we calculate the total memory span the program needs (from the lowest to the highest address) and verify that we can reserve that chunk.[1]
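The boundary computation itself is plain arithmetic over the program headers. Here is a rough C sketch of it. It mirrors the Zig code that follows, but the helper name `load_bounds` is hypothetical and 4 KiB pages are assumed. It returns -1 when there is no PT_LOAD segment at all:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define LOADER_PAGE_SIZE 4096u /* assumption: 4 KiB pages, as in the Zig code */

/* Compute the page-aligned [min, max) span covered by all PT_LOAD segments.
 * Returns 0 on success, -1 if there is no loadable segment. */
int load_bounds(const Elf64_Phdr *phdrs, size_t phnum, uint64_t *min_out, uint64_t *max_out)
{
    assert(min_out != NULL && max_out != NULL);
    uint64_t minva = UINT64_MAX, maxva = 0;
    for (size_t i = 0; i < phnum; i++) {
        if (phdrs[i].p_type != PT_LOAD)
            continue; /* PT_PHDR, PT_INTERP, ... take no memory of their own */
        if (phdrs[i].p_vaddr < minva)
            minva = phdrs[i].p_vaddr;
        if (phdrs[i].p_vaddr + phdrs[i].p_memsz > maxva)
            maxva = phdrs[i].p_vaddr + phdrs[i].p_memsz;
    }
    if (minva > maxva)
        return -1; /* no PT_LOAD seen: minva is still UINT64_MAX */
    *min_out = minva & ~(uint64_t)(LOADER_PAGE_SIZE - 1);                          /* align down */
    *max_out = (maxva + LOADER_PAGE_SIZE - 1) & ~(uint64_t)(LOADER_PAGE_SIZE - 1); /* align up */
    return 0;
}
```

For a static binary whose loadable segments run from 0x401000 to 0x403050, this yields the page-aligned span [0x401000, 0x404000).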

```zig
// Find boundaries
const minva, const maxva = bounds: {
    var minva: u64 = std.math.maxInt(u64);
    var maxva: u64 = 0;
    var phdrs = header.iterateProgramHeaders(&file_reader);
    while (try phdrs.next()) |phdr| {
        if (phdr.p_type != elf.PT_LOAD) continue;
        minva = @min(minva, phdr.p_vaddr);
        maxva = @max(maxva, phdr.p_vaddr + phdr.p_memsz);
    }
    // Without a non-empty PT_LOAD segment there is nothing to load
    // (and `maxva - minva` below would underflow).
    if (minva >= maxva) return error.NoLoadableSegments;
    minva = mem.alignBackward(usize, minva, page_size);
    maxva = mem.alignForward(usize, maxva, page_size);
    break :bounds .{ minva, maxva };
};

// Check that the needed memory region can be allocated as a whole.
const pic = header.type == elf.ET.DYN;
// Fixed binaries must land exactly at minva; for PIE we let the kernel choose.
const hint = if (pic) null else @as(?[*]align(page_size) u8, @ptrFromInt(minva));
const base = try posix.mmap(
    hint,
    maxva - minva,
    posix.PROT.READ | posix.PROT.WRITE,
    .{ .TYPE = .PRIVATE, .ANONYMOUS = true, .FIXED_NOREPLACE = !pic },
    -1,
    0,
);
// For PIE, base doubles as the load bias. This assumes the lowest PT_LOAD
// segment starts at p_vaddr 0, which is the usual layout for PIE binaries.
const entry = header.entry + if (pic) @intFromPtr(base.ptr) else 0;
```

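We use mmap with ANONYMOUS=true to reserve one contiguous block of zero-filled memory. For now it is mapped readable and writable so that we can copy the file contents into it. The final permissions are applied per segment later.

If pic is false (a fixed-address binary), we pass the required address as the hint and enforce it. If pic is true (PIE), we pass null and let the kernel decide where to put the binary.

Also note FIXED_NOREPLACE. This Linux flag, added in Linux 4.17, says "map exactly here, but if something is already there, fail instead of overwriting it." It prevents us from accidentally stomping on our own loader's memory if addresses collide. Older kernels do not know the flag and treat the address as a mere hint. A careful loader therefore also checks that the returned address is the one it asked for.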
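Here is a C sketch of the reservation step, including the old-kernel check. It makes two assumptions: `reserve_region` is a hypothetical name, and the `#define` fallback uses the Linux value of MAP_FIXED_NOREPLACE for libc headers that predate the flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and friends in glibc headers */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000 /* Linux value, for libc headers that lack it */
#endif

/* Reserve len bytes of zero-filled, read-write memory (hypothetical helper).
 * hint == NULL lets the kernel choose (the PIE case); otherwise the block must
 * land exactly at hint (the fixed-binary case). Returns NULL on failure. */
void *reserve_region(void *hint, size_t len)
{
    assert(len > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if (hint != NULL)
        flags |= MAP_FIXED_NOREPLACE;
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL; /* e.g. EEXIST: the range is already occupied */
    if (hint != NULL && p != hint) {
        /* A pre-4.17 kernel ignored the flag and treated hint as a suggestion. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Passing NULL corresponds to the PIE case. Passing minva corresponds to a fixed binary: there, either EEXIST or a mismatched address means something else already occupies that range.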

Mapping the Segments

Now we have a blank canvas (base). We iterate through the PT_LOAD segments again and copy the file contents onto that canvas.

We do not create new mappings for this. Instead, we read each segment's bytes from the file straight into the anonymous memory we just reserved, and then set that segment's final permissions.

```zig
// Map file
var phdrs = header.iterateProgramHeaders(&file_reader);
errdefer posix.munmap(base);
while (try phdrs.next()) |phdr| {
    if (phdr.p_type != elf.PT_LOAD) continue;
    if (phdr.p_memsz == 0) continue;

    // Page-align the segment: `start` is its first page, `offset` is where
    // p_vaddr sits inside that page, and `size` covers p_memsz in whole pages.
    const offset = phdr.p_vaddr & (page_size - 1);
    const size = mem.alignForward(usize, phdr.p_memsz + offset, page_size);
    var start = mem.alignBackward(usize, phdr.p_vaddr, page_size);
    const base_for_dyn = if (pic) @intFromPtr(base.ptr) else 0;
    start += base_for_dyn;
    const ptr_base = @as([*]align(page_size) u8, @ptrFromInt(start));
    const ptr = ptr_base[0..size];

    // Copy only p_filesz bytes; the rest of p_memsz (.bss) is already zero.
    try file_reader.seekTo(phdr.p_offset);
    if (try file_reader.read(ptr[offset..][0..phdr.p_filesz]) != phdr.p_filesz)
        return error.UnfinishedRead;
    try posix.mprotect(ptr, elfToMmapProt(phdr.p_flags));
}
```

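There is some alignment math here. mmap and mprotect operate on whole pages (4 KiB here), but ELF segments often start at unaligned addresses like 0x1040. So we alignBackward to find the start of the first page and round the size up to whole pages. We then write the segment's bytes at offset p_vaddr modulo the page size within that page, which puts them exactly at p_vaddr (plus the base for PIE). If we mapped the file with mmap instead of reading it, we would also rely on the ELF spec's guarantee that p_vaddr and p_offset are congruent modulo p_align.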
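Take a concrete case, assuming 4 KiB pages: a segment with p_vaddr = 0x1040 and p_memsz = 0x2000 starts 0x40 bytes into the page at 0x1000 and needs three pages (0x1000 to 0x4000). The hypothetical C helper below reproduces the Zig arithmetic, including the load bias that is added for PIE binaries:

```c
#include <assert.h>
#include <stdint.h>

#define LOADER_PAGE_SIZE 4096u /* assumption: 4 KiB pages, as in the Zig code */

/* Page-granular window for one PT_LOAD segment (hypothetical helper). */
struct seg_window {
    uint64_t start;  /* page-aligned start, what mprotect() receives */
    uint64_t offset; /* position of p_vaddr within that first page */
    uint64_t size;   /* p_memsz + offset, rounded up to whole pages */
};

/* bias is 0 for ET_EXEC and the reservation's base address for ET_DYN. */
struct seg_window segment_window(uint64_t p_vaddr, uint64_t p_memsz, uint64_t bias)
{
    assert(bias % LOADER_PAGE_SIZE == 0);
    struct seg_window w;
    w.offset = p_vaddr & (LOADER_PAGE_SIZE - 1);
    w.start = (p_vaddr - w.offset) + bias;
    w.size = (p_memsz + w.offset + LOADER_PAGE_SIZE - 1) & ~(uint64_t)(LOADER_PAGE_SIZE - 1);
    return w;
}
```

The file bytes are then written at start + offset, which equals p_vaddr + bias.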

We read the data (file_reader.read) directly into the mapped memory. You might ask: "Why not use mmap again to map the file directly?" You could! But reading allows us to handle the .bss section cleanly. Remember from the last post that .bss takes up memory (memsz) but not file space (filesz). By reading only filesz bytes into a buffer of size memsz (which was already zeroed by our initial anonymous map), the remaining bytes stay zero. Perfect for .bss.
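In C terms, the filesz/memsz rule comes down to a copy followed by zero-fill. The hypothetical `fill_segment` helper below zeroes the tail explicitly. That step would be needed if the destination were reused memory. In our loader, the fresh anonymous mapping already supplies those zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Place one segment's bytes in a destination at least memsz bytes long
 * (hypothetical helper). Only filesz bytes come from the file; the remaining
 * memsz - filesz bytes are .bss and must read as zero. */
void fill_segment(unsigned char *dest, size_t memsz,
                  const unsigned char *file_bytes, size_t filesz)
{
    assert(filesz <= memsz); /* the ELF spec forbids p_filesz > p_memsz */
    memcpy(dest, file_bytes, filesz);
    memset(dest + filesz, 0, memsz - filesz); /* .bss tail */
}
```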

Finally, we call mprotect to set each segment's permissions (read, write, execute). We need a helper for that because the ELF flag bits don't match the mmap protection bits: PF_X is 1 and PF_R is 4, while PROT_READ is 1 and PROT_EXEC is 4.

```zig
/// Converts ELF program header protection flags to mmap protection flags.
fn elfToMmapProt(elf_prot: u64) u32 {
    var result: u32 = posix.PROT.NONE;
    if ((elf_prot & elf.PF_R) != 0) result |= posix.PROT.READ;
    if ((elf_prot & elf.PF_W) != 0) result |= posix.PROT.WRITE;
    if ((elf_prot & elf.PF_X) != 0) result |= posix.PROT.EXEC;
    return result;
}
```

The Stack Shuffle

The binary is in memory. Can we jump to it? No.

The Linux kernel provides arguments (argc, argv, envp) and the Auxiliary Vector (auxv) on the stack. Currently, our loader's stack looks like this:

```
[ argc=2 ] [ argv[0] -> "loader" ] [ argv[1] -> "target_bin" ] [ NULL ] [ envp... ] [ NULL ] [ auxv... ] [ AT_NULL ]
```

If we jump to the target now, it will think its name is "loader" and that it has one argument, "target_bin". We need to shift the stack to remove the loader's argv[0]. We also need to update the auxiliary vector so that the program can find its own program headers and entry point.
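Here is a hedged C sketch that walks such a stack, starting from a pointer to argc. It treats every slot as one machine word and the auxiliary vector as (type, value) pairs, as on x86-64 Linux. The names `struct initial_stack` and `parse_stack` are hypothetical. The `end` pointer it computes, one past the AT_NULL pair, marks the end of the block that has to be shifted.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the initial process stack; sp points at argc. */
struct initial_stack {
    uintptr_t argc;
    uintptr_t *argv; /* argc string pointers, then a 0 terminator */
    uintptr_t *envp; /* string pointers up to a 0 terminator */
    uintptr_t *auxv; /* (type, value) pairs up to the AT_NULL pair */
    uintptr_t *end;  /* one past the AT_NULL pair */
};

struct initial_stack parse_stack(uintptr_t *sp)
{
    struct initial_stack s;
    assert(sp != NULL);
    s.argc = sp[0];
    s.argv = sp + 1;
    s.envp = s.argv + s.argc + 1; /* skip argv[] and its terminator */
    uintptr_t *p = s.envp;
    while (*p != 0)
        p++;
    s.auxv = p + 1; /* skip envp's terminator */
    p = s.auxv;
    while (p[0] != AT_NULL)
        p += 2;
    s.end = p + 2;
    return s;
}
```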

First, let's fix the Aux Vector. Zig provides std.os.linux.elf_aux_maybe which points to the vector the kernel gave us. We can edit it in place:

```zig
var i: usize = 0;
const auxv = std.os.linux.elf_aux_maybe.?;
while (auxv[i].a_type != elf.AT_NULL) : (i += 1) {
    auxv[i].a_un.a_val = switch (auxv[i].a_type) {
        // Assumes the program headers sit at base + e_phoff, i.e. the first
        // PT_LOAD segment maps file offset 0 (true for typical gcc/ld output).
        elf.AT_PHDR => @intFromPtr(base.ptr) + header.phoff,
        elf.AT_PHENT => header.phentsize,
        elf.AT_PHNUM => header.phnum,
        elf.AT_ENTRY => entry,
        elf.AT_EXECFN => @intFromPtr(std.os.argv[1]),
        else => auxv[i].a_un.a_val,
    };
}
```

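We update AT_PHDR (where the program headers are in memory), AT_PHENT and AT_PHNUM (their size and count), AT_ENTRY (the entry point), and AT_EXECFN (the filename) so that they describe the binary we just loaded instead of the loader.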
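Here is the same in-place patch sketched in C with glibc's `Elf64_auxv_t` type. The `struct target_info` and `patch_auxv` names are hypothetical. Which values you plug in, for example how AT_PHDR is computed, is up to the loader.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Values describing the freshly loaded target (illustrative names). */
struct target_info {
    uint64_t phdr, phent, phnum, entry, execfn;
};

/* Rewrite, in place, the auxv entries that describe "the running program"
 * so they refer to the target instead of the loader. Returns the index of
 * the AT_NULL terminator. */
size_t patch_auxv(Elf64_auxv_t *auxv, const struct target_info *t)
{
    assert(auxv != NULL && t != NULL);
    size_t i = 0;
    for (; auxv[i].a_type != AT_NULL; i++) {
        switch (auxv[i].a_type) {
        case AT_PHDR:   auxv[i].a_un.a_val = t->phdr;   break;
        case AT_PHENT:  auxv[i].a_un.a_val = t->phent;  break;
        case AT_PHNUM:  auxv[i].a_un.a_val = t->phnum;  break;
        case AT_ENTRY:  auxv[i].a_un.a_val = t->entry;  break;
        case AT_EXECFN: auxv[i].a_un.a_val = t->execfn; break;
        default: break; /* AT_PAGESZ, AT_RANDOM, ... keep the kernel's values */
        }
    }
    return i;
}
```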

Now for the shuffle. We calculate the size of the entire block, from argv[1] all the way to the end of the auxv array. Then we use @memmove to slide that whole block down by one slot (8 bytes, toward lower addresses), overwriting the old argv[0]. It has to be a memmove rather than a memcpy because the source and destination overlap.

```zig
// The stack layout provided by the kernel is:
// argc, argv..., NULL, envp..., NULL, auxv...
// We need to shift this block of memory to remove the loader's own argv[0] before we jump to
// the new executable.
// The end of the block is one entry past the AT_NULL entry in auxv.
const end_of_auxv = &auxv[i + 1];
const dest_ptr = @as([*]u8, @ptrCast(std.os.argv.ptr));
const src_ptr = @as([*]u8, @ptrCast(&std.os.argv[1]));
const len = @intFromPtr(end_of_auxv) - @intFromPtr(src_ptr);
@memmove(dest_ptr[0..len], src_ptr[0..len]);

// `std.os.argv.ptr` points to the argv pointers. The word just before it is argc and also the
// start of the stack.
const argc: [*]usize = @as([*]usize, @ptrCast(@alignCast(&std.os.argv.ptr[0]))) - 1;
argc[0] = std.os.argv.len - 1;
```

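Finally, we decrement argc so the program knows it has one fewer argument. Because argc itself stays in place, the stack pointer we hand over keeps the 16-byte alignment the kernel set up.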
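When the stack is viewed as an array of words, the whole shuffle fits in a few lines of C. This is a sketch, not the loader's code: `drop_first_arg` is a hypothetical helper, and `end` must point one past the AT_NULL auxv pair, exactly like `end_of_auxv` above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Drop argv[0] from an initial stack laid out as
 *   sp: argc | argv[0..argc) | 0 | envp... | 0 | auxv pairs ... AT_NULL 0
 * argc stays at sp, so the stack pointer handed to the target (and its
 * alignment) does not change. */
void drop_first_arg(uintptr_t *sp, uintptr_t *end)
{
    assert(sp[0] >= 1);
    uintptr_t *argv = sp + 1;
    size_t len = (size_t)(end - (argv + 1)) * sizeof *argv;
    memmove(argv, argv + 1, len); /* overlapping regions: memmove, not memcpy */
    sp[0] -= 1;                   /* one fewer argument */
}
```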

The Jump

We are ready. We have the Entry Point (entry) and the clean Stack Pointer (argc).

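We cannot simply call this as a function pointer. The entry point (usually _start) is not an ordinary function. It expects rsp to point directly at argc, with nothing pushed on top. A normal call would push a return address and leave rsp inside our loader's own stack frame instead of at the stack we prepared. We need a "clean" jump where the stack pointer is exactly where we want it.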

For that, we need assembly.

```zig
// We can't just defer because we never return.
file.close();

trampoline(entry, argc);
}
```

This tiny x86-64 assembly block puts our prepared stack pointer into rsp and then jumps straight to the entry point. We mark it noreturn and unreachable because once we make that jump, we aren't coming back.

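```zig
fn trampoline(entry: usize, sp: [*]usize) noreturn {
    // At process entry the x86-64 psABI expects %rdx to hold a function pointer
    // for atexit() or 0. The kernel passes 0, so we clear it as well.
    asm volatile (
        \\ mov %[sp], %%rsp
        \\ xor %%edx, %%edx
        \\ jmp *%[entry]
        : // No outputs
        : [entry] "r" (entry),
          [sp] "r" (sp),
        : .{ .rsp = true, .rdx = true, .memory = true });
    unreachable;
}
```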

Does it work?

Let's try compiling a static C hello world:

```c
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}
```

Compile it with -static to ensure it doesn't need external libraries:

```sh
gcc -static hello.c -o hello
```

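Now build the loader and run the binary through it. We build the loader itself as a PIE, so the kernel places it at a randomized address rather than at a fixed one that could collide with the target's: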

```sh
zig build-exe -fPIE loader.zig
./loader hello
```

Output: Hello World!

It works! We have replicated the core of the kernel's ELF loader in user space, at least for static binaries.

We can even run the zig compiler itself:

```sh
./loader $(which zig) version
```

Which for me outputs 0.15.1.

However, if you try to run /bin/ls (which is likely dynamically linked):

```
./loader /bin/ls
Segmentation fault
```

Why? Because /bin/ls doesn't contain the code for the C library functions it calls, such as printf. It expects an interpreter (the dynamic linker named in its INTERP segment) to be loaded alongside it to resolve those references.
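The easy first step is finding out which interpreter a binary asks for. Here is a minimal C sketch of that step. It assumes the whole file has already been read into memory, and `find_interp` is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the interpreter path stored in the PT_INTERP segment of an ELF file
 * held in memory, or NULL if there is none (a static binary) or the headers
 * do not fit inside the buffer. */
const char *find_interp(const unsigned char *file, size_t size)
{
    Elf64_Ehdr eh;
    assert(file != NULL);
    if (size < sizeof eh)
        return NULL;
    memcpy(&eh, file, sizeof eh); /* memcpy avoids unaligned struct access */
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + i * eh.e_phentsize;
        if (off > size || size - off < sizeof ph)
            return NULL;
        memcpy(&ph, file + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_offset > size || size - ph.p_offset < ph.p_filesz)
            return NULL;
        const char *path = (const char *)file + ph.p_offset;
        /* The path is stored NUL-terminated inside the segment. */
        return path[ph.p_filesz - 1] == '\0' ? path : NULL;
    }
    return NULL;
}
```

For a typical dynamically linked x86-64 glibc binary, this returns a path such as /lib64/ld-linux-x86-64.so.2. For the static hello binary, it returns NULL.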

There is also a more subtle failure mode. Ask the Zig compiler to do some real work:

```
./loader $(which zig) build-exe -fPIE loader.zig
error: unable to find zig installation directory '.../loader': FileNotFound
```

The loader successfully loaded and started Zig, but Zig immediately bailed out with an error. Why?

Complex tools often need to find their own install location to load their standard libraries. They do this by reading /proc/self/exe, a symbolic link the kernel maintains to the executable it started. Since the kernel started our loader, /proc/self/exe points to ./loader. Zig then looks for its standard library relative to the loader's location, fails to find it, and gives up.
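You can watch this happen with a few lines of C that read where /proc/self/exe points. Run directly, a program built around this helper reports its own path. A static build of it launched through our loader would report the loader's path instead. `self_exe_path` is a hypothetical helper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* make sure readlink() is declared */
#endif
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Copy the target of /proc/self/exe into buf, NUL-terminated.
 * Returns its length, or -1 on error. readlink() does not add the NUL itself
 * and silently truncates paths longer than size - 1. */
ssize_t self_exe_path(char *buf, size_t size)
{
    assert(size > 0);
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```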

Solving the dynamic linker issue (finding the dynamic linker named in the INTERP segment, loading it as well, and letting it run first) and spoofing /proc/self/exe are left as exercises for the reader ;) Or topics for a future post!

If you are curious, the full source code for the loader (including the INTERP handling) is available here on my Git, or, for the Microsoft shills, here is the same code on GitHub.


1. In theory, we could also just look at the first and last loadable segments, since the ELF spec requires PT_LOAD entries to appear in ascending order of p_vaddr. But better safe than sorry.