What happens before `main`?

2025-11-14

You may ask: "What do you mean? Nothing happens before main, it is the entry point". But how does main get its arguments? Where do argc and argv "live"? Today this is a quite loaded question (pun intended), and the answer requires a dive into linking, object formats, position-independent code, and shared libraries.

To understand why things are the way they are, we just need to look at history.

The "One Program" Era

In the very olden days, when computers only supported running one program at a time, loading was quite simple.

When a CPU resets, it doesn't magically know about files. It just sets its Instruction Pointer to a hardcoded physical address (like 0xFFFFFFF0 on x86) and starts chugging along executing code. But for an Operating System (like CP/M or early DOS) loading a user program, the contract was simple: "I will read your binary into RAM at address 0x100 and jump to it." (On CP/M that was a physical address; a DOS .COM file landed at offset 0x100 within its segment.)

If you had an esoteric tool called an assembler, you just told it "Origin: 0x100". The assembler would calculate all your memory addresses based on that starting point. If you wanted to jump to a function 50 bytes into the code, the assembler generated a jump to absolute address 0x132. Simple. Fast.

Running Multiple Programs

But then people had the audacious idea of running multiple programs at the same time. This broke everything.

Imagine two programs, A and B. Both were compiled assuming they would live at address 0x100. You load Program A. It sits happily at 0x100. Now you want to load Program B. You can't load it at 0x100 because two things cannot occupy the same space at the same time¹.

So... Can't we just load Program B at 0x200 instead and move on with our lives? Unfortunately, no.

The problem is Absolute Addressing.

If Program B has an instruction like JMP 0x150 (jump to a function), and we load the program at 0x200, that instruction is still JMP 0x150. The CPU will jump into the middle of Program A (or garbage memory), and everything explodes.
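To make the failure concrete, here is a toy model of that scenario. It is only a sketch: the `Image` struct, the helper names, and the 0x100-byte program sizes are made up for illustration, and "memory" is just a range of numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* A program image occupying [base, base + size) in flat physical memory.
   (Hypothetical type, for illustration only.) */
typedef struct {
    uint32_t base;
    uint32_t size;
} Image;

/* True if addr falls inside the image. */
static bool image_contains(Image img, uint32_t addr) {
    return addr >= img.base && addr - img.base < img.size;
}

/* An absolute jump does not care where the program was loaded:
   the operand baked into the instruction *is* the target. */
static uint32_t absolute_jump_target(uint32_t operand) {
    return operand;
}
```

With Program A at `{0x100, 0x100}` and Program B at `{0x200, 0x100}`, `absolute_jump_target(0x150)` is still 0x150, which `image_contains` places inside A and outside B.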

The obvious solution seems to be: "Why don't we just make every instruction relative?" Instead of "jump to 0x150", say "jump 50 bytes forward". While this works for jumps (IP-relative addressing), on many architectures it was historically not an option for data.

On older architectures (and even 32-bit x86), you could not easily say "load the value into EAX from the address IP + 50". The CPU simply didn't have an instruction for that. You had to give it an absolute address for global variables.

Even if the hardware supported it, Position Independent Code (PIC) comes with a performance penalty. It requires indirection tables and extra math for every global variable access, burning precious registers and cycles, which mattered far more back then than they do today.
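The extra hop can be modelled by hand in C. This is a rough sketch, not what a compiler actually emits: `address_table`, `fake_loader_fill_table`, and the other names are invented here, and the table stands in for the kind of indirection table a real loader would fill in.

```c
#include <stdint.h>

static int counter = 0;   /* the global both variants want to touch */

/* Non-PIC style: think of the address of `counter` as baked
   directly into the load instruction. One memory access. */
static int read_direct(void) {
    return counter;
}

/* PIC style, modelled by hand: the code only knows a slot number.
   Something else (the loader) stores the real address in that slot. */
enum { SLOT_COUNTER = 0, SLOT_COUNT };
static int *address_table[SLOT_COUNT];

/* Stand-in for the loader filling the table at load time. */
static void fake_loader_fill_table(void) {
    address_table[SLOT_COUNTER] = &counter;
}

/* Two memory accesses: first fetch the address, then the value. */
static int read_via_table(void) {
    return *address_table[SLOT_COUNTER];
}

static void write_via_table(int value) {
    *address_table[SLOT_COUNTER] = value;
}
```

Both readers return the same value once the table is filled; the indirect one simply pays for an extra load (and, on real hardware, a register to hold the table's address).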

So, we needed a way to fix the addresses.

Solution 1: Load-Time Relocation

Since we can't change the hardware, we have to change the software. This gave birth to Relocation.

The idea is simple: The compiler/linker leaves a "To-Do List" in the binary. It says: "Hey Loader, I assumed I would be loaded at 0x0, but if you load me somewhere else, please add the difference to the values at offset 0x10, 0x18, and 0x42."

This means that when the OS loads the program at 0x200:

1. It copies the code to 0x200.
2. It reads the To-Do list.
3. It patches the binary in memory. The linker assumed a base of 0x0, so the difference is 0x200, and JMP 0x150 becomes JMP 0x350.
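Those three steps fit in a few lines of C. This is a minimal sketch under stated assumptions: memory is a flat byte array, every entry on the To-Do list points at a 32-bit absolute address, and the function name and parameters are invented for illustration. Real relocation formats carry more information per entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy loader: copy `image` into `memory` at `load_base`, then add the
   difference between the real and the assumed base to every 32-bit
   word listed in `reloc_offsets` (offsets are relative to the image). */
static void load_and_relocate(uint8_t *memory, uint32_t load_base,
                              const uint8_t *image, size_t image_size,
                              const uint32_t *reloc_offsets, size_t reloc_count,
                              uint32_t link_base)
{
    uint32_t delta = load_base - link_base;

    /* Step 1: copy the code to its new home. */
    memcpy(memory + load_base, image, image_size);

    /* Steps 2 and 3: walk the To-Do list and patch each site. */
    for (size_t i = 0; i < reloc_count; i++) {
        uint8_t *site = memory + load_base + reloc_offsets[i];
        uint32_t value;
        memcpy(&value, site, sizeof value);   /* alignment-safe read */
        value += delta;
        memcpy(site, &value, sizeof value);   /* alignment-safe write */
    }
}
```

Loading an image linked for base 0x0 at 0x200, with an entry pointing at a word that holds 0x150, leaves 0x350 in that word. The loop also shows both downsides at once: it touches every patch site before the program can run, and the patched bytes depend on `load_base`.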

This works! But it has two massive downsides: (1) Slow Loading: The loader has to modify potentially millions of instructions before the program can start; (2) No Sharing: If you run two instances of text_editor.exe, they need to be loaded at different addresses. The loader patches them differently. This means the code in RAM is different, so the OS cannot share the physical RAM pages between the two processes. You waste memory.

Solution 2: Virtual Memory

Hardware engineers eventually took pity on software engineers and invented the Memory Management Unit (MMU).

With Virtual Memory, every process gets its own personal sandbox. Program A thinks it lives at 0x400000. Program B also thinks it lives at 0x400000. The MMU secretly maps them to different physical RAM addresses.
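You can watch this happen on any POSIX system. The sketch below is not two different programs but one process that forks, which is the easiest way to get two address spaces with something at the same virtual address; the function and variable names are made up for the example. After the child writes to the global, the kernel's copy-on-write gives it its own physical page, while the virtual address stays identical.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int looks_shared = 1;

/* Returns 1 if the child saw the global at the same virtual address as
   the parent, yet the child's write never reached the parent's copy.
   Returns -1 on setup errors. */
static int same_address_different_memory(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        /* Child: modify the global, then report its address. */
        close(pipefd[0]);
        looks_shared = 2;
        uintptr_t addr = (uintptr_t)&looks_shared;
        ssize_t n = write(pipefd[1], &addr, sizeof addr);
        _exit(n == (ssize_t)sizeof addr ? 0 : 1);
    }

    /* Parent: read the address the child observed. */
    close(pipefd[1]);
    uintptr_t child_addr = 0;
    ssize_t n = read(pipefd[0], &child_addr, sizeof child_addr);
    close(pipefd[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    if (n != (ssize_t)sizeof child_addr) return -1;

    return child_addr == (uintptr_t)&looks_shared  /* same virtual address */
        && looks_shared == 1;                      /* parent's memory untouched */
}
```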

This basically solved the relocation problem for executable files. Linkers could go back to the "good old days" of linking for a fixed address (traditionally 0x400000 on Windows or 0x08048000 on 32-bit x86 Linux), knowing the OS would make that address available virtually.

The Return of Relocation

We still have a problem. We want to share code.

Nearly every program uses printf. It would be a waste to include the code for printf in every single binary on your disk. It makes more sense to put it in a Shared Library (like libc.so or kernel32.dll).

But here is the catch: We want libc to be loaded once into physical RAM and shared by everyone. Yet every process lays out its address space differently. Whatever address libc would like to sit at, Process A might already have its executable mapped there, and Process B might have a giant image in the same spot. We cannot guarantee that libc can sit at the same virtual address in every process.

So, libraries must be relocatable. They must be able to run no matter where they are dropped in memory.

This brought back the need for Position Independent Code and relocation schemes (GOT/PLT), which is exactly what makes writing a dynamic linker such a nightmare. But that'll wait for part 4.

The Lie about `main`

So, we have established that memory is virtual, libraries are shared, and code must be relocatable. But to answer the opening question: What happens before main?

Technically, main is just a function. The operating system doesn't know about it, and it certainly doesn't call it.

When the loader is done setting up the memory (the stack, the heap, the mapped libraries), it transfers control to a specific address defined in the executable's header. In Linux/ELF, that address is the header's entry point field (e_entry), and it usually points at a symbol called _start.

_start is a small piece of assembly code (usually provided by the C runtime) that does the final housekeeping. It looks at the stack where the kernel dutifully placed argc, argv, and the environment variables, organizes them into the format C expects, and then calls main².
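The housekeeping is mostly pointer arithmetic. On Linux, the initial stack holds argc, then the argv pointers, a NULL, then the environment pointers, and another NULL. The sketch below models that layout as an array of pointer-sized slots; `StartupArgs`, `parse_initial_stack`, `call_main_from_stack`, and `toy_main` are hypothetical names for this example, and a real _start is written in assembly and passes main's return value to exit instead of returning it.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int argc;
    char **argv;
    char **envp;
} StartupArgs;

typedef int (*MainFn)(int argc, char **argv, char **envp);

/* `initial_sp` points at the slot holding argc; the slots after it
   are argv[0..argc-1], NULL, envp[0..], NULL. */
static StartupArgs parse_initial_stack(char **initial_sp) {
    StartupArgs args;
    args.argc = (int)(uintptr_t)initial_sp[0];  /* argc stored as a raw word */
    args.argv = initial_sp + 1;                 /* argv starts right after it */
    args.envp = args.argv + args.argc + 1;      /* skip argv's NULL terminator */
    return args;
}

/* What _start boils down to: unpack the stack, call main. */
static int call_main_from_stack(char **initial_sp, MainFn user_main) {
    StartupArgs args = parse_initial_stack(initial_sp);
    return user_main(args.argc, args.argv, args.envp);
}

/* Stand-in for the user's main: just reports how many args it got. */
static int toy_main(int argc, char **argv, char **envp) {
    (void)argv;
    (void)envp;
    return argc;
}
```

Feeding it a fake stack such as `{(char *)(uintptr_t)2, "prog", "-v", NULL, "HOME=/root", NULL}` yields an argc of 2, an argv of "prog" and "-v", and an envp starting at "HOME=/root".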

But how does the kernel know where _start is located inside the file? How does it know which parts of the file are code and which are data? And how does it know which shared libraries to load?

It needs a map. A standard format that describes the binary.

In the next post, we will look at that map: Object Files and the ELF format.

¹ Physics nerds shut up about Bosons and Fermions!
² For a very readable implementation of this, look at the Zig standard library's start.zig. Specifically, the posixCallMainAndExit function shows exactly how the arguments are pulled from the stack and passed to the user's main function.