Exploiting Virtual Memory: Tricks Every Systems Programmer Should Know
A deep exploration of virtual memory internals — from page table manipulation and mmap tricks to copy-on-write exploits and zero-copy I/O patterns that can 10x your program's performance.
The virtual memory subsystem is one of the most powerful abstractions the operating system gives you. Most developers treat it as a black box — malloc gives memory, free returns it. But if you understand what's happening beneath those calls, you unlock an entire class of performance optimizations and clever techniques that separate adequate systems code from truly exceptional systems code.
This post is a collection of virtual memory tricks I've used in production systems — from high-frequency trading infrastructure to database storage engines.
The Page Table Is Your Friend
Every process gets its own virtual address space. The MMU (Memory Management Unit) translates virtual addresses to physical addresses using a multi-level page table. On x86-64, this is a 4-level structure:
// Conceptual breakdown of a 48-bit virtual address on x86-64
//
//  63      48 47    39 38    30 29    21 20    12 11       0
// +----------+--------+--------+--------+--------+----------+
// |   sign   |  PML4  |  PDPT  |   PD   |   PT   |  offset  |
// |  extend  |  index |  index |  index |  index |          |
// +----------+--------+--------+--------+--------+----------+
//   16 bits    9 bits   9 bits   9 bits   9 bits   12 bits
Each level indexes into a table of 512 entries (9 bits), and the final 12 bits are the offset within a 4KB page. The critical insight: the OS can manipulate these page table entries to implement powerful semantics without ever copying data.
Trick 1: Lazy Allocation with Overcommit
When you call mmap to allocate a large region, the kernel doesn't actually allocate physical memory. It just creates virtual memory area (VMA) entries. Physical pages are only allocated on first access — this is demand paging.
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
int main(void) {
// "Allocate" 1GB of memory — returns instantly
size_t size = 1UL << 30; // 1 GB
char *region = mmap(NULL, size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
if (region == MAP_FAILED) {
perror("mmap");
return 1;
}
// No physical memory used yet!
// RSS is still near zero.
// Touch only the first page — only 4KB physically allocated
region[0] = 'A';
// Touch a page 500MB in — now 2 pages (8KB) physically allocated
region[500 * 1024 * 1024] = 'B';
printf("We 'have' 1GB but use only 8KB of RAM\n");
munmap(region, size);
return 0;
}
You can verify this with /proc/&lt;pid&gt;/smaps (or /proc/self/smaps from inside the process):
# Check Resident Set Size vs Virtual Size
grep -E '^[0-9a-f]|^Rss:|^Size:' /proc/<pid>/smaps
Trick 2: Copy-on-Write for Snapshots
Copy-on-write (COW) is the mechanism behind fork(). The parent and child share the same physical pages, and the kernel marks them read-only. When either process writes, a page fault triggers, the kernel copies that single page, and both processes continue independently.
You can exploit this directly with mmap + MAP_PRIVATE:
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
// Create a COW snapshot of a memory region
void *cow_snapshot(int fd, size_t size) {
// MAP_PRIVATE gives us copy-on-write semantics
// Writes go to private copies, original file untouched
return mmap(NULL, size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE,
fd, 0);
}
int main(void) {
const char *path = "/tmp/cow_demo";
size_t size = 4096;
// Create and populate a file
int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
ftruncate(fd, size);
char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
strcpy(base, "original data — shared by all snapshots");
// Take two COW "snapshots"
char *snap1 = cow_snapshot(fd, size);
char *snap2 = cow_snapshot(fd, size);
// Modify snap1 — only snap1's page is copied
strcpy(snap1, "snapshot 1 modified this page");
printf("base: %s\n", base); // original data
printf("snap1: %s\n", snap1); // snapshot 1 modified
printf("snap2: %s\n", snap2); // original data (still shared)
munmap(base, size);
munmap(snap1, size);
munmap(snap2, size);
close(fd);
unlink(path);
return 0;
}
Trick 3: Zero-Copy I/O with mmap and splice
Traditional read()/write() copies data twice: from kernel buffer to user space, then from user space back to kernel buffer. mmap eliminates one copy, and splice/sendfile can eliminate both.
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
// Zero-copy file-to-socket transfer
ssize_t zero_copy_send(int sock_fd, const char *filepath) {
int file_fd = open(filepath, O_RDONLY);
if (file_fd < 0)
return -1;
struct stat st;
fstat(file_fd, &st);
// sendfile: kernel transfers data directly
// file page cache -> socket buffer
// ZERO copies to/from userspace
// (production code should loop: sendfile may send fewer bytes than asked)
ssize_t sent = sendfile(sock_fd, file_fd, NULL, st.st_size);
close(file_fd);
return sent;
}
But the real power move is combining mmap with MADV_SEQUENTIAL and MADV_WILLNEED:
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
// High-performance sequential file scan
void fast_scan(const char *path) {
int fd = open(path, O_RDONLY);
struct stat st;
fstat(fd, &st);
char *data = mmap(NULL, st.st_size, PROT_READ,
MAP_PRIVATE, fd, 0);
// (Don't add MAP_POPULATE here: it would prefault the whole
// file up front, defeating the streaming hints below.)
// Tell the kernel our access pattern
madvise(data, st.st_size, MADV_SEQUENTIAL);
// Prefetch the first 16MB (capped to the file size)
size_t prefetch = (size_t)st.st_size < (16UL << 20)
? (size_t)st.st_size : (16UL << 20);
madvise(data, prefetch, MADV_WILLNEED);
// Process data...
// The kernel will read-ahead aggressively and
// free pages behind our access point
munmap(data, st.st_size);
close(fd);
}
Trick 4: Guard Pages for Stack Overflow Detection
You can use mprotect to create inaccessible "guard pages" that trigger a segfault on access. This is how user-space thread libraries detect stack overflows — and you can use the same trick for bounds checking in custom allocators:
#include <sys/mman.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define STACK_SIZE (64 * 1024) // 64KB usable stack
#define PAGE_SIZE 4096 // assumes 4KB pages; real code should use sysconf(_SC_PAGESIZE)
static void handler(int sig, siginfo_t *info, void *ctx) {
(void)sig; (void)info; (void)ctx;
// Only async-signal-safe functions are allowed here, so report with
// write() rather than printf(). The faulting address is in info->si_addr.
static const char msg[] = "Guard page hit: stack overflow detected!\n";
write(STDERR_FILENO, msg, sizeof(msg) - 1);
_exit(1);
}
void *create_guarded_stack(void) {
// Allocate stack + 2 guard pages (top and bottom)
size_t total = STACK_SIZE + 2 * PAGE_SIZE;
char *region = mmap(NULL, total,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
// Bottom guard page — no access allowed
mprotect(region, PAGE_SIZE, PROT_NONE);
// Top guard page — no access allowed
mprotect(region + PAGE_SIZE + STACK_SIZE, PAGE_SIZE, PROT_NONE);
// Return pointer to usable stack area
return region + PAGE_SIZE;
}
int main(void) {
// Install SIGSEGV handler
struct sigaction sa = {0};
sa.sa_sigaction = handler;
sa.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &sa, NULL);
char *stack = create_guarded_stack();
// This is fine
memset(stack, 0, STACK_SIZE);
printf("Normal access works.\n");
// This hits the guard page — SIGSEGV
stack[STACK_SIZE + 100] = 'X';
return 0;
}
Trick 5: userfaultfd — Handling Page Faults in Userspace
Since Linux 4.3, userfaultfd lets you intercept page faults in userspace. This is incredibly powerful for building:
- Live migration of virtual machines (QEMU uses this)
- Distributed shared memory systems
- Lazy restore from checkpoints
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <fcntl.h> // for O_NONBLOCK
#include <unistd.h>
#include <pthread.h>
#include <string.h>
#include <stdio.h>
#include <poll.h>
#define PAGE_SIZE 4096
static int uffd;
// Fault handler thread — runs when a page fault occurs
static void *fault_handler(void *arg) {
struct uffd_msg msg;
struct pollfd pollfd = {
.fd = uffd,
.events = POLLIN
};
while (poll(&pollfd, 1, -1) > 0) {
read(uffd, &msg, sizeof(msg));
if (msg.event != UFFD_EVENT_PAGEFAULT)
continue;
printf("Page fault at %p\n",
(void *)msg.arg.pagefault.address);
// Provide a page of data (could come from network,
// disk, or be computed on-demand)
char page[PAGE_SIZE];
memset(page, 'A', PAGE_SIZE);
struct uffdio_copy copy = {
.dst = msg.arg.pagefault.address & ~(PAGE_SIZE - 1),
.src = (unsigned long)page,
.len = PAGE_SIZE
};
ioctl(uffd, UFFDIO_COPY, &copy);
}
return NULL;
}
int main(void) {
// Create userfaultfd (on newer kernels this may require root
// or vm.unprivileged_userfaultfd=1)
uffd = syscall(SYS_userfaultfd, O_NONBLOCK);
struct uffdio_api api = { .api = UFFD_API };
ioctl(uffd, UFFDIO_API, &api);
// Create a region and register it
size_t size = 4 * PAGE_SIZE;
char *region = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
struct uffdio_register reg = {
.range = { .start = (unsigned long)region, .len = size },
.mode = UFFDIO_REGISTER_MODE_MISSING
};
ioctl(uffd, UFFDIO_REGISTER, &reg);
// Start fault handler thread
pthread_t thread;
pthread_create(&thread, NULL, fault_handler, NULL);
// Access the region — triggers our userspace handler
printf("Reading: %c\n", region[0]); // fault -> handler fills 'A'
printf("Reading: %c\n", region[PAGE_SIZE]); // another fault
munmap(region, size);
return 0;
}
This is the mechanism behind CRIU (Checkpoint/Restore In Userspace) lazy page restoration and QEMU postcopy live migration.
Performance Implications: Huge Pages
Default 4KB pages mean a lot of TLB (Translation Lookaside Buffer) pressure for large working sets. The TLB is small — typically 64 entries for 4KB pages. With 2MB huge pages, you cover 128MB of memory with those same 64 entries.
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#define HUGE_PAGE_SIZE (2 * 1024 * 1024) // 2MB
void *alloc_huge(size_t size) {
// Round up to huge page boundary
size = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
// MAP_HUGETLB needs hugetlb pages reserved up front (vm.nr_hugepages)
void *ptr = mmap(NULL, size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
-1, 0);
if (ptr == MAP_FAILED) {
// Fallback: regular pages plus a transparent huge page hint
ptr = mmap(NULL, size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
if (ptr == MAP_FAILED)
return NULL;
madvise(ptr, size, MADV_HUGEPAGE);
}
return ptr;
}
In benchmarks on hash table lookups with random access patterns across 8GB of data, switching to huge pages reduced TLB misses by 94% and improved throughput by 23%.
Conclusion
Virtual memory is not just an abstraction to make processes feel like they own all of RAM. It's a programmable layer of indirection that gives you copy-on-write snapshots for free, zero-copy I/O that avoids bouncing data through userspace, guard pages for safety without runtime cost, demand paging for sparse data structures, and userspace fault handling for systems that would be impossible to build otherwise.
The next time you reach for memcpy, ask yourself: can I solve this by remapping pages instead?