Recurse Center lab notes 2015-06-24: playing with mmap
Today I wrote some code! None to show, just experiments. I’m investigating mmap and its interactions with mprotect and ftruncate. I got some good responses to yesterday’s email thinking about approaches to a memory mapped message builder for Cap’n Proto. In particular a message from Paul Pelzl expanded on my memory protection / SIGSEGV-based idea and suggested playing tricks with MAP_FIXED to repeatedly remap the file as it grows.
I started doing some investigation to see what happens when you mix mmap, mprotect, and ftruncate. Here’s some stuff I found out:
- it is possible to create a non-zero length mapping with mmap on a zero-length file, eg, one just created by touch for the purpose
- on Linux, attempts to access any of the memory result in SIGBUS
- a SIGBUS handler can call ftruncate to extend the backing file; the access will then succeed!
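A minimal sketch of what such an experiment might look like in C, assuming Linux and a filesystem that supports shared file mappings. The function name demo_sigbus_grow, the globals, and the one-page size are made up for illustration; this is not the code from my experiments.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int backing_fd = -1;   /* file behind the mapping (hypothetical global) */
static long page_size = 0;

/* SIGBUS handler: extend the file to one page so the faulting access
 * can be retried. ftruncate and _exit are async-signal-safe per POSIX. */
static void on_sigbus(int sig) {
    (void)sig;
    if (ftruncate(backing_fd, (off_t)page_size) != 0)
        _exit(1);  /* could not extend: bail out rather than fault forever */
}

/* Maps one page of a freshly truncated (zero-length) file, touches it,
 * and returns the byte read back, or -1 if setup fails. */
int demo_sigbus_grow(const char *path) {
    struct sigaction sa;
    volatile char *p;
    int got;

    page_size = sysconf(_SC_PAGESIZE);
    backing_fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (backing_fd < 0) return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;

    /* The mapping is one page long even though the file is empty. */
    p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, backing_fd, 0);
    if (p == MAP_FAILED) return -1;

    p[0] = 'x';   /* expected to raise SIGBUS on Linux; the handler grows
                     the file and the store is retried */
    got = p[0];

    munmap((void *)p, (size_t)page_size);
    close(backing_fd);
    unlink(path);
    return got;
}
```

Calling demo_sigbus_grow with a scratch path should hand back the byte that was written if the handler did its job.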
This suggests an even simpler approach than what emerged on the mailing list:
- mmap a huge amount of address space backed by the target file, which starts off empty
- allocate an initial segment of 4GB, the maximum in Cap’n Proto’s encoding
- have a SIGBUS handler call ftruncate to extend the file whenever an attempt is made to reach beyond the end
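The steps above could be sketched roughly as follows, again assuming Linux. Everything named here is hypothetical: build_demo, the globals, the 1 MiB growth chunk, and the 1 GiB reservation (smaller than the 4GB in the list, just to keep the demo modest). The handler uses the fault address from siginfo_t to decide how far to extend the file, and the file is shrunk to the bytes actually used at the end.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define RESERVE_BYTES ((size_t)1 << 30)  /* address space reserved (demo value) */
#define GROW_CHUNK    ((off_t)1 << 20)   /* file grows in 1 MiB steps (assumption) */

static int g_fd = -1;
static char *volatile g_base = NULL;
static volatile sig_atomic_t g_grows = 0;  /* how many times the handler grew the file */

/* Extend the file far enough to cover the faulting address. */
static void grow_on_sigbus(int sig, siginfo_t *info, void *ctx) {
    struct stat st;
    char *base = g_base;
    char *addr = (char *)info->si_addr;
    off_t need, new_len;
    (void)sig; (void)ctx;

    if (base == NULL || addr < base || addr >= base + RESERVE_BYTES)
        _exit(1);                            /* not our mapping */
    need = (off_t)(addr - base) + 1;
    if (fstat(g_fd, &st) != 0 || need <= st.st_size)
        _exit(1);                            /* SIGBUS for some other reason */
    new_len = (need + GROW_CHUNK - 1) / GROW_CHUNK * GROW_CHUNK;
    if (ftruncate(g_fd, new_len) != 0)
        _exit(1);
    g_grows++;
}

/* Writes used_bytes through the mapping, then truncates the file to that
 * size. Returns the final file size, or -1 on failure. */
long build_demo(const char *path, size_t used_bytes) {
    struct sigaction sa;
    struct stat st;
    volatile char *p;
    void *m;
    size_t i;

    if (used_bytes > RESERVE_BYTES) return -1;
    g_fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (g_fd < 0) return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = grow_on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;

    m = mmap(NULL, RESERVE_BYTES, PROT_READ | PROT_WRITE, MAP_SHARED, g_fd, 0);
    if (m == MAP_FAILED) return -1;
    g_base = m;

    p = m;
    for (i = 0; i < used_bytes; i++)
        p[i] = (char)i;                      /* faults past EOF grow the file */

    munmap(m, RESERVE_BYTES);
    g_base = NULL;
    if (ftruncate(g_fd, (off_t)used_bytes) != 0 || fstat(g_fd, &st) != 0)
        return -1;                           /* shrink to what was actually used */
    close(g_fd);
    unlink(path);
    return (long)st.st_size;
}
```

The fstat check in the handler is there so that a SIGBUS inside the already-backed part of the file is not mistaken for a request to grow (which would otherwise shrink the file).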
An initial page can be set aside for a segment table to allow compatibility with the existing message readers. On closing of the file, the segment size can be written in at whatever length was actually set aside on disk. There would likely be an opportunity to shrink the segment down so it only occupies as much space as was actually used; in this case the file could be truncated to that size.
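As I understand Cap’n Proto’s stream framing, the segment table for a single-segment message is just two little-endian 32-bit values: the segment count minus one, then the segment size in 8-byte words. A small sketch of filling that in on close, with hypothetical helper names; where exactly these bytes sit within the reserved page is a design question this does not answer.

```c
#include <stdint.h>

/* Round a used byte count up to whole 8-byte Cap'n Proto words. */
uint32_t words_for_bytes(uint64_t used_bytes) {
    return (uint32_t)((used_bytes + 7) / 8);
}

/* Fill in the 8-byte table for a message with exactly one segment. */
void write_single_segment_table(uint8_t out[8], uint32_t segment_words) {
    int i;
    /* Bytes 0..3: segment count minus one, which is 0 for one segment. */
    for (i = 0; i < 4; i++)
        out[i] = 0;
    /* Bytes 4..7: segment size in words, least significant byte first. */
    for (i = 0; i < 4; i++)
        out[4 + i] = (uint8_t)(segment_words >> (8 * i));
}
```

With one segment the table is already a whole number of words, so no padding is needed after it.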
This approach has a few nice features:
- it’s compatible with the existing flat array and stream based message readers
- it does not require sparse file support
- only one system call needs to be made in the signal handler
I still need to do a bit more research. In particular, I’m wondering:
- will this work on non-Linux systems? The Linux mmap manpage mentions the SIGBUS behaviour, as does the NetBSD one, but the FreeBSD and OS X manpages do not; OpenBSD’s manpage says it will give a SIGSEGV instead, and helpfully points out that POSIX says this situation should be a SIGBUS.
- how do I keep track of which regions belong to which files? The signal handler has the address that caused the fault, but I’ll have to have some way to look it up.
- how should this file lookup structure interact with threads? Is it a global table for which we incur some synchronisation cost? Is it a thread-local table, and we require that a message only be written to by the creating thread?
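One possible shape for the lookup, purely as a sketch: a small fixed-size table of mapped regions that the handler scans linearly for the fault address. The struct, the function names, and the 64-entry limit are all invented here, and the sketch deliberately ignores the threading question (registration is not synchronised against a handler running at the same time).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64  /* arbitrary limit for the sketch */

struct region {
    uintptr_t base;    /* start of the mapping */
    size_t    length;  /* bytes of address space reserved */
    int       fd;      /* backing file to extend on a fault */
};

static struct region regions[MAX_REGIONS];
static size_t region_count = 0;

/* Record a mapping; returns 0 on success, -1 if the table is full. */
int region_register(void *base, size_t length, int fd) {
    if (region_count == MAX_REGIONS) return -1;
    regions[region_count].base = (uintptr_t)base;
    regions[region_count].length = length;
    regions[region_count].fd = fd;
    region_count++;
    return 0;
}

/* Find the file backing a fault address; returns its fd, or -1 if the
 * address is not inside any registered region. Only reads the table and
 * calls nothing, so it is plausible to use from a signal handler. */
int region_fd_for_address(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    size_t i;
    for (i = 0; i < region_count; i++) {
        if (a >= regions[i].base && a - regions[i].base < regions[i].length)
            return regions[i].fd;
    }
    return -1;
}
```

A linear scan is probably fine while only a handful of messages are being built at once; a sorted table with binary search would be the obvious next step if that stops being true.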
I’m hoping to answer at least the first question tomorrow, mostly for curiosity’s sake. I’m happy to be Linux-only for now; I’m sure that if this doesn’t work on other platforms, one of the other approaches will. After testing this on someone’s Mac, I can start on a proof of concept message builder, which will hopefully inform the design more.