I went back to mmap today. I am finally pulling together what I learned from my recent experiments into a memory-mapped file message builder for Cap’n Proto. Current status:

  • I’ve created a new class with a constructor MmapMessageBuilder(int fd) that maps the file descriptor
  • Things mostly magically work because I’m supplying an alternate implementation of an interface
  • I am not yet handling the resizing of the backing file in any way
  • I am not yet including a segment table at the front of the file

The missing segment table means the existing deserializers can’t read the files back in. I could still see the message data in the raw bytes, though!

00000000  00 00 00 00 00 00 01 00  01 00 00 00 57 00 00 00  |............W...|
00000010  08 00 00 00 01 00 04 00  7b 00 00 00 02 00 00 00  |........{.......|
00000020  21 00 00 00 32 00 00 00  21 00 00 00 92 00 00 00  |!...2...!.......|
00000030  29 00 00 00 17 00 00 00  39 00 00 00 22 00 00 00  |).......9..."...|
00000040  c8 01 00 00 00 00 00 00  35 00 00 00 22 00 00 00  |........5..."...|
00000050  35 00 00 00 82 00 00 00  39 00 00 00 27 00 00 00  |5.......9...'...|
00000060  00 00 00 00 00 00 00 00  41 6c 69 63 65 00 00 00  |........Alice...|
00000070  61 6c 69 63 65 40 65 78  61 6d 70 6c 65 2e 63 6f  |alice@example.co|
00000080  6d 00 00 00 00 00 00 00  04 00 00 00 01 00 01 00  |m...............|
00000090  00 00 00 00 00 00 00 00  01 00 00 00 4a 00 00 00  |............J...|
000000a0  35 35 35 2d 31 32 31 32  00 00 00 00 00 00 00 00  |555-1212........|
000000b0  4d 49 54 00 00 00 00 00  42 6f 62 00 00 00 00 00  |MIT.....Bob.....|
000000c0  62 6f 62 40 65 78 61 6d  70 6c 65 2e 63 6f 6d 00  |bob@example.com.|
000000d0  08 00 00 00 01 00 01 00  01 00 00 00 00 00 00 00  |................|
000000e0  09 00 00 00 4a 00 00 00  02 00 00 00 00 00 00 00  |....J...........|
000000f0  09 00 00 00 4a 00 00 00  35 35 35 2d 34 35 36 37  |....J...555-4567|
00000100  00 00 00 00 00 00 00 00  35 35 35 2d 37 36 35 34  |........555-7654|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000
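
For context, the mapping step behind that dump is not exotic. Here is a minimal sketch of it, not the actual MmapMessageBuilder: MappedRegion and roundTripThroughMapping are hypothetical names, the fixed 4096-byte size is an assumption chosen to match the end offset of the dump, and resizing is sidestepped by growing the file once up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not part of Cap'n Proto: owns a shared, writable
// mapping of the first `size` bytes of a file descriptor.
class MappedRegion {
public:
  MappedRegion(int fd, std::size_t size) : size_(size) {
    assert(size > 0);
    // Grow the file first; touching mapped pages past end-of-file
    // raises SIGBUS.
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
      throw std::runtime_error("ftruncate failed");
    }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    data_ = static_cast<unsigned char*>(p);
  }
  ~MappedRegion() { munmap(data_, size_); }
  MappedRegion(const MappedRegion&) = delete;
  MappedRegion& operator=(const MappedRegion&) = delete;

  unsigned char* data() { return data_; }
  std::size_t size() const { return size_; }
  // Asks the kernel to write the mapped pages back to the file.
  bool flush() { return msync(data_, size_, MS_SYNC) == 0; }

private:
  unsigned char* data_ = nullptr;
  std::size_t size_;
};

// Writes `text` through a mapping of a fresh temporary file, then reads the
// bytes back with pread() to check that they reached the file.
bool roundTripThroughMapping(const char* text) {
  char path[] = "/tmp/mmap-sketch-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return false;
  unlink(path);  // the open descriptor keeps the file alive

  const std::size_t len = std::strlen(text);
  char buf[64] = {0};
  bool ok = false;
  try {
    MappedRegion region(fd, 4096);
    if (len < sizeof(buf) && len <= region.size()) {
      std::memcpy(region.data(), text, len);
      ok = region.flush();
    }
  } catch (const std::runtime_error&) {
    ok = false;
  }  // the mapping is released when `region` goes out of scope
  if (ok) {
    ok = pread(fd, buf, len, 0) == static_cast<ssize_t>(len) &&
         std::memcmp(buf, text, len) == 0;
  }
  close(fd);
  return ok;
}
```

Calling roundTripThroughMapping("Alice") should return true on a POSIX system with a writable /tmp. A real builder would hand regions of the mapping out as segments instead of copying a string into it.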

Writing out the segment table mostly requires trawling through someone else’s code base. The Cap’n Proto code is fairly clear and well documented, especially the user interfaces. Some key innards are not as well documented, however. In particular, the arena abstraction used to look after the allocated segments could do with a short overview. I’m most of the way to understanding how it fits together, but it’s been slower than I would have liked.
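
For reference, the framing itself is small. As far as I can tell from the Cap’n Proto encoding documentation, the stream format starts with a little-endian 32-bit count of segments minus one, then one little-endian 32-bit size in words per segment, then padding to an 8-byte boundary. The sketch below builds just that header; buildSegmentTable and appendLe32 are hypothetical names of my own, not functions from the library, and the segment sizes used with it are made-up examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in little-endian byte order.
static void appendLe32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<std::uint8_t>(v >> shift));
  }
}

// Hypothetical helper: builds the header that precedes the segments.
// `segmentWords` holds the size of each segment in 8-byte words.
std::vector<std::uint8_t> buildSegmentTable(
    const std::vector<std::uint32_t>& segmentWords) {
  assert(!segmentWords.empty());
  std::vector<std::uint8_t> out;
  // First field: number of segments minus one.
  appendLe32(out, static_cast<std::uint32_t>(segmentWords.size() - 1));
  // Then one size field per segment.
  for (std::uint32_t words : segmentWords) appendLe32(out, words);
  // Pad with a zero field so the first segment starts on a word boundary.
  if (out.size() % 8 != 0) appendLe32(out, 0);
  return out;
}
```

For a single segment the table is 8 bytes; for two segments it is 12 bytes of fields plus 4 bytes of padding. The remaining work is finding where the arena keeps the segment sizes so they can be written at the front of the mapped file.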

The process of understanding the code has been slowed by needing to learn the kj library that Cap’n Proto makes heavy use of. I’m trying to find out the boundaries of what kj aims to be, but so far it looks to include:

  • replacements for some standard library classes, e.g., kj::Own is used where you might use std::unique_ptr, and there is a separate stream abstraction that is used instead of the standard one
  • an event loop (the C++ standard library still has nothing here, though it may in 2017!)
  • some templated array classes that seem to be a pointer-length pair; there is some macro magic to allow allocating such an array on the stack
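
To show what I mean by a pointer-length pair, here is a plain C++ sketch of the idea. ArrayView is a hypothetical stand-in written for illustration; it is not kj’s class and makes no claim about how kj implements its array types.

```cpp
#include <cstddef>

// Hypothetical stand-in: a non-owning view made of a pointer and a length.
template <typename T>
class ArrayView {
public:
  ArrayView(T* ptr, std::size_t size) : ptr_(ptr), size_(size) {}

  T* begin() const { return ptr_; }
  T* end() const { return ptr_ + size_; }
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) const { return ptr_[i]; }

  // Returns a view of elements [start, stop); no bounds checking here.
  ArrayView slice(std::size_t start, std::size_t stop) const {
    return ArrayView(ptr_ + start, stop - start);
  }

private:
  T* ptr_;
  std::size_t size_;
};

// Works on any contiguous run of ints without copying or owning it.
int sum(ArrayView<const int> values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}
```

The appeal is that a function like sum can accept a stack array, a heap array, or a slice of either, since the view only records where the elements start and how many there are.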

An overview of this library would also be really nice to have. My life would be a bit easier if this were all STL stuff.

All this frustration has me trying to use Google’s Kythe code indexing project on Cap’n Proto. This in turn has me messing around with their Bazel build tool. Both of these tools are up there on my not-while-at-RC list, but I’m giving myself a very short pass to see if I can port the build over. Building these tools is pretty slow going…