Using strace to figure out how git push over SSH works
Yesterday I was curious about how git push works over SSH. I’m getting more used to using strace to figure this kind of thing out, so I gave it a shot.
If I strace pushing to this site’s repository, this shows up:
[pid 15943] execve("/usr/bin/ssh", ["ssh", "git@github.com", "git-receive-pack 'kamalmarhubi/w"...], [/* 51 vars */]) = 0
So git push eventually calls ssh git@github.com git-receive-pack <repo-path>.
Trying this out at my terminal gives me this:
$ ssh git@github.com git-receive-pack kamalmarhubi/website
00bb29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pagesreport-status delete-refs side-band-64k quiet atomic ofs-delta agent=git/2:2.4.8~upload-pack-wrapper-script-1211-gc27b061
003f04bfcb3e238e5660ae9e71a6ce99f472211fe85f refs/heads/master
0000
with the terminal waiting for my input. SSH is used to handle authentication and remote connection, and then it runs a command at the other end to handle the data exchange. These lines are the start of that exchange.
A tiny bit of looking around the internet told me that the protocol is made up of lines prefixed by their length as 4 hex digits. After the length comes what looks like a commit SHA-1 and a ref. The sender terminates with 0000.
There are a couple of lines here, one for each branch in the repository. The first line additionally has a bunch of stuff at the end that looks like a description of what the sending program is and some features it supports.
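Here is a minimal Python sketch of how that length-prefixed framing could be read. It assumes the 4-digit length counts the 4 hex digits themselves, which matches the output above: 003f is 63, and the master line is 4 + 40 + 1 + 17 + 1 bytes long. The function name parse_pkt_lines is made up for illustration and is not part of git.

```python
def parse_pkt_lines(data: bytes) -> list:
    """Split a byte stream into payloads, stopping at the 0000 terminator."""
    payloads = []
    pos = 0
    while pos < len(data):
        # The first 4 bytes are the total length (prefix included), in hex.
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            # "0000" marks the end of the list.
            break
        payloads.append(data[pos + 4:pos + length])
        pos += length
    return payloads


# The master line from the output above, followed by the terminator.
SAMPLE = (
    b"003f04bfcb3e238e5660ae9e71a6ce99f472211fe85f refs/heads/master\n"
    b"0000"
)

for payload in parse_pkt_lines(SAMPLE):
    sha, ref = payload.rstrip(b"\n").split(b" ", 1)
    print(sha.decode(), ref.decode())
# prints: 04bfcb3e238e5660ae9e71a6ce99f472211fe85f refs/heads/master
```

This sketch does no error handling: a truncated stream or a non-hex prefix would need extra checks in anything real.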
While I was looking into this, I used xsel to copy the output to paste into an editor. This was really confusing, because all that got pasted was the first line without all the metadata!
00bb29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pages
Looking at the entire output through hexdump -C, it turns out that there’s a null byte after refs/heads/gh-pages, and then a newline at the end (marked with * below):
00000000 30 30 62 62 32 39 37 39 33 63 33 39 63 38 65 34 |00bb29793c39c8e4|
00000010 62 66 65 63 36 32 37 64 36 30 39 33 38 63 34 65 |bfec627d60938c4e|
00000020 64 32 30 38 36 63 63 36 30 62 62 31 20 72 65 66 |d2086cc60bb1 ref|
00000030 73 2f 68 65 61 64 73 2f 67 68 2d 70 61 67 65 73 |s/heads/gh-pages|
00000040 *00*72 65 70 6f 72 74 2d 73 74 61 74 75 73 20 64 |.report-status d|
00000050 65 6c 65 74 65 2d 72 65 66 73 20 73 69 64 65 2d |elete-refs side-|
00000060 62 61 6e 64 2d 36 34 6b 20 71 75 69 65 74 20 61 |band-64k quiet a|
00000070 74 6f 6d 69 63 20 6f 66 73 2d 64 65 6c 74 61 20 |tomic ofs-delta |
00000080 61 67 65 6e 74 3d 67 69 74 2f 32 3a 32 2e 34 2e |agent=git/2:2.4.|
00000090 38 7e 75 70 6c 6f 61 64 2d 70 61 63 6b 2d 77 72 |8~upload-pack-wr|
000000a0 61 70 70 65 72 2d 73 63 72 69 70 74 2d 31 32 31 |apper-script-121|
000000b0 31 2d 67 63 32 37 62 30 36 31*0a*30 30 33 66 37 |1-gc27b061.003f7|
000000c0 39 32 66 34 39 36 65 37 35 33 64 62 39 33 33 30 |92f496e753db9330|
000000d0 66 30 61 34 65 38 32 39 30 62 38 61 36 63 62 61 |f0a4e8290b8a6cba|
000000e0 38 61 62 36 64 61 62 20 72 65 66 73 2f 68 65 61 |8ab6dab refs/hea|
000000f0 64 73 2f 6d 61 73 74 65 72 0a 30 30 30 30 |ds/master.0000|
000000fe
Without doing any research, here’s what I think happened. The git folks defined the fairly simple length-prefixed, newline-separated protocol. Then at some point they wanted to add some metadata to the protocol without breaking compatibility with older versions of git. They came up with a nifty hack that exploits C’s null-terminated strings: add the metadata after a null byte but before the newline. This way, reading up to a newline will get all the metadata. The metadata-processing code knows to look past the null byte, but the existing protocol code would see only the part before it, presumably letting it work unchanged!
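To make that guess concrete, here is a small Python sketch of the two ways of reading the first line. It is only an illustration of the idea, not git's actual code: as_c_string and split_capabilities are hypothetical helpers, and the agent=... entry is left off the sample to keep it short.

```python
# First line from the output above, minus the length prefix and the agent entry.
FIRST_LINE = (
    b"29793c39c8e4bfec627d60938c4ed2086cc60bb1 refs/heads/gh-pages"
    b"\x00report-status delete-refs side-band-64k quiet atomic ofs-delta\n"
)


def as_c_string(buf: bytes) -> bytes:
    """What code treating buf as a null-terminated C string would see."""
    return buf.split(b"\x00", 1)[0]


def split_capabilities(buf: bytes) -> tuple:
    """Separate the 'sha ref' part from the metadata after the null byte."""
    ref_part, _, metadata = buf.rstrip(b"\n").partition(b"\x00")
    return ref_part, metadata.split()


# An "old" reader stops at the null byte and never sees the metadata.
print(as_c_string(FIRST_LINE).decode())

# A "new" reader looks past the null byte and gets the feature list.
ref_part, features = split_capabilities(FIRST_LINE)
print([f.decode() for f in features])
```

Both readers agree on the SHA-1 and the ref; only the second one sees the extra metadata.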
And when I copied it using xsel, the stuff past the null byte got skipped.
Cute hack, and mystery solved!