Getting git to play nicely with CDNs

Git is a really cool version control system. So cool, in fact, that I decided to use it to distribute the project I’m working on to several hundred PlanetLab nodes. So I went ahead and created a repository with git init --bare somewhere under the document root of my local Apache2. Using pssh, I can clone and pull from the repository on all nodes at once simply by specifying the URL of that repo.
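A minimal sketch of that setup, assuming the repository is served over git's plain ("dumb") HTTP transport. The temporary directory stands in for a directory under Apache's document root, and the repository name, URL, host file and user name are all placeholders. The pssh command is only printed here, not executed.

```shell
# Stand-in for a directory below Apache's document root (placeholder).
docroot=$(mktemp -d)
repo="$docroot/project.git"   # hypothetical repository name

git init --bare --quiet "$repo"

# Plain HTTP clients read info/refs and objects/info/packs, which
# git update-server-info generates. A post-update hook keeps them
# fresh after every push.
mkdir -p "$repo/hooks"
printf '#!/bin/sh\nexec git update-server-info\n' > "$repo/hooks/post-update"
chmod +x "$repo/hooks/post-update"
git --git-dir="$repo" update-server-info

# Command to fan the clone out to the nodes (printed only).
# nodes.txt, myslice and the URL are assumptions for illustration.
repo_url="http://example.org/project.git"
echo "pssh -h nodes.txt -l myslice -i 'git clone $repo_url'"
```

Without the update-server-info step, a clone over plain HTTP usually fails because the client cannot discover the repository's references.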

Obviously the traffic is still pretty high: every request ends up at my machine, so I have to serve the whole repository once for each node. Then I stumbled over CoralCDN, a free content distribution network that runs on PlanetLab. So instead of cloning directly from my machine, I took the URL of the repo, appended .nyud.net to the domain, and cloned from that URL instead.
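The URL rewrite can be sketched as a small shell helper. The function name coralize and the example URL are made up for illustration, and the sketch only handles plain http:// URLs without an explicit port.

```shell
# Append .nyud.net to the host part of an http:// URL.
coralize() {
  printf '%s\n' "$1" | sed -E 's#^(http://[^/:]+)#\1.nyud.net#'
}

coralize "http://example.org/repos/project.git"
# prints http://example.org.nyud.net/repos/project.git
```

Cloning through the CDN then becomes something like git clone "$(coralize "$repo_url")", with repo_url holding the original address.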

The drop in traffic when cloning was immediate, and I happily worked with this setup for some time. Then I noticed that having the CDN cache the contents has its drawbacks: if I want to push changes in quick succession, say, because I noticed a typo just after issuing an update, I have to wait for the cache to time out before the nodes see the new version.

To solve this problem we have to set a longer caching time for the object files, which never change because git's storage is content-addressable, and a short caching time for the few files that do change. Placing this .htaccess file in the repository and activating mod_headers and mod_expires should do the trick:

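# Needs mod_expires (Expires*) and mod_headers (Header).
ExpiresActive On
# Default: cacheable for 300 seconds after access.
ExpiresDefault A300
Header append Cache-Control "public"

# Mutable files such as info/refs, packed-refs and HEAD: 10 seconds.
<FilesMatch "(info|branches|refs|HEAD)">
  ExpiresDefault A10
  Header append Cache-Control "must-revalidate"
</FilesMatch>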

This makes everything cacheable for 5 minutes (300 seconds), except the references, which tell git where to look for the content and are cached for only 10 seconds.
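To check that the headers come out as intended, one option is to pull the max-age value out of a response. The helper below is a sketch with a made-up name (max_age). It is fed canned headers here so that it runs offline, and the exact header layout Apache produces may differ from this sample.

```shell
# Print the max-age value from Cache-Control headers read on stdin.
max_age() {
  sed -n 's/^[Cc]ache-[Cc]ontrol:.*max-age=\([0-9][0-9]*\).*/\1/p'
}

# Canned response headers standing in for a real request.
printf 'HTTP/1.1 200 OK\r\nCache-Control: max-age=300, public\r\n\r\n' | max_age
# prints 300
```

Against a live server the same helper could be used as curl -sI "$repo_url/info/refs" | max_age, where repo_url is a placeholder for the repository address. With the configuration above, the references should report 10 and object files 300.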