On Sunday, I brought up EC2 instances to support the combined Debian Bug Squashing Party/Ubuntu Local Jam that took place at PuppetLabs in Portland, OR, USA. The intent was to provide each participant with their own sbuild environment on a 64bit machine, since we were going to be working on Multi-Arch support, and having both 64bit and 32bit chroots would be helpful. The host was an Ubuntu 11.10 (Oneiric) instance so it would be possible to do SRU verifications in the cloud too.
I was curious about the juju provisioning system, since it has an interesting plugin system, called “charms”, that can be used to build out services. I decided to write an sbuild charm, which was pretty straight forward and quite powerful (using this charm it would be possible to trigger the creation of new schroots across all instances at any time, etc).
The juju service itself works really well when it works correctly. When something goes wrong, unfortunately, it becomes nearly impossible to debug or fix. Repeatedly while working on charm development, the provisioning system would lose its mind, and I’d have to destroy the entire environment and re-bootstrap to get things running again. I had hoped this wouldn’t be the case while I was using it during “production” on Sunday, but the provisioner broke spectacularly on Sunday too. Due to the fragility of the juju agents, it wasn’t possible to restart the provisioner — it lost its mind, the other agent’s couldn’t talk to it any more, etc. I would expect the master services on a cloud instance manager to be extremely robust since having it die would mean totally losing control of all your instances.
On Sunday morning, I started 8 instances. 6 came up perfectly and were excellent work-horses all day at the BSP. 2 never came up. The EC2 instances started, but the service provisioner never noticed them. Adding new units didn’t work (instances would start, but no services would notice them), and when I tried to remove the seemingly broken machines, the instance provisioner completely went crazy and started dumping Python traces into the logs (which seems to be related to this bug, though some kind of race condition seems to have confused it much earlier than this total failure), and that was it. We used the instances we had, and I spent 3 hours trying to fix the provisioner, eventually giving up on it.
I was very pleased with EC2 and Ubuntu Server itself on the instances. The schroots worked, sbuild worked (though I identified some additional things that the charm should likely do for setup). I think juju has a lot of potential, but I’m surprised at how fragile it is. It didn’t help that Amazon had rebooted the entire West Coast the day before and there were dead Ubuntu Archive Mirrors in the DNS rotation.
For anyone else wanting to spin up builders in the cloud using juju, I have a run-down of what this looks like from the admin’s perspective, and even include a little script to produce little slips of paper to hand out to attendees with an instance’s hostname, ssh keys, and builder SSH password. Seemed to work pretty well overall; I just wish I could have spun up a few more. :)
So, even with the fighting with juju and a few extra instances that came up and I had to shut down again without actually using them, the total cost to run the instances for the whole BSP was about US$40, and including the charm development time, about US$45.
UPDATE: some more details on how to avoid the glitches I hit.
© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.