Using Packer for faster scaling
posted in DevOps on 2014-02-06 by Dave Martorana
We have been playing with Packer for a little while, at the behest of a friend of Flyclops. While I haven’t completely explained how we manage our server stack(s), I’ve hinted before at the following configuration for any of our ephemeral servers that automatically scale up and down with load:
- AWS CloudFormation templates
- Vanilla Ubuntu servers
- Just-in-time provisioning of specific servers at scale/deploy-time using CloudInit
This worked well for a good amount of time. With the help of Fabric, a single command line compiles all of our CloudFormation templates, provisioning scripts, and variables for scale alarms and triggers, updates our stack parameters and notifications, and pushes new code to running servers. All of our scripts, config files for servers, etc., are Jinja2 templates that get compiled before being uploaded to Amazon.
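Under the hood, that compile step is plain Jinja2 rendering. Here’s a minimal sketch of the idea - the file names and context keys are made up for illustration, not our actual layout:

# Minimal sketch of the template-compile step. Paths and context keys are
# illustrative; the real fabfile is more involved.
from jinja2 import Environment, FileSystemLoader

def _compile_template(template_name, context):
    jinja_env = Environment(loader=FileSystemLoader('templates'))
    template = jinja_env.get_template(template_name)
    return template.render(**context)

# e.g. render environment names and alarm thresholds into a CloudFormation template
cfn_body = _compile_template('cloudformation.json.j2', {
    'environment': 'staging',
    'scale_up_threshold': 75,  # illustrative value
})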
The biggest bottleneck was our scale-up. The process worked like this:
- Boot a new EC2 instance at desired size, using a vanilla Ubuntu AMI
- Attach server to the load balancer with a long initial wait period
- Run a series of in-order shell scripts (via CloudInit) to build the server into whatever it needs to be (sketched just after this list)
- Load balancer detects a fully-functional server
- Traffic begins to be routed to the new server
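That CloudInit step boiled down to concatenating the compiled provisioning scripts into a single user-data payload. A rough sketch, with invented script names:

# Rough sketch: glue ordered provisioning scripts into one CloudInit
# user-data payload. The script names are invented for illustration.
from os.path import join

def _build_userdata(script_dir, script_names):
    parts = ['#!/bin/bash', 'set -e']  # bail out at the first error
    for name in script_names:
        with open(join(script_dir, name)) as f:
            parts.append(f.read())
    return '\n'.join(parts)

userdata = _build_userdata('scripts', [
    '01-apt-packages.sh',
    '02-python-deps.sh',
    '03-app-code.sh',
])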
The beauty of our system was also its biggest fault. Our servers never got stale, and as they scaled up and down, they benefitted from bug fixes to the core OS. Every server that started handling work was freshly built, and any error in the build process would kill the instance and boot a new server - so the fault-tolerance was fantastic, too.
But the downsides were many as well. The provisioning process was highly optimized and still took over 6 minutes to get a server from boot to answering traffic. Provisioning on the fly required access to several external services (APT repositories, PyPI, Beanstalk/GitHub, and so on). Any service disruption to any of these would cause a failed build (and we were unable to scale until the external service issue had been resolved).
Caching as a bad first attempt
We went through several rounds of attempting to remove external dependencies from scaling - from contributing code to the pip2pi project so it could create, on the fly, an S3 bucket to serve as a PyPI mirror, to bundling git snapshots, and so on.
Eventually, the staleness we were trying to avoid in our servers became possible in every service we were now forcing caches of onto the AWS network - and we were maintaining a lot more code.
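To give a flavor of that era: the Python-dependency cache amounted to building a PyPI-style directory with pip2pi and syncing it to S3. Roughly like this (the bucket name is invented, and the real code created the mirror on the fly):

# Rough flavor of the since-abandoned caching approach. Bucket name is invented.
from fabric.api import local

def _mirror_python_deps():
    # build a PyPI-compatible directory from our requirements
    local("pip2pi ./pypi-mirror -r requirements.txt")
    # push it to S3 so instances never have to reach PyPI at scale time
    local("s3cmd sync ./pypi-mirror/ s3://example-pypi-mirror/")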
Enter Packer
Packer brought us almost back to our roots. By ripping out massive amounts of support code and adding only a bit to include Packer, we were able to recreate on-the-fly builds of local VMs that were almost identical to those running in our stack. From there, it was a very easy process to pack the VM into an AMI, and use that AMI, not a vanilla Ubuntu AMI, in scaling. Here’s the entirety of the pack command in our code (note, this is a Fabric command, and uses Fabric function calls):
from os.path import join

from fabric.api import abort, env, local, runs_once, task


@task
@runs_once
def pack(builder=None):
    '''
    Pack for the current
    environment and name
    '''
    if not builder:
        abort('Please provide a builder name for packing')

    _confirm_stack()
    _get_stack_context()

    env.template_context['builder'] = builder

    # This function compiles all of our provisioning
    # shell scripts
    _compile_userdata()

    # This compiles our packer config file
    packer_dir = _get_packer_folder()
    packer_file = join(packer_dir, 'packer.json')
    compiled_packer_file = join(packer_dir, 'packer_compiled.json')

    print 'Using packer file: %s' % packer_file

    # Get the compiled packer file
    packer_compiled_contents = _get_jinja2_compiled_string(packer_file)
    with open(compiled_packer_file, 'w') as f:
        f.write(packer_compiled_contents)

    # Move into the directory and pack it
    local("cd %s && packer build -only=%s packer_compiled.json && cd .." % (
        packer_dir,
        builder
    ))

    # Clean up
    local("rm %s/*" % join(packer_dir, 'scripts'))
    local("rm %s" % compiled_packer_file)
    local("rm -rf %s" % join(packer_dir, 'packer_cache'))
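For context, packer_compiled.json is just an ordinary Packer template once rendered: an amazon-ebs builder plus shell provisioners pointing at the compiled scripts. Here’s roughly the shape of it, written out as a Python dict for illustration - the builder name, region, source AMI, and paths are invented:

# Illustrative only: the rough shape of a rendered Packer template.
# Region, source AMI, instance type, and script paths are invented.
import json

packer_template = {
    "builders": [{
        "name": "production",                   # matched by the -only flag above
        "type": "amazon-ebs",
        "region": "us-east-1",
        "source_ami": "ami-xxxxxxxx",           # a vanilla Ubuntu AMI
        "instance_type": "m1.small",
        "ssh_username": "ubuntu",
        "ami_name": "production-{{timestamp}}"  # sortable name, handy for _get_ami_for_stack()
    }],
    "provisioners": [{
        "type": "shell",
        "scripts": ["scripts/01-base.sh", "scripts/02-app.sh"]
    }]
}

with open('packer_compiled.json', 'w') as f:
    json.dump(packer_template, f, indent=2)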
Of course, we need to grab the AMI that Packer just created. You can use tagging and all that fun stuff, but here’s a quick way to do it with boto:
import boto
from fabric.api import env

# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are credentials defined elsewhere


def _get_ami_for_stack():
    ec2_conn = boto.connect_ec2(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

    images = ec2_conn.get_all_images(
        owners='self',
        filters={
            'name': '*%s*' % env.environment  # This might be 'production' or 'staging'
        }
    )

    # newest (highest-sorting) image name first
    images.sort(key=lambda i: i.name)
    images.reverse()

    ami_image = images[0]
    return ami_image.id
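If you’d rather lean on tags than name sorting, boto’s filters can match on them directly. For example (the tag key here is an assumption - you’d have to apply it at pack time):

# Alternative sketch: select images by a "stack" tag instead of their name.
# The tag key is an assumption, not something our Packer template sets today.
images = ec2_conn.get_all_images(
    owners='self',
    filters={'tag:stack': env.environment}
)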
Now we place the Packer-created AMI ID into our CloudFormation template (in the launch configuration for the auto-scaling group) and we’re off to the races.
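The hand-off is just one more Jinja2 context variable: the fresh AMI ID gets rendered into the template before the stack update. Something like this, assuming an {{ ami_id }} placeholder in the launch configuration:

# Sketch: feed the newly-packed AMI into the Jinja2 context used to render
# the CloudFormation template. The {{ ami_id }} placeholder is an assumption.
env.template_context['ami_id'] = _get_ami_for_stack()

# Illustrative fragment of the templated launch configuration:
#
#   "LaunchConfig": {
#       "Type": "AWS::AutoScaling::LaunchConfiguration",
#       "Properties": {
#           "ImageId": "{{ ami_id }}",
#           "InstanceType": "m1.small"
#       }
#   }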
The benefits are plentiful:
- AMI is built at pack time, not scale time
- Can still use CloudInit to build - it just happens at pack time, not scale time
- No reliance on any external services to bring another server online
- Updated AMI is a command-line statement away
- Can pack local VMs using identical code
- Removal of all caching code
But most importantly…
Because no code has to run for our services to scale up, we’ve gone from 6-minute scale-up times to approximately the time it takes for the EC2 instance to boot and the ELB to hit it once. That is currently < 60 seconds.
Packer has been a fantastic win for speeding up and simplifying deployments for us. I recommend you look into it for your own stack.