Creating Machines with Chef
After upgrading Chef to use environments, I needed to update my
custom AMIs. These custom AMIs are based on Ubuntu's Amazon EC2
Published AMIs, have some extra software pre-installed, and have a
custom Chef client.rb which reads its configuration (Chef server
info and client roles) from EC2 userdata to bootstrap itself. I then
use scripts to instantiate machines from those AMIs, and pass them
the appropriate userdata.
This has been working great, but is not "the Chef way": the
recommendation is to use knife ec2 server create, which creates a
machine and then ssh'es in to bootstrap it. In Chef 0.9 I ran into
various routing and ssh timing bugs that made this approach too
unreliable, but in 0.10 those appear to have been resolved. The main
advantage of this approach is that you don't need to make special
AMIs; you just use the latest official Ubuntu ones, in any
region/arch/store. The disadvantage is that you then have to wait
for chef-client to install all the software, which in the case of
Java and RVM/Ruby takes a long time.
So the challenge is to:
- make sure that the "knife ec2 server create" method produces functional machines from stock AMIs for all my roles
- use custom AMIs to preload software, and use them from my existing scripts (which use ec2-run-instances) for selected roles
I also wanted to take the opportunity to upgrade OS and sanitise my Ruby install.
For the OS I wanted to switch from Ubuntu 10.4 Maverick Meerkat to
11.04 Natty Narwhal. The only Ubuntu templates Chef 0.10.2 includes
are ubuntu10.04-apt.erb and ubuntu10.04-gems.erb. The apt template
can be adapted for 11.04 by changing "lucid" to "natty" in the APT
source line (or by pulling the release name out of lsb_release), but
then you end up with Chef 0.9, so you also want to append "-0.10" to
the release name.
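As a rough sketch of that change, the source line can be built from the release codename rather than hard-coded. The function name opscode_apt_line is hypothetical, and lsb_release -cs is only consulted when no codename is passed in:

```shell
#!/bin/sh
# Hypothetical helper: print the APT source line for the Opscode
# repository that carries Chef 0.10 for a given Ubuntu release.
opscode_apt_line() {
    # Use the codename passed in, or fall back to the running system's.
    release="${1:-$(lsb_release -cs)}"
    echo "deb http://apt.opscode.com ${release}-0.10 main"
}

opscode_apt_line natty
```

Called with natty, this prints "deb http://apt.opscode.com natty-0.10 main", which is the line that would be written to /etc/apt/sources.list.d/opscode.list.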
Here I ran into an interesting issue: the apt template does an
apt-get install -y chef, then writes settings to client.rb, and then
runs chef-client for the initial bootstrap. The problem is that the
install also starts the /etc/init.d/chef-client service, which
executes before the modifications to client.rb are made and before
the chef-client bootstrap runs. In my template modifications I set
the node_name, and as a result the first chef-client registered the
client with the default name (the host name), and the subsequent
invocation failed; I also ended up with nodes in the wrong
environment. I think there is actually a generic template bug here.
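Debian and Ubuntu packages start their services through invoke-rc.d, which consults /usr/sbin/policy-rc.d if it exists; an exit status of 101 from that script means the action is forbidden. A minimal sketch of using this as a guard follows. The POLICY_FILE variable and the two function names are assumptions for illustration, and the default path is a scratch file so the sketch can be tried without root; on a real machine it would be /usr/sbin/policy-rc.d.

```shell
#!/bin/sh
# Illustrative only: POLICY_FILE defaults to a scratch file so this can
# be run safely. The real location is /usr/sbin/policy-rc.d.
POLICY_FILE="${POLICY_FILE:-$(mktemp)}"

block_service_starts() {
    # Exit status 101 tells invoke-rc.d "action forbidden by policy".
    printf '%s\n' '#!/bin/sh' 'exit 101' > "$POLICY_FILE"
    chmod 755 "$POLICY_FILE"
}

allow_service_starts() {
    # Without a policy script, invoke-rc.d allows service starts again.
    rm -f "$POLICY_FILE"
}

block_service_starts
# ... apt-get install -y chef, then writing client.rb, would go here ...
allow_service_starts
```

The point of the ordering is that the package install happens while starts are blocked, and the service is only started by hand once client.rb is complete.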
We're using Ruby and RVM for applications on some machine roles, and
I've run into various situations where there has been confusion
between the system Ruby, apt, RVM in /usr/local, RVM in user home
directories, various gemsets, and the chef-client and our
applications. To reduce that confusion I wanted to try the apt
install rather than the default gem install, and limit RVM to a
per-user install. [Update: there are some unique issues, such as
knife not finding plugins (CHEF-2483)]
The Knife Template
Pulling it all together I ended up with this knife
template ubuntu11.04-apt.erb:
#!/bin/bash
# This is a knife ec2 server create template for Ubuntu 11.04.
# It is based on the ubuntu10.04-apt.erb version in the 0.10.2 Chef distribution
# available here:
# https://github.com/opscode/chef/blob/master/chef/lib/chef/knife/bootstrap/
# with modifications to:
# - use the natty APT repository
# - install Chef 0.10.2
# - avoid starting the /etc/init.d/chef-client service until the client.rb
# has been written
# - let a CHEF_NODE_NAME_PREFIX environment variable prefix the node name
bash -c '
# MAK: use lsb-release to pick up release name, and add -0.10 to get chef 0.10
<% chef_server_url = Chef::Config[:chef_server_url] -%>
<% validation_client_name = Chef::Config[:validation_client_name] -%>
<% environment = Chef::Config[:environment] -%>
if [ ! -f /usr/bin/chef-client ]; then
echo "chef chef/chef_server_url string <%= chef_server_url %>" \
| debconf-set-selections
[ -f /etc/apt/sources.list.d/opscode.list ] || \
echo "deb http://apt.opscode.com "`lsb_release -cs`"-0.10 main" \
> /etc/apt/sources.list.d/opscode.list
wget -O- http://apt.opscode.com/packages@opscode.com.gpg.key | apt-key add -
fi
apt-get update
# MAK: use policy-rc.d to prevent chef-client starting and registering
# before we write client.rb
(cat <<'EOP'
#!/bin/sh
exit 101
EOP
) > /usr/sbin/policy-rc.d
chmod 755 /usr/sbin/policy-rc.d
apt-get install -y chef
# MAK: remove policy-rc.d so services can be started again
rm -f /usr/sbin/policy-rc.d
<% unless validation_client_name == "chef-validator" -%>
grep -qx "validation_client_name \"<%= validation_client_name %>\"" \
/etc/chef/client.rb \
|| echo "validation_client_name \"<%= validation_client_name %>\"" \
>> /etc/chef/client.rb
<% end -%>
(
cat <<'EOP'
<%= IO.read(Chef::Config[:validation_key]) %>
EOP
) > /tmp/validation.pem
awk NF /tmp/validation.pem > /etc/chef/validation.pem
rm /tmp/validation.pem
<% if @config[:chef_node_name] %>
grep -qx "node_name \"<%= @config[:chef_node_name] %>\"" \
/etc/chef/client.rb \
|| echo "node_name \"<%= @config[:chef_node_name] %>\"" \
>> /etc/chef/client.rb
<% end -%>
# MAK: use an environment variable to pass in a hostname prefix,
# so your node gets called e.g. web-server-i-123abc
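<% unless ENV['CHEF_NODE_NAME_PREFIX'].nil? -%>
# ERB is rendered on the workstation running knife, so check for
# ec2metadata in the shell, on the instance itself
if [ -x /usr/bin/ec2metadata ]; then
(
cat <<'EOP'
node_name "<%= ENV['CHEF_NODE_NAME_PREFIX'] %>`ec2metadata --instance-id`"
EOP
) >> /etc/chef/client.rb
fi
<% end -%>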
<% unless (environment == "" or environment == "_default") -%>
grep -qx "environment \"<%= environment %>\"" /etc/chef/client.rb \
|| echo "environment \"<%= environment %>\"" >> /etc/chef/client.rb
<% end -%>
(
cat <<'EOP'
<%= { "run_list" => @run_list }.to_json %>
EOP
) > /etc/chef/first-boot.json
/usr/bin/chef-client -j /etc/chef/first-boot.json
# MAK: start chef-client because we prevented that previously
/etc/init.d/chef-client start
'
which you can use like this:
export CHEF_NODE_NAME_PREFIX=webserver-
knife ec2 server create -r "role[webserver]" \
-I ami-ab16d2c2 --flavor m1.large -G webserver_demo \
-x ubuntu --ssh-key demo-kp1 \
--template ubuntu11.04-apt.erb \
--environment demo
This works well for bringing up a generic instance with a given role from the command line, after which Chef kicks in and configures the machine.
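The template aims to append a setting to client.rb only when that exact line is not already present, so re-running the bootstrap does not pile up duplicates. That pattern can be sketched as a small helper; append_once is a hypothetical name, not part of knife or Chef:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless an identical
# line is already there.
append_once() {
    line="$1"
    file="$2"
    # -q: no output, -x: match the whole line, -F: fixed string, not a regex.
    # A missing file makes grep fail, so the line is appended in that case too.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

conf="$(mktemp)"
append_once 'environment "demo"' "$conf"
append_once 'environment "demo"' "$conf"   # second call changes nothing
cat "$conf"
rm -f "$conf"
```

The test relies on grep's exit status rather than its output, since -q suppresses output entirely.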
The AMI
To create an AMI there are two approaches: snapshot a running instance, or build an AMI using loopback mounts and chroot. The former is somewhat easier, the latter is more secure and precise, and is recommended for public AMIs. For a discussion, see Eric Hammond's posts on Creating Public AMIs Securely for EC2 and Building EBS Boot AMIs Using Canonical's Downloadable EC2 Images.
For my private AMI I decided to use the simpler snapshot approach, at least initially while developing the install sequence, and I've split it into separate scripts for easier testing. See my create-ami repo on GitHub.