Cloning VMs with KVM

24 Mar 2013

Now that I have my shiny new server, it needs virtual machines.

I have two use cases:

  • long-running VMs to run services (such as OpenLDAP, Postfix, Jenkins), to do development on, and to be the target of ongoing deploys. These should be isolated, dependable, and have fast and consistent performance.
  • short-lived VMs to run single-node or multi-node tests. These should be quick to bring up in a consistent known state, and scriptable.

I use KVM, because it is open source, part of the kernel, widely supported, and reliable. I've run it on a colo for years, and various ISPs use it for production cloud platforms (ByteMark, Dutch Cloud, Digital Ocean). I manage it with virt-manager and script it with libvirt.

Not everything is rosy:

  • Documentation is scattered.
  • Virt-manager's UI is basic.
  • Libvirt's snapshot support is incomplete:
    • "snapshots of inactive domains not implemented yet"
    • "revert to external disk snapshot not supported yet"
  • the virt-clone tool cannot write to a fresh LVM volume ("Clone onto existing storage volume is not supported").
  • on Ubuntu AppArmor gets in the way when you manage multiple snapshot image files (see this Launchpad bug).
  • libvirt's use of XML for domain configuration is a bit annoying to script.
  • QCow2's internal snapshots take longer than I would like: about 10 seconds, versus about 1 second for external snapshots.
  • and whenever you do any virtualisation, networking details are always fiddly.

So here is the workflow that I settled on:

  • Create a base image with a standard OS install
  • For the long-running VMs, I use separate LVM volumes as raw virtual disks: I clone the base image onto a fresh volume and configure the guest. That takes under a minute.
  • For the short-lived VMs, I convert the disk of a long-running VM to qcow2 to serve as a backing image. For each domain I then create a qcow2 image that uses this backing store, configure that guest, and create another qcow2 image backed by it, which is what the domain is actually configured to use. Rolling back is then simply a matter of re-creating that last qcow2 image. That takes a few seconds.

I'll illustrate these in more detail.

Base Image Creation

I like to install the base from a GUI, using the standard installer, and complete it manually, so that it matches what users will see.

I run vnc4server, and connect from my workstation. I install virt-manager, define a storage pool (my /dev/vg_vms LVM volume group), then create an 8G image based on ubuntu-12.10-server-amd64.iso, and name that domain ubuntu-base-vm.

For disk partitioning I use "Guided -- use entire disk" rather than my usual "Guided -- use entire disk and set up LVM". There are three reasons for this:

  • I don't really need LVM on these fairly small virtual disks
  • the installer names the volume group after the host, which will look strange on the clones
  • the volume group ends up in an extended partition, where virt-resize doesn't resize logical volumes

In the Software selection I pick "OpenSSH server".

Finally I log in to the console, add my ssh public key to .ssh/authorized_keys so that I can log in to clones remotely later, and do an aptitude update; aptitude upgrade for good measure.

At some point I might switch to virt-install for this step.

Network Preparation

First, I make a list of VM names, IP addresses, and MAC addresses. I'll later use this list to configure the network interface in the KVM domain configuration, and the networking configuration files in the guest. For convenience I match VM names with IP addresses, so that vm111 is on 192.168.0.111. I wrote a little script that generates the addresses and the MACs (per this tip). For long-term VMs you can just edit the resulting file and change the name.

#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# generate vm network range

import virtinst.util
for num in range(110, 130+1):
    print("vm{0}\t192.168.0.{0}\t{1}".format(num, virtinst.util.randomMAC()))

which I can use like this:

python generate-ips.py  > ips.txt

and produces:

$ head -n 2 ips.txt 
vm110   192.168.0.110   00:16:3e:39:a5:53
vm111   192.168.0.111   00:16:3e:1e:0b:51
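The virtinst module comes with python-virtinst; if it isn't available on your workstation, a stand-in for randomMAC needs only the standard library. A minimal sketch, assuming the same 00:16:3e (Xen) OUI that appears in the output above — the random_mac name is mine, not virtinst's:

```python
# Generate a random MAC in the Xen OUI range (00:16:3e), in the same
# colon-separated lowercase hex format as virtinst.util.randomMAC().
import random

def random_mac():
    octets = [0x00, 0x16, 0x3e,
              random.randint(0x00, 0x7f),
              random.randint(0x00, 0xff),
              random.randint(0x00, 0xff)]
    return ':'.join('{0:02x}'.format(o) for o in octets)

for num in range(110, 130 + 1):
    print("vm{0}\t192.168.0.{0}\t{1}".format(num, random_mac()))
```

The first three octets are fixed so the addresses are recognisably "ours" and never collide with real hardware on the LAN.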

The instructions in this document do not rely on DNS configuration, but it is nice to give your VMs DNS names. On my LAN I use OpenWRT, and I can generate the configuration for its /etc/config/dhcp:

#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# generate openwrt /etc/config/dhcp config

import sys
for line in sys.stdin:
    (name, ip, mac) = line.split()
    print("config domain\n\toption name '{0}'\n\toption ip '{1}'\n\n".format(name, ip))
    print("config host\n\toption mac '{0}'\n\toption name '{1}'\n\toption ip '{2}'\n\n".format(mac, name, ip))

run like:

python generate-openwrt.py < ips.txt

and copy/paste into my router.

Long-term VMs: Clone to raw LVM guests

Here are the steps to thick-provision a VM. I'll use the bash variable $VM for the VM name.

VM=vm111

First create the volume, of the same size as the original. For example, for a VM named vm111, I create a volume named vms-vm111 in the volume group vg_vms:

size=`sudo lvs -o lv_size --unit=b --noheadings /dev/vg_vms/ubuntu-base-vm | sed 's/^ *//'`
echo size=$size
sudo lvcreate --size=$size --name=vms-$VM vg_vms

Alternatively, you can specify a larger size, e.g.:

sudo lvcreate --size=20G --name=vms-$VM vg_vms

Next, I copy the base image:

sudo virt-resize --expand sda1 \
    /dev/vg_vms/ubuntu-base-vm /dev/vg_vms/vms-$VM

The 'sda1' parameter indicates the partition inside the guest that should be expanded; in this case the root partition.

Now we need to create a KVM domain to use that disk. I can copy the XML definition of the base image, and update the device path, the MAC address, and generate a new UUID. To make that easier, I use this python script:

#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# modify-domain.py -- modify a KVM domain
#
# Copyright (C) 2013 Martijn Koster
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation files
# (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge,
# publish, distribute, sublicense, and/or sell copies of the Software,
# and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:  The above copyright notice and
# this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import re, sys, uuid
from lxml import etree
from optparse import OptionParser

parser = OptionParser()
parser.add_option("--name")
parser.add_option("--new-uuid", action="store_true")
parser.add_option("--device-path")
parser.add_option("--mac-address")
(options, args) = parser.parse_args()

tree = etree.parse(sys.stdin)

if options.name:
    name_el = tree.xpath("/domain/name")[0]
    name_el.text = options.name

if options.new_uuid:
    uuid_el = tree.xpath("/domain/uuid")[0]
    uuid_el.text = str(uuid.uuid1())

if options.device_path is not None:
    if not options.device_path.startswith('/'):
        sys.exit("device_path is not an absolute path")
    source_el = tree.xpath("/domain/devices/disk[@device='disk']/source")[0]
    source_el.set('dev', options.device_path)
    if options.device_path.endswith('.qcow2'):
        driver = 'qcow2'
    else:
        driver = 'raw'
    driver_el = tree.xpath("/domain/devices/disk[@device='disk']/driver")[0]
    driver_el.set('type', driver)

if options.mac_address is not None:
    if not re.match("^([0-9a-f]{2}:){5}[0-9a-f]{2}$", options.mac_address):
        sys.exit("{0} is not a valid MAC address".format(options.mac_address))
    mac_el = tree.xpath("/domain/devices/interface[@type='bridge']/mac")[0]
    mac_el.set('address', options.mac_address)

print(etree.tostring(tree, pretty_print=True))

so that I can do:

mkdir -p tmp
virsh dumpxml ubuntu-base-vm > tmp/ubuntu-base-vm.xml
mac=`egrep "^$VM"'\s' ips.txt | awk '{print $3}'`; echo $mac
python ./modify-domain.py \
    --name $VM \
    --new-uuid \
    --device-path=/dev/vg_vms/vms-$VM \
    --mac-address $mac \
    < tmp/ubuntu-base-vm.xml > tmp/$VM.xml
virsh define tmp/$VM.xml
virsh dumpxml $VM
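To sanity-check what modify-domain.py does without touching a live domain, you can run the same transformations over a toy document. A minimal sketch using the standard library's ElementTree instead of lxml (its limited XPath is enough here); the skeleton XML below is made up for illustration, not a real dumpxml:

```python
# Rewrite the domain name, disk source, and driver type in a minimal
# domain XML, mirroring what modify-domain.py does with lxml.
import xml.etree.ElementTree as ET

xml = """<domain>
  <name>ubuntu-base-vm</name>
  <devices>
    <disk device='disk'>
      <driver type='raw'/>
      <source dev='/dev/vg_vms/ubuntu-base-vm'/>
    </disk>
  </devices>
</domain>"""

tree = ET.fromstring(xml)
tree.find('name').text = 'vm111'
disk = tree.find("devices/disk[@device='disk']")
disk.find('source').set('dev', '/var/lib/libvirt/images/vm111.qcow2')
# a .qcow2 device path implies the qcow2 driver; otherwise raw
disk.find('driver').set('type', 'qcow2')
print(ET.tostring(tree).decode())
```

The real script also regenerates the UUID and rewrites the MAC address, but the pattern is the same: find the element, change the text or attribute, serialise.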

Finally, we need to configure the guest's networking details. The virt-sysprep tool can help with that, but doesn't regenerate OpenSSH keys or update /etc/hosts. So I wrap it with some scripting, using a few templates:

For the networking:

mkdir -p templates
cat > templates/network-interfaces <<NET
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address IP_ADDRESS_GOES_HERE
    network 192.168.0.0
    netmask 255.255.255.0
    broadcast 192.168.0.255
    gateway 192.168.0.1
    dns-nameservers 192.168.0.1
    dns-search vlab1.stalworthy.net
NET

The hosts file:

cat > templates/hosts <<HOSTS
127.0.0.1   localhost
IP_ADDRESS_GOES_HERE   VM_NAME_GOES_HERE

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
HOSTS

And a script to run in the host:

cat > templates/configure.sh <<SCRIPT
#!/bin/bash
# Run in the host, with the cwd being the root of the guest

set -x
cp tmp/network-interfaces.VM_NAME_GOES_HERE etc/network/interfaces
cp tmp/hosts.VM_NAME_GOES_HERE etc/hosts

# re-generate the keys. Letting virt-sysprep remove the keys
# is insufficient, and they don't get automatically regenerated
# on boot by Ubuntu. A dpkg-reconfigure fails for some reason,
# and doing a boot-time script is overkill, so just do it now explicitly.
rm etc/ssh/ssh_host_rsa_key etc/ssh/ssh_host_rsa_key.pub
rm etc/ssh/ssh_host_dsa_key etc/ssh/ssh_host_dsa_key.pub
rm etc/ssh/ssh_host_ecdsa_key etc/ssh/ssh_host_ecdsa_key.pub
ssh-keygen -h -N '' -t rsa -f etc/ssh/ssh_host_rsa_key
ssh-keygen -h -N '' -t dsa -f etc/ssh/ssh_host_dsa_key
ssh-keygen -h -N '' -t ecdsa -f etc/ssh/ssh_host_ecdsa_key
SCRIPT

Now we can use those templates to generate host-specific versions, and prepare the image:

ip=`egrep "^$VM\s" ips.txt | awk '{print $2}'`; echo $ip
sed -e "s/IP_ADDRESS_GOES_HERE/$ip/g" -e "s/VM_NAME_GOES_HERE/$VM/g" < templates/hosts > tmp/hosts.$VM
sed -e "s/IP_ADDRESS_GOES_HERE/$ip/g" -e "s/VM_NAME_GOES_HERE/$VM/g" < templates/network-interfaces > tmp/network-interfaces.$VM
sed -e "s/IP_ADDRESS_GOES_HERE/$ip/g" -e "s/VM_NAME_GOES_HERE/$VM/g" < templates/configure.sh > tmp/configure.sh.$VM
chmod a+x tmp/configure.sh.$VM
sudo virt-sysprep -d $VM \
  --verbose \
  --enable udev-persistent-net,bash-history,hostname,logfiles,utmp,script \
  --hostname $VM \
  --script `pwd`/tmp/configure.sh.$VM
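If you'd rather keep the templating in Python than sed, the placeholder expansion above is a couple of string replaces per file. A minimal sketch; the render helper is a name of my own, not part of virt-sysprep or the scripts above:

```python
# Expand the VM-specific placeholders in a template, equivalent to
# the sed -e "s/IP_ADDRESS_GOES_HERE/..." invocations above.
def render(template, vm, ip):
    return (template.replace('IP_ADDRESS_GOES_HERE', ip)
                    .replace('VM_NAME_GOES_HERE', vm))

line = render('IP_ADDRESS_GOES_HERE   VM_NAME_GOES_HERE', 'vm111', '192.168.0.111')
print(line)
```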

Now the guest is ready and can be started:

virsh start $VM

Now that you've gone through this once and the templates are in place, you can use this script to turn the whole thing into a one-liner:

$ time ./clone.sh vm112
...
Domain vm112 started

real    0m55.401s

Which is nice.

Short-term VMs: Thin-provisioning with Qcow2

Ubuntu's libvirt installation has an AppArmor configuration which limits what guests can read and write. Since we'll be using different image files here, you would get permission errors. The easiest way around this is to use this workaround and reboot:

sudo tee -a /etc/apparmor.d/abstractions/libvirt-qemu <<EOM
/var/lib/libvirt/images/** r,
EOM

For the short-lived VMs, we start by creating a clone of the base, in qcow2 format:

sudo qemu-img convert -O qcow2 /dev/vg_vms/ubuntu-base-vm /var/lib/libvirt/images/ubuntu-base-vm-readonly.qcow2
sudo chmod u-w /var/lib/libvirt/images/ubuntu-base-vm-readonly.qcow2

This can be used as a backing file by multiple VMs, and must not be modified. From that base image, create a thin clone image for this specific VM:

VM=vm114
sudo qemu-img create -f qcow2 -b /var/lib/libvirt/images/ubuntu-base-vm-readonly.qcow2 /var/lib/libvirt/images/$VM.qcow2

then define a KVM domain for it, and configure it:

mkdir -p tmp
virsh dumpxml ubuntu-base-vm > tmp/ubuntu-base-vm.xml
mac=`egrep "^$VM"'\s' ips.txt | awk '{print $3}'`
python ./modify-domain.py \
    --name $VM \
    --new-uuid \
    --device-path=/var/lib/libvirt/images/$VM.qcow2 \
    --mac-address $mac \
    < tmp/ubuntu-base-vm.xml > tmp/$VM.xml
virsh define tmp/$VM.xml
virsh start $VM

At this point you can check that the VM works. Next we shut it down and create a snapshot to serve as a clean starting point for future runs:

virsh destroy $VM
sudo qemu-img create -f qcow2 -b /var/lib/libvirt/images/$VM.qcow2 /var/lib/libvirt/images/$VM-start.qcow2
virsh dumpxml $VM > tmp/$VM.xml
python ./modify-domain.py \
    --device-path=/var/lib/libvirt/images/$VM-start.qcow2 \
    < tmp/$VM.xml > tmp/$VM-start.xml
virsh define tmp/$VM-start.xml

virsh start $VM

Now we can use it. When we're done and want to reset the VM, we can do:

virsh destroy $VM
sudo rm /var/lib/libvirt/images/$VM-start.qcow2
sudo qemu-img create -f qcow2 -b /var/lib/libvirt/images/$VM.qcow2 /var/lib/libvirt/images/$VM-start.qcow2
virsh start $VM

which takes less than 2 seconds:

$ time sudo ./reset-vm.sh vm114
Domain vm114 destroyed

Formatting '/var/lib/libvirt/images/vm114-start.qcow2', fmt=qcow2 size=8388608000 backing_file='/var/lib/libvirt/images/vm114.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Domain vm114 started

real    0m1.311s

Closing

We've seen that you can thick-provision VMs through cloning in under a minute, and roll back thin-provisioned VMs in seconds. Which is nice.

I'm really looking forward to future versions of libvirt/virt-manager adding support for these things through their API/UI.

For now, I'll see how well this setup works in practice, and perhaps experiment with some alternatives.