In this post I write up my first experience setting up LXC on Ubuntu 11.04 (Natty), mainly for my own reference, though it may be of interest to others.

Background

In the past I've used various technologies to slice up a computer in various ways:

  • For security I've used Linux chroots and BSD jails for isolation, which work well but offer limited control
  • For hosting web/mail servers I've used Xen, but got frustrated at the Dom0/DomU OS restrictions and the lack of support (then) in the mainline kernels
  • I switched to KVM (through libvirt), and am pleased with how well that works
  • I've experimented with OpenVZ, but found it complex to manage, and again there was (then) no mainline kernel support. LXC will eventually replace OpenVZ in Ubuntu.
  • For test environments with different OSes I've run QEMU, VirtualBox, Parallels, and VMware Workstation, which work well with ISOs and graphical user interfaces

I have been particularly keen on trying out Linux Containers (LXC):

  • the cgroups controls look very powerful (see RHEL docs) and are integrated into the Linux scheduler
  • performance is said to be good, because there is no virtualisation overhead
  • it's part of mainline kernels
  • the management uses command-line tools and seems reasonably straightforward
  • it's actively developed so there should be community support (lxc-users, irc, bugs)
  • it's not fragmented like the commercial product ranges
  • it's backed by vendors (e.g. part of RHEL6, Ubuntu)
  • it's free as in beer, and free as in GPL 2

That makes it look ideal for achieving isolation for hosting purposes. But it's a young technology, so I expect bumps in the road.

Getting Started

My main goals:

  1. create a container, and get it talking to the network
  2. document a set of instructions for preparing the host for LXC from scratch
  3. document a set of instructions for creating a new LXC container
  4. configure some containers for actual use

The test machine is running Ubuntu 11.04 (Natty).

I've found several resources:

but it's not clear which are accurate, up to date, etc. I first tried Ulli's instructions, but that resulted in a non-booting instance (quite possibly because I did something wrong). They also seem to duplicate some configuration that's built in, and to have some unique external dependencies. I then tried Emanuelis's method, but that looks incomplete, at least by comparison. So I ended up with elements from each.

There is also the question of which networking configuration to use. My hosting provider has some restrictions for fully bridged setups, and recommends Proxy ARP or an internal bridge. If you use an internal bridge, you can either configure a traditional subnet, or use point-to-point transfer links using RFC1918 addresses (see this description by Marc Haber). When I initially tried the latter, I found that I needed to specify "scope link" on the internal link (and use "pointopoint" in /etc/network/interfaces) or specify an explicit "src" on the default route to make sure that packets were sent from the public IP rather than the internal link IP (see these notes). Then it worked well. This scheme is appealing in that you don't waste IP addresses on the controlling host and the subnet broadcast, and there's something satisfying about plumbing explicit links. But it does make the network layout look more complex (for things like ip addr list, ip route list, and outbound traceroute), and complicates the configuration in /etc/network/interfaces, so in the end I changed to a traditional subnet.
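
To make the address trade-off concrete, here is a small sketch of the arithmetic for a traditional subnet. It assumes, as in the setup used in this post, that the network address, the broadcast address, and one address for the controlling host's bridge are reserved. The helper name container_addresses is made up for illustration.

```shell
# Number of addresses left for containers in a traditional subnet,
# assuming network, broadcast and one host/gateway address are reserved.
container_addresses() {
  bits=$1                          # prefix length, e.g. 29
  total=$(( 1 << (32 - bits) ))    # size of the whole allocation
  echo $(( total - 3 ))
}

container_addresses 29   # prints 5
```

For the /29 used here, that leaves five of the eight addresses for containers.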

Preparing the controlling host

To prepare the controlling host:

mkdir /lxc
# mount a suitable partition (local, or NFS) on /lxc
ln -s /lxc /var/lib/

apt-get install lxc debootstrap

aptitude install vlan bridge-utils python-software-properties screen libpcap-dev
aptitude install tcpdump ntp

# prepare cgroup
mkdir -p /cgroup
echo "none /cgroup cgroup defaults 0 0" >>/etc/fstab
mount /cgroup

apt-get install bridge-utils
apt-get remove network-manager network-manager-pptp

# allow ip forwarding
cat <<EOD>/etc/sysctl.d/20-lxc.conf
net.ipv4.ip_forward = 1
EOD
sysctl -w net.ipv4.ip_forward=1

# create internal bridge and allow forwarding. Adjust address for your allocation
brctl addbr br0
cat >> /etc/network/interfaces <<EOM
auto br0
iface br0 inet static
    address 46.43.55.73
    netmask 255.255.255.248
EOM
ifup br0
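
After running the steps above, it may be worth confirming that they took effect. The following is a minimal sketch of such a check, not something from the resources mentioned earlier. The function names are hypothetical, and the file paths are parameters so the checks can be tried against sample files.

```shell
# Succeeds if IP forwarding is switched on.
# $1: file to read (defaults to the live kernel setting)
ip_forward_enabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Succeeds if a cgroup filesystem is mounted on the given mount point.
# $1: mount point, $2: mounts table (defaults to /proc/mounts)
cgroup_mounted() {
  awk -v mp="$1" '$2 == mp && $3 == "cgroup" { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}"
}

# Succeeds if the named interface exists and is a bridge.
bridge_exists() {
  [ -d "/sys/class/net/$1/bridge" ]
}

# Example use on the controlling host:
#   ip_forward_enabled && cgroup_mounted /cgroup && bridge_exists br0 \
#     && echo "host looks ready for LXC"
```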

Creating a new container

To create a new container:

# adjust these for your purposes
NAME=natty2
THIS_IP=46.43.55.74
GW_IP=46.43.55.73
NETMASK=255.255.255.248
BITS=/29

LXCDIR=/var/lib/lxc

ROOTFS=$LXCDIR/${NAME}/rootfs
CONFIG=/root/lxc-${NAME}-config.tmp
cat > $CONFIG <<EOM
lxc.network.type = veth
lxc.network.link = br0
lxc.network.name = eth0
EOM

# create (but do not start) the container
lxc-create -n $NAME -t natty -f $CONFIG

# add the root user from the controlling host to the container password file
grep '^root:' /etc/shadow > $ROOTFS/etc/shadow.new
egrep -v '^root:' $ROOTFS/etc/shadow >> $ROOTFS/etc/shadow.new
mv $ROOTFS/etc/shadow.new $ROOTFS/etc/shadow
chgrp shadow $ROOTFS/etc/shadow
chmod o-rwx $ROOTFS/etc/shadow

# copy authorized_keys
if [ -f /root/.ssh/authorized_keys ]; then
  cp -a --parents /root/.ssh/authorized_keys $ROOTFS
fi

# copy some configs from the controlling host
cp -a --parents \
  /etc/ntp.conf  \
  /etc/timezone \
  $ROOTFS

# update the resolv.conf config (lxc-create put the controlling host resolv.conf into "original")
cat $ROOTFS/etc/resolvconf/resolv.conf.d/original > $ROOTFS/etc/resolvconf/resolv.conf.d/base

# configure the network
cat > $ROOTFS/etc/network/interfaces <<EOM
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
  address $THIS_IP
  netmask $NETMASK
  gateway $GW_IP
EOM

# update the apt sources to include multiverse
cat <<EOD> $ROOTFS/etc/apt/sources.list
deb http://de.archive.ubuntu.com/ubuntu/ natty          main restricted universe multiverse
deb http://de.archive.ubuntu.com/ubuntu/ natty-updates  main restricted universe multiverse
deb http://de.archive.ubuntu.com/ubuntu/ natty-security main restricted universe multiverse
EOD

# remove unneeded postinstall scripts (container does not have udev or the graphical boot animation)
rm $ROOTFS/var/lib/dpkg/info/udev.postinst
rm $ROOTFS/var/lib/dpkg/info/plymouth.postinst

# copy kernel modules
export kernel=$(uname -r)
mkdir -p $ROOTFS/lib/modules/$kernel/kernel
cp /lib/modules/$kernel/modules.dep $ROOTFS/lib/modules/$kernel/
cp -R /lib/modules/$kernel/kernel/net $ROOTFS/lib/modules/$kernel/kernel/

# add tmpfs
cat <<EOD > $ROOTFS/etc/fstab
tmpfs  /dev/shm   tmpfs  defaults  0 0
EOD

# add ipv6 localhost
cat <<EOD >> $ROOTFS/etc/hosts

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
EOD

# set locale info
cat >> $ROOTFS/etc/environment <<EOM
LANG="en_US.UTF-8"
LANGUAGE="en_US:en"
EOM

cat > $ROOTFS/etc/default/locale <<EOM
LANG="en_US.UTF-8"
LANGUAGE="en_US:en"
EOM

# start the container
lxc-start --name $NAME --daemon \
       --console $LXCDIR/${NAME}.console \
       --logfile=${LXCDIR}/${NAME}.log

# connect to it and log in as root (you can escape out with Ctrl+a q);
# the commands that follow are run inside the container
lxc-console --name ${NAME}

export LANG=C

# test networking
ping -c 1 www.google.com

# update packages and install more
apt-get update
apt-get install -y apt-utils iptables rsyslog sudo
apt-get install -y ssh ntp lsof wget
apt-get install -y iputils-ping mtr-tiny dnsutils bind9-host
apt-get install -y ia32-libs libterm-readline-gnu-perl dialog
apt-get install -y aptitude tcpdump man less curl

# set supported locales
cat <<EOD> /var/lib/locales/supported.d/en
en_US                   ISO-8859-15
en_US.Latin1            ISO-8859-1
en_US.Latin9            ISO-8859-15
en_US.ISO-8859-1        ISO-8859-1
en_US.ISO-8859-15       ISO-8859-15
en_US.UTF-8             UTF-8
en_GB.UTF-8             UTF-8
EOD
dpkg-reconfigure locales

# remove unused init.d scripts for the hardware clock
/usr/sbin/update-rc.d -f umountfs remove
/usr/sbin/update-rc.d -f hwclock.sh remove
/usr/sbin/update-rc.d -f hwclockfirst.sh remove
rm /etc/init.d/hwclock*

# remove unused audio devices
cd /dev
rm mixer* *midi*  audio* dsp* smpte* mpu* sequencer sndstat

# remove hardware clock and boot settings
cd /etc/init
rm -f hwclock*  plymouth*

# optional: reboot to test the container comes up, and log in again
reboot

lxc-start --name $NAME --daemon \
       --console $LXCDIR/${NAME}.console \
       --logfile=${LXCDIR}/${NAME}.log

# disconnect from the console
Ctrl+a q

# ssh in
ssh -l root $THIS_IP

I imagine there will be further tweaks, but this is a good start.
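
One small tweak: the script above sets both NETMASK and BITS by hand, and the two have to agree. As a sketch (prefix_to_netmask is a hypothetical helper, not part of LXC or Ubuntu), the dotted netmask could be derived from the prefix length instead:

```shell
# Convert a prefix length (0-32) into a dotted-quad netmask.
prefix_to_netmask() {
  bits=$1
  mask=""
  for i in 1 2 3 4; do
    # each octet takes up to 8 of the remaining prefix bits
    if [ "$bits" -ge 8 ]; then n=8; else n=$bits; fi
    octet=$(( 256 - (1 << (8 - n)) ))
    mask="${mask}${mask:+.}${octet}"
    bits=$(( bits - n ))
  done
  echo "$mask"
}

BITS=/29
NETMASK=$(prefix_to_netmask "${BITS#/}")   # strip the leading slash
echo "$NETMASK"                            # prints 255.255.255.248
```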

LXC Commands

We've already seen lxc-create, lxc-start, and lxc-console; lxc-stop is used further down.

To list processes belonging to all LXC containers:

root@thunder:/$ lxc-ps --lxc
CONTAINER    PID TTY          TIME CMD
natty1     19126 ?        00:00:00 init
natty1     19181 ?        00:00:00 upstart-udev-br
natty1     19183 ?        00:00:00 sshd
natty1     19186 ?        00:00:00 udevd
natty1     19269 ?        00:00:00 udevd
natty1     19270 ?        00:00:00 udevd
natty1     19568 ?        00:00:00 upstart-socket-
natty1     19594 pts/4    00:00:00 getty
natty1     19596 pts/2    00:00:00 getty
natty1     19597 pts/3    00:00:00 getty
natty1     19613 pts/1    00:00:00 login
natty1     19614 pts/5    00:00:00 getty
natty1     19706 pts/1    00:00:00 bash

or:

root@thunder:/$ lxc-ps --lxc --forest
CONTAINER    PID TTY          TIME CMD
natty1     19126 ?        00:00:00 init
natty1     19181 ?        00:00:00  \_ upstart-udev-br
natty1     19183 ?        00:00:00  \_ sshd
natty1     19186 ?        00:00:00  \_ udevd
natty1     19269 ?        00:00:00  |   \_ udevd
natty1     19270 ?        00:00:00  |   \_ udevd
natty1     19568 ?        00:00:00  \_ upstart-socket-
natty1     19594 pts/4    00:00:00  \_ getty
natty1     19596 pts/2    00:00:00  \_ getty
natty1     19597 pts/3    00:00:00  \_ getty
natty1     19613 pts/1    00:00:00  \_ login
natty1     19706 pts/1    00:00:00  |   \_ bash
natty1     19614 pts/5    00:00:00  \_ getty
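
With several containers running, that listing gets long. As a small sketch, the output can be summarised per container. This assumes the column layout shown above, with the container name in the first column and a single header line. count_by_container is a made-up name.

```shell
# Count processes per container from `lxc-ps --lxc` output on stdin.
count_by_container() {
  # skip the header line and any blank lines, then tally column 1
  awk 'NR > 1 && NF { n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}

# Example use on the controlling host:
#   lxc-ps --lxc | count_by_container
```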

I'm going to skip lxc-ls because it's needlessly confusing.

Sharing a filesystem

The recommended way of sharing part of the controlling host's filesystem with the container is to use bind mounts. For example:

# on the controlling host
root@thunder:/# lxc-stop --name natty1
root@thunder:/# mkdir /mydata
root@thunder:/# mkdir /var/lib/lxc/natty1/rootfs/mydata
root@thunder:/# mount -o bind /mydata /var/lib/lxc/natty1/rootfs/mydata
root@thunder:/# lxc-start --name natty1 --daemon \
>       --console /var/lib/lxc/natty1.console \
>       --logfile=/var/lib/lxc/natty1.log

# in the container
root@natty1:/# ls -ld /mydata
drwxr-xr-x 2 root root 4096 Jun 13 11:53 /mydata
root@natty1:/# touch /mydata/hi.txt

# on the controlling host
root@thunder:/# ls /mydata
hi.txt
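
A bind mount made by hand like this does not survive a reboot of the controlling host. One possible way to make it persistent is an lxc.mount.entry line in the container's configuration, which takes an fstab-style entry. I have not tested the sketch below. It assumes the container config lives at /var/lib/lxc/NAME/config, and bind_mount_entry is a hypothetical helper.

```shell
# Print an lxc.mount.entry line that bind-mounts a host directory
# into a container's root filesystem.
# $1: container name, $2: directory on the host, $3: path inside the container
bind_mount_entry() {
  name=$1 src=$2 dst=$3
  echo "lxc.mount.entry = $src /var/lib/lxc/$name/rootfs$dst none bind 0 0"
}

# Example use (assumed config location; stop the container first):
#   bind_mount_entry natty1 /mydata /mydata >> /var/lib/lxc/natty1/config
```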

So far

LXC seems to work just fine. It's fast. Documentation is lacking, and tooling seems limited. In terms of complexity it's not dissimilar to the early days of Xen and KVM. So far so good. At this point I can see myself switching to LXC for my hosting purposes.

To do next (time permitting):

  • reconfigure the host and use LVM for the containers
  • actually use it for a while for real work in multiple containers
  • try nested containers, to test the install instructions (and just because you can)
  • review online resources more to see what else I'm missing, and update this post accordingly
  • maybe do some performance tests
  • look into libvirt support
  • look into limiting CPU/IO etc through cgroups, and document that in a separate post
  • run some KVM guests alongside; that should just work
  • configure IPv6
  • experiment with NAT to see if I can add additional containers on RFC1918 addresses