
Project Calico Experiments

22 May 2015

I've been using Docker a lot recently, mainly in the context of Jenkins continuous integration. Basically, I use the Jenkins Docker plugin to create Jenkins slave containers on demand and run test jobs in them; those jobs in turn launch other Docker containers as data sources for the tests. This is great, but the technology is still somewhat rough around the edges, especially with regard to management: authentication, container cleanup, resource management, scaling. One area high on my priority list is seamless networking, where a lot of exciting development is happening.

I was particularly intrigued by Project Calico, which uses BGP route management rather than an encapsulating overlay network, and thus avoids NAT and port proxies, doesn't require a fancy virtual switch setup, and supports IPv6. See Learn Calico for the sales pitch, and Calico Architecture for an architectural overview.

I tried this out on my local Ubuntu 14.04.2 test lab. It consists of a router and three Linux hosts, trinity10, trinity20 and trinity30 (192.168.77.10, .20 and .30). The router is also connected on 192.168.0.2 to the main LAN router (192.168.0.1), which routes (and SNATs) to the internet. Here is a diagram of the network layout:

The goal of this post is to install Calico such that containers can talk to each other, to the internet, and to hosts elsewhere in my network.

Installing etcd

Calico requires etcd, and Calico helpfully provides a PPA, documented in the OpenStack instructions:

sudo apt-add-repository --yes ppa:project-calico/icehouse

sudo bash -c "cat >>/etc/apt/preferences" <<EOM
Package: *
Pin: release o=LP-PPA-project-calico-*
Pin-Priority: 1001
EOM

sudo apt-get update
sudo apt-get install etcd

At the time of writing this installs version 2.0.0-1~ubuntu14.04.1~ppa6, complete with an Upstart script in /etc/init/etcd.conf.

Alternatively you can install the binaries from a tarball from the etcd project's releases page. Loosely based on Scott Lowe's blog post and Calico's PPA packaging, I tried 2.0.11:

curl -L  https://github.com/coreos/etcd/releases/download/v2.0.11/etcd-v2.0.11-linux-amd64.tar.gz -o etcd-v2.0.11-linux-amd64.tar.gz
tar xzvf etcd-v2.0.11-linux-amd64.tar.gz
sudo cp etcd-v2.0.11-linux-amd64/{etcd,etcdctl} /usr/bin
rm -fr etcd-v2.0.11-linux-amd64
sudo mkdir /var/lib/etcd

sudo addgroup --system etcd
sudo adduser --system --disabled-login --disabled-password --no-create-home --shell /bin/false --ingroup etcd etcd
sudo chown -R etcd:etcd /var/lib/etcd
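
A quick sanity check that the binary landed in place (this just prints the version banner):

etcd --version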

sudo bash -c "cat >/etc/init/etcd.conf" <<'EOM'
# vim:set ft=upstart ts=2 et:
description "etcd"
author "etcd maintainers"

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
respawn limit 10 5

setuid etcd

env ETCD_DATA_DIR=/var/lib/etcd
export ETCD_DATA_DIR

exec /usr/bin/etcd --listen-client-urls="http://0.0.0.0:2379,http://0.0.0.0:4001"
EOM

Don't start etcd yet; we need to configure it first.

Configuring etcd

There are various ways of configuring etcd. For my simple testbed I chose a fixed, file-based configuration. On trinity10:

sudo bash -c "cat > /etc/init/etcd.override" <<'EOM'
# Override file for etcd Upstart script providing some environment variables
env ETCD_INITIAL_CLUSTER="etcd-trinity10=http://192.168.77.10:2380,etcd-trinity20=http://192.168.77.20:2380,etcd-trinity30=http://192.168.77.30:2380"
env ETCD_INITIAL_CLUSTER_STATE="new"
env ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-trinity"
env ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.77.10:2380"
env ETCD_DATA_DIR="/var/lib/etcd"
env ETCD_LISTEN_PEER_URLS="http://192.168.77.10:2380"
env ETCD_LISTEN_CLIENT_URLS="http://192.168.77.10:2379"
env ETCD_ADVERTISE_CLIENT_URLS="http://192.168.77.10:2379"
env ETCD_NAME="etcd-trinity10"
EOM

and start:

sudo service etcd start

and repeat the same on trinity20/trinity30, adjusting the configuration values as appropriate.
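
For example, on trinity20 the override would be the following (trinity30 is analogous), followed again by sudo service etcd start:

sudo bash -c "cat > /etc/init/etcd.override" <<'EOM'
# Override file for etcd Upstart script providing some environment variables
env ETCD_INITIAL_CLUSTER="etcd-trinity10=http://192.168.77.10:2380,etcd-trinity20=http://192.168.77.20:2380,etcd-trinity30=http://192.168.77.30:2380"
env ETCD_INITIAL_CLUSTER_STATE="new"
env ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-trinity"
env ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.77.20:2380"
env ETCD_DATA_DIR="/var/lib/etcd"
env ETCD_LISTEN_PEER_URLS="http://192.168.77.20:2380"
env ETCD_LISTEN_CLIENT_URLS="http://192.168.77.20:2379"
env ETCD_ADVERTISE_CLIENT_URLS="http://192.168.77.20:2379"
env ETCD_NAME="etcd-trinity20"
EOM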

To inspect the cluster:

export ETCDCTL_PEERS=http://192.168.77.10:2379,http://192.168.77.20:2379,http://192.168.77.30:2379
etcdctl member list | sort -k 2

shows:

63412e35860644fa: name=etcd-trinity10 peerURLs=http://192.168.77.10:2380 clientURLs=http://192.168.77.10:2379
483b4dedff8b84f1: name=etcd-trinity20 peerURLs=http://192.168.77.20:2380 clientURLs=http://192.168.77.20:2379
5393e6873eb4c26e: name=etcd-trinity30 peerURLs=http://192.168.77.30:2380 clientURLs=http://192.168.77.30:2379
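
As a further check, etcdctl can report on cluster health (with the same ETCDCTL_PEERS as above):

etcdctl cluster-health

which should report all three members as healthy.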

That's it for etcd.

Installing Calico

Next, Calico.

First download it:

wget -q http://projectcalico.org/latest/calicoctl
chmod a+x calicoctl
sudo mv calicoctl /usr/local/bin/

Then on trinity10:

sudo apt-get install ipset  # for `calicoctl diags`

sudo modprobe xt_set
sudo modprobe ip6_tables
export ETCD_AUTHORITY=192.168.77.10:2379
calicoctl node --ip=192.168.77.10

That prints out:

Pulling Docker image calico/node:v0.4.2
Docker Remote API is on port 2377.  Run 

export DOCKER_HOST=localhost:2377

before using `docker run` for Calico networking.

Calico node is running with id: fe29c1afc70d4f7aceffb95d4416c7c75c6151dc0a76becc4e94687081214b05

Do the same on trinity20/trinity30, changing the values as appropriate.
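
For example, on trinity20 that's (trinity30 is analogous):

sudo apt-get install ipset  # for `calicoctl diags`

sudo modprobe xt_set
sudo modprobe ip6_tables
export ETCD_AUTHORITY=192.168.77.20:2379
calicoctl node --ip=192.168.77.20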

This starts a container which runs various services, including "felix" (the Calico agent) and BIRD (for BGP):

root@trinity10:~# docker exec -it calico-node ps -efl
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root         1     0  0  80   0 -  7981 wait   May20 ?        00:00:00 /usr/bin/python3 -u /sbin/my_init
0 S root        19     1  0  80   0 -    47 poll_s May20 ?        00:00:00 /usr/bin/runsvdir -P /etc/service
0 S root        20    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv powerstrip
0 S root        21    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv bird
0 S root        22    19  0  80   0 -    42 hrtime May20 ?        00:00:58 runsv bird6
0 S root        23    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv confd
0 S root        24    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv powerstrip-calico
0 S root        25    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv felix
0 S root        26    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv syslog-forwarder
0 S root        27    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv cron
0 S root        28    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv syslog-ng
0 S root        29    19  0  80   0 -    42 poll_s May20 ?        00:00:00 runsv sshd
4 S root        30    27  0  80   0 -  6686 hrtime May20 ?        00:00:00 /usr/sbin/cron -f
4 S root        31    28  0  80   0 - 16437 ep_pol May20 ?        00:00:00 syslog-ng -F -p /var/run/syslog-ng.pid --no-caps
0 S root        32    20  0  80   0 - 22748 ep_pol May20 ?        00:00:10 /usr/bin/python /usr/local/bin/twistd -noy powerstrip.tac
4 S root        33    21  0  80   0 -  1768 poll_s May20 ?        00:00:14 bird -s bird.ctl -d -c /config/bird.cfg
4 S root        35    23  0  80   0 -  2472 futex_ May20 ?        00:00:00 /confd -confdir=/ -interval=5 -watch --log-level=debug -node 192.168.77.10:2379
4 S root        36    25  0  80   0 - 37390 ep_pol May20 ?        00:00:03 /usr/bin/python /usr/bin/calico-felix
0 S root        37    26  0  80   0 -  1868 hrtime May20 ?        00:00:01 tail -f -n 0 /var/log/syslog
4 S root        39    24  0  80   0 - 20952 ep_pol May20 ?        00:00:09 python powerstrip-calico.py
4 R root     19410     0  0  80   0 -  4671 -      13:11 ?        00:00:00 ps -efl

IP Management

By default, Calico uses 192.168.0.0/16 as the IP pool from which it assigns addresses. That range overlaps my existing subnets, so I want to restrict it to 192.168.89.0/24.

On trinity10:

calicoctl pool show
calicoctl pool add 192.168.89.0/24
calicoctl pool remove 192.168.0.0/16
calicoctl pool show

Containers

Now we can start containers. We'll give them names just for ease of use.

I specify IP addresses with a CALICO_IP environment variable, just for the purposes of this demo. You can specify CALICO_IP=auto to have Calico pick an address randomly from the pool. If you don't specify CALICO_IP at all, you'll get a non-Calico address on br0.
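
For example, to let Calico pick an address from the pool (with a throwaway container name, just for illustration):

docker run -it -e CALICO_IP=auto --name scratch-test ubuntu /bin/bash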

On trinity10 we create the first container:

export ETCDCTL_PEERS=http://192.168.77.10:2379,http://192.168.77.20:2379,http://192.168.77.30:2379
export ETCD_AUTHORITY=192.168.77.10:2379

export DOCKER_HOST=localhost:2377
docker run -it -e CALICO_IP=192.168.89.100 --name group1-node1 ubuntu /bin/bash

which creates container 47f2320c9d08.

On trinity20, create another container:

export ETCDCTL_PEERS=http://192.168.77.10:2379,http://192.168.77.20:2379,http://192.168.77.30:2379
export ETCD_AUTHORITY=192.168.77.20:2379

export DOCKER_HOST=localhost:2377
docker run -it -e CALICO_IP=192.168.89.120 --name group1-node2 ubuntu /bin/bash

Inside the first container (47f2320c9d08 on trinity10), it looks like this:

root@47f2320c9d08:/# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
29: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 2e:f0:43:59:50:2f brd ff:ff:ff:ff:ff:ff
    inet 192.168.89.100/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::2cf0:43ff:fe59:502f/64 scope link 
       valid_lft forever preferred_lft forever

Note the 192.168.89.100/32 there -- Calico uses individual host addressing.

Next, we need to set up the "network profile" (a bit like AWS security groups; see the AdvancedNetworkPolicy docs):

On trinity10:

mak@trinity10:~$ calicoctl profile PROF_GROUP1 member add group1-node1
Added group1-node1 to PROF_GROUP1

On trinity20:

mak@trinity20:~$ ./calicoctl profile PROF_GROUP1 member add group1-node2
Added group1-node2 to PROF_GROUP1

That adds both nodes to the same profile. Now they can ping each other:

root@47f2320c9d08:/# ping -c 1 192.168.89.120
PING 192.168.89.120 (192.168.89.120) 56(84) bytes of data.
64 bytes from 192.168.89.120: icmp_seq=1 ttl=62 time=0.334 ms

--- 192.168.89.120 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.334/0.334/0.334/0.000 ms

Good stuff!

Routing

But there are two problems: the containers cannot talk to the internet, and my LAN can't talk to them.

To talk to the internet, the internal 192.168.89.0/24 addresses need to masquerade. This tidbit suggests you masquerade on the Docker hosts:

iptables -t nat -A POSTROUTING -s 192.168.89.0/24 ! -d 192.168.89.0/24 -j MASQUERADE

and indeed that works. But for my purposes I want to keep the packets un-masqueraded until they leave my network at the main router.

To be able to have my LAN talk to the containers, I'll need routes.

Further on we'll solve this all with BGP, but for now I'll use static routes.

On trinity-router I add a route to 192.168.89.0/24 via gateway 192.168.77.10 (trinity10), using the Mikrotik UI (IP → Routes). Now if I ping (Tools → Ping) 192.168.89.100, the container on trinity10, I get "timeout" results -- the packets reach trinity10, but iptables prevents them from reaching the container. To add the necessary permissions I use:

cat >prof_group1.json <<EOM
{
  "id": "PROF_GROUP1", 
  "inbound_rules": [
    {
      "action": "allow", 
      "src_tag": "PROF_GROUP1"
    },
    {
      "action": "allow"
    }
  ], 
  "outbound_rules": [
    {
      "action": "allow"
    }
  ]
}
EOM

calicoctl profile PROF_GROUP1 rule update < prof_group1.json

and then the ping succeeds. This set of rules is overly permissive, but it's okay for my test purposes. There is currently a bug with more fine-grained rules, but I expect that will be fixed soon.
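
To double-check what's now in effect, calicoctl can print a profile's rules back out (I believe the matching subcommands are rule show and rule json):

calicoctl profile PROF_GROUP1 rule show
calicoctl profile PROF_GROUP1 rule json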

Next, I add a route on the main-router: 192.168.89.0/24 via gateway 192.168.0.2 (the trinity-router's address on the main LAN). Now I can ping from PCs on my work subnet too.
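
On a Mikrotik that's also a one-liner from the terminal (a sketch, assuming RouterOS 6 syntax; the earlier trinity-router route is the same idea with gateway 192.168.77.10):

/ip route add dst-address=192.168.89.0/24 gateway=192.168.0.2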

And because the main router was already configured to apply SNAT on traffic to the internet, the containers can talk to the internet too: if I do a nc www.google.com 80 on the container, I see the NAT mapping on the main-router (IP → Firewall → Connections) from 192.168.89.100:43672 to 216.58.210.4:80.

Yay!

BGP

With the previous solution I have to maintain static routes by hand, and all traffic to the containers routes through trinity10, which is both a single point of failure and a performance bottleneck.

What I really want is to have my trinity-router act as a Route Reflector, and have the Calico BIRD servers share their routes, so that packets from my LAN to a particular container go straight to its docker host.

On trinity10:

calicoctl bgppeer rr add 192.168.77.1

Then on the trinity-router, via the web UI (a CLI sketch of the same steps follows the list):

  • Disable the old static route.
  • Change the AS number of the router: BGP → Instances, default, AS 64511, Client To Client Reflection ✓.
  • Add peers: BGP → Peers, Add New, Name "trinity10", Remote address 192.168.77.10, Remote AS 64511, Route Reflect ✓.
  • Add peers: BGP → Peers, Add New, Name "trinity20", Remote address 192.168.77.20, Remote AS 64511, Route Reflect ✓.
  • Check status: BGP → Peers, note state=established.
  • Check routes: IP → Routes, note for example: 192.168.89.100/32 via 192.168.77.10, marked DAb (Dynamic, Active, bgp).
  • Verify that hosts on the work subnet can ping 192.168.89.100.
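
For reference, the same configuration sketched as RouterOS terminal commands (my best recollection of the RouterOS 6 syntax; parameter names may differ between versions):

/routing bgp instance set default as=64511 client-to-client-reflection=yes
/routing bgp peer add name=trinity10 remote-address=192.168.77.10 remote-as=64511 route-reflect=yes
/routing bgp peer add name=trinity20 remote-address=192.168.77.20 remote-as=64511 route-reflect=yes
/routing bgp peer print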

This is great!

What Calico did

So a quick look under the hood. On trinity10, ip route list says:

default via 192.168.77.1 dev br0 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.42.1 
192.168.77.0/24 dev br0  proto kernel  scope link  src 192.168.77.10 
192.168.89.100 dev cali6c5e6e96ffb  scope link 
192.168.89.120 via 192.168.77.20 dev br0  proto bird 

Note the explicit routes for both containers. The one on the remote host routes to that host, and the local one goes via the cali6c5e6e96ffb interface (which I assume is the host end of the container's virtual interface).

A docker inspect group1-node1 shows:

        "Env": [
            "CALICO_IP=192.168.89.100"
        ],
        "ExposedPorts": null,
...
    "NetworkSettings": {
        "Bridge": "",
        "Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "IPAddress": "",
        "IPPrefixLen": 0,
        "IPv6Gateway": "",
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "MacAddress": "",
        "PortMapping": null,
        "Ports": null
    },

It seems a little unfortunate that you can't get the address from NetworkSettings. You can see the containers and their addresses using calicoctl shownodes --detailed:

+-----------+---------------+--------------+----------------------------------+-------------------+-------------------+--------+
|    Host   | Workload Type | Workload ID  |           Endpoint ID            |     Addresses     |        MAC        | State  |
+-----------+---------------+--------------+----------------------------------+-------------------+-------------------+--------+
| trinity20 |     docker    | 64a1389bfc94 | 38895e28ffbd11e480ef002590f518ca | 192.168.89.120/32 | ae:54:9f:cd:d8:56 | active |
| trinity10 |     docker    | 47f2320c9d08 | 6c5e6e96ffbd11e49553002590f526b6 | 192.168.89.100/32 | 2e:f0:43:59:50:2f | active |
+-----------+---------------+--------------+----------------------------------+-------------------+-------------------+--------+
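
Alternatively, since the address went in via the CALICO_IP environment variable anyway, a crude way to recover it from Docker itself is to grep the inspect output:

docker inspect group1-node1 | grep CALICO_IP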

The iptables-save output shows the profile configuration:

# Generated by iptables-save v1.4.21 on Thu May 21 20:05:30 2015
*mangle
:PREROUTING ACCEPT [21358889:2153598084]
:INPUT ACCEPT [21358851:2153595129]
:FORWARD ACCEPT [38:2955]
:OUTPUT ACCEPT [20543827:1779165969]
:POSTROUTING ACCEPT [20543857:1779168252]
COMMIT
# Completed on Thu May 21 20:05:30 2015
# Generated by iptables-save v1.4.21 on Thu May 21 20:05:30 2015
*filter
:INPUT ACCEPT [4732010:410531931]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [4571023:403998200]
:DOCKER - [0:0]
:felix-FORWARD - [0:0]
:felix-FROM-ENDPOINT - [0:0]
:felix-INPUT - [0:0]
:felix-TO-ENDPOINT - [0:0]
:felix-from-15193082ff4 - [0:0]
:felix-from-6c5e6e96ffb - [0:0]
:felix-p-PROF_GROUP1-i - [0:0]
:felix-p-PROF_GROUP1-o - [0:0]
:felix-to-15193082ff4 - [0:0]
:felix-to-6c5e6e96ffb - [0:0]
-A INPUT -j felix-INPUT
-A FORWARD -j felix-FORWARD
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A felix-FORWARD -i cali+ -j felix-FROM-ENDPOINT
-A felix-FORWARD -o cali+ -j felix-TO-ENDPOINT
-A felix-FORWARD -i cali+ -j ACCEPT
-A felix-FORWARD -o cali+ -j ACCEPT
-A felix-FROM-ENDPOINT -i cali15193082ff4 -g felix-from-15193082ff4
-A felix-FROM-ENDPOINT -i cali6c5e6e96ffb -g felix-from-6c5e6e96ffb
-A felix-FROM-ENDPOINT -j DROP
-A felix-INPUT -i cali+ -j felix-FROM-ENDPOINT
-A felix-INPUT -i cali+ -j ACCEPT
-A felix-TO-ENDPOINT -o cali15193082ff4 -g felix-to-15193082ff4
-A felix-TO-ENDPOINT -o cali6c5e6e96ffb -g felix-to-6c5e6e96ffb
-A felix-TO-ENDPOINT -j DROP
-A felix-from-6c5e6e96ffb -m conntrack --ctstate INVALID -j DROP
-A felix-from-6c5e6e96ffb -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN
-A felix-from-6c5e6e96ffb -p udp -m udp --sport 68 --dport 67 -j RETURN
-A felix-from-6c5e6e96ffb -s 192.168.89.100/32 -m mac --mac-source 2E:F0:43:59:50:2F -g felix-p-PROF_GROUP1-o
-A felix-from-6c5e6e96ffb -m comment --comment "Anti-spoof DROP (endpoint 6c5e6e96ffbd11e49553002590f526b6):" -j DROP
-A felix-p-PROF_GROUP1-i -m set --match-set felix-v4-PROF_GROUP1 src -j RETURN
-A felix-p-PROF_GROUP1-i -j DROP
-A felix-p-PROF_GROUP1-i -m comment --comment "Default DROP rule (PROF_GROUP1):" -j DROP
-A felix-p-PROF_GROUP1-o -j RETURN
-A felix-p-PROF_GROUP1-o -m comment --comment "Default DROP rule (PROF_GROUP1):" -j DROP
-A felix-to-6c5e6e96ffb -m conntrack --ctstate INVALID -j DROP
-A felix-to-6c5e6e96ffb -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN
-A felix-to-6c5e6e96ffb -g felix-p-PROF_GROUP1-i
-A felix-to-6c5e6e96ffb -m comment --comment "Endpoint 6c5e6e96ffbd11e49553002590f526b6:" -j DROP
COMMIT
# Completed on Thu May 21 20:05:30 2015

Conclusion

I'm excited -- this approach feels natural, and it works as advertised. I look forward to further experimentation, and to rolling it out in our build lab. Things I want to look at are resource orchestration (Mesos/Kubernetes) and DNS integration. And it'd be nice to look at IPv6 again.

It will be interesting to see how this develops as part of the work on Docker's new libnetwork, and how it compares to other "batteries included, but swappable" options. They are working on it right now.

While working on this post I ran into some issues, and reached out to Calico developer Cory Benfield "Lukasa", who helped me with great knowledge and enthusiasm -- many thanks!

Keep up-to-date with Calico: @projectcalico on Twitter, and #calico on Freenode.