Hosting one domain on multiple servers

The main goal is simple load balancing and automatic failover.

1. Shared files

The first and most important part is sharing files. There's a lot to share: config files, SSL certificates, and the contents of the website.

There are (at least) two options: GlusterFS and Csync2.

1.a. GlusterFS

GlusterFS is a network filesystem in which storage servers export their underlying disk storage (regular directories on XFS or ext4 partitions), replicate data between themselves, and provide it to clients over the network. Here each machine is both a storage server and a client connecting to localhost.

Advantages:

  • Changes to data are automatically replicated to other nodes.

Annoyances:

  • It's not possible to reliably watch for file changes with inotify on a network filesystem, because changes made on other nodes are not reported. But we can watch for changes directly in the underlying storage.

Disadvantages:

  • To avoid "split brain", at least 3 servers are required rather than 2. One of them can be an arbiter that doesn't store file data. See the documentation.
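
As a rough sketch of the arbiter variant mentioned above, the commands below create a three-node volume whose last brick acts as the arbiter. The host names node1, node2 and node3 are placeholders; the volume name gv0 and the brick path /mnt/brick0/gv0 are the ones used later in this article. The script only prints the commands unless DRY_RUN=0 is set. Newer GlusterFS releases also document the count as "replica 2 arbiter 1", so check the documentation for your version.

```shell
#!/bin/bash
# Sketch: create a replicated GlusterFS volume with an arbiter brick.
# node1..node3 are placeholder host names; run this on node1 only.

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_volume() {
    local brick=/mnt/brick0/gv0
    # Add the other nodes to the trusted pool.
    run gluster peer probe node2
    run gluster peer probe node3
    # Three bricks in total; the last one listed becomes the arbiter.
    run gluster volume create gv0 replica 3 arbiter 1 \
        "node1:$brick" "node2:$brick" "node3:$brick"
    run gluster volume start gv0
}

create_volume
```

Printing first and executing later is a deliberate choice here: creating a volume is not something to retry casually, so it helps to read the exact commands before running them.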

1.b. Csync2

Csync2 is a tool for asynchronous file synchronization.

Advantages:

  • Works well on any number of hosts.
  • The configuration can trigger commands when specified files change.

Annoyances:

  • Potential synchronization conflicts must be resolved manually, or an automatic resolution algorithm must be chosen.

Disadvantages:

  • Synchronization must be manually triggered.
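
One common way to soften this is to run csync2 periodically from cron. This is an assumption on top of the setup described here, and both the file name and the five-minute interval below are arbitrary:

```
# /etc/cron.d/csync2 (hypothetical file name)
# csync2 is often installed in /usr/sbin, which may not be on cron's default PATH.
PATH=/usr/sbin:/usr/bin:/sbin:/bin
# Check all configured files and push changes to the other hosts every 5 minutes.
*/5 * * * * root csync2 -x
```

A periodic run only bounds how stale the other hosts can get. Files that must arrive immediately, such as ACME challenge tokens, still need an explicit csync2 call.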

2. DNS

With DNS it's possible to specify multiple A records for a single domain. This provides basic round-robin load balancing.

Unfortunately, this doesn't provide automatic failover. If a client receives the IP address of a server that is not working, it won't pick another IP from DNS; it will just report a connection error.

To provide automatic failover, the DNS server itself must monitor the health of the servers it points to and return only working addresses to clients, with a short TTL.
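
The health-check idea can be sketched in a few lines of bash. This only illustrates the logic (keep the addresses that accept a TCP connection) and is not what a real DNS server does internally. The function names and the 2-second timeout are made up for this example.

```shell
#!/bin/bash
# Sketch of the health-check idea: keep only addresses that accept TCP connections.

# Succeeds if host:port accepts a TCP connection within 2 seconds.
# Uses bash's built-in /dev/tcp redirection plus coreutils' timeout.
port_is_up() {
    local host=$1 port=$2
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage: healthy_addresses PORT ADDRESS...
# Prints each responding address on its own line.
# If none respond, prints them all, so clients still get some answer.
healthy_addresses() {
    local port=$1 addr up=()
    shift
    for addr in "$@"; do
        if port_is_up "$addr" "$port"; then
            up+=("$addr")
        fi
    done
    if [ "${#up[@]}" -eq 0 ]; then
        up=("$@")
    fi
    printf '%s\n' "${up[@]}"
}
```

For example, healthy_addresses 80 192.0.2.1 198.51.100.1 (documentation addresses, standing in for real servers) prints whichever of the two currently accepts connections on port 80.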

2.a. PowerDNS

In my setup I'm using PowerDNS hosted on my servers. My DNS registrar has "glue records" pointing to my servers.

The configuration is stored in a BIND-compatible plaintext format. It works well with both synchronization methods, because the PowerDNS server detects changes to the files itself and there's no need to trigger a reload manually.

Additionally, PowerDNS supports Lua records. They contain short scripts that are executed on request. One of the provided functions is ifportup, which takes a port number and a list of addresses and returns a random one that is responding.

Example Lua record:

300 IN LUA  A "ifportup(80, {'x.x.x.x', 'y.y.y.y'})"
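
For a record like this to work, Lua records have to be switched on in the PowerDNS configuration. The fragment below is a sketch, not the exact config used here: the path is a placeholder that differs between distributions, and the 60-second check interval is an arbitrary choice.

```
# pdns.conf (fragment; the path is a placeholder)
launch=bind
bind-config=/etc/pdns/named.conf
# How often (in seconds) to check zone files for changes.
bind-check-interval=60
# LUA records are not served unless this is enabled.
enable-lua-records=yes
```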

3. Web server

The choice of web server is not important here. It must serve static content and handle SSL.

I'm using Nginx, but Apache or anything else is fine too. When the SSL certificates change, it must be reloaded or restarted; how that is triggered is described below.

4. Using systemd to start services and mount filesystems in correct order

In my setup, when the system starts, these steps must be executed in the correct order:

  • Start the GlusterFS daemon,
  • Mount the GlusterFS filesystem (at /mnt/gv0),
  • Start nginx.

systemd can take care of this, because services and mount points are both units that can declare dependencies.

Declaring a dependency between the mountpoint and a service can be done in /etc/fstab:

localhost:/gv0 /mnt/gv0 glusterfs defaults,acl,x-systemd.requires=glusterd.service 0 0

Declaring a dependency between nginx and the mountpoint can be done by adding an override file for the nginx service, /etc/systemd/system/nginx.service.d/override.conf:

[Unit]
Requires = mnt-gv0.mount
After = mnt-gv0.mount
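
Since the same drop-in has to exist on every host, it may be convenient to create it from a script. The function below is a minimal sketch that writes the override shown above. Its name and its optional directory argument are made up for this example; the argument only exists so the function can be tried out in a scratch directory.

```shell
#!/bin/bash
# Sketch: write the nginx drop-in that ties nginx to the GlusterFS mount.
# Usage: install_nginx_override [SYSTEMD_DIR]   (defaults to /etc/systemd/system)
install_nginx_override() {
    local dir="${1:-/etc/systemd/system}/nginx.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Unit]
Requires=mnt-gv0.mount
After=mnt-gv0.mount
EOF
    # Print the path of the file that was written.
    echo "$dir/override.conf"
}
```

After writing the file for real, systemctl daemon-reload makes systemd pick it up. For a different mount point, the unit name can be derived with systemd-escape -p --suffix=mount /mnt/gv0, which prints mnt-gv0.mount.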

5. Certbot for Let's Encrypt SSL certificates

The simple way of obtaining SSL certificates with certbot is the "webroot" method. It works like this:

  • Ask the Let's Encrypt server for a certificate.
  • Store a token file for the web server to present under /.well-known/acme-challenge/.
  • The Let's Encrypt server checks the token file.
  • The Let's Encrypt server issues the SSL certificate.
  • Reload the web server.

With multiple servers, extra steps need to be taken:

  • The token file must be distributed to all servers.
  • The SSL certificates must be distributed to all servers.
  • The web server on every host must be reloaded.

5.a. Using GlusterFS

Storing token files and SSL certificates on GlusterFS solves the problem of distributing files.

The only remaining problem is reloading the web servers. This can be solved by using inotify to watch for changes in the certificate files. inotify doesn't work on network filesystems, but I can get around that by watching the GlusterFS storage directory instead.

In my setup GlusterFS stores files at /mnt/brick0/gv0.

/etc/systemd/system/nginx-reload.path:

[Unit]
Description=reload nginx
After=local-fs.target

[Path]
PathChanged=/mnt/brick0/gv0/letsencrypt/archive/etam-software.eu

[Install]
WantedBy=default.target

/etc/systemd/system/nginx-reload.service:

[Unit]
Description=Reload nginx
Requisite=nginx.service
Wants=local-fs.target
After=nginx.service

[Service]
Type=oneshot
ExecStart=/usr/bin/env systemctl reload nginx

[Install]
WantedBy=default.target

5.b. Using Csync2

With Csync2, synchronization must be triggered every time shared files are updated.

This configuration contains an action that reloads nginx when certificates change:

/etc/csync2.cfg:

group webserver {
    host ...;
    host ...;
    key     /etc/csync2_group.key;
    include /etc/csync2.cfg;
    include /etc/letsencrypt;
    include /srv/www/_letsencrypt;
    auto younger;

    action {
        pattern /etc/letsencrypt/archive/*/*;
        exec "systemctl reload nginx";
        do-local;
    }
}

Here's a reimplementation of the "webroot" authentication method using "manual" hook scripts, with the addition of csync2:

/etc/letsencrypt/authenticator.sh (based on docs#hooks):

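#!/bin/bash
# Called by certbot with CERTBOT_TOKEN and CERTBOT_VALIDATION set.
challenge_dir="/srv/www/_letsencrypt/.well-known/acme-challenge"
token_file="${challenge_dir}/${CERTBOT_TOKEN}"
mkdir -p "$challenge_dir"
echo "$CERTBOT_VALIDATION" > "$token_file"
# Push the token to the other hosts before Let's Encrypt checks it.
csync2 -f "$token_file"
csync2 -x "$token_file"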

/etc/letsencrypt/cleanup.sh:

#!/bin/bash
rm -f "/srv/www/_letsencrypt/.well-known/acme-challenge/${CERTBOT_TOKEN}"
csync2 -f -r /srv/www/_letsencrypt/.well-known/acme-challenge
csync2 -x /srv/www/_letsencrypt/.well-known/acme-challenge

/etc/letsencrypt/renewal-hooks/post/csync2.sh:

#!/bin/bash
csync2 -f -r /etc/letsencrypt
csync2 -x

Create certificate:

certbot certonly \
    --manual \
    --preferred-challenges=http \
    --manual-auth-hook /etc/letsencrypt/authenticator.sh \
    --manual-cleanup-hook /etc/letsencrypt/cleanup.sh \
    -d example.com \
    --register-unsafely-without-email \
    -n \
    --agree-tos
csync2 -xv
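
Before requesting a real certificate, it can be useful to try the hook logic by hand. The sketch below is a variant of authenticator.sh with two hypothetical environment variables, WEBROOT and CSYNC2, introduced only for this example so the function can run in a scratch directory with a stand-in for csync2.

```shell
#!/bin/bash
# Sketch: authenticator logic with a configurable web root, for dry runs.
# WEBROOT and CSYNC2 are hypothetical variables introduced for this example.
write_token() {
    local webroot="${WEBROOT:-/srv/www/_letsencrypt}"
    local csync2="${CSYNC2:-csync2}"
    local dir="$webroot/.well-known/acme-challenge"
    local token_file="$dir/$CERTBOT_TOKEN"

    mkdir -p "$dir" || return 1
    # printf avoids echo's special handling of backslashes and leading dashes.
    printf '%s\n' "$CERTBOT_VALIDATION" > "$token_file" || return 1
    # Same two csync2 calls as in authenticator.sh.
    "$csync2" -f "$token_file" && "$csync2" -x "$token_file"
}
```

A dry run could look like WEBROOT=$(mktemp -d) CSYNC2=true CERTBOT_TOKEN=test CERTBOT_VALIDATION=test.value write_token, where the true command stands in for csync2. Certbot itself also has a --dry-run option that exercises the hooks against the staging server.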

5.c. Certbot timer

To avoid certbot running on all hosts at the same time, its timer can be changed so that it runs once a day, each day on a different host.

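  • Host 0 /etc/systemd/system/certbot.timer.d/override.conf:

    [Timer]
    # An empty assignment clears the packaged schedule; without it the entry below is only added to it.
    OnCalendar=
    OnCalendar=*-*-1/2 01:00
    
  • Host 1 /etc/systemd/system/certbot.timer.d/override.conf:

    [Timer]
    OnCalendar=
    OnCalendar=*-*-2/2 01:00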
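
The two overrides above follow a pattern that extends to more hosts: host number i out of n starts on day i+1 of the month and repeats every n days. A small helper (its name is made up for this example) can generate the expression:

```shell
#!/bin/bash
# Sketch: OnCalendar expression for host number INDEX (starting at 0) out of COUNT hosts.
# Host INDEX runs on days INDEX+1, INDEX+1+COUNT, ... of each month, at 01:00.
certbot_oncalendar() {
    local index=$1 count=$2
    echo "*-*-$((index + 1))/${count} 01:00"
}

certbot_oncalendar 0 2   # prints: *-*-1/2 01:00
certbot_oncalendar 1 2   # prints: *-*-2/2 01:00
```

The result can be checked with systemd-analyze calendar '*-*-1/2 01:00', which shows the next time the expression elapses. The count restarts at the beginning of every month, so around a month boundary the same host may run on two consecutive days, which is harmless for certificate renewal.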
    

6. Summary

And that's all folks! We've got load balancing and automatic failover using DNS and data replication using GlusterFS or Csync2.