Infrastructure/VM cluster

From Open Food Facts wiki

Cluster setup

Open Food Facts uses a Proxmox-based cluster to host different virtual machines (VMs) on servers provided by OVH.

The cluster is made of 4 physical machines ("nodes" or "hosts" in Proxmox jargon):

  • ovh1 and ovh2 are computation-oriented nodes: 24 cores, 256 GB RAM, 1 TB NVMe SSD
  • ovh3 and ovh4 are storage-oriented nodes: 32 GB RAM, 6x12 TB HDD + 512 GB NVMe cache

ovh1 and ovh3 are in Roubaix datacenter, ovh2 and ovh4 in Strasbourg.

At initial setup (January 2021), Proxmox v6.3 was installed (based on Debian 10 "buster").

The Proxmox GUI is available on any of the cluster nodes on port 8006.

Cluster networking

At the networking level, a vRack links the cluster nodes with a 3Gbps private network used to access data on storage servers and replicate data between nodes. MTU is set to 9000 in the private network to take advantage of the high bandwidth.
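As a quick sanity check, the jumbo-frame setting can be verified from a node's shell (the interface name and peer IP below are placeholders; use the actual private-network interface of the node):

```shell
# Show the current MTU of an interface (replace eth1 with the private-network interface):
ip link show eth1 | head -n 1
# Set it to 9000 if needed (non-persistent; make it permanent in /etc/network/interfaces):
ip link set dev eth1 mtu 9000
# Verify that jumbo frames actually pass end-to-end (8972 = 9000 minus 28 bytes of IP/ICMP headers):
ping -M do -s 8972 -c 2 10.0.0.2
```

`-M do` forbids fragmentation, so the ping only succeeds if every hop really accepts 9000-byte frames.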

Cluster storage

All storage is managed using ZFS which provides:

  • volume management (like lvm)
  • redundancy (like mdadm)
  • encryption (like luks)
  • compression
  • snapshots
  • quota

Snapshots allow efficient synchronization between remote storage, and are used extensively by Proxmox to replicate data across the nodes. Snapshots also simplify backups and allow rollbacks.
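As an illustration, here is roughly what happens under the hood with the ZFS CLI (the dataset and snapshot names are made up for the example):

```shell
# Take a point-in-time snapshot of a dataset (instant, thanks to copy-on-write):
zfs snapshot rpool/data/subvol-101-disk-0@monday
# Roll the dataset back to that snapshot if something went wrong:
zfs rollback rpool/data/subvol-101-disk-0@monday
# Send only the blocks changed since the previous snapshot to another node:
zfs send -i @monday rpool/data/subvol-101-disk-0@tuesday | \
  ssh ovh3 zfs receive rpool/data/subvol-101-disk-0
```

The incremental `zfs send -i` stream is what makes cross-node replication fast: only changed blocks travel over the private network.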

Virtualization / Containers

Proxmox allows full virtualization (VM, using QEMU) and containerization (CT using LXC).

We use LXC-based containers (CT) to run our "Virtual Machines", as they have a much lower overhead than real VMs using QEMU (which is required only to run non-Linux operating systems). These containers can themselves host containers such as Docker if needed.

All resources are shared and dynamically allocated, so they can be reallocated at any time without a reboot.

Containers are numbered (CTID) starting from 100.

Network allocation

To keep things simple, CT internal IP addresses are allocated using the rule CTID → 10.1.0.CTID; for example, container 100 gets 10.1.0.100.
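The mapping is purely mechanical:

```shell
# Derive a container's internal IP from its CTID:
CTID=104
echo "10.1.0.${CTID}"
```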

Containers can reach each other through their 10.1.0.x IP addresses.

Storage / replication / backups / retention

Container storage is managed using ZFS subvolumes. Space is dynamically allocated as quotas at the ZFS level, not as partitions or disk images, so no resizing is needed. The Proxmox GUI allows quota increases; reducing a quota can be done directly with the ZFS CLI.
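For example, shrinking a container's quota from a node's shell might look like this (the dataset name is illustrative):

```shell
# Inspect the current quota and usage of a container subvolume:
zfs get refquota,used rpool/data/subvol-101-disk-0
# Reduce the quota to 20 GB (must stay above the current usage):
zfs set refquota=20G rpool/data/subvol-101-disk-0
```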

Container replication is done by Proxmox to copy a container's storage to one or more other nodes. The replication creates a temporary snapshot and sends only the difference from the previous snapshot, without scanning the filesystem as rsync would. A typical replication takes seconds or a few minutes, not hours.

Container backups are made from snapshots, which are tarred and compressed using zstd (the default), providing a high compression ratio with low CPU use. They are stored on a ZFS subvolume named "backups", shared between nodes over NFS.

Proxmox allows backup retention management: it is possible to define how many daily, weekly, and monthly backups to keep. An email is sent in case of backup failures.
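On the CLI side, a backup with pruning rules can be sketched like this (the retention numbers and mail address are just an example; in practice this is configured per backup job in the GUI):

```shell
# Back up container 101 to the "backups" storage with zstd compression,
# keeping 7 daily, 4 weekly and 6 monthly archives, mailing on failure:
vzdump 101 --compress zstd --storage backups \
  --prune-backups keep-daily=7,keep-weekly=4,keep-monthly=6 \
  --mailto root@localhost --mailnotification failure
```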

Migration / High-Availability

Proxmox handles container migration between nodes in the cluster. The container is stopped, its storage is replicated, then it is restarted on the new node. A previously replicated container can migrate from one node to another in a very short time.

The Proxmox High-Availability (HA) feature automatically migrates containers when the node hosting them is down. This is detected by the other nodes, which form a "quorum" (a majority of nodes in the cluster agreeing that a node is down). In that case, the container is started with the last replicated storage. This means that the replication frequency of HA containers should be high (every few minutes), as changes since the last replication will be lost in the migration.

Docker in a container

Deprecated: we don't use this, as it leads to weird bugs and there is no support for this configuration.

When Docker is run inside a container, it needs a few tweaks:

  • first of all, install Docker from the official Docker repositories and NOT from your Linux distribution
  • edit /etc/pve/lxc/120.conf (if the CT number is 120) and add: features: keyctl=1,nesting=1
  • reboot the container.

Usage guidelines (to be completed)

Here are a few guidelines to follow for all new virtual machines:

  1. MUST: no SSH root login on the nodes, even with SSH key.
  2. MUST: sudoers (root access using sudo) limited to SSH key based authentication
  3. SHOULD: use SSH keys published on Github: giving access to a server is then simple and secure:
    curl https://github.com/CharlesNepote.keys | tee -a ~/.ssh/authorized_keys
  4. SHOULD: take care of production resources: use "nice" / "ionice" for manually launched scripts. Stéphane's tip: just use
    nice ./mycommand whatever arguments
    (nice defaults to lowering the priority). CPU and I/O priorities can also be set at the virtualization level if needed.

Usages

SSH connection

User accounts and SSH keys are created directly using data from GitHub accounts.

At creation, the Linux username matches the GitHub account name, and the SSH keys are retrieved from your GitHub public keys.

(TODO: detect SSH key changes on GitHub and inform the administrator to update them if confirmed)

If you have only one SSH key

You can connect directly with:

$ ssh -J [node FQDN] [internal IP]

Example:

$ ssh -J ovh1.openfoodfacts.org 10.1.0.103

If you have many SSH keys

$ ssh -i ~/.ssh/github_rsa -J stephanegigandet@ovh1.openfoodfacts.org stephanegigandet@10.1.0.103

You can modify your ~/.ssh/config file and add:

Host robotoff
    IdentityFile ~/.ssh/github_rsa
    ProxyJump stephanegigandet@ovh1.openfoodfacts.org
    HostName 10.1.0.103
    User stephanegigandet

Then you can connect with a simpler command:

$ ssh robotoff

Tip to copy files from a container to your local machine:

$ scp -i ~/.ssh/id_rsa -J CharlesNepote@ovh1.openfoodfacts.org CharlesNepote@10.1.0.200:/home/CharlesNepote/file.csv ./myfile.csv

$ # or simpler, if you have created a new Host in your ~/.ssh/config file:

$ scp host_name:/home/CharlesNepote/file.csv ./myfile.csv

Container creation

You need to have an account on the Proxmox infra.

  • Login to the Proxmox web interface
  • Click on "Create CT"
  • Use a "Hostname" that lets people know what the container is for, e.g. "robotoff", "wiki", "proxy"...
  • Root password: put something complex and forget it (we connect through SSH, not the web interface), and add an SSH key.
  • Swap: 256
  • Network:
    • Bridge: vmbr0
    • IPv4: 10.1.0.104/24 (you must use /24; the end of the IPv4 address should match the Proxmox container ID: container 101 gets the 10.1.0.101 IP)
    • Gateway: 10.0.0.1
  • Then connect to ovh1 or ovh2 machine, and launch the following scripts:
    • $ sudo /root/cluster-scripts/ct_postinstall # choose the container ID when asked
    • $ sudo /root/cluster-scripts/mkuser (if the GitHub username is the desired username)
    • or $ sudo /root/cluster-scripts/mkuseralias (to specify a different username than the GitHub username)
  • then you can log in to the machine (see SSH connection)
  • Also check "options" of the container and:
    • Start at boot: Yes
    • Protection: Yes (to avoid deleting it by mistake)
  • And add replication to ovh3:
    • In the Replication menu of the container, "Add" one
    • Target: ovh3
    • Schedule: */5 to replicate every 5 minutes (each run takes less than 10 seconds, thanks to ZFS)
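The same creation can be scripted from a node's shell with pct (the values mirror the GUI steps above; the template filename is an assumption, list the available ones with "pveam list local"):

```shell
# Create container 104 from a Debian template, mirroring the GUI settings:
pct create 104 local:vztmpl/debian-10-standard_10.7-1_amd64.tar.gz \
  --hostname robotoff \
  --swap 256 \
  --net0 name=eth0,bridge=vmbr0,ip=10.1.0.104/24,gw=10.0.0.1 \
  --onboot 1 --protection 1
```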

Container administration

From ovh1, you can use the "pct" (Proxmox container) command:

  • sudo pct list : list of containers
  • sudo pct enter 101 : enter container 101
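A few other pct subcommands that come in handy (same tool, run from the node hosting the container):

```shell
sudo pct status 101            # is the container running?
sudo pct start 101             # start it
sudo pct stop 101              # hard stop
sudo pct config 101            # show its configuration
sudo pct exec 101 -- df -h     # run a single command inside it
```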

Reverse proxy setup to access a web server on a container

To access a web server running in a container (e.g. 111), requests first have to pass through an nginx reverse proxy running in container 101:

  • create a CNAME record pointing the desired domain (e.g. metabase.openfoodfacts.org) to ovh1.openfoodfacts.org.
  • enter container 101 to setup the reverse proxy:
    • copy one of the existing config files (e.g. feedme.conf):
      • root@proxy:/etc/nginx/conf.d# cp feedme.conf metabase.conf # don't forget the .conf at the end
    • edit the config file
      • replace the domain name, directories, and target container (and possibly port)
      • remove the lines that load the SSL certificate (as it does not exist yet)
    • create the SSL certificate
      • root@proxy:/etc/nginx/conf.d# certbot --nginx
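A trimmed-down version of such a config file might look like this (the server name, container IP, and port are placeholders; certbot then rewrites the file to add the SSL directives):

```nginx
# /etc/nginx/conf.d/metabase.conf -- reverse proxy to a web server in container 111
server {
    listen 80;
    server_name metabase.openfoodfacts.org;

    location / {
        proxy_pass http://10.1.0.111:3000;   # target container and port (example values)
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```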

VM creation (QEMU)

  • Login to the Proxmox web interface
  • Click on "Create VM"
  • General:
    • USE A VM ID > 200; e.g. 201
    • Use a "Hostname" to let people know what it is about. Eg. "robotoff", "wiki", "proxy"...
  • System: Leave all system settings to default
  • Hard Disk: select disk size (leave other settings to default)
  • CPU:
    • Type: host
    • Leave default for other settings
  • Memory: select the RAM you want
  • Network: leave all settings to default
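For reference, the equivalent qm command (the values mirror the GUI steps above; the VM name, storage name, and ISO filename are assumptions):

```shell
# Create VM 201 with host CPU type, mirroring the GUI steps:
qm create 201 --name mynewvm --memory 4096 --cores 4 --cpu host \
  --net0 virtio,bridge=vmbr0 \
  --scsi0 local-zfs:32 \
  --cdrom local:iso/debian-10.7.0-amd64-netinst.iso
```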

Then install the OS with noVNC in the Proxmox UI.

Tips for Debian install:

  • IP: 10.1.0.201/8 during installation, then switch to /24 once the install is done
  • Gateway IP: 10.0.0.1 (the /8 during installation lets the installer reach the gateway, which sits outside 10.1.0.0/24)
  • Filesystem: ext4

See also, inspirations