All Kubernetes objects are stored in etcd. Periodically backing up the etcd cluster data is important to recover Kubernetes clusters under disaster scenarios, such as losing all control plane nodes. The snapshot file contains all of the Kubernetes state and critical information. For more information, see the official guide.
Prerequisites
On cluster nodes that act only as etcd members, the executable files are probably already present (you can check as shown below). If the files are not present on the etcd node, or if you want to use the client outside the etcd node/cluster, follow the steps below to install it.
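A quick way to check whether the client is already installed is the following sketch (the version printed is just an example; yours may differ):

# Check whether etcdctl is already on the PATH and, if so, which version it is
$ command -v etcdctl && etcdctl version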
To be able to back up a Kubernetes cluster, first we need the etcdctl executable, downloadable from the etcd releases page on GitHub (choose the appropriate release). After that, unpack the archive file (this results in a directory containing the binaries) and add the executable binaries to your path (e.g. /usr/local/bin).
# For example, let's download release 3.5.4
$ wget https://github.com/etcd-io/etcd/releases/download/v3.5.4/etcd-v3.5.4-linux-amd64.tar.gz
$ tar xzvf etcd-v3.5.4-linux-amd64.tar.gz
$ sudo cp etcd-v3.5.4-linux-amd64/etcdctl /usr/local/bin/

$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5
Once we have the executable file, we need the certificates to be able to communicate with the etcd node(s). If you don't know the location of the certificates, you can find it using the grep command in the /etc/kubernetes folder on the master node (the default directory that holds the certificates on an etcd node is /etc/ssl/etcd/ssl). Save the location of the certificates in the following environment variables.
# Insert the following lines inside the ".bashrc" file, then use "$ source .bashrc" to apply the changes
export ETCDCTL_CERT=/<path>/cert.pem
export ETCDCTL_CACERT=/<path>/ca.pem
export ETCDCTL_KEY=/<path>/key.pem
export ETCDCTL_ENDPOINTS=etcd1:2379,etcd2:2379,etcd3:2379
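If you are not sure which files to point these variables at, one way to locate candidate paths is to grep for them. This is only a sketch assuming a kubeadm-style setup where etcd runs as a static pod; the manifest path below is that layout's convention, so adjust it to your cluster:

# Hypothetical example: list the certificate paths referenced by the etcd static pod manifest
$ sudo grep -E 'cert-file|key-file|trusted-ca-file' /etc/kubernetes/manifests/etcd.yaml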
Let's try running some commands to check the status of the etcd cluster.
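For instance (a sketch using standard etcdctl v3 subcommands; the actual output depends on your cluster):

# Check the health and status of the configured endpoints
$ etcdctl endpoint health
$ etcdctl endpoint status --write-out=table

# List the members of the cluster
$ etcdctl member list --write-out=table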
Save and Restore
If you have an etcd cluster, you must select only one node for the snapshot; otherwise you get the error "snapshot must be requested to one selected node, not multiple". Then unset the ETCDCTL_ENDPOINTS environment variable, if present, as shown below.
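A minimal sketch of that step:

# Remove the multi-endpoint variable so the snapshot is requested from a single node
$ unset ETCDCTL_ENDPOINTS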
Take a snapshot of the etcd datastore using the following command (official documentation), which generates the <snapshot> file.
$ etcdctl snapshot save <path>/<snapshot> --endpoints=<endpoint>:<port>

# Instead of <endpoint> you can substitute a hostname or an IP
$ etcdctl snapshot save snapshot.db --endpoints=etcd1:2379
$ etcdctl snapshot save snapshot.db --endpoints=192.168.100.88:2379

# View that the snapshot was successful
$ etcdctl snapshot status snapshot.db --write-out=table
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| b89543b8 | 40881777 |      54340 |     106 MB |
+----------+----------+------------+------------+
To restore a cluster, all that is needed is a single snapshot db file. A cluster restore with etcdctl snapshot restore creates new etcd data directories; all members should restore using the same snapshot. Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID), so the member loses its former identity. Therefore, in order to start a cluster from a snapshot, the restore must start a new logical cluster.
Now we will use the snapshot backup to restore etcd as shown below. If you want to use a specific data directory for the restore, you can add the location using the --data-dir flag, but the destination directory must be empty and you must have write permissions on it.
# Repeat this command for all etcd members
$ etcdctl snapshot restore <path>/<snapshot> [--data-dir <data_dir>] \
    --name etcd1 \
    --initial-cluster etcd1=https://<IP1>:2380,etcd2=https://<IP2>:2380,etcd3=https://<IP3>:2380 \
    --initial-cluster-token <token> \
    --initial-advertise-peer-urls https://<IP1>:2380

# For instance
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap1 --name etcd1 \
    --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 \
    --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd1:2380

$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap2 --name etcd2 \
    --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 \
    --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd2:2380

$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap3 --name etcd3 \
    --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 \
    --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd3:2380
The restore command generates the member directory, which must be copied into the path where the etcd node data are stored (the default path is /var/lib/etcd/).
# Copy the restored member directory into the path where the etcd node data are stored
$ cp -r <path>/<restore>/member /var/lib/etcd/

# For each etcd node
$ cp -r /tmp/restore_snap1/member /var/lib/etcd/
$ cp -r /tmp/restore_snap2/member /var/lib/etcd/
$ cp -r /tmp/restore_snap3/member /var/lib/etcd/
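Once each etcd member has been started again on top of its restored data directory (how you restart the service depends on how etcd is deployed in your cluster), a reasonable sanity check is to reuse the status commands from earlier; this is only a sketch and the output depends on your cluster:

# Verify that the restored cluster is healthy and that all members have joined
$ etcdctl endpoint health
$ etcdctl member list --write-out=table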