...

To restore a cluster, all that is needed is a single snapshot file (snapshot.db). A cluster restore with etcdctl snapshot restore creates new etcd data directories; all members should restore from the same snapshot. Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID), so the member loses its former identity. Therefore, in order to start a cluster from a snapshot, the restore must start a new logical cluster.
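
Before distributing the snapshot, it can also be worth checking its integrity and noting its hash, revision and size. A minimal sketch, assuming the snapshot file is named snapshot.db as in the examples below (on recent etcd releases the same subcommand is also provided by etcdutl):

Code Block
languagebash
titleCheck snapshot (optional)
# Print the snapshot's hash, revision, total keys and size
$ etcdctl snapshot status snapshot.db -w table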

...

Code Block
languagebash
titleRestore snapshot
# Copy the snapshot.db file to all etcd nodes
$ scp snapshot.db etcd1:
# Run the restore command on each etcd member to create its new data directory
$ etcdctl snapshot restore <path>/<snapshot> [--data-dir <data_dir>] --name etcd1 --initial-cluster etcd1=https://<IP1>:2380,etcd2=https://<IP2>:2380,etcd3=https://<IP3>:2380 --initial-cluster-token <token> --initial-advertise-peer-urls https://<IP1>:2380
# For instance, for the three nodes
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap1 --name etcd1 --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd1:2380
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap2 --name etcd2 --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd2:2380
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap3 --name etcd3 --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd3:2380

Before we continue, let's stop all the API server instances. Then stop the etcd service on the nodes.

Code Block
languagebash
titlePause cluster
# Let's go to the master(s) and temporarily move the "kube-apiserver.yaml" file
$ sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/            
# Stop etcd service on etcd node(s)
$ sudo systemctl stop etcd.service

As mentioned, the restore command generates a member directory, which must be copied into the path where the etcd node data are stored (the default path is /var/lib/etcd/).
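
Before copying, if the old data directory of the broken cluster still contains a member directory, it is safer to move it out of the way so the restored one replaces it cleanly. A minimal sketch, assuming the default /var/lib/etcd/ path; the .bak suffix is just an example:

Code Block
languagebash
titleMove old data aside (optional)
# On each etcd node, move the previous member directory aside (example suffix)
$ sudo mv /var/lib/etcd/member /var/lib/etcd/member.bak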

Code Block
languagebash
titleCopy snapshot
# Copy the restored member directory into the path where the etcd node data are stored
$ cp -r <path>/<restore>/member /var/lib/etcd/

...


# For each etcd node 
$ cp -r $HOME/etcd1.etcd/member/ /var/lib/etcd/

...

Finally, we restart the etcd service on the nodes and restore the API server.

Code Block
languagebash
titleRestart cluster
# Start etcd service on etcd node(s)
$ sudo systemctl start etcd.service
# Restore the API server on the master(s)
$ sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/              


Tip

We also recommend restarting any components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on stale data. Note that, in practice, the restore takes a bit of time; during the restoration, critical components will lose their leader lock and restart themselves.
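
Once everything is back up, a quick sanity check helps confirm that the restored cluster is healthy. A minimal sketch; the client endpoints below are assumptions based on the node names used above, and the usual --cacert/--cert/--key flags may be needed depending on your TLS setup:

Code Block
languagebash
titleVerify restored cluster (optional)
# Check that every etcd member answers and that they agree on the member list
$ etcdctl --endpoints=https://etcd1:2379,https://etcd2:2379,https://etcd3:2379 endpoint health
$ etcdctl --endpoints=https://etcd1:2379 member list -w table
# From a master, check that the API server is serving again
$ kubectl get nodes
$ kubectl get pods -n kube-system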

...

Code Block
languagebash
titleCopy snapshot
# Copy the restored member directory into the path where the etcd node data are stored
$ cp -r <path>/<restore>/member /var/lib/etcd/
# For each etcd node
$ cp -r /tmp/restore_snap1/member /var/lib/etcd/
$ cp -r /tmp/restore_snap2/member /var/lib/etcd/
$ cp -r /tmp/restore_snap3/member /var/lib/etcd/