...
To restore a cluster, all that is needed is a single snapshot file (snapshot.db). A cluster restore with etcdctl snapshot restore creates new etcd data directories; all members must restore from the same snapshot. Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID), so each member loses its former identity. Therefore, in order to start a cluster from a snapshot, the restore must start a new logical cluster.
...
Code Block |
---|
language | bash |
---|
title | Restore snapshot |
---|
|
# Copy the snapshot.db file to all etcd nodes
$ scp snapshot.db etcd1:
# Repeat the copy for all etcd members, then run the restore on each one to create its new data directory
$ etcdctl snapshot restore <path>/<snapshot> [--data-dir <data_dir>] --name etcd1 --initial-cluster etcd1=https://<IP1>:2380,etcd2=https://<IP2>:2380,etcd3=https://<IP3>:2380 --initial-cluster-token <token> --initial-advertise-peer-urls https://<IP1>:2380
# For instance, for each of the three nodes
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap1 --name etcd1 --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd1:2380
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap2 --name etcd2 --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd2:2380
$ etcdctl snapshot restore snapshot.db --data-dir /tmp/restore_snap3 --name etcd3 --initial-cluster etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://etcd3:2380 |
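Before running the restores, the snapshot file itself can be sanity-checked. A minimal sketch, assuming etcdctl v3 is available on the node holding snapshot.db (on recent etcd releases the same check is also offered by etcdutl):
Code Block |
---|
language | bash |
---|
title | Check snapshot (optional) |
---|
|
# Print hash, revision, total keys and size of the snapshot file
$ etcdctl snapshot status snapshot.db -w table |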
Before we continue, let's stop all the API server instances. Then stop the etcd service on the nodes.
Code Block |
---|
language | bash |
---|
title | Pause cluster |
---|
|
# Let's go to the master(s) and temporarily move the "kube-apiserver.yaml" file
$ sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
# Stop etcd service on etcd node(s)
$ sudo systemctl stop etcd.service |
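To make sure nothing is still writing to etcd before the data directories are touched, it is worth confirming that both components are really down. A sketch, assuming a systemd-managed etcd and a CRI runtime reachable via crictl on the master(s) (adapt to your runtime, e.g. docker ps on Docker-based setups):
Code Block |
---|
language | bash |
---|
title | Verify everything is stopped |
---|
|
# On the etcd node(s): should print "inactive"
$ sudo systemctl is-active etcd.service
# On the master(s): no kube-apiserver container should be listed anymore
$ sudo crictl ps | grep kube-apiserver |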
As mentioned, the restore command generates a member directory, which must then be copied into the path where the etcd node data are stored (the default path is /var/lib/etcd/).
Code Block |
---|
language | bash |
---|
title | Copy snapshot |
---|
|
# Copy the restored member directory into the path where the etcd node data are stored
$ cp -r <path>/<restore>/member /var/lib/etcd/ |
...
# For each etcd node
$ cp -r $HOME/etcd1.etcd/member/ /var/lib/etcd/ |
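Note that if /var/lib/etcd/ still holds the member directory of the old cluster, it should be moved out of the way before copying, so the restored data does not end up mixed with the stale data. A minimal sketch, assuming the default data path (the .old suffix is just an example):
Code Block |
---|
language | bash |
---|
title | Move old data aside |
---|
|
# Keep the previous data directory around until the restore is confirmed
$ sudo mv /var/lib/etcd/member /var/lib/etcd/member.old |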
...
Finally, we restart the etcd service on the nodes and restore the API server.
Code Block |
---|
language | bash |
---|
title | Restart cluster |
---|
|
# Start etcd service on etcd node(s)
$ sudo systemctl start etcd.service
# Restore the API server from the master(s)
$ sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/ |
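Once the services are back up, it is worth verifying that all members have joined the new logical cluster and that the control plane responds again. A sketch, assuming the usual client port 2379 and the TLS files used by your cluster (paths will differ per setup):
Code Block |
---|
language | bash |
---|
title | Verify the restored cluster |
---|
|
# All three members should be listed and healthy
$ etcdctl --endpoints https://etcd1:2379,https://etcd2:2379,https://etcd3:2379 --cacert <ca.crt> --cert <client.crt> --key <client.key> member list
$ etcdctl --endpoints https://etcd1:2379,https://etcd2:2379,https://etcd3:2379 --cacert <ca.crt> --cert <client.crt> --key <client.key> endpoint health
# And from a node with kubectl access
$ kubectl get nodes
$ kubectl get pods -n kube-system |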
Tip |
---|
We also recommend restarting any components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on stale data. Note that in practice the restore takes a bit of time; during the restoration, critical components will lose their leader lock and restart themselves. |
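A sketch of one way to do this on a kubeadm-style setup, where kube-scheduler and kube-controller-manager run as static pods and the kubelet is managed by systemd (adapt to your own deployment):
Code Block |
---|
language | bash |
---|
title | Restart components |
---|
|
# Restart the kubelet on every node
$ sudo systemctl restart kubelet.service
# On the master(s), recreate the static control-plane pods by briefly moving their manifests, as done above for the API server
$ sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/ && sleep 20 && sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
$ sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/ && sleep 20 && sudo mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/ |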
...
Code Block |
---|
language | bash |
---|
title | Copy snapshot |
---|
|
# Copy the restored member directory into the path where the etcd node data are stored
$ cp -r <path>/<restore>/member /var/lib/etcd/
# For each etcd node
$ cp -r /tmp/restore_snap1/member /var/lib/etcd/
$ cp -r /tmp/restore_snap2/member /var/lib/etcd/
$ cp -r /tmp/restore_snap3/member /var/lib/etcd/ |
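Depending on how etcd is run, the copied files may also need the correct ownership. If etcd runs as a dedicated etcd user rather than root (an assumption, check your unit file), something like the following is needed:
Code Block |
---|
language | bash |
---|
title | Fix ownership (if needed) |
---|
|
# Make the restored files owned by the user that runs etcd
$ sudo chown -R etcd:etcd /var/lib/etcd/ |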