Step-by-step guide
Let's suppose that osd.14 (/dev/vdd) is broken.
Let's verify that this is a Bluestore OSD:
# ceph osd metadata 14 | grep osd_ob
    "osd_objectstore": "bluestore",
Let's find the relevant devices:
[root@c-osd-5 /]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-14/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-14//block": {
        "osd_uuid": "d14443ed-2f7d-4bbc-8cdf-f55c7e00a9b5",
        "size": 107369988096,
        "btime": "2019-01-30 16:33:54.429292",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "7a8cb8ff-562b-47da-a6aa-507136587dcf",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDWw1Fc6g0zARAAy97VirlJ+wC7FmjlM0w3aQ==",
        "ready": "ready",
        "whoami": "14"
    },
    "/var/lib/ceph/osd/ceph-14//block.db": {
        "osd_uuid": "d14443ed-2f7d-4bbc-8cdf-f55c7e00a9b5",
        "size": 53687091200,
        "btime": "2019-01-30 16:33:54.432415",
        "description": "bluefs db"
    }
}
Let's find the volume groups used for the block and block.db:
[root@c-osd-5 /]# ls -l /var/lib/ceph/osd/ceph-14//block
lrwxrwxrwx 1 ceph ceph 27 May 13 15:34 /var/lib/ceph/osd/ceph-14//block -> /dev/ceph-block-14/block-14
[root@c-osd-5 /]# ls -l /var/lib/ceph/osd/ceph-14//block.db
lrwxrwxrwx 1 ceph ceph 24 May 13 15:34 /var/lib/ceph/osd/ceph-14//block.db -> /dev/ceph-db-12-15/db-14
[root@c-osd-5 /]#
Let's verify that vdd is indeed the physical volume used for this OSD:
[root@c-osd-5 /]# vgdisplay -v ceph-block-14
  --- Volume group ---
  VG Name               ceph-block-14
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  15
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <100.00 GiB
  PE Size               4.00 MiB
  Total PE              25599
  Alloc PE / Size       25599 / <100.00 GiB
  Free  PE / Size       0 / 0
  VG UUID               lcEfNK-P7gw-ddeH-ijGC-2d6z-WuUo-hqI1H2

  --- Logical volume ---
  LV Path                /dev/ceph-block-14/block-14
  LV Name                block-14
  VG Name                ceph-block-14
  LV UUID                hu4Xop-481K-BJyP-b473-PjEW-OQFT-oziYnc
  LV Write Access        read/write
  LV Creation host, time c-osd-5.novalocal, 2019-01-30 11:22:24 +0100
  LV Status              available
  # open                 4
  LV Size                <100.00 GiB
  Current LE             25599
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:14

  --- Physical volumes ---
  PV Name               /dev/vdd
  PV UUID               2ab6Mn-8c5b-rN1H-zclU-uhnF-YJmF-e0ITMt
  PV Status             allocatable
  Total PE / Free PE    25599 / 0

[root@c-osd-5 /]#
The following operations should be done to remove it from ceph:
ceph osd crush reweight osd.14 0
This will trigger data movement away from that OSD (ceph status will report many misplaced objects).
Wait until there are no more misplaced objects.
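To follow the rebalance, keep an eye on the misplaced-object count in the cluster status; the grep below is only an illustrative way to filter it, since the exact wording of the output varies between Ceph releases:
ceph -s | grep -i misplaced    # prints nothing once the rebalance has finished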
Then:
ceph osd out osd.14
Let's verify that the OSD can be "removed":
[root@ceph-mon-01 ~]# ceph osd safe-to-destroy 14
OSD(s) 14 are safe to destroy without reducing data durability.
[root@ceph-osd-02 ~]# systemctl kill ceph-osd@14
[root@ceph-osd-02 ~]# ceph osd destroy 14 --yes-i-really-mean-it
[root@ceph-osd-02 ~]# umount /var/lib/ceph/osd/ceph-14
Let's remove the volume group:
[root@c-osd-5 /]# vgremove ceph-block-14
Do you really want to remove volume group "ceph-block-14" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume ceph-block-14/block-14? [y/n]: y
  Logical volume "block-14" successfully removed
  Volume group "ceph-block-14" successfully removed
[root@c-osd-5 /]#
ceph osd rm osd.14
ceph osd crush remove osd.14
Let's replace the disk. Let's suppose the new one is still called vdd.
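Before recreating the LVM layout, it can be useful to check that the new disk is actually visible and still empty; lsblk is just one way to do this (the device name is the /dev/vdd assumed above):
lsblk /dev/vdd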
Let's recreate the volume group and the logical volume:
[root@c-osd-5 /]# vgcreate ceph-block-14 /dev/vdd
  Physical volume "/dev/vdd" successfully created.
  Volume group "ceph-block-14" successfully created
[root@c-osd-5 /]# lvcreate -l 100%FREE -n block-14 ceph-block-14
  Logical volume "block-14" created.
[root@c-osd-5 /]#
Finally, let's recreate the OSD:
ceph osd set norebalance
ceph osd set nobackfill
[root@c-osd-5 /]# ceph-volume lvm create --bluestore --data ceph-block-14/block-14 --block.db ceph-db-12-15/db-14 --osd-id 14
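One way to check whether any PGs are still peering is to look at the PG summary; the commands below are only a sketch and the exact output format depends on the Ceph release:
ceph pg stat                   # summary of PG states
ceph -s | grep -i peering      # prints nothing once peering has finished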
After a while, when there are no more PGs in peering:
ceph osd crush reweight osd.14 5.45609
ceph osd unset nobackfill
ceph osd unset norebalance
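At this point osd.14 should show up again in the CRUSH tree and backfill towards it will start; a minimal way to follow the recovery is:
ceph osd tree
ceph -s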