Step-by-step guide

Let's suppose that osd.14 (/dev/vdd) is broken.
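
A broken OSD usually shows up as down in the cluster. A quick way to confirm this (the "down" state filter for ceph osd tree is available in recent releases) could be:

Code Block
ceph osd tree down
ceph health detail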

Let's verify that this is a Bluestore OSD:


Code Block
# ceph osd metadata 14 | grep osd_ob
"osd_objectstore": "bluestore",




Let's find the relevant devices:

Code Block
[root@c-osd-5 /]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-14/
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-14//block": {
        "osd_uuid": "d14443ed-2f7d-4bbc-8cdf-f55c7e00a9b5",
        "size": 107369988096,
        "btime": "2019-01-30 16:33:54.429292",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "7a8cb8ff-562b-47da-a6aa-507136587dcf",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDWw1Fc6g0zARAAy97VirlJ+wC7FmjlM0w3aQ==",
        "ready": "ready",
        "whoami": "14"
    },
    "/var/lib/ceph/osd/ceph-14//block.db": {
        "osd_uuid": "d14443ed-2f7d-4bbc-8cdf-f55c7e00a9b5",
        "size": 53687091200,
        "btime": "2019-01-30 16:33:54.432415",
        "description": "bluefs db"
    }
}
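
On OSDs deployed with ceph-volume, the same mapping between the OSD and its logical volumes can also be obtained with:

Code Block
ceph-volume lvm list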



Let's find the volume groups used for the block and block.db:


Code Block
[root@c-osd-5 /]# ls -l /var/lib/ceph/osd/ceph-14//block
lrwxrwxrwx 1 ceph ceph 27 May 13 15:34 /var/lib/ceph/osd/ceph-14//block -> /dev/ceph-block-14/block-14
[root@c-osd-5 /]# ls -l /var/lib/ceph/osd/ceph-14//block.db
lrwxrwxrwx 1 ceph ceph 24 May 13 15:34 /var/lib/ceph/osd/ceph-14//block.db -> /dev/ceph-db-12-15/db-14
[root@c-osd-5 /]# 

Let's verify that vdd is indeed the physical volume used for this OSD:

Code Block
[root@c-osd-5 /]# vgdisplay -v ceph-block-14
  --- Volume group ---
  VG Name               ceph-block-14
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  15
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <100.00 GiB
  PE Size               4.00 MiB
  Total PE              25599
  Alloc PE / Size       25599 / <100.00 GiB
  Free  PE / Size       0 / 0   
  VG UUID               lcEfNK-P7gw-ddeH-ijGC-2d6z-WuUo-hqI1H2
 
  --- Logical volume ---
  LV Path                /dev/ceph-block-14/block-14
  LV Name                block-14
  VG Name                ceph-block-14
  LV UUID                hu4Xop-481K-BJyP-b473-PjEW-OQFT-oziYnc
  LV Write Access        read/write
  LV Creation host, time c-osd-5.novalocal, 2019-01-30 11:22:24 +0100
  LV Status              available
  # open                 4
  LV Size                <100.00 GiB
  Current LE             25599
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:14
 
  --- Physical volumes ---
  PV Name               /dev/vdd     
  PV UUID               2ab6Mn-8c5b-rN1H-zclU-uhnF-YJmF-e0ITMt
  PV Status             allocatable
  Total PE / Free PE    25599 / 0
 
[root@c-osd-5 /]# 
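
As a more compact cross-check of the mapping from logical volumes to physical devices, one could also use:

Code Block
lvs -o +devices ceph-block-14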

The following operations should be done to remove it from Ceph. First, drain the OSD by either:

Code Block
ceph osd crush reweight osd.14 0

or:

Code Block
ceph osd out osd.14

Either command will trigger data movement away from that OSD (ceph status will report many misplaced objects).

Wait until there are no more misplaced objects.
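
One possible way to follow the recovery progress (the exact wording of the status output varies across releases):

Code Block
watch -n 30 "ceph status | grep -E 'misplaced|degraded'"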

Then, if you used the reweight approach, mark the OSD out as well:

Code Block
ceph osd out osd.14

Let's verify that the OSD can be safely removed:

Code Block
[root@ceph-mon-01 ~]# ceph osd safe-to-destroy 14
OSD(s) 14 are safe to destroy without reducing data durability.
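
If the cluster is still rebalancing, the check can simply be polled until it succeeds; a minimal sketch using the same numeric-id form as above:

Code Block
while ! ceph osd safe-to-destroy 14 ; do sleep 60 ; done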


Code Block
[root@ceph-osd-02 ~]# systemctl kill ceph-osd@14
[root@ceph-osd-02 ~]# ceph osd destroy 14 --yes-i-really-mean-it
[root@ceph-osd-02 ~]# umount /var/lib/ceph/osd/ceph-14
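
At this point the OSD should be reported as destroyed in the OSD tree (the exact layout of the output depends on the release):

Code Block
ceph osd tree | grep -w 'osd.14'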

Let's delete the volume group:

Code Block
[root@c-osd-5 /]# vgremove ceph-block-14
Do you really want to remove volume group "ceph-block-14" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume ceph-block-14/block-14? [y/n]: y
  Logical volume "block-14" successfully removed
  Volume group "ceph-block-14" successfully removed
[root@c-osd-5 /]# 
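
The block.db logical volume (ceph-db-12-15/db-14) is kept and will be reused by the new OSD; depending on the release, it may need to be wiped of the old metadata first. A possible sketch:

Code Block
ceph-volume lvm zap ceph-db-12-15/db-14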


Then remove the OSD from the cluster and from the CRUSH map:

Code Block
ceph osd rm osd.14
ceph osd crush remove osd.14


Now let's replace the disk. Suppose that the new one is again called vdd.

Let's recreate the volume group and logical volume:


Code Block
[root@c-osd-5 /]# vgcreate ceph-block-14 /dev/vdd
  Physical volume "/dev/vdd" successfully created.
  Volume group "ceph-block-14" successfully created
[root@c-osd-5 /]# lvcreate -l 100%FREE -n block-14 ceph-block-14
  Logical volume "block-14" created.
[root@c-osd-5 /]# 
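
A quick sanity check that the new logical volume looks as expected:

Code Block
lvs ceph-block-14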

Finally, let's recreate the OSD:


Code Block
ceph osd set norebalance
ceph osd set nobackfill
 
 
[root@c-osd-5 /]# ceph-volume lvm create --bluestore --data ceph-block-14/block-14 --block.db ceph-db-12-15/db-14 --osd-id 14
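
Once ceph-volume has finished, the OSD should come back up with the same id; this can be verified, for instance, with:

Code Block
ceph-volume lvm list
ceph osd metadata 14 | grep osd_objectstore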


After a while, when there are no more PGs in peering, restore the OSD's CRUSH weight and unset the flags:


Code Block
ceph osd crush reweight osd.14 5.45609
 
ceph osd unset nobackfill
ceph osd unset norebalance
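
As a final check, make sure the cluster eventually goes back to HEALTH_OK and that the new OSD starts filling up:

Code Block
ceph -s
ceph osd df tree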


