Install the CentOS operating system.
Disable SELinux
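For example, one possible way (a sketch; adapt to local practice — setenforce only affects the running system):
| Code Block |
|---|
|
# Put the running system in permissive mode
setenforce 0
# Disable SELinux permanently (takes effect at the next reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config |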
Install ceph:
For C7:
| Code Block |
|---|
|
rpm -Uvh https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm
yum install yum-plugin-priorities |
For C8:
| Code Block |
|---|
|
rpm -Uvh https://download.ceph.com/rpm-nautilus/el8/noarch/ceph-release-1-1.el8.noarch.rpm |
Then:
| Code Block |
|---|
|
yum clean all
yum update
yum install ceph |
Add the following lines to /etc/security/limits.conf:
| Code Block |
|---|
|
* soft nofile 65536
* hard nofile 65536 |
Add the following lines to /etc/sysctl.conf (to prevent "page allocation failure" errors, and to prevent swapping):
| Code Block |
|---|
|
vm.min_free_kbytes = 1572864
vm.swappiness = 0 |
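Then apply the new settings without rebooting, e.g.:
| Code Block |
|---|
|
sysctl -p /etc/sysctl.conf |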
Remove swap:
Edit /etc/fstab and comment out the swap line.
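For example (a sketch; check the actual swap entries in /etc/fstab before editing):
| Code Block |
|---|
|
# Turn off swap immediately
swapoff -a
# Comment out the swap line(s) so they don't come back at the next reboot
sed -i '/\sswap\s/s/^/#/' /etc/fstab |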
Disable transparent huge pages:
Edit /etc/sysconfig/grub, adding "transparent_hugepage=never" to GRUB_CMDLINE_LINUX (e.g. GRUB_CMDLINE_LINUX="nofb splash=quiet crashkernel=auto rhgb quiet transparent_hugepage=never"), then back up the current GRUB configuration and regenerate it:
| Code Block |
|---|
|
# cp /boot/grub2/grub.cfg ~
# grub2-mkconfig -o /boot/grub2/grub.cfg |
Stop and disable puppet, and then reboot:
| Code Block |
|---|
|
systemctl stop puppet; systemctl disable puppet
shutdown -r now |
Verify that Transparent huge pages are disabled:
| Code Block |
|---|
|
[root@ceph-osd-01 ~]# grep transparent_hugepage /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1062.9.1.el7.x86_64 root=UUID=aa5f2c49-17cf-46fe-8c7a-20f44892c131 ro nofb splash=quiet crashkernel=auto rhgb quiet transparent_hugepage=never
[root@ceph-osd-01 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root@ceph-osd-01 ~]# |
Move the host into the hosts_all/CephProd hostgroup (hosts_all/CephProd-C8 for CentOS 8).
Run puppet once:
| Code Block |
|---|
|
puppet agent -t
|
Enable the nagios sensors.
Check that the machine appears in ganglia.
Copy the file /etc/ceph/ceph.client.admin.keyring from a ceph-mon-xx host, and set its ownership and mode, which should be:
| Code Block |
|---|
|
-rw-------. 1 ceph ceph 137 Feb 20 13:51 /etc/ceph/ceph.client.admin.keyring |
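For example (ceph-mon-01 is just a placeholder for one of the mon hosts):
| Code Block |
|---|
|
scp ceph-mon-01:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
chown ceph:ceph /etc/ceph/ceph.client.admin.keyring
chmod 600 /etc/ceph/ceph.client.admin.keyring |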
If it doesn't exist yet, create the rack in the ceph crush map:
| Code Block |
|---|
|
ceph osd crush add-bucket Rack12-PianoAlto rack
ceph osd crush move Rack12-PianoAlto root=default |
In this example there are 10 SATA disks (/dev/sdc .. /dev/sdl) and 2 SSD disks (/dev/sda and /dev/sdb).
The SATA disks are used for data (block), while the SSD disks are for block.db (each SSD disk holds the block.db for 5 data disks).
In this example the OSDs will be osd.50 .. osd.59.
Prepare the disks for block and block.db:
| Code Block |
|---|
|
# Block
echo "vgcreate on SATA disks..."
vgcreate ceph-block-50 /dev/sdc
vgcreate ceph-block-51 /dev/sdd
vgcreate ceph-block-52 /dev/sde
vgcreate ceph-block-53 /dev/sdf
vgcreate ceph-block-54 /dev/sdg
vgcreate ceph-block-55 /dev/sdh
vgcreate ceph-block-56 /dev/sdi
vgcreate ceph-block-57 /dev/sdj
vgcreate ceph-block-58 /dev/sdk
vgcreate ceph-block-59 /dev/sdl
echo "lvcreate on SATA disks..."
lvcreate -l 100%FREE -n block-50 ceph-block-50
lvcreate -l 100%FREE -n block-51 ceph-block-51
lvcreate -l 100%FREE -n block-52 ceph-block-52
lvcreate -l 100%FREE -n block-53 ceph-block-53
lvcreate -l 100%FREE -n block-54 ceph-block-54
lvcreate -l 100%FREE -n block-55 ceph-block-55
lvcreate -l 100%FREE -n block-56 ceph-block-56
lvcreate -l 100%FREE -n block-57 ceph-block-57
lvcreate -l 100%FREE -n block-58 ceph-block-58
lvcreate -l 100%FREE -n block-59 ceph-block-59
#
# Block.db
echo "vgcreate on SSD disks..."
vgcreate ceph-db-50-54 /dev/sda
vgcreate ceph-db-55-59 /dev/sdb
echo "lvcreate on SSD disks..."
lvcreate -L 89GB -n db-50 ceph-db-50-54
lvcreate -L 89GB -n db-51 ceph-db-50-54
lvcreate -L 89GB -n db-52 ceph-db-50-54
lvcreate -L 89GB -n db-53 ceph-db-50-54
lvcreate -L 89GB -n db-54 ceph-db-50-54
lvcreate -L 89GB -n db-55 ceph-db-55-59
lvcreate -L 89GB -n db-56 ceph-db-55-59
lvcreate -L 89GB -n db-57 ceph-db-55-59
lvcreate -L 89GB -n db-58 ceph-db-55-59
lvcreate -L 89GB -n db-59 ceph-db-55-59 |
Possible error with vgcreate:
| Code Block |
|---|
|
[root@c-osd-5 /]# vgcreate ceph-block-12 /dev/vdb
Device /dev/vdb excluded by a filter. |
This is because the disk has a GPT partition table. Let's delete it with gdisk:
| Code Block |
|---|
|
[root@c-osd-5 /]# gdisk /dev/vdb
GPT fdisk (gdisk) version 0.8.10
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): x
Expert command (? for help): ?
a set attributes
c change partition GUID
d display the sector alignment value
e relocate backup data structures to the end of the disk
g change disk GUID
h recompute CHS values in protective/hybrid MBR
i show detailed information on a partition
l set the sector alignment value
m return to main menu
n create a new protective MBR
o print protective MBR data
p print the partition table
q quit without saving changes
r recovery and transformation options (experts only)
s resize partition table
t transpose two partition table entries
u replicate partition table on new device
v verify disk
w write table to disk and exit
z zap (destroy) GPT data structures and exit
? print this menu
Expert command (? for help): z
About to wipe out GPT on /dev/vdb. Proceed? (Y/N): y
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Blank out MBR? (Y/N):
Your option? (Y/N): Y |
If it doesn't exist yet, create the file:
| Code Block |
|---|
|
-rw------- 1 ceph ceph 71 Apr 28 12:18 /var/lib/ceph/bootstrap-osd/ceph.keyring |
| Code Block |
|---|
|
# cat /var/lib/ceph/bootstrap-osd/ceph.keyring
[client.bootstrap-osd]
key = AQA+Y6hYQTvEHRAAr4Q/mwHCByv/kokqnu6nCA== |
It must match what appears for the 'client.bootstrap-osd' entry in the 'ceph auth export' output. You can copy the file from another OSD node.
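For example, a quick check:
| Code Block |
|---|
|
ceph auth export client.bootstrap-osd
cat /var/lib/ceph/bootstrap-osd/ceph.keyring |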
Add (via puppet) the new OSDs to the ceph.conf file:
| Code Block |
|---|
|
...
...
[osd.50]
host = ceph-osd-06 #manual deployments only.
public addr = 192.168.61.235
cluster addr = 192.168.222.235
osd memory target = 3221225472
[osd.51]
host = ceph-osd-06 #manual deployments only.
public addr = 192.168.61.235
cluster addr = 192.168.222.235
osd memory target = 3221225472
...
... |
Run puppet once to have the file updated on the new OSD node:
| Code Block |
|---|
|
puppet agent -t
|
Disable data movements:
| Code Block |
|---|
|
[root@c-osd-1 /]# ceph osd set norebalance
norebalance is set
[root@c-osd-1 /]# ceph osd set nobackfill
nobackfill is set
[root@c-osd-1 /]# ceph osd set noout
noout is set
|
Create a first OSD:
| Code Block |
|---|
|
ceph-volume lvm create --bluestore --data ceph-block-50/block-50 --block.db ceph-db-50-54/db-50 |
The above command could trigger some data movement
Verify with ceph osd df and ceph osd tree that the new OSD is up.
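For example (osd.50 is the first OSD created in this example):
| Code Block |
|---|
|
ceph osd df
ceph osd tree | grep osd.50 |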
Then move this host (ceph-osd-06 in our example) in the relevant rack:
| Code Block |
|---|
|
ceph osd crush move ceph-osd-06 rack=Rack12-PianoAlto |
Re-verify with ceph osd df and ceph osd tree.
Verify that the OSD is using the right vgs:
| Code Block |
|---|
|
[root@ceph-osd-06 ~]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-50
infering bluefs devices from bluestore path
{
"/var/lib/ceph/osd/ceph-50/block": {
"osd_uuid": "dc72b996-d035-4dcd-ba42-1a6433eb78f7",
"size": 10000827154432,
"btime": "2019-02-19 11:55:47.553215",
"description": "main",
"bluefs": "1",
"ceph_fsid": "8162f291-00b6-4b40-a8b4-1981a8c09b64",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"osd_key": "AQCu4Gtc+jKSJhAAKzaAAYuTKWZs9rjJlBXWww==",
"ready": "ready",
"whoami": "50"
},
"/var/lib/ceph/osd/ceph-50/block.db": {
"osd_uuid": "dc72b996-d035-4dcd-ba42-1a6433eb78f7",
"size": 95563022336,
"btime": "2019-02-19 11:55:47.573213",
"description": "bluefs db"
}
}
[root@ceph-osd-06 ~]# ls -l /var/lib/ceph/osd/ceph-50/block
lrwxrwxrwx 1 ceph ceph 27 Feb 19 12:23 /var/lib/ceph/osd/ceph-50/block -> /dev/ceph-block-50/block-50
[root@ceph-osd-06 ~]# ls -l /var/lib/ceph/osd/ceph-50/block.db
lrwxrwxrwx 1 ceph ceph 24 Feb 19 12:23 /var/lib/ceph/osd/ceph-50/block.db -> /dev/ceph-db-50-54/db-50
[root@ceph-osd-06 ~]#
|
Create the other OSDs (also use --osd-id if needed, e.g. when migrating OSDs from filestore to bluestore):
| Code Block |
|---|
|
ceph-volume lvm create --bluestore --data ceph-block-51/block-51 --block.db ceph-db-50-54/db-51
ceph-volume lvm create --bluestore --data ceph-block-52/block-52 --block.db ceph-db-50-54/db-52
ceph-volume lvm create --bluestore --data ceph-block-53/block-53 --block.db ceph-db-50-54/db-53
ceph-volume lvm create --bluestore --data ceph-block-54/block-54 --block.db ceph-db-50-54/db-54
ceph-volume lvm create --bluestore --data ceph-block-55/block-55 --block.db ceph-db-55-59/db-55
ceph-volume lvm create --bluestore --data ceph-block-56/block-56 --block.db ceph-db-55-59/db-56
ceph-volume lvm create --bluestore --data ceph-block-57/block-57 --block.db ceph-db-55-59/db-57
ceph-volume lvm create --bluestore --data ceph-block-58/block-58 --block.db ceph-db-55-59/db-58
ceph-volume lvm create --bluestore --data ceph-block-59/block-59 --block.db ceph-db-55-59/db-59 |
Reboot the new osd node:
| Code Block |
|---|
|
shutdown -r now |
Verify that the new OSDs are up.
Check that the OSD log files belong to ceph and that they are populated. Otherwise change the ownership and restart the OSD.
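For example (osd.50 is used as a placeholder; repeat for the other new OSDs):
| Code Block |
|---|
|
ls -l /var/log/ceph/ceph-osd.50.log
# If needed:
chown ceph:ceph /var/log/ceph/ceph-osd.50.log
systemctl restart ceph-osd@50 |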
Verify that all buckets are using straw2:
| Code Block |
|---|
|
ceph osd getcrushmap -o crush.map; crushtool -d crush.map | grep straw; rm -f crush.map |
If not (i.e. if some are using straw), run the following command:
| Code Block |
|---|
|
ceph osd crush set-all-straw-buckets-to-straw2
|
Warning: this could trigger a data rebalance
Enable and start puppet:
| Code Block |
|---|
|
systemctl start puppet
systemctl enable puppet |
Then, after a few minutes, check that "ceph status" doesn't report PGs in peering.
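For example:
| Code Block |
|---|
|
ceph status
ceph pg stat |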
Then:
| Code Block |
|---|
|
[root@c-osd-1 /]# ceph osd unset nobackfill
nobackfill is unset
[root@c-osd-1 /]# ceph osd unset norebalance
norebalance is unset
[root@c-osd-1 /]# ceph osd unset noout
noout is unset
[root@c-osd-1 /]# |
This should trigger a data movement.