Install the AlmaLinux8/AlmaLinux9 operating system.
Enable the HW check via Nagios querying the iDRAC.
Disable SELinux
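One possible way to do that (a sketch: setenforce acts on the running system, while editing /etc/selinux/config makes the change persistent across reboots):
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
getenforce   # reports Permissive now, Disabled after the next reboot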
Install epel
yum install epel-release
Install ceph:
For AlmaLinux8:
rpm -Uvh https://download.ceph.com/rpm-quincy/el8/noarch/ceph-release-1-1.el8.noarch.rpm
For AlmaLinux9:
rpm -Uvh https://download.ceph.com/rpm-quincy/el9/noarch/ceph-release-1-1.el9.noarch.rpm
Then:
yum clean all
yum update
yum install ceph
Add the following lines to /etc/security/limits.conf:
* soft nofile 65536
* hard nofile 65536
Add the following lines to /etc/sysctl.conf (to prevent "page allocation failure" errors, and to prevent swapping):
vm.min_free_kbytes = 1572864
vm.swappiness = 0
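These values can also be applied without waiting for a reboot, for example:
sysctl -p /etc/sysctl.conf
sysctl vm.min_free_kbytes vm.swappiness   # verify the running values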
Remove swap:
swapoff -a
Edit /etc/fstab and comment the swap line
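A possible way to do that non-interactively (a sketch: the exact swap line varies between machines, so check the resulting /etc/fstab):
sed -ri 's|^([^#].*[[:space:]]swap[[:space:]].*)|#\1|' /etc/fstab
grep swap /etc/fstab
swapon --show   # should print nothing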
Disable transparent huge pages:
Edit /etc/sysconfig/grub adding "transparent_hugepage=never" to GRUB_CMDLINE_LINUX (e.g. GRUB_CMDLINE_LINUX="nofb splash=quiet crashkernel=auto rhgb quiet transparent_hugepage=never")
# cp /boot/grub2/grub.cfg ~
# grub2-mkconfig -o /boot/grub2/grub.cfg
PS: the pathname of the grub.cfg can be different on some machines: /boot/efi/EFI/almalinux/grub.cfg
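To figure out which path applies, one can check whether the machine booted in UEFI mode, e.g.:
[ -d /sys/firmware/efi ] && echo "UEFI: use /boot/efi/EFI/almalinux/grub.cfg" || echo "BIOS: use /boot/grub2/grub.cfg"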
Stop and disable puppet, and then reboot:
systemctl stop puppet; systemctl disable puppet
shutdown -r now
Verify that transparent huge pages are disabled (the first check does not seem to be relevant for AlmaLinux9):
[root@ceph-osd-01 ~]# grep transparent_hugepage /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1062.9.1.el7.x86_64 root=UUID=aa5f2c49-17cf-46fe-8c7a-20f44892c131 ro nofb splash=quiet crashkernel=auto rhgb quiet transparent_hugepage=never
[root@ceph-osd-01 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root@ceph-osd-01 ~]#
If it did not work (this should be the case for AlmaLinux9):
[root@c-osd-1 almalinux]# tuned-adm active
Current active profile: virtual-guest
[root@c-osd-1 almalinux]# mkdir /etc/tuned/myprofile-nothp
Create the file /etc/tuned/myprofile-nothp/tuned.conf (the value of the include line depends on the output of the previous command):
[root@c-osd-1 almalinux]# cat /etc/tuned/myprofile-nothp/tuned.conf
[main]
include=virtual-guest
[vm]
transparent_hugepages=never
[root@c-osd-1 almalinux]# chmod +x /etc/tuned/myprofile-nothp/tuned.conf
[root@c-osd-1 almalinux]# tuned-adm profile myprofile-nothp
reboot
and verify again.
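For example, after the reboot the checks above can be repeated:
cat /sys/kernel/mm/transparent_hugepage/enabled   # should show [never]
tuned-adm active                                  # should show myprofile-nothp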
Move the host into the hosts_all/CephProd hostgroup (hosts_all/CephProd-C8 for CentOS8).
Run puppet once:
puppet agent -t
Enable the Nagios sensors.
Check if the machine appears in Ganglia.
Copy the file /etc/ceph/ceph.client.admin.keyring from a ceph-mon-xx host and set its ownership and mode, which should be:
-rw-------. 1 ceph ceph 137 Feb 20 13:51 /etc/ceph/ceph.client.admin.keyring
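A possible way to do that (ceph-mon-01 below is just an example monitor hostname):
scp root@ceph-mon-01:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
chown ceph:ceph /etc/ceph/ceph.client.admin.keyring
chmod 600 /etc/ceph/ceph.client.admin.keyring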
If it doesn't exist yet, create the rack in the ceph crush map:
ceph osd crush add-bucket Rack12-PianoAlto rack
ceph osd crush move Rack12-PianoAlto root=default
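To confirm that the rack bucket is in place, something like the following can be used:
ceph osd tree | grep Rack12-PianoAlto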
In the considered example, there are 10 SATA disks (/dev/sdc .. /dev/sdl) and 2 SSD disks (/dev/sda and /dev/sdb)
The SATA disks have to be used for data (block), while the SSD disks are for block.db (each SSD disk has to contain the block.db for 5 data disks).
In this example the OSDs will be osd-50 .. osd-59
Prepare the disks for block and block.db:
# Block
echo "vgcreate on SATA disks..."
vgcreate ceph-block-50 /dev/sdc
vgcreate ceph-block-51 /dev/sdd
vgcreate ceph-block-52 /dev/sde
vgcreate ceph-block-53 /dev/sdf
vgcreate ceph-block-54 /dev/sdg
vgcreate ceph-block-55 /dev/sdh
vgcreate ceph-block-56 /dev/sdi
vgcreate ceph-block-57 /dev/sdj
vgcreate ceph-block-58 /dev/sdk
vgcreate ceph-block-59 /dev/sdl
echo "lvcreate on SATA disks..."
lvcreate -l 100%FREE -n block-50 ceph-block-50
lvcreate -l 100%FREE -n block-51 ceph-block-51
lvcreate -l 100%FREE -n block-52 ceph-block-52
lvcreate -l 100%FREE -n block-53 ceph-block-53
lvcreate -l 100%FREE -n block-54 ceph-block-54
lvcreate -l 100%FREE -n block-55 ceph-block-55
lvcreate -l 100%FREE -n block-56 ceph-block-56
lvcreate -l 100%FREE -n block-57 ceph-block-57
lvcreate -l 100%FREE -n block-58 ceph-block-58
lvcreate -l 100%FREE -n block-59 ceph-block-59
#
# Block.db
echo "vgcreate on SSD disks..."
vgcreate ceph-db-50-54 /dev/sda
vgcreate ceph-db-55-59 /dev/sdb
echo "lvcreate on SSD disks..."
lvcreate -L 89GB -n db-50 ceph-db-50-54
lvcreate -L 89GB -n db-51 ceph-db-50-54
lvcreate -L 89GB -n db-52 ceph-db-50-54
lvcreate -L 89GB -n db-53 ceph-db-50-54
lvcreate -L 89GB -n db-54 ceph-db-50-54
lvcreate -L 89GB -n db-55 ceph-db-55-59
lvcreate -L 89GB -n db-56 ceph-db-55-59
lvcreate -L 89GB -n db-57 ceph-db-55-59
lvcreate -L 89GB -n db-58 ceph-db-55-59
lvcreate -L 89GB -n db-59 ceph-db-55-59
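The same preparation can also be written as a short loop (just a sketch, assuming the same disk-to-OSD mapping and the same 89GB block.db size as above):
#!/bin/bash
# SATA data disks, in the same order as OSDs 50..59
disks=(sdc sdd sde sdf sdg sdh sdi sdj sdk sdl)
for i in $(seq 0 9); do
  id=$((50 + i))
  vgcreate ceph-block-${id} /dev/${disks[$i]}
  lvcreate -l 100%FREE -n block-${id} ceph-block-${id}
done
# SSD disks for block.db (5 OSDs per SSD)
vgcreate ceph-db-50-54 /dev/sda
vgcreate ceph-db-55-59 /dev/sdb
for id in $(seq 50 54); do lvcreate -L 89GB -n db-${id} ceph-db-50-54; done
for id in $(seq 55 59); do lvcreate -L 89GB -n db-${id} ceph-db-55-59; done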
Possible error with vgcreate:
[root@c-osd-5 /]# vgcreate ceph-block-12 /dev/vdb
  Device /dev/vdb excluded by a filter.
This is because the disk has a GPT partition table. Let's delete it with gdisk:
[root@c-osd-5 /]# gdisk /dev/vdb
GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): x

Expert command (? for help): ?
a       set attributes
c       change partition GUID
d       display the sector alignment value
e       relocate backup data structures to the end of the disk
g       change disk GUID
h       recompute CHS values in protective/hybrid MBR
i       show detailed information on a partition
l       set the sector alignment value
m       return to main menu
n       create a new protective MBR
o       print protective MBR data
p       print the partition table
q       quit without saving changes
r       recovery and transformation options (experts only)
s       resize partition table
t       transpose two partition table entries
u       replicate partition table on new device
v       verify disk
w       write table to disk and exit
z       zap (destroy) GPT data structures and exit
?       print this menu

Expert command (? for help): z
About to wipe out GPT on /dev/vdb. Proceed? (Y/N): y
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
Blank out MBR? (Y/N): Your option? (Y/N): Y
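As an alternative to the interactive gdisk session (a suggestion, not part of the original procedure), the GPT data structures can also be wiped non-interactively, e.g. with sgdisk (from the gdisk package) or wipefs:
sgdisk --zap-all /dev/vdb
wipefs -a /dev/vdb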
If it doesn't exist yet, create the file:
-rw------- 1 ceph ceph 71 Apr 28 12:18 /var/lib/ceph/bootstrap-osd/ceph.keyring
# cat /var/lib/ceph/bootstrap-osd/ceph.keyring
[client.bootstrap-osd]
        key = AQA+Y6hYQTvEHRAAr4Q/mwHCByv/kokqnu6nCA==
It must match with what appears for the 'client.bootstrap-osd' entry in the 'ceph auth export' output. You can copy the file from another OSD node.
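To compare it with the key known by the cluster, the following can be run on any node with an admin keyring:
ceph auth export client.bootstrap-osd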
Add (via puppet) the new OSDs in the ceph.conf file
...
[osd.50]
host = ceph-osd-06                 #manual deployments only.
public addr = 192.168.61.235
cluster addr = 192.168.222.235
osd memory target = 3221225472

[osd.51]
host = ceph-osd-06                 #manual deployments only.
public addr = 192.168.61.235
cluster addr = 192.168.222.235
osd memory target = 3221225472
...
Run puppet once to have the file updated on the new OSD node
puppet agent -t
Disable data movements:
[root@c-osd-1 /]# ceph osd set norebalance
norebalance is set
[root@c-osd-1 /]# ceph osd set nobackfill
nobackfill is set
[root@c-osd-1 /]# ceph osd set noout
noout is set
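The flags currently set can be double-checked, for example with:
ceph osd dump | grep flags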
Create a first OSD:
ceph-volume lvm create --bluestore --data ceph-block-50/block-50 --block.db ceph-db-50-54/db-50
The above command could trigger some data movement
Verify with ceph osd df and ceph osd tree that the new OSD is up.
Then move this host (ceph-osd-06 in our example) into the relevant rack:
ceph osd crush move ceph-osd-06 rack=Rack12-PianoAlto
Verify again with ceph osd df and ceph osd tree.
Verify that the OSD is using the right vgs:
[root@ceph-osd-06 ~]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-50
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-50/block": {
        "osd_uuid": "dc72b996-d035-4dcd-ba42-1a6433eb78f7",
        "size": 10000827154432,
        "btime": "2019-02-19 11:55:47.553215",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "8162f291-00b6-4b40-a8b4-1981a8c09b64",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQCu4Gtc+jKSJhAAKzaAAYuTKWZs9rjJlBXWww==",
        "ready": "ready",
        "whoami": "50"
    },
    "/var/lib/ceph/osd/ceph-50/block.db": {
        "osd_uuid": "dc72b996-d035-4dcd-ba42-1a6433eb78f7",
        "size": 95563022336,
        "btime": "2019-02-19 11:55:47.573213",
        "description": "bluefs db"
    }
}
[root@ceph-osd-06 ~]# ls -l /var/lib/ceph/osd/ceph-50/block
lrwxrwxrwx 1 ceph ceph 27 Feb 19 12:23 /var/lib/ceph/osd/ceph-50/block -> /dev/ceph-block-50/block-50
[root@ceph-osd-06 ~]# ls -l /var/lib/ceph/osd/ceph-50/block.db
lrwxrwxrwx 1 ceph ceph 24 Feb 19 12:23 /var/lib/ceph/osd/ceph-50/block.db -> /dev/ceph-db-50-54/db-50
[root@ceph-osd-06 ~]#
Create the other OSDs (use also --osd-id if needed, e.g. when migrating OSDs from filestore to bluestore):
ceph-volume lvm create --bluestore --data ceph-block-51/block-51 --block.db ceph-db-50-54/db-51
ceph-volume lvm create --bluestore --data ceph-block-52/block-52 --block.db ceph-db-50-54/db-52
ceph-volume lvm create --bluestore --data ceph-block-53/block-53 --block.db ceph-db-50-54/db-53
ceph-volume lvm create --bluestore --data ceph-block-54/block-54 --block.db ceph-db-50-54/db-54
ceph-volume lvm create --bluestore --data ceph-block-55/block-55 --block.db ceph-db-55-59/db-55
ceph-volume lvm create --bluestore --data ceph-block-56/block-56 --block.db ceph-db-55-59/db-56
ceph-volume lvm create --bluestore --data ceph-block-57/block-57 --block.db ceph-db-55-59/db-57
ceph-volume lvm create --bluestore --data ceph-block-58/block-58 --block.db ceph-db-55-59/db-58
ceph-volume lvm create --bluestore --data ceph-block-59/block-59 --block.db ceph-db-55-59/db-59
Reboot the new osd node:
shutdown -r now
Verify that the new OSDs are up.
Check that the OSD log files belong to the ceph user and that they are being populated. Otherwise change the ownership and restart the OSDs.
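For example (a sketch: osd.50 is the first OSD of this node, repeat for the other IDs as needed):
ls -l /var/log/ceph/ceph-osd.*.log
chown ceph:ceph /var/log/ceph/ceph-osd.*.log
systemctl restart ceph-osd@50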
Verify that all buckets are using straw2:
ceph osd getcrushmap -o crush.map; crushtool -d crush.map | grep straw; rm -f crush.map
If not (i.e. if some are using straw), run the following command:
ceph osd crush set-all-straw-buckets-to-straw2
Warning: this could trigger a data rebalance
Enable and start puppet:
systemctl start puppet
systemctl enable puppet
Then, after a few minutes, check that "ceph status" doesn't report Pgs in peering.
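One quick check (it should print nothing once peering has completed):
ceph status | grep -i peering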
Then:
[root@c-osd-1 /]# ceph osd unset nobackfill
nobackfill is unset
[root@c-osd-1 /]# ceph osd unset norebalance
norebalance is unset
[root@c-osd-1 /]# ceph osd unset noout
noout is unset
[root@c-osd-1 /]#
This should trigger some data movement.