...
Disable SELinux
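A minimal sketch of one way to do this (setenforce 0 only switches the running system to permissive mode; the config change takes full effect at the next reboot):
| Code Block |
|---|
|
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0 |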
Install ceph:
For C7:
| Code Block |
|---|
|
rpm -Uvh https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm
yum install yum-plugin-priorities
yum clean all
yum update
yum install ceph |
For C8:
| Code Block |
|---|
|
rpm -Uvh https://download.ceph.com/rpm-nautilus/el8/noarch/ceph-release-1-1.el8.noarch.rpm |
Then:
| Code Block |
|---|
|
yum clean all
yum update
yum install ceph |
Add the following lines to /etc/security/limits.conf:
| Code Block |
|---|
|
* soft nofile 65536
* hard nofile 65536 |
Add the following lines to /etc/sysctl.conf (to prevent "page allocation failure" errors, and to prevent swapping):
...
Edit /etc/sysconfig/grub adding "transparent_hugepage=never" to GRUB_CMDLINE_LINUX (e.g. GRUB_CMDLINE_LINUX="nofb splash=quiet crashkernel=auto rhgb quiet transparent_hugepage=never")
| Code Block |
|---|
|
# cp /boot/grub2/grub.cfg ~ |
...
# grub2-mkconfig -o /boot/grub2/grub.cfg |
...
Stop and disable puppet, and then reboot:
| Code Block |
|---|
|
systemctl stop puppet; systemctl disable puppet |
...
Verify that Transparent huge pages are disabled:
| Code Block |
|---|
|
[root@ceph-osd-01 ~]# grep transparent_hugepage /proc/cmdline |
...
BOOT_IMAGE=/vmlinuz-3.10.0-1062.9.1.el7.x86_64 root=UUID=aa5f2c49-17cf-46fe-8c7a-20f44892c131 ro nofb splash=quiet crashkernel=auto rhgb quiet transparent_hugepage=never |
...
[root@ceph-osd-01 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled |
...
...
Move the host into the hosts_all/CephProd hostgroup (hosts_all/CephProd-C8 for CentOS8).
Run puppet once:
| Code Block |
|---|
|
puppet agent -t
|
Enable the nagios sensors.
...
Copy the file /etc/ceph/ceph.client.admin.keyring from a ceph-mon-xx host and set its ownership and mode, which should be:
| Code Block |
|---|
|
-rw-------. 1 ceph ceph 137 Feb 20 13:51 /etc/ceph/ceph.client.admin.keyring |
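For example, assuming the file was just copied over as root:
| Code Block |
|---|
|
chown ceph:ceph /etc/ceph/ceph.client.admin.keyring
chmod 600 /etc/ceph/ceph.client.admin.keyring |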
If it doesn't exist yet, create the rack in the ceph crush map:
| Code Block |
|---|
|
ceph osd crush add-bucket Rack12-PianoAlto rack |
...
ceph osd crush move Rack12-PianoAlto root=default |
In the considered example, there are 10 SATA disks (/dev/sdc .. /dev/sdl) and 2 SSD disks (/dev/sda and /dev/sdb)
...
Prepare the disks for block and block.db:
...
echo "vgcreate on SATA disks..." |
...
vgcreate ceph-block-50 /dev/sdc |
...
vgcreate ceph-block-51 /dev/sdd |
...
vgcreate ceph-block-52 /dev/sde |
...
vgcreate ceph-block-53 /dev/sdf |
...
vgcreate ceph-block-54 /dev/sdg |
...
vgcreate ceph-block-55 /dev/sdh |
...
vgcreate ceph-block-56 /dev/sdi |
...
vgcreate ceph-block-57 /dev/sdj |
...
vgcreate ceph-block-58 /dev/sdk |
...
vgcreate ceph-block-59 /dev/sdl |
...
echo "lvcreate on SATA disks..." |
...
lvcreate -l 100%FREE -n block-50 ceph-block-50 |
...
lvcreate -l 100%FREE -n block-51 ceph-block-51 |
...
lvcreate -l 100%FREE -n block-52 ceph-block-52 |
...
lvcreate -l 100%FREE -n block-53 ceph-block-53 |
...
lvcreate -l 100%FREE -n block-54 ceph-block-54 |
...
lvcreate -l 100%FREE -n block-55 ceph-block-55 |
...
lvcreate -l 100%FREE -n block-56 ceph-block-56 |
...
lvcreate -l 100%FREE -n block-57 ceph-block-57 |
...
lvcreate -l 100%FREE -n block-58 ceph-block-58 |
...
lvcreate -l 100%FREE -n block-59 ceph-block-59 |
...
...
...
echo "vgcreate on SSD disks..." |
...
vgcreate ceph-db-50-54 /dev/sda |
...
vgcreate ceph-db-55-59 /dev/sdb |
...
echo "lvcreate on SSD disks..." |
...
lvcreate -L 89GB -n db-50 ceph-db-50-54 |
...
lvcreate -L 89GB -n db-51 ceph-db-50-54 |
...
lvcreate -L 89GB -n db-52 ceph-db-50-54 |
...
lvcreate -L 89GB -n db-53 ceph-db-50-54 |
...
lvcreate -L 89GB -n db-54 ceph-db-50-54 |
...
lvcreate -L 89GB -n db-55 ceph-db-55-59 |
...
lvcreate -L 89GB -n db-56 ceph-db-55-59 |
...
lvcreate -L 89GB -n db-57 ceph-db-55-59 |
...
lvcreate -L 89GB -n db-58 ceph-db-55-59 |
...
lvcreate -L 89GB -n db-59 ceph-db-55-59 |
Possible error with vgcreate:
| Code Block |
|---|
|
[root@c-osd-5 /]# vgcreate ceph-block-12 /dev/vdb |
...
Device /dev/vdb excluded by a filter. |
This is because the disk has a GPT. Let's delete it with gdisk:
| Code Block |
|---|
|
[root@c-osd-5 /]# gdisk /dev/vdb |
...
GPT fdisk (gdisk) version 0.8.10 |
...
...
...
...
...
...
Found valid GPT with protective MBR; using GPT. |
...
...
Expert command (? for help): ? |
...
...
...
d display the sector alignment value |
...
e relocate backup data structures to the end of the disk |
...
...
h recompute CHS values in protective/hybrid MBR |
...
i show detailed information on a partition |
...
l set the sector alignment value |
...
...
n create a new protective MBR |
...
o print protective MBR data |
...
p print the partition table |
...
q quit without saving changes |
...
r recovery and transformation options (experts only) |
...
...
t transpose two partition table entries |
...
u replicate partition table on new device |
...
...
w write table to disk and exit |
...
z zap (destroy) GPT data structures and exit |
...
...
Expert command (? for help): z |
...
About to wipe out GPT on /dev/vdb. Proceed? (Y/N): y |
...
GPT data structures destroyed! You may now partition the disk using fdisk or |
...
...
...
...
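As an alternative to the interactive gdisk session, the GPT can also be wiped non-interactively (assuming /dev/vdb holds no data you still need):
| Code Block |
|---|
|
sgdisk --zap-all /dev/vdb |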
Then edit ceph.conf to enable the following lines:
osd memory target = 3221225472
If it doesn't exist yet, create the file:
| Code Block |
|---|
|
-rw------- 1 ceph ceph 71 Apr 28 12:18 /var/lib/ceph/bootstrap-osd/ceph |
...
| Code Block |
|---|
|
# cat /var/lib/ceph/bootstrap-osd/ceph.keyring |
...
...
key = AQA+Y6hYQTvEHRAAr4Q/mwHCByv/kokqnu6nCA== |
It must match the 'client.bootstrap-osd' entry in the 'ceph auth export' output. You can copy the file from another OSD node.
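To verify the key directly against the cluster, for example:
| Code Block |
|---|
|
ceph auth export client.bootstrap-osd |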
Add (via puppet) the new OSDs in the ceph.conf file
...
...
...
host = ceph-osd-06 #manual deployments only. |
...
public addr = 192.168.61.235 |
...
cluster addr = 192.168.222.235 |
...
osd memory target = 3221225472 |
...
...
host = ceph-osd-06 #manual deployments only. |
...
public addr = 192.168.61.235 |
...
cluster addr = 192.168.222.235 |
...
osd memory target = 3221225472 |
...
...
Run puppet once to have the file updated on the new OSD node
| Code Block |
|---|
|
puppet agent -t
|
Disable data movements:
| Code Block |
|---|
|
[root@c-osd-1 /]# ceph osd set norebalance |
...
...
[root@c-osd-1 /]# ceph osd set nobackfill |
...
...
[root@c-osd-1 /]# ceph osd set noout
noout is set |
...
Create a first OSD:
| Code Block |
|---|
|
ceph-volume lvm create --bluestore --data ceph-block-50/block-50 --block.db ceph-db-50-54/db-50 |
The above command could trigger some data movement
...
Then move this host (ceph-osd-06 in our example) into the relevant rack:
| Code Block |
|---|
|
ceph osd crush move ceph-osd-06 rack=Rack12-PianoAlto |
Re-verify with ceph osd df and ceph osd tree.
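For instance:
| Code Block |
|---|
|
ceph osd df
ceph osd tree |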
Verify that the OSD is using the right vgs:
| Code Block |
|---|
|
[root@ceph-osd-06 ~]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-50 |
...
infering bluefs devices from bluestore path |
...
...
"/var/lib/ceph/osd/ceph-50/block": { |
...
"osd_uuid": "dc72b996-d035-4dcd-ba42-1a6433eb78f7", |
...
...
"btime": "2019-02-19 11:55:47.553215", |
...
...
...
"ceph_fsid": "8162f291-00b6-4b40-a8b4-1981a8c09b64", |
...
...
"magic": "ceph osd volume v026", |
...
...
"osd_key": "AQCu4Gtc+jKSJhAAKzaAAYuTKWZs9rjJlBXWww==", |
...
...
...
...
"/var/lib/ceph/osd/ceph-50/block.db": { |
...
"osd_uuid": "dc72b996-d035-4dcd-ba42-1a6433eb78f7", |
...
...
"btime": "2019-02-19 11:55:47.573213", |
...
"description": "bluefs db" |
...
...
...
[root@ceph-osd-06 ~]# ls -l /var/lib/ceph/osd/ceph-50/block |
...
lrwxrwxrwx 1 ceph ceph 27 Feb 19 12:23 /var/lib/ceph/osd/ceph-50/block -> /dev/ceph-block-50/block-50 |
...
[root@ceph-osd-06 ~]# ls -l /var/lib/ceph/osd/ceph-50/block.db |
...
lrwxrwxrwx 1 ceph ceph 24 Feb 19 12:23 /var/lib/ceph/osd/ceph-50/block.db -> /dev/ceph-db-50-54/db-50 |
...
Create the other OSDs (also use --osd-id if needed, e.g. when migrating OSDs from filestore to bluestore; see the example after the block below):
| Code Block |
|---|
|
ceph-volume lvm create --bluestore --data ceph-block-51/block-51 --block.db ceph-db-50-54/db-51 |
...
ceph-volume lvm create --bluestore --data ceph-block-52/block-52 --block.db ceph-db-50-54/db-52 |
...
ceph-volume lvm create --bluestore --data ceph-block-53/block-53 --block.db ceph-db-50-54/db-53 |
...
ceph-volume lvm create --bluestore --data ceph-block-54/block-54 --block.db ceph-db-50-54/db-54 |
...
ceph-volume lvm create --bluestore --data ceph-block-55/block-55 --block.db ceph-db-55-59/db-55 |
...
ceph-volume lvm create --bluestore --data ceph-block-56/block-56 --block.db ceph-db-55-59/db-56 |
...
ceph-volume lvm create --bluestore --data ceph-block-57/block-57 --block.db ceph-db-55-59/db-57 |
...
ceph-volume lvm create --bluestore --data ceph-block-58/block-58 --block.db ceph-db-55-59/db-58 |
...
ceph-volume lvm create --bluestore --data ceph-block-59/block-59 --block.db ceph-db-55-59/db-59 |
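A hypothetical example reusing an existing OSD id (here 51) instead of letting the cluster assign a new one:
| Code Block |
|---|
|
ceph-volume lvm create --bluestore --data ceph-block-51/block-51 --block.db ceph-db-50-54/db-51 --osd-id 51 |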
Reboot the new OSD node:
| Code Block |
|---|
|
shutdown -r now |
Verify that the new OSDs are up.
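For example, a quick check that all OSDs are up and in:
| Code Block |
|---|
|
ceph osd stat
ceph osd tree |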
...
Verify that all buckets are using straw2:
| Code Block |
|---|
|
ceph osd getcrushmap -o crush.map; crushtool -d crush.map | grep straw; rm -f crush.map |
If not (i.e. if some are using straw), run the following command:
| Code Block |
|---|
|
ceph osd crush set-all-straw-buckets-to-straw2
|
Warning: this could trigger a data rebalance
Enable and start puppet:
| Code Block |
|---|
|
systemctl enable puppet; systemctl start puppet |
...
Then, after a few minutes, check that "ceph status" doesn't report PGs in peering.
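For example:
| Code Block |
|---|
|
ceph status
ceph pg stat |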
Then:
| Code Block |
|---|
|
[root@c-osd-1 /]# ceph osd unset nobackfill |
...
...
[root@c-osd-1 /]# ceph osd unset norebalance |
...
...
[root@c-osd-1 /]# ceph osd unset noout |
...
...
This should trigger a data movement.