Manual Maintenance
Clean OSD removal
ceph osd safe-to-destroy osd.<ID>
ceph osd out <ID>
systemctl stop ceph-osd.<ID>
ceph osd crush remove osd.<ID>
ceph osd down <ID>
ceph auth del osd.<ID>
ceph osd rm <ID>
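When scripting the removal, it is handy to wait until the OSD reports it is safe to destroy before stopping and removing it; a minimal sketch, assuming the ID is exported in the OSD_ID variable:
OSD_ID=26   # example ID, adjust to your case
# Poll until the cluster reports the OSD can be removed without risking data
while ! ceph osd safe-to-destroy osd.${OSD_ID}; do
    sleep 60
done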
Remove the logical volumes, volume groups and physical volumes
lvremove <list of volumes>
vgremove <list of volume groups>
pvremove <list of physical volumes>
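If you are unsure which LVM objects belong to the OSD, you can list them first; a short hedged check using ceph-volume and the standard LVM tools:
# Show which data/db/wal logical volumes back each OSD
ceph-volume lvm list
# Cross-check the LVM objects before removing them
lvs
vgs
pvs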
Clean mds removal
systemctl stop ceph-mds@<id>.service
rm -rf /var/lib/ceph/mds/ceph-<id>
ceph auth rm mds.<id>
Clean mgr removal
systemctl stop ceph-mgr@<id>.service
rm -rf /var/lib/ceph/mgr/ceph-<id>
ceph auth rm mgr.<id>
Clean mon removal
systemctl stop ceph-mon@<id>.service
rm -rf /var/lib/ceph/mon/ceph-<id>
ceph auth rm mon.<id>
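If the monitor is still listed in the monitor map, it usually also has to be removed from it; a hedged extra step:
ceph mon remove <id>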
Reattach expelled disk
When a disk is expelled and reattached, the new device name is different, so the OSD fails. To reattach it correctly a few steps must be followed.
Let's suppose that the disk was first /dev/sdv. After reattachment the disk becomes /dev/sdah, so the OSD is no longer working.
First identify the full SCSI identifier, for example by checking the /sys/block folder:
sdah -> ../devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
or
udevadm info --query=path --name=/dev/sdah
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
This is a JBOD disk; to remove it you have to issue this command:
echo 1 > /sys/block/sdah/device/delete
Now the device has disappeared.
Before rescanning the SCSI host you have to tweak the naming using udev rules. Create the rule file /etc/udev/rules.d/20-disk-rename.rules with this content:
KERNEL=="sd?", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv", RUN+="/usr/bin/logger My disk ATTR{partition}=$ATTR{partition}, DEVPATH=$devpath, ID_PATH=$ENV{ID_PATH}, ID_SERIAL=$ENV{ID_SERIAL}", GOTO="END_20_PERSISTENT_DISK"
KERNEL=="sd?*", ATTR{partition}=="1", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv%n" RUN+="/usr/bin/logger My partition parent=%p number=%n, ATTR{partition}=$ATTR{partition}"
LABEL="END_20_PERSISTENT_DISK"
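Before rescanning it can be worth making sure udev has loaded the new rule file; a hedged step:
udevadm control --reload-rules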
Now if you rescan the SCSI host the disk will be recognized again, but the block device name will be forced to /dev/sdv:
echo "- - -" > /sys/class/scsi_host/host11/scan
Now retrieve the OSD IDs:
ceph-volume lvm list
which gives the full information:
====== osd.26 ======
  [block]       /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
      block device              /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
      block uuid                rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
      cephx lockbox secret
      cluster fsid              959f6ec8-6e8c-4492-a396-7525a5108a8f
      cluster name              ceph
      crush device class        None
      db device                 /dev/cs-001_journal/sdv_db
      db uuid                   QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
      encrypted                 0
      osd fsid                  aad7b25d-1182-4570-9164-5c3d3a6a61b7
      osd id                    26
      osdspec affinity
      type                      block
      vdo                       0
      wal device                /dev/cs-001_journal/sdv_wal
      wal uuid                  bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
      devices                   /dev/sdv
  [db]          /dev/cs-001_journal/sdv_db
      block device              /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
      block uuid                rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
      cephx lockbox secret
      cluster fsid              959f6ec8-6e8c-4492-a396-7525a5108a8f
      cluster name              ceph
      crush device class        None
      db device                 /dev/cs-001_journal/sdv_db
      db uuid                   QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
      encrypted                 0
      osd fsid                  aad7b25d-1182-4570-9164-5c3d3a6a61b7
      osd id                    26
      osdspec affinity
      type                      db
      vdo                       0
      wal device                /dev/cs-001_journal/sdv_wal
      wal uuid                  bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
      devices                   /dev/sdb
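If you prefer to extract the OSD id and fsid programmatically, ceph-volume can also emit JSON; a hedged sketch (the exact JSON layout can differ between releases, and jq is assumed to be available):
# Print "<osd id> <osd fsid>" for every block LV known to ceph-volume
ceph-volume lvm list --format json \
  | jq -r 'to_entries[] | .key as $id | .value[] | select(.type=="block") | $id + " " + .tags["ceph.osd_fsid"]'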
Then reactivate the OSD with ceph-volume using its id and fsid (output example):
ceph-volume lvm activate --bluestore 26 aad7b25d-1182-4570-9164-5c3d3a6a61b7
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-26
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data --path /var/lib/ceph/osd/ceph-26 --no-mon-config
Running command: /usr/bin/ln -snf /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data /var/lib/ceph/osd/ceph-26/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-75
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26
Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_db /var/lib/ceph/osd/ceph-26/block.db
Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_db
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.db
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77
Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_wal /var/lib/ceph/osd/ceph-26/block.wal
Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_wal
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.wal
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76
Running command: /usr/bin/systemctl enable ceph-volume@lvm-26-aad7b25d-1182-4570-9164-5c3d3a6a61b7
Running command: /usr/bin/systemctl enable --runtime ceph-osd@26
Running command: /usr/bin/systemctl start ceph-osd@26
--> ceph-volume lvm activate successful for osd ID: 26
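After the activation you can check that the OSD has rejoined the cluster; a quick hedged check:
systemctl status ceph-osd@26
ceph osd tree | grep -w osd.26
ceph -s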
CRUSH map tweaking
ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o crush_map
Now you can edit the crush_map file, recompile it and inject it into the cluster (a sample rule section is shown after these commands):
crushtool -c crush_map -o /tmp/crushmap
ceph osd setcrushmap -i /tmp/crushmap
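For reference, the decompiled crush_map is plain text; a typical rule section looks roughly like this (a hedged example, the real map will differ):
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host    # change "host" to e.g. "rack" to move the failure domain
    step emit
}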
Inconsistent PGs
rados list-inconsistent-pg {pool}
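Once the inconsistent PGs are known, you can inspect the inconsistent objects and trigger a repair; hedged follow-up commands:
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>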
Slow ops
ceph daemon mon.cs-001 ops
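The same admin socket interface works on the OSD daemons, which helps to see where the slow operations are stuck; a hedged example for a single OSD:
ceph daemon osd.<id> dump_ops_in_flight
ceph daemon osd.<id> dump_historic_ops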
Find OSD failures
ceph daemon mon.cs-001 ops | grep osd_failure
"description": "osd_failure(failed timeout osd.130 [v2:131.154.128.179:6876/13353,v1:131.154.128.179:6882/13353] for 24sec e76448 v76448)",
"description": "osd_failure(failed timeout osd.166 [v2:131.154.128.199:6937/13430,v1:131.154.128.199:6959/13430] for 24sec e76448 v76448)",
"description": "osd_failure(failed timeout osd.175 [v2:131.154.128.199:6924/13274,v1:131.154.128.199:6933/13274] for 24sec e76448 v76448)",