
Check mon status from socket

sudo ceph --admin-daemon /var/run/ceph/ceph-mon.qn-cnfslhc.asok config help
sudo ceph --admin-daemon /var/run/ceph/ceph-mon.qn-cnfslhc.asok config show
sudo ceph --admin-daemon /var/run/ceph/ceph-mon.qn-cnfslhc.asok mon_status
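
If the daemon name is known, ceph daemon resolves the socket path by itself; an equivalent call, assuming the mon id is qn-cnfslhc as above:

sudo ceph daemon mon.qn-cnfslhc mon_status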

Enable msgr2

To enable the msgr2 protocol on a Nautilus (or newer) cluster:

ceph mon enable-msgr2
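
To verify that the monitors now advertise v2 addresses, one quick check is:

ceph mon dump | grep v2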

OSD remove

$ ceph osd out osd.<ID>
$ systemctl stop ceph-osd@<ID>
$ ceph osd down osd.<ID>
$ ceph osd purge osd.<ID> --yes-i-really-mean-it
$ ceph auth rm osd.<ID>
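
A minimal wrapper for the same sequence, assuming the numeric id is passed as the first argument and the script runs on the host carrying the OSD (the script name is illustrative):

#!/bin/bash
# Usage: ./remove-osd.sh <ID>
ID=$1
ceph osd out osd.${ID}
systemctl stop ceph-osd@${ID}
ceph osd down osd.${ID}
ceph osd purge osd.${ID} --yes-i-really-mean-it
ceph auth rm osd.${ID}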

Disk clean up

Check for leftover ceph device-mapper entries with lsblk and then remove them with

dmsetup remove <device mapper name>

for i in $(cat ~/disks.txt); do id=$(lsblk "$i" | grep ceph | awk '{print $1}') && dmsetup remove "${id:2}"; done
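
A more readable equivalent of the one-liner, assuming ~/disks.txt lists one block device per line:

# remove leftover ceph device-mapper entries for each listed disk
while read -r disk; do
    # lsblk prefixes child entries with tree characters (e.g. "└─"); strip the first two
    name=$(lsblk "$disk" | grep ceph | awk '{print $1}')
    [ -n "$name" ] && dmsetup remove "${name:2}"
done < ~/disks.txt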

Zap the disk with gdisk

gdisk /dev/sdbe < input

where input is a file with this content (x enters gdisk's expert menu, z zaps the GPT data structures, and the two y answers confirm wiping the GPT and blanking out the MBR):

x
z
y
y
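
On hosts with ceph-volume installed, an alternative (not part of the original procedure) is to let it tear down the LVM metadata and wipe the device in one step:

ceph-volume lvm zap /dev/sdbe --destroy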

Pools

ceph osd pool ls
ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it
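
Pool deletion is refused unless the monitors allow it; if the command above errors out, enable the flag first (assuming a release with the centralized config store):

ceph config set mon mon_allow_pool_delete true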

OSD PG recreate

This is a destructive operation, intended to recover a severely degraded cluster. Retrieve the list of inactive PGs in a given state:

ceph pg dump_stuck inactive | grep <state>
for i in $(ceph pg dump_stuck inactive | grep <state> | awk '{print $1}'); do ceph osd force-create-pg $i --yes-i-really-mean-it; done
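
After forcing the recreation it is worth watching the cluster until the PGs report active+clean, for example with:

watch -n 10 'ceph pg stat'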

BLUEFS_SPILLOVER BlueFS spillover detected

Sometimes the BlueFS metadata spills over from the dedicated DB device onto the slower (spinning) data disk. One way to clear the warning is to scrub the affected OSDs:

ceph osd scrub <id>
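
If a plain scrub is not enough, a deep scrub can be triggered explicitly as well (see the documentation excerpt below):

ceph osd deep-scrub <id>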

As a reference from the documentation:

Data Scrubbing: As part of maintaining data consistency and cleanliness, Ceph OSD Daemons can scrub objects within placement groups. That is, Ceph OSD Daemons can compare object metadata in one placement group with its replicas in placement groups stored on other OSDs. Scrubbing (usually performed daily) catches bugs or filesystem errors. Ceph OSD Daemons also perform deeper scrubbing by comparing data in objects bit-for-bit. Deep scrubbing (usually performed weekly) finds bad sectors on a drive that weren’t apparent in a light scrub. See Data Scrubbing for details on configuring scrubbing.

PG_DEGRADED (Degraded data redundancy: 10564/21718684 objects degraded (0.049%), 1 pg degraded)

Find the problematic PG:

ceph health detail

With the id of the PG, find the related OSDs:

ceph pg 23.304 query | grep -A 1 acting

Restart the corresponding OSD.
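
For systemd-managed OSDs this typically means (adjust the unit name to your deployment):

systemctl restart ceph-osd@<ID>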

Filesystem removal

First unmount the filesystem from all the clients, then stop all the metadata servers. After this, issue:

ceph fs rm cephfs --yes-i-really-mean-it
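
A sketch of the full sequence, assuming a client mount point of /mnt/cephfs and an MDS daemon id of mds0 (both hypothetical):

# on every client
umount /mnt/cephfs
# on every MDS host
systemctl stop ceph-mds@mds0
# mark the filesystem as failed, then remove it
ceph fs fail cephfs
ceph fs rm cephfs --yes-i-really-mean-it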

Crash Warning

After reviewing the new crash reports with ceph crash ls-new, they can be archived with

for id in $(ceph crash ls-new | awk 'NR>1 {print $1}'); do ceph crash archive $id; done
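
Recent releases also provide a single command that archives everything reported by ls-new:

ceph crash archive-all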

Slow ops

If health warnings about slow ops refer to connection timeouts towards some OSD, they can usually be resolved as follows. First, log into the complaining host and list the slow ops:

ceph daemon mon.ds-507 ops 2>&1 | grep "timeout osd"

Then try to restart the affected OSD; this should clear the slow ops warnings.
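
Assuming systemd-managed OSDs, a typical restart followed by a check that the warning has cleared:

systemctl restart ceph-osd@<ID>
ceph health detail | grep SLOW_OPS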