# Manual Maintenance

## Clean OSD removal
```
# check that the OSD can be removed without risking data loss
ceph osd safe-to-destroy osd.<ID>

# mark the OSD out and stop its daemon
ceph osd out <ID>
systemctl stop ceph-osd@<ID>

# remove it from the CRUSH map, mark it down, delete its key and the OSD itself
ceph osd crush remove osd.<ID>
ceph osd down <ID>
ceph auth del osd.<ID>
ceph osd rm <ID>
```
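Before deleting the backing volumes in the next step, you may want to double-check which LVs, VGs and PVs actually belong to the removed OSD. A minimal sketch (the `-o` report fields are standard LVM options; adapt to your layout):
```
# ceph-volume shows the LV/VG backing each OSD
ceph-volume lvm list
# or query LVM directly
lvs -o lv_name,vg_name,devices
pvs -o pv_name,vg_name
```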
Then remove the logical volumes, volume groups and physical volumes:
```
lvremove <list of volumes>
vgremove <list of volume groups>
pvremove <list of physical volumes>
```

## Clean mds removal
```
systemctl stop ceph-mds@<id>.service
rm -rf /var/lib/ceph/mds/ceph-<id>
ceph auth rm mds.<id>
```

## Clean mgr removal
```
systemctl stop ceph-mgr@<id>.service
rm -rf /var/lib/ceph/mgr/ceph-<id>
ceph auth rm mgr.<id>
```

## Clean mon removal
```
systemctl stop ceph-mon@<id>.service
# remove the monitor from the monmap before deleting its data directory
ceph mon remove <id>
rm -rf /var/lib/ceph/mon/ceph-<id>
ceph auth rm mon.<id>
```
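To verify that the monitor has actually left the monmap and that quorum is still healthy, for example:
```
ceph mon stat
ceph quorum_status -f json-pretty
```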

## Reattach expelled disk
When a disk is expelled and then reattached it comes back with a different device name, so the OSD fails. To reattach it correctly a few steps must be followed.
Let's suppose the disk was originally `/dev/sdv`. After reattachment it becomes `/dev/sdah`, so the OSD no longer works.
First identify the full SCSI identifier, for example by checking the `/sys/block` folder:
```
sdah -> ../devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
```
or
```
udevadm info --query=path --name=/dev/sdah
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
```
This is a JBOD disk; to remove it, issue this command:
```
echo 1 >  /sys/block/sdah/device/delete
```
Now the device has disappeared.
Before rescanning the SCSI host you have to tweak the naming using udev rules. Create the rule file `/etc/udev/rules.d/20-disk-rename.rules` with the following content:
```
KERNEL=="sd?", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv", RUN+="/usr/bin/logger My disk ATTR{partition}=$ATTR{partition}, DEVPATH=$devpath, ID_PATH=$ENV{ID_PATH}, ID_SERIAL=$ENV{ID_SERIAL}", GOTO="END_20_PERSISTENT_DISK"

KERNEL=="sd?*", ATTR{partition}=="1", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv%n" RUN+="/usr/bin/logger My partition parent=%p number=%n, ATTR{partition}=$ATTR{partition}"
LABEL="END_20_PERSISTENT_DISK"
```
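Depending on the system you may need to reload the udev rules so the new file is picked up before the rescan; for example:
```
udevadm control --reload-rules
```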
Now if you rescan the SCSI host the disk will be recognized again, but the block device name will be forced to `/dev/sdv`:
```
echo "- - -" > /sys/class/scsi_host/host11/scan
```
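You can check that the disk reappeared under the forced name, for example:
```
lsblk /dev/sdv
ls -l /sys/block/sdv
```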
Now retrieve the OSD IDs:
```
ceph-volume lvm list
```
which gives the full information:
```
====== osd.26 ======
  [block]       /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data

      block device              /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
      block uuid                rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
      cephx lockbox secret
      cluster fsid              959f6ec8-6e8c-4492-a396-7525a5108a8f
      cluster name              ceph
      crush device class        None
      db device                 /dev/cs-001_journal/sdv_db
      db uuid                   QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
      encrypted                 0
      osd fsid                  aad7b25d-1182-4570-9164-5c3d3a6a61b7
      osd id                    26
      osdspec affinity
      type                      block
      vdo                       0
      wal device                /dev/cs-001_journal/sdv_wal
      wal uuid                  bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
      devices                   /dev/sdv
  [db]          /dev/cs-001_journal/sdv_db
      block device              /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
      block uuid                rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
      cephx lockbox secret      
      cluster fsid              959f6ec8-6e8c-4492-a396-7525a5108a8f
      cluster name              ceph
      crush device class        None
      db device                 /dev/cs-001_journal/sdv_db
      db uuid                   QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
      encrypted                 0
      osd fsid                  aad7b25d-1182-4570-9164-5c3d3a6a61b7
      osd id                    26
      osdspec affinity          
      type                      db
      vdo                       0
      wal device                /dev/cs-001_journal/sdv_wal
      wal uuid                  bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
      devices                   /dev/sdb
```
Now reactivate the OSD with `ceph-volume lvm activate`, passing the `osd id` and `osd fsid` from the listing above. Example output:
```
ceph-volume lvm activate --bluestore 26 aad7b25d-1182-4570-9164-5c3d3a6a61b7 
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-26 
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26 
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data --path /var/lib/ceph/osd/ceph-26 --no-mon-config 
Running command: /usr/bin/ln -snf /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data /var/lib/ceph/osd/ceph-26/block 
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block 
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-75 
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26 
Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_db /var/lib/ceph/osd/ceph-26/block.db 
Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_db 
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77 
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.db 
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77 
Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_wal /var/lib/ceph/osd/ceph-26/block.wal 
Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_wal 
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76 
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.wal 
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76 
Running command: /usr/bin/systemctl enable ceph-volume@lvm-26-aad7b25d-1182-4570-9164-5c3d3a6a61b7 
Running command: /usr/bin/systemctl enable --runtime ceph-osd@26 
Running command: /usr/bin/systemctl start ceph-osd@26 
--> ceph-volume lvm activate successful for osd ID: 26 
```
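After activation it is worth checking that the OSD daemon is running and marked up again; with the example OSD id 26:
```
systemctl status ceph-osd@26
ceph osd tree | grep -w osd.26
ceph -s
```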
## OSD map tweaking
```
 ceph osd getcrushmap -o /tmp/crushmap
 crushtool -d /tmp/crushmap -o crush_map
```
Now you can edit the `crush_map` file, recompile it and inject it back into the cluster:
```
 crushtool -c crush_map -o /tmp/crushmap
 ceph osd setcrushmap -i /tmp/crushmap
```
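Optionally, the recompiled map can be sanity-checked with crushtool's test mode before injecting it; the rule number and replica count below are illustrative:
```
crushtool -i /tmp/crushmap --test --show-statistics --rule 0 --num-rep 3
```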
## Inconsistent PGs
```
rados list-inconsistent-pg {pool}
```
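Once an inconsistent PG has been identified, the usual follow-up is to inspect the inconsistent objects and, if appropriate, ask Ceph to repair the PG (the PG id below is a placeholder):
```
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>
```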

## Slow ops
```
ceph daemon mon.cs-001 ops
```
To find OSD failures in the ops output:
```
ceph daemon mon.cs-001 ops | grep osd_failure
            "description": "osd_failure(failed timeout osd.130 [v2:131.154.128.179:6876/13353,v1:131.154.128.179:6882/13353] for 24sec e76448 v76448)",
            "description": "osd_failure(failed timeout osd.166 [v2:131.154.128.199:6937/13430,v1:131.154.128.199:6959/13430] for 24sec e76448 v76448)",
            "description": "osd_failure(failed timeout osd.175 [v2:131.154.128.199:6924/13274,v1:131.154.128.199:6933/13274] for 24sec e76448 v76448)",
```
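To dig further into one of the OSDs reported above, you can query its admin socket on the host where it runs; with osd.130 from the example:
```
ceph daemon osd.130 dump_ops_in_flight
ceph daemon osd.130 dump_historic_ops
```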