# Manual Maintenance
## Clean OSD removal
```
ceph osd safe-to-destroy osd.<ID>
ceph osd out <ID>
systemctl stop ceph-osd@<ID>.service
ceph osd crush remove osd.<ID>
ceph osd down <ID>
ceph auth del osd.<ID>
ceph osd rm <ID>
```
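The `safe-to-destroy` check only succeeds once the cluster can afford to lose the OSD; if data still has to migrate away after marking it out, the check can simply be polled in a loop (a minimal sketch, adjust the interval as needed):
```
# wait until Ceph reports the OSD as safe to destroy (non-zero exit means "not yet")
while ! ceph osd safe-to-destroy osd.<ID>; do sleep 60; done
```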
Remove the logical volumes, volume groups and physical volumes backing the OSD:
```
lvremove <list of volumes>
vgremove <list of volume groups>
pvremove <list of physical volumes>
```
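If it is not obvious which volumes belong to the OSD being removed, `ceph-volume lvm list` (also used in the reattach procedure below) prints the OSD-to-device mapping, and the standard LVM tools show the current state; a quick sketch:
```
# map the OSD to its backing LVs/VGs/PVs before deleting anything
ceph-volume lvm list
lvs
vgs
pvs
```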
## Clean mds removal
```
systemctl stop ceph-mds@<id>.service
rm -rf /var/lib/ceph/mds/ceph-<id>
ceph auth rm mds.<id>
```
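As a quick sanity check that the daemon is really gone from the cluster map (standard status commands):
```
# the removed MDS should no longer be listed
ceph mds stat
ceph fs status
```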
## Clean mgr removal
```
systemctl stop ceph-mgr@<id>.service
rm -rf /var/lib/ceph/mgr/ceph-<id>
ceph auth rm mgr.<id>
```
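To confirm that a standby manager has taken over after the removal (standard status commands):
```
# check which mgr is active and which standbys remain
ceph mgr stat
ceph -s | grep mgr
```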
## Clean mon removal
```
systemctl stop ceph-mon@<id>.service
rm -rf /var/lib/ceph/mon/ceph-<id>
ceph auth rm mon.<id>
```
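If the monitor is still listed in the monmap, it should also be removed from it so the cluster stops expecting it in quorum (standard Ceph commands, replace `<id>` with the monitor name):
```
# drop the monitor from the monmap and verify the remaining quorum
ceph mon remove <id>
ceph quorum_status --format json-pretty
```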
## Reattach expelled disk
When a disk is expelled and then reattached, it comes back with a different device name, so the corresponding OSD fails. To reattach it the correct way, a few steps must be followed.
Let's suppose the disk was originally `/dev/sdv`. After reattachment the disk becomes `/dev/sdah`, so the OSD is no longer working.
First identify the full SCSI path, for example by checking the `/sys/block` folder:
```
sdah -> ../devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
```
or
```
udevadm info --query=path --name=/dev/sdah
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
```
This is a JBOD disk; to remove it from the SCSI layer you have to issue this command:
```
echo 1 > /sys/block/sdah/device/delete
```
Now the device has disappeared.
Before rescanning the SCSI host, you have to tweak the naming using udev rules. Create the rule file `/etc/udev/rules.d/20-disk-rename.rules` with this content:
```
KERNEL=="sd?", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv", RUN+="/usr/bin/logger My disk ATTR{partition}=$ATTR{partition}, DEVPATH=$devpath, ID_PATH=$ENV{ID_PATH}, ID_SERIAL=$ENV{ID_SERIAL}", GOTO="END_20_PERSISTENT_DISK"
KERNEL=="sd?*", ATTR{partition}=="1", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv%n" RUN+="/usr/bin/logger My partition parent=%p number=%n, ATTR{partition}=$ATTR{partition}"
LABEL="END_20_PERSISTENT_DISK"
```
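udev usually picks up new rules files automatically, but reloading them explicitly before the rescan is a harmless precaution:
```
# make sure the new rule is loaded before the rescan
udevadm control --reload-rules
```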
Now, if you rescan the SCSI host, the disk will be recognized again, but the block device name will be forced to `/dev/sdv`:
```
echo "- - -" > /sys/class/scsi_host/host11/scan
```
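At this point the disk should be back under its old name; a quick check (same `udevadm` query as above):
```
# the reattached device should now answer at the old path
lsblk /dev/sdv
udevadm info --query=path --name=/dev/sdv
```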
Now retrieve the OSD id and fsid:
```
ceph-volume lvm list
```
which gives the full information:
```
====== osd.26 ======
[block] /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
block device /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
block uuid rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
cephx lockbox secret
cluster fsid 959f6ec8-6e8c-4492-a396-7525a5108a8f
cluster name ceph
crush device class None
db device /dev/cs-001_journal/sdv_db
db uuid QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
encrypted 0
osd fsid aad7b25d-1182-4570-9164-5c3d3a6a61b7
osd id 26
osdspec affinity
type block
vdo 0
wal device /dev/cs-001_journal/sdv_wal
wal uuid bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
devices /dev/sdv
block device /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
block uuid rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
cephx lockbox secret
cluster fsid 959f6ec8-6e8c-4492-a396-7525a5108a8f
cluster name ceph
crush device class None
db device /dev/cs-001_journal/sdv_db
db uuid QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
encrypted 0
osd fsid aad7b25d-1182-4570-9164-5c3d3a6a61b7
osd id 26
osdspec affinity
type db
vdo 0
wal device /dev/cs-001_journal/sdv_wal
wal uuid bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
devices /dev/sdb
```
Then reactivate the OSD with the reported osd id and osd fsid:
```
ceph-volume lvm activate --bluestore 26 aad7b25d-1182-4570-9164-5c3d3a6a61b7
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-26
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data --path /var/lib/ceph/osd/ceph-26 --no-mon-config
Running command: /usr/bin/ln -snf /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data /var/lib/ceph/osd/ceph-26/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-75
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26
Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_db /var/lib/ceph/osd/ceph-26/block.db
Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_db
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.db
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77
Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_wal /var/lib/ceph/osd/ceph-26/block.wal
Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_wal
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.wal
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76
Running command: /usr/bin/systemctl enable ceph-volume@lvm-26-aad7b25d-1182-4570-9164-5c3d3a6a61b7
Running command: /usr/bin/systemctl enable --runtime ceph-osd@26
Running command: /usr/bin/systemctl start ceph-osd@26
--> ceph-volume lvm activate successful for osd ID: 26
```
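Finally, it is worth checking that the OSD really rejoined the cluster (standard checks, osd id 26 as in the example above):
```
# confirm the OSD is back up and in
ceph osd tree | grep -w osd.26
systemctl status ceph-osd@26
```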
## OSD map tweaking
```
ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o crush_map
```
Now you can edit the `crush_map` file, recompile it, and inject it back into the cluster:
```
crushtool -c crush_map -o /tmp/crushmap
ceph osd setcrushmap -i /tmp/crushmap
```
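The recompiled map can also be sanity-checked offline with crushtool's test mode before being injected (a sketch, the rule id and replica count are examples to adapt):
```
# dry-run the compiled map: show how placements would behave for rule 0 with 3 replicas
crushtool -i /tmp/crushmap --test --show-statistics --rule 0 --num-rep 3
```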
## Inconsistent PGs
```
rados list-inconsistent-pg {pool}
```
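Once an inconsistent PG has been identified, the affected objects can be listed and the PG repaired (replace the placeholder with the reported PG id):
```
# inspect the inconsistent objects, then ask Ceph to repair the PG
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>
```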
## Slow ops
```
ceph daemon mon.cs-001 ops
```
Find OSD failures among the reported ops:
```
ceph daemon mon.cs-001 ops | grep osd_failure
"description": "osd_failure(failed timeout osd.130 [v2:131.154.128.179:6876/13353,v1:131.154.128.179:6882/13353] for 24sec e76448 v76448)",
"description": "osd_failure(failed timeout osd.166 [v2:131.154.128.199:6937/13430,v1:131.154.128.199:6959/13430] for 24sec e76448 v76448)",
"description": "osd_failure(failed timeout osd.175 [v2:131.154.128.199:6924/13274,v1:131.154.128.199:6933/13274] for 24sec e76448 v76448)",
```
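The OSDs named in those failure reports can then be inspected on their hosts through the admin socket (standard daemon commands, osd.130 taken from the output above):
```
# look at in-flight and recent slow operations of the reported OSD
ceph daemon osd.130 dump_ops_in_flight
ceph daemon osd.130 dump_historic_ops
```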