# Manual Maintenance
    
    ## Clean OSD removal
```
ceph osd safe-to-destroy osd.<ID>

ceph osd out <ID>
systemctl stop ceph-osd@<ID>
ceph osd crush remove osd.<ID>
ceph osd down <ID>
ceph auth del osd.<ID>
ceph osd rm <ID>
```
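When several OSDs have to be drained from the same host, the sequence above can be wrapped in a small script. This is only a sketch, assuming the numeric OSD id is passed as the first argument and a package-based deployment with `ceph-osd@<id>` units; adapt it to your setup:
```
#!/usr/bin/env bash
# Sketch: clean removal of a single OSD whose id is passed as $1.
set -euo pipefail
ID="$1"

# Abort if Ceph does not consider the OSD safe to destroy.
ceph osd safe-to-destroy "osd.${ID}"

ceph osd out "${ID}"
systemctl stop "ceph-osd@${ID}"
ceph osd crush remove "osd.${ID}"
ceph osd down "${ID}"
ceph auth del "osd.${ID}"
ceph osd rm "${ID}"
```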
Remove the logical volumes, volume groups, and physical volumes:
    ```
    lvremove <list of volumes>
    vgremove <list of volume groups>
    pvremove <list of physical volumes>
    ```
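To find out which logical volumes belong to the OSD being removed, the `ceph.osd_id` tag that `ceph-volume` sets on its LVs can be used. A hedged example (the tag assumes a ceph-volume provisioned OSD; `26` is just an illustrative id):
```
# Show LV path, VG and tags, keeping only entries tagged with the OSD id.
lvs -o lv_path,vg_name,lv_tags | grep "ceph.osd_id=26"
```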
    
    ## Clean mds removal
    ```
    systemctl stop ceph-mds@<id>.service
    rm -rf /var/lib/ceph/mds/ceph-<id>
    ceph auth rm mds.<id>
    ```
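Before stopping an MDS, it is worth checking that another daemon or a standby can take over its rank. A hedged check, not part of the original notes:
```
ceph fs status
ceph mds stat
```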
    
    ## Clean mgr removal
```
systemctl stop ceph-mgr@<id>.service
rm -rf /var/lib/ceph/mgr/ceph-<id>
ceph auth rm mgr.<id>
```
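Likewise, before stopping the active manager, check that a standby mgr is available (a hedged check):
```
ceph mgr stat
```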
    
    ## Clean mon removal
    ```
    systemctl stop ceph-mon@<id>.service
    rm -rf /var/lib/ceph/mon/ceph-<id>
    ceph auth rm mon.<id>
    ```
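If the monitor is still listed in the monmap after these steps, it can also be dropped explicitly. This is a hedged addition, not part of the original notes; verify quorum first:
```
# Check quorum before touching the monmap, then remove the monitor from it.
ceph quorum_status
ceph mon remove <id>
```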
    
    
    ## Reattach expelled disk
When a disk is expelled and reattached, the new device name is different, so the OSD fails. To reattach it correctly, a few steps must be followed.
Let's suppose the disk was originally `/dev/sdv`. After reattachment the disk becomes `/dev/sdah`, so the OSD no longer works.
First identify the full SCSI identifier, for example by checking the `/sys/block` folder:
    ```
    sdah -> ../devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
    ```
    or
    ```
    udevadm info --query=path --name=/dev/sdah
    /devices/pci0000:80/0000:80:02.0/0000:82:00.0/host11/port-11:0/expander-11:0/port-11:0:31/end_device-11:0:31/target11:0:31/11:0:31:0/block/sdah
    ```
This is a JBOD disk; to remove it, issue this command:
    ```
    echo 1 >  /sys/block/sdah/device/delete
    ```
Now the device has disappeared.
Before rescanning the SCSI host, tweak the naming with a udev rule. Create `/etc/udev/rules.d/20-disk-rename.rules` with this content:
    ```
    KERNEL=="sd?", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv", RUN+="/usr/bin/logger My disk ATTR{partition}=$ATTR{partition}, DEVPATH=$devpath, ID_PATH=$ENV{ID_PATH}, ID_SERIAL=$ENV{ID_SERIAL}", GOTO="END_20_PERSISTENT_DISK"
    
    KERNEL=="sd?*", ATTR{partition}=="1", SUBSYSTEM=="block", DEVPATH=="*port-11:0:31/end_device-11:0:31*", NAME="sdv%n" RUN+="/usr/bin/logger My partition parent=%p number=%n, ATTR{partition}=$ATTR{partition}"
    LABEL="END_20_PERSISTENT_DISK"
    ```
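Depending on the distribution, udev may need to reload its rules before the rescan for the new rule to take effect (a hedged extra step, not in the original notes):
```
udevadm control --reload-rules
```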
Now, if you rescan the SCSI host, the disk will be recognized again, but the block device name will be forced to `/dev/sdv`:
    ```
    echo "- - -" > /sys/class/scsi_host/host11/scan
    ```
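A quick check that the device really came back under the expected name (hedged; either command will do):
```
ls -l /sys/block/sdv
lsblk /dev/sdv
```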
Now retrieve the OSD IDs:
    ```
    ceph-volume lvm list
    ```
which gives the full information:
    ```
    ====== osd.26 ======
      [block]       /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
    
          block device              /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
          block uuid                rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
          cephx lockbox secret
          cluster fsid              959f6ec8-6e8c-4492-a396-7525a5108a8f
          cluster name              ceph
          crush device class        None
          db device                 /dev/cs-001_journal/sdv_db
          db uuid                   QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
          encrypted                 0
          osd fsid                  aad7b25d-1182-4570-9164-5c3d3a6a61b7
          osd id                    26
          osdspec affinity
          type                      block
          vdo                       0
          wal device                /dev/cs-001_journal/sdv_wal
          wal uuid                  bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
          devices                   /dev/sdv
    
      [db]          /dev/cs-001_journal/sdv_db
    
          block device              /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data
          block uuid                rMNcOq-9Isr-3LJZ-gp6P-tZmi-fcJ0-d0D0Mx
          cephx lockbox secret      
          cluster fsid              959f6ec8-6e8c-4492-a396-7525a5108a8f
          cluster name              ceph
          crush device class        None
          db device                 /dev/cs-001_journal/sdv_db
          db uuid                   QaQmrJ-zdTu-UXZ4-oqt0-hXgM-emKe-fqtOaX
          encrypted                 0
          osd fsid                  aad7b25d-1182-4570-9164-5c3d3a6a61b7
          osd id                    26
          osdspec affinity          
          type                      db
          vdo                       0
          wal device                /dev/cs-001_journal/sdv_wal
          wal uuid                  bjLNLd-0o3q-haDa-eFyv-ILjx-v2yk-YtaHuo
          devices                   /dev/sdb
```
Then activate the OSD again, using the OSD id and OSD fsid reported above:
```
ceph-volume lvm activate --bluestore 26 aad7b25d-1182-4570-9164-5c3d3a6a61b7
    Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-26 
    Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26 
    Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data --path /var/lib/ceph/osd/ceph-26 --no-mon-config 
    Running command: /usr/bin/ln -snf /dev/18-2EH802TV-HGST-HUH728080AL4200/sdv_data /var/lib/ceph/osd/ceph-26/block 
    Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block 
    Running command: /usr/bin/chown -R ceph:ceph /dev/dm-75 
    Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-26 
    Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_db /var/lib/ceph/osd/ceph-26/block.db 
    Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_db 
    Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77 
    Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.db 
    Running command: /usr/bin/chown -R ceph:ceph /dev/dm-77 
    Running command: /usr/bin/ln -snf /dev/cs-001_journal/sdv_wal /var/lib/ceph/osd/ceph-26/block.wal 
    Running command: /usr/bin/chown -h ceph:ceph /dev/cs-001_journal/sdv_wal 
    Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76 
    Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-26/block.wal 
    Running command: /usr/bin/chown -R ceph:ceph /dev/dm-76 
    Running command: /usr/bin/systemctl enable ceph-volume@lvm-26-aad7b25d-1182-4570-9164-5c3d3a6a61b7 
    Running command: /usr/bin/systemctl enable --runtime ceph-osd@26 
    Running command: /usr/bin/systemctl start ceph-osd@26 
    --> ceph-volume lvm activate successful for osd ID: 26 
    ```
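Finally, it is worth confirming that the OSD daemon is running and back in the cluster (a hedged check, not part of the original log):
```
systemctl status ceph-osd@26
ceph osd tree | grep osd.26
ceph -s
```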
    ## OSD map tweaking
    ```
     ceph osd getcrushmap -o /tmp/crushmap
     crushtool -d /tmp/crushmap -o crush_map
    ```
Now you can edit the `crush_map` file, recompile it, and inject it into the cluster:
    ```
     crushtool -c crush_map -o /tmp/crushmap
     ceph osd setcrushmap -i /tmp/crushmap
    ```
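Before injecting a modified map, it can be tested offline with `crushtool`. A hedged example; the rule number and `--num-rep` value are only illustrative:
```
crushtool -i /tmp/crushmap --test --show-statistics --rule 0 --num-rep 3
```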
    
    ## Inconsistent PGs
    ```
    rados list-inconsistent-pg {pool}
    ```
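To drill down into an inconsistent PG reported above and, if appropriate, repair it, the usual follow-up commands are (hedged; run `pg repair` only after understanding the cause of the inconsistency):
```
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>
```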