Troubleshooting GlusterFS Pods

In some scenarios, after a node restarts, GlusterFS fails to remount its volumes.

If this happens to you, the steps below can help you identify the problem and then restore the pod to its expected behavior.

To troubleshoot your GlusterFS pods:

  1. In your CLI, run the following command, replacing the pod name with that of your failing pod:
    kubectl describe po heketi-745766fcd4-9jnnq
    In the Events section of the description, you will see an error that looks like this:
    Warning FailedMount 100s kubelet, node3 MountVolume.SetUp failed for volume "db" : mount failed: mount failed: exit status 1
    Mounting command: systemd-run
    Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/e3a66223-ee75-496f-962f-093485dfedaf/volumes/kubernetes.io~glusterfs/db --scope -- mount -t glusterfs -o auto_unmount,backup-volfile-servers=10.16.29.6:10.16.29.7:10.16.29.8,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/db/heketi-745766fcd4-9jnnq-glusterfs.log,log-level=ERROR 10.16.29.8:heketidbstorage /var/lib/kubelet/pods/e3a66223-ee75-496f-962f-093485dfedaf/volumes/kubernetes.io~glusterfs/db
    Output: Running scope as unit: run-r576221f452954352bc8b520b277c2bc6.scope
    [2019-09-07 07:14:00.137545] E [glusterfsd.c:795:gf_remember_backup_volfile_server] 0-glusterfs: failed to set volfile server: File exists
    Mount failed. Please check the log file for more details.
    This error indicates that your GlusterFS volume could not be mounted because another volume is already running on the server.
  2. Run the command below to check the status of your GlusterFS volume:
    gluster volume status heketidbstorage 
    If your volume is running as expected, you should see an output like this:
    Status of volume: heketidbstorage
    Gluster process TCP Port RDMA Port Online Pid
    ------------------------------------------------------------------------------
    Brick 10.50.6.149:/var/lib/heketi/mounts/vg
    _4a5d18544111232fc76cdc9872d340d6/brick_75c
    ec7af846fbf2e770b9312c6bc56fe/brick 49152 0 Y 198
    Brick 10.50.6.151:/var/lib/heketi/mounts/vg
    _3cf3e12449047bba9b1260d301914187/brick_3ea
    27507e65df9ec9a4c0841d27962f9/brick 49153 0 Y 198
    Brick 10.50.6.150:/var/lib/heketi/mounts/vg
    _78c5285b77bde0e7ed344df72ef5c630/brick_d54
    3d1ea6e83036e7796a89b1594f4cc/brick 49152 0 Y 186
    Self-heal Daemon on localhost N/A N/A Y 177
    Self-heal Daemon on 10.50.6.151 N/A N/A Y 183
    Self-heal Daemon on node1 N/A N/A Y 189
  3. If none of the bricks are online, restart the Heketi storage with the commands below; this should resolve the issue:
    gluster volume stop heketidbstorage
    gluster volume start heketidbstorage
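The mount error in step 1 names a per-pod GlusterFS client log (the log-file= mount option). A minimal helper for pulling only the error-level entries out of such a log, assuming the standard GlusterFS log line format shown above (timestamp, then a severity letter such as E); `gluster_errors` is a hypothetical name:

```shell
# gluster_errors: print only error-level ("E") entries from a GlusterFS
# client log file passed as the first argument.
gluster_errors() {
  grep ' E \[' "$1"
}

# Usage (log path taken from the event message in step 1; substitute
# the log-file path from your own pod's mount error):
# gluster_errors /var/lib/kubelet/plugins/kubernetes.io/glusterfs/db/heketi-745766fcd4-9jnnq-glusterfs.log
```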
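Rather than reading the wrapped status table in step 2 by eye, offline bricks can be picked out with a short filter. This is a sketch that assumes the column layout shown above (TCP Port, RDMA Port, Online, Pid) and that long brick paths wrap across lines as in the sample output; `offline_bricks` is a hypothetical helper name:

```shell
# offline_bricks: read `gluster volume status <vol>` output on stdin and
# print the full path of every brick whose Online column is "N".
offline_bricks() {
  awk '
    /^Brick / {
      brick = $2
      if (NF >= 6) {              # unwrapped entry: status on same line
        if ($(NF-1) == "N") print brick
        brick = ""
      }
      next
    }
    brick != "" {
      brick = brick $1            # append wrapped path fragment
      if (NF >= 4) {              # status columns: TCP RDMA Online Pid
        if ($(NF-1) == "N") print brick
        brick = ""
      }
    }'
}

# Usage:
# gluster volume status heketidbstorage | offline_bricks
```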
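The stop/start sequence in step 3 can be wrapped in a small helper so the start only runs if the stop succeeded. A sketch, assuming the gluster CLI is on the node's PATH; `--mode=script` makes the CLI answer the `volume stop` confirmation prompt automatically, and `restart_volume` is a hypothetical name:

```shell
# restart_volume: stop and then start a Gluster volume non-interactively.
# Aborts (non-zero return) if the stop step fails.
restart_volume() {
  vol="$1"
  gluster --mode=script volume stop "$vol" || return 1
  gluster --mode=script volume start "$vol"
}

# Usage:
# restart_volume heketidbstorage
```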
