[gpfsug-discuss] wait for mount during gpfs startup

Ulrich Sibiller u.sibiller at science-computing.de
Tue Apr 28 11:57:48 BST 2020


Hi,

when the gpfs systemd service returns from startup, the filesystems are usually not yet mounted. So
having another service depend on gpfs is not feasible if that service requires the filesystem(s).

Therefore we have added a script to the systemd gpfs service that waits for all local gpfs
filesystems to be mounted. We have added that script via an ExecStartPost drop-in:

------------------------------------------------------------
# cat /etc/systemd/system/gpfs.service.d/waitmount.conf
[Service]
ExecStartPost=/usr/local/sc-gpfs/sbin/wait-for-all_local-mounts.sh
TimeoutStartSec=200
-------------------------------------------------------------
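
With the drop-in in place, gpfs.service only reaches the "started" state once the wait script has
exited successfully, so a consumer service can simply order itself behind it with Requires=/After=.
A minimal sketch of such a consumer (the unit name, binary, and mount point are made up for
illustration):
-------------------------------------------------------------
# cat /etc/systemd/system/my-consumer.service
[Unit]
Description=Example service that needs a local GPFS filesystem
Requires=gpfs.service
After=gpfs.service

[Service]
# binary and data directory are placeholders
ExecStart=/usr/local/bin/my-consumer --datadir /gpfs/fs1
-------------------------------------------------------------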

The script itself is not doing much:
-------------------------------------------------------------
#!/bin/bash
#
# wait until all _local_ gpfs filesystems are mounted. It ignores
# filesystems where mmlsfs -A does not report "yes".
#
# returns 0 if all fs are mounted (or none are found in gpfs configuration)
# returns non-0 otherwise

# wait for max. TIMEOUT seconds
TIMEOUT=180

# The leading space is required: the removal below (FS=${FS/ $fs/}) relies
# on every entry having one. "exit ${PIPESTATUS[0]}" propagates the exit
# status of mmlsfs out of the command substitution, so $? below reflects
# mmlsfs rather than xargs.
FS=" $(/usr/lpp/mmfs/bin/mmlsfs all_local -Y 2>/dev/null \
      | grep :automaticMountOption:yes: | cut -d: -f7 | xargs; exit ${PIPESTATUS[0]})"
# RC=1 and no output means there are no such filesystems configured in GPFS
[ $? -eq 1 ] && [ "$FS" = " " ] && exit 0

# uncomment this line for testing
#FS="$FS gpfsdummy"

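# Poll loop: each pass removes from FS the first filesystem that findmnt
# already reports as mounted and restarts the outer loop ("continue 2")
# without sleeping; we only sleep when a full pass found nothing newly
# mounted.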
while [ $TIMEOUT -gt 0 ]; do
     for fs in ${FS}; do
         if findmnt -n "$fs" &>/dev/null; then
             FS=${FS/ $fs/}
             continue 2;
         fi
     done
     [ -z "${FS// /}" ] && break
     (( TIMEOUT -= 5 ))
     sleep 5
done

if [ -z "${FS// /}" ]; then
     exit 0
else
     echo >&2 "ERROR: filesystem(s) not found in time:${FS}"
     exit 2
fi
--------------------------------------------------
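
To check that the drop-in is actually picked up and to dry-run the wait logic, something like this
can be used (systemctl cat prints the unit file together with all of its drop-ins):
--------------------------------------------------
# show the merged unit, including waitmount.conf
systemctl cat gpfs.service

# run the wait script by hand and print its exit code
/usr/local/sc-gpfs/sbin/wait-for-all_local-mounts.sh; echo "rc=$?"
--------------------------------------------------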


This works without problems on _most_ of our clusters. However, not on all: some of them show what I
believe is a race condition and fail to start up after a reboot:

----------------------------------------------------------------------
# journalctl -u gpfs
-- Logs begin at Fri 2020-04-24 17:11:26 CEST, end at Tue 2020-04-28 12:47:34 CEST. --
Apr 24 17:12:13 myhost systemd[1]: Starting General Parallel File System...
Apr 24 17:12:17 myhost mmfs[5720]: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg.
Apr 24 17:13:44 myhost systemd[1]: gpfs.service start-post operation timed out. Stopping.
Apr 24 17:13:44 myhost mmremote[8966]: Shutting down!
Apr 24 17:13:48 myhost mmremote[8966]: Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra
Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfs26
Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfslinux
Apr 24 17:13:48 myhost systemd[1]: Failed to start General Parallel File System.
Apr 24 17:13:48 myhost systemd[1]: Unit gpfs.service entered failed state.
Apr 24 17:13:48 myhost systemd[1]: gpfs.service failed.
----------------------------------------------------------------------

The mmfs.log shows a bit more:
----------------------------------------------------------------------
# less /var/adm/ras/mmfs.log.previous
2020-04-24_17:12:14.609+0200: runmmfs starting (4254)
2020-04-24_17:12:14.622+0200: [I] Removing old /var/adm/ras/mmfs.log.* files:
2020-04-24_17:12:14.658+0200: runmmfs: [I] Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra
2020-04-24_17:12:14.692+0200: runmmfs: [I] Unloading module mmfs26
2020-04-24_17:12:14.901+0200: runmmfs: [I] Unloading module mmfslinux
2020-04-24_17:12:15.018+0200: runmmfs: [I] Unloading module tracedev
2020-04-24_17:12:15.057+0200: runmmfs: [I] Loading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra
Module                  Size  Used by
mmfs26               2657452  0
mmfslinux             809734  1 mmfs26
tracedev               48618  2 mmfs26,mmfslinux


2020-04-24_17:12:16.720+0200: Node rebooted.  Starting mmautoload...
2020-04-24_17:12:17.011+0200: [I] This node has a valid standard license
2020-04-24_17:12:17.011+0200: [I] Initializing the fast condition variables at 0x5561DFC365C0 ...
2020-04-24_17:12:17.011+0200: [I] mmfsd initializing. {Version: 5.0.4.2   Built: Jan 27 2020 12:13:06} ...
2020-04-24_17:12:17.011+0200: [I] Cleaning old shared memory ...
2020-04-24_17:12:17.012+0200: [I] First pass parsing mmfs.cfg ...
2020-04-24_17:12:17.013+0200: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg.

2020-04-24_17:12:20.667+0200: mmautoload: Starting GPFS ...
2020-04-24_17:13:44.846+0200: mmremote: Initiating GPFS shutdown ...
2020-04-24_17:13:47.861+0200: mmremote: Starting the mmsdrserv daemon ...
2020-04-24_17:13:47.955+0200: mmremote: Unloading GPFS kernel modules ...
2020-04-24_17:13:48.165+0200: mmremote: Completing GPFS shutdown ...
--------------------------------------------------------------------------

Starting the gpfs service again manually then works without problems. Interestingly, the missing
mmfs.cfg _is there_ after the shutdown; it gets created shortly after the failure (the Modify
timestamp below, 17:12:17.617, lies about 600 ms after the failed open at 17:12:17.013). That's why
I am assuming a race condition:
--------------------------------------------------------------------------
# stat /var/mmfs/gen/mmfs.cfg
   File: ‘/var/mmfs/gen/mmfs.cfg’
   Size: 408             Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 268998265   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:var_t:s0
Access: 2020-04-27 17:12:19.801060073 +0200
Modify: 2020-04-24 17:12:17.617823441 +0200
Change: 2020-04-24 17:12:17.659823405 +0200
  Birth: -
--------------------------------------------------------------------------


Now, the interesting part:
- removing the ExecStartPost script makes the issue vanish; a reboot then always starts gpfs
successfully
- reducing the ExecStartPost script to a single line ("exit 0") makes the issue stay; gpfs startup
always fails
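
Since even a bare "exit 0" triggers the failure, one workaround would be to take the wait out of
gpfs.service entirely and put it into a separate oneshot unit that consumers order themselves
behind instead. A rough sketch of what I have in mind (the unit name gpfs-wait.service is made up):
-------------------------------------------------------------
# cat /etc/systemd/system/gpfs-wait.service
[Unit]
Description=Wait for all local GPFS filesystems to be mounted
Requires=gpfs.service
After=gpfs.service

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStartSec=200
ExecStart=/usr/local/sc-gpfs/sbin/wait-for-all_local-mounts.sh
-------------------------------------------------------------
But I would still prefer to understand why an ExecStartPost on gpfs.service itself breaks the
startup.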

Unfortunately IBM is refusing support because "the script is not coming with gpfs".

So I am searching for a solution that makes the script work on those servers again, or for a better
way to wait until all local gpfs mounts are ready. Has anyone written something like that already?

Thank you,

Uli
-- 
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management:
Dr. Martin Matzke
Vorstand/Board of Management:
Matthias Schempp, Sabine Hohenstein
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Aufsichtsrat/Supervisory Board:
Martin Wibbe, Ursula Morgenstern
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196

