[gpfsug-discuss] Odd behavior - GPFS failed to start after initial node add
Oesterlin, Robert
Robert.Oesterlin at nuance.com
Mon Jun 5 16:54:09 BST 2017
Our node build process re-adds a node to the cluster and then does a “service gpfs start”, but GPFS doesn’t start. From the build log:
+ ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com'
+ rc=0
+ chkconfig gpfs on
+ service gpfs start
The “service gpfs start” command hangs and never seems to return.
If I look at the process tree:
[root@cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs"
11715 ? S 0:00 /bin/bash ./nrgX_gpfs_post
12191 ? Ssl 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no
12208 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
12271 ? S 0:00 /bin/sh /sbin/service gpfs start
12276 ? S 0:00 /bin/sh /etc/init.d/gpfs start
12278 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot
12292 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot
12293 ? S 0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num
12294 ? S 0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$//
21639 ? S 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
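One detail in the listing above: the grep (PID 12293) still has the glob /var/mmfs/gen/nodeFiles/*.num unexpanded in its argument list, and that glob is its only non-option argument. In that form grep treats the argument as the pattern and reads standard input, so it blocks for as long as stdin stays open. Whether this is really what mmautoload is waiting on is an assumption, not something the logs confirm. A small sketch of the general grep behavior, using a scratch directory instead of the real GPFS paths and a made-up function name, and assuming coreutils timeout and mktemp are available:

```shell
#!/bin/sh
# Hypothetical demo, not GPFS code: shows that grep waits on stdin when an
# unmatched glob leaves it with a single non-option argument.

# Returns the exit status reported by timeout for the grep call
# (124 means grep was still waiting for input when the limit expired).
grep_with_unmatched_glob() {
    dir=$(mktemp -d)    # empty scratch directory, so the glob matches nothing
    # The unmatched glob is passed to grep literally and becomes the PATTERN;
    # with no file operand grep reads stdin, which "sleep 2" holds open.
    status=$(sleep 2 | { timeout 1 grep -lw "$dir"/*.num >/dev/null && echo 0 || echo $?; })
    rmdir "$dir"
    return "$status"
}
```

Calling grep_with_unmatched_glob and then inspecting $? shows whether grep had to be killed by the timeout rather than exiting on its own.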
This is GPFS 4.2.2-1
This seems to occur only on the initial startup after the build; if I try to start GPFS again, it works just fine. Any ideas on what it is sitting there waiting for? There is nothing in mmfslog (the file does not exist).
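Since a second start succeeds, one possible stopgap for the build script is to bound the first attempt with a time limit and retry. A minimal sketch, assuming coreutils timeout is available on the node; the function name, the 120-second limit, and the retry count are made up for illustration and have not been tested against GPFS:

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit, retrying if it
# fails or is killed by the timeout.
# usage: run_with_retry <seconds> <attempts> <command> [args...]
run_with_retry() {
    limit=$1
    attempts=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # timeout exits non-zero if the command fails or exceeds the limit
        if timeout "$limit" "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed or timed out: $*" >&2
        n=$((n + 1))
    done
    return 1
}

# In the build script this could stand in for the bare "service gpfs start":
#   run_with_retry 120 2 service gpfs start
```

This only works around the hang; it does not explain it, and a timed-out attempt may leave child processes of the init script behind.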
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
507-269-0413