[gpfsug-discuss] gpfs waiters debugging

Frederick Stock stockf at us.ibm.com
Tue Jun 6 17:54:06 BST 2017


On recent releases you can accomplish the same with the command
"mmlsnode -N waiters -L".
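
If you also want those sorted by wait time, something like this might work
(an untested sketch; it assumes the wait time in seconds lands in the third
whitespace-delimited column of the -L output, which may differ between
releases - check yours first):

# Sort cluster-wide waiters by wait time, longest first (field position
# is an assumption about the -L output format).
/usr/lpp/mmfs/bin/mmlsnode -N waiters -L | sort -n -r -k 3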

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com



From:   valdis.kletnieks at vt.edu
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   06/06/2017 12:46 PM
Subject:        Re: [gpfsug-discuss] gpfs waiters debugging
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said:
> oh sure, i meant waiters that last > 300 seconds or so (something that
> could trigger a deadlock). obviously we're not interested in debugging
> the short ones; it's not that gpfs doesn't work or anything ;)

At least at one time, a lot of the mm(whatever) administrative commands
would leave one dangling waiter for the duration of the command, which
could be a while if the command was mmdeldisk or mmrestripefs. I admit I
haven't specifically checked gpfs 4.2, but it was true for 3.2 through
4.1....

And my addition to the collective debugging knowledge: a bash one-liner
to dump all the waiters across a cluster, sorted by wait time. Note that
our clusters tend to be 5-8 servers; this may be painful for those of you
who have 400+ node clusters. :)

#!/bin/bash
# For every node in the cluster, dump its waiters over ssh, prefix each
# line with the node name, then sort the lot by wait time (descending).
for i in $(mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'); do
    ssh "$i" /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"
done | sort -n -r -k 3 -t' '
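
A variant that keeps only the long waiters Stijn mentioned above (a sketch,
not a tested recipe: like the sort above, it assumes the wait time in
seconds is the third whitespace-delimited field once the node name is
prefixed, and the 300-second threshold is arbitrary):

for i in $(mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'); do
    ssh "$i" /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"
done | awk '$3 > 300' | sort -n -r -k 3 -t' '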

We've found it useful - if you have 1 waiter on one node that's 1278
seconds old, and 3 other nodes have waiters that are 1275 seconds old,
there's a good chance the other 3 nodes' waiters are waiting on the first
node's waiter to resolve itself....



