[gpfsug-discuss] gpfs waiters debugging
Frederick Stock
stockf at us.ibm.com
Tue Jun 6 17:54:06 BST 2017
On recent releases you can accomplish the same with the command "mmlsnode -N waiters -L".
Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com
From: valdis.kletnieks at vt.edu
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 06/06/2017 12:46 PM
Subject: Re: [gpfsug-discuss] gpfs waiters debugging
Sent by: gpfsug-discuss-bounces at spectrumscale.org
On Tue, 06 Jun 2017 15:06:57 +0200, Stijn De Weirdt said:
> oh sure, i meant waiters that last > 300 seconds or so (something that
> could trigger deadlock). obviously we're not interested in debugging the
> short ones, it's not that gpfs doesn't work or anything ;)
At least at one time, a lot of the mm(whatever) administrative commands
would leave one dangling waiter for the duration of the command - which
could be a while if the command was mmdeldisk or mmrestripefs. I admit I
have not specifically checked GPFS 4.2, but it was true for 3.2 through 4.1....
And my addition to the collective debugging knowledge: a bash one-liner
to dump all the waiters across a cluster, sorted by wait time. Note that
our clusters tend to be 5-8 servers; this may be painful for those of you
who have 400+ node clusters. :)
#!/bin/bash
# Get the node list from mmlsnode (strip the leading cluster-name column),
# collect each node's waiters with the hostname prepended, then sort
# everything by the wait time in field 3, oldest waiter first.
for i in $(mmlsnode | tail -1 | sed 's/^[ ]*[^ ]*[ ]*//'); do
    ssh $i /usr/lpp/mmfs/bin/mmfsadm dump waiters | sed "s/^/$i /"
done | sort -n -r -k 3 -t' '
We've found it useful - if you have one waiter on one node that's 1278
seconds old, and 3 other nodes have waiters that are 1275 seconds old,
there's a good chance the other 3 nodes' waiters are waiting on the first
node's waiter to resolve itself....
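For readers without a cluster handy, the sort step can be tried on its own. This is a minimal sketch using fabricated waiter lines (the node names, times, and thread IDs are invented; real "mmfsadm dump waiters" output has more detail), each already prefixed with its hostname the way the one-liner's sed does:

```shell
#!/bin/sh
# Fabricated sample input: hostname, then a waiter line whose third
# space-separated field is the wait time in seconds.
# Sorting numerically, in reverse, on field 3 puts the oldest waiter
# first - node1 at 1278s, then the two 1275s waiters likely stuck
# behind it, then the harmless sub-second one.
sort -n -r -k 3 -t' ' <<'EOF'
node2 Waiting 1275.0012 sec since 17:31:05, monitored, thread 12345
node1 Waiting 1278.4431 sec since 17:31:02, monitored, thread 23456
node3 Waiting 1275.0034 sec since 17:31:05, monitored, thread 34567
node4 Waiting 0.0013 sec since 17:52:20, monitored, thread 45678
EOF
```

The first line of the output names the node whose waiter is oldest, which is the one worth investigating first.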
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss