[gpfsug-discuss] Executing Callbacks on other Nodes

Marc A Kaplan makaplan at us.ibm.com
Tue Apr 12 23:01:40 BST 2016


My understanding is (someone will correct me if I'm wrong) ...

GPFS does not have true deadlock detection.  As you say, it has timeouts. 
The argument is:  As a practical matter, it makes little difference to a 
sysadmin or user.  If things are gummed up "too long" they start to smell 
like a deadlock, so we may as well intervene as though there were a true 
technical deadlock.

A genuine deadlock is a situation where things are gummed up, there 
is no progress, and one can prove that there will be no progress, no 
matter how long one waits.
E.g. Classically, you have locked resource A and I have locked resource B. 
Now I decide I need resource A and I wait indefinitely for it, while you 
decide you need resource B and you wait indefinitely for that.  We are 
then deadlocked.  Deadlock can occur on a single node or over multiple 
nodes. 
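
The classic case above can be pictured as a "wait-for" graph: each blocked 
party points at whoever holds the resource it wants, and a true deadlock 
is a cycle in that graph.  A minimal Python sketch, purely illustrative 
(find_wait_cycle and the dictionary representation are hypothetical names 
for this example, not anything GPFS provides):

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    waits_for maps each blocked party to the single party that holds
    the resource it is waiting on (an assumption made for simplicity).
    """
    for start in waits_for:
        path: List[str] = []
        seen_at: Dict[str, int] = {}
        node = start
        # Follow the chain of "X waits for Y" until it ends or repeats.
        while node in waits_for and node not in seen_at:
            seen_at[node] = len(path)
            path.append(node)
            node = waits_for[node]
        if node in seen_at:
            # The chain came back to a party already on the path: a cycle.
            return path[seen_at[node]:]
    return None


# I hold B and wait for you (A); you hold A and wait for me (B).
deadlocked = {"me": "you", "you": "me"}
# I wait for you, but you are not waiting on anyone: slow, not deadlocked.
just_slow = {"me": "you"}

print(find_wait_cycle(deadlocked))
print(find_wait_cycle(just_slow))
```

Across multiple nodes the hard part is assembling a consistent view of 
who waits for whom, which this single-process sketch does not attempt.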

Technically it may be possible to execute a deadlock detection protocol 
that would identify cyclic, deadlocking dependencies, but it was decided 
that, for GPFS, it would be more practical to detect "very long 
waiters"...
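
By contrast, the "very long waiters" approach needs no graph at all, only 
a start time per waiter and a threshold.  A rough Python sketch of the 
idea, under stated assumptions: the Waiter record, the long_waiters 
function, and the 300-second threshold are all hypothetical and chosen 
for illustration; they are not GPFS internals or defaults.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Waiter:
    name: str        # hypothetical label for the waiting thread or RPC
    started: float   # time.monotonic() value when the wait began


def long_waiters(waiters: List[Waiter],
                 threshold_s: float,
                 now: Optional[float] = None) -> List[Waiter]:
    """Return the waiters that have been waiting longer than threshold_s."""
    if now is None:
        now = time.monotonic()
    return [w for w in waiters if now - w.started > threshold_s]


# Two waiters observed at t=310s: one began at t=0, one at t=290.
observed = [Waiter("rpc-1", 0.0), Waiter("rpc-2", 290.0)]
suspects = long_waiters(observed, threshold_s=300.0, now=310.0)
print([w.name for w in suspects])
```

Note that this flags anything slow, whether or not a real cycle exists, 
which is exactly the trade-off described above.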




From:   "Oesterlin, Robert" <Robert.Oesterlin at nuance.com>


Some general thoughts on “deadlocks” and automated deadlock detection.

I personally don’t like the term “deadlock” as it implies a condition that 
won’t ever resolve itself. In GPFS terms, a deadlock is really a “long RPC 
waiter” over a certain threshold. RPCs that wait on certain events can and 
do occur, and they can take some time to complete. This is not necessarily 
a problem, but you should be looking into such waiters.

GPFS does have automated deadlock detection and collection, but in the 
early releases it was … well, not very “robust”. With later releases 
(4.2) it’s MUCH better. I personally don’t rely on it because in larger 
clusters it can be too aggressive and, depending on what’s really going on, 
it can make things worse. This statement is my opinion and it doesn’t mean 
it’s not a good thing to have. :-) 


...
