[gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected

Billich Heinrich Rainer (ID SD) heinrich.billich at id.ethz.ch
Thu Apr 16 09:16:59 BST 2020


Hello,

I’m puzzled about the difference between the two mmhealth events

longwaiters_found    ERROR      Detected Spectrum Scale long-waiters

and

deadlock_detected    WARNING    The cluster detected a Spectrum Scale filesystem deadlock

In particular, why does the latter have level WARNING only, while the first has level ERROR? longwaiters_found is based on the output of ‘mmdiag --deadlock’ and occurs much more often on our clusters, while the latter is probably triggered by an external event rather than by an internal mmsysmon check. Is deadlock detection handled by mmfsd? Whenever a deadlock is detected, some debug data is collected, which is not the case for longwaiters_found. So why is no deadlock detected whenever ‘mmdiag --deadlock’ shows waiting threads? Shouldn’t the severities be the other way round?

Finally: can we trigger some debug data collection whenever a longwaiters_found event happens? Just getting the output of ‘mmdiag --deadlock’ on the affected node could give some hints. Without it, I don’t see any real chance to take any action.
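
To make the request concrete, here is a minimal sketch of the kind of per-node collection meant above. It only runs a command and stores its output in a timestamped file. The only part taken from this post is the command ‘mmdiag --deadlock’. The function name snapshot, the output directory, the file-name pattern and the binary path /usr/lpp/mmfs/bin/mmdiag are assumptions for illustration. How the script would be triggered by a longwaiters_found event is left open, since that is exactly the question.

```python
import datetime
import pathlib
import subprocess

# Assumption: GPFS binaries live in the usual install directory.
MMDIAG = "/usr/lpp/mmfs/bin/mmdiag"


def snapshot(cmd, outdir, now=None):
    """Run cmd and save its combined stdout/stderr to a timestamped file.

    Hypothetical helper, not part of Spectrum Scale. Returns the path of
    the file that was written. Raises FileNotFoundError if the command
    binary does not exist on this node.
    """
    now = now or datetime.datetime.now()
    outdir = pathlib.Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    target = outdir / f"longwaiters-{now:%Y%m%d-%H%M%S}.txt"
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error messages next to the output
        text=True,
        check=False,  # save the output even if the command exits non-zero
    )
    target.write_text(result.stdout)
    return target


if __name__ == "__main__":
    # Assumed output directory; pick any local path with enough space.
    print(snapshot([MMDIAG, "--deadlock"], "/var/tmp/longwaiters"))
```

Run on the node that reports the event, this would leave one small text file per invocation, which could then be compared with the time stamps of the mmhealth events.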

Thank you,

Heiner
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================




