[gpfsug-discuss] Unkillable snapshots

Simon Thompson S.J.Thompson at bham.ac.uk
Thu Feb 20 20:13:14 GMT 2020


Hmm ... mmdiag --tokenmgr shows:


    Server stats: requests 195417431 ServerSideRevokes 120140
           nTokens 2146923 nranges 4124507
           designated mnode appointed 55481 mnode thrashing detected 1036
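
If you want to track these counters over time, for example to see whether the thrashing figure keeps climbing while the snapshot delete retries, here is a minimal sketch that pulls them out of the output. It assumes, based only on the three lines quoted above, that the "Server stats" block is a run of text labels each followed by an integer. The function name and SAMPLE text are illustrative, and it makes no claim about what each counter means.

```python
import re

# Sample text copied from the mmdiag --tokenmgr output quoted above.
SAMPLE = """\
    Server stats: requests 195417431 ServerSideRevokes 120140
           nTokens 2146923 nranges 4124507
           designated mnode appointed 55481 mnode thrashing detected 1036
"""

# Assumed format: a label (one or more words, no digits, no newlines)
# followed by whitespace and an integer value.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?)\s+(\d+)")


def parse_server_stats(text: str) -> dict:
    """Return {label: value} for each 'label number' pair after 'Server stats:'."""
    body = text.split("Server stats:", 1)[-1]
    return {label.strip(): int(value) for label, value in PAIR.findall(body)}


if __name__ == "__main__":
    stats = parse_server_stats(SAMPLE)
    print(stats["mnode thrashing detected"])  # prints 1036
```

Running it against successive captures and diffing the dictionaries would show which counters move between attempts.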


So how do I convert "1036" to a node?


Simon

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson <S.J.Thompson at bham.ac.uk>
Sent: 20 February 2020 19:45:02
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Unkillable snapshots


Hi,


We have a snapshot which is stuck in the state "DeleteRequired". When we try to delete it, it goes through the motions but eventually gives up with:

Unable to quiesce all nodes; some processes are busy or holding required resources.
mmdelsnapshot: Command failed. Examine previous error messages to determine cause.


In the mmfs.log on the FS manager there are a bunch of retries and "failure to quiesce" messages for various nodes. However, it's never the same set of nodes from one retry to the next. I suspect we have one HPC job somewhere killing us.
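
Because each retry names a different set of nodes, one way to narrow things down is to count how often each node appears across retries. A node that turns up in every retry, or in most of them, is a reasonable first suspect. This is a minimal sketch that assumes you have already transcribed the node names from each retry in the log. The log line format isn't shown here, so no parsing is attempted, and the node names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical per-retry sets of nodes that failed to quiesce, as you might
# transcribe them from the FS manager's log. The names are made up.
RETRIES = [
    {"node01", "node07", "node12"},
    {"node03", "node07", "node19"},
    {"node07", "node12", "node22"},
]


def rank_suspects(retries):
    """Return (node, retry_count) pairs, most frequently seen first."""
    counts = Counter(node for failed in retries for node in failed)
    return counts.most_common()


def always_present(retries):
    """Return the nodes that appear in every retry (empty set if none)."""
    if not retries:
        return set()
    return set.intersection(*retries)


if __name__ == "__main__":
    print(rank_suspects(RETRIES)[0])  # prints ('node07', 3)
    print(always_present(RETRIES))    # prints {'node07'}
```

If the intersection comes back empty, the ranked counts still show which nodes were most often involved.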


What's interesting is that we can delete other snapshots OK; the problem appears to be confined to one particular fileset.


My old go-to "mmfsadm dump tscomm" isn't showing any particular node, and the waiters just tend to point to the FS manager node.


So ... any suggestions? I'm assuming it's some workload holding a lock open or some such, but tracking it down is proving elusive!


In general the FS is also "lumpy": at times it feels like using a terminal over Wi-Fi on a train. I guess it's all related, though.


Thanks


Simon