[gpfsug-discuss] Unkillable snapshots

Simon Thompson S.J.Thompson at bham.ac.uk
Thu Feb 20 19:45:02 GMT 2020


Hi,


We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with:

Unable to quiesce all nodes; some processes are busy or holding required resources.
mmdelsnapshot: Command failed. Examine previous error messages to determine cause.


And in the mmfslog on the FS manager there are a bunch of retries and "failure to quiesce" messages for various nodes. However, in each retry it's never the same set of nodes. I suspect we have one HPC job somewhere killing us.
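
One way to check whether any node recurs across the retries is to tally them. This is only a minimal Python sketch: it assumes the failing node names for each retry have already been copied out of the log by hand, and the node names and the helper name tally_failing_nodes are hypothetical.

```python
from collections import Counter


def tally_failing_nodes(retries):
    """Count in how many retries each node failed to quiesce.

    `retries` is a list with one entry per retry; each entry is an
    iterable of node names reported as failing in that retry.
    """
    counts = Counter()
    for nodes in retries:
        # De-duplicate within a retry so a node counts once per retry.
        counts.update(sorted(set(nodes)))
    # Most frequently failing nodes first.
    return counts.most_common()


# Hypothetical node names, standing in for three retries from the log.
retries = [
    ["node01", "node07", "node12"],
    ["node03", "node07"],
    ["node07", "node12", "node20"],
]

for node, n in tally_failing_nodes(retries):
    print(f"{node}: failed in {n} of {len(retries)} retries")
```

A node that shows up in most or all retries would be the first place to look; if nothing recurs, the culprit is probably not a single node.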


What's interesting is that we can delete other snapshots OK; the problem appears to be limited to one particular fileset.


My old go-to, "mmfsadm dump tscomm", isn't showing any particular node, and the waiters just tend to point to the FS manager node.


So ... any suggestions? I'm assuming it's some workload holding a lock open or some such, but tracking it down is proving elusive!


Generally the FS is also "lumpy" ... at times it feels like using a terminal over a wifi connection on a train. I guess it's all related, though.


Thanks


Simon