[gpfsug-discuss] Quota issues, eviction, AFM won't stop throwing data to a full location - probably a rookie AFM mistake?

Jake Carroll jake.carroll at uq.edu.au
Sat Mar 4 20:34:51 GMT 2017


Hi all,

I think I need some help with GPFS quotas and hard limits vs soft limits + eviction in AFM scenarios. We’ve got a couple of issues:

One:
-------
We’ve come across a scenario where, if a user hits the hard quota while ingesting into cache in an AFM “home to cache” relationship whilst an eviction loop is being triggered, things seem to go wrong and the filesystem heads into lock-up territory. The report I have on the last incident is that a file-set got stuck at 100% capacity utilisation, the eviction loop either failed or blocked, and the IO requests blocked and/or failed (this part I'm a little fuzzy on).

Maybe it isn’t a bug. Our guess is that someone on here will tell us the likely “fix” is to right-size our high and low water marks. We did consider a potential bug mechanism or race condition where the eviction loop itself uses space in the file-set (in the .afm directory), but I then thought better of it and thought “Nah, surely IBM would have thought of that!”.
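To make the water mark question concrete, here is a toy model of our guess. It is only a sketch under assumptions and says nothing about how AFM eviction actually works internally: the function name, the per-tick ingest and eviction rates, and the numbers are all made up for illustration. It shows how a high water mark set too close to the hard limit can let ingest reach the limit before eviction frees any space.

```python
def hits_hard_limit(capacity, high_water, low_water,
                    ingest_per_tick, evict_per_tick, ticks=1000):
    """Toy model (hypothetical, not AFM's real algorithm).

    Eviction starts once usage reaches high_water and keeps running
    until usage has fallen to low_water. Returns True if usage ever
    exceeds capacity (the hard limit) within the given number of ticks.
    """
    used = 0
    evicting = False
    for _ in range(ticks):
        # Decide at the start of the tick whether eviction is active.
        if used >= high_water:
            evicting = True
        elif used <= low_water:
            evicting = False
        if evicting:
            used = max(used - evict_per_tick, 0)
        # Ingest keeps arriving whether or not eviction is running.
        used += ingest_per_tick
        if used > capacity:
            return True
    return False


# Same ingest and eviction rates, different high water marks.
tight = hits_hard_limit(100, high_water=95, low_water=50,
                        ingest_per_tick=15, evict_per_tick=20)
roomy = hits_hard_limit(100, high_water=80, low_water=50,
                        ingest_per_tick=15, evict_per_tick=20)
print(tight, roomy)
```

In this model the only difference between the two runs is the headroom between the high water mark and the hard limit, which has to be at least as large as what can be ingested before eviction kicks in.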

Two:
-------
We have witnessed a scenario where AFM doesn't back off when it gets a filesystem-full error code while trying to make the cache clean by migrating data to “home”. If the error takes a couple of seconds to be raised on each attempt, gpfs/mmfsd will deplete the NFS daemons, causing a DoS against the NFS server that provides the AFM transport for the cache/home relationship.
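By "back off" we mean something like the following. This is a generic retry sketch, not AFM or mmfsd code: `flush` is a hypothetical callable standing in for one attempt to push dirty data to home, and the delays are arbitrary. The idea is simply that a "no space" or "quota exceeded" error doubles the wait before the next attempt instead of retrying straight away.

```python
import errno
import time

# EDQUOT is not defined on every platform; fall back to ENOSPC if missing.
FULL_ERRNOS = {errno.ENOSPC, getattr(errno, "EDQUOT", errno.ENOSPC)}


def flush_with_backoff(flush, max_attempts=6, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call flush(), retrying with exponential backoff on 'full' errors.

    flush is a hypothetical callable that raises OSError on failure.
    Errors other than ENOSPC/EDQUOT are re-raised immediately, and the
    last 'full' error is re-raised once max_attempts is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return flush()
        except OSError as exc:
            if exc.errno not in FULL_ERRNOS or attempt == max_attempts:
                raise
            sleep(delay)                       # wait before the next attempt
            delay = min(delay * 2, max_delay)  # double the wait, capped


# Example: a target that is full for the first three attempts.
attempts = {"n": 0}
waits = []


def fake_flush():
    attempts["n"] += 1
    if attempts["n"] <= 3:
        raise OSError(errno.ENOSPC, "No space left on device")
    return "ok"


result = flush_with_backoff(fake_flush, sleep=waits.append)
print(result, waits)
```

Passing `sleep=waits.append` just records the delays rather than actually sleeping, so the example runs instantly.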

Our mental model was that an AFM cache wouldn’t, or shouldn’t, overload the hard and soft quotas as the high and low watermarks for cache eviction. In our heads, we’d like caches to also enforce and respect quotas based on requests received from home. There are probably lots of reasons this doesn’t make sense programmatically, or to the rest of Scale, but it seems to us that it would clean up this problem, or at least some of it.

Happy to chat through this further and explain it in more detail if anyone is interested. If there are any AFM users out there, we’d love to hear how you deal with quotas, hwm/lwm and eviction overflow scenarios, if they exist in your environment.

Thank you as always, list.

-jc

