[gpfsug-discuss] Quota issues, eviction, AFM won't stop throwing data to a full location - probably a rookie AFM mistake?

Venkateswara R Puvvada vpuvvada at in.ibm.com
Wed Mar 8 10:28:35 GMT 2017


1. What version of GPFS are you running? Eviction should not block the 
applications. Was partial file caching enabled? Eviction cannot evict 
partially cached files in recent releases. Eviction does not use space 
inside the .afm directory, and its logs are stored under /var/mmfs/tmp 
by default.

2. I did not understand this requirement.
  a. When I/O to home fails with a quota-exceeded or no-space error, the 
messages are requeued at the gateway node and retried later (usually 
after 15 minutes). The cache cannot read home quotas today, and in most 
cases this is not valid.
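The requeue-and-retry behaviour described in (a) can be sketched roughly as follows. This is a hedged Python illustration of the idea, not actual AFM/gateway code; the queue shape, the success/failure callback, and the clock parameter are invented for the example (only the ~15-minute interval comes from the text above):

```python
import time
from collections import deque

RETRY_INTERVAL = 15 * 60  # seconds; AFM retries roughly every 15 minutes


def process_queue(queue, send_to_home, now=time.monotonic):
    """Drain queued messages; hold back ones the home rejects.

    `send_to_home` returns True on success and False on a
    quota-exceeded or no-space error, mirroring the behaviour
    described above: failed messages are requeued at the gateway
    and retried later rather than dropped.
    """
    requeued = deque()
    while queue:
        msg = queue.popleft()
        if send_to_home(msg):
            continue  # replicated to home successfully
        # Home returned a quota/no-space error: keep the message and
        # schedule it for a retry after the interval elapses.
        requeued.append((msg, now() + RETRY_INTERVAL))
    return requeued
```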
  b. When the soft quota is exceeded on an AFM fileset, auto eviction 
clears data blocks on files based on an LRU policy to bring usage below 
the soft limit. The evicted files become uncached; there is no real 
migration of data to home during eviction. Eviction should be triggered 
before fileset usage nears the hard quota and applications start 
getting errors.
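The LRU-driven auto-eviction described in (b) can be sketched as below. This is a hedged Python illustration of the policy only, not the GPFS implementation; the file map, sizes, and access times are invented for the example:

```python
def evict_to_soft_limit(files, used, soft_limit):
    """Evict least-recently-used cached files until usage <= soft limit.

    `files` maps filename -> (size_bytes, last_access_time). Eviction
    frees a file's data blocks in the cache; the file remains visible
    as an uncached entry, and nothing is migrated to home.
    """
    evicted = []
    # Oldest access time first: the least recently used file goes first.
    for name, (size, _atime) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= soft_limit:
            break
        used -= size          # data blocks freed in the cache
        evicted.append(name)  # file becomes "uncached", not deleted
    return evicted, used
```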

~Venkat (vpuvvada at in.ibm.com)



From:   Jake Carroll <jake.carroll at uq.edu.au>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   03/05/2017 02:05 AM
Subject:        [gpfsug-discuss] Quota issues, eviction, AFM won't stop 
throwing data to a full location - probably a rookie AFM mistake?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi all,
 
I think I need some help with GPFS quotas and hard limits vs soft limits + 
eviction in AFM scenarios. We’ve got a couple of issues:
 
One:
-------
We’ve come across a scenario where, if a user hits the hard quota while 
ingesting into cache in an AFM “home to cache” relationship at the same 
time as an eviction loop is being triggered, things seem to go wrong and 
the filesystem heads into lock-up territory. The report I have on the 
last incident is that a fileset got stuck at 100% capacity utilisation, 
the eviction loop either failed or blocked, and the IO requests blocked 
and/or failed (this one I'm a little fuzzy on).
 
Maybe it isn’t a bug, and our guess is that someone on here will 
probably tell us the likely “fix” is to right-size our high and low 
water marks appropriately. We considered a potential bug mechanism or 
race condition in which the eviction loop uses space in the fileset’s 
.afm directory, but then thought better of it: “Nah, surely IBM would 
have thought of that!”.
 
Two:
-------
We witness a scenario where AFM doesn't back off if it gets a 
filesystem-full error code when trying to make the cache clean by 
migrating data to “home”. If raising the error takes a couple of seconds 
per attempt, gpfs/mmfsd will deplete the NFS daemons, causing a DoS 
against the NFS server that powers the cache/home relationship for the 
AFM transport.
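For contrast, the kind of back-off being described as missing here might look roughly like the following. This is a hedged, generic exponential-backoff sketch in Python, not AFM behaviour; the flush callback, error classification, and cap value are all invented for illustration:

```python
import errno
import time


def flush_with_backoff(attempt_flush, max_delay=300, sleep=time.sleep):
    """Retry a cache-to-home flush, backing off on full/over-quota errors.

    Doubling the delay between failed attempts (capped at `max_delay`)
    keeps a persistently full home from being hammered with immediate
    retries that tie up every NFS daemon on the server.
    """
    delay = 1
    while True:
        err = attempt_flush()  # returns None on success, an errno on failure
        if err is None:
            return
        if err not in (errno.ENOSPC, errno.EDQUOT):
            raise OSError(err, "unrecoverable flush error")
        sleep(delay)  # back off instead of retrying immediately
        delay = min(delay * 2, max_delay)
```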

We had a mental model that AFM cache would, or at least should, overload 
hard and soft quota as the high and low watermarks for cache eviction 
policies. I guess in our heads, we’d like caches to also enforce and 
respect quotas based on requests received from home. There are probably 
lots of reasons this doesn’t make sense programmatically, or to the rest 
of Scale, but it seems to us that it would clean up this problem, or at 
least some of it.
 
Happy to chat through this further and explain it more if anyone is 
interested. If there are any AFM users out there, we’d love to hear how 
you deal with quotas, HWM/LWM and eviction overflow scenarios, if they 
exist in your environment.
 
Thank you as always, list.
 
-jc
 
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



