[gpfsug-discuss] Large in doubt on fileset

Tomer Perry TOMP at il.ibm.com
Mon Oct 7 17:22:13 BST 2019


Hi,

The major change around 4.x in quotas was the introduction of dynamic 
shares. In the past, every client share request was for a fixed number of 
blocks (20 blocks by default). For high-performing systems this sometimes 
wasn't enough (imagine a 320MB share when nodes are writing at 20GB/s). 
So, dynamic shares mean that a client node can request 10000 blocks etc. 
etc. (it doesn't mean that the server will grant all of those...).
OTOH, a node failure will leave more "stale in doubt" capacity, since the 
server doesn't know how much of the share was actually used.
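To make the motivation concrete, here is a back-of-envelope calculation (not GPFS code); the 16MB block size is an assumption taken from the 1024 blocks = 16G figure later in this message:

```python
# How long does a fixed 20-block share last under heavy write load?
block_mb = 16                      # assumed block size
fixed_share_mb = 20 * block_mb     # old fixed share: 20 blocks = 320 MB
write_rate_mb_s = 20 * 1024        # aggregate write rate: 20 GB/s

seconds_per_share = fixed_share_mb / write_rate_mb_s
print(f"a {fixed_share_mb} MB share lasts "
      f"{seconds_per_share * 1000:.0f} ms at 20 GB/s")
# a 320 MB share lasts 16 ms at 20 GB/s
```

At that rate a client would be back at the quota server for a new share every few milliseconds, which is why larger, dynamically sized shares were introduced.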

Imagine a client node getting a share of 1024 blocks (16G), using 20M, and 
then crashing. From the server's perspective, there are 16G "unknown"; now 
multiply that by multiple nodes...
The only way to resolve it is indeed to execute mmcheckquota - but as you 
probably know, it's not cheap.
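The arithmetic of that scenario, as a toy sketch (not GPFS internals; the 16MB block size is again assumed from the 1024 blocks = 16G figure):

```python
# A crashed client inflates "in doubt": the server granted a share but
# never learns how much of it was actually written before the crash.
block_mb = 16                          # assumed block size
share_blocks = 1024                    # granted share: 1024 blocks
granted_mb = share_blocks * block_mb   # 16384 MB = 16 GB
used_mb = 20                           # client wrote only 20 MB, then crashed

# The server cannot tell the 20 MB from the rest, so the whole share
# stays "in doubt" until mmcheckquota reconciles on-disk usage.
print(f"in doubt per crashed node: {granted_mb // 1024} GB")
nodes = 10
print(f"after {nodes} such crashes: {nodes * granted_mb // 1024} GB in doubt")
# in doubt per crashed node: 16 GB
# after 10 such crashes: 160 GB in doubt
```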

So, do you experience a large number of node expels/crashes etc. that 
might be related to this? (Otherwise, it might be some other bug that 
needs to be fixed...)

Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625




From:   Jaime Pinto <pinto at scinet.utoronto.ca>
To:     gpfsug-discuss at spectrumscale.org
Date:   07/10/2019 17:40
Subject:        [EXTERNAL] Re: [gpfsug-discuss] Large in doubt on fileset
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



We run DSS as well, also 4.2.x versions, and large in-doubt entries are 
common on our file systems, much larger than what you are seeing, for 
USR, GRP and FILESET.

It didn't use to be so bad on versions 3.4/3.5 in other IBM appliances 
(GSS, ESS), or even DDN's or Cray's G200. Under the 4.x series the 
internal automatic mechanism to reconcile accounting seems very laggy by 
default, and I couldn't find (yet) a config parameter to adjust this. I 
stopped trying to understand why this happens.

Our users are all subject to quotas, and can't wait indefinitely for this 
reconciliation. I just run mmcheckquota every 6 hours via a crontab.
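For reference, a 6-hourly reconciliation like the one described could look something like this in root's crontab; the file system device name `gpfs0` and the standard GPFS binary path are placeholders/assumptions for your own setup:

```
# Reconcile quota accounting every 6 hours (device name is a placeholder)
0 */6 * * * /usr/lpp/mmfs/bin/mmcheckquota gpfs0
```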

I hope version 5 is better. Will know in a couple of months.
Jaime



On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote:
> 
> I have a DSS-G system running 4.2.3-7, and on Friday afternoon became
> aware that there is a very large (at least I have never seen anything
> on this scale before) in doubt on a fileset. It has persisted over the
> weekend and is sitting at 17.5TB, with the fileset having a 150TB quota
> and only 82TB in use.
> 
> There are a relatively large 26,500 files in doubt, though there are no
> quotas on file numbers for the fileset. This has come down from some
> 47,500 on Friday, when the in doubt was a shade over 18TB.
> 
> The largest in doubt I have seen in the past was in the order of a few
> hundred GB under very heavy write that went away very quickly after the
> writing stopped.
> 
> There is no evidence of heavy writing going on in the file system, so I
> am perplexed as to why the in doubt remains so high.
> 
> Any thoughts as to what might be going on?
> 
> 
> JAB.
> 



          ************************************
           TELL US ABOUT YOUR SUCCESS STORIES
           http://www.scinethpc.ca/testimonials
          ************************************
---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss







