[gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual?

Frederik Ferner frederik.ferner at diamond.ac.uk
Wed Oct 2 19:41:14 BST 2019


Hello Heiner,

very interesting, thanks.

In our case we are seeing this problem on 
gpfs.nfs-ganesha-gpfs-2.5.3-ibm036.05.el7, so close to the version where 
you're seeing it.

Frederik

On 23/09/2019 10:33, Billich  Heinrich Rainer (ID SD) wrote:
> Hello Frederik,
> 
> Thank you. I now see a similar behavior: Ganesha has 500k open files even though the node has been suspended for more than two hours. I would expect some cleanup job to close most of the open FDs after a much shorter time. Our systems have an upper limit of 1M open files per process and these Spectrum Scale settings:
> 
> ! maxFilesToCache 1048576
> ! maxStatCache 2097152
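
As a quick check of how close the daemon gets to that per-process limit, a
minimal Python sketch; the daemon name ganesha.nfsd, the /proc layout and
running it as root on the protocol node are assumptions:

    #!/usr/bin/env python3
    # Quick check: how many FDs does ganesha currently hold, compared with
    # its per-process limit?  Assumptions: the daemon is called ganesha.nfsd,
    # a standard Linux /proc layout, and the script runs as root.
    import os
    import subprocess

    pid = subprocess.check_output(["pidof", "ganesha.nfsd"]).split()[0].decode()

    open_fds = len(os.listdir(f"/proc/{pid}/fd"))     # FDs open right now

    soft_limit = None                                 # "Max open files" soft limit
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft_limit = int(line.split()[3])

    print(f"ganesha pid {pid}: {open_fds} open FDs, soft limit {soft_limit}")
    if soft_limit:
        print(f"  -> {100.0 * open_fds / soft_limit:.1f}% of the limit in use")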
> 
> Our Ganesha version is 2.5.3 (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7, but this second cluster also has a different load pattern.
> 
> I also posted my initial question to the Ganesha mailing list and want to share the reply I got from Daniel Gryniewicz.
> 
> Cheers,
> Heiner
> 
> Daniel Gryniewicz <dang at redhat.com>
> So, it's not impossible, based on the workload, but it may also be a bug.
> 
> For global FDs (all NFSv3 and stateless NFSv4), we obviously cannot know
> when the client closes the FD, and opening/closing all the time causes a
> large performance hit.  So, we cache open FDs.
> 
> All handles in MDCACHE live on the LRU.  This LRU is divided into 2
> levels.  Level 1 is more active handles, and they can have open FDs.
> Various operations can demote a handle to level 2 of the LRU.  As part of
> this transition, the global FD on that handle is closed.  Handles that
> are actively in use (have a refcount taken on them) are not eligible for
> this transition, as the FD may still be in use.
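
As an illustration only, a toy Python model of that demotion step; this is
not Ganesha's actual data structures or code, just a sketch of the
behaviour described above:

    # Toy model of the two-level LRU described above -- an illustration
    # only, not Ganesha's actual data structures or code.
    class Handle:
        def __init__(self, name):
            self.name = name
            self.refcount = 0       # held while an operation uses the handle
            self.global_fd = None   # cached open FD, if any

    class TwoLevelLRU:
        def __init__(self):
            self.level1 = []        # active handles, may carry open FDs
            self.level2 = []        # demoted handles, global FD already closed

        def demote_idle(self):
            """Move idle level-1 handles to level 2, dropping their global FD."""
            keep = []
            for handle in self.level1:
                if handle.refcount > 0:      # actively in use: not eligible
                    keep.append(handle)
                    continue
                handle.global_fd = None      # stands in for close(fd)
                self.level2.append(handle)
            self.level1 = keep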
> 
> We have a background thread that runs, and periodically does this
> demotion, closing the FDs.  This thread runs more often when the number
> of open FDs is above FD_HwMark_Percent of the available number of FDs,
> and runs constantly when the open FD count is above FD_Limit_Percent of
> the available number of FDs.
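
To put rough numbers on that, a small worked example using the 1M
per-process limit mentioned earlier in this thread; 90 and 98 are assumed
defaults for FD_HwMark_Percent and FD_Limit_Percent, the actual values
come from your Ganesha configuration:

    # Worked example of the two thresholds, using the 1M per-process limit
    # mentioned earlier in this thread.  90 and 98 are assumed defaults for
    # FD_HwMark_Percent / FD_Limit_Percent; the real values come from your
    # Ganesha configuration.
    FD_HWMARK_PERCENT = 90            # assumed default
    FD_LIMIT_PERCENT = 98             # assumed default
    available_fds = 1_048_576         # per-process open-file limit (1M)

    hwmark = available_fds * FD_HWMARK_PERCENT // 100    # 943718
    hard = available_fds * FD_LIMIT_PERCENT // 100       # 1027604

    print(f"reaper thread speeds up above {hwmark} open FDs")
    print(f"reaper thread runs constantly above {hard} open FDs")
    # With 200k-500k open FDs against a 1M limit neither threshold is hit,
    # so a large steady-state FD count is not by itself abnormal.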
> 
> So, a heavily used server could definitely have large numbers of FDs
> open.  However, there have also, in the past, been bugs that would
> either keep the FDs from being closed, or would break the accounting (so
> they were closed, but Ganesha still thought they were open).  You didn't
> say what version of Ganesha you're using, so I can't tell whether one of
> those bugs applies.
> 
> Daniel
> 
> On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" <gpfsug-discuss-bounces at spectrumscale.org on behalf of frederik.ferner at diamond.ac.uk> wrote:
> 
>      Heiner,
>      
>      we are seeing similar issues with CES/ganesha NFS, in our case
>      exclusively with NFSv3 clients.
>      
>      What is maxFilesToCache set to on your ganesha node(s)? In our case
>      ganesha was running into the limit of open file descriptors because
>      maxFilesToCache was set at a low default and for now we've increased it
>      to 1M.
>      
>      It seemed that ganesha was never releasing files even after clients
>      unmounted the file system.
>      
>      We've only recently made the change, so we'll see how much it improves
>      the situation.
>      
>      I thought we had a reproducer, but since our recent change I can no
>      longer reproduce the build-up of open files that are never released.
>      
>      Kind regards,
>      Frederik
>      
>      On 19/09/2019 15:20, Billich  Heinrich Rainer (ID SD) wrote:
>      > Hello,
>      >
>      > Is it usual to see 200’000-400’000 open files for a single ganesha
>      > process? Or does this indicate that something is wrong?
>      >
>      > We have some issues with ganesha (on Spectrum Scale protocol nodes)
>      > reporting NFS3ERR_IO in the log. I noticed that the affected nodes
>      > have a large number of open files, 200’000-400’000 per daemon
>      > (and 500 threads and about 250 client connections). Other nodes have
>      > only 1’000 – 10’000 open files for ganesha and don’t show the issue.
>      >
>      > If someone could explain how ganesha decides which files to keep open
>      > and which to close, that would help, too. As NFSv3 is stateless the
>      > client doesn’t open/close a file, so it’s up to the server to decide
>      > when to close it? We do have a few NFSv4 clients, too.
>      >
>      > Are there certain access patterns that can trigger such a large number
>      > of open files? Maybe traversing and reading a large number of small files?
>      >
>      > Thank you,
>      >
>      > Heiner
>      >
>      > I counted the open files by counting the entries in /proc/<pid of
>      > ganesha>/fd/. With several 100k entries, ‘ls -ls’ failed to list all
>      > the symbolic links, hence I can’t easily relate the open files to the
>      > different exports.
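
For illustration, a minimal Python sketch that counts the entries in
/proc/<pid>/fd and groups them by a leading path prefix as a rough
stand-in for the export, avoiding an ‘ls -ls’ over several hundred
thousand entries; PID and the grouping depth are placeholders to adapt,
and it needs to run as root:

    #!/usr/bin/env python3
    # Sketch: count ganesha's open FDs and group them by a leading path
    # prefix as a rough stand-in for the export, without listing 100k+
    # entries via `ls -ls`.  PID and the grouping depth are placeholders.
    import os
    from collections import Counter

    PID = 12345                       # placeholder: pid of the ganesha daemon
    FD_DIR = f"/proc/{PID}/fd"

    by_prefix = Counter()
    for fd in os.listdir(FD_DIR):
        try:
            target = os.readlink(os.path.join(FD_DIR, fd))
        except OSError:
            continue                  # FD disappeared while scanning
        if target.startswith("/"):
            # keep the first two path components, e.g. /gpfs/export1
            by_prefix["/".join(target.split("/")[:3])] += 1
        else:
            by_prefix[target.split(":")[0]] += 1    # socket:, pipe:, anon_inode:

    print(f"total fd entries: {sum(by_prefix.values())}")
    for prefix, count in by_prefix.most_common(20):
        print(f"{count:8d}  {prefix}")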
>      >
>      > I did post this to the ganesha mailing list, too.
>      >
>      > --
>      >
>      > =======================
>      >
>      > Heinrich Billich
>      >
>      > ETH Zürich
>      >
>      > Informatikdienste
>      >
>      > Tel.: +41 44 632 72 56
>      >
>      > heinrich.billich at id.ethz.ch
>      >
>      > ========================
>      >
>      >
>      
>      
>      
>      --
>      
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 


-- 
Frederik Ferner
Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624
Diamond Light Source Ltd.                       mob:   +44 7917 08 5110

Duty Sys Admin can be reached on x8596




