[gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual?

Billich Heinrich Rainer (ID SD) heinrich.billich at id.ethz.ch
Mon Sep 23 10:33:02 BST 2019

Hello Frederik,

Thank you. I now see similar behavior: Ganesha has 500k open files although the node has been suspended for more than 2 hours. I would expect some cleanup job to close most of the open FDs after a much shorter time. Our systems have an upper limit of 1M open files per process and these Spectrum Scale settings:

! maxFilesToCache 1048576
! maxStatCache 2097152
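
For anyone who wants to compare their own nodes against these numbers, here is a minimal Python sketch, not an official tool. It assumes a Linux /proc filesystem and permission to read the daemon's /proc entries (typically root). It counts the entries in /proc/<pid>/fd, reads the soft "Max open files" limit from /proc/<pid>/limits, and groups the open files by leading path components so they can be related to exports. The function names and the depth parameter are my own choices for illustration.

```python
import os
from collections import Counter


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd without stat-ing each one."""
    with os.scandir(f"/proc/{pid}/fd") as entries:
        return sum(1 for _ in entries)


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit, or None if unlimited/not found."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                value = line.split()[3]
                return None if value == "unlimited" else int(value)
    return None


def fds_by_prefix(pid, depth=2):
    """Tally open FDs by the first `depth` path components of their target."""
    counts = Counter()
    with os.scandir(f"/proc/{pid}/fd") as entries:
        for entry in entries:
            try:
                target = os.readlink(entry.path)
            except OSError:
                continue  # FD was closed between listing and readlink
            if target.startswith("/"):
                key = "/".join(target.split("/")[: depth + 1])
            else:
                key = target.split(":")[0]  # e.g. socket, pipe, anon_inode
            counts[key] += 1
    return counts
```

Calling count_open_fds(pid) and soft_nofile_limit(pid) with the pid of the Ganesha daemon gives the two numbers to compare, and fds_by_prefix(pid).most_common(10) shows which path prefixes hold the most FDs. Reading the links one at a time avoids sorting and stat-ing several 100k entries at once.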

Our Ganesha version is 2.5.3 (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7, but that second cluster also has a different load pattern.

I also posted my initial question to the Ganesha mailing list and want to share the reply I got from Daniel Gryniewicz.


Daniel Gryniewicz <dang at redhat.com>
So, it's not impossible, based on the workload, but it may also be a bug.

For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know
when the client closes the FD, and opening/closing all the time causes a
large performance hit.  So, we cache open FDs.

All handles in MDCACHE live on the LRU.  This LRU is divided into 2
levels.  Level 1 is more active handles, and they can have open FDs.
Various operations can demote a handle to level 2 of the LRU.  As part of
this transition, the global FD on that handle is closed.  Handles that
are actively in use (have a refcount taken on them) are not eligible for
this transition, as the FD may be being used.

We have a background thread that runs, and periodically does this
demotion, closing the FDs.  This thread runs more often when the number
of open FDs is above FD_HwMark_Percent of the available number of FDs,
and runs constantly when the open FD count is above FD_Limit_Percent of
the available number of FDs.

So, a heavily used server could definitely have large numbers of FDs
open.  However, there have also, in the past, been bugs that would
either keep the FDs from being closed, or would break the accounting (so
they were closed, but Ganesha still thought they were open).  You didn't
say what version of Ganesha you're using, so I can't tell if one of
those bugs applies.
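
The mechanism Daniel describes can be sketched as a toy model in Python. This is not Ganesha's code: the class, its method names, the mode labels and the threshold values used in the usage note are all hypothetical, and it simplifies by treating every level 1 handle as holding exactly one open global FD. Only the two-level LRU, the refcount rule and the roles of FD_HwMark_Percent and FD_Limit_Percent are taken from the description above.

```python
from collections import OrderedDict


class FdCacheSketch:
    """Toy two-level LRU: level 1 handles hold an open FD, level 2 do not."""

    def __init__(self, fd_limit, hwmark_percent, limit_percent):
        self.fd_limit = fd_limit              # available number of FDs
        self.hwmark_percent = hwmark_percent  # stands in for FD_HwMark_Percent
        self.limit_percent = limit_percent    # stands in for FD_Limit_Percent
        self.level1 = OrderedDict()           # handle -> refcount, oldest first
        self.level2 = OrderedDict()           # demoted handles, FD closed

    def use(self, handle):
        """Access a handle: promote it to level 1 and open its global FD."""
        self.level2.pop(handle, None)
        self.level1[handle] = self.level1.get(handle, 0)
        self.level1.move_to_end(handle)

    def pin(self, handle):
        """Take a reference; pinned handles cannot be demoted."""
        self.level1[handle] += 1

    def unpin(self, handle):
        self.level1[handle] -= 1

    def open_fds(self):
        return len(self.level1)

    def reaper_mode(self):
        """How eagerly the background thread should run right now."""
        percent = 100 * self.open_fds() / self.fd_limit
        if percent > self.limit_percent:
            return "constant"
        if percent > self.hwmark_percent:
            return "frequent"
        return "normal"

    def demote(self, max_handles):
        """One reaper pass: move idle handles to level 2, closing their FDs."""
        closed = 0
        for handle in list(self.level1):
            if closed >= max_handles:
                break
            if self.level1[handle] > 0:
                continue  # in use, the FD may be needed
            del self.level1[handle]
            self.level2[handle] = 0
            closed += 1
        return closed
```

With fd_limit=10, hwmark_percent=50 and limit_percent=80 (made-up values), using nine handles puts the model in "constant" mode. Pinning the oldest handle and calling demote(3) closes the next three idle handles and leaves it in "frequent" mode. In this model a large open-FD count long after the load has stopped would point to handles that stay pinned, or to accounting that no longer matches reality, which are the two kinds of bug mentioned above.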


On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" <gpfsug-discuss-bounces at spectrumscale.org on behalf of frederik.ferner at diamond.ac.uk> wrote:

    We are seeing similar issues with CES/Ganesha NFS, in our case
    exclusively with NFSv3 clients.
    What is maxFilesToCache set to on your ganesha node(s)? In our case 
    ganesha was running into the limit of open file descriptors because 
    maxFilesToCache was set at a low default and for now we've increased it 
    to 1M.
    It seemed that ganesha was never releasing files even after clients 
    unmounted the file system.
    We've only recently made the change, so we'll see how much that improved 
    the situation.
    I thought we had a reproducer but after our recent change, I can now no 
    longer successfully reproduce the increase in open files not being released.
    Kind regards,
    On 19/09/2019 15:20, Billich  Heinrich Rainer (ID SD) wrote:
    > Hello,
    > Is it usual to see 200’000-400’000 open files for a single ganesha 
    > process? Or does this indicate that something is wrong?
    > We have some issues with ganesha (on spectrum scale protocol nodes) 
    >   reporting NFS3ERR_IO in the log. I noticed that the affected nodes 
    > have a large number of open files, 200’000-400’000 open files per daemon 
    > (and 500 threads and about 250 client connections). Other nodes have 
    > 1’000 – 10’000 open files by ganesha only and don’t show the issue.
    > If someone could explain how ganesha decides which files to keep open 
    > and which to close, that would help, too. As NFSv3 is stateless, the
    > client doesn’t open or close a file, so is it up to the server to decide
    > when to close it? We do have a few NFSv4 clients, too.
    > Are there certain access patterns that can trigger such a large number 
    > of open files? Maybe traversing and reading a large number of small files?
    > Thank you,
    > Heiner
    > I counted the open files by counting the entries in /proc/<pid of
    > ganesha>/fd/. With several 100k entries I failed to run ‘ls -ls’ to
    > list all the symbolic links, hence I can’t easily relate the open files
    > to the different exports.
    > I did post this to the ganesha mailing list, too.
    > -- 
    > =======================
    > Heinrich Billich
    > ETH Zürich
    > Informatikdienste
    > Tel.: +41 44 632 72 56
    > heinrich.billich at id.ethz.ch
    > ========================
    > _______________________________________________
    > gpfsug-discuss mailing list
    > gpfsug-discuss at spectrumscale.org
    > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
