[gpfsug-discuss] Filesystem access issues via CES NFS
Leonardo Sala
leonardo.sala at psi.ch
Fri Oct 4 07:32:42 BST 2019
Dear Malahal,
thanks for the answer. We are also using SSSD: should we therefore use
5.0.2-PTF3? We would like to avoid 5.0.2.2, as it has issues with recent
RHEL 7.6 kernels [*] and we are affected by them. Would you suggest
using 5.0.3.3 instead?
cheers
leo
[*]
https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0
Paul Scherrer Institut
Dr. Leonardo Sala
Group Leader High Performance Computing
Deputy Section Head Science IT
Science IT
WHGA/106
5232 Villigen PSI
Switzerland
Phone: +41 56 310 3369
leonardo.sala at psi.ch
www.psi.ch
On 03.10.19 19:15, Malahal R Naineni wrote:
> >> @Malahal: Looks like you have written the netgroup caching code,
> feel free to ask for further details if required.
> Hi Ulrich, Ganesha uses the innetgr() call for netgroup information,
> and sssd has too many issues in its implementation. Red Hat said that
> they are going to fix the sssd synchronization issues in RHEL 8. It is
> on my plate to serialize the innetgr() calls in Ganesha to match kernel
> NFS server usage! I would expect the sssd issue to produce
> EACCES/EPERM-type errors, though, not EINVAL.
> If you are using sssd, you are most likely hitting an sssd issue.
> Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you
> use ganesha version V2.5.3-ibm030.01 if you are using netgroups
> (it ships with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later).
> Regards, Malahal.
>
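
A minimal sketch of what "serializing" a netgroup lookup means, written in
Python for brevity rather than in Ganesha's C. It is not Ganesha's code:
SerializedLookup and the two-argument lookup signature are hypothetical
names for illustration (the real innetgr(3) takes netgroup, host, user and
domain). The idea is only that a single lock guarantees at most one thread
is inside the underlying lookup at any time.

```python
import threading
import time
from typing import Callable

# Hypothetical stand-in for a netgroup lookup such as innetgr(3).
# The (netgroup, host) -> bool signature is an assumption for illustration.
Lookup = Callable[[str, str], bool]


class SerializedLookup:
    """Wrap a lookup so that only one thread calls it at a time."""

    def __init__(self, lookup: Lookup) -> None:
        self._lookup = lookup
        self._lock = threading.Lock()

    def __call__(self, netgroup: str, host: str) -> bool:
        # All callers queue on the same lock, so the wrapped lookup
        # never runs concurrently with itself.
        with self._lock:
            return self._lookup(netgroup, host)


def demo() -> int:
    """Run 8 concurrent lookups; return the peak number inside at once."""
    active = 0
    max_active = 0

    def fake_lookup(netgroup: str, host: str) -> bool:
        nonlocal active, max_active
        active += 1
        max_active = max(max_active, active)
        time.sleep(0.001)  # widen the window in which overlap could occur
        active -= 1
        return (netgroup, host) == ("netgroup1", "client1.domain")

    guarded = SerializedLookup(fake_lookup)
    threads = [
        threading.Thread(target=guarded, args=("netgroup1", "client1.domain"))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active
```

Calling demo() reports how many threads were inside the fake lookup at the
same moment; with the lock in place that peak stays at one. The cost is
that every export access check queues behind the same lock.
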
> ----- Original message -----
> From: Ulrich Sibiller <u.sibiller at science-computing.de>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug-discuss at spectrumscale.org
> Cc:
> Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS
> Date: Thu, Dec 13, 2018 7:32 PM
> On 23.11.2018 14:41, Andreas Mattsson wrote:
> > Yes, this is repeating.
> >
> > We’ve ascertained that it has nothing to do at all with file
> > operations on the GPFS side.
> >
> > Randomly throughout the filesystem mounted via NFS, ls or file
> > access will give
> >
> >   ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument
> >
> > Trying again later might work on that folder, but might fail
> > somewhere else.
> >
> > We have tried exporting the same filesystem via a standard
> > kernel NFS instead of the CES Ganesha-NFS, and then the problem
> > doesn’t exist.
> >
> > So it is definitely related to the Ganesha NFS server, or its
> > interaction with the file system.
> >
> > Will see if I can get a tcpdump of the issue.
>
> We see this, too. We cannot trigger it. Fortunately I have managed
> to capture some logs with
> debugging enabled. I have now dug into the ganesha 2.5.3 code and
> I think the netgroup caching is
> the culprit.
>
> Here is some FULL_DEBUG output:
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250]
> export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for
> export id 1 path /gpfsexport
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] client_match
> :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1
> (options=421021e2root_squash , RWrw,
> 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2,
> anon_gid= -2, sys)
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
> :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] client_match
> :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2
> (options=421021e2root_squash , RWrw,
> 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2,
> anon_gid= -2, sys)
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
> :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] client_match
> :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3
> (options=421021e2root_squash , RWrw,
> 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2,
> anon_gid= -2, sys)
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
> :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250]
> export_check_access :EXPORT :M_DBG :EXPORT (options=03303002
> , , ,
> , , -- Deleg, , )
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250]
> export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS
> (options=42102002root_squash , ----, 3--, ---,
> TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid=
> -2, sys)
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250]
> export_check_access :EXPORT :M_DBG :default options
> (options=03303002root_squash , ----, 34-, UDP,
> TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid=
> -2, none, sys)
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250]
> export_check_access :EXPORT :M_DBG :Final options
> (options=42102002root_squash , ----, 3--, ---,
> TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid=
> -2, sys)
> 2018-12-13 11:53:41 : epoch 0009008d : server1 :
> gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute
> :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to
> access Export_Id 1 /gpfsexport,
> vers=3, proc=18
>
> The client "client1" is definitely a member of "netgroup1". But the
> NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only
> happen if the netgroup caching code reports that "client1" is NOT a
> member of "netgroup1".
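
The reasoning above can be illustrated with a small Python sketch. It
assumes, as the message implies, that the client entries of an export are
evaluated in order and that evaluation stops at the first match. The
function name first_matching_netgroup and the is_member callback are
hypothetical; this is a simplification, not Ganesha's client_match code.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_matching_netgroup(
    host: str,
    netgroups: Sequence[str],
    is_member: Callable[[str, str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Check netgroups in order and stop at the first one containing host.

    Returns (matching netgroup or None, list of netgroups consulted).
    """
    checked: List[str] = []
    for netgroup in netgroups:
        checked.append(netgroup)
        if is_member(netgroup, host):
            return netgroup, checked
    # No entry matched: the caller would deny access to the export.
    return None, checked


NETGROUPS = ["netgroup1", "netgroup2", "netgroup3"]
MEMBERS = {("netgroup1", "client1.domain")}


def correct_lookup(netgroup: str, host: str) -> bool:
    return (netgroup, host) in MEMBERS


def faulty_lookup(netgroup: str, host: str) -> bool:
    # Simulates a lookup that wrongly reports "not a member".
    return False


healthy = first_matching_netgroup("client1.domain", NETGROUPS, correct_lookup)
broken = first_matching_netgroup("client1.domain", NETGROUPS, faulty_lookup)
```

With the correct lookup only "netgroup1" is consulted. With the faulty one
all three netgroups are consulted and nothing matches, which is the
pattern in the log: three NETGROUP_CLIENT checks followed by "is not
allowed to access".
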
>
> I have also opened a support case at IBM for this.
>
> @Malahal: Looks like you have written the netgroup caching code,
> feel free to ask for further
> details if required.
>
> Kind regards,
>
> Ulrich Sibiller
>
> --
> Dipl.-Inf. Ulrich Sibiller science + computing ag
> System Administration Hagellocher Weg 73
> 72070 Tuebingen, Germany
> https://atos.net/de/deutschland/sc
> --
> Science + Computing AG
> Vorstandsvorsitzender/Chairman of the board of management:
> Dr. Martin Matzke
> Vorstand/Board of Management:
> Matthias Schempp, Sabine Hohenstein
> Vorsitzender des Aufsichtsrats/
> Chairman of the Supervisory Board:
> Philippe Miltin
> Aufsichtsrat/Supervisory Board:
> Martin Wibbe, Ursula Morgenstern
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss