[gpfsug-discuss] NFS issues

Jan-Frode Myklebust janfrode at tanso.net
Wed Apr 26 15:27:03 BST 2017


Would it help to lower the grace time?

mmnfs configuration change LEASE_LIFETIME=10
mmnfs configuration change GRACE_PERIOD=10
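
(LEASE_LIFETIME is how long the server grants NFSv4 leases; GRACE_PERIOD is how
long it blocks new opens and locks after a restart while clients reclaim state,
and the grace window is typically kept at least as long as the lease, which
10/10 preserves. Assuming your release has the matching list subcommand, you
can confirm the values took effect with:

mmnfs configuration list

Bear in mind that changing these settings restarts the NFS service on the CES
nodes, I believe.)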



-jf
On Wed, 26 Apr 2017 at 16:20, Simon Thompson (IT Research Support) <S.J.Thompson at bham.ac.uk> wrote:

> Nope, the clients are all L3 connected, so not an ARP issue.
>
> Two things we have observed:
>
> 1. It triggers when one of the CES IPs moves and quickly moves back again.
> The move occurs because the NFS server goes into grace:
>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
> recovery event 2 nodeid -1 ip <CESIP>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4
> recovery release ip <CESIP>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
> 2017-04-25 20:37:42 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
> recovery event 4 nodeid 2 ip
>
>
>
> We can't see in any of the logs WHY Ganesha is going into grace. Any
> suggestions on how to debug this further? (I.e. if we can stop the grace
> events, we can mostly solve the problem.)
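>
> (One first pass, assuming CES Ganesha logs to /var/log/ganesha.log - the
> path may differ by build - is to pull out the grace/recovery events and
> correlate their timestamps with the CES and mmfs logs on each protocol
> node:
>
> grep -E 'IN GRACE|recovery event' /var/log/ganesha.log
>
> That at least narrows down when and where each recovery event fired.)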
>
>
> 2. Our clients use LDAP, which is bound to the CES IPs. If we shut down
> nslcd on the client, we can get the client to recover once all the
> TIME_WAIT connections have gone. Maybe binding to the CES IPs was a bad
> choice on our side - we figured it would handily move the IPs for us, but
> I guess mmcesfuncs isn't aware of this and so doesn't kill the LDAP
> connections to the IP as it goes away.
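>
> (If we keep LDAP on the CES IPs, one mitigation we may try is having
> nslcd drop idle LDAP connections quickly, so a moved IP doesn't strand
> them - an /etc/nslcd.conf sketch, hostname hypothetical:
>
> uri ldap://ldap.example.com/
> # close idle LDAP connections after 60s so they can't outlive an IP move
> idle_timelimit 60
>
> A stable, non-CES LDAP address would avoid the problem entirely.)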
>
>
> So, two approaches we are going to try. First, reconfigure nslcd on a
> couple of clients and see if they still show the issue when failover
> occurs. Second, work out why the NFS servers are going into grace in the
> first place.
>
> Simon
>
> On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf
> of Greg.Lehmann at csiro.au" <gpfsug-discuss-bounces at spectrumscale.org on
> behalf of Greg.Lehmann at csiro.au> wrote:
>
> >Are you using InfiniBand or Ethernet? I'm wondering if IBM have solved
> >the gratuitous ARP issue which we see with our non-protocols NFS
> >implementation.
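> >
> >(For what it's worth, on Ethernet a gratuitous ARP can be forced by hand
> >to test that piece, e.g. with iputils arping - interface and address
> >hypothetical:
> >
> >arping -U -I eth0 10.0.0.10
> >
> >which tells neighbours to update their ARP cache for the moved IP.)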
> >
> >-----Original Message-----
> >From: gpfsug-discuss-bounces at spectrumscale.org
> >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon
> >Thompson (IT Research Support)
> >Sent: Wednesday, 26 April 2017 3:31 AM
> >To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> >Subject: Re: [gpfsug-discuss] NFS issues
> >
> >I did some digging in mmcesfuncs to see what happens server-side on
> >failover.
> >
> >Basically, the server losing the IP is supposed to terminate all sessions,
> >and the receiving server sends ACK tickles.
> >
> >My current supposition is that, for whatever reason, the losing server
> >isn't releasing something, and the client still holds a connection which
> >is mostly dead. The tickle from the new server then fails to reach the
> >client.
> >
> >This would explain why failing the IP back to the original server usually
> >brings the client back to life.
> >
> >This is only my working theory at the moment, as we can't reliably
> >reproduce this. Next time it happens we plan to grab netstat output from
> >each side.
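> >
> >(E.g., on both ends, filtering for the NFS port:
> >
> >netstat -tn | grep :2049
> >
> >to spot connections that one side still believes are ESTABLISHED while
> >the other side has no matching entry.)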
> >
> >Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the
> >server that received the IP and see if that fixes it (i.e. the receiving
> >server didn't tickle properly). (Usage extracted from mmcesfuncs, which is
> >ksh of course.) ... $cesIpPort is a colon-separated IP:port (of the NFSd),
> >for anyone interested.
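> >
> >(So, with hypothetical addresses, something along the lines of:
> >
> >mmcmi tcpack 10.10.0.5:2049 10.20.1.42:783
> >
> >where the first argument is the CES IP and NFSd port, and the second is
> >the client's IP and ephemeral port as reported by netstat.)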
> >
> >Then we'll try to kill the sessions on the losing server, to check whether
> >anything is still open, and re-tickle the client.
> >
> >If we can get steps to work around it, I'll log a PMR. I suppose I could
> >do that now, but given it's non-deterministic and we want to be 100% sure
> >it's not us doing something wrong, I'm inclined to wait until we do some
> >more testing.
> >
> >I agree with the suggestion that it's probably nodes with pending IO that
> >are affected, but I don't have any data to back that up yet. We did try
> >with a read workload on a client, but maybe we need either long IO-blocked
> >reads or writes (from the GPFS end).
> >
> >We also originally had soft as the default option, but saw issues then,
> >and the docs suggested hard, so we switched and also enabled sync (we
> >figured maybe it was the NFS client with uncommitted writes), but neither
> >has resolved the issues entirely. It's difficult for me to say whether
> >they improved things, though, given it's sporadic.
> >
> >Appreciate people's suggestions!
> >
> >Thanks
> >
> >Simon
> >________________________________________
> >From: gpfsug-discuss-bounces at spectrumscale.org
> >[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode
> >Myklebust [janfrode at tanso.net]
> >Sent: 25 April 2017 18:04
> >To: gpfsug main discussion list
> >Subject: Re: [gpfsug-discuss] NFS issues
> >
> >I *think* I've seen this, and that we then had open TCP connections from
> >the client to the NFS server according to netstat, but these connections
> >were not visible from netstat on the NFS server side.
> >
> >Unfortunately, I don't remember what the fix was...
> >
> >
> >
> >  -jf
> >
> >On Tue, 25 Apr 2017 at 16:06, Simon Thompson (IT Research Support)
> ><S.J.Thompson at bham.ac.uk> wrote:
> >Hi,
> >
> >From what I can see, Ganesha uses the Export_Id option in the config file
> >(which is managed by CES) for this. I did find some reference on the
> >Ganesha devs list that if it's not set, Ganesha would read the FSID from
> >the GPFS file-system; either way, they should surely be consistent across
> >all the nodes. The posts I found were from someone with an IBM email
> >address, so I guess someone on the IBM teams.
> >
> >I checked a couple of my protocol nodes and they use the same Export_Id
> >consistently, though I guess that might not be the same as the FSID value.
> >
> >Perhaps someone from IBM could comment on whether FSID is likely to be
> >the cause of my problems?
> >
> >Thanks
> >
> >Simon
> >
> >On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on
> >behalf of Ouwehand, JJ" <gpfsug-discuss-bounces at spectrumscale.org on
> >behalf of j.ouwehand at vumc.nl> wrote:
> >
> >>Hello,
> >>
> >>First, a short introduction. My name is Jaap Jan Ouwehand; I work at a
> >>Dutch hospital, the "VU Medical Center" in Amsterdam. We make daily use
> >>of IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our
> >>critical (office, research and clinical data) business processes. We
> >>have three large GPFS filesystems for different purposes.
> >>
> >>We had a similar situation with cNFS. A failover (IP takeover) worked
> >>technically, but clients experienced "stale file handle" errors. We
> >>opened a PMR with IBM and, after testing, delivering logs and tcpdumps,
> >>and a few months of waiting, the solution turned out to be the fsid
> >>option.
> >>
> >>An NFS filehandle is built from a combination of the fsid and a hash of
> >>the inode. After a failover, the fsid value can differ, leaving the
> >>client with a "stale file handle". To avoid this, the fsid value can be
> >>specified statically. See:
> >>
> >>
> >>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_nfslin.htm
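> >>
> >>(For a cNFS/knfsd export that is the standard Linux fsid export option,
> >>e.g. in /etc/exports - path and fsid value hypothetical:
> >>
> >>/gpfs/fs1  *(rw,sync,no_root_squash,fsid=745)
> >>
> >>with the same fsid configured on every NFS server node, so the
> >>filehandle survives a takeover.)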
> >>
> >>Maybe there is also a value in Ganesha that changes after a failover,
> >>especially since most sessions are re-established after a failback.
> >>Perhaps tcpdump will show more debug information.
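> >>
> >>(E.g., capturing NFS traffic to and from the CES IP across a failover -
> >>interface and address hypothetical:
> >>
> >>tcpdump -i eth0 -w failover.pcap host 10.0.0.10 and port 2049
> >>
> >>then compare the filehandles in requests before and after the move.)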
> >>
> >>
> >>Kind regards,
> >>
> >>Jaap Jan Ouwehand
> >>ICT Specialist (Storage & Linux)
> >>VUmc - ICT
> >>E: jj.ouwehand at vumc.nl
> >>W: www.vumc.com
> >>
> >>
> >>
> >>-----Original Message-----
> >>From: gpfsug-discuss-bounces at spectrumscale.org
> >>[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon
> >>Thompson (IT Research Support)
> >>Sent: Tuesday, 25 April 2017 13:21
> >>To: gpfsug-discuss at spectrumscale.org
> >>Subject: [gpfsug-discuss] NFS issues
> >>
> >>Hi,
> >>
> >>We have recently started deploying NFS in addition to our existing SMB
> >>exports on our protocol nodes.
> >>
> >>We use an RR DNS name that points to 4 VIPs for SMB services, and
> >>failover seems to work fine with SMB clients. We figured we could use
> >>the same name and IPs and run Ganesha on the protocol servers; however,
> >>we are seeing issues with NFS clients when IP failover occurs.
> >>
> >>In normal operation on a client, we might see several mounts from
> >>different IPs, obviously due to the way the DNS RR works, but it all
> >>works fine.
> >>
> >>In a failover situation, the IP will move to another node; some clients
> >>will carry on, while others will hang IO to the mount points served by
> >>the IP that moved. We can *sometimes* trigger this by manually
> >>suspending a CES node, but not always, and some clients mounting from
> >>the moving IP will be fine while others won't.
> >>
> >>If we resume a node and it fails back, the clients that are hanging
> >>will usually recover fine. We can reboot a client prior to failback and
> >>it will be fine; stopping and starting the ganesha service on a protocol
> >>node will also sometimes resolve the issues.
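> >>
> >>(For reference, we trigger the manual move with the CES suspend/resume
> >>commands, along the lines of - node name hypothetical:
> >>
> >>mmces node suspend -N proto01
> >>mmces node resume -N proto01
> >>
> >>which move the CES IPs off and back onto the node.)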
> >>
> >>So, has anyone seen this sort of issue, and are there any suggestions
> >>for how we could debug further or work around it?
> >>
> >>We are currently running the packages
> >>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones).
> >>
> >>At one point we were seeing it a lot and could track it back to an
> >>underlying GPFS network issue that was occasionally causing protocol
> >>nodes to be expelled. We resolved that and the issues became less
> >>apparent, but maybe we just fixed one failure mode and so see it less
> >>often.
> >>
> >>On the clients, we use -o sync,hard BTW as in the IBM docs.
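> >>
> >>(I.e. a mount along the lines of - server and paths hypothetical:
> >>
> >>mount -t nfs -o hard,sync nfs.example.com:/gpfs/export /mnt/export
> >>
> >>hard means IO blocks and retries rather than erroring when the server
> >>stops responding, which is also why a hung failover shows up as stuck
> >>IO rather than errors on these clients.)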
> >>
> >>On a client showing the issues, we'll see NFS-related messages in dmesg
> >>like:
> >>[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not
> >>responding, timed out
> >>
> >>Which explains the client hang on certain mount points.
> >>
> >>The symptoms feel very much like those logged in this Gluster/ganesha
> >>bug:
> >>https://bugzilla.redhat.com/show_bug.cgi?id=1354439
> >>
> >>
> >>Thanks
> >>
> >>Simon
> >>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

