[gpfsug-discuss] NFS issues

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Tue Apr 25 12:20:39 BST 2017


Hi,

We have recently started deploying NFS in addition to our existing SMB
exports on our protocol nodes.

We use a round-robin (RR) DNS name that points to 4 VIPs for SMB services,
and failover seems to work fine with SMB clients. We figured we could use
the same name and IPs and run Ganesha on the protocol servers; however, we
are seeing issues with NFS clients when IP failover occurs.

In normal operation on a client, we might see several mounts from
different IPs obviously due to the way the DNS RR is working, but it all
works fine.
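
To see which VIP each mount on a client is actually using, something like
the following could help. It is only a minimal sketch: it assumes a Linux
client whose /proc/mounts lists the server address as an addr= mount
option, and the function name is made up for illustration.

```python
from collections import defaultdict

# Filesystem types treated as NFS in /proc/mounts (assumption: v3 and v4 only).
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mounts_by_server_ip(mounts_text):
    """Group NFS mount points by the server address in their mount options."""
    by_ip = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in NFS_TYPES:
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        # Fall back to "unknown" if no addr= option is present.
        addr = next(
            (opt[len("addr="):] for opt in options if opt.startswith("addr=")),
            "unknown",
        )
        by_ip[addr].append(mount_point)
    return dict(by_ip)
```

On a client this would be called as
nfs_mounts_by_server_ip(open("/proc/mounts").read()), which makes it easier
to tell which mount points depend on an IP that has just moved.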

In a failover situation, the IP will move to another node; some clients
will carry on, while others will hang IO to the mount points served by the
IP which has moved. We can *sometimes* trigger this by manually suspending
a CES node, but not always: some clients mounting from the moving IP will
be fine, others won't.

If we resume a node and it fails back, the clients that are hanging will
usually recover fine. We can reboot a client prior to failback and it will
be fine. Stopping and starting the ganesha service on a protocol node will
also sometimes resolve the issues.

So, has anyone seen this sort of issue, and are there any suggestions for
how we could either debug it further or work around it?

We are currently running the packages
nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones).

At one point we were seeing it a lot, and could track it back to an
underlying GPFS network issue that was causing protocol nodes to be
expelled occasionally. We resolved that and the issues became less
apparent, but maybe we just fixed one failure mode and so see it less
often.

On the clients, we use -o sync,hard BTW as in the IBM docs.

On a client showing the issues, we'll see NFS-related messages in dmesg
like:
[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out

Which explains the client hang on certain mount points.
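
For checking many clients at once, the dmesg output could be scanned for
these messages. The snippet below is only a sketch under assumptions: the
pattern is based on the sample line above, and the function name is
hypothetical. Because the message names the server as mounted rather than
an IP, with a RR DNS name this alone will not show which VIP is affected.

```python
import re
from collections import Counter

# Pattern taken from the sample dmesg line; matches the server name only.
NOT_RESPONDING = re.compile(r"nfs: server (\S+) not responding")


def count_unresponsive_servers(dmesg_text):
    """Count 'not responding' messages per server name in dmesg output."""
    return Counter(NOT_RESPONDING.findall(dmesg_text))
```

Feeding it the captured output of dmesg -T from each client gives a quick
per-client count of how often the hang was logged.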

The symptoms feel very much like those logged in this Gluster/ganesha bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1354439


Thanks

Simon



