[gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting

Hughes, Doug Douglas.Hughes at DEShawResearch.com
Fri Dec 4 13:00:41 GMT 2015


One thing that we discovered very early on when using CTDB (or CNFS, for that matter) with GPFS is the importance of having the locking/sharing part of CTDB *not* live on the same filesystem that it is exporting. If they share a filesystem, then as soon as the back-end main filesystem gets heavily loaded, CTDB will start timing out tickles and you'll see all kinds of intermittent and inconvenient failures, often with manual recovery needed afterwards. We took some of the flash that we use for metadata and created a small, dedicated cluster filesystem on it to hold the CTDB locking database. Now, if the back-end main filesystem gets slow, it's just slow for all clients, instead of slow for GPFS clients and unavailable for NFS clients because all of the CTDB checks have failed.
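
For reference, the shape of that setup looks roughly like the sketch below. The device names, stanza file, and paths are placeholders rather than our actual configuration; adjust to whatever flash NSDs you have available.

    # small, flash-backed cluster filesystem used only for the CTDB lock data
    mmcrfs ctdblock -F /var/mmfs/config/flash_nsd.stanza -T /gpfs/ctdblock -A yes
    mmmount ctdblock -a

    # /etc/sysconfig/ctdb on the export nodes: point the recovery lock at the
    # dedicated filesystem rather than at the filesystem being exported
    CTDB_RECOVERY_LOCK=/gpfs/ctdblock/.ctdb/reclock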


Sent from my Android device.

-----Original Message-----
From: "Howard, Stewart Jameson" <sjhoward at iu.edu>
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Cc: "Garrison, E Chris" <ecgarris at iu.edu>
Sent: Thu, 03 Dec 2015 22:45
Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting

Hi All,

At our site, we have very recently (as of ~48 hours ago) configured one of our supercomputers (an x86 cluster of about 315 nodes) as a GPFS client cluster that accesses our core GPFS cluster via a remote mount, per the instructions in the GPFS Advanced Administration Guide. In addition to allowing remote access from this newly configured client cluster, we also export the filesystem via NFSv3 to two other supercomputers in our data center. We do not use the GPFS CNFS solution to provide NFS mounts; instead, we use CTDB to manage NFS on the four core-cluster client nodes that re-export the filesystem.
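
For context, the remote mount was set up with the usual mmauth / mmremotecluster / mmremotefs sequence from that guide, roughly as below. The cluster names, contact nodes, filesystem name, and key paths here are placeholders, not our real ones.

    # on the core (owning) cluster: exchange keys and grant access
    mmauth genkey new
    mmauth add client.cluster -k /tmp/client_id_rsa.pub
    mmauth grant client.cluster -f gpfs_fs

    # on the new client cluster: define the owning cluster and the remote filesystem
    mmremotecluster add core.cluster -n corenode1,corenode2 -k /tmp/core_id_rsa.pub
    mmremotefs add gpfs_fs -f gpfs_fs -C core.cluster -T /gpfs/gpfs_fs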

The exports of NFSv3 managed by CTDB pre-date the client GPFS cluster deployment. Since deploying GPFS clients onto the one supercomputer, we have been experiencing a great deal of flapping in our CTDB layer. It's difficult to sort out what is causing what, but I can identify a handful of the symptoms that we're seeing:

1) In the CTDB logs of all the NFS server nodes, we see numerous complaints (on some nodes this is multiple times a day) that rpc.mountd is not running and is being restarted, i.e.,

“ERROR: MOUNTD is not running. Trying to restart it.”

2) In syslog, rpc.mountd can be seen complaining that it is unable to bind to a socket and that an address is already in use, i.e.,

“rpc.mountd[16869]: Could not bind socket: (98) Address already in use”

The rpc.mountd daemon on these nodes is manually pinned to port 597. It appears able to listen for UDP connections on that port, but not for TCP connections. However, neither `lsof` nor `netstat` reveals any process that is holding port 597 and preventing rpc.mountd from binding to it.
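
In case it helps the diagnosis, the sort of checks that should catch both listeners and non-listening uses of the port (established or TIME_WAIT connections that happen to have 597 as their local port) look roughly like this. The final sysctl is only an idea for reserving the port, if the kernel supports it, not something we have verified helps.

    # listeners *and* established/TIME_WAIT connections touching port 597
    netstat -an | grep ':597 '
    ss -tanp | grep ':597'

    # which ports mountd has actually registered with the portmapper
    rpcinfo -p localhost | grep mountd

    # possible mitigation: stop the kernel handing 597 out as a local source port
    sysctl -w net.ipv4.ip_local_reserved_ports=597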

3) We also see nfsd failing its CTDB health check several times a day, i.e.,

“Event script timed out : 60.nfs monitor count : 0 pid : 7172”

Both the non-running state of rpc.mountd and the failure of nfsd to pass its CTDB health checks are causing multiple nodes in the NFS export cluster to become “UNHEALTHY” (the CTDB designation for it) multiple times a day, resulting in a lot of flapping and passing IP addresses back and forth.
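
For the record, the node states, failing event scripts, and IP shuffling referenced above can be seen on any export node with something along these lines:

    # per-node health and recovery state
    ctdb status
    # which event scripts are failing or timing out during the monitor run
    ctdb scriptstatus
    # current public IP assignment (shows the addresses moving between nodes)
    ctdb ip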

I should mention here that nfsd on these nodes had been running without any problems for the last month, up until the night we deployed the GPFS client cluster. After that deployment, the host of problems listed above suddenly started. I should also mention that the new GPFS client cluster is running quite nicely, although it is generating a lot more open network sockets on the core-cluster side. We do not believe it is a coincidence that the NFS problems began at the same time as the GPFS client deployment, and we are inclined to conclude that something about deploying GPFS clients on the supercomputer in question is destabilizing the NFS instances running on the core-cluster client nodes that re-export the filesystem.

Our current hypothesis is that introducing all of these new GPFS clients has caused contention for some resource on the core-cluster client nodes (ports?, open file handles?, something else?) and GPFS is winning out over NFS.
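
A rough way to put some numbers on that hypothesis on an export node might be something like the following; this is just a sketch we're considering, not a measure we've validated, and the daemon names are simply the obvious suspects.

    # open descriptors held by the GPFS daemon, CTDB, and mountd on an export node
    for d in mmfsd ctdbd rpc.mountd; do
        for p in $(pgrep -x "$d"); do
            echo "$d (pid $p): $(ls /proc/$p/fd | wc -l) open fds"
        done
    done

    # system-wide pressure: file handles in use vs. limit, and socket totals
    cat /proc/sys/fs/file-nr
    ss -s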

Does anyone have experience with running NFS and GPFS together in such an environment, especially with CTDB as a high-availability daemon? Has anyone perhaps seen these kinds of problems before or have any ideas as to what may be causing them?

We're happy to provide any additional diagnostics that the group would like to see in order to investigate. As always, we very much appreciate any help that you are able to provide.

Thank you so much!

Stewart Howard
Indiana University