[gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2

Bryan Hill bhill at physics.ucsd.edu
Fri Feb 14 18:10:04 GMT 2020


Hi All:

I'm performing a rolling upgrade of one of our GPFS clusters.  This
particular cluster has two CNFS servers for some of our NFS clients.  I wiped
one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2.  The file system
mounts fine on the node when I disable CNFS on it, but with CNFS
enabled it's a no go.  It appears mmnfsmonitor doesn't recognize that nfsd
has started, so it assumes the worst and shuts down the file system (I
currently have reboot on failure disabled to debug this).  The thing is,
nfsd processes actually do start when I run mmstartup on the node:
"ps" shows 32 nfsd threads running.
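
For anyone who wants to repeat the "ps" check in a script, here is a minimal
Python sketch. It assumes a Linux /proc layout in which each nfsd kernel
thread has its own numeric directory with a "comm" file containing "nfsd".
The function name count_nfsd_threads is made up for illustration, and this
is not necessarily the test that mmnfsmonitor itself performs.

```python
from pathlib import Path


def count_nfsd_threads(proc_root="/proc"):
    """Count entries under proc_root whose comm name is exactly 'nfsd'.

    Assumption: Linux-style /proc, one numeric directory per process or
    kernel thread, each with a 'comm' file holding the command name.
    """
    count = 0
    for entry in Path(proc_root).iterdir():
        # Only numeric directory names are PIDs; skip 'self', 'fs', etc.
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() == "nfsd":
                count += 1
        except OSError:
            # The process may have exited between listing and reading.
            continue
    return count
```

Calling count_nfsd_threads() with no arguments on the CNFS node should give
the same number as counting nfsd entries in the "ps" output.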

Below is the CNFS-specific output from an attempt to start the node:

CNFS[27243]: Restarting lockd to start grace
CNFS[27588]: Enabling 172.16.69.76
CNFS[27694]: Restarting lockd to start grace
CNFS[27699]: Starting NFS services
CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
CNFS[27910]: Monitor has started pid=27787
CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it
CNFS[28705]: Starting NFS services
CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it
CNFS[28758]: Starting NFS services
CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it
CNFS[28816]: Starting NFS services
CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it
CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured.
CNFS[28924]: Unexporting all GPFS filesystems
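
The retry pattern in that output can be tallied with a short Python sketch.
It assumes only the "CNFS[pid]: message" line format shown above. The
category names (restart_attempts, service_starts, node_failures) and the
function name summarize_cnfs_log are labels invented here for illustration,
not anything GPFS emits.

```python
import re
from collections import Counter

# Matches lines such as "CNFS[28702]: Monitor detected nfsd was not running, ..."
CNFS_LINE = re.compile(r"^CNFS\[(\d+)\]:\s*(.*)$")


def summarize_cnfs_log(text):
    """Tally the monitor events seen in CNFS log output."""
    counts = Counter()
    for line in text.splitlines():
        match = CNFS_LINE.match(line.strip())
        if not match:
            # Skip anything without a CNFS[pid] prefix, for example the
            # wrapped remainder of a long message.
            continue
        message = match.group(2)
        if message.startswith("Monitor detected nfsd was not running"):
            counts["restart_attempts"] += 1
        elif message.startswith("Starting NFS services"):
            counts["service_starts"] += 1
        elif message.startswith("Monitoring detected NFSD is inactive"):
            counts["node_failures"] += 1
    return counts
```

Run against the output above, this reports four "nfsd was not running"
detections, four "Starting NFS services" messages, and one node failure.
The monitor retries several times before giving up, and none of the
restarts is ever seen as successful.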

Any thoughts?  My other CNFS node is handling everything for the time
being, thankfully!

Thanks,
Bryan

---
Bryan Hill
Lead System Administrator
UCSD Physics Computing Facility

9500 Gilman Dr.  # 0319
La Jolla, CA 92093
+1-858-534-5538
bhill at ucsd.edu