[gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2

Bryan Hill bhill at physics.ucsd.edu
Mon Jun 1 16:32:09 BST 2020


Hi:

Just a note on this: the pidof fix was accepted upstream but has not
made its way into RHEL 8.2 yet.
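
Until the fix lands, a check that does not depend on pidof can be put together by reading procfs directly. What follows is only a minimal sketch of that idea, not what mmnfsmonitor does: the function name pids_by_comm and the proc_root parameter are made up for illustration, and it assumes the nfsd kernel threads show up under /proc with "nfsd" in their comm file.

```python
import os


def pids_by_comm(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name`.

    Hypothetical helper for illustration; `proc_root` is a parameter only
    so the function can be pointed at a fake directory tree in tests.
    """
    pids = []
    for entry in os.listdir(proc_root):
        # Only numeric directory names correspond to processes/threads.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited (or has no comm file); skip it.
            continue
    return sorted(pids)
```

On an affected node, calling pids_by_comm("nfsd") would be expected to return one PID per nfsd thread if the assumption above holds, and an empty list when nfsd is not running.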


Thanks,
Bryan

---
Bryan Hill
Lead System Administrator
UCSD Physics Computing Facility

9500 Gilman Dr.  # 0319
La Jolla, CA 92093
+1-858-534-5538
bhill at ucsd.edu

On Mon, Feb 17, 2020 at 12:02 AM Malahal R Naineni <mnaineni at in.ibm.com> wrote:
>
> I filed a defect here; let us see what Red Hat says. Yes, it doesn't work for any kernel threads. It does work for user-level threads/processes.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1803640
>
> Regards, Malahal.
>
>
> ----- Original message -----
> From: Bryan Hill <bhill at physics.ucsd.edu>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc:
> Subject: [EXTERNAL] Re: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2
> Date: Mon, Feb 17, 2020 8:26 AM
>
> Ah wait, I see what you might mean.  pidof works in general, but not for processes like nfsd.  That is odd.
>
> Thanks,
> Bryan
>
>
>
> On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill <bhill at physics.ucsd.edu> wrote:
>
> Hi Malahal:
>
> Just to clarify, are you saying that on your VM pidof is missing?   Or that it is there and not working as it did prior to RHEL/CentOS 8?  pidof is returning pid numbers on my system.  I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet.
>
>
>
> Thanks,
> Bryan
>
> On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni <mnaineni at in.ibm.com> wrote:
>
> I am not familiar with CNFS, but the git source seems to indicate that it uses 'pidof' to check whether a program is running. "pidof nfsd" works on RHEL 7.x but fails on the CentOS 8.1 VM I just created. So either we need to make sure pidof works on kernel threads, or fix the CNFS scripts.
>
> Regards, Malahal.
>
>
> ----- Original message -----
> From: Bryan Hill <bhill at physics.ucsd.edu>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug-discuss at spectrumscale.org
> Cc:
> Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2
> Date: Fri, Feb 14, 2020 11:40 PM
>
> Hi All:
>
> I'm performing a rolling upgrade of one of our GPFS clusters.  This particular cluster has 2 CNFS servers for some of our NFS clients.  I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2.  The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go.  It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this).  The thing is, it actually does start nfsd processes when running mmstartup on the node.  Doing a "ps" shows 32 nfsd threads are running.
>
> Below is the CNFS-specific output from an attempt to start the node:
>
> CNFS[27243]: Restarting lockd to start grace
> CNFS[27588]: Enabling 172.16.69.76
> CNFS[27694]: Restarting lockd to start grace
> CNFS[27699]: Starting NFS services
> CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
> CNFS[27910]: Monitor has started pid=27787
> CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it
> CNFS[28705]: Starting NFS services
> CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
> CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it
> CNFS[28758]: Starting NFS services
> CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
> CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it
> CNFS[28816]: Starting NFS services
> CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks
> CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it
> CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured.
> CNFS[28924]: Unexporting all GPFS filesystems
>
> Any thoughts?  My other CNFS node is handling everything for the time being, thankfully!
>
> Thanks,
> Bryan
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


