<html><body bgcolor="#FFFFFF"><p><font size="2">Hi Kevin,</font><br><br><font size="2">Let me try to understand the problem you have. What do you mean by "node died" here? Do you mean there is a hardware or OS issue that cannot be fixed, so the OS can no longer come up?</font><br><br><font size="2">I agree with Bob that you can try disabling CCR temporarily, restoring the cluster configuration, and then enabling CCR again.</font><br><br><font size="2">For example:</font><p><font size="2" face="Times New Roman">1. Log in to a node that has a proper GPFS configuration, e.g. NodeA<br>2. Shut down the GPFS daemon on all nodes in the cluster.<br>3. mmchcluster --ccr-disable -p NodeA<br>4. mmsdrrestore -a -p NodeA<br>5. mmauth genkey propagate -N </font><font face="Calibri">testnsd1</font><font size="2" face="Times New Roman">,</font><font face="Calibri">testnsd3</font><font size="2" face="Times New Roman"><br>6. mmchcluster --ccr-enable</font><br><br><font size="2">Regards, The Spectrum Scale (GPFS) team<br><br>------------------------------------------------------------------------------------------------------------------<br>If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at <a href="https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479">https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479</a>. <br><br>If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
<br><br>The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.</font><br><br><font size="2" color="#5F5F5F">From:        </font><font size="2">"Oesterlin, Robert" &lt;Robert.Oesterlin@nuance.com&gt;</font><br><font size="2" color="#5F5F5F">To:        </font><font size="2">gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;</font><br><font size="2" color="#5F5F5F">Date:        </font><font size="2">09/20/2017 07:39 AM</font><br><font size="2" color="#5F5F5F">Subject:        </font><font size="2">Re: [gpfsug-discuss] CCR cluster down for the count?</font><br><font size="2" color="#5F5F5F">Sent by:        </font><font size="2">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br><br><br><font face="Calibri">OK – I’ve run across this before, and it’s because of a bug (as I recall) having to do with CCR and quorum. 
What I think you can do is set the cluster to non-CCR (mmchcluster --ccr-disable) with all the nodes down, bring it back up, and then re-enable CCR.</font><br><font face="Calibri"> </font><br><font face="Calibri">I’ll see if I can find this in one of the recent 4.2 release notes.</font><br><font face="Calibri"> </font><br><font face="Calibri"> </font><br><font face="Calibri">Bob Oesterlin</font><br><font face="Calibri">Sr Principal Storage Engineer, Nuance</font><br><font face="Calibri"> </font><br><font face="Calibri"> </font><br><b><font face="Calibri">From: </font></b><font face="Calibri">&lt;gpfsug-discuss-bounces@spectrumscale.org&gt; on behalf of "Buterbaugh, Kevin L" &lt;Kevin.Buterbaugh@Vanderbilt.Edu&gt;</font><b><font face="Calibri"><br>Reply-To: </font></b><font face="Calibri">gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;</font><b><font face="Calibri"><br>Date: </font></b><font face="Calibri">Tuesday, September 19, 2017 at 4:03 PM</font><b><font face="Calibri"><br>To: </font></b><font face="Calibri">gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;</font><b><font face="Calibri"><br>Subject: </font></b><font face="Calibri">[EXTERNAL] [gpfsug-discuss] CCR cluster down for the count?</font><br><font face="Calibri"> </font><br><font face="Calibri">Hi All, </font><br><font face="Calibri"> </font><br><font face="Calibri">We have a small test cluster that is CCR enabled.  It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients.  testnsd3 died a while back.  I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects.</font><br><font face="Calibri"> </font><br><font face="Calibri">Yesterday, testnsd1 also died, which took the whole cluster down.  So now resolving this has become higher priority… ;-)</font><br><font face="Calibri"> </font><br><font face="Calibri">I took two other boxes and set them up as testnsd1 and 3, respectively.  
I’ve done a “mmsdrrestore -p testnsd2 -R /usr/bin/scp” on both of them.  I’ve also done a “mmccr setup -F” and copied the ccr.disks and ccr.nodes files from testnsd2 to them.  And I’ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3.  In case it’s not obvious from the above, networking is fine … ssh without a password between those 3 boxes is fine.</font><br><font face="Calibri"> </font><br><font face="Calibri">However, when I try to start up GPFS … or run any GPFS command I get:</font><br><font face="Calibri"> </font><br><font face="Calibri">/root</font><br><font face="Calibri">root@testnsd2# mmstartup -a</font><br><font face="Calibri">get file failed: Not enough CCR quorum nodes available (err 809)</font><br><font face="Calibri">gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158</font><br><font face="Calibri">mmstartup: Command failed. Examine previous error messages to determine cause.</font><br><font face="Calibri">/root</font><br><font face="Calibri">root@testnsd2#</font><br><font face="Calibri"> </font><br><font face="Calibri">I’ve got to run to a meeting right now, so I hope I’m not leaving out any crucial details here … does anyone have an idea what I need to do?  
Thanks…</font><br><font face="Calibri"> </font><br><font face="Calibri">—</font><br><font face="Calibri">Kevin Buterbaugh - Senior System Administrator</font><br><font face="Calibri">Vanderbilt University - Advanced Computing Center for Research and Education</font><br><a href="mailto:Kevin.Buterbaugh@vanderbilt.edu"><u><font color="#0000FF" face="Calibri">Kevin.Buterbaugh@vanderbilt.edu</font></u></a><font face="Calibri"> - (615)875-9633</font><br><font face="Calibri"> </font><br><font face="Calibri"> </font><br><font face="Calibri"> </font><tt><font size="2">_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at spectrumscale.org<br></font></tt><tt><font size="2"><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e=">https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e=</a></font></tt><tt><font size="2"> <br></font></tt><br><br><BR>
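<p><font size="2">The recovery sequence suggested at the top of this thread (disable CCR, restore the configuration from a good node, propagate the key again, re-enable CCR) can be sketched as a dry-run shell script. This is only an illustration, not a tested procedure: it prints the commands quoted in the thread rather than running them. The function name ccr_recovery_plan and the PRIMARY and KEY_NODES variables are hypothetical names introduced here, and the daemon shutdown step is left as a comment because the thread does not give a command for it.</font></p>
<pre>
```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands from the thread, runs nothing.
# ccr_recovery_plan, PRIMARY and KEY_NODES are hypothetical names for this example.

# $1: node that still has a proper GPFS configuration (NodeA in the thread)
# $2: comma-separated list of nodes that should receive the key
ccr_recovery_plan() {
    primary="$1"
    key_nodes="$2"
    # Step 2 of the thread: the exact shutdown command is not given there,
    # so it is emitted as a reminder comment only.
    printf '%s\n' "# shut down the GPFS daemon on all nodes first"
    printf '%s\n' "mmchcluster --ccr-disable -p $primary"
    printf '%s\n' "mmsdrrestore -a -p $primary"
    printf '%s\n' "mmauth genkey propagate -N $key_nodes"
    printf '%s\n' "mmchcluster --ccr-enable"
}

# Defaults mirror the example values used in the thread.
ccr_recovery_plan "${PRIMARY:-NodeA}" "${KEY_NODES:-testnsd1,testnsd3}"
```
</pre>
<p><font size="2">Check the printed commands against the documentation for your release before running any of them by hand.</font></p>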


</body></html>