<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I’ve also been exploring the mmhealth and gpfsgui for the first time this week.<div class="">I have a test cluster where I’m trying the new stuff.  Running 4.2.2-2</div><div class=""><br class=""></div><div class="">mmhealth cluster show says everyone is in nominal status:</div><div class=""><div class="">Component           Total         Failed       Degraded        Healthy          Other</div><div class="">-------------------------------------------------------------------------------------</div><div class="">NODE                   12              0              0             12              0</div><div class="">GPFS                   12              0              0             12              0</div><div class="">NETWORK                12              0              0             12              0</div><div class="">FILESYSTEM              0              0              0              0              0</div><div class="">DISK                    0              0              0              0              0</div><div class="">GUI                     1              0              0              1              0</div><div class="">PERFMON                12              0              0             12              0</div></div><div class=""><br class=""></div><div class="">However on the GUI there is conflicting information:</div><div class="">1) Home page shows 3/8 NSD Servers unhealthy </div><div class="">2) Home page shows 3/21 Nodes unhealthy</div><div class=""> — where is it getting this notion?  </div><div class=""> — there are only 12 nodes in the whole cluster! 
</div><div class="">3) clicking on either NSD Servers or Nodes leads to the monitoring page</div><div class="">where the top half spins forever, bottom half is content-free.</div><div class=""><br class=""></div><div class="">I may have installed the pmsensors RPM on a couple of other nodes back in early April,</div><div class="">but have forgotten which ones.  They are in the production cluster.  </div><div class=""><br class=""></div><div class="">Also, the storage in this sandbox cluster has not been turned into a filesystem yet. </div><div class="">There are a few dozen free NSDs.  Perhaps the “FILESYSTEM CHECKING” status is somehow </div><div class="">wedging up the GUI?</div><div class=""><br class=""></div><div class=""><div class="">Node name:      <a href="http://storage005.oscar.ccv.brown.edu" class="">storage005.oscar.ccv.brown.edu</a></div><div class="">Node status:    HEALTHY</div><div class="">Status Change:  15 hours ago</div><div class=""><br class=""></div><div class="">Component      Status        Status Change     Reasons</div><div class="">------------------------------------------------------</div><div class="">GPFS           HEALTHY       16 hours ago      -</div><div class="">NETWORK        HEALTHY       16 hours ago      -</div><div class="">FILESYSTEM     CHECKING      16 hours ago      -</div><div class="">GUI            HEALTHY       15 hours ago      -</div><div class="">PERFMON        HEALTHY       16 hours ago      </div></div><div class=""><br class=""></div><div class="">I’ve tried restarting the GUI service and also rebooted the GUI server, but it comes back looking the same.</div><div class=""><br class=""></div><div class="">Any thoughts?</div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On May 11, 2017, at 7:28 AM, Anna Christina Wagner <<a href="mailto:Anna.Wagner@de.ibm.com" class="">Anna.Wagner@de.ibm.com</a>> wrote:</div><br class="Apple-interchange-newline"><div 
class=""><font size="2" face="sans-serif" class="">Hello Bob,</font><br class=""><br class=""><font size="2" face="sans-serif" class="">4.2.2 is the release where we introduced

"mmhealth cluster show". And you are totally right, it can be

a little fragile at times.</font><br class=""><br class=""><font size="2" face="sans-serif" class="">So a short explanation: </font><br class=""><font size="2" face="sans-serif" class="">We had this situation on test machines

as well. Because of issues with the system, not only the mm commands but also ordinary Linux commands took more than 10 seconds to return. Internally we have a default timeout of 10 seconds for CLI commands. So if you had a failover in which the cluster manager changed (our cluster state manager, CSM, runs on the cluster manager) and the mmlsmgr command did not return within 10 seconds, the node does not know that it is the CSM and will not start the corresponding service. </font><br class=""><br class=""><br class=""><font size="2" face="sans-serif" class="">If you want me to look further into

it, or if you have feedback regarding mmhealth, please feel free to send

me an email (<a href="mailto:Anna.Wagner@de.ibm.com" class="">Anna.Wagner@de.ibm.com</a>)</font><br class=""><br class=""><font size="1" face="Arial" class="">Mit freundlichen Grüßen / Kind regards</font><br class=""><br class=""><font size="2" face="Arial" class=""><b class="">Wagner, Anna Christina</b></font><br class=""><br class=""><font size="1" color="#0060a0" face="Arial" class="">Software Engineer, Spectrum

Scale Development</font><br class=""><font size="1" color="#0060a0" face="Arial" class="">IBM Systems</font><br class=""><br class=""><font size="1" color="#a2a2a2" face="Arial" class="">IBM Deutschland Research &

Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz<br class="">Geschäftsführung: Dirk Wittkopp<br class="">Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart,

HRB 243294 </font><br class=""><br class=""><br class=""><br class=""><font size="1" color="#5f5f5f" face="sans-serif" class="">From:      

 </font><font size="1" face="sans-serif" class="">"Oesterlin, Robert"

<<a href="mailto:Robert.Oesterlin@nuance.com" class="">Robert.Oesterlin@nuance.com</a>></font><br class=""><font size="1" color="#5f5f5f" face="sans-serif" class="">To:      

 </font><font size="1" face="sans-serif" class="">gpfsug main discussion

list <<a href="mailto:gpfsug-discuss@spectrumscale.org" class="">gpfsug-discuss@spectrumscale.org</a>></font><br class=""><font size="1" color="#5f5f5f" face="sans-serif" class="">Date:      

 </font><font size="1" face="sans-serif" class="">10.05.2017 18:21</font><br class=""><font size="1" color="#5f5f5f" face="sans-serif" class="">Subject:    

   </font><font size="1" face="sans-serif" class="">Re: [gpfsug-discuss]

"mmhealth cluster show" returns error</font><br class=""><font size="1" color="#5f5f5f" face="sans-serif" class="">Sent by:    

   </font><font size="1" face="sans-serif" class=""><a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" class="">gpfsug-discuss-bounces@spectrumscale.org</a></font><br class=""><hr noshade="" class=""><br class=""><br class=""><br class=""><tt class=""><font size="2" class="">Yea, it’s fine. <br class=""><br class="">I did manage to get it to respond after I did a “mmsysmoncontrol restart”

but it’s still not showing proper status across the cluster.<br class=""><br class="">Seems a bit fragile :-) <br class=""><br class="">Bob Oesterlin<br class="">Sr Principal Storage Engineer, Nuance<br class=""> <br class=""> <br class=""><br class="">On 5/10/17, 10:46 AM, "<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" class="">gpfsug-discuss-bounces@spectrumscale.org</a> on

behalf of <a href="mailto:valdis.kletnieks@vt.edu" class="">valdis.kletnieks@vt.edu</a>" <<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" class="">gpfsug-discuss-bounces@spectrumscale.org</a>

on behalf of <a href="mailto:valdis.kletnieks@vt.edu" class="">valdis.kletnieks@vt.edu</a>> wrote:<br class=""><br class="">    On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert"

said:<br class="">    <br class="">    > [root]# mmhealth cluster show<br class="">    > nrg1-gpfs16.<a href="http://nrg1.us.grid.nuance.com" class="">nrg1.us.grid.nuance.com</a>: Could not find

the cluster state manager. It may be in an failover process. Please try

again in a few seconds.<br class="">    <br class="">    Does 'mmlsmgr' return something sane?<br class="">    <br class=""><br class="">_______________________________________________<br class="">gpfsug-discuss mailing list<br class="">gpfsug-discuss at <a href="http://spectrumscale.org" class="">spectrumscale.org</a><br class=""></font></tt><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" class=""><tt class=""><font size="2" class="">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</font></tt></a><tt class=""><font size="2" class=""><br class=""></font></tt></div></blockquote></div><br class=""></div></body></html>