[gpfsug-discuss] GPFS GA 5.0.0.0: mmces commands with inconsistent output

Mathias Dietz MDIETZ at de.ibm.com
Wed Jan 17 17:08:02 GMT 2018


Hi,

let me start with a recommendation before I explain how the cluster 
state is built.
Starting with 4.2.1, please use the mmhealth command instead of the 
mmces state/events commands. The mmces state/events commands will be 
deprecated in future releases. 
mmhealth node show -> shows the node state for all components (incl. CES)
mmhealth node show CES -> shows the CES components only
mmhealth cluster show -> shows the cluster state

Now to your problem:
Spectrum Scale health monitoring is done by a daemon which runs on 
each cluster node. 
This daemon monitors the state of all Spectrum Scale components on 
the local system and, based on the resulting monitoring events, compiles 
a local system state (shown by mmhealth node show).
Decentralized monitoring reduces the monitoring overhead and 
increases resiliency against network glitches.

In order to show a cluster-wide state view, we have to consolidate the 
events from all cluster nodes on a single node. 
The health monitoring daemon running on the cluster manager takes on the 
CSM role: it receives events from all nodes through RPC calls and 
compiles the cluster state (shown by mmhealth cluster show).
There can be cases where the (async) event forwarding to the CSM is 
delayed or dropped because of network delays, high system load, cluster 
manager failover, or split-brain situations.
Those cases should resolve automatically after some time, when the event 
is resent.
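
To illustrate the forwarding behavior described above, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class and method names (NodeMonitor, ClusterStateCollector, forward, link_up) are hypothetical and do not correspond to the real daemon or its RPC interface. It shows how a node's local state can be current while the consolidated view lags until the queued event is resent.

```python
from collections import deque


class ClusterStateCollector:
    """Toy stand-in for the CSM role (hypothetical, not the real daemon)."""

    def __init__(self):
        self.view = {}  # (node, component) -> last state received

    def receive(self, node, component, state):
        self.view[(node, component)] = state


class NodeMonitor:
    """Toy stand-in for the per-node health monitoring daemon."""

    def __init__(self, name):
        self.name = name
        self.local_state = {}   # component -> current local state
        self.pending = deque()  # events not yet delivered to the collector

    def observe(self, component, state):
        # The local state is updated immediately; forwarding happens later.
        self.local_state[component] = state
        self.pending.append((self.name, component, state))

    def forward(self, collector, link_up=True):
        """Deliver queued events in order; keep them queued on failure."""
        while self.pending:
            if not link_up:
                return False  # delayed: events stay queued for a resend
            collector.receive(*self.pending[0])
            self.pending.popleft()
        return True


collector = ClusterStateCollector()
node = NodeMonitor("ces01")

node.observe("NFS", "FAILED")
node.forward(collector)                  # delivered normally

node.observe("NFS", "HEALTHY")
node.forward(collector, link_up=False)   # forwarding delayed
print(collector.view[("ces01", "NFS")], node.local_state["NFS"])
# FAILED HEALTHY  (cluster view is stale, node view is current)

node.forward(collector)                  # resend succeeds
print(collector.view[("ces01", "NFS")], node.local_state["NFS"])
# HEALTHY HEALTHY
```

In this model the node-local state is always authoritative, and the collector only catches up once a queued event is delivered.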

Summary: the cluster state might be temporarily out of sync (it is 
eventually consistent); to get the current state, refer to mmhealth node 
show. 
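
As a rough way to spot such an out-of-sync view, the following Python sketch compares a cluster-level state with the per-node states, using the values from the report quoted below. The function name suspect_components and the rule it applies (flag a component when the cluster state matches none of the node states) are assumptions for illustration only; the actual aggregation logic of the health monitor is not described in this thread.

```python
COLUMNS = ["AUTH", "BLOCK", "NETWORK", "AUTH_OBJ", "NFS", "OBJ", "SMB", "CES"]


def suspect_components(cluster_state, node_states):
    """Return components whose cluster state matches no node's local state.

    Heuristic only (an assumption, not the product's logic): if no node
    reports the state shown at cluster level, the cluster view may be stale.
    """
    suspects = []
    for component, reported in cluster_state.items():
        local = {
            states[component]
            for states in node_states.values()
            if component in states
        }
        if local and reported not in local:
            suspects.append(component)
    return suspects


# Values copied from the mmces output in the quoted message.
cluster = dict(zip(
    COLUMNS,
    "FAILED DISABLED HEALTHY DISABLED DEPEND DISABLED DEPEND FAILED".split(),
))
node_row = "HEALTHY DISABLED HEALTHY DISABLED HEALTHY DISABLED HEALTHY HEALTHY"
nodes = {
    "testnas13ces01-i": dict(zip(COLUMNS, node_row.split())),
    "testnas13ces02-i": dict(zip(COLUMNS, node_row.split())),
}

print(suspect_components(cluster, nodes))
# ['AUTH', 'NFS', 'SMB', 'CES']
```

For the reported output, the flagged components are exactly those where the cluster view shows FAILED or DEPEND while both nodes report HEALTHY.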

If the problem does not resolve automatically, restarting the monitoring 
daemon will force a re-sync. Please open a PMR for the 5.0 issue too if 
the problem persists.
 

Mit freundlichen Grüßen / Kind regards

Mathias Dietz

Spectrum Scale Development - Release Lead Architect (4.2.x)
Spectrum Scale RAS Architect
---------------------------------------------------------------------------
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdietz at de.ibm.com
-----------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk 
Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht 
Stuttgart, HRB 243294



From:   "Ernst  Heinz (ID SD)" <heinz.ernst at id.ethz.ch>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   01/16/2018 06:09 PM
Subject:        [gpfsug-discuss] GPFS GA 5.0.0.0: mmces commands with 
inconsistent output
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hello to all peers and gurus
 
We have had GPFS GA 5.0.0.0 running in our test environment for about 
two weeks.
 
Today I've seen the following behavior on our Spectrum Scale test cluster, 
which slightly surprised me.
 
Checking the status of the cluster in different ways:
 
[root@testnas13ces01 idsd_erh_t1]# mmces state cluster
CLUSTER            AUTH    BLOCK     NETWORK  AUTH_OBJ  NFS     OBJ       SMB     CES
testnas13.ethz.ch  FAILED  DISABLED  HEALTHY  DISABLED  DEPEND  DISABLED  DEPEND  FAILED
 
[root@testnas13ces01 idsd_erh_t1]# mmces state show -a
NODE              AUTH     BLOCK     NETWORK  AUTH_OBJ  NFS      OBJ       SMB      CES
testnas13ces01-i  HEALTHY  DISABLED  HEALTHY  DISABLED  HEALTHY  DISABLED  HEALTHY  HEALTHY
testnas13ces02-i  HEALTHY  DISABLED  HEALTHY  DISABLED  HEALTHY  DISABLED  HEALTHY  HEALTHY
 
Does any of you have an explanation for this?
Has anyone else seen behavior like this?
 
By the way, we see something similar on one of our clusters running GPFS 4.2.3.4
(open PMR: 30218.112.848)
 
Any kind of response would be greatly appreciated.
 
Kind regards
Heinz
 
 
===============================================================
Heinz Ernst                             ID-Systemdienste
WEC C 16                              Weinbergstrasse 11
CH-8092 Zurich                         heinz.ernst at id.ethz.ch
Phone: +41 44 633 84 48                Mobile: +41 79 216 15 50
===============================================================
 