[gpfsug-discuss] GPFS GA 5.0.0.0: mmces commands with inconsistent output
Mathias Dietz
MDIETZ at de.ibm.com
Wed Jan 17 17:08:02 GMT 2018
Hi,
let me start with a recommendation before I explain how the cluster
state is built.
Starting with 4.2.1, please use the mmhealth command instead of the
mmces state/events commands. The mmces state/events commands will be
deprecated in future releases.
mmhealth node show -> shows the node state for all components (incl. CES)
mmhealth node show CES -> shows the CES components only
mmhealth cluster show -> shows the cluster state
Now to your problem:
Spectrum Scale health monitoring is done by a daemon that runs on
each cluster node.
This daemon monitors the state of all Spectrum Scale components on
the local system and, based on the resulting monitoring events, compiles
a local system state (shown by mmhealth node show).
Decentralized monitoring reduces the monitoring overhead and
increases resiliency against network glitches.
To show a cluster-wide state view, we have to consolidate the
events from all cluster nodes on a single node.
The health monitoring daemon running on the cluster manager takes on the
role (CSM) of receiving events from all nodes through RPC calls and
compiling the cluster state (shown by mmhealth cluster show).
There can be cases where the (asynchronous) event forwarding to the CSM
is delayed or dropped because of network delays, high system load, cluster
manager failover, or split-brain situations.
Those cases should resolve automatically after some time, once the event
is resent.
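
The flow described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions, not the actual daemon logic:
the class names, the "worst reported state wins" aggregation rule, the
DEGRADED state and the in-memory resend queue are all hypothetical and
chosen purely to show how a node view and a cluster view can diverge and
then converge again.

```python
from collections import deque

# Assumed severity order, used only to pick the worst state in this sketch.
SEVERITY = {"HEALTHY": 0, "DEGRADED": 1, "FAILED": 2}


class ClusterStateManager:
    """Toy stand-in for the CSM role: keeps the last state reported per node."""

    def __init__(self):
        self.reported = {}  # (node, component) -> last state received

    def receive(self, node, component, state):
        self.reported[(node, component)] = state

    def cluster_state(self, component):
        # Assumption: the cluster view is the worst state any node reported.
        states = [s for (_, c), s in self.reported.items() if c == component]
        if not states:
            return "UNKNOWN"
        return max(states, key=lambda s: SEVERITY.get(s, 0))


class NodeMonitor:
    """Toy stand-in for the monitoring daemon on one node."""

    def __init__(self, name, csm):
        self.name = name
        self.csm = csm
        self.local = {}         # component -> state, always current
        self.pending = deque()  # events not yet delivered to the CSM

    def observe(self, component, state):
        self.local[component] = state
        self.pending.append((component, state))

    def forward(self, network_ok=True):
        # Undelivered events stay queued, so a later call acts as the resend.
        while self.pending and network_ok:
            component, state = self.pending.popleft()
            self.csm.receive(self.name, component, state)


csm = ClusterStateManager()
node = NodeMonitor("node1", csm)

node.observe("AUTH", "FAILED")
node.forward()                  # the failure reaches the CSM
node.observe("AUTH", "HEALTHY")
node.forward(network_ok=False)  # the recovery event is not delivered
print(node.local["AUTH"], csm.cluster_state("AUTH"))  # HEALTHY FAILED

node.forward()                  # resend succeeds
print(node.local["AUTH"], csm.cluster_state("AUTH"))  # HEALTHY HEALTHY
```

In this model the node-local state is always current, while the cluster
state only catches up once the queued event is delivered.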
Summary: the cluster state might be temporarily out of sync (it is
eventually consistent); to get the current state, refer to mmhealth node
show.
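
As a rough illustration of such an out-of-sync view, the sketch below
compares the cluster-level row with the per-node rows from the mmces
output quoted in the original message further down. The values are
copied from that output; the function name and the heuristic itself (a
cluster state that no node currently reports is probably stale) are
assumptions made for this example, not an official check.

```python
COMPONENTS = ["AUTH", "BLOCK", "NETWORK", "AUTH_OBJ", "NFS", "OBJ", "SMB", "CES"]


def row(states):
    """Map a whitespace-separated row of states onto the component names."""
    return dict(zip(COMPONENTS, states.split()))


# Row from "mmces state cluster" in the quoted message.
cluster_view = row(
    "FAILED DISABLED HEALTHY DISABLED DEPEND DISABLED DEPEND FAILED"
)

# Rows from "mmces state show -a" in the quoted message.
node_views = {
    "testnas13ces01-i": row(
        "HEALTHY DISABLED HEALTHY DISABLED HEALTHY DISABLED HEALTHY HEALTHY"
    ),
    "testnas13ces02-i": row(
        "HEALTHY DISABLED HEALTHY DISABLED HEALTHY DISABLED HEALTHY HEALTHY"
    ),
}


def stale_components(cluster, nodes):
    """Return components whose cluster-level state is reported by no node."""
    return [
        component
        for component, state in cluster.items()
        if all(view.get(component) != state for view in nodes.values())
    ]


print(stale_components(cluster_view, node_views))
# ['AUTH', 'NFS', 'SMB', 'CES']
```

For the quoted output, the four flagged components are exactly the ones
where the cluster view disagrees with both nodes.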
If the problem does not resolve automatically, restarting the monitoring
daemon will force a re-sync. Please also open a PMR for the 5.0 issue if
the problem persists.
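
One way to decide when "does not resolve automatically" applies is to
poll both views for a bounded time before taking action. The helper
below is a minimal sketch: the function name, the default timeout and
interval, and the two callables (which would have to wrap whatever is
used to read the cluster and node states) are hypothetical and not part
of any Spectrum Scale tooling.

```python
import time


def wait_for_sync(cluster_state, node_state, timeout_s=300.0, interval_s=10.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until both callables return equal values.

    Returns True once the views agree, or False if they still differ
    when the timeout has expired.
    """
    deadline = clock() + timeout_s
    while True:
        if cluster_state() == node_state():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If the helper returns False, that would be the point to restart the
monitoring daemon and, if the mismatch remains, to open a PMR.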
Mit freundlichen Grüßen / Kind regards
Mathias Dietz
Spectrum Scale Development - Release Lead Architect (4.2.x)
Spectrum Scale RAS Architect
---------------------------------------------------------------------------
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdietz at de.ibm.com
-----------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martina Koederitz, Management: Dirk
Wittkopp
Registered office: Böblingen / Registration court: Amtsgericht
Stuttgart, HRB 243294
From: "Ernst Heinz (ID SD)" <heinz.ernst at id.ethz.ch>
To: "gpfsug-discuss at spectrumscale.org"
<gpfsug-discuss at spectrumscale.org>
Date: 01/16/2018 06:09 PM
Subject: [gpfsug-discuss] GPFS GA 5.0.0.0: mmces commands with
inconsistent output
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Hello to all peers and gurus
For about two weeks we have had GPFS GA 5.0.0.0 running in our
test environment.
Today I've seen the following behavior on our Spectrum Scale test cluster,
which slightly surprised me.
Checking the status of the cluster in different ways:
[root@testnas13ces01 idsd_erh_t1]# mmces state cluster
CLUSTER            AUTH      BLOCK     NETWORK   AUTH_OBJ  NFS       OBJ       SMB       CES
testnas13.ethz.ch  FAILED    DISABLED  HEALTHY   DISABLED  DEPEND    DISABLED  DEPEND    FAILED

[root@testnas13ces01 idsd_erh_t1]# mmces state show -a
NODE               AUTH      BLOCK     NETWORK   AUTH_OBJ  NFS       OBJ       SMB       CES
testnas13ces01-i   HEALTHY   DISABLED  HEALTHY   DISABLED  HEALTHY   DISABLED  HEALTHY   HEALTHY
testnas13ces02-i   HEALTHY   DISABLED  HEALTHY   DISABLED  HEALTHY   DISABLED  HEALTHY   HEALTHY
Does anyone have an explanation for this?
Has anyone else seen behavior like this?
By the way, we see a similar picture on one of our clusters running GPFS
4.2.3.4 (open PMR: 30218.112.848).
Any kind of response would be much appreciated.
Kind regards
Heinz
===============================================================
Heinz Ernst ID-Systemdienste
WEC C 16 Weinbergstrasse 11
CH-8092 Zurich heinz.ernst at id.ethz.ch
Phone: +41 44 633 84 48 Mobile: +41 79 216 15 50
===============================================================
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss