[gpfsug-discuss] Problem with mmlscluster and callback scripts
Aaron Knister
aaron.s.knister at nasa.gov
Fri Sep 7 14:35:24 BST 2018
Hi Matthias,
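Looks like you lost quorum in the cluster. With node-based quorum you
need n/2+1 quorum nodes up (rounding n/2 down), so in a cluster with two
quorum nodes both have to be up, and shutting either one down loses
quorum. Do you have a tiebreaker disk defined? (You can check with
mmlsconfig tiebreakerDisks.)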
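
To make the n/2+1 arithmetic concrete, here is a minimal Python sketch of
the node-based quorum count. The function names are made up for
illustration and are not part of GPFS, and the sketch deliberately
ignores tiebreaker disks; it only models the node count rule.

```python
def quorum_needed(quorum_nodes: int) -> int:
    """Minimum number of quorum nodes that must be up: floor(n/2) + 1."""
    return quorum_nodes // 2 + 1


def has_quorum(quorum_nodes: int, nodes_up: int) -> bool:
    """True if enough quorum nodes are up under node-based quorum."""
    return nodes_up >= quorum_needed(quorum_nodes)


if __name__ == "__main__":
    # Two quorum nodes, one shut down (the situation in this thread).
    print(quorum_needed(2), has_quorum(2, 1))  # 2 False
    # Three quorum nodes, one shut down: quorum survives.
    print(quorum_needed(3), has_quorum(3, 2))  # 2 True
```

With two quorum nodes the required count equals the total, which is why
the surviving node drops to "arbitrating" as soon as its peer goes away.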
-Aaron
On 9/7/18 7:51 AM, Matthias Knigge wrote:
> Hello all,
>
> I am using GPFS version 5.0.2.0 and have problems with the mmlscluster
> command and with callback scripts. It is a small cluster of only two
> nodes. If I shut down one of the nodes, mmlscluster sometimes fails
> with the following output:
>
> [root@gpfs-tier1 gpfs5.2]# mmgetstate
>
>  Node number  Node name   GPFS state
> -------------------------------------------
>        1      gpfs-tier1  arbitrating
>
> [root@gpfs-tier1 gpfs5.2]# mmlscluster
> ssh: connect to host gpfs-tier2 port 22: No route to host
> mmlscluster: Unable to retrieve GPFS cluster files from node gpfs-tier2
> mmlscluster: Command failed. Examine previous error messages to determine cause.
>
> Normally the output is like this:
>
> [root@gpfs-tier1 gpfs5.2]# mmlscluster
>
> GPFS cluster information
> ========================
>   GPFS cluster name:         TIERCLUSTER.gpfs-tier1
>   GPFS cluster id:           12458173498278694815
>   GPFS UID domain:           TIERCLUSTER.gpfs-tier1
>   Remote shell command:      /usr/bin/ssh
>   Remote file copy command:  /usr/bin/scp
>   Repository type:           server-based
>
> GPFS cluster configuration servers:
> -----------------------------------
>   Primary server:    gpfs-tier2
>   Secondary server:  gpfs-tier1
>
>  Node  Daemon node name  IP address      Admin node name  Designation
> ----------------------------------------------------------------------
>    1   gpfs-tier1        192.168.178.10  gpfs-tier1       quorum-manager
>    2   gpfs-tier2        192.168.178.11  gpfs-tier2       quorum-manager
>
> [root@gpfs-tier1 gpfs5.2]# mmlscallback
> NodeDownCallback
>         command  = /var/mmfs/rs/nodedown.ksh
>         priority = 1
>         event    = quorumNodeLeave
>         parms    = %eventNode %quorumNodes
>
> NodeUpCallback
>         command  = /var/mmfs/rs/nodeup.ksh
>         priority = 1
>         event    = quorumNodeJoin
>         parms    = %eventNode %quorumNodes
>
> If I shut down the file system via mmshutdown, the callback script
> works, but if I shut down the whole node, the scripts do not run.
>
> The latest entries in mmfs.log.latest show only this information:
>
> 2018-09-07_13:12:36.724+0200: [I] Cluster Manager connection broke. Probing cluster TIERCLUSTER.gpfs-tier1
> 2018-09-07_13:12:37.226+0200: [E] Unable to contact enough other quorum nodes during cluster probe.
> 2018-09-07_13:12:37.226+0200: [E] Lost membership in cluster TIERCLUSTER.gpfs-tier1. Unmounting file systems.
> 2018-09-07_13:12:38.448+0200: [N] Connecting to 192.168.178.11 gpfs-tier2 <c0p1>
>
> Could anybody help me with this? I want to run a script whenever a
> node goes down or comes up, in order to change the roles for starting
> the file system. The callback events NodeLeave and NodeJoin do not
> fire either.
>
> Is any more information required? If so, please let me know!
>
> Many thanks in advance, and have a nice weekend!
>
> Matthias
>
> Best Regards
>
> Matthias Knigge
> R&D File Based Media Solutions
>
> Rohde & Schwarz
> GmbH & Co. KG
> Hanomaghof 1
> 30449 Hannover
> Phone +49 511 67 80 7 213
> Fax +49 511 37 19 74
> Internet: Matthias.Knigge at rohde-schwarz.com
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776