[gpfsug-discuss] Problem with mmlscluster and callback scripts

Aaron Knister aaron.s.knister at nasa.gov
Fri Sep 7 14:35:24 BST 2018


Hi Matthias,

Looks like you lost quorum in the cluster (with node-based quorum you
need a majority of the quorum nodes, floor(n/2)+1, to be up, which in a
two-node cluster means both of them). Do you have a tiebreaker disk
defined? (Check with mmlsconfig tiebreakerDisks.)
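
As a quick check of that majority rule, here is a minimal shell sketch. The function name quorum_needed is hypothetical and only for illustration. It prints how many quorum nodes must stay up, and how many may fail, for small clusters. With two quorum nodes the answer is "both up, none may fail", which is why a tiebreaker disk matters in a two-node setup.

```shell
#!/bin/sh
# Hypothetical helper: number of quorum nodes that must be up for
# node-based quorum, i.e. a strict majority: floor(n/2) + 1.
quorum_needed() {
    n=$1
    # Shell arithmetic truncates, so 3 / 2 evaluates to 1.
    echo $(( n / 2 + 1 ))
}

# Show the requirement for a few cluster sizes.
for n in 1 2 3 4 5; do
    need=$(quorum_needed "$n")
    echo "$n quorum nodes: $need must be up, $(( n - need )) may fail"
done
```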

-Aaron

On 9/7/18 7:51 AM, Matthias Knigge wrote:
> Hello everyone,
> 
> I am using GPFS version 5.0.2.0 and am having problems with the
> mmlscluster command and with callback scripts. It is a small cluster of
> only two nodes. If I shut down one of the nodes, mmlscluster sometimes
> reports the following:
> 
> [root@gpfs-tier1 gpfs5.2]# mmgetstate
>
> Node number  Node name        GPFS state
> -------------------------------------------
>         1      gpfs-tier1       arbitrating
> 
> [root@gpfs-tier1 gpfs5.2]# mmlscluster
> ssh: connect to host gpfs-tier2 port 22: No route to host
> mmlscluster: Unable to retrieve GPFS cluster files from node gpfs-tier2
> mmlscluster: Command failed. Examine previous error messages to determine cause.
> 
> Normally, the output looks like this:
> 
> [root@gpfs-tier1 gpfs5.2]# mmlscluster
>
> GPFS cluster information
> ========================
>    GPFS cluster name:         TIERCLUSTER.gpfs-tier1
>    GPFS cluster id:           12458173498278694815
>    GPFS UID domain:           TIERCLUSTER.gpfs-tier1
>    Remote shell command:      /usr/bin/ssh
>    Remote file copy command:  /usr/bin/scp
>    Repository type:           server-based
>
> GPFS cluster configuration servers:
> -----------------------------------
>    Primary server:    gpfs-tier2
>    Secondary server:  gpfs-tier1
>
> Node  Daemon node name  IP address      Admin node name  Designation
> ----------------------------------------------------------------------
>     1   gpfs-tier1        192.168.178.10  gpfs-tier1       quorum-manager
>     2   gpfs-tier2        192.168.178.11  gpfs-tier2       quorum-manager
> 
> [root@gpfs-tier1 gpfs5.2]# mmlscallback
> NodeDownCallback
>          command       = /var/mmfs/rs/nodedown.ksh
>          priority      = 1
>          event         = quorumNodeLeave
>          parms         = %eventNode %quorumNodes
>
> NodeUpCallback
>          command       = /var/mmfs/rs/nodeup.ksh
>          priority      = 1
>          event         = quorumNodeJoin
>          parms         = %eventNode %quorumNodes
> 
> If I stop GPFS via mmshutdown, the callback scripts run, but if I shut
> down the whole node, they do not.
> 
> The latest entries in mmfs.log.latest show only this:
>
> 2018-09-07_13:12:36.724+0200: [I] Cluster Manager connection broke. Probing cluster TIERCLUSTER.gpfs-tier1
> 2018-09-07_13:12:37.226+0200: [E] Unable to contact enough other quorum nodes during cluster probe.
> 2018-09-07_13:12:37.226+0200: [E] Lost membership in cluster TIERCLUSTER.gpfs-tier1. Unmounting file systems.
> 2018-09-07_13:12:38.448+0200: [N] Connecting to 192.168.178.11 gpfs-tier2 <c0p1>
> 
> Could anybody help me with this? I want to run a script when a node
> goes down or comes up, in order to change the roles for starting the
> filesystem. The nodeLeave and nodeJoin callback events do not fire
> either.
> 
> Do you need any more information? If so, please let me know!
> 
> Many thanks in advance, and have a nice weekend!
> 
> Matthias
> 
> Best Regards
> 
> Matthias Knigge
> R&D File Based Media Solutions
> 
> Rohde & Schwarz
> GmbH & Co. KG
> Hanomaghof 1
> 30449 Hannover
> Phone +49 511 67 80 7 213
> Fax +49 511 37 19 74
> Email: Matthias.Knigge at rohde-schwarz.com
> 
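
Regarding the callback scripts quoted above: here is a minimal sketch of what a script like /var/mmfs/rs/nodedown.ksh could do with the two parameters it is registered with (%eventNode %quorumNodes). It only logs the event. The function name handle_event, the log path, and the assumption that %quorumNodes expands to a comma-separated list are illustrative guesses, not something confirmed in this thread. It is plain POSIX sh, so it should also run under ksh.

```shell
#!/bin/sh
# Sketch of a quorumNodeLeave/quorumNodeJoin callback body.
# Assumed invocation, matching parms "%eventNode %quorumNodes":
#   nodedown.ksh <eventNode> <quorumNodes>
# ASSUMPTION: <quorumNodes> is a comma-separated list of node names.

# Hypothetical log location, for illustration only.
CALLBACK_LOG=${CALLBACK_LOG:-/tmp/gpfs-callback.log}

handle_event() {
    event_node=$1
    quorum_nodes=$2

    # Split the quorum node list on commas into this function's
    # positional parameters (intentionally unquoted for word splitting).
    old_ifs=$IFS
    IFS=','
    set -- $quorum_nodes
    IFS=$old_ifs

    # Record the event; role-changing logic would go here.
    printf '%s event_node=%s quorum_node_count=%s\n' \
        "$(date '+%Y-%m-%d_%H:%M:%S')" "$event_node" "$#" >> "$CALLBACK_LOG"
}

# Only act when called with arguments, i.e. when run as the callback.
if [ $# -ge 1 ]; then
    handle_event "$1" "${2:-}"
fi
```

Assuming that list format, a call such as nodedown.ksh gpfs-tier2 gpfs-tier1,gpfs-tier2 would append one line recording gpfs-tier2 as the event node and a count of two quorum nodes.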

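To check afterwards why a node dropped out, as in the mmfs.log.latest excerpt quoted above, filtering for error-level entries is usually enough. This is a sketch: the function name show_gpfs_errors is hypothetical, and it relies only on the [E] severity tag visible in the quoted log lines. The default path /var/adm/ras/mmfs.log.latest is where GPFS normally writes its current log.

```shell
#!/bin/sh
# Print error-level ([E]) entries from a GPFS log file.
# Defaults to the usual location of the current GPFS log.
show_gpfs_errors() {
    log_file=${1:-/var/adm/ras/mmfs.log.latest}
    # -F treats "[E]" as a fixed string, not a bracket expression.
    grep -F '[E]' "$log_file"
}
```

Run on gpfs-tier1 against the quoted log, this would print the "Unable to contact enough other quorum nodes" and "Lost membership" lines, which is consistent with the quorum loss described at the top of this reply.
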
-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


