[gpfsug-discuss] Node crashes after upgrading to V5.1.2+ with msgqueue enabled

Wahl, Edward ewahl at osc.edu
Thu Oct 13 17:35:07 BST 2022


Silly cut/paste error.  APAR is IJ40726

Ed Wahl
Ohio Supercomputer Center


From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> On Behalf Of Wahl, Edward
Sent: Thursday, October 13, 2022 12:28 PM
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: Re: [gpfsug-discuss] Node crashes after upgrading to V5.1.2+ with msgqueue enabled

I’ll just toss in here that efixes for this issue are available. As we’re in the tail end of a large data transition and cannot go to LATEST yet, we’re running some efixes for this and they do work. However, mmhealth will still report it as broken until we can delete the msgqueue. On the positive side, the crashes went away.

I don’t recall the 5.1.2.6 efix numbers, but for 5.1.3.1 it is efix26 for ppcle and efix27 for x86_64.

Ed Wahl
Ohio Supercomputer Center


From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> On Behalf Of Luke Sudbery
Sent: Thursday, October 13, 2022 6:19 AM
To: gpfsug-discuss at gpfsug.org
Subject: [gpfsug-discuss] Node crashes after upgrading to V5.1.2+ with msgqueue enabled

IBM Spectrum Scale : Node crashes after upgrading to V5.1.2+ with msgqueue enabled
https://www.ibm.com/support/pages/node/6824149?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E

says:
A node crash can occur due to resource exhaustion after upgrading to V5.1.2.0 or higher with msgqueue enabled. This affects all upgraded nodes until the filesystem version is moved to V5.1.2.0+ (mmchfs -V) and the migration off msgqueue occurs (mmmsgqueue config --remove-msgqueue).

My emphasis on "and". But it also says:

This issue will affect all versions mentioned above until msgqueue is disabled across the cluster.

So it's not entirely clear whether you have to just update the client version, update the filesystem version, disable msgqueue, or do all three to fix it...

Can an IBMer clarify?

Many thanks,

Luke
--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road

Please note I don’t work on Monday.
