[gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2

IBM Spectrum Scale scale at us.ibm.com
Fri May 25 08:01:43 BST 2018


If you didn't run mmchconfig release=LATEST and didn't change the file
system version, then you can downgrade either or both of them. Thanks.
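Both conditions can be checked before attempting a downgrade; a minimal sketch (the device name `gpfs0` is a placeholder for your file system):

```shell
# Show the cluster's committed release level. If this still reports the
# 4.2.x level, mmchconfig release=LATEST has not been run.
mmlsconfig minReleaseLevel

# Show the on-disk file system format version; replace gpfs0 with your
# device name. A 4.2.x-level version number means the file system format
# was not upgraded.
mmlsfs gpfs0 -V
```

If both commands report the older level, the RPMs can be rolled back without an on-disk format mismatch.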

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------

If you feel that your question can benefit other users of  Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.


If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.



From:	valleru at cbio.mskcc.org
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	05/22/2018 11:54 PM
Subject:	[gpfsug-discuss] Critical Hang issues with GPFS 5.0.
            Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



Hello All,

We upgraded from GPFS 4.2.3.2 to GPFS 5.0.0-2 about a month ago. We have
not yet converted the file system from version 4.2.2.2 to 5 (that is, we
have not run the mmchconfig release=LATEST command).
Right after the upgrade, we started seeing many "ps hangs" across the
cluster. All of the hangs happen when jobs run a Java process or many Java
threads (for example, GATK).
The hangs are fairly random, with no particular pattern except that they
are related to Java or to jobs reading from directories containing about
600,000 files.

I raised an IBM critical service request about this a month ago (PMR:
24090,L6Q,000).
According to the ticket, they feel the problem might not be related to
GPFS.
However, we are sure that these hangs started to appear only after we
upgraded from GPFS 4.2.3.2 to 5.0.0-2.

One of the reasons we cannot prove it is GPFS is that we are unable to
capture any logs or traces from GPFS once the hang happens.
Even the GPFS trace commands hang once "ps" hangs, so it is difficult to
get any dumps from GPFS.
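When the GPFS trace commands themselves hang, it can still be possible to capture state from outside GPFS. A sketch (assumes root access; `<pid>` is a placeholder for the hung process ID):

```shell
# List threads waiting inside the GPFS daemon; this often still responds
# when tracing does not.
mmdiag --waiters

# Read the kernel stack of a hung process (e.g. the stuck ps or java)
# directly from procfs, bypassing GPFS entirely.
cat /proc/<pid>/stack   # replace <pid> with the hung process ID

# Ask the kernel to dump all blocked (D-state) tasks into the kernel log,
# then read the tail of that log.
echo w > /proc/sysrq-trigger
dmesg | tail -n 100
```

Stacks showing threads blocked in mmfs or VFS calls would be useful evidence to attach to the PMR.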

Also, according to the IBM ticket, they seem to have seen a "ps hang"
issue before, and their suggestion is that running mmchconfig
release=LATEST will resolve it.
However, we are not comfortable making the permanent change to file system
version 5, and since we see no near-term solution to these hangs, we are
thinking of downgrading to GPFS 4.2.3.2, the previous state in which we
know the cluster was stable.

Can downgrading GPFS take us back to exactly the previous GPFS config
state?
With respect to downgrading from 5 to 4.2.3.2: do I just reinstall all
RPMs at the previous version, or is there anything else I need to check
with respect to the GPFS configuration?
I suspect that GPFS 5.0 may have updated internal default configuration
parameters, and I am not sure whether downgrading will change them back to
their GPFS 4.2.3.2 values.
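One way to make the question answerable either way is to snapshot the configuration before the downgrade and diff it afterwards; a sketch (output paths are arbitrary choices):

```shell
# Record the current configuration so it can be diffed after the
# downgrade. Note mmlsconfig lists explicitly-set parameters; defaults
# that changed between releases will not show up here.
mmlsconfig > /root/mmlsconfig.5.0.0-2.txt

# Record cluster membership and per-file-system attributes as well.
mmlscluster > /root/mmlscluster.5.0.0-2.txt
mmlsfs all  > /root/mmlsfs.5.0.0-2.txt
```

After rolling back the RPMs, rerunning the same commands and diffing the outputs would show exactly which settings, if any, did not revert.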

Our previous state:

2 Storage clusters - 4.2.3.2
1 Compute cluster - 4.2.3.2  ( remote mounts the above 2 storage clusters )

Our current state:

2 Storage clusters - 5.0.0.2 ( filesystem version - 4.2.2.2)
1 Compute cluster - 5.0.0.2

Do I need to downgrade all the clusters to go back to the previous state,
or is it OK if we downgrade just the compute cluster?

Any advice on the best steps forward would be greatly appreciated.

Thanks,

Lohit
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


