[gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x

IBM Spectrum Scale scale at us.ibm.com
Wed Aug 21 18:46:40 BST 2019


We do appreciate the feedback.  Because Spectrum Scale is a cluster-based 
solution, we do not consider the failure of a single node significant: the 
cluster adjusts to the loss of the node, and access to the file data is 
not lost.  It seems that in this specific instance the problem was having 
a more significant impact in your environment.  Presumably you have 
installed the available fix and are no longer encountering the problem.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract, please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries.

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:   Ryan Novosielski <novosirj at rutgers.edu>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   08/21/2019 01:34 PM
Subject:        [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



If there is any means for feedback, I really think that anything that 
causes a crash of mmfsd absolutely warrants a notice. Regardless of data 
corruption, it makes the software unusable in production under certain 
circumstances. There was a large customer impact at our site. We have a 
reproducible case if it is useful. One customer workload crashed every 
time, though it took almost a full day to get to that point, so you can 
imagine the time wasted.

> On Aug 21, 2019, at 1:20 PM, IBM Spectrum Scale <scale at us.ibm.com> wrote:
> 
> To my knowledge there has been no notification sent regarding this 
> problem.  Generally we only notify customers about problems that cause 
> file system data corruption or data loss.  This problem does cause the 
> GPFS instance to abort and restart (assert), but it does not impact file 
> system data.  It seems that in your case you may have been encountering 
> the problem frequently.
> 
> Regards, The Spectrum Scale (GPFS) team
> 
> 
> ------------------------------------------------------------------------------------------------------------------
> If you feel that your question can benefit other users of Spectrum Scale 
> (GPFS), then please post it to the public IBM developerWorks Forum at 
> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
> 
> If your query concerns a potential software error in Spectrum Scale (GPFS) 
> and you have an IBM software maintenance contract, please contact 
> 1-800-237-5511 in the United States or your local IBM Service Center in 
> other countries.
> 
> The forum is informally monitored as time permits and should not be used 
> for priority messages to the Spectrum Scale (GPFS) team.
> 
> 
> 
> From:        Ryan Novosielski <novosirj at rutgers.edu>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        08/21/2019 01:14 PM
> Subject:        [EXTERNAL] Re: [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> 
> 
> 
> Has there been any official notification of this one? I can’t see 
> anything about it anyplace other than in my support ticket.
> 
> --
> ____
> || \\UTGERS,     |---------------------------*O*---------------------------
> ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>     `'
> 
> > On Aug 21, 2019, at 1:10 PM, IBM Spectrum Scale <scale at us.ibm.com> wrote:
> > 
> > As was noted, this problem is fixed in the Spectrum Scale 5.0.3 release 
> > stream.  Regarding the version number format of 5.0.2.0/1, I assume it 
> > is meant to convey version 5.0.2 efix 1.
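> > 
> > For reference, one quick way to confirm the exact level a node is 
> > actually running (a hedged sketch; the output format varies by release):
> > 
> >     # Show the build level of the GPFS daemon running on this node
> >     mmdiag --version
> >     # Show the cluster-wide minimum release level currently in effect
> >     mmlsconfig minReleaseLevel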
> > 
> > Regards, The Spectrum Scale (GPFS) team
> > 
> > 
> > ------------------------------------------------------------------------------------------------------------------
> > If you feel that your question can benefit other users of Spectrum Scale 
> > (GPFS), then please post it to the public IBM developerWorks Forum at 
> > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
> > 
> > If your query concerns a potential software error in Spectrum Scale (GPFS) 
> > and you have an IBM software maintenance contract, please contact 
> > 1-800-237-5511 in the United States or your local IBM Service Center in 
> > other countries.
> > 
> > The forum is informally monitored as time permits and should not be 
> > used for priority messages to the Spectrum Scale (GPFS) team.
> > 
> > 
> > 
> > From:        Ryan Novosielski <novosirj at rutgers.edu>
> > To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > Date:        08/21/2019 12:04 PM
> > Subject:        [EXTERNAL] [gpfsug-discuss] mmfsd segfault/signal 6 on dirop.C:4548 in GPFS 5.0.2.x
> > Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> > 
> > 
> > 
> > I posted this on Slack, but it’s serious enough that I want to make 
> > sure everyone sees it. Does anyone, from IBM or otherwise, have any more 
> > information about this, or about whether it was even announced anyplace? 
> > Thanks!
> > 
> > A little late, but we ran into a relatively serious problem with 5.0.2.3 
> > at our site. The symptom is an mmfsd crash/segfault related to 
> > fs/dirop.C:4548. We ran into it sporadically, but it was repeatable on 
> > the problem workload. From IBM Support:
> > 
> > 2. This is a known defect.
> > The problem has been fixed through
> > D.1073563: CTM_A_XW_FOR_DATA_IN_INODE related assert in DirLTE::lock
> > A companion fix is
> > D.1073753: Assert that the lock mode in DirLTE::lock is strong enough
> > 
> > 
> > The rep further said: “It's not an APAR, since it was found in internal 
> > testing. It's in an internal function at a place that should not assert; 
> > part of the condition is specific to the DIR_UPDATE_LOCKMODE 
> > optimization code... The assert was meant for a certain file-creation 
> > code path, but the condition wasn't restricted strictly to that path, so 
> > some other code path could also run into the assert. That is why we 
> > cannot predict on which node it would happen.”
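> > 
> > In case it helps anyone check their own nodes, a rough sketch of how to 
> > look for the assert signature (hedged: the log path below is the usual 
> > default, the exact assert message format varies, and mmdsh is an 
> > unsupported helper that may not be present everywhere):
> > 
> >     # Look for this assert in the current GPFS log on one node
> >     grep -E 'dirop\.C.*4548' /var/adm/ras/mmfs.log.latest
> >     # Or sweep every node in the cluster with the (unsupported) mmdsh tool
> >     mmdsh -N all 'grep -E "dirop\.C.*4548" /var/adm/ras/mmfs.log.latest'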
> > 
> > The fix was setting disableAssert="dirop.C:4548", which can be done 
> > live. Has anyone seen anything else about this anyplace? The bug is 
> > fixed in 5.0.3.x and was introduced in 5.0.2.0/1 (not sure what this 
> > version number means; I’ve seen them listed X.X.X.X.X.X, X.X.X-X.X, and 
> > others).
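> > 
> > For anyone needing the stopgap before upgrading, a sketch of how such a 
> > setting would typically be applied (hedged: confirm the exact syntax 
> > with IBM Support first; -i makes the change immediate and persistent):
> > 
> >     # Disable only this specific assert, cluster-wide, while mmfsd stays up
> >     mmchconfig disableAssert="dirop.C:4548" -i
> >     # Spot-check the running config (mmfsadm is an unsupported debug tool)
> >     mmfsadm dump config | grep -i disableAssert
> >     # After moving to 5.0.3.x, restore the default
> >     mmchconfig disableAssert=DEFAULT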
> > 
> > --
> > ____
> > || \\UTGERS,     |---------------------------*O*---------------------------
> > ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
> > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> > ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
> >     `'
> > 
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> > 
> > 
> > 
> > 
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




