<font size=2 face="sans-serif">Aaron,</font><br><br><font size=2 face="sans-serif">IBM's policy is to issue a flash when
a data corruption/loss problem has been identified, even if the problem
has never been encountered by any customer. In fact, most of the flashes
have been the result of internal test activity, even though the discovery
took place after the affected versions/PTFs had already been released.
This is the case for two of the recent flashes:</font><br><br><a href="http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010293"><font size=2 color=blue face="sans-serif">http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010293</font></a><br><br><a href="http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010487"><font size=2 color=blue face="sans-serif">http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010487</font></a><br><br><font size=2 face="sans-serif">The flashes normally do not indicate
how likely a given problem is to be hit, since there are simply
too many variables at play: clusters and workloads vary significantly.</font><br><br><font size=2 face="sans-serif">The first issue above appears to be
uncommon (and potentially rare). The second issue seems to have a
higher probability of occurring; as described in the flash, the problem
is triggered by failures encountered while running one of the commands
listed in the "Users Affected" section of the writeup.</font><br><br><font size=2 face="sans-serif">I don't think precise recommendations
could be given on</font><br><br><font size=3> if the bugs fall in the category of "drop everything
and patch *now*" or "this is a theoretically nasty bug but we've
yet to see it in the wild"</font><br><br><font size=2 face="sans-serif">since different clusters, configurations,
or workloads may drastically affect the likelihood of hitting the problem.
On the other hand, when writing the text for a flash, the
team tries to provide as much information as is available on the
known triggers and mitigating circumstances.</font><br><br><font size=2 face="sans-serif"> Felipe</font><br><br><font size=2 face="sans-serif">----<br>Felipe Knop
knop@us.ibm.com<br>GPFS Development and Security<br>IBM Systems<br>IBM Building 008<br>2455 South Rd, Poughkeepsie, NY 12601<br>(845) 433-9314 T/L 293-9314<br><br></font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Aaron Knister <aaron.knister@gmail.com></font><br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">gpfsug main discussion
list <gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">08/22/2017 10:37 AM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [gpfsug-discuss]
Again! Using IBM Spectrum Scale could lead to data loss</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><font size=3>Hi Jochen,</font><br><br><font size=3>I share your concern about data loss bugs and I too have
found it troubling, especially since the 4.2 stream is in my immediate future
(although I would rather have stayed on 4.1 because of my perception of stability/integrity
issues in 4.2). By and large, 4.1 has been *extremely* stable for me.</font><br><br><font size=3>While not directly related to the stability concerns,
I'm curious why your customer sites require downtime to do
the upgrades. While individual servers of course need to be taken offline
to update GPFS, the cluster as a whole should be able to stay up. Perhaps your customer
environments just don't lend themselves to that.</font><br><br><font size=3>It occurs to me that some of these bugs sound serious
(and indeed I believe this one is). I recently found myself jumping prematurely
into an update for the metanode file size corruption bug, which, as it turns
out, while very scary sounding, is not necessarily a particularly common
bug (if I understand correctly). Perhaps it would be helpful if IBM could
clarify the believed risk of these updates or give us some indication if
the bugs fall in the category of "drop everything and patch *now*"
or "this is a theoretically nasty bug but we've yet to see it in the
wild". I could imagine IBM legal wanting to avoid a situation where
IBM indicates something is low risk but someone hits it and it eats data.
That said, many companies do this with security patches, so perhaps it's a
non-issue.</font><br><br><font size=3>From my perspective, I don't think existing customers are
being "forgotten". I think IBM is pushing hard to help Spectrum
Scale adapt to an ever-changing world, and I think these features are necessary
and useful. Perhaps Scale would benefit from more resources being dedicated
to QA/testing, which isn't a particularly sexy thing: it doesn't result
in any shiny new features for customers (although "not eating your
data" is a feature I find really attractive).</font><br><br><font size=3>Anyway, I hope IBM can find a way to minimize the frequency
of these bugs. Personally speaking, I'm pretty convinced it's not for
lack of capability or dedication on the part of the great folks actually
writing the code.</font><br><br><font size=3>-Aaron</font><br><br><font size=3>On Tue, Aug 22, 2017 at 7:09 AM, Zeller, Jochen <</font><a href=mailto:Jochen.Zeller@sva.de target=_blank><font size=3 color=blue><u>Jochen.Zeller@sva.de</u></font></a><font size=3>>
wrote:</font><br><font size=2 face="Arial">Dear community,</font><br><font size=2 face="Arial"> </font><br><font size=2 face="Arial">This morning I started in a good mood, until
I checked my mailbox. Yet again, a bug has been reported in Spectrum Scale that could
lead to data loss. Over the last year I have been looking for a stable Scale
version, and each time I thought, “Yes, this one is stable and without
serious data loss bugs,” a few days later IBM announced a new APAR with
possible data loss for this version. </font><br><font size=2 face="Arial"> </font><br><font size=2 face="Arial">I am supporting many clients in central Europe.
They store databases, backup data, life science data, video data, results
of technical computing, do HPC on the file systems, etc. Some of them had
to change their Scale version nearly monthly during the last year to avoid
running into one of the serious data loss bugs in Scale. From my perspective,
it was and is a shame to inform clients about newly reported bugs right after
the last update. From the clients' perspective, it was and is a lot of work and
planning to schedule yet another downtime for updates. And their internal customers
are not satisfied with so many downtimes of the clusters and applications.</font><br><font size=2 face="Arial"> </font><br><font size=2 face="Arial">To me, it seems that Scale development is
working on features for specific projects or clients to meet special
requirements, but has forgotten the existing clients who use Scale to store
important data or run important workloads.</font><br><font size=2 face="Arial"> </font><br><font size=2 face="Arial">To make our concerns more visible, I’ve used the
IBM-recommended way to request mandatory enhancements, the less favored
RFE:</font><br><font size=2 face="Arial"> </font><br><a href="http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=109334" target=_blank><font size=2 color=#0082bf face="Arial"><u>http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=109334</u></font></a><br><font size=2 face="Calibri"> </font><br><font size=2 face="Arial">If you like, vote for more reliability in
Scale.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Arial">I hope this is a good way to show development
and those responsible that we are having trouble and are not satisfied with
the quality of the releases.</font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Calibri"> </font><br><font size=2 face="Arial">Regards,</font><br><font size=2 color=#8f8f8f face="Arial"> </font><br><font size=2 color=#8f8f8f face="Arial">Jochen </font><br><font size=3><br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at </font><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__spectrumscale.org&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=Nh-z-CGPni6b-k9jTdJfWNw6-jtvc8OJgjogfIyp498&s=-fp39C0mIHzPe7AhJGIwRCpmdKn0jC1QYEyM2DzYFZQ&e=" target=_blank><font size=3 color=blue><u>spectrumscale.org</u></font></a><font size=3 color=blue><u><br></u></font><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=Nh-z-CGPni6b-k9jTdJfWNw6-jtvc8OJgjogfIyp498&s=Vsf2AaMf7b7F6Qv3lGZ9-xBciF9gdfuqnb206aVG-Go&e=" target=_blank><font size=3 color=blue><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></font></a><font size=3><br></font><br><BR>