<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Thanks, Felipe. Everything you said makes sense and, I think,
holds true to my experience of different workloads affecting the
likelihood of hitting various problems (especially as one of only
a handful of sites that hit that 301 SGpanic error from several
years back). Perhaps language as subtle as "internal testing
revealed" vs. "based on reports from customer sites" could be
used? Then again, I imagine you could encounter a case where you
discover something in testing that a customer site subsequently
experiences, which might limit the usefulness of the wording. I
still think it's useful to know whether an issue has been
exacerbated or triggered by in-the-wild workloads vs. what I
imagine to be quite rigorous lab testing, perhaps designed to
shake out certain bugs.<br>
</p>
-Aaron<br>
<br>
<div class="moz-cite-prefix">On 8/23/17 12:40 AM, Felipe Knop wrote:<br>
</div>
<blockquote type="cite"
cite="mid:OF7434C3B9.AA514BAB-ON00258185.00191884-85258185.0019A99E@notes.na.collabserv.com">
<font face="sans-serif" size="2">Aaron,</font><br>
<br>
<font face="sans-serif" size="2">IBM's policy is to issue a flash
when a data corruption/loss problem has been identified, even if
the problem has never been encountered by any customer. In fact,
most of the flashes have been the result of internal test
activity, even though the discovery took place after the affected
versions/PTFs had already been released. This is the case for two
of the recent flashes:</font><br>
<br>
<a
href="http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010293"
moz-do-not-send="true"><font face="sans-serif" size="2"
color="blue">http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010293</font></a><br>
<br>
<a
href="http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010487"
moz-do-not-send="true"><font face="sans-serif" size="2"
color="blue">http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010487</font></a><br>
<br>
<font face="sans-serif" size="2">The flashes normally do not
indicate the likelihood of a given problem being hit, since there
are simply too many variables at play: clusters and workloads vary
significantly.</font><br>
<br>
<font face="sans-serif" size="2">The first issue above appears to
be uncommon (and potentially rare). The second issue seems to have
a higher probability of occurring; as described in the flash, the
problem is triggered by failures encountered while running one of
the commands listed in the "Users Affected" section of the
write-up.</font><br>
<br>
<font face="sans-serif" size="2">I don't think precise
recommendations
could be given on</font><br>
<br>
<font size="3"> if the bugs fall in the category of "drop
everything
and patch *now*" or "this is a theoretically nasty bug but we've
yet to see it in the wild"</font><br>
<br>
<font face="sans-serif" size="2">since different clusters,
configurations, or workloads may drastically affect the likelihood
of hitting the problem. On the other hand, when writing the text
for a flash, the team tries to provide as much information as is
available on the known triggers and mitigating
circumstances.</font><br>
<br>
<font face="sans-serif" size="2"> Felipe</font><br>
<br>
<font face="sans-serif" size="2">----<br>
Felipe Knop
<a class="moz-txt-link-abbreviated" href="mailto:knop@us.ibm.com">knop@us.ibm.com</a><br>
GPFS Development and Security<br>
IBM Systems<br>
IBM Building 008<br>
2455 South Rd, Poughkeepsie, NY 12601<br>
(845) 433-9314 T/L 293-9314<br>
<br>
</font><br>
<br>
<br>
<br>
<font face="sans-serif" size="1" color="#5f5f5f">From:
</font><font face="sans-serif" size="1">Aaron Knister
<a class="moz-txt-link-rfc2396E" href="mailto:aaron.knister@gmail.com">&lt;aaron.knister@gmail.com&gt;</a></font><br>
<font face="sans-serif" size="1" color="#5f5f5f">To:
</font><font face="sans-serif" size="1">gpfsug main discussion
list <a class="moz-txt-link-rfc2396E" href="mailto:gpfsug-discuss@spectrumscale.org">&lt;gpfsug-discuss@spectrumscale.org&gt;</a></font><br>
<font face="sans-serif" size="1" color="#5f5f5f">Date:
</font><font face="sans-serif" size="1">08/22/2017 10:37 AM</font><br>
<font face="sans-serif" size="1" color="#5f5f5f">Subject:
</font><font face="sans-serif" size="1">Re: [gpfsug-discuss]
Again! Using IBM Spectrum Scale could lead to data loss</font><br>
<font face="sans-serif" size="1" color="#5f5f5f">Sent by:
</font><font face="sans-serif" size="1"><a class="moz-txt-link-abbreviated" href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a></font><br>
<hr noshade="noshade"><br>
<br>
<br>
<font size="3">Hi Jochen,</font><br>
<br>
<font size="3">I share your concern about data loss bugs, and I too
have found them troubling, especially since the 4.2 stream is in my
immediate future (although I would rather have stayed on 4.1 due
to my perception of stability/integrity issues in 4.2). By and
large, 4.1 has been *extremely* stable for me.</font><br>
<br>
<font size="3">While not directly related to the stability
concerns, I'm curious why your customer sites require downtime to
do the upgrades. Of course, individual servers need to be taken
offline to update GPFS, but the cluster as a whole should be able
to stay up. Perhaps your customer environments just don't lend
themselves to that. </font><br>
<br>
<font size="3">Some of these bugs do sound serious (and indeed I
believe this one is). I recently found myself jumping prematurely
into an update for the metanode filesize corruption bug, which, as
it turns out, while very scary sounding, is not necessarily a
particularly common bug (if I understand correctly). Perhaps it
would be helpful if IBM could clarify the believed risk of the bugs
these updates address or give us some indication if
the bugs fall in the category of "drop everything and patch
*now*" or "this is a theoretically nasty bug but we've yet to see
it in the wild". I could imagine IBM legal wanting to avoid a
situation where IBM indicates something is low risk but someone
hits it and it eats data. Then again, many companies do this with
security patches, so perhaps it's a non-issue.</font><br>
<br>
<font size="3">From my perspective, I don't think existing
customers are being "forgotten". I think IBM is pushing hard to
help Spectrum Scale adapt to an ever-changing world, and I think
these features are necessary and useful. Perhaps Scale would
benefit from dedicating more resources to QA/testing, which isn't
a particularly sexy thing: it doesn't result in any shiny new
features for customers (although "not eating your data" is a
feature I find really attractive).</font><br>
<br>
<font size="3">Anyway, I hope IBM can find a way to reduce the
frequency of these bugs. Personally speaking, I'm pretty convinced
it's not for lack of capability or dedication on the part of the
great folks actually writing the code.</font><br>
<br>
<font size="3">-Aaron</font><br>
<br>
<font size="3">On Tue, Aug 22, 2017 at 7:09 AM, Zeller, Jochen
<</font><a href="mailto:Jochen.Zeller@sva.de" target="_blank"
moz-do-not-send="true"><font size="3" color="blue"><u>Jochen.Zeller@sva.de</u></font></a><font
size="3">>
wrote:</font><br>
<font face="Arial" size="2">Dear community,</font><br>
<font face="Arial" size="2"> </font><br>
<font face="Arial" size="2">This morning I started in a good mood,
until I checked my mailbox: yet again, a bug was reported in
Spectrum Scale that could lead to data loss. Over the last year I
have been looking for a stable Scale version, and each time I
thought, “Yes, this one is stable and without serious data loss
bugs,” a few days later IBM announced a new APAR with possible
data loss for that version. </font><br>
<font face="Arial" size="2"> </font><br>
<font face="Arial" size="2">I support many clients in Central
Europe. They store databases, backup data, life science data,
video data, and results of technical computing, run HPC on the file
systems, etc. Some of them had to change their Scale version nearly
monthly during the last year to avoid running into one of the
serious data loss bugs in Scale. From my perspective, it was and is
a shame to have to inform clients about newly reported bugs right
after the last update. From the clients' perspective, it was and is
a lot of work and planning to schedule a new downtime for updates,
and their internal customers are not satisfied with so many
downtimes of the clusters and applications.</font><br>
<font face="Arial" size="2"> </font><br>
<font face="Arial" size="2">To me, it seems that Scale development
is working on features for specific projects or clients with
special requirements, but has forgotten the existing clients who
use Scale to store important data or run important workloads.</font><br>
<font face="Arial" size="2"> </font><br>
<font face="Arial" size="2">To make us more visible, I’ve used the
IBM-recommended way to request mandatory enhancements, the less
favored RFE:</font><br>
<font face="Arial" size="2"> </font><br>
<a
href="http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=109334"
target="_blank" moz-do-not-send="true"><font face="Arial"
size="2" color="#0082bf"><u>http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=109334</u></font></a><br>
<font face="Calibri" size="2"> </font><br>
<font face="Arial" size="2">If you like, vote for more reliability
in
Scale.</font><br>
<font face="Calibri" size="2"> </font><br>
<font face="Arial" size="2">I hope this is a good way to show
development and those responsible that we are having trouble and
are not satisfied with the quality of the releases.</font><br>
<font face="Calibri" size="2"> </font><br>
<font face="Calibri" size="2"> </font><br>
<font face="Arial" size="2">Regards,</font><br>
<font face="Arial" size="2" color="#8f8f8f"> </font><br>
<font face="Arial" size="2" color="#8f8f8f">Jochen </font><br>
<font size="3"><br>
_______________________________________________<br>
gpfsug-discuss mailing list<br>
gpfsug-discuss at </font><a
href="https://urldefense.proofpoint.com/v2/url?u=http-3A__spectrumscale.org&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=Nh-z-CGPni6b-k9jTdJfWNw6-jtvc8OJgjogfIyp498&s=-fp39C0mIHzPe7AhJGIwRCpmdKn0jC1QYEyM2DzYFZQ&e="
target="_blank" moz-do-not-send="true"><font size="3"
color="blue"><u>spectrumscale.org</u></font></a><font size="3"
color="blue"><u><br>
</u></font><a
href="https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=Nh-z-CGPni6b-k9jTdJfWNw6-jtvc8OJgjogfIyp498&s=Vsf2AaMf7b7F6Qv3lGZ9-xBciF9gdfuqnb206aVG-Go&e="
target="_blank" moz-do-not-send="true"><font size="3"
color="blue"><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></font></a><font
size="3"><br>
</font><br>
<br>
<br>
</blockquote>
<br>
</body>
</html>