[gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30)

Sven Oehme oehmes at gmail.com
Fri Jun 30 19:25:28 BST 2017


end-to-end data integrity is very important, and the reason it hasn't been
done in Scale is not that it's unimportant; it's that it's very hard
to do without impacting performance in a very dramatic way.

imagine your raid controller blocksize is 1 MB and your filesystem blocksize
is 1 MB. if your application does a 1 MB write, this ends up being a perfect
full-block, full-track de-stage to your raid layer and everything works
fine and fast. as soon as you add checksum support you need to add data
somehow into this, which means your 1 MB is no longer 1 MB but 1 MB + checksum.

to store this additional data you have multiple options: inline, outside
the data block, or some combination. the net is that either you need to do more
physical i/o's to different places to get both the data and the
corresponding checksum, or your per-block on-disk structure becomes bigger
than what your application reads or writes. both put a massive burden on
the storage layer, as e.g. a 1 MB write will now, even if the blocks are all
aligned from the application down to the raid layer, cause a
read/modify/write on the raid layer because the data is bigger than the physical
track size.
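
a rough sketch of the arithmetic above, in Python. the 32-byte checksum, the
1 MiB track size and all function names are assumptions made up for
illustration; they are not GPFS or GNR internals.

```python
# Illustrative arithmetic only: all sizes here are assumptions, not GPFS/GNR internals.
MIB = 1024 * 1024


def layout(payload_bytes, checksum_bytes, track_bytes=MIB):
    """Split an inline-checksummed block into full tracks and a leftover tail."""
    on_disk = payload_bytes + checksum_bytes
    full_tracks, tail_bytes = divmod(on_disk, track_bytes)
    return full_tracks, tail_bytes


def needs_read_modify_write(payload_bytes, checksum_bytes, track_bytes=MIB):
    """A non-zero tail means one track is only partly overwritten, so the
    raid layer has to read it, merge the new bytes, and write it back."""
    _, tail_bytes = layout(payload_bytes, checksum_bytes, track_bytes)
    return tail_bytes != 0


def ios_with_separate_checksum(data_ios=1, checksum_ios=1):
    """Keeping the checksum outside the data block preserves alignment,
    but costs additional physical i/o to a different place."""
    return data_ios + checksum_ios


if __name__ == "__main__":
    print(layout(MIB, 0), needs_read_modify_write(MIB, 0))    # no checksum
    print(layout(MIB, 32), needs_read_modify_write(MIB, 32))  # hypothetical 32-byte inline checksum
    print(ios_with_separate_checksum())                       # checksum stored elsewhere
```

with no checksum the write fills exactly one track; with any inline checksum
the on-disk block spills a few bytes into a second track, which is the
read/modify/write case. storing the checksum separately avoids the spill but
doubles the number of physical i/o's in this simplified model.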

so to get end-to-end checksums in Scale outside of ESS the best way is to
get GNR as SW to run on generic HW; this is what people should vote for as an
RFE if they need that functionality. besides end-to-end checksums you get
read/write cache and acceleration, fast rebuild and many other goodies as
an added bonus.

Sven

On Fri, Jun 30, 2017 at 10:53 AM Aaron Knister <aaron.s.knister at nasa.gov>
wrote:

> In fact the answer was quite literally "no":
>
> https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84523
> (the RFE was declined and the answer was that the "function is already
> available in GNR environments").
>
> Regarding GNR, see this RFE request
> https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=95090
> requesting the use of GNR outside of an ESS/GSS environment. It's
> interesting to note this is the highest voted Public RFE for GPFS that I
> can see, at least. It too was declined.
>
> -Aaron
>
> On 6/30/17 1:41 PM, Aaron Knister wrote:
> > Thanks Olaf, that's good to know (and is kind of what I suspected). I've
> > requested this capability a number of times for those of us who can't
> > use or aren't using GNR and the answer is effectively "no". This
> > response is curious to me because I'm sure IBM doesn't believe that data
> > integrity is only important and of value to customers who purchase their
> > hardware *and* software.
> >
> > -Aaron
> >
> > On Fri, Jun 30, 2017 at 1:37 PM, Olaf Weiser <olaf.weiser at de.ibm.com
> > <mailto:olaf.weiser at de.ibm.com>> wrote:
> >
> >     yes.. in case of GNR (GPFS native raid) .. we do end-to-end
> >     check-summing ...  client --> server --> downToDisk
> >     GNR writes down a chksum to disk (to all pdisks /all "raid" segments
> >     )  so that dropped writes can be detected as well as miss-done
> >     writes (bit flips..)
> >
> >
> >
> >     From: Aaron Knister <aaron.s.knister at nasa.gov
> >     <mailto:aaron.s.knister at nasa.gov>>
> >     To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org
> >     <mailto:gpfsug-discuss at spectrumscale.org>>
> >     Date: 06/30/2017 07:15 PM
> >     Subject: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS):
> >     RDMA-enabled network adapter failure on the NSD server may result in
> >     file IO error (2017.06.30)
> >     Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >     <mailto:gpfsug-discuss-bounces at spectrumscale.org>
> >
>  ------------------------------------------------------------------------
> >
> >
> >
> >     I'm curious to know why this doesn't affect GSS/ESS? Is it a feature of
> >     the additional check-summing done on those platforms?
> >
> >
> >     -------- Forwarded Message --------
> >     Subject:  FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network
> >     adapter failure on the NSD server may result in file IO error
> >     (2017.06.30)
> >     Date:                  Fri, 30 Jun 2017 14:19:02 +0000
> >     From:                  IBM My Notifications
> >     <mynotify at stg.events.ihost.com <mailto:mynotify at stg.events.ihost.com
> >>
> >     To: aaron.s.knister at nasa.gov <mailto:aaron.s.knister at nasa.gov>
> >
> >
> >
> >
> >     My Notifications for Storage - 30 Jun 2017
> >
> >
> >
>  ------------------------------------------------------------------------------
> >     1. IBM Spectrum Scale
> >
> >     - TITLE: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter
> >     failure on the NSD server may result in file IO error
> >     - URL:
> >     http://www.ibm.com/support/docview.wss?uid=ssg1S1010233&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E
> >
> >     - ABSTRACT: IBM has identified an issue with all IBM GPFS and IBM
> >     Spectrum Scale versions where the NSD server is enabled to use RDMA for
> >     file IO and the storage used in your GPFS cluster accessed via NSD
> >     servers (not fully SAN accessible) includes anything other than IBM
> >     Elastic Storage Server (ESS) or GPFS Storage Server (GSS); under these
> >     conditions, when the RDMA-enabled network adapter fails, the issue may
> >     result in undetected data corruption for file write or read operations.
> >
> >
>  ------------------------------------------------------------------------------
> >     _______________________________________________
> >     gpfsug-discuss mailing list
> >     gpfsug-discuss at spectrumscale.org <http://spectrumscale.org>
> >     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >     <http://gpfsug.org/mailman/listinfo/gpfsug-discuss>
> >
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
>