[gpfsug-discuss] Protection against silent data corruption

IBM Spectrum Scale scale at us.ibm.com
Wed Jun 8 19:35:05 BST 2022



Hi Stephen,

Currently, such a feature is not available in the Spectrum Scale product.


Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------

If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.


If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.



From:	"Stephen Ulmer" <ulmer at ulmer.org>
To:	"gpfsug main discussion list" <gpfsug-discuss at gpfsug.org>
Date:	02-06-2022 11.32 PM
Subject:	[EXTERNAL] Re: [gpfsug-discuss] Protection against silent data
            corruption
Sent by:	"gpfsug-discuss" <gpfsug-discuss-bounces at gpfsug.org>



This only adds a checksum to the NSD wire protocol. The question was about
detecting data corruption at rest.
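
To make the distinction concrete, here is a minimal sketch of what at-rest detection would need: a checksum stored alongside each block at write time, so that a mismatching replica can be identified on read. This is an illustration only, not a Spectrum Scale feature or API. The function names, the record layout, and the choice of CRC32 are all assumptions made for the example.

```python
import zlib


def store_block(data: bytes) -> dict:
    """Hypothetical write path: keep a checksum next to the block data."""
    return {"data": data, "crc": zlib.crc32(data)}


def pick_good_replica(replicas: list) -> bytes:
    """Return the data of the first replica whose stored checksum still matches.

    Without the stored checksum, two differing replicas cannot be told apart;
    with it, the damaged copy fails verification and is skipped.
    """
    for replica in replicas:
        if zlib.crc32(replica["data"]) == replica["crc"]:
            return replica["data"]
    raise IOError("all replicas fail their checksum")


# Simulate silent corruption of the primary copy (a single flipped bit).
original = b"payload written by the application"
primary = store_block(original)
mirror = store_block(original)
damaged = bytearray(primary["data"])
damaged[0] ^= 0x01
primary["data"] = bytes(damaged)

# The mirror is selected because the primary no longer matches its checksum.
assert pick_good_replica([primary, mirror]) == original
```

With replication alone, a comparison only shows that the copies differ; the stored checksum is what lets the reader decide which copy to trust.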

--
Stephen



      On Jun 2, 2022, at 1:01 PM, Achim Rehor <Achim.Rehor at de.ibm.com>
      wrote:

      hi Stephan,

      there is; see the mmchconfig man page:

      nsdCksumTraditional
      This attribute enables checksum data-integrity checking between a
      traditional NSD client node and its NSD server. Valid values are yes
      and no. The default value is no.
      (Traditional in this context means that the NSD client and server are
      configured with IBM Spectrum Scale rather than with IBM Spectrum
      Scale RAID.
      The latter is a component of IBM Elastic Storage Server (ESS) and of
      IBM GPFS Storage Server (GSS).)

      The checksum procedure detects any corruption by the network of the
      data in the NSD RPCs that are exchanged between the NSD client and
      the server. A checksum error triggers a request to retransmit the
      message.

      When this attribute is enabled on a client node, the client indicates
      in each of its requests to the server that it is using checksums. The
      server uses checksums only in response to client requests in which
      the indicator is set. A client node that accesses a file system that
      belongs to another cluster can use checksums in the same way.

      You can change the value of this attribute for an entire cluster
      without shutting down the mmfsd daemon, or for one or more nodes
      without restarting the nodes.

      Note:
      * Enabling this feature can result in significant I/O performance
      degradation and a considerable increase in CPU usage.

      * To enable checksums for a subset of the nodes in a cluster, issue a
      command like the following one:
         mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>

         The -N flag is valid for this attribute.
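
The checksum-and-retransmit behaviour described in the man page excerpt can be sketched roughly as follows. This is a toy model under stated assumptions, not the NSD protocol: the channel callable, the function names, the retry limit, and the use of CRC32 are all hypothetical and chosen only to show the idea.

```python
import zlib


def make_flaky_channel(corrupt_first_n: int):
    """Hypothetical network: corrupts the payload of the first N transmissions."""
    state = {"calls": 0}

    def channel(payload: bytes, crc: int):
        state["calls"] += 1
        if state["calls"] <= corrupt_first_n:
            damaged = bytearray(payload)
            damaged[0] ^= 0xFF  # flip the bits of the first byte in transit
            return bytes(damaged), crc
        return payload, crc

    return channel


def send_with_retry(payload: bytes, channel, max_attempts: int = 3):
    """Send payload with a checksum; retransmit when the receiver sees a mismatch.

    Returns the verified payload and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        received, crc = channel(payload, zlib.crc32(payload))
        if zlib.crc32(received) == crc:
            return received, attempt
    raise IOError("checksum mismatch after %d attempts" % max_attempts)


data, attempts = send_with_retry(b"block contents", make_flaky_channel(1))
assert data == b"block contents"
assert attempts == 2  # first transmission was corrupted, second succeeded
```

Note that this only protects data while it is in flight between the two ends; a block that is already damaged on disk would be checksummed and sent as-is.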

      --
      Mit freundlichen Grüßen / Kind regards

      Achim Rehor

      Technical Support Specialist Spectrum Scale and ESS (SME)
      Advisory Product Services Professional
      IBM Systems Storage Support - EMEA

      Achim.Rehor at de.ibm.com +49-170-4521194
      IBM Deutschland GmbH
      Vorsitzender des Aufsichtsrats: Sebastian Krause
      Geschäftsführung: Gregor Pillen (Vorsitzender), Nicole Reimer,
      Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
      Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
      Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940


      -----Original Message-----
      From: Stephan Graf <st.graf at fz-juelich.de>
      Reply-To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
      To: gpfsug-discuss <gpfsug-discuss at gpfsug.org>
      Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data
      corruption
      Date: Thu, 02 Jun 2022 16:31:43 +0200

      Hi,

      I am wondering if there is an option in SS to enable some checking to
      detect silent data corruption.

      From GNR I know that there is end-to-end integrity, so a checksum is
      stored in addition.

      The background is that we are facing an issue where, in some files
      (which have data replication = 2), mmrestripefile is reporting that
      one block mismatches its copy (the storage cluster is running SS
      without GNR).
      We have validated that the copied block is fine, but the original one
      is broken (and this is what is returned on read access).
      SS right now in our installation is unable to determine which is the
      correct one.
      Is there any option to enable this kind of feature in SS? If not,
      does it make sense to create an "IDEA" for it?

      Stephan

      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at gpfsug.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org


