[gpfsug-discuss] Protection against silent data corruption

Stephen Ulmer ulmer at ulmer.org
Thu Jun 2 18:55:50 BST 2022


This only adds a checksum to the NSD wire protocol. The question was about detecting data corruption at rest.
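
Until something like that exists for non-GNR storage, one stopgap is to keep checksums at the application level. What follows is only a minimal sketch under stated assumptions: it uses GNU coreutils (sha256sum, find, sort), and make_manifest and verify_manifest are hypothetical helper names, not Spectrum Scale commands. It checks only the data that a normal read returns, it cannot tell which replica is the good one, and a file that was legitimately modified after the manifest was written will also show up as a mismatch.

```shell
#!/bin/sh
# Hypothetical user-level helpers; NOT part of IBM Spectrum Scale.
# Assumes GNU coreutils (sha256sum) is installed.

# Record a SHA-256 checksum for every regular file under a directory.
# Keep the manifest outside the directory so it does not checksum itself.
make_manifest() {
    dir=$1
    manifest=$2
    ( cd "$dir" && find . -type f -exec sha256sum {} + ) | sort -k 2 > "$manifest"
}

# Re-read every file and compare it with the recorded checksum.
# Prints the files that fail and returns non-zero on any mismatch.
verify_manifest() {
    dir=$1
    manifest=$2
    ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}

# Small demonstration in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/data"
printf 'block A\n' > "$demo/data/a.dat"
printf 'block B\n' > "$demo/data/b.dat"

make_manifest "$demo/data" "$demo/manifest.sha256"
verify_manifest "$demo/data" "$demo/manifest.sha256" && echo "clean"

# Simulate corruption at rest by changing the content of one file.
printf 'block X\n' > "$demo/data/b.dat"
verify_manifest "$demo/data" "$demo/manifest.sha256" || echo "corruption detected"

rm -rf "$demo"
```

Run periodically, a check like this would at least flag an affected file so that the replicas can then be compared by hand.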

-- 
Stephen



> On Jun 2, 2022, at 1:01 PM, Achim Rehor <Achim.Rehor at de.ibm.com> wrote:
> 
> Hi Stephan,
> 
> There is; see the mmchconfig man page:
> 
> nsdCksumTraditional
> This attribute enables checksum data-integrity checking between a traditional NSD client node and its NSD server. Valid values are yes and no. The default value is no.
> (Traditional in this context means that the NSD client and server are configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID. 
> The latter is a component of IBM Elastic Storage Server (ESS) and of IBM GPFS Storage Server (GSS).)
> 
> The checksum procedure detects any corruption by the network of the data in the NSD RPCs that are exchanged between the NSD client and the 
> server. A checksum error triggers a request to retransmit the message.
> 
> When this attribute is enabled on a client node, the client indicates in each of its requests to the server that it is using checksums. The server uses checksums only in
> response to client requests in which the indicator is set. A client node that accesses a file system that belongs to another cluster can use checksums in the same way.
> 
> You can change the value of this attribute for an entire cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes.
> 
> Note:
> * Enabling this feature can result in significant I/O performance degradation and a considerable increase in CPU usage.
> 
> * To enable checksums for a subset of the nodes in a cluster, issue a command like the following one:
>    mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>
> 
>    The -N flag is valid for this attribute.
> 
>  -- 
> Mit freundlichen Grüßen / Kind regards
> 
> Achim Rehor
> 
> Technical Support Specialist Spectrum Scale and ESS (SME)
> Advisory Product Services Professional
> IBM Systems Storage Support - EMEA
> 
> Achim.Rehor at de.ibm.com +49-170-4521194
>   
> IBM Deutschland GmbH 
> Vorsitzender des Aufsichtsrats: Sebastian Krause
> Geschäftsführung: Gregor Pillen (Vorsitzender), Nicole Reimer, 
> Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
> Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
> 
> 
> -----Original Message-----
> From: Stephan Graf <st.graf at fz-juelich.de>
> Reply-To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
> To: gpfsug-discuss <gpfsug-discuss at gpfsug.org>
> Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption
> Date: Thu, 02 Jun 2022 16:31:43 +0200
> 
> Hi,
> 
> I am wondering if there is an option in SS to enable some checking to 
> detect silent data corruption.
> 
> From GNR I know that there is end-to-end integrity checking, so a checksum is 
> stored in addition.
> 
> The background is that we are facing an issue where, for some files (which 
> have data replication = 2), mmrestripefile reports that one 
> block does not match its copy (the storage cluster is running SS 
> without GNR).
> We have validated that the copied block is fine, but the original one is 
> broken (and this is what is returned on read access).
> SS right now in our installation is unable to determine which is the 
> correct one.
> Is there any option to enable this kind of feature in SS? If not, does 
> it make sense to create an "IDEA" for it?
> 
> Stephan
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
