[gpfsug-discuss] Protection against silent data corruption

Stephan Graf st.graf at fz-juelich.de
Thu Jun 9 06:59:13 BST 2022


Hi,

I have created an IDEA for it:
https://ibm-sys-storage.ideas.ibm.com/ideas/GPFS-I-851
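
For context, the request boils down to storing a checksum with each block
at write time, so that a mismatch between two replicas can be resolved in
favour of the copy that still verifies. The following is only a minimal
Python sketch of that principle, not how Spectrum Scale or GNR implements
it. The names StoredBlock, write_block, is_intact and pick_good_replica are
made up for illustration, and CRC32 is an arbitrary choice of checksum.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class StoredBlock:
    """One replica of a data block plus the checksum recorded at write time."""
    data: bytes
    checksum: int


def write_block(data: bytes) -> StoredBlock:
    # Record a CRC32 of the payload alongside the data when it is written.
    return StoredBlock(data=data, checksum=zlib.crc32(data))


def is_intact(block: StoredBlock) -> bool:
    # A replica is intact if its payload still matches its stored checksum.
    return zlib.crc32(block.data) == block.checksum


def pick_good_replica(replicas: List[StoredBlock]) -> StoredBlock:
    # Return the first replica that verifies; fail loudly if none does.
    for replica in replicas:
        if is_intact(replica):
            return replica
    raise IOError("all replicas failed checksum verification")


# Two replicas of the same block (data replication = 2).
original = write_block(b"payload written by the application")
copy = write_block(b"payload written by the application")

# Simulate silent corruption at rest: one bit flips in the first replica.
corrupted = bytearray(original.data)
corrupted[0] ^= 0x01
original.data = bytes(corrupted)

# Comparing the replicas only tells us that they differ ...
replicas_differ = original.data != copy.data
# ... whereas the stored checksums tell us which replica to trust.
good = pick_good_replica([original, copy])
```

Without the stored checksum, a comparison of the two replicas can only
report the mismatch, which is the situation described in the original
question further down.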

Stephan


On 08.06.2022 at 20:35, IBM Spectrum Scale wrote:
> Hi Stephen,
> 
> Currently, such a feature is not available in the Spectrum Scale product.
> 
> 
> Regards, The Spectrum Scale (GPFS) team
> 
> ------------------------------------------------------------------------------------------------------------------
> If you feel that your question can benefit other users of Spectrum
> Scale (GPFS), then please post it to the public IBM developerWorks Forum
> at
> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
> 
> 
> If your query concerns a potential software error in Spectrum Scale
> (GPFS) and you have an IBM software maintenance contract, please contact
> 1-800-237-5511 in the United States or your local IBM Service Center
> in other countries.
> 
> The forum is informally monitored as time permits and should not be used 
> for priority messages to the Spectrum Scale (GPFS) team.
> 
> 
> From: "Stephen Ulmer" <ulmer at ulmer.org>
> To: "gpfsug main discussion list" <gpfsug-discuss at gpfsug.org>
> Date: 02-06-2022 11.32 PM
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data 
> corruption
> Sent by: "gpfsug-discuss" <gpfsug-discuss-bounces at gpfsug.org>
> 
> ------------------------------------------------------------------------
> 
> 
> 
> This only adds a checksum to the NSD wire protocol. The question was 
> about detecting data corruption at rest.
> 
> -- 
> Stephen
> 
> 
>     On Jun 2, 2022, at 1:01 PM, Achim Rehor <Achim.Rehor at de.ibm.com>
>     wrote:
> 
>     hi Stephan,
> 
>     there is; see the mmchconfig man page:
> 
>     nsdCksumTraditional
>     This attribute enables checksum data-integrity checking between a
>     traditional NSD client node and its NSD server. Valid values are yes
>     and no. The default value is no.
>     (Traditional in this context means that the NSD client and server
>     are configured with IBM Spectrum Scale rather than with IBM Spectrum
>     Scale RAID.
>     The latter is a component of IBM Elastic Storage Server (ESS) and of
>     IBM GPFS Storage Server (GSS).)
> 
>     The checksum procedure detects any corruption by the network of the
>     data in the NSD RPCs that are exchanged between the NSD client and the
>     server. A checksum error triggers a request to retransmit the message.
> 
>     When this attribute is enabled on a client node, the client
>     indicates in each of its requests to the server that it is using
>     checksums. The server uses checksums only in
>     response to client requests in which the indicator is set. A client
>     node that accesses a file system that belongs to another cluster can
>     use checksums in the same way.
> 
>     You can change the value of this attribute for an entire cluster
>     without shutting down the mmfsd daemon, or for one or more nodes
>     without restarting the nodes.
> 
>     Note:
>     * Enabling this feature can result in significant I/O performance
>     degradation and a considerable increase in CPU usage.
> 
>     * To enable checksums for a subset of the nodes in a cluster, issue
>     a command like the following one:
>         mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>
> 
>         The -N flag is valid for this attribute.
> 
>     -- 
>     Mit freundlichen Grüßen / Kind regards
> 
>     Achim Rehor
> 
>     Technical Support Specialist Spectrum Scale and ESS (SME)
>     Advisory Product Services Professional
>     IBM Systems Storage Support - EMEA
> 
>     Achim.Rehor at de.ibm.com  +49-170-4521194
>     IBM Deutschland GmbH
>     Vorsitzender des Aufsichtsrats: Sebastian Krause
>     Geschäftsführung: Gregor Pillen (Vorsitzender), Nicole Reimer,
>     Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
>     Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
>     Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
> 
> 
>     -----Original Message-----
>     *From*: Stephan Graf <st.graf at fz-juelich.de>
>     *Reply-To*: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
>     *To*: gpfsug-discuss <gpfsug-discuss at gpfsug.org>
>     *Subject*: [EXTERNAL] [gpfsug-discuss] Protection against silent
>     data corruption
>     *Date*: Thu, 02 Jun 2022 16:31:43 +0200
> 
>     Hi,
> 
>     I am wondering if there is an option in SS to enable some checking to
>     detect silent data corruption.
> 
>     From GNR I know that there is end-to-end integrity checking, so a
>     checksum is stored in addition to the data.
> 
>     The background is that we are facing an issue where, for some files
>     (which have data replication = 2), mmrestripefile reports that one
>     block does not match its copy (the storage cluster is running SS
>     without GNR).
>     We have validated that the copied block is fine, but the original one
>     is broken (and this is what is returned on read access).
>     SS in our installation is currently unable to determine which one is
>     correct.
>     Is there any option to enable this kind of feature in SS? If not, does
>     it make sense to create an "IDEA" for it?
> 
>     Stephan
> 
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at gpfsug.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
> 

-- 
Stephan Graf
Juelich Supercomputing Centre

Phone:  +49-2461-61-6578
Fax:    +49-2461-61-6656
E-mail: st.graf at fz-juelich.de
WWW:    http://www.fz-juelich.de/jsc/
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht,
Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------