[gpfsug-discuss] Protection against silent data corruption

Achim Rehor Achim.Rehor at de.ibm.com
Fri Jun 10 09:01:01 BST 2022


Thanks, Stephen, for clarifying; I misread the initial question. And thanks, Stephan, for raising that IDEA.

The new address for raising RFEs/IDEAs for GPFS is: https://ibm-sys-storage.ideas.ibm.com/ideas?project=GPFS


--

Mit freundlichen Grüßen / Kind regards

Achim Rehor

-----Original Message-----
From: Stephen Ulmer <ulmer at ulmer.org>
Reply-To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption
Date: Thu, 09 Jun 2022 15:47:07 -0400

Just to be clear: any follow-up should be directed to Stephan, who is requesting the feature.

I am well aware that Scale does not provide this feature, and was just clarifying Stephan’s question for Achim, who answered the question with an unrelated reference after which Scale support replied to me.

This is also where I notice that for all that is holy, the generated IDEA links point to DeveloperWorks and don’t even get you to the correct forum thread. Sigh.

--
Stephen



On Jun 9, 2022, at 2:45 PM, IBM Spectrum Scale <scale at us.ibm.com> wrote:


Thanks Stephan.
This will be looked into and prioritized accordingly by the offering manager team. In case the IBM team has any further questions on this, we will get back to you.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From: "Stephan Graf" <st.graf at fz-juelich.de>
To: <gpfsug-discuss at gpfsug.org>
Date: 09-06-2022 11.31 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption
Sent by: "gpfsug-discuss" <gpfsug-discuss-bounces at gpfsug.org>

________________________________



Hi,

I have created an IDEA for it:
https://ibm-sys-storage.ideas.ibm.com/ideas/GPFS-I-851

Stephan


Am 08.06.2022 um 20:35 schrieb IBM Spectrum Scale:
> Hi Stephen,
>
> Currently, such a feature is not available in the Spectrum Scale product.
>
>
> Regards, The Spectrum Scale (GPFS) team
>
>
>
> From: "Stephen Ulmer" <ulmer at ulmer.org>
> To: "gpfsug main discussion list" <gpfsug-discuss at gpfsug.org>
> Date: 02-06-2022 11.32 PM
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Protection against silent data corruption
> Sent by: "gpfsug-discuss" <gpfsug-discuss-bounces at gpfsug.org>
>
> ------------------------------------------------------------------------
>
>
>
> This only adds a checksum to the NSD wire protocol. The question was
> about detecting data corruption at rest.
>
> --
> Stephen
>
>
>     On Jun 2, 2022, at 1:01 PM, Achim Rehor <Achim.Rehor at de.ibm.com> wrote:
>
>     Hi Stephan,
>
>     There is; see the mmchconfig man page:
>
>     nsdCksumTraditional
>     This attribute enables checksum data-integrity checking between a
>     traditional NSD client node and its NSD server. Valid values are yes
>     and no. The default value is no.
>     (Traditional in this context means that the NSD client and server
>     are configured with IBM Spectrum Scale rather than with IBM Spectrum
>     Scale RAID.
>     The latter is a component of IBM Elastic Storage Server (ESS) and of
>     IBM GPFS Storage Server (GSS).)
>
>     The checksum procedure detects any corruption by the network of the
>     data in the NSD RPCs that are exchanged between the NSD client and the
>     server. A checksum error triggers a request to retransmit the message.
>
>     When this attribute is enabled on a client node, the client
>     indicates in each of its requests to the server that it is using
>     checksums. The server uses checksums only in
>     response to client requests in which the indicator is set. A client
>     node that accesses a file system that belongs to another cluster can
>     use checksums in the same way.
>
>     You can change the value of this attribute for an entire cluster
>     without shutting down the mmfsd daemon, or for one or more nodes
>     without restarting the nodes.
>
>     Note:
>     * Enabling this feature can result in significant I/O performance
>     degradation and a considerable increase in CPU usage.
>
>     * To enable checksums for a subset of the nodes in a cluster, issue
>     a command like the following one:
>         mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>
>
>         The -N flag is valid for this attribute.
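
To illustrate the general idea of a wire checksum with retransmission as described in the man page text above, here is a minimal Python sketch. It is not the NSD protocol: the choice of CRC32 (via zlib), the function names, and the simulated channel are all assumptions made purely for illustration.

```python
import zlib
from typing import Callable, Tuple


def make_message(payload: bytes) -> Tuple[bytes, int]:
    """Sender side: pair the payload with a CRC32 (illustrative checksum choice)."""
    return payload, zlib.crc32(payload)


def verify_message(payload: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare with the one sent."""
    return zlib.crc32(payload) == checksum


def flaky_channel_factory(corrupt_first_n: int) -> Callable[[bytes], bytes]:
    """Hypothetical network: flips one bit in the first N transmissions."""
    state = {"calls": 0}

    def channel(data: bytes) -> bytes:
        state["calls"] += 1
        if state["calls"] <= corrupt_first_n:
            return bytes([data[0] ^ 0x01]) + data[1:]
        return data

    return channel


def send_with_retransmit(
    payload: bytes,
    channel: Callable[[bytes], bytes],
    max_attempts: int = 3,
) -> Tuple[bytes, int]:
    """Send the payload; on a checksum mismatch, retransmit up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        data, cksum = make_message(payload)
        received = channel(data)
        if verify_message(received, cksum):
            return received, attempt
    raise IOError("checksum mismatch persisted after %d attempts" % max_attempts)
```

A check of this kind only covers the data while it is in flight between the two endpoints; it says nothing about whether the block is still intact after it has been written to disk.
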
>
>     --
>     Mit freundlichen Grüßen / Kind regards
>
>     Achim Rehor
>
>     Technical Support Specialist Spectrum Scale and ESS (SME)
>     Advisory Product Services Professional
>     IBM Systems Storage Support - EMEA
>
>     Achim.Rehor at de.ibm.com +49-170-4521194
>     IBM Deutschland GmbH
>     Vorsitzender des Aufsichtsrats: Sebastian Krause
>     Geschäftsführung: Gregor Pillen (Vorsitzender), Nicole Reimer,
>     Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
>     Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
>     Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
>
>
>     -----Original Message-----
>     From: Stephan Graf <st.graf at fz-juelich.de>
>     Reply-To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
>     To: gpfsug-discuss <gpfsug-discuss at gpfsug.org>
>     Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption
>     Date: Thu, 02 Jun 2022 16:31:43 +0200
>
>     Hi,
>
>     I am wondering if there is an option in SS to enable some checking to
>     detect silent data corruption.
>
>     From GNR I know that there is end-to-end integrity, so a checksum is
>     stored in addition.
>
>     The background is that we are facing an issue where, for some files
>     (which have data replication = 2), mmrestripefile reports that one
>     block does not match its copy (the storage cluster is running SS
>     without GNR).
>     We have validated that the copied block is fine, but the original one
>     is broken (and this is what is returned on read access).
>     SS in our installation is currently unable to determine which is the
>     correct one.
>     Is there any option to enable this kind of feature in SS? If not, does
>     it make sense to create an "IDEA" for it?
>
>     Stephan
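
The situation described above can be sketched in a few lines of Python: with two replicas that disagree and nothing else to go on, neither can be preferred, whereas a checksum stored alongside the data at write time identifies the intact copy. This is only an illustration of the principle, not of how Spectrum Scale or GNR work internally; SHA-256 and the function names `digest` and `pick_good_replica` are assumptions introduced here.

```python
import hashlib
from typing import List, Optional


def digest(block: bytes) -> str:
    """Checksum recorded when the block is written (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def pick_good_replica(
    replicas: List[bytes],
    stored_digest: Optional[str] = None,
) -> Optional[int]:
    """Return the index of a trustworthy replica, or None if undecidable.

    Without a stored digest, mismatching replicas cannot be ranked:
    a plain comparison only shows that they differ, not which one is wrong.
    """
    if all(r == replicas[0] for r in replicas):
        return 0  # all copies agree, any of them will do
    if stored_digest is None:
        return None  # mismatch detected, but no way to tell which copy is good
    for index, block in enumerate(replicas):
        if digest(block) == stored_digest:
            return index
    return None  # every copy fails verification


# Usage: the "original" is silently corrupted, the copy is intact.
good = b"A" * 16
bad = b"A" * 15 + b"B"
undecided = pick_good_replica([bad, good])
decided = pick_good_replica([bad, good], stored_digest=digest(good))
```

Here `undecided` is `None` and `decided` is `1`, which mirrors the report: a mismatch is visible, but choosing the right block needs extra information stored with the data.
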
>
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at gpfsug.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

--
Stephan Graf
Juelich Supercomputing Centre

Phone:  +49-2461-61-6578
Fax:    +49-2461-61-6656
E-mail: st.graf at fz-juelich.de
WWW:    http://www.fz-juelich.de/jsc/
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Dr. Astrid Lambrecht,
Prof. Dr. Frauke Melchior
---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org