[gpfsug-discuss] NSD network checksums (nsdCksumTraditional)

Felipe Knop knop at us.ibm.com
Mon Oct 29 21:27:41 GMT 2018


Stephen,

ESS does perform checksums in the transfer between NSD clients and NSD
servers. As Kums described below, the difference between the checksums
performed by GNR and those performed with "nsdCksumTraditional" is that GNR
checksums are computed in parallel on the server side, as a large FS block
is broken into smaller pieces. In non-GNR environments (when
nsdCksumTraditional is set), the checksum is computed sequentially on the
server.
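
As a rough illustration of the sequential versus parallel distinction, here is
a minimal Python sketch. It is not GPFS or GNR code. CRC32 stands in for
whatever checksum the product actually uses (the thread does not say), and
PIECE_SIZE, the function names, and the worker count are hypothetical values
chosen only for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical piece size for illustration; not a GPFS/GNR parameter.
PIECE_SIZE = 64 * 1024


def checksum_sequential(block: bytes) -> int:
    """Single checksum over the whole block, computed by one thread."""
    return zlib.crc32(block)


def split_into_pieces(block: bytes, piece_size: int = PIECE_SIZE) -> List[bytes]:
    """Break a large block into fixed-size pieces (the last may be shorter)."""
    return [block[i:i + piece_size] for i in range(0, len(block), piece_size)]


def checksums_parallel(block: bytes, workers: int = 4) -> List[int]:
    """One checksum per piece, with the pieces handed to a thread pool."""
    pieces = split_into_pieces(block)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves piece order in the returned list.
        return list(pool.map(zlib.crc32, pieces))


if __name__ == "__main__":
    block = bytes(range(256)) * 1024  # 256 KiB of sample data
    print(hex(checksum_sequential(block)))
    print([hex(c) for c in checksums_parallel(block)])
```

The sequential version has to walk the entire block on one thread, while the
per-piece version can spread the work across threads once the block has been
split, which is the property described above for the GNR case.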

  Felipe

----
Felipe Knop                                     knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314  T/L 293-9314





From:	Stephen Ulmer <ulmer at ulmer.org>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	10/29/2018 04:52 PM
Subject:	Re: [gpfsug-discuss] NSD network checksums
            (nsdCksumTraditional)
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



So the ESS checksums that are highly touted as "protecting all the way to
the disk surface" completely ignore the transfer between the client and the
NSD server? It sounds like you are saying that all of the checksumming done
for GNR is internal to GNR and only protects against bit-flips on the disk
(and in staging buffers, etc.).

I’m asking because your explanation never mentions calculating anything on
the NSD client, and it implies that the client could not participate, given
that it does not know about the structure of the vdisks under the NSD. But
if the transfer is protected starting at the client, the client-side
calculation has to be a performance factor for both types. The transfer is
protected that way in the case of nsdCksumTraditional, which is what we are
comparing to ESS checksumming.
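
To make "protected starting at the client" concrete, here is a minimal Python
sketch of an in-flight checksum: the sender computes it, the receiver
recomputes and compares. This is a generic illustration, not the NSD protocol.
CRC32 and the names Message, client_send, server_receive, and flip_bit are all
assumptions made for the example.

```python
import zlib
from typing import NamedTuple


class Message(NamedTuple):
    """Hypothetical wire message: payload plus the sender's checksum."""
    payload: bytes
    checksum: int


def client_send(payload: bytes) -> Message:
    """The client computes the checksum before the data leaves the node."""
    return Message(payload, zlib.crc32(payload))


def server_receive(msg: Message) -> bytes:
    """The server recomputes the checksum and rejects a mismatch."""
    if zlib.crc32(msg.payload) != msg.checksum:
        raise ValueError("checksum mismatch: payload corrupted in flight")
    return msg.payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single bit-flip on the wire."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    msg = client_send(b"file system block contents")
    assert server_receive(msg) == msg.payload  # clean transfer is accepted

    damaged = Message(flip_bit(msg.payload, 5), msg.checksum)
    try:
        server_receive(damaged)
    except ValueError as err:
        print("rejected:", err)
```

A checksum computed only after the data arrives at the server cannot catch
the corrupted case here, because it would be calculated over the payload that
was already damaged.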

If ESS checksumming doesn’t protect on the wire I’d say that marketing has
run amok, because that has *definitely* been implied in meetings for which
I’ve been present. In fact, when asked if Spectrum Scale provides
checksumming for data in-flight, IBM sales has used it as an ESS up-sell
opportunity.

--
Stephen



      On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram <kums at us.ibm.com> wrote:

      Hi,

      >>How can it be that the I/O performance degradation warning only
      seems to accompany the nsdCksumTraditional setting and not GNR?
      >>Why is there such a penalty for "traditional" environments?

      In GNR IO/NSD servers (ESS IO nodes), the checksums for an NSD
      (storage volume/vdisk) are computed in parallel across the threads
      handling each pdisk/drive that constitutes the vdisk/volume. This
      is possible since the GNR software on the ESS IO servers is tightly
      integrated with the underlying storage and is aware of the vdisk
      DRAID configuration (strip size, the pdisks constituting the vdisk,
      etc.), so it can perform parallel checksum operations.

      In the non-GNR + external storage model, the GPFS software on the
      NSD server(s) does not manage the underlying storage volume (this is
      done by the storage RAID controllers), and the checksum is computed
      serially. This contributes to an increase in CPU usage and to I/O
      performance degradation (depending on I/O access patterns, I/O load,
      etc.).

      My two cents.

      Regards,
      -Kums





      From:        Aaron Knister <aaron.s.knister at nasa.gov>
      To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
      Date:        10/29/2018 12:34 PM
      Subject:        [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
      Sent by:        gpfsug-discuss-bounces at spectrumscale.org



      Flipping through the slides from the recent SSUG meeting, I noticed
      that one of the features mentioned for 5.0.2 was the
      nsdCksumTraditional flag. Reading up on it, it seems to come with a
      warning about significant I/O performance degradation and increased
      CPU usage. I also recall that data integrity checking is performed
      by default with GNR. How can it be that the I/O performance
      degradation warning only seems to accompany the nsdCksumTraditional
      setting and not GNR? As someone who knows exactly 0 of the
      implementation details, I'm just naively assuming that the checksums
      are being generated (in the same way?) in both cases and transferred
      to the NSD server. Why is there such a penalty for "traditional"
      environments?

      -Aaron

      --
      Aaron Knister
      NASA Center for Climate Simulation (Code 606.2)
      Goddard Space Flight Center
      (301) 286-2776
      _______________________________________________
      gpfsug-discuss mailing list
      gpfsug-discuss at spectrumscale.org
      http://gpfsug.org/mailman/listinfo/gpfsug-discuss


