[gpfsug-discuss] gpfsug-discuss Digest, Vol 62, Issue 33

Jonathan Buzzard jonathan at buzzard.me.uk
Thu Mar 16 16:54:32 GMT 2017


On Thu, 2017-03-16 at 10:39 -0400, Aaron Knister wrote:
> In all seriousness, I'd love to be wrong about this. I'm not sure
> which part(s) of what I said was inaccurate-- the vendor lock in
> and/or that GNR is the only way to get end to end checksums.

The end to end checksums, or at least the ability to protect against
silent corruption by having the data checksummed at all stages.

> 
> When I say end to end checksums I mean that from the moment an FS
> write is submitted to mmfsd a checksum is calculated that is passed
> through every layer (memory, network, sas, fibre channel etc.) down to
> individual disk drives (understanding that the RAID controller may
> need to derive the checksum based on whatever RAID algorithm it's
> using). It's my understanding that the only way to achieve this with
> GPFS today is with GNR which is only available via purchasing a GSS or
> ESS solution from IBM/Lenovo. One is of course free to buy hardware
> from any vendor but you don't get GNR. I should really have said "if
> you want GNR you have to buy hardware from IBM or Lenovo" which to me
> is being locked in to a vendor as long as you'd like end to end
> checksums. 
> 
> If there's another way to get end-to-end checksums, could you help me
> understand how?
> 

Bear in mind that the purpose of the checksums is to ensure the data is
not corrupted. Note that you don't get true end to end, because the
checksums are never exposed to the application, even in ZFS, at least as
I understand it; you just get a read failure.

> Regarding DIF/DIX, it's my understanding that I can/could use T10-DIF
> today (with the correct hardware) without purchasing any standards
> which would checksum data from the HBA to the RAID controller (and in
> theory disks). However, in an environment with NSD servers the origin
> of a read/write could in theory be quite a bit further away from the
> HBA in terms of hops. T10-DIF would be completely separate from GPFS
> as I understand it. I'm not aware of any integration (T10-DIF + DIX).
> What I'm really looking for, I think, is T10-DIF + DIX where the
> application (GPFS) generates protection information that's then passed
> along to the layers below it. 
> 

Well yes, an end user does not need to purchase a copy of the standard.
However, some people take the view that because you need to spend money
to get the standard it is not truly open, so I was just making that
clear.

Right, so bear in mind that the purpose of the checksums is to ensure
data is not corrupted. If the NSD servers are using DIF/DIX, you are
using ECC memory, and you are using standard TCP/IP to communicate
between the nodes, then the opportunities for silent data corruption are
very limited. That is, the TCP/IP checksums will protect your data
between the clients and the NSD nodes, and once at the NSD node the
DIF/DIX will protect your data as it makes its way to the disk. In
between all that, the ECC memory in your machines will protect
everything once in RAM, and the PCIe bus has a 32-bit CRC protecting
everything that moves over it.
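
To make the DIF part concrete, here is a minimal Python sketch of the
8-byte protection information that T10-DIF attaches to each sector: a
16-bit guard tag (a CRC over the sector data, polynomial 0x8BB7), a
16-bit application tag and a 32-bit reference tag (the low 32 bits of
the LBA in Type 1 protection). The function names, the 512-byte sector
size and the zero application tag are assumptions made for
illustration; in practice the HBA and the disk do this in hardware.

```python
import struct

SECTOR_SIZE = 512  # assumed logical block size for this sketch


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with the T10-DIF polynomial 0x8BB7, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte tuple: guard tag, application tag, reference tag."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("sector must be exactly SECTOR_SIZE bytes")
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def verify_pi(sector: bytes, pi: bytes, lba: int) -> bool:
    """Check that the data matches the guard tag and the LBA matches the reference tag."""
    guard, _app_tag, ref = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref == (lba & 0xFFFFFFFF)


if __name__ == "__main__":
    sector = bytes(range(256)) * 2
    pi = make_pi(sector, lba=1234)

    # Flip a single bit, as a bad cable or a firmware bug might.
    damaged = bytearray(sector)
    damaged[10] ^= 0x01

    print("intact sector verifies:", verify_pi(sector, pi, 1234))
    print("damaged sector verifies:", verify_pi(bytes(damaged), pi, 1234))
    print("misdirected write verifies:", verify_pi(sector, pi, 1235))
```

The reference tag is what catches a sector written to, or read from,
the wrong LBA, which a checksum over the data alone would not notice.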

The only "hole" in this, as far as I am aware, is the CPU itself: x86
ones at least do not "checksum" themselves as they do calculations. But
you have the same problem with ZFS for exactly the same reason.

So the point is that what you should be interested in as an admin,
ensuring your data is protected against silent corruption, is achievable
using open standards and without hardware vendor lock in.

The advantage of DIF/DIX over ZFS is, as I understand it, this: with
ZFS, unless you do an immediate read of written blocks and take a large
performance penalty on writes, you have no idea if your data has been
corrupted until you try to read it back, which could be months if not
years down the line. DIF/DIX, by contrast, should try to fix it
immediately with a retry, and if that does not work pass the failure
back up the line immediately.
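
The difference in when corruption is noticed can be shown with a toy
Python model. This is only a sketch of the argument above, not how ZFS
or a DIF-capable HBA is actually implemented: the class names and the
link functions are hypothetical, and zlib.crc32 stands in for whatever
checksum the real system uses.

```python
import zlib


class ChecksumError(Exception):
    """Raised when stored or received data does not match its checksum."""


def flip_bit(data: bytes) -> bytes:
    """A link that always corrupts the first byte in flight."""
    out = bytearray(data)
    out[0] ^= 0x01
    return bytes(out)


def flaky_link(failures: int):
    """A link that corrupts the first `failures` transfers, then behaves."""
    remaining = [failures]

    def send(data: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return flip_bit(data)
        return data

    return send


class ReadVerifiedStore:
    """The checksum is kept with the block but only checked on read."""

    def __init__(self, link):
        self.link = link
        self.blocks = {}

    def write(self, lba: int, data: bytes) -> None:
        checksum = zlib.crc32(data)  # computed before the data leaves the host
        self.blocks[lba] = (self.link(data), checksum)  # no check on arrival

    def read(self, lba: int) -> bytes:
        data, checksum = self.blocks[lba]
        if zlib.crc32(data) != checksum:
            raise ChecksumError("corruption found at read time")
        return data


class WriteVerifiedStore:
    """The target checks the checksum before committing, retrying on mismatch."""

    def __init__(self, link, retries: int = 1):
        self.link = link
        self.retries = retries
        self.blocks = {}

    def write(self, lba: int, data: bytes) -> None:
        checksum = zlib.crc32(data)
        for _ in range(self.retries + 1):
            received = self.link(data)
            if zlib.crc32(received) == checksum:
                self.blocks[lba] = received
                return
        raise ChecksumError("write rejected after retries")

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]
```

With a link that always corrupts, ReadVerifiedStore.write() returns
normally and the problem only surfaces on a later read(), whereas
WriteVerifiedStore.write() raises at once. With flaky_link(1) and one
retry, the write succeeds on the second attempt.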

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.



