[gpfsug-discuss] data integrity documentation

Sven Oehme oehmes at gmail.com
Wed Aug 2 17:26:29 BST 2017


The very first thing you should check is whether you have this setting set:

mmlsconfig envVar

envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
MLX5_USE_MUTEX 1

If that doesn't come back as shown above, you need to set it:

mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
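
A minimal sketch of the check described above, assuming `mmlsconfig envVar`
prints NAME VALUE pairs as in the output shown (it also tolerates
NAME=VALUE). The function name `missing_env_vars` is made up for
illustration; only the variable names and values come from this post.

```shell
#!/bin/sh
# Expected settings, taken from the mmchconfig command in this post.
EXPECTED="MLX4_POST_SEND_PREFER_BF=0 MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1"

# missing_env_vars TEXT   (hypothetical helper, not part of Spectrum Scale)
# TEXT is the captured output of `mmlsconfig envVar`.
# Prints every expected NAME=VALUE that is not present, one per line.
missing_env_vars() {
    # Turn '=', tabs and newlines into single spaces and pad both ends,
    # so each setting can be matched as " NAME VALUE ".
    flat=" $(printf '%s' "$1" | tr -s '= \t\n' '    ') "
    for pair in $EXPECTED; do
        name=${pair%%=*}
        value=${pair#*=}
        case "$flat" in
            *" $name $value "*) ;;            # present with the expected value
            *) printf '%s\n' "$pair" ;;       # missing or set to another value
        esac
    done
}
```

For example, `missing_env_vars "$(mmlsconfig envVar)"` should print nothing
when all four settings are in place, and otherwise list the ones to add.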

There was a problem in various versions of the Mellanox FW that was never
completely addressed (bugs were found and fixed, but it was never fully
proven that the problem was addressed). The environment variables above turn
on code in the Mellanox driver that prevents this potential code path from
being used to begin with.

In Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
so that the problem can't happen anymore even if you don't set these
variables. Until then, the only choice you have is the envVar above (which,
btw, ships as the default on all ESS systems).

You should also be on the latest available Mellanox FW and drivers, as not
all versions even have the code that is activated by the environment
variables above. I think you need to be at 3.4 at a minimum, but I don't
remember the exact version. Multiple defects have been opened in this area;
the last one I remember was:

00154843 : ESS ConnectX-3 performance issue - spinning on pthread_spin_lock

You may ask your Mellanox representative if they can get you access to this
defect. While it was found on ESS, meaning on PPC64 with ConnectX-3 cards,
it's a general issue that affects all cards, on Intel as well as Power.
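
For the minimum-version check mentioned above, a small sketch of a version
comparison. The helper name `version_at_least` is hypothetical, the 3.4
threshold is only the approximate figure given in this post, and how you
obtain the installed version string is left to you. It relies on `sort -V`
from GNU coreutils.

```shell
#!/bin/sh
# version_at_least HAVE WANT   (hypothetical helper)
# Succeeds (exit status 0) when version HAVE is greater than or equal to WANT.
version_at_least() {
    # Version-sort both strings; if WANT sorts first (or ties), HAVE >= WANT.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage would look like `version_at_least "$installed" 3.4 || echo "too old"`,
where `$installed` holds whatever version string you looked up.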

On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
wrote:

> hi all,
>
> is there any documentation wrt data integrity in spectrum scale:
> assuming a crappy network, does gpfs guarantee somehow that data written
> by client ends up safe in the nsd gpfs daemon; and similarly from the
> nsd gpfs daemon to disk.
>
> and wrt crappy network, what about rdma on crappy network? is it the same?
>
> (we are hunting down a crappy infiniband issue; ibm support says it's
> network issue; and we see no errors anywhere...)
>
> thanks a lot,
>
> stijn