[gpfsug-discuss] data integrity documentation

Sven Oehme oehmes at gmail.com
Wed Aug 2 19:47:52 BST 2017


How can you reproduce this so quickly?
Did you restart all daemons after that?
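
To rule out a partially applied setting, a small check along these lines might help. It is only a sketch: `missing_envvars` is a hypothetical helper name, and it assumes `mmlsconfig envVar` prints the variables in the form quoted below (name and value separated by a space, or by `=`), possibly wrapped across lines.

```shell
#!/usr/bin/env bash
# Settings recommended in the quoted message below.
required="MLX4_POST_SEND_PREFER_BF=0 MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1"

# missing_envvars TEXT (hypothetical helper): print every required
# NAME=VALUE pair that does not appear in TEXT, one per line.
# TEXT is expected to be the output of `mmlsconfig envVar`.
missing_envvars() {
    local flat pair name value
    # Treat '=' like a blank and collapse all whitespace (including
    # line wraps) so "NAME VALUE" and "NAME=VALUE" look the same.
    flat=" $(printf '%s' "$1" | tr '=' ' ' | tr -s '[:space:]' ' ') "
    for pair in $required; do
        name=${pair%%=*}
        value=${pair#*=}
        case "$flat" in
            *" $name $value "*) ;;            # present with the expected value
            *) printf '%s\n' "$pair" ;;       # absent or set to something else
        esac
    done
}

# Only query the cluster when the GPFS CLI is actually on PATH.
if command -v mmlsconfig >/dev/null 2>&1; then
    missing_envvars "$(mmlsconfig envVar)"
fi
```

If this prints nothing, all four variables are set as recommended. Anything it does print would need to go into the `mmchconfig envVar=...` call quoted below, and, as far as I understand it, the daemons only pick up the environment when they start.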

On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
wrote:

> hi sven,
>
>
> > the very first thing you should check is if you have this setting set :
> maybe the very first thing to check should be the faq/wiki that has this
> documented?
>
> >
> > mmlsconfig envVar
> >
> > envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
> > MLX5_USE_MUTEX 1
> >
> > if that doesn't come back the way above you need to set it :
> >
> > mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
> > MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
> i just set this (wasn't set before), but problem is still present.
>
> >
> > there was a problem in the Mellanox FW in various versions that was never
> > completely addressed (bugs were found and fixed, but it was never fully
> > proven to be addressed). the above environment variables turn on code in
> > the mellanox driver that prevents this potential code path from being
> > used to begin with.
> >
> > in Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
> > so that even if you don't set these variables the problem can't happen
> > anymore. until then the only choice you have is the envVar above (which
> > btw ships as default on all ESS systems).
> >
> > you also should be on the latest available Mellanox FW & Drivers, as not
> > all versions even have the code that is activated by the environment
> > variables above. i think at a minimum you need to be at 3.4, but i don't
> > remember the exact version. there have been multiple defects opened
> > around this area; the last one i remember was:
> we run mlnx ofed 4.1. fw is not the latest: we have edr cards from
> dell, and the fw is a bit behind. i'm trying to convince dell to make a
> new one. mellanox used to allow you to make your own, but they don't
> anymore.
>
> >
> > 00154843 : ESS ConnectX-3 performance issue - spinning on pthread_spin_lock
> >
> > you may ask your mellanox representative if they can get you access to
> > this defect. while it was found on ESS, meaning on PPC64 and with
> > ConnectX-3 cards, it's a general issue that affects all cards, on intel
> > as well as Power.
> ok, thanks for this. maybe such a reference is enough for dell to update
> their firmware.
>
> stijn
>
> >
> > On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> > wrote:
> >
> >> hi all,
> >>
> >> is there any documentation wrt data integrity in spectrum scale:
> >> assuming a crappy network, does gpfs guarantee somehow that data written
> >> by client ends up safe in the nsd gpfs daemon; and similarly from the
> >> nsd gpfs daemon to disk.
> >>
> >> and wrt crappy network, what about rdma on crappy network? is it the
> >> same?
> >>
> >> (we are hunting down a crappy infiniband issue; ibm support says it's
> >> network issue; and we see no errors anywhere...)
> >>
> >> thanks a lot,
> >>
> >> stijn
> >> _______________________________________________
> >> gpfsug-discuss mailing list
> >> gpfsug-discuss at spectrumscale.org
> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss