[gpfsug-discuss] data integrity documentation

Stijn De Weirdt stijn.deweirdt at ugent.be
Wed Aug 2 19:53:09 BST 2017


yes ;)

The system is in preproduction, so there is nothing that can't be stopped/started in
a few minutes (the current setup has only 4 NSDs, and no clients).
mmfsck triggers the errors very early, during the inode replica compare.
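
(For reference, a minimal sketch of the check; 'fs0' is just a placeholder for
the filesystem name here, and the filesystem is unmounted first:

   mmumount fs0 -a     # unmount on all nodes
   mmfsck fs0 -n       # read-only / no-change check; this is where the inode
                       # replica mismatches show up
)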


stijn

On 08/02/2017 08:47 PM, Sven Oehme wrote:
> How can you reproduce this so quickly?
> Did you restart all daemons after that?
> 
> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> wrote:
> 
>> hi sven,
>>
>>
>>> The very first thing you should check is whether you have this setting set:
>> Maybe the very first thing to check should be the FAQ/wiki that has this
>> documented?
>>
>>>
>>> mmlsconfig envVar
>>>
>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
>>> MLX5_USE_MUTEX 1
>>>
>>> If that doesn't come back the way it is shown above, you need to set it:
>>>
>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
>> I just set this (it wasn't set before), but the problem is still present.
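>>
>> (For the record, a minimal sketch of what that amounts to; I'm assuming the
>> variables are only picked up when the daemon starts, hence the restart:
>>
>>    mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1"
>>    mmlsconfig envVar                # verify the value is recorded
>>    mmshutdown -a && mmstartup -a    # restart mmfsd on all nodes so the new environment takes effect
>> )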
>>
>>>
>>> There was a problem in the Mellanox FW in various versions that was never
>>> completely addressed (bugs were found and fixed, but it was never fully
>>> proven to be resolved). The above environment variables turn on code in the
>>> Mellanox driver that prevents this potential code path from being used to
>>> begin with.
>>>
>>> In Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
>>> so that even if you don't set these variables the problem can't happen
>>> anymore. Until then, the only choice you have is the envVar above (which,
>>> btw, ships as default on all ESS systems).
>>>
>>> You should also be on the latest available Mellanox FW & drivers, as not
>>> all versions even have the code that is activated by the environment
>>> variables above. I think at a minimum you need to be at 3.4, but I don't
>>> remember the exact version. There have been multiple defects opened around
>>> this area; the last one I remember was:
>> We run MLNX OFED 4.1; the FW is not the latest, since we have EDR cards from
>> Dell and their FW is a bit behind. I'm trying to convince Dell to build a
>> new one. Mellanox used to let you build your own, but they don't anymore.
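>>
>> (E.g., to check what we are actually running; these are the generic OFED
>> tools, nothing GPFS-specific:
>>
>>    ofed_info -s                               # installed MLNX OFED version
>>    ibv_devinfo | grep -E 'fw_ver|board_id'    # firmware version and board id per HCA
>> )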
>>
>>>
>>> 00154843: ESS ConnectX-3 performance issue - spinning on pthread_spin_lock
>>>
>>> You may ask your Mellanox representative if they can get you access to this
>>> defect. While it was found on ESS, i.e. on PPC64 and with ConnectX-3 cards,
>>> it is a general issue that affects all cards, on Intel as well as Power.
>> OK, thanks for this. Maybe such a reference is enough to get Dell to update
>> their firmware.
>>
>> stijn
>>
>>>
>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
>>> wrote:
>>>
>>>> hi all,
>>>>
>>>> Is there any documentation wrt data integrity in Spectrum Scale?
>>>> Assuming a crappy network, does GPFS somehow guarantee that data written
>>>> by a client ends up safe in the NSD GPFS daemon, and similarly from the
>>>> NSD GPFS daemon to disk?
>>>>
>>>> And wrt a crappy network, what about RDMA on a crappy network? Is it the
>>>> same?
>>>>
>>>> (We are hunting down a crappy InfiniBand issue; IBM support says it's a
>>>> network issue, and we see no errors anywhere...)
>>>>
>>>> thanks a lot,
>>>>
>>>> stijn



More information about the gpfsug-discuss mailing list