<div dir="ltr"><div>ok, you can't be any newer than that. i just wonder why you have 512-byte inodes if this is a new system? </div><div>are these raw disks in this setup, or raid controllers? what's the disk sector size, and how was the filesystem created? (mmlsfs FSNAME would answer the last question) </div><div><br></div>on the tsdbfs, i am not sure if it gave wrong results, but it would be worth a test to see what's actually on the disk. <div><br></div><div>you are correct that GNR extends this to the disk, but the network part is covered by the nsdchecksums you turned on </div><div>when you enable the not-to-be-named checksum parameter, do you actually still get an error from fsck? </div><div><br></div><div>sven</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Aug 2, 2017 at 2:14 PM Stijn De Weirdt <<a href="mailto:stijn.deweirdt@ugent.be">stijn.deweirdt@ugent.be</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">hi sven,<br>

<br>

> before i answer the rest of your questions, can you share exactly what version of<br>

> GPFS you are on? mmfsadm dump version would be the best source for that.<br>

it returns<br>

Build branch "4.2.3.3 ".<br>

<br>

> if you have 2 inodes and you know the exact address of where they are<br>

> stored on disk one could 'dd' them off the disk and compare if they are<br>

> really equal.<br>

ok, i can try that later. are you suggesting that the "tsdbfs comp"<br>

might have given wrong results? because we ran that and got eg<br>

<br>

> # tsdbfs somefs comp 7:5137408 25:221785088 1024<br>

> Comparing 1024 sectors at 7:5137408 = 0x7:4E6400 and 25:221785088 = 0x19:D382C00:<br>

>   All sectors identical<br>
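<br>
a minimal sketch of the dd-style cross-check suggested above, in python: read the same number of sectors from two devices at given sector offsets and list the sectors that differ. the 512-byte sector size, the helper names and the device paths in the usage note are assumptions, not values taken from this system, and mapping the disk:sector addresses printed by tsdbfs to real block devices still has to be done by hand.<br>
<pre>
```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors; check the real disk sector size


def read_sectors(path, first_sector, count, sector_size=SECTOR_SIZE):
    """Read `count` sectors starting at `first_sector` from a device or file."""
    with open(path, "rb") as dev:
        dev.seek(first_sector * sector_size)
        data = dev.read(count * sector_size)
    if len(data) != count * sector_size:
        raise IOError("short read from %s" % path)
    return data


def differing_sectors(path_a, sector_a, path_b, sector_b, count,
                      sector_size=SECTOR_SIZE):
    """Return the relative indexes of sectors that differ between two ranges."""
    a = read_sectors(path_a, sector_a, count, sector_size)
    b = read_sectors(path_b, sector_b, count, sector_size)
    return [
        i for i in range(count)
        if a[i * sector_size:(i + 1) * sector_size]
        != b[i * sector_size:(i + 1) * sector_size]
    ]
```
</pre>
usage would look like differing_sectors("/dev/sdX", 5137408, "/dev/sdY", 221785088, 1024), with /dev/sdX and /dev/sdY as placeholders for whatever devices sit behind disks 7 and 25. an empty list would correspond to "All sectors identical". the files are only opened for reading.<br>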

<br>

<br>

> we only support checksums when you use GNR based systems, they cover<br>

> network as well as Disk side for that.<br>

> the nsdchecksum code you refer to is the one i mentioned above that's only<br>

> supported with GNR; at least i am not aware that we ever claimed it to be<br>

> supported outside of it, but i can check that.<br>

ok, maybe i'm a bit confused. we have a GNR too, but it's not this one,<br>

and they are not in the same gpfs cluster.<br>

<br>

i thought the GNR extended the checksumming to disk, and that it was<br>

already there for the network part. thanks for clearing this up. but<br>

that is worse than i thought...<br>

<br>

stijn<br>

<br>

><br>

> sven<br>

><br>

> On Wed, Aug 2, 2017 at 12:20 PM Stijn De Weirdt <<a href="mailto:stijn.deweirdt@ugent.be" target="_blank">stijn.deweirdt@ugent.be</a>><br>

> wrote:<br>

><br>

>> hi sven,<br>

>><br>

>> the data is not corrupted. mmfsck compares 2 inodes, says they don't<br>

>> match, but checking the data with tsdbfs reveals they are equal.<br>

>> (one replica has to be fetched over the network; the nsds cannot access<br>

>> all disks)<br>

>><br>

>> with some nsdChksum... settings we get during this mmfsck a lot of<br>

>> "Encountered XYZ checksum errors on network I/O to NSD Client disk"<br>

>><br>

>> ibm support says these are hardware issues, but wrt mmfsck, false<br>

>> positives.<br>

>><br>

>> anyway, our current question is: if these are hardware issues, is there<br>

>> anything in gpfs client->nsd (on the network side) that would detect<br>

>> such errors? ie, can we trust the data (and metadata)?<br>

>> i was under the impression that client to disk is not covered, but i<br>

>> assumed that at least client to nsd (the network part) was checksummed.<br>

>><br>

>> stijn<br>

>><br>

>><br>

>> On 08/02/2017 09:10 PM, Sven Oehme wrote:<br>

>>> ok, i think i understand now, the data was already corrupted. the config<br>

>>> change i proposed only prevents a known potential future on-the-wire<br>

>>> corruption; it will not fix something that already made it to the disk.<br>

>>><br>

>>> Sven<br>

>>><br>

>>><br>

>>><br>

>>> On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <<a href="mailto:stijn.deweirdt@ugent.be" target="_blank">stijn.deweirdt@ugent.be</a><br>

>>><br>

>>> wrote:<br>

>>><br>

>>>> yes ;)<br>

>>>><br>

>>>> the system is in preproduction, so nothing that can't be stopped/started in<br>

>>>> a few minutes (current setup has only 4 nsds, and no clients).<br>

>>>> mmfsck triggers the errors very early during inode replica compare.<br>

>>>><br>

>>>><br>

>>>> stijn<br>

>>>><br>

>>>> On 08/02/2017 08:47 PM, Sven Oehme wrote:<br>

>>>>> How can you reproduce this so quickly ?<br>

>>>>> Did you restart all daemons after that ?<br>

>>>>><br>

>>>>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <<a href="mailto:stijn.deweirdt@ugent.be" target="_blank">stijn.deweirdt@ugent.be</a><br>

>>><br>

>>>>> wrote:<br>

>>>>><br>

>>>>>> hi sven,<br>

>>>>>><br>

>>>>>><br>

>>>>>>> the very first thing you should check is if you have this setting<br>

>> set :<br>

>>>>>> maybe the very first thing to check should be the faq/wiki that has<br>

>> this<br>

>>>>>> documented?<br>

>>>>>><br>

>>>>>>><br>

>>>>>>> mmlsconfig envVar<br>

>>>>>>><br>

>>>>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1<br>

>>>>>>> MLX5_USE_MUTEX 1<br>

>>>>>>><br>

>>>>>>> if that doesn't come back the way above you need to set it :<br>

>>>>>>><br>

>>>>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1<br>

>>>>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"<br>

>>>>>> i just set this (wasn't set before), but the problem is still present.<br>
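<br>
a minimal sketch, in python, of checking a pasted "mmlsconfig envVar" line against the four settings listed above. it assumes the output has exactly the "envVar NAME VALUE NAME VALUE ..." shape quoted in this thread; the function names are made up for illustration and the real output format may differ between releases.<br>
<pre>
```python
# Settings as quoted in the mmchconfig command above.
REQUIRED = {
    "MLX4_POST_SEND_PREFER_BF": "0",
    "MLX4_USE_MUTEX": "1",
    "MLX5_SHUT_UP_BF": "1",
    "MLX5_USE_MUTEX": "1",
}


def parse_envvar_line(line):
    """Turn 'envVar NAME VALUE NAME VALUE ...' into a dict (assumed format)."""
    tokens = line.split()
    if not tokens or tokens[0] != "envVar":
        return {}
    pairs = tokens[1:]
    return dict(zip(pairs[0::2], pairs[1::2]))


def missing_settings(line, required=REQUIRED):
    """Return the required settings that are absent or have another value."""
    current = parse_envvar_line(line)
    return {k: v for k, v in required.items() if current.get(k) != v}
```
</pre>
an empty dict from missing_settings() means the pasted line already matches the list above; anything else names the variables that still need to be set.<br>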

>>>>>><br>

>>>>>>><br>

>>>>>>> there was a problem in the Mellanox FW in various versions that was<br>

>>>> never<br>

>>>>>>> completely addressed (bugs were found and fixed, but it was never<br>

>>>> fully<br>

>>>>>>> proven to be addressed) the above environment variables turn code on<br>

>> in<br>

>>>>>> the<br>

>>>>>>> mellanox driver that prevents this potential code path from being<br>

>> used<br>

>>>> to<br>

>>>>>>> begin with.<br>

>>>>>>><br>

>>>>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in<br>

>>>> Scale<br>

>>>>>>> so that even if you don't set these variables the problem can't happen<br>

>> anymore.<br>

>>>>>>> until then the only choice you have is the envVar above (which btw<br>

>>>> ships<br>

>>>>>> as<br>

>>>>>>> default on all ESS systems).<br>

>>>>>>><br>

>>>>>>> you also should be on the latest available Mellanox FW & Drivers as<br>

>> not<br>

>>>>>> all<br>

>>>>>>> versions even have the code that is activated by the environment<br>

>>>>>> variables<br>

>>>>>>> above, i think at a minimum you need to be at 3.4 but i don't<br>

>> remember<br>

>>>>>> the<br>

>>>>>>> exact version. There have been multiple defects opened around this<br>

>> area,<br>

>>>>>> the<br>

>>>>>>> last one i remember was  :<br>

>>>>>> we run mlnx ofed 4.1, fw is not the latest, but we have edr cards from<br>

>>>>>> dell, and the fw is a bit behind. i'm trying to convince dell to make<br>

>>>>>> a new one. mellanox used to allow you to make your own, but they don't<br>

>>>> anymore.<br>

>>>>>><br>

>>>>>>><br>

>>>>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on<br>

>>>>>> pthread_spin_lock<br>

>>>>>>><br>

>>>>>>> you may ask your mellanox representative if they can get you access<br>

>> to<br>

>>>>>> this<br>

>>>>>>> defect. while it was found on ESS, meaning on PPC64 and with<br>

>> ConnectX-3<br>

>>>>>>> cards, it's a general issue that affects all cards and on intel as well<br>

>>>> as<br>

>>>>>>> Power.<br>

>>>>>> ok, thanks for this. maybe such a reference is enough for dell to<br>

>> update<br>

>>>>>> their firmware.<br>

>>>>>><br>

>>>>>> stijn<br>

>>>>>><br>

>>>>>>><br>

>>>>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <<br>

>>>> <a href="mailto:stijn.deweirdt@ugent.be" target="_blank">stijn.deweirdt@ugent.be</a>><br>

>>>>>>> wrote:<br>

>>>>>>><br>

>>>>>>>> hi all,<br>

>>>>>>>><br>

>>>>>>>> is there any documentation wrt data integrity in spectrum scale:<br>

>>>>>>>> assuming a crappy network, does gpfs guarantee somehow that data<br>

>>>> written<br>

>>>>>>>> by client ends up safe in the nsd gpfs daemon; and similarly from<br>

>> the<br>

>>>>>>>> nsd gpfs daemon to disk.<br>

>>>>>>>><br>

>>>>>>>> and wrt crappy network, what about rdma on crappy network? is it the<br>

>>>>>> same?<br>

>>>>>>>><br>

>>>>>>>> (we are hunting down a crappy infiniband issue; ibm support says<br>

>> it's<br>

>>>>>>>> network issue; and we see no errors anywhere...)<br>

>>>>>>>><br>

>>>>>>>> thanks a lot,<br>

>>>>>>>><br>

>>>>>>>> stijn<br>

>>>>>>>> _______________________________________________<br>

>>>>>>>> gpfsug-discuss mailing list<br>

>>>>>>>> gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a><br>

>>>>>>>> <a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a><br>

>>>>>>>><br>

>>>>>>><br>

>>>>>>><br>

>>>>>>><br>


>>>>>>><br>


>>>>>><br>

>>>>><br>

>>>>><br>

>>>>><br>


>>>>><br>


>>>><br>

>>><br>

>>><br>

>>><br>


>>><br>


>><br>

><br>

><br>

><br>


><br>


</blockquote></div>