<div dir="ltr">Hi Valdis,<div><br></div><div>I tried again with grep replaced by 'ls -ls' and 'md5sum'. Thankfully, I don't think this is a data integrity issue:</div><div><br></div><div><div>$ ./pipetestls.sh </div><div>256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt</div><div>0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt</div><div><br></div><div>$ ./pipetestmd5.sh </div><div>15cb81a85c9e450bdac8230309453a0a  /srv/gsfs0/projects/pipetest.tmp.txt</div><div>15cb81a85c9e450bdac8230309453a0a  /home/griznog/pipetest.tmp.txt</div></div><div><br></div><div>And with grep replaced by 'file', both files are even correctly identified as ASCII:</div><div><div>$ ./pipetestfile.sh </div><div>/srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines</div><div>/home/griznog/pipetest.tmp.txt: ASCII text, with very long lines</div></div><div><br></div><div>I'll poke a little harder at grep next and see what a comparison of the strace output for each file reveals.</div><div><br></div><div>Thanks,</div><div><br></div><div>jbh</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 14, 2018 at 7:08 AM,  <span dir="ltr"><<a href="mailto:valdis.kletnieks@vt.edu" target="_blank">valdis.kletnieks@vt.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said:<br>

<br>

> #  ls -aln /srv/gsfs0/projects/pipetest.<wbr>tmp.txt $HOME/pipetest.tmp.txt<br>

> -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt<br>

> -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 /srv/gsfs0/projects/pipetest.<wbr>tmp.txt<br>

><br>

> We can "fix" the user case that exposed this by not using a temp file or<br>

> inserting a sleep, but I'd still like to know why GPFS is behaving this way<br>

> and make it stop.<br>

<br>

</span>May be related to replication, or other behind-the-scenes behavior.<br>

<br>

Consider this example: GPFS 4.2.3.6, data and metadata replication both<br>

set to 2, two sites 95 cable miles apart, each with 3 Dell servers in a full<br>

Fibre Channel mesh to 3 Dell MD34something arrays.<br>

<br>

% dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test<br>

4096+0 records in<br>

4096+0 records out<br>

4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s<br>

2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test<br>

8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test<br>

8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test<br>

<br>

Notice that the first /bin/ls shouldn't be starting until after the dd has<br>

completed - at which point it's only allocated half the blocks needed to hold<br>

the 4M of data at one site.  5 seconds later, it's allocated the blocks at both<br>

sites and thus shows the full 8M needed for 2 copies.<br>

<br>

I've also seen (but haven't replicated it as I write this) a small file (4-8K<br>

or so) showing first one full-sized block, then a second full-sized block, and<br>

then dropping back to what's needed for two 1/32nd-block fragments.  That had me<br>

scratching my head.<br>

<br>

Having said that, that's all metadata fun and games, while your case<br>

appears to have some problems with data integrity (which is a whole lot<br>

scarier).  It would be *really* nice if we understood the problem here.<br>

<br>

The scariest part is:<br>

<span class=""><br>

> The first grep | wc -l returns 1, because grep outputs  "Binary file /path/to/gpfs/mount/test matches"<br>

<br>

</span>which seems to be implying that we're failing on semantic consistency.<br>

Basically, your 'cat' command is completing and closing the file, but then a<br>

temporally later open of the same file is reading something other than only the<br>

just-written data.  My first guess is that it's a race condition similar to the<br>

following: The cat command is causing a write on one NSD server, and the first<br>

grep results in a read from a *different* NSD server, returning the data that<br>

*used* to be in the block because the read actually happens before the first<br>

NSD server actually completes the write.<br>
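
<br>

One rough way to poke at that guess is a tight write-then-read loop. This is only a sketch, not the original pipetest.sh: the function name race_probe, the use of seq to generate text, and the 20000-line size are all made up for illustration, and it assumes GNU grep and coreutils.<br>

```shell
#!/bin/sh
# Hypothetical reproducer (NOT the original pipetest.sh): write a text
# file, re-open it immediately, and count how often grep disagrees with
# the same data read from a pipe.
race_probe() {
    target=$1
    tries=${2:-100}
    # Reference count taken from a pipe, so no file system is involved.
    expected=$(seq 1 20000 | grep -c 1)
    bad=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        # The writer has exited and closed the file before the next line runs.
        seq 1 20000 > "$target"
        # Immediate re-open and read. If grep decides the data is binary it
        # prints a single "Binary file ... matches" line, so the count is 1.
        got=$(grep 1 "$target" | wc -l)
        [ "$got" -eq "$expected" ] || bad=$((bad + 1))
        i=$((i + 1))
    done
    rm -f "$target"
    echo "mismatches=$bad of $tries"
}
```

Run it as, say, race_probe /srv/gsfs0/projects/probe.tmp 1000 (that file name is invented too). On a file system with ordinary close-to-open consistency it should report zero mismatches.<br>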

<br>

It may be interesting to replace the greps with pairs of 'ls -ls / dd' commands to grab the<br>

raw data and its size, and check the following:<br>

<br>

1) does the size (both blocks allocated and logical length) reported by<br>

ls match the amount of data actually read by the dd?<br>

<br>

2) Is the file length as actually read equal to the written length, or does it<br>

overshoot and read all the way to the next block boundary?<br>

<br>

3) If the length is correct, what's wrong with the data that's telling grep that<br>

it's a binary file?  ( od -cx is your friend here).<br>

<br>

4) If it overshoots, is the remainder all-zeros (good) or does it return semi-random<br>

"what used to be there" data (bad, due to data exposure issues)?<br>

<br>
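
Checks 1 through 4 could be scripted roughly as follows. Treat it as a sketch under assumptions: the helper name check_file is invented, it relies on GNU stat, dd and tr, and it uses stat in place of parsing ls -ls output (stat's %b counts blocks of %B bytes, typically 512, whereas ls -s usually prints 1K units).<br>

```shell
#!/bin/sh
# Hypothetical helper: compare what stat reports for a file with what dd
# actually reads back, then look at the bytes themselves.
check_file() {
    f=$1
    raw=$(mktemp) || return 1

    # 1) Logical length and allocated blocks as reported by the file system.
    size=$(stat -c %s "$f")
    blocks=$(stat -c %b "$f")

    # Grab the raw data with dd and count the bytes actually read.
    dd if="$f" of="$raw" bs=1M 2>/dev/null
    nread=$(wc -c < "$raw")
    echo "stat_size=$size blocks=$blocks bytes_read=$nread"

    # 2) Did the read stop at the logical length, or overshoot it?
    if [ "$nread" -gt "$size" ]; then
        # 4) Count non-zero bytes in the part past the logical length.
        extra=$(tail -c +"$((size + 1))" "$raw" | tr -d '\000' | wc -c)
        echo "overshoot=$((nread - size)) nonzero_in_overshoot=$extra"
    fi

    # 3) Count bytes that are neither printable ASCII nor tab, LF or CR.
    #    A non-zero count here is worth a closer look with od -cx.
    odd=$(tr -d '\011\012\015\040-\176' < "$raw" | wc -c)
    echo "non_text_bytes=$odd"

    rm -f "$raw"
}
```

Calling check_file /srv/gsfs0/projects/pipetest.tmp.txt right after the write, in place of the grep, should print one line per check. A non-zero non_text_bytes on a file that is supposed to be plain text would explain grep's "Binary file" verdict.<br>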

(It's certainly not the most perplexing data consistency issue I've hit in 4 decades - the<br>

winner *has* to be an intermittent data read corruption on a GPFS 3.5 cluster that<br>

had us, IBM, SGI, DDN, and at least one vendor of networking gear all chasing our<br>

tails for 18 months before we finally tracked it down. :)<br>

<br>______________________________<wbr>_________________<br>

gpfsug-discuss mailing list<br>

gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a><br>

<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/<wbr>listinfo/gpfsug-discuss</a><br>

<br></blockquote></div><br></div>