[gpfsug-discuss] Odd behavior with cat followed by grep.

John Hanks griznog at gmail.com
Wed Feb 14 15:21:52 GMT 2018


Hi Valdis,

I tried with the grep replaced by 'ls -ls' and 'md5sum', and I don't think
this is a data integrity issue, thankfully:

$ ./pipetestls.sh
256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt
0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt

$ ./pipetestmd5.sh
15cb81a85c9e450bdac8230309453a0a  /srv/gsfs0/projects/pipetest.tmp.txt
15cb81a85c9e450bdac8230309453a0a  /home/griznog/pipetest.tmp.txt

And replacing grep with 'file' even identifies both files correctly as ASCII:
$ ./pipetestfile.sh
/srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines
/home/griznog/pipetest.tmp.txt: ASCII text, with very long lines
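
For reference, each pipetest*.sh is just the cat-then-check pair from the
original failing case with grep swapped out for the other tool; roughly this
shape (a paraphrased sketch, not the verbatim script - paths and copy
direction here are illustrative):

#!/bin/bash
# write the temp file onto GPFS, then immediately run the check on both copies
cat /home/griznog/pipetest.tmp.txt > /srv/gsfs0/projects/pipetest.tmp.txt
md5sum /srv/gsfs0/projects/pipetest.tmp.txt /home/griznog/pipetest.tmp.txt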

I'll poke a little harder at grep next and see what the difference in
strace of each reveals.
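
Probably something along these lines, with the real search pattern
substituted (PATTERN and the trace paths are just placeholders):

$ strace -o /tmp/grep.gpfs.trace grep PATTERN /srv/gsfs0/projects/pipetest.tmp.txt
$ strace -o /tmp/grep.home.trace grep PATTERN /home/griznog/pipetest.tmp.txt
$ diff /tmp/grep.gpfs.trace /tmp/grep.home.trace

The read()/mmap() calls and where grep decides to stop reading should be the
interesting part of the diff.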

Thanks,

jbh




On Wed, Feb 14, 2018 at 7:08 AM, <valdis.kletnieks at vt.edu> wrote:

> On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said:
>
> > #  ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt
> > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt
> > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 /srv/gsfs0/projects/pipetest.tmp.txt
> >
> > We can "fix" the user case that exposed this by not using a temp file or
> > inserting a sleep, but I'd still like to know why GPFS is behaving this
> > way and make it stop.
>
> May be related to replication, or other behind-the-scenes behavior.
>
> Consider this example - GPFS 4.2.3.6, data and metadata replication both
> set to 2, 2 sites 95 cable miles apart, each with 3 Dell servers in a full
> Fibre Channel mesh to 3 Dell MD34something arrays.
>
> % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test
> 4096+0 records in
> 4096+0 records out
> 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s
> 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
>
> Notice that the first /bin/ls shouldn't be starting until after the dd has
> completed - at which point it's only allocated half the blocks needed to
> hold the 4M of data at one site.  5 seconds later, it's allocated the
> blocks at both sites and thus shows the full 8M needed for 2 copies.
>
> I've also seen (but haven't replicated it as I write this) a small file
> (4-8K or so) showing first one full-sized block, then a second full-sized
> block, and then dropping back to what's needed for 2 1/32nd fragments.
> That had me scratching my head.
>
> Having said that, that's all metadata fun and games, while your case
> appears to have some problems with data integrity (which is a whole lot
> scarier).  It would be *really* nice if we understood the problem here.
>
> The scariest part is:
>
> > The first grep | wc -l returns 1, because grep outputs "Binary file
> > /path/to/gpfs/mount/test matches"
>
> which seems to imply that we're failing on semantic consistency.
> Basically, your 'cat' command is completing and closing the file, but then
> a temporally later open of the same file is reading something other than
> only the just-written data.  My first guess is that it's a race condition
> similar to the following: the cat command causes a write on one NSD
> server, and the first grep results in a read from a *different* NSD
> server, returning the data that *used* to be in the block because the
> read actually happens before the first NSD server completes the write.
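>
> A quick sanity check on that theory (paths and PATTERN are placeholders,
> not your actual test case): run the failing sequence with and without a
> pause between the write and the read:
>
> % cat input.txt > /srv/gsfs0/projects/tmp.txt; grep -c PATTERN /srv/gsfs0/projects/tmp.txt
> % cat input.txt > /srv/gsfs0/projects/tmp.txt; sleep 5; grep -c PATTERN /srv/gsfs0/projects/tmp.txt
>
> If only the second form behaves, the read really is racing the
> still-completing write rather than grep doing anything odd.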
>
> It may be interesting to replace the greps with pairs of 'ls -ls' / dd
> commands to grab the raw data and its size, and check the following (a
> rough sketch follows the list):
>
> 1) Does the size (both blocks allocated and logical length) reported by ls
> match the amount of data actually read by the dd?
>
> 2) Is the file length as actually read equal to the written length, or does
> it overshoot and read all the way to the next block boundary?
>
> 3) If the length is correct, what's wrong with the data that's telling grep
> that it's a binary file?  (od -cx is your friend here.)
>
> 4) If it overshoots, is the remainder all-zeros (good) or does it return
> semi-random "what used to be there" data (bad, due to data exposure issues)?
>
> (It's certainly not the most perplexing data consistency issue I've hit in
> 4 decades - the winner *has* to be an intermittent data read corruption on
> a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at least one vendor of
> networking gear all chasing our tails for 18 months before we finally
> tracked it down. :)
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>

