[gpfsug-discuss] Odd behavior with cat followed by grep. (John Hanks)

Steve Xiao sxiao at us.ibm.com
Wed Feb 14 21:53:17 GMT 2018


This could be related to the following flash:

http://www-01.ibm.com/support/docview.wss?uid=ssg1S1012054

You should contact IBM service to obtain the fix for your release.

Steve Y. Xiao

gpfsug-discuss-bounces at spectrumscale.org wrote on 02/14/2018 02:18:02 PM:

> From: gpfsug-discuss-request at spectrumscale.org
> To: gpfsug-discuss at spectrumscale.org
> Date: 02/14/2018 02:18 PM
> Subject: gpfsug-discuss Digest, Vol 73, Issue 36
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> 
> Send gpfsug-discuss mailing list submissions to
>    gpfsug-discuss at spectrumscale.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>    https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> or, via email, send a message with subject or body 'help' to
>    gpfsug-discuss-request at spectrumscale.org
> 
> You can reach the person managing the list at
>    gpfsug-discuss-owner at spectrumscale.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of gpfsug-discuss digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Odd behavior with cat followed by grep. (John Hanks)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 14 Feb 2018 11:17:19 -0800
> From: John Hanks <griznog at gmail.com>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep.
> Message-ID:
>    <CAGrHuK7okijzHLysDNCxR8tn1fKeMDwW_iT5898tBvaCR_QJUg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Thanks Bryan, mystery solved :)
> 
> We also stumbled across these related items, in case anyone else wanders
> into this thread.
> 
> https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__bug-2Dgrep.gnu.narkive.com_Y8cfvWDt_bug-2D27666-2Dgrep-2Don-2Dgpfs-2Dfilesystem-2Dseek-2Dhole-2Dproblem&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=FgxYBxqHZ0bHdWirEs1U_B3oDpeHJe8iRd-
> TYrXh6FI&e=
> 
> https://www.ibm.com/developerworks/community/forums/html/topic?
> id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630
> 
> GPFS, the gift that keeps on giving ... me more things to do instead of
> doing the things I want to be doing.
> 
> Thanks all,
> 
> jbh
> 
> On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister 
<bbanister at jumptrading.com>
> wrote:
> 
> > Hi all,
> >
> >
> >
> > We found this a while back and IBM fixed it.  Here?s your answer:
> > http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385
> >
> >
> >
> > Cheers,
> >
> > -Bryan
> >
> >
> >
> > *From:* gpfsug-discuss-bounces at spectrumscale.org [
mailto:gpfsug-discuss-
> > bounces at spectrumscale.org] *On Behalf Of *John Hanks
> > *Sent:* Wednesday, February 14, 2018 12:31 PM
> > *To:* gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > *Subject:* Re: [gpfsug-discuss] Odd behavior with cat followed by 
grep.
> >
> >
> >
> > *Note: External Email*
> > ------------------------------
> >
> > Straces are interesting, but don't immediately open my eyes:
> >
> >
> >
> > strace of grep on NFS (works as expected)
> >
> >
> >
> > openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3
> >
> > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
> >
> > ioctl(3, TCGETS, 0x7ffe2c26b0b0)        = -1 ENOTTY (Inappropriate 
ioctl
> > for device)
> >
> > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
> >
> > lseek(3, 32768, SEEK_HOLE)              = 530721
> >
> > lseek(3, 32768, SEEK_SET)               = 32768
> >
> > fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0
> >
> > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) =
> > 0x7f3bf6c43000
> >
> > write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1
> >
> >
> >
> > strace on GPFS (thinks file is binary)
> >
> >
> >
> > openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3
> >
> > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
> >
> > ioctl(3, TCGETS, 0x7ffc9b52caa0)        = -1 ENOTTY (Inappropriate 
ioctl
> > for device)
> >
> > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
> >
> > lseek(3, 32768, SEEK_HOLE)              = 262144
> >
> > lseek(3, 32768, SEEK_SET)               = 32768
> >
> > fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0
> >
> > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) =
> > 0x7fd45ee88000
> >
> > close(3)                                = 0
> >
> > write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file
> > /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches
> >
> > ) = 72
> >
> >
> >
> > Do the lseek() results indicate that the grep on the GPFS mounted 
version
> > thinks the file is a sparse file? For comparison I strace'd md5sum in 
place
> > of the grep and it does not lseek() with SEEK_HOLE, it's access in 
both
> > cases look identical, like:
> >
> >
> >
> > open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3
> >
> > fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> >
> > fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
> >
> > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) =
> > 0x7fb7d2c2b000
> >
> > read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
> >
> > ...[reads clipped]...
> >
> > read(3, "", 24576)                      = 0
> >
> > lseek(3, 0, SEEK_CUR)                   = 530721
> >
> > close(3)                                = 0
> >
> >
> >
> >
> >
> > jbh
> >
> >
> >
> >
> >
> > On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister 
<aaron.s.knister at nasa.gov>
> > wrote:
> >
> > Just speculating here (also known as making things up) but I wonder if
> > grep is somehow using the file's size in its determination of binary
> > status. I also see mmap in the strace so maybe there's some issue with
> > mmap where some internal GPFS buffer is getting truncated
> > inappropriately but leaving a bunch of null values which gets returned
> > to grep.
> >
> > -Aaron
> >
> > On 2/14/18 10:21 AM, John Hanks wrote:
> > > Hi Valdis,
> > >
> > > I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't 
think
> > > this is a data integrity issue, thankfully:
> > >
> > > $ ./pipetestls.sh
> > > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16
> > > /srv/gsfs0/projects/pipetest.tmp.txt
> > > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16
> > /home/griznog/pipetest.tmp.txt
> > >
> > > $ ./pipetestmd5.sh
> > > 15cb81a85c9e450bdac8230309453a0a 
/srv/gsfs0/projects/pipetest.tmp.txt
> > > 15cb81a85c9e450bdac8230309453a0a  /home/griznog/pipetest.tmp.txt
> > >
> > > And replacing grep with 'file' even properly sees the files as 
ASCII:
> > > $ ./pipetestfile.sh
> > > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long 
lines
> > > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines
> > >
> > > I'll poke a little harder at grep next and see what the difference 
in
> > > strace of each reveals.
> > >
> > > Thanks,
> > >
> > > jbh
> > >
> > >
> > >
> > >
> > > On Wed, Feb 14, 2018 at 7:08 AM, <valdis.kletnieks at vt.edu
> >
> > > <mailto:valdis.kletnieks at vt.edu>> wrote:
> > >
> > >     On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said:
> > >
> > >     > #  ls -aln /srv/gsfs0/projects/pipetest.tmp.txt
> > $HOME/pipetest.tmp.txt
> > >     > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10
> > /home/griznog/pipetest.tmp.txt
> > >     > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10
> > >     > /srv/gsfs0/projects/pipetest.tmp.txt
> > >     >
> > >     > We can "fix" the user case that exposed this by not using a 
temp
> > file or
> > >     > inserting a sleep, but I'd still like to know why GPFS is 
behaving
> > this way
> > >     > and make it stop.
> > >
> > >     May be related to replication, or other behind-the-scenes 
behavior.
> > >
> > >     Consider this example - 4.2.3.6, data and metadata replication 
both
> > >     set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers 
with
> > >     a full
> > >     fiberchannel mesh to 3 Dell MD34something arrays.
> > >
> > >     % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls 
sync.test;
> > >     sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test
> > >     4096+0 records in
> > >     4096+0 records out
> > >     4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s
> > >     2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> > >     8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> > >     8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> > >
> > >     Notice that the first /bin/ls shouldn't be starting until after 
the
> > >     dd has
> > >     completed - at which point it's only allocated half the blocks
> > >     needed to hold
> > >     the 4M of data at one site.  5 seconds later, it's allocated the
> > >     blocks at both
> > >     sites and thus shows the full 8M needed for 2 copies.
> > >
> > >     I've also seen (but haven't replicated it as I write this) a 
small
> > >     file (4-8K
> > >     or so) showing first one full-sized block, then a second 
full-sized
> > >     block, and
> > >     then dropping back to what's needed for 2 1/32nd fragments. That
> > had me
> > >     scratching my head
> > >
> > >     Having said that, that's all metadata fun and games, while your 
case
> > >     appears to have some problems with data integrity (which is a 
whole
> > lot
> > >     scarier).  It would be *really* nice if we understood the 
problem
> > here.
> > >
> > >     The scariest part is:
> > >
> > >     > The first grep | wc -l returns 1, because grep outputs "Binary
> > file /path/to/
> > >     > gpfs/mount/test matches"
> > >
> > >     which seems to be implying that we're failing on semantic
> > consistency.
> > >     Basically, your 'cat' command is completing and closing the 
file,
> > >     but then a
> > >     temporally later open of the same find is reading something 
other
> > >     that only the
> > >     just-written data.  My first guess is that it's a race condition
> > >     similar to the
> > >     following: The cat command is causing a write on one NSD server, 
and
> > >     the first
> > >     grep results in a read from a *different* NSD server, returning 
the
> > >     data that
> > >     *used* to be in the block because the read actually happens 
before
> > >     the first
> > >     NSD server actually completes the write.
> > >
> > >     It may be interesting to replace the grep's with pairs of 'ls 
-ls /
> > >     dd' commands to grab the
> > >     raw data and its size, and check the following:
> > >
> > >     1) does the size (both blocks allocated and logical length) 
reported
> > by
> > >     ls match the amount of data actually read by the dd?
> > >
> > >     2) Is the file length as actually read equal to the written 
length,
> > >     or does it
> > >     overshoot and read all the way to the next block boundary?
> > >
> > >     3) If the length is correct, what's wrong with the data that's
> > >     telling grep that
> > >     it's a binary file?  ( od -cx is your friend here).
> > >
> > >     4) If it overshoots, is the remainder all-zeros (good) or does 
it
> > >     return semi-random
> > >     "what used to be there" data (bad, due to data exposure issues)?
> > >
> > >     (It's certainly not the most perplexing data consistency issue 
I've
> > >     hit in 4 decades - the
> > >     winner *has* to be a intermittent data read corruption on a GPFS 
3.5
> > >     cluster that
> > >     had us, IBM, SGI, DDN, and at least one vendor of networking 
gear
> > >     all chasing our
> > >     tails for 18 months before we finally tracked it down. :)
> > >
> > >     _______________________________________________
> > >     gpfsug-discuss mailing list
> >
> > >     gpfsug-discuss at spectrumscale.org <https://
> urldefense.proofpoint.com/v2/url?
> u=http-3A__spectrumscale.org&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=jUBFb8C9yai1TUTu1BVnNTNcOnJXGxupWiEKkEjT4pM&e=
> >
> > >     https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> > >     <https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > gpfsug-discuss mailing list
> > > gpfsug-discuss at spectrumscale.org
> > > https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> > >
> >
> > --
> > Aaron Knister
> > NASA Center for Climate Simulation (Code 606.2)
> > Goddard Space Flight Center
> > (301) 286-2776
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> >
> >
> >
> > ------------------------------
> >
> > Note: This email is for the confidential use of the named addressee(s)
> > only and may contain proprietary, confidential or privileged 
information.
> > If you are not the intended recipient, you are hereby notified that 
any
> > review, dissemination or copying of this email is strictly prohibited, 
and
> > to please notify the sender immediately and destroy this email and any
> > attachments. Email transmission cannot be guaranteed to be secure or
> > error-free. The Company, therefore, does not make any guarantees as to 
the
> > completeness or accuracy of this email or any attachments. This email 
is
> > for informational purposes only and does not constitute a 
recommendation,
> > offer, request or solicitation of any kind to buy, sell, subscribe, 
redeem
> > or perform any type of transaction of a financial product.
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> >
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20180214_d62fc203_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=nUcKIKr84CRhS0EbxV5vwjSlEr4p3Wf6Is3EDKvOjJg&e=
> >
> 
> ------------------------------
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> https://urldefense.proofpoint.com/v2/url?
> 
u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-
> 
siA1ZOg&r=ck4PYlaRFvCcNKlfHMPhoA&m=Jv87Pffe4kSlhiO2NmMbL4HQo_zJ-8s8CkIRy7p92r4&s=aVWMptxcCR3po3ijmRweTyjbs1Pp5D7WEiJTYvSYLUk&e=
> 
> 
> End of gpfsug-discuss Digest, Vol 73, Issue 36
> **********************************************
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20180214/ed757838/attachment-0001.htm>


More information about the gpfsug-discuss mailing list