[gpfsug-discuss] Associating I/O operations with files/processes

Sven Oehme oehmes at gmail.com
Tue May 30 14:55:30 BST 2017


Hi,

The very first thing to do would be to run mmfsadm dump iohist once instead of
mmdiag --iohist (we are actually adding this info to mmdiag --iohist in the
next release), to see if the thread type reveals something:

07:25:53.578522  W        data    1:20260249600   8192   35.930    488076
    181  C0A70D0A:59076980 cli  192.167.20.129 Prefetch
 WritebehindWorkerThread
07:25:53.632722  W        data    1:20260257792   8192   45.179    627136
    173  C0A70D0A:59076980 cli  192.167.20.129 Cleaner   CleanBufferThread
07:25:53.662067  W        data    2:20259815424   8192   45.612 992975086
     40  C0A70D0A:59076985 cli  192.167.20.130 Prefetch
 WritebehindWorkerThread
07:25:53.734274  W        data    1:19601858560      8    0.624     50237
      0  C0A70D0A:59076980 cli  192.167.20.129 MBHandler *DioHandlerThread*

If you see DioHandlerThread, most likely somebody changed an open flag to use
O_DIRECT. If you don't use that flag, even if the app does only 4k I/O (which
is inefficient), GPFS will detect this and do prefetch/writebehind in large
blocks. As soon as you add O_DIRECT we don't do this anymore, to honor the
hint, and then every single request gets handled one by one.
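
A quick way to double-check this from the application side (just a sketch;
strace and the placeholder PID are my assumptions, not GPFS tooling) is to
watch the open calls of a suspect process for the O_DIRECT flag:

# attach to a suspect process (12345 is a placeholder PID) and watch its
# open()/openat() calls; any hit containing O_DIRECT confirms direct I/O
strace -f -e trace=open,openat -p 12345 2>&1 | grep -F O_DIRECT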

After that, the next thing would be to run a very low-level trace with just
I/O info, like:

mmtracectl --start --trace=io --tracedev-write-mode=overwrite  -N .

This will start collection on the node where you execute the command; if you
want to run it against a different node, replace the dot at the end with the
hostname.
Wait a few seconds and run

mmtracectl --off

You will get a message that the trace is being formatted and, eventually, a
trace file.
Now grep for FIO and you get lines like this:


   7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize 64
1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 nSectors
32768 err 0
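
To pull those lines out of the formatted report (a sketch: I am assuming the
report lands under /tmp/mmfs, the default dump directory, so adjust the path
to wherever mmtracectl says it wrote the trcrpt file, and use zgrep if the
file is compressed):

# list all FIO records from the formatted trace report(s)
grep 'FIO:' /tmp/mmfs/trcrpt.*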

If you further reduce it to nSectors 8, you focus only on the 4k writes you
mentioned above.

The key item in the line above that you care about is the tag (1670183 in
this example): this is the inode number of your file.
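
If you want to see at a glance which inodes dominate the small I/O, something
along these lines works (a sketch; the trcrpt path is the same assumption as
above, and the awk simply prints the field after the literal word 'tag', so
verify it against one sample line first):

# count 8-sector (4k) FIO records per inode tag, busiest first
grep 'FIO:' /tmp/mmfs/trcrpt.* | grep ' nSectors 8 ' | \
  awk '{for (i = 1; i < NF; i++) if ($i == "tag") print $(i+1)}' | \
  sort | uniq -c | sort -rn | head
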
If you now do:

cd /usr/lpp/mmfs/samples/util ; make
then run it (replacing the -i value and the filesystem path, obviously):

[root@fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/

and you get a hit like this:

   1670183               0      /ibm/fs2-16m-09//shared/test-newbuf

You now know that the file being accessed in the I/O example above is
 /ibm/fs2-16m-09//shared/test-newbuf
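
If several inodes show up, a small loop (again just a sketch, reusing the
same assumed trcrpt path and the example filesystem from above) resolves them
all in one go:

# map every inode tag seen in the trace to a path in the filesystem
cd /usr/lpp/mmfs/samples/util
grep 'FIO:' /tmp/mmfs/trcrpt.* | \
  awk '{for (i = 1; i < NF; i++) if ($i == "tag") print $(i+1)}' | sort -u | \
  while read ino; do ./tsfindinode -i "$ino" /ibm/fs2-16m-09/; done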

hope that helps.

sven




On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) <
andreas.petzold at kit.edu> wrote:

>         Hi John,
>
> iotop wasn't helpful. It seems to be overwhelmed by what is going on on
> the machine.
>
>         Cheers,
>
>                 Andreas
>
> On 05/30/2017 02:28 PM, John Hearns wrote:
> > Andreas,
> > This is a stupid reply, but please bear with me.
> > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS
> filesystem) setup.
> > We also had a new application which did post-processing. One of the users
> reported that a post-processing job would take about 30 minutes.
> > However when two or more of the same application were running the job
> would take several hours.
> >
> > We finally found that this slowdown was due to the I/O size; the
> application was using the default size.
> > We only found this by stracing the application and spending hours
> staring at the trace...
> >
> > I am sure there are better tools for this, and I do hope you don’t have
> to strace every application.... really.
> > A good tool to get a general feel for I/O patterns is 'iotop'. It might
> help?
> >
> >
> >
> >
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:
> gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold
> (SCC)
> > Sent: Tuesday, May 30, 2017 2:17 PM
> > To: gpfsug-discuss at spectrumscale.org
> > Subject: [gpfsug-discuss] Associating I/O operations with files/processes
> >
> > Dear group,
> >
> > first a quick introduction: at KIT we are running a 20+PB storage system
> with several large (1-9PB) file systems. We have a 14 node NSD server
> cluster and 5 small (~10 nodes) protocol node clusters which each mount one
> of the file systems. The protocol nodes run server software (dCache,
> xrootd) specific to our users, which are primarily the LHC experiments at
> CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB,
> while the protocol nodes communicate with their clients via Ethernet.
> >
> > Now let me describe the problem we are facing. For the last few days, one
> of the protocol nodes has shown a very strange and as yet unexplained I/O
> behaviour. Before, we usually saw reads like this (iohist example from a
> well-behaved node):
> >
> > 14:03:37.637526  R        data   32:138835918848  8192   46.626  cli
> 0A417D79:58E3B179    172.18.224.19
> > 14:03:37.660177  R        data   18:12590325760   8192   25.498  cli
> 0A4179AD:58E3AE66    172.18.224.14
> > 14:03:37.640660  R        data   15:106365067264  8192   45.682  cli
> 0A4179AD:58E3ADD7    172.18.224.14
> > 14:03:37.657006  R        data   35:130482421760  8192   30.872  cli
> 0A417DAD:58E3B266    172.18.224.21
> > 14:03:37.643908  R        data   33:107847139328  8192   45.571  cli
> 0A417DAD:58E3B206    172.18.224.21
> >
> > For the last few days we have been seeing this on the problematic node:
> >
> > 14:06:27.253537  R        data   46:126258287872     8   15.474  cli
> 0A4179AB:58E3AE54    172.18.224.13
> > 14:06:27.268626  R        data   40:137280768624     8    0.395  cli
> 0A4179AD:58E3ADE3    172.18.224.14
> > 14:06:27.269056  R        data   46:56452781528      8    0.427  cli
> 0A4179AB:58E3AE54    172.18.224.13
> > 14:06:27.269417  R        data   47:97273159640      8    0.293  cli
> 0A4179AD:58E3AE5A    172.18.224.14
> > 14:06:27.269293  R        data   49:59102786168      8    0.425  cli
> 0A4179AD:58E3AE72    172.18.224.14
> > 14:06:27.269531  R        data   46:142387326944     8    0.340  cli
> 0A4179AB:58E3AE54    172.18.224.13
> > 14:06:27.269377  R        data   28:102988517096     8    0.554  cli
> 0A417879:58E3AD08    172.18.224.10
> >
> > The number of read ops has gone up by O(1000), which is what one would
> expect when going from 8192-sector reads to 8-sector reads.
> >
> > We have already excluded problems with the node itself, so we are focusing
> on the applications running on the node. What we'd like to do is to associate
> the I/O requests either with files or with specific processes running on the
> machine, in order to be able to blame the correct application. Can somebody
> tell us if this is possible and, if not, whether there are other ways to
> understand which application is causing this?
> >
> > Thanks,
> >
> > Andreas
> >
> > --
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

