<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    A little clarification:<br>

    Our ls is plain ls; there is no alias.<br>

    <br>

    Consider that all those things are already set up properly, as EBI
    has run high-performance computing farms for many years, so those
    things were fixed a long time ago. We have very little experience
    with GPFS, but good knowledge of LSF farms, and we own multiple NFS
    storage systems (several petabytes in size). About NIS: all clients
    run NSCD, which caches all information to avoid that type of
    slowness. In fact, when ls is slow, ls -n is slow as well.<br>

    <br>

    Besides that, a "cd" also sometimes hangs, so it has nothing to do
    with getting attributes.<br>

    <br>

    Just to clarify a bit more: GSS usually seems to work fine now. We
    have users who run jobs on the farms that push 180Gb/s of reads
    (reading and writing files of 100GB in size). GPFS works very well
    there, where other systems had performance problems accessing
    portions of data in such huge files.<br>

    <br>

    Sadly, on the other hand, other users run jobs that do a huge amount
    of metadata operations, like tons of ls in directories with many
    files, or creating a silly number of temporary files just to
    synchronize the jobs between the farm nodes, or just to store
    temporary data for a few milliseconds and then immediately delete
    those temporary files. Imagine constantly creating thousands of
    files just to write a few bytes and then deleting them after a few
    milliseconds...<br>

    <br>

    When those things happen we see 10-15Gb/sec throughput and low CPU
    usage on the servers (80% idle), but any cd, ls or whatever
    takes a few seconds. So my question is whether the bottleneck could
    be the spindles, or whether the clients could be tuned a bit more.<br>

    <br>

    I read your PDF and all the parameters already seem well
    configured except "maxFilesToCache", but I'm not sure how we
    should configure a few of those parameters on the clients. As an
    example, I cannot imagine a client that requires a 38g pagepool.<br>
    <br>
    So what's the correct <b>pagepool</b> on a client? What about these
    others?<br>

    <br>

    <b>maxFilesToCache</b><br>
    <b>maxBufferDescs</b><br>
    <b>worker1Threads</b><br>
    <b>worker3Threads</b><br>

    <br>

    Right now all the clients have a 1 GB pagepool. In theory, we can
    afford to use more (I think we can easily go up to 8GB) as they
    have plenty of available memory. If this could help, we can do that,
    but do the clients really need more than 1G? They are just
    clients after all, so the memory should in theory be used for jobs,
    not just for "caching".<br>

    <br>

    Last question, about "maxFilesToCache": you say that it must be
    large on small clusters but small on large clusters. How would you
    classify 6 servers and almost 700 clients?<br>

    <br>

    On clients we have:<br>

       maxFilesToCache 4000<br>

    <br>

    On servers we have:<br>

      maxFilesToCache 12288<br>

    <br>

    <br>

    <br>

    Regards,<br>

    Salvatore<br>

    <div class="moz-cite-prefix">On 05/09/14 01:48, Sven Oehme wrote:<br>

    </div>

    <blockquote

cite="mid:OF9ACC3F87.D8E14E88-ON88257D4A.0002DEA2-88257D4A.000474AF@us.ibm.com"

      type="cite"><font face="sans-serif" size="2">------------------------------------------<br>

        Sven Oehme <br>

        Scalable Storage Research <br>

        email: <a class="moz-txt-link-abbreviated" href="mailto:oehmes@us.ibm.com">oehmes@us.ibm.com</a> <br>

        Phone: +1 (408) 824-8904 <br>

        IBM Almaden Research Lab <br>

        ------------------------------------------</font>

      <br>

      <br>

      <tt><font size="2"><a class="moz-txt-link-abbreviated" href="mailto:gpfsug-discuss-bounces@gpfsug.org">gpfsug-discuss-bounces@gpfsug.org</a> wrote on

          09/04/2014

          03:43:36 AM:<br>

          <br>

          > From: Salvatore Di Nardo <a class="moz-txt-link-rfc2396E" href="mailto:sdinardo@ebi.ac.uk"><sdinardo@ebi.ac.uk></a></font></tt>

      <br>

      <tt><font size="2">> To: gpfsug main discussion list

          <a class="moz-txt-link-rfc2396E" href="mailto:gpfsug-discuss@gpfsug.org"><gpfsug-discuss@gpfsug.org></a></font></tt>

      <br>

      <tt><font size="2">> Date: 09/04/2014 03:44 AM</font></tt>

      <br>

      <tt><font size="2">> Subject: Re: [gpfsug-discuss] gpfs

          performance

          monitoring</font></tt>

      <br>

      <tt><font size="2">> Sent by: <a class="moz-txt-link-abbreviated" href="mailto:gpfsug-discuss-bounces@gpfsug.org">gpfsug-discuss-bounces@gpfsug.org</a></font></tt>

      <br>

      <tt><font size="2">> <br>

          > On 04/09/14 01:50, Sven Oehme wrote:</font></tt>

      <br>

      <tt><font size="2">> > Hello everybody,<br>

          > <br>

          > Hi <br>

          > <br>

          > > here i come again, this time to ask for some hints
          about how to <br>
          > > monitor GPFS.<br>
          > > <br>
          > > I know about mmpmon, but the issue with its
          "fs_io_s" and "io_s" is <br>
          > > that they return numbers based only on the requests
          done on the <br>
          > > current host, so i have to run them on all the
          clients (over 600 <br>
          > > nodes), so it's quite impractical. Instead i would
          like to know from <br>
          > > the servers what's going on, and i came across the
          vio_s statistics, <br>
          > > which are less documented, and i don't know exactly what
          they mean. <br>
          > > There is also this script,
          "/usr/lpp/mmfs/samples/vdisk/viostat", that<br>
          > > runs vio_s.<br>

          > > <br>

          > > My problems with the output of this command: <br>

          > >  echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon -r 1<br>

          > > <br>

          > > mmpmon> mmpmon node 10.7.28.2 name gss01a vio_s OK VIOPS per second<br>
          > > timestamp: 1409763206/477366<br>
          > > recovery group: *<br>
          > > declustered array: *<br>
          > > vdisk: *<br>
          > > client reads: 2584229<br>
          > > client short writes: 55299693<br>
          > > client medium writes: 190071<br>
          > > client promoted full track writes: 465145<br>
          > > client full track writes: 9249<br>
          > > flushed update writes: 4187708<br>
          > > flushed promoted full track writes: 123<br>
          > > migrate operations: 114<br>
          > > scrub operations: 450590<br>
          > > log writes: 28509602 <br>

          > > <br>

          > > it says "VIOPS per second", but they seem to me to be just
          counters, as <br>
          > > every time i re-run the command, the numbers
          increase by a bit. <br>
          > > Can anyone confirm whether those numbers are counters or
          OPS/sec?<br>
          > <br>
          > the numbers are cumulative, so every time you run the command
          they just <br>
          > show the value since start (or last reset) time. </font></tt>

      <br>
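Since the vio_s values are cumulative, a per-second figure has to be derived from two samples taken some time apart. The following is only a minimal sketch of that arithmetic: the function name and the second sample are made up for illustration, and only the first sample's values come from the output quoted above.<br>

```python
def counter_rates(prev, curr, interval_s):
    """Convert two cumulative counter samples into per-second rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for name, value in curr.items():
        delta = value - prev.get(name, value)
        # A negative delta means the counters were reset between samples.
        rates[name] = max(delta, 0) / interval_s
    return rates


# First sample: "client reads" and "log writes" as quoted above.
# Second sample: invented values, assumed to be taken 60 seconds later.
first = {"client reads": 2584229, "log writes": 28509602}
second = {"client reads": 2584829, "log writes": 28512602}
rates = counter_rates(first, second, interval_s=60)
print(rates)
```

A monitoring tool would do the same subtraction between consecutive polls; treating a negative difference as zero is just one possible way to handle a counter reset.<br>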

      <tt><font size="2">> OK, you confirmed my thoughts, thanks.<br>

        </font></tt>

      <br>

      <tt><font size="2">> <br>

          > > <br>

          > > Looking closer, i don't understand what most of
          those values <br>
          > > mean. For example, what exactly are "flushed promoted
          full track writes"? <br>
          > > I tried to find documentation about this output,
          but could not <br>
          > > find any. Can anyone point me to a link where the output of
          vio_s is explained?<br>
          > > <br>
          > > Another thing i don't understand about those numbers
          is whether they are <br>
          > > just operations, or the number of blocks that were
          read/written/etc. <br>

          > <br>

          > it's just operations, and if i explained what the
          numbers mean i <br>
          > might confuse you even more, because this is not what you
          are really <br>
          > looking for. <br>
          > what you are looking for is what the client io's look
          like on the <br>
          > Server side, while the VIO layer is the Server side to
          the disks, so<br>
          > one level lower than what you are looking for, from what i
          could read<br>
          > out of the description above. </font></tt>

      <br>

      <tt><font size="2">> No.. what I'm looking at is exactly how busy the
          disks are keeping up with the<br>
          > requests. Obviously i'm not looking only at that, but I feel
          the need <br>
          > to monitor those things too. I'll explain why. <br>
          > <br>
          > It happens when our storage is quite busy (180Gb/s of
          read/write) <br>
          > that the FS starts to be slow on normal cd or ls requests.
          This might <br>
          > be normal, but in those situations i want to know where
          the <br>
          > bottleneck is. Is it the server CPU? Memory? Network?
          Spindles? Knowing<br>
          > where the bottleneck is might help me understand whether we
          can tweak <br>
          > the system a bit more.<br>

        </font></tt>

      <br>

      <tt><font size="2">if cd or ls is very slow in GPFS, in the
          majority
          of cases it has nothing to do with NSD Server bottlenecks,
          or only indirectly.
        </font></tt>

      <br>

      <tt><font size="2">the main reason ls is slow in the field is
          that you have
          some very powerful nodes that all do buffered writes into the
          same directory,
          into 1 or multiple files, while you do the ls on a different
          node. what
          happens now is that the ls you ran is most likely an alias
          for ls -l
          or something even more complex with color display, etc, but
          the point is
          it most likely returns the file size. GPFS doesn't lie about
          the filesize,
          we only return accurate stat information, and while this is
          arguable, it's
          a fact today. </font></tt>

      <br>

      <tt><font size="2">so what happens is that the stat on each file
          triggers
          a token revoke on the node that is currently writing to the
          file you
          do the stat on. let's say it has 1 GB of dirty data in its
          memory for this file
          (as it writes data buffered); this 1 GB of data now gets
          written to the
          NSD server, the client updates the inode info and returns the
          correct size.
        </font></tt>

      <br>

      <tt><font size="2">let's say you have a very fast network and you
          have a
          fast storage device like GSS (which i see you have); it will be
          able to
          do this in a few 100 ms, but the problem is that this now
          happens serialized
          for every single file in this directory that people write
          into, as
          for each one we need to get the exact stat info to satisfy
          your ls -l request.
        </font></tt>

      <br>

      <tt><font size="2">this is what takes so long, not the fact that
          the
          storage device might be slow or that too much metadata
          activity is going on.
          this is token traffic, which means network traffic, and it is
          obviously latency dependent.
        </font></tt>

      <br>
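To put a rough number on the serialized revokes described above, here is a back-of-the-envelope sketch. The function name, the file count and the per-revoke time are all assumptions picked for illustration ("a few 100 ms" per file is the only figure taken from the text), not measurements.<br>

```python
def estimated_ls_seconds(files_being_written, revoke_ms):
    """Revokes are handled one file at a time, so the delays add up."""
    return files_being_written * revoke_ms / 1000.0


# Hypothetical figures: 50 files with dirty data, 200 ms per token revoke.
print(estimated_ls_seconds(50, 200))
```

The point is only that the cost grows linearly with the number of files being actively written, which would be consistent with an ls that takes several seconds while the servers look mostly idle.<br>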

      <br>

      <tt><font size="2">the best way to see this is to look at waiters

          on

          the client where you run the ls and see what they are waiting

          for.</font></tt>

      <br>

      <br>

      <tt><font size="2">there are various ways to tune this to get
          better
          'felt' ls responses, but it's not going away completely </font></tt>
      <br>
      <tt><font size="2">if all you are trying to find out with ls is
          whether there is a file in
          the directory, run unalias ls and check whether ls runs
          fast after that, as it
          shouldn't do the -l under the covers anymore. </font></tt>

      <br>

      <br>

      <tt><font size="2">> <br>

          > If it's the CPU on the servers then there is not much to do
          besides <br>
          > replacing them or adding more servers. If it's not the CPU, maybe
          more memory <br>
          > would help? Maybe it's just the network that has filled up, so
          i can add <br>
          > more links. <br>
          > <br>
          > Or if we have reached the point where the bottleneck is the
          spindles, <br>
          > then there is not much point in looking somewhere else; we
          have just reached <br>
          > the hardware limit.<br>
          > <br>
          > Sometimes it also happens that there is very low IO
          (10Gb/s), <br>
          > almost no cpu usage on the servers, but huge slowness (ls
          can take 10<br>
          > seconds). Why does that happen? There are not many data ops,
          but we <br>
          > think there is a huge amount of metadata ops. So what i
          want to <br>
          > know is whether the metadata vdisks are busy or not. If this
          is our <br>
          > problem, could some SSD disks dedicated to metadata help?
          <br>

        </font></tt>

      <br>

      <tt><font size="2">whether ssd's would help or not is hard
          to
          say without knowing the root cause, and as i tried to explain
          above the most
          likely cause is token revoke, not disk i/o. obviously the
          busier your
          disks are, the longer the token revoke will take. </font></tt>

      <br>

      <br>

      <tt><font size="2">> <br>

          > <br>

          > In particular i'm a bit puzzled by the design of our
          GSS storage.<br>
          > Each recovery group has 3 declustered arrays, and each
          declustered<br>
          > array has 1 data and 1 metadata vdisk, but in the end
          both metadata <br>
          > and data vdisks use the same spindles. The problem is that
          I <br>
          > don't understand whether we have a metadata bottleneck there.
          Maybe some <br>
          > SSD disks in a dedicated declustered array would perform
          much <br>
          > better, but this is just theory. I really would like to
          be able to <br>
          > monitor IO activity on the metadata vdisks.<br>

        </font></tt>

      <br>

      <tt><font size="2">the short answer is we WANT the metadata disks
          to
          be with the data disks on the same spindles. compared to other
          storage
          systems, GSS is capable of handling different raid codes for
          different virtual
          disks on the same physical disks. this way we create raid1'ish
          'LUNS' for
          metadata and raid6'ish 'LUNS' for data, so the small i/o penalty
          for
          metadata is very small compared to a read/modify/write on the
          data
          disks.</font></tt>

      <br>

      <br>

      <tt><font size="2">> <br>

          > <br>

        </font></tt>

      <br>

      <tt><font size="2">> <br>

          > <br>

          > so the Layer you care about is the NSD Server layer,

          which sits on

          <br>

          > top of the VIO layer (which is essentially the SW RAID

          Layer in GNR)

          <br>

          > <br>

          > > I'm asking because if they are just ops, i
          don't know how much <br>
          > > use they could be. For example, one write
          operation could mean <br>
          > > writing 1 block or writing a file of 100GB. If those are
          operations, <br>
          > > is there a way to get the output in bytes or
          blocks? <br>
          > <br>
          > there are multiple ways to get info on the NSD layer;
          one would be <br>
          > to use the dstat plugin (see /usr/lpp/mmfs/sample/util),
          but those are <br>
          > counts again. </font></tt>

      <br>

      <tt><font size="2">> <br>

          > Counters are not a problem. I can collect them and create
          some <br>
          > graphs in a monitoring tool. I will check that.<br>

        </font></tt>

      <br>

      <tt><font size="2">if you upgrade your system (or have it
          upgraded) to GSS 2.0 you
          get
          graphical monitoring as part of it. if you want i can send
          you a direct
          email outside the group with additional information on that.
        </font></tt>

      <br>

      <br>

      <tt><font size="2">> <br>

          > the alternative option is to use mmdiag --iohist. this

          shows you a

          <br>

          > history of the last X numbers of io operations on either

          the client

          <br>

          > or the server side like on a client : <br>

          > <br>

          > # mmdiag --iohist <br>

          > <br>

          > === mmdiag: iohist === <br>

          > <br>

          > I/O history: <br>

          > <br>

          > I/O start time &nbsp;RW &nbsp;Buf type &nbsp;disk:sectorNum &nbsp;nSec &nbsp;time ms &nbsp;qTime ms &nbsp;RpcTimes ms &nbsp;Type &nbsp;Device/NSD ID &nbsp;NSD server <br>
          > --------------- -- ----------- ----------------- ----- ------- -------- ----------------- ---- ------------------ --------------- <br>
          > 14:25:22.169617 R LLIndBlock 1:1075622848 64 13.073 0.000 12.959 0.063 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:22.182723 R inode 1:1071252480 8 6.970 0.000 6.908 0.038 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:53.659918 R LLIndBlock 1:1081202176 64 8.309 0.000 8.210 0.046 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:53.668262 R inode 2:1081373696 8 14.117 0.000 14.032 0.058 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:53.682750 R LLIndBlock 1:1065508736 64 9.254 0.000 9.180 0.038 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:53.692019 R inode 2:1064356608 8 14.899 0.000 14.847 0.029 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:53.707100 R inode 2:1077830152 8 16.499 0.000 16.449 0.025 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:53.723788 R LLIndBlock 1:1081202432 64 4.280 0.000 4.203 0.040 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:53.728082 R inode 2:1081918976 8 7.760 0.000 7.710 0.027 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:57.877416 R metadata 2:678978560 16 13.343 0.000 13.254 0.053 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:57.891048 R LLIndBlock 1:1065508608 64 15.491 0.000 15.401 0.058 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:57.906556 R inode 2:1083476520 8 11.723 0.000 11.676 0.029 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:57.918516 R LLIndBlock 1:1075622720 64 8.062 0.000 8.001 0.032 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:57.926592 R inode 1:1076503480 8 8.087 0.000 8.043 0.026 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:57.934856 R LLIndBlock 1:1071088512 64 6.572 0.000 6.510 0.033 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:57.941441 R inode 2:1069885984 8 11.686 0.000 11.641 0.024 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:57.953294 R inode 2:1083476936 8 8.951 0.000 8.912 0.021 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:57.965475 R inode 1:1076503504 8 0.477 0.000 0.053 0.000 cli C0A70401:53BEEA7F 192.167.4.1 <br>
          > 14:25:57.965755 R inode 2:1083476488 8 0.410 0.000 0.061 0.321 cli C0A70402:53BEEA5E 192.167.4.2 <br>
          > 14:25:57.965787 R inode 2:1083476512 8 0.439 0.000 0.053 0.342 cli C0A70402:53BEEA5E 192.167.4.2 <br>

          > <br>

          > you basically see whether it's an inode or a data block, what size
          it has (in<br>
          > sectors), which nsd server you sent this request to,
          etc. <br>
          > <br>
          > on the Server side you see the type, which physical disk
          it goes to,<br>
          > and also what size of disk i/o it causes, like: <br>

          > <br>

          > 14:26:50.129995 R inode 12:3211886376 64 14.261 0.000 0.000 0.000 pd sdis <br>
          > 14:26:50.137102 R inode 19:3003969520 64 9.004 0.000 0.000 0.000 pd sdad <br>
          > 14:26:50.136116 R inode 55:3591710992 64 11.057 0.000 0.000 0.000 pd sdoh <br>
          > 14:26:50.141510 R inode 21:3066810504 64 5.909 0.000 0.000 0.000 pd sdaf <br>
          > 14:26:50.130529 R inode 89:2962370072 64 17.437 0.000 0.000 0.000 pd sddi <br>
          > 14:26:50.131063 R inode 78:1889457000 64 17.062 0.000 0.000 0.000 pd sdsj <br>
          > 14:26:50.143403 R inode 36:3323035688 64 4.807 0.000 0.000 0.000 pd sdmw <br>
          > 14:26:50.131044 R inode 37:2513579736 128 17.181 0.000 0.000 0.000 pd sddv <br>
          > 14:26:50.138181 R inode 72:3868810400 64 10.951 0.000 0.000 0.000 pd sdbz <br>
          > 14:26:50.138188 R inode 131:2443484784 128 11.792 0.000 0.000 0.000 pd sdug <br>
          > 14:26:50.138003 R inode 102:3696843872 64 11.994 0.000 0.000 0.000 pd sdgp <br>
          > 14:26:50.137099 R inode 145:3370922504 64 13.225 0.000 0.000 0.000 pd sdmi <br>
          > 14:26:50.141576 R inode 62:2668579904 64 9.313 0.000 0.000 0.000 pd sdou <br>
          > 14:26:50.134689 R inode 159:2786164648 64 16.577 0.000 0.000 0.000 pd sdpq <br>
          > 14:26:50.145034 R inode 34:2097217320 64 7.409 0.000 0.000 0.000 pd sdmt <br>
          > 14:26:50.138140 R inode 139:2831038792 64 14.898 0.000 0.000 0.000 pd sdlw <br>
          > 14:26:50.130954 R inode 164:282120312 64 22.274 0.000 0.000 0.000 pd sdzd <br>
          > 14:26:50.137038 R inode 41:3421909608 64 16.314 0.000 0.000 0.000 pd sdef <br>
          > 14:26:50.137606 R inode 104:1870962416 64 16.644 0.000 0.000 0.000 pd sdgx <br>
          > 14:26:50.141306 R inode 65:2276184264 64 16.593 0.000 0.000 0.000 pd sdrk <br>

          > <br>

        </font></tt>

      <br>

      <tt><font size="2">> <br>

          > mmdiag --iohist is another thing i looked at, but i
          could not <br>
          > find a good explanation for all the "buf type" values (third
          column):<br>

        </font></tt>

      <br>

      <tt><font size="2">> allocSeg<br>

          > data<br>

          > iallocSeg<br>

          > indBlock<br>

          > inode<br>

          > LLIndBlock<br>

          > logData<br>

          > logDesc<br>

          > logWrap<br>

          > metadata<br>

          > vdiskAULog<br>

          > vdiskBuf<br>

          > vdiskFWLog<br>

          > vdiskMDLog<br>

          > vdiskMeta<br>

          > vdiskRGDesc</font></tt>

      <br>

      <tt><font size="2">> If i want to monitor metadata operations,
          what should i look at? just <br>

        </font></tt>

      <br>

      <tt><font size="2">inode = inodes, *alloc* = file or data
          allocation
          blocks, *ind* = indirect blocks (for very large files), and
          metadata.
          everything else is data or internal i/o's </font></tt>

      <br>
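The rule of thumb above can be turned into a small filter over the "buf type" column of mmdiag --iohist. This is only a sketch under that rule: the function name is made up, and types such as vdiskMeta are deliberately left out of the metadata set because the reply counts everything else as data or internal i/o.<br>

```python
def is_metadata_buf_type(buf_type):
    """Apply the rule of thumb from the reply above to one buf type name."""
    name = buf_type.lower()
    return name in ("inode", "metadata") or "alloc" in name or "ind" in name


# Buf type names as listed in the question above.
buf_types = ["allocSeg", "data", "iallocSeg", "indBlock", "inode",
             "LLIndBlock", "logData", "logDesc", "logWrap", "metadata",
             "vdiskAULog", "vdiskBuf", "vdiskFWLog", "vdiskMDLog",
             "vdiskMeta", "vdiskRGDesc"]
metadata_types = [t for t in buf_types if is_metadata_buf_type(t)]
print(metadata_types)
```

Counting the iohist rows whose buf type passes this filter, and averaging their "time ms" values, would give a rough view of metadata i/o as seen from that node.<br>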

      <tt><font size="2"> </font></tt>

      <br>

      <tt><font size="2">> the metadata flag or also inode? this
          command also takes long to <br>
          > run. especially if i run it a second time, it hangs for a
          long time before <br>
          > it runs again, so i'm not sure that running it every 30 secs
          or every minute <br>
          > is viable, but i will look into that too. Is there any
          <br>
          > documentation that clearly describes the whole output?
          what i found <br>
          > is quite generic and doesn't go into details...<br>

        </font></tt>

      <br>

      <tt><font size="2">the reason it takes so long is because it

          collects

          10's of thousands of i/os in a table and to not slow down the

          system when

          we dump the data we copy it to a separate buffer so we don't

          need locks

          :-) </font></tt>

      <br>

      <tt><font size="2">you can adjust the number of entries you want

          to collect

          by adjusting the ioHistorySize config parameter</font></tt>

      <br>

      <br>

      <br>

      <tt><font size="2">> > <br>

          > > Last but not least.. and this is what i really would

          like to

          <br>

          > > accomplish, i would to be able to monitor the

          latency of metadata

          <br>

          > operations. <br>

          > <br>

          > you can't do this on the server side as you don't know

          how much time<br>

          > you spend on the client , network or anything between the

          app and

          <br>

          > the physical disk, so you can only reliably look at this

          from the

          <br>

          > client, the iohist output only shows you the Server disk

          i/o <br>

          > processing time, but that can be a fraction of the

          overall time (in

          <br>

          > other cases this obviously can also be the dominant part

          depending

          <br>

          > on your workload). <br>

          > <br>

          > the easiest way on the client is to run <br>

          > <br>

          > mmfsadm vfsstats enable <br>

          > from now on vfs stats are collected until you restart

          GPFS. <br>

          > <br>

          > then run : <br>

          > <br>

          > vfs statistics currently enabled <br>

          > started at: Fri Aug 29 13:15:05.380 2014 <br>

          >   duration: 448446.970 sec <br>

          > <br>

          > name &nbsp;calls &nbsp;time per call &nbsp;total time <br>
          > -------------------- -------- -------------- -------------- <br>
          > statfs 9 0.000002 0.000021 <br>
          > startIO 246191176 0.005853 1441049.976740 <br>

          > <br>

          > to dump what ever you collected so far on this node. <br>

        </font></tt>

      <br>
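As a sanity check on how to read the vfs statistics above, "time per call" should simply be "total time" divided by "calls". The sketch below checks this on the startIO row from the sample output; the parsing function is hypothetical and assumes whitespace-separated columns with a single-word operation name.<br>

```python
def parse_vfsstats_line(line):
    """Split one vfs statistics row into (name, calls, per_call, total)."""
    name, calls, per_call, total = line.split()
    return name, int(calls), float(per_call), float(total)


# The startIO row from the sample output above.
sample = "startIO 246191176 0.005853 1441049.976740"
name, calls, per_call, total = parse_vfsstats_line(sample)
# "time per call" should match total time divided by the number of calls.
average = total / calls
print(name, round(average, 6))
```

Collected per client, the per-call time for each operation is the latency figure as the application sees it, which is why the reply suggests measuring on the client side.<br>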

      <tt><font size="2">> <br>

          > We already do that, but as I said, I want to check
          specifically how the <br>
          > gss servers are keeping up with the requests, to identify or
          exclude server <br>
          > side bottlenecks.<br>
          > <br>
          > <br>
          > Thanks for your help, you definitely gave me a few things
          to look at.<br>

          > <br>

          > Salvatore<br>

          > _______________________________________________<br>

          > gpfsug-discuss mailing list<br>

          > gpfsug-discuss at gpfsug.org<br>

          > </font></tt><a moz-do-not-send="true"

        href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss"><tt><font

            size="2">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</font></tt></a><tt><font

          size="2"><br>

        </font></tt>

      <br>


    </blockquote>

    <br>

  </body>

</html>