<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Sorry, I forgot to introduce myself.<br>

      <br>

      My name is Txema Heredia. I am a systems administrator at the
      Evolutionary Biology Institute in Barcelona, Spain. We are a
      public research institution focused on biological and genetics
      research. We have a small cluster (300 cores) and we use GPFS to
      feed it with a 150TB filesystem that is currently being expanded
      by ~450TB. <br>

      We have been working with GPFS for less than a year. Our initial
      installation was done by on-site IBM technicians, but we are
      upgrading the system on our own, and now I am beginning to
      understand the guts of GPFS and all its nuances.<br>

      <br>

      I look forward to learning a lot from this mailing list.<br>

      <br>

      Cheers,<br>

      <br>

      Txema<br>

      <br>

      On 14/06/13 19:57, Txema Heredia Genestar wrote:<br>

    </div>

    <blockquote cite="mid:51BB5970.6010303@gmail.com" type="cite">


      Hi all,<br>

      <br>

      We are building a new GPFS cluster and I have a few questions
      about NSD threads that I hope you can help with.<br>

      <br>

      Our old cluster is composed of 2 building blocks. Each BB
      consists of 2 servers (12-core with 48GB RAM) connected by SAS to
      a dual-controller DS3512 disk cabinet with 36x 7.2krpm 3TB SATA
      disks. Each controller has 2GB of cache. <br>
      Our "big" filesystem (130 TB) is made up of 6 NSDs, each one an
      8+1 RAID5 LUN from one of the cabinets, with data and metadata
      mixed. We have 6 LUNs but only 4 controllers and 4 NSD servers,
      so some servers serve 2 "disks" and others just 1.<br>

      <br>

      As for GPFS, we are using the default GPFS 3.4 thread parameters:<br>

      nsdMaxWorkerThreads = 64<br>

      nsdMinWorkerThreads = 16<br>

      nsdThreadsPerDisk = 3<br>

      #NSD per server = 1 or 2<br>

      <br>

      In this IBM presentation (<a moz-do-not-send="true"

        class="moz-txt-link-freetext"

href="http://www-05.ibm.com/de/events/gpfs-workshop/pdf/pr-11-GPFS_R35_nsdMultipleQ_and_other_enhancmentsv4-OW.pdf">http://www-05.ibm.com/de/events/gpfs-workshop/pdf/pr-11-GPFS_R35_nsdMultipleQ_and_other_enhancmentsv4-OW.pdf</a>

      slide 4), they show that the number of concurrently active NSD
      threads is given by:<br>

      <br>


      MAX (  MIN (  nsdThreadsPerDisk * #NSDperServer ,

      nsdMaxWorkerThreads  ), nsdMinWorkerThreads  )<br>

      <br>

      In our case, we have only 6 NSDs, and each server is responsible
      for at most 2 of them. That gives MAX ( MIN ( 3 * 2 , 64 ) , 16 )
      = MAX ( 6 , 16 ) = 16 threads per server, so we end up with
      between 8 and 16 threads per disk (16 threads shared by 1 or 2
      disks), when we should have just 3.<br>
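      To double-check the arithmetic, here is a minimal Python sketch of
      the slide-4 formula. The function and constant names are my own
      (not GPFS identifiers), and the per-disk figure assumes a server's
      threads are shared evenly among its NSDs, which the live dump
      does not guarantee:<br>

```python
def active_nsd_threads(threads_per_disk: int, nsds_per_server: int,
                       max_workers: int, min_workers: int) -> int:
    """MAX( MIN( nsdThreadsPerDisk * #NSDperServer, nsdMaxWorkerThreads ),
    nsdMinWorkerThreads ), as given on slide 4 of the presentation."""
    return max(min(threads_per_disk * nsds_per_server, max_workers), min_workers)


# GPFS 3.4 defaults quoted above.
THREADS_PER_DISK = 3
MAX_WORKERS = 64
MIN_WORKERS = 16

for nsds in (1, 2):
    threads = active_nsd_threads(THREADS_PER_DISK, nsds, MAX_WORKERS, MIN_WORKERS)
    # Assumption: threads are spread evenly over the server's NSDs.
    per_disk = threads / nsds
    print(f"{nsds} NSD(s): {threads} threads -> {per_disk:g} per disk")
```

      With the 3.4 defaults, nsdMinWorkerThreads dominates: both cases
      give 16 threads, i.e. 16 or 8 threads per disk.<br>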

      <br>

      This is a snapshot taken just now on one of our servers with
      "mmfsadm dump nsd":<br>

      <pre>
Worker threads: running 16, started 16, desired 16, active 16, highest 16
Requests: pending 333, highest pending 615, total processed 839099802

Buffer use: current 16777216, highest 16777216
Server state: suspendCount 0, killRequested 0, activeLocalIO 0
  reOpenRequested 0, reOpenInProgress 0, nsdJoinComplete 1, osdRequests 0x0
[...]
Disk name   NsdId              Status    Hold I/O rcktry wckerr Addr
----------  -----------------  --------  ---- --- ------ ------ ----
home11      0A3C3D02:4FC87656  active    0    0   0      0      0x7F4E501565C0
scratch11   0A3C3D01:4FBE76D7  active    15   15  0      0      0x7F4E50156640
scratch12   0A3C3D02:4FBE76D8  active    0    0   0      0      0x7F4E501566C0
scratch13   0A3C3D01:4FBE76D8  active    1    1   0      0      0x7F4E50156740
</pre>
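      A small Python sketch to tally the I/O column of the disk table
      above. The meaning of that column (requests in flight per disk)
      is my assumption. The values add up to 16, the same as "active 16"
      in the worker-thread line; if the assumption holds, 15 of the 16
      threads were working on scratch11 at that moment:<br>

```python
# Disk table copied from the "mmfsadm dump nsd" output above.
DUMP = """\
Disk name   NsdId              Status    Hold I/O rcktry wckerr Addr
----------  -----------------  --------  ---- --- ------ ------ ----
home11      0A3C3D02:4FC87656  active    0    0   0      0      0x7F4E501565C0
scratch11   0A3C3D01:4FBE76D7  active    15   15  0      0      0x7F4E50156640
scratch12   0A3C3D02:4FBE76D8  active    0    0   0      0      0x7F4E501566C0
scratch13   0A3C3D01:4FBE76D8  active    1    1   0      0      0x7F4E50156740
"""


def io_per_disk(dump: str) -> dict:
    """Map each disk name to the value in its I/O column.

    Assumes the column order shown in the header and disk names
    without spaces."""
    result = {}
    for line in dump.splitlines()[2:]:  # skip the header and dash rows
        fields = line.split()
        if len(fields) < 8:
            continue
        # fields: 0 name, 1 NsdId, 2 Status, 3 Hold, 4 I/O, 5 rcktry, 6 wckerr, 7 Addr
        result[fields[0]] = int(fields[4])
    return result


io = io_per_disk(DUMP)
print(io)
print("total:", sum(io.values()))
```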

      On the other hand, when we run the performance monitor on each
      disk controller, we get the following numbers per LUN, with a 0%
      cache hit rate:<br>

      mean IO/s = 180<br>

      Read % = 97.5%<br>

      throughput = 105 MB/s<br>

      <br>

      All LUNs show similar results. The combined read throughput is
      ~630 MB/s. This is the live cluster with ~300 jobs running, not
      a single process reading a big file.<br>
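      A quick Python check of these figures: dividing throughput by IO/s
      gives the average request size each LUN is serving, and 6 LUNs at
      105 MB/s reproduce the ~630 MB/s total. I am assuming the monitor's
      "MB" means 2^20 bytes; with 10^6 bytes the size comes out slightly
      smaller. An average of roughly 600 KiB per request suggests fairly
      large I/Os rather than small random ones:<br>

```python
# Per-LUN figures reported by the controller performance monitor above.
IOPS_PER_LUN = 180
MBPS_PER_LUN = 105
NUM_LUNS = 6

# Average request size = throughput / operations per second.
# Assumption: "MB" in the monitor output means 2**20 bytes.
avg_kib = MBPS_PER_LUN * 1024 / IOPS_PER_LUN

# Aggregate over all LUNs, assuming they all behave alike.
combined_mbps = MBPS_PER_LUN * NUM_LUNS

print(f"average request size: {avg_kib:.0f} KiB")
print(f"combined throughput: {combined_mbps} MB/s")
```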

      <br>

      <br>

      Are these numbers reasonable? Is that disk performance acceptable?<br>

      <br>

      What should we do with the thread parameters? Are the 16
      simultaneous threads overloading our disks? If so, should we
      lower nsdMinWorkerThreads? If not, should we raise
      nsdThreadsPerDisk to 16 or more, since the disks have shown they
      can handle that many?<br>
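      One rough way to frame the "are 16 threads too many" question is
      Little's law (average requests in flight = IO/s x average time per
      request). The Python sketch below is a hypothetical back-of-envelope
      estimate, assuming a one-LUN server keeps N requests outstanding
      on that LUN and the LUN stays at the measured 180 IO/s no matter
      how deep the queue is. If that holds, extra threads only add
      waiting time; if the LUN delivers more IO/s with a deeper queue,
      they help:<br>

```python
def mean_response_time(outstanding: float, iops: float) -> float:
    """Little's law, L = lambda * W, solved for W (seconds per request)."""
    return outstanding / iops


MEASURED_IOPS = 180  # per-LUN figure from the controller monitor above

# Hypothetical queue depths: nsdThreadsPerDisk default, 2-NSD server, 1-NSD server.
for outstanding in (3, 8, 16):
    w = mean_response_time(outstanding, MEASURED_IOPS)
    print(f"{outstanding:2d} outstanding -> {w * 1000:.1f} ms per request")
```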

      <br>

      In our new cluster installation we will have 4 NSD servers, each
      responsible for 4-5 NSDs, using 4 disk cabinets similar to the
      ones we have now. We will also move to GPFS 3.5, where the default
      nsdMaxWorkerThreads has been raised to 512 and NSD requests are
      split into small and large queues.<br>
      How should we adapt to that? Is nsdThreadsPerDisk=3 an outdated
      default that we should move away from?<br>
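      For comparison, a Python sketch applying the same slide-4 formula
      to the new layout. Two assumptions: that the 3.4 formula still
      describes 3.5 (the new small/large queue scheme may assign threads
      differently), and that nsdMinWorkerThreads stays at 16. The
      nsdThreadsPerDisk values other than 3 are just hypothetical
      candidates:<br>

```python
def active_nsd_threads(threads_per_disk, nsds_per_server, max_workers, min_workers):
    # MAX( MIN( threadsPerDisk * NSDs, maxWorkers ), minWorkers ), as on slide 4.
    return max(min(threads_per_disk * nsds_per_server, max_workers), min_workers)


MAX_WORKERS_35 = 512  # GPFS 3.5 default nsdMaxWorkerThreads, as stated above
MIN_WORKERS = 16      # assumption: same value as our current 3.4 setting

for threads_per_disk in (3, 8, 16):
    for nsds in (4, 5):
        t = active_nsd_threads(threads_per_disk, nsds, MAX_WORKERS_35, MIN_WORKERS)
        print(f"nsdThreadsPerDisk={threads_per_disk:2d}, {nsds} NSDs: "
              f"{t:3d} threads, {t / nsds:.1f} per disk")
```

      Under these assumptions, the default of 3 again leaves
      nsdMinWorkerThreads in control (16 threads, 3.2 to 4 per disk),
      while larger values scale the thread count linearly.<br>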

      <br>

      Thanks in advance,<br>

      <br>

      Txema<br>

    </blockquote>

    <br>

  </body>

</html>