[gpfsug-discuss] nsdMinWorkerThreads and nsdThreadsPerDisk in clusters with just a few NSDs

Txema Heredia Genestar txema.llistes at gmail.com
Fri Jun 14 18:57:04 BST 2013


Hi all,

We are building a new GPFS cluster and I have a few questions about the 
NSD threads that I hope you can help with.

Our old cluster is composed of 2 building blocks. Each BB consists of 2 
servers (12-core, 48 GB RAM) connected by SAS to a dual-controller 
DS3512 disk cabinet with 36x 7.2k rpm 3 TB SATA disks. Each controller 
has 2 GB of cache.
Our "big" filesystem (130 TB) consists of 6 NSDs, each an 8+1 RAID5 LUN 
coming from a cabinet, with data and metadata mixed. With 6 LUNs across 
4 controllers and 4 NSD servers, some servers serve 2 "disks" and some 
just 1.

As for GPFS, we are using the default GPFS 3.4 thread parameters:
nsdMaxWorkerThreads = 64
nsdMinWorkerThreads = 16
nsdThreadsPerDisk = 3
#NSD per server = 1 or 2
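
(For reference, this is how I check what the daemon is actually using. 
mmlsconfig only shows values changed from the defaults, so for the 
defaults I believe you have to go through "mmfsadm dump config" - treat 
the grep patterns as illustrative only:)

    # Settings explicitly changed from the defaults:
    mmlsconfig | grep -i nsd
    # Effective values, including defaults (undocumented service command):
    mmfsadm dump config | grep -iE 'nsd(Min|Max)WorkerThreads|nsdThreadsPerDisk'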

In an IBM presentation 
(http://www-05.ibm.com/de/events/gpfs-workshop/pdf/pr-11-GPFS_R35_nsdMultipleQ_and_other_enhancmentsv4-OW.pdf, 
slide 4), the formula for the number of concurrently active NSD threads 
is given as:

MAX( MIN( nsdThreadsPerDisk * #NSDperServer, nsdMaxWorkerThreads ), nsdMinWorkerThreads )

In our case we have only 6 NSDs, and each server is responsible for at 
most 2 of them. The inner term is MIN( 3 * 2, 64 ) = 6, but the outer 
MAX( 6, 16 ) brings us back up to 16 worker threads per server. We thus 
end up with between 8 and 16 threads per disk, when we should have just 3.
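
To make the arithmetic explicit, here is a small bash sketch of that 
slide-4 formula with our current values (the variable names just mirror 
the GPFS parameters):

    # MAX( MIN( nsdThreadsPerDisk * #NSDperServer, nsdMaxWorkerThreads ),
    #      nsdMinWorkerThreads )
    nsdThreadsPerDisk=3
    nsdPerServer=2            # a server here handles at most 2 NSDs
    nsdMaxWorkerThreads=64
    nsdMinWorkerThreads=16

    threads=$(( nsdThreadsPerDisk * nsdPerServer ))                       # 3 * 2 = 6
    (( threads > nsdMaxWorkerThreads )) && threads=$nsdMaxWorkerThreads   # inner MIN: still 6
    (( threads < nsdMinWorkerThreads )) && threads=$nsdMinWorkerThreads   # outer MAX: 16 wins
    echo "$threads worker threads"                                        # -> 16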

This is a snapshot taken just now on one of our servers with "mmfsadm 
dump nsd":


Worker threads: running 16, started 16, desired 16, active 16, highest 16
Requests: pending 333, highest pending 615, total processed 839099802

Buffer use: current 16777216, highest 16777216
Server state: suspendCount 0, killRequested 0, activeLocalIO 0
   reOpenRequested 0, reOpenInProgress 0, nsdJoinComplete 1, osdRequests 0x0
[...]
   Disk name   NsdId              Status    Hold I/O rcktry wckerr Addr
   ----------  -----------------  --------  ---- --- ------ ------ ----
   home11      0A3C3D02:4FC87656  active    0    0   0      0      0x7F4E501565C0
   scratch11   0A3C3D01:4FBE76D7  active    15   15  0      0      0x7F4E50156640
   scratch12   0A3C3D02:4FBE76D8  active    0    0   0      0      0x7F4E501566C0
   scratch13   0A3C3D01:4FBE76D8  active    1    1   0      0      0x7F4E50156740
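
(In case it helps, I take these snapshots with a trivial loop around the 
same command - nothing special, shown only for completeness:)

    # Sample the worker-thread and queue lines every 5 seconds on an NSD server
    while true; do
        mmfsadm dump nsd | grep -E 'Worker threads|Requests:'
        sleep 5
    done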



On the other hand, when we run the performance monitor on each disk 
controller, we see the following per-LUN numbers at a 0% cache-hit 
rate:
mean IO/s = 180
Read % = 97.5%
throughput = 105 MB/s

All LUNs show similar results. The combined read throughput is ~630 
MB/s. This is the "live" cluster with ~300 jobs running, not a single 
process reading a big file.


Are all these numbers OK? Is that disk performance reasonable?

What should we do with the thread parameters? Are the 16 simultaneous 
threads hurting our disks, so we should lower nsdMinWorkerThreads? Or, 
on the contrary, should we raise nsdThreadsPerDisk to 16 or more, since 
the disks have shown they can handle that load? (A hypothetical example 
of both changes is sketched below.)
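
If changing these parameters turns out to be the right move, I assume 
the change would look roughly like this (mmchconfig is the usual tool; 
the node names are placeholders, and I am not sure whether these 
parameters take effect without restarting GPFS on the NSD servers - 
corrections welcome):

    # Hypothetical, untested: let nsdThreadsPerDisk * #NSDperServer dominate
    # by lowering the floor (node names are placeholders)
    mmchconfig nsdMinWorkerThreads=6 -N nsd1,nsd2,nsd3,nsd4
    # ...or the opposite experiment, more concurrency per disk:
    mmchconfig nsdThreadsPerDisk=16 -N nsd1,nsd2,nsd3,nsd4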

In our new cluster installation we will have 4 NSD servers, each 
responsible for 4-5 NSDs, using 4 disk cabinets similar to the ones we 
have now. We will also move to GPFS 3.5, where the default 
nsdMaxWorkerThreads has been raised to 512 and separate small and large 
I/O queues were introduced.
How should we adapt to this? Is nsdThreadsPerDisk=3 an outdated default 
value that we should move away from?
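
For what it is worth, plugging the new cluster's numbers into the same 
3.4 formula (assuming it still applies under the 3.5 queueing model, 
which is exactly what I am unsure about, and assuming nsdMinWorkerThreads 
keeps its old default):

    # New cluster, same formula: up to 5 NSDs per server, GPFS 3.5 default max
    nsdThreadsPerDisk=3
    nsdPerServer=5
    nsdMaxWorkerThreads=512   # new 3.5 default
    nsdMinWorkerThreads=16    # assumption: the 3.4 default carries over

    threads=$(( nsdThreadsPerDisk * nsdPerServer ))                       # 15
    (( threads > nsdMaxWorkerThreads )) && threads=$nsdMaxWorkerThreads
    (( threads < nsdMinWorkerThreads )) && threads=$nsdMinWorkerThreads   # floor wins again
    echo "$threads worker threads"                                        # -> 16

So if the old formula still held, the nsdMinWorkerThreads floor would 
dominate even on the new hardware.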

Thanks in advance,

Txema