<font size=2 face="sans-serif">So, starting from your </font><br><font size=3>nsdMaxWorkerThreads 1024 </font><br><br><font size=2 face="sans-serif">I usually set nsdMinWorkerThreads the same way, and I set ignorePrefetchLunCount=yes on every node in the cluster.</font><br><br><br><font size=2 face="sans-serif">Adjust the min/max worker threads to your infrastructure according to your needs: how many IOPS and/or how much bandwidth, at your given block size, do you think your back end can handle? Depending on that, adjust nsdMinWorkerThreads and nsdMaxWorkerThreads.</font><br><font size=2 face="sans-serif">So if your back end can manage 10,000 IOPS, roughly divide that by the number of NSD servers and adjust the NSD worker threads accordingly.</font><br><br><font size=2 face="sans-serif">In addition, check the client settings. Sometimes it helps to lower workerThreads on the clients, so that they do not overrun your NSD servers.</font><br><br><br><div><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">"Buterbaugh, Kevin L" &lt;Kevin.Buterbaugh@Vanderbilt.Edu&gt;</font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;</font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">02/21/2019 01:39 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [gpfsug-discuss]

Clarification of mmdiag --iohist output</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><font size=3>Hi All, </font><br><br><font size=3>My thanks to Aaron, Sven, Steve, and whoever responded

for the GPFS team.  You confirmed what I suspected … my example 10

second I/O was _from an NSD server_ … and since we’re in an 8 Gb FC SAN

environment, it therefore means - correct me if I’m wrong about this someone

- that I’ve got a problem somewhere in one (or more) of the following

3 components:</font><br><br><font size=3>1) the NSD servers</font><br><font size=3>2) the SAN fabric</font><br><font size=3>3) the storage arrays</font><br><br><font size=3>I’ve been looking at all of the above and none of them

are showing any obvious problems.  I’ve actually got a techie from

the storage array vendor stopping by on Thursday, so I’ll see if he can

spot anything there.  Our FC switches are QLogic’s, so I’m kinda

screwed there in terms of getting any help.  But I don’t see any

errors in the switch logs and “show perf” on the switches is showing

I/O rates of 50-100 MB/sec on the in use ports, so I don’t _think_ that’s

the issue.</font><br><br><font size=3>And this is the GPFS mailing list, after all … so let’s

talk about the NSD servers.  Neither memory (64 GB) nor CPU (2 x quad-core

Intel Xeon E5620’s) appears to be an issue.  But I have been looking

at the output of “mmfsadm saferdump nsd” based on what Aaron and then

Steve said.  Here’s some fairly typical output from one of the SMALL

queues (I’ve checked several of my 8 NSD servers and they’re all showing

similar output):</font><br><br><font size=3>    Queue NSD type NsdQueueTraditional [244]:

SMALL, threads started 12, active 3, highest 12, deferred 0, chgSize 0,

draining 0, is_chg 0</font><br><font size=3>     requests pending 0, highest pending

73, total processed 4859732</font><br><font size=3>     mutex 0x7F3E449B8F10, reqCond 0x7F3E449B8F58,

thCond 0x7F3E449B8F98, queue 0x7F3E449B8EF0, nFreeNsdRequests 29</font><br><br><font size=3>And for a LARGE queue:</font><br><br><font size=3>    Queue NSD type NsdQueueTraditional [8]:

LARGE, threads started 12, active 1, highest 12, deferred 0, chgSize 0,

draining 0, is_chg 0</font><br><font size=3>     requests pending 0, highest pending

71, total processed 2332966</font><br><font size=3>     mutex 0x7F3E441F3890, reqCond 0x7F3E441F38D8,

thCond 0x7F3E441F3918, queue 0x7F3E441F3870, nFreeNsdRequests 31</font><br><br><font size=3>So my large queues seem to be slightly less utilized than

my small queues overall … i.e. I see more inactive large queues and they

generally have a smaller “highest pending” value.</font><br><br><font size=3>Question:  are those non-zero “highest pending”

values something to be concerned about?</font><br><br><font size=3>I have the following thread-related parameters set:</font><br><br><font size=3>[common]</font><br><font size=3>maxReceiverThreads 12</font><br><font size=3>nsdMaxWorkerThreads 640</font><br><font size=3>nsdThreadsPerQueue 4</font><br><font size=3>nsdSmallThreadRatio 3</font><br><font size=3>workerThreads 128</font><br><br><font size=3>[serverLicense]</font><br><font size=3>nsdMaxWorkerThreads 1024</font><br><font size=3>nsdThreadsPerQueue 12</font><br><font size=3>nsdSmallThreadRatio 1</font><br><font size=3>pitWorkerThreadsPerNode 3</font><br><font size=3>workerThreads 1024</font><br><br><font size=3>Also, at the top of the “mmfsadm saferdump nsd” output

I see:</font><br><br><font size=3>Total server worker threads: running 1008, desired 147,

forNSD 147, forGNR 0, nsdBigBufferSize 16777216</font><br><font size=3>nsdMultiQueue: 256, nsdMultiQueueType: 1, nsdMinWorkerThreads:

16, nsdMaxWorkerThreads: 1024</font><br><br><font size=3>Question:  is the fact that 1008 is pretty close

to 1024 a concern?</font><br><br><font size=3>Anything jump out at anybody?  I don’t mind sharing

full output, but it is rather lengthy.  Is this worthy of a PMR?</font><br><br><font size=3>Thanks!</font><br><br><font size=3>--</font><br><font size=3>Kevin Buterbaugh - Senior System Administrator</font><br><font size=3>Vanderbilt University - Advanced Computing Center for

Research and Education</font><br><a href="mailto:Kevin.Buterbaugh@vanderbilt.edu"><font size=3 color=blue><u>Kevin.Buterbaugh@vanderbilt.edu</u></font></a><font size=3>- (615)875-9633</font><br><br><font size=3>On Feb 17, 2019, at 1:01 PM, IBM Spectrum Scale <</font><a href="mailto:scale@us.ibm.com"><font size=3 color=blue><u>scale@us.ibm.com</u></font></a><font size=3>>

wrote:</font><br><br><font size=2>Hi Kevin,</font><font size=3><br></font><font size=2><br>The I/O history shown by mmdiag --iohist depends on the node on which you run the command.<br>If you run it on an NSD server node, it shows the time taken to complete (serve) the read or write I/O operation sent from the client node.<br>If you run it on a client (non-NSD-server) node, it shows the total time taken for the read or write I/O operation requested by the client node to complete.<br>So, in a nutshell: in the NSD server case it is just the latency of the I/O done on disk by the server, whereas in the NSD client case it also includes the latency of sending the I/O request to the NSD server and receiving the reply, along with the latency of the I/O done on disk by the NSD server.<br>I hope this answers your query.</font><font size=3><br><br></font><font size=2><br>Regards, The Spectrum Scale (GPFS) team<br><br>------------------------------------------------------------------------------------------------------------------<br>If you feel that your question can benefit other users of Spectrum

Scale (GPFS), then please post it to the public IBM developerWorks Forum

at <a href="https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C2bfb2e8e30e64fa06c0f08d6959b2d38%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636860891056267091&sdata=%2FWFsVfr73xZcfH25vIFYC4ts7LlWDFUIoh9fLheAEwE%3D&reserved=0"><font size=2 color=blue>https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479</font></a>.

<br><br>If your query concerns a potential software error in Spectrum Scale (GPFS)

and you have an IBM software maintenance contract please contact  1-800-237-5511

in the United States or your local IBM Service Center in other countries.

<br><br>The forum is informally monitored as time permits and should not be used

for priority messages to the Spectrum Scale (GPFS) team.</font><font size=3><br><br><br></font><font size=1 color=#5f5f5f><br>From:        </font><font size=1>"Buterbaugh,

Kevin L" <</font><a href="mailto:Kevin.Buterbaugh@Vanderbilt.Edu"><font size=1 color=blue><u>Kevin.Buterbaugh@Vanderbilt.Edu</u></font></a><font size=1>></font><font size=1 color=#5f5f5f><br>To:        </font><font size=1>gpfsug main discussion

list <</font><a href="mailto:gpfsug-discuss@spectrumscale.org"><font size=1 color=blue><u>gpfsug-discuss@spectrumscale.org</u></font></a><font size=1>></font><font size=1 color=#5f5f5f><br>Date:        </font><font size=1>02/16/2019 08:18 PM</font><font size=1 color=#5f5f5f><br>Subject:        </font><font size=1>[gpfsug-discuss]

Clarification of mmdiag --iohist output</font><font size=1 color=#5f5f5f><br>Sent by:        </font><a href="mailto:gpfsug-discuss-bounces@spectrumscale.org"><font size=1 color=blue><u>gpfsug-discuss-bounces@spectrumscale.org</u></font></a><font size=3><br></font><hr noshade><font size=3><br><br><br>Hi All, <br><br>Been reading man pages, docs, and Googling, and haven’t found a definitive

answer to this question, so I knew exactly where to turn… ;-)<br><br>I’m dealing with some slow I/O’s to certain storage arrays in our environments

… like really, really slow I/O’s … here’s just one example from one

of my NSD servers of a 10 second I/O:<br><br>08:49:34.943186  W        data   30:41615622144

  2048 10115.192  srv   dm-92        

         &lt;client IP redacted&gt;<br><br>So here’s my question … when mmdiag --iohist tells me that that I/O took

slightly over 10 seconds, is that:<br><br>1.  The time from when the NSD server received the I/O request from

the client until it shipped the data back onto the wire towards the client?<br>2.  The time from when the client issued the I/O request until it

received the data back from the NSD server?<br>3.  Something else?<br><br>I’m thinking it’s #1, but want to confirm.  Which one it is has

very obvious implications for our troubleshooting steps.  Thanks in

advance…<br><br>Kevin<br>—<br>Kevin Buterbaugh - Senior System Administrator<br>Vanderbilt University - Advanced Computing Center for Research and Education</font><font size=3 color=blue><u><br></u></font><a href="mailto:Kevin.Buterbaugh@vanderbilt.edu"><font size=3 color=blue><u>Kevin.Buterbaugh@vanderbilt.edu</u></font></a><font size=3>-

(615)875-9633</font><tt><font size=2><br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at </font></tt><a href="http://spectrumscale.org"><tt><font size=2 color=blue><u>spectrumscale.org</u></font></tt></a><br><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss"><tt><font size=2 color=blue><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></font></tt></a><br><br></div><BR>
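<font size=2 face="sans-serif">When comparing the queue lines from “mmfsadm saferdump nsd” quoted in this thread across several NSD servers, a small script may help. This is only a sketch: it assumes each queue header and its “requests pending” line arrive on one line each (the quotes above are wrapped by the mail client), and the function name summarize and the dictionary keys are made up for illustration. It only collects the numbers; it does not say whether a given “highest pending” value is a problem.</font><br>
<pre>
```python
import re

# Two queue entries copied from the dump quoted in this thread, unwrapped
# on the assumption that the real output has one header line per queue.
SAMPLE = """\
Queue NSD type NsdQueueTraditional [244]: SMALL, threads started 12, active 3, highest 12, deferred 0, chgSize 0, draining 0, is_chg 0
 requests pending 0, highest pending 73, total processed 4859732
Queue NSD type NsdQueueTraditional [8]: LARGE, threads started 12, active 1, highest 12, deferred 0, chgSize 0, draining 0, is_chg 0
 requests pending 0, highest pending 71, total processed 2332966
"""

HEADER = re.compile(
    r"Queue NSD type \w+ \[(\d+)\]:\s*(\w+), threads started (\d+), active (\d+)"
)
PENDING = re.compile(
    r"requests pending (\d+), highest pending (\d+), total processed (\d+)"
)


def summarize(dump_text):
    """Return one dict per queue found in dump_text (hypothetical helper)."""
    queues = []
    for line in dump_text.splitlines():
        header = HEADER.search(line)
        if header:
            queues.append({
                "index": int(header.group(1)),
                "size": header.group(2),
                "started": int(header.group(3)),
                "active": int(header.group(4)),
            })
            continue
        pending = PENDING.search(line)
        if pending and queues:
            # Attach the counters to the most recently seen queue header.
            queues[-1]["pending"] = int(pending.group(1))
            queues[-1]["highest_pending"] = int(pending.group(2))
            queues[-1]["processed"] = int(pending.group(3))
    return queues


if __name__ == "__main__":
    for q in summarize(SAMPLE):
        print(q["index"], q["size"], q["started"], q["active"],
              q.get("highest_pending"))
```
</pre>
<font size=2 face="sans-serif">Feeding it the saved dump from each NSD server makes it easier to see whether the SMALL and LARGE queues behave the same way everywhere.</font><br>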