[gpfsug-discuss] Clarification of mmdiag --iohist output

Jim Doherty jjdoherty at yahoo.com
Thu Feb 21 12:54:20 GMT 2019


 Are all of the slow IOs from the same NSD volumes?    

You could run an mmtrace, take an internaldump, and open a ticket with the Spectrum Scale queue.  You may want to limit the run to just your NSD servers rather than all nodes as I do in my example below.  Alternatively, one of the tools we use to review a trace is /usr/lpp/mmfs/samples/debugtools/trsum.awk; run it against the uncompressed trace file and redirect standard output to a file (a sketch of that step follows the commands below).  If you search for ' total ' in the output you will find the different sections, or you can just grep ' total IO ' trsum.out | grep duration to get a quick look per LUN.

mmtracectl --set --trace=def --tracedev-write-mode=overwrite --tracedev-overwrite-buffer-size=500M -N all
mmtracectl --start -N all ; sleep 30 ; mmtracectl --stop -N all  ; mmtracectl --off -N all 
mmdsh -N all "/usr/lpp/mmfs/bin/mmfsadm dump all >/tmp/mmfs/service.dumpall.\$(hostname)"
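
A minimal sketch of the trsum.awk step described above; <trace-report> is a placeholder for the uncompressed trace report produced on the node:

gunzip <trace-report>.gz
awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk <trace-report> > trsum.out
grep ' total IO ' trsum.out | grep duration    # quick per-LUN look at total IO times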

Jim


    On Thursday, February 21, 2019, 7:23:46 AM EST, Frederick Stock <stockf at us.ibm.com> wrote:  
 
Kevin, I'm assuming you have seen the article on IBM developerWorks about the GPFS NSD queues.  It provides useful background for analyzing the dump nsd information.  Here are some thoughts on items you can investigate or consider.

If your NSD servers are doing both large (greater than 64K) and small (64K or less) IOs then you want nsdSmallThreadRatio set to 1, as it seems you do on the NSD servers.  This provides an equal number of SMALL and LARGE NSD queues.  You can also increase the total number of queues (currently 256), but I cannot determine from the data you provided whether that is necessary.  Only on rare occasions have I seen a need to increase the number of queues.

The fact that you have 71 highest pending on your LARGE queues and 73 highest pending on your SMALL queues implies your IOs are queueing for a good while, either waiting for resources in GPFS or waiting for IOs to complete.

Your maximum buffer size is 16M, which is defined to be the largest IO that can be requested by GPFS; this is the buffer size GPFS will use for LARGE IOs.  You indicated you have sufficient memory on the NSD servers, but what is the value of the pagepool on those servers, and what is the value of the nsdBufSpace parameter?  If the node is a dedicated NSD server then nsdBufSpace is usually set to 70.  The IO buffers used by the NSD server come from the pagepool, so you need enough space there for the maximum number of LARGE IO buffers that could be in use concurrently, or threads will have to wait for buffers to become available.  Essentially you want the memory needed for the maximum number of concurrent large IOs to be less than 70% of the pagepool size.

You could also look at the settings for the FC cards to ensure they are configured to do the largest IOs possible.  I forget the actual values (I have not done this for a while), but there are adapter settings that control the maximum IO size that will be sent.  You want this to be as large as the adapter can handle, to reduce the number of messages needed to complete the large IOs done by GPFS.

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com  
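
As a rough way to check the buffer math Fred describes above, a minimal sketch (assuming the standard attribute names reported by mmlsconfig; adjust for your cluster):

mmlsconfig pagepool       # total pagepool size on the NSD servers
mmlsconfig nsdBufSpace    # percentage of the pagepool available for NSD buffers
# sanity check: (maximum concurrent LARGE IOs) x 16M should stay below nsdBufSpace% of the pagepool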
----- Original message -----
From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:
Subject: Re: [gpfsug-discuss] Clarification of mmdiag --iohist output
Date: Thu, Feb 21, 2019 6:39 AM
 
Hi All,

My thanks to Aaron, Sven, Steve, and whoever responded for the GPFS team.  You confirmed what I suspected … my example 10 second I/O was _from an NSD server_ … and since we’re in an 8 Gb FC SAN environment, it therefore means - correct me if I’m wrong about this someone - that I’ve got a problem somewhere in one (or more) of the following 3 components:

1) the NSD servers
2) the SAN fabric
3) the storage arrays

I’ve been looking at all of the above and none of them are showing any obvious problems.  I’ve actually got a techie from the storage array vendor stopping by on Thursday, so I’ll see if he can spot anything there.  Our FC switches are QLogic’s, so I’m kinda screwed there in terms of getting any help.  But I don’t see any errors in the switch logs, and “show perf” on the switches is showing I/O rates of 50-100 MB/sec on the in-use ports, so I don’t _think_ that’s the issue.

And this is the GPFS mailing list, after all … so let’s talk about the NSD servers.  Neither memory (64 GB) nor CPU (2 x quad-core Intel Xeon E5620’s) appear to be an issue.  But I have been looking at the output of “mmfsadm saferdump nsd” based on what Aaron and then Steve said.  Here’s some fairly typical output from one of the SMALL queues (I’ve checked several of my 8 NSD servers and they’re all showing similar output):

    Queue NSD type NsdQueueTraditional [244]: SMALL, threads started 12, active 3, highest 12, deferred 0, chgSize 0, draining 0, is_chg 0
    requests pending 0, highest pending 73, total processed 4859732
    mutex 0x7F3E449B8F10, reqCond 0x7F3E449B8F58, thCond 0x7F3E449B8F98, queue 0x7F3E449B8EF0, nFreeNsdRequests 29

And for a LARGE queue:

    Queue NSD type NsdQueueTraditional [8]: LARGE, threads started 12, active 1, highest 12, deferred 0, chgSize 0, draining 0, is_chg 0
    requests pending 0, highest pending 71, total processed 2332966
    mutex 0x7F3E441F3890, reqCond 0x7F3E441F38D8, thCond 0x7F3E441F3918, queue 0x7F3E441F3870, nFreeNsdRequests 31

So my large queues seem to be slightly less utilized than my small queues overall … i.e. I see more inactive large queues and they generally have a smaller “highest pending” value.

Question:  are those non-zero “highest pending” values something to be concerned about?

I have the following thread-related parameters set:

[common]
maxReceiverThreads 12
nsdMaxWorkerThreads 640
nsdThreadsPerQueue 4
nsdSmallThreadRatio 3
workerThreads 128

[serverLicense]
nsdMaxWorkerThreads 1024
nsdThreadsPerQueue 12
nsdSmallThreadRatio 1
pitWorkerThreadsPerNode 3
workerThreads 1024

Also, at the top of the “mmfsadm saferdump nsd” output I see:

Total server worker threads: running 1008, desired 147, forNSD 147, forGNR 0, nsdBigBufferSize 16777216
nsdMultiQueue: 256, nsdMultiQueueType: 1, nsdMinWorkerThreads: 16, nsdMaxWorkerThreads: 1024

Question:  is the fact that 1008 is pretty close to 1024 a concern?

Anything jump out at anybody?  I don’t mind sharing full output, but it is rather lengthy.  Is this worthy of a PMR?

Thanks!

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
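
For reference, a quick sketch for collecting those “highest pending” counters from all of the NSD servers at once, reusing the mmdsh pattern from Jim’s note above (limit -N to your NSD servers as appropriate):

mmdsh -N all "/usr/lpp/mmfs/bin/mmfsadm saferdump nsd | grep 'highest pending'"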
On Feb 17, 2019, at 1:01 PM, IBM Spectrum Scale <scale at us.ibm.com> wrote:

Hi Kevin,

The I/O history shown by the command mmdiag --iohist depends on the node where you run the command.
If you run it on an NSD server node, it shows the time taken to complete/serve the read or write I/O operation sent from the client node.
If you run it on a client (non NSD server) node, it shows the complete time taken for the read or write I/O operation requested by that client to finish.
So, in a nutshell: for the NSD server case it is just the latency of the I/O done on disk by the server, whereas for the NSD client case it also includes the latency of sending the I/O request to the NSD server and receiving the reply, on top of the latency of the I/O done on disk by the NSD server.
I hope this answers your query.
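
A minimal way to see both views of the same I/O (the server name below is a placeholder; use the NSD server that serves the disk in question):

mmdiag --iohist                                        # on the client: end-to-end time, including the network round trip
ssh <nsd-server> /usr/lpp/mmfs/bin/mmdiag --iohist     # on the NSD server: time for the disk I/O only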


Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact  1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.



From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        02/16/2019 08:18 PM
Subject:        [gpfsug-discuss] Clarification of mmdiag --iohist output
Sent by:        gpfsug-discuss-bounces at spectrumscale.org


Hi All, 

Been reading man pages, docs, and Googling, and haven’t found a definitive answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments … like really, really slow I/O’s … here’s just one example from one of my NSD servers of a 10 second I/O:

08:49:34.943186  W        data   30:41615622144   2048 10115.192  srv   dm-92                  <client IP redacted>
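
As an aside, a rough sketch for pulling slow entries out of that history, assuming the latency in milliseconds is the sixth field as in the line above:

mmdiag --iohist | awk '$6+0 > 1000'    # list I/Os that took longer than one second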

So here’s my question … when mmdiag --iohist tells me that that I/O took slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633



 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
  

