[gpfsug-discuss] Clarification of mmdiag --iohist output

Sven Oehme oehmes at gmail.com
Sun Feb 17 18:59:37 GMT 2019


If you run it on the client, it includes local queuing, network time, NSD server processing, and the actual device I/O time.

If issued on the NSD server, it includes processing and I/O time; the processing shouldn't really add any overhead, but in some cases I have seen it contribute.

If you correlate the client and server iohist outputs, you can find the server entry based on the tags in the iohist output; this lets you see exactly how much time was spent on the network vs. on the server, so you can rule the network in or out as the problem.
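
For example, a throwaway way to line the two dumps up (a rough sketch, not an IBM tool; it matches on disk:sector and transfer size rather than the tag, because those columns are visible in the sample iohist line quoted further down in this thread, and it assumes that column layout and that the same disk:sector shows up in both dumps, so adjust the field indexes for your release):

# correlate_iohist.py -- my own throwaway helper, not part of Spectrum Scale.
# Usage:  python correlate_iohist.py client.iohist server.iohist
# where each file is the saved output of "mmdiag --iohist" on that node.
# Column assumptions (taken from the sample line quoted below):
#   start-time  R/W  buftype  disk:sector  nSec  time-ms  ...
# Adjust the field indexes if your release prints a different layout.

import sys

def parse(path):
    entries = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 6:
                continue
            try:
                key = (parts[1], parts[3], parts[4])   # R/W, disk:sector, nSec
                entries[key] = float(parts[5])         # service time in ms
            except ValueError:
                continue                               # header or other noise
    return entries

client = parse(sys.argv[1])
server = parse(sys.argv[2])

# delta = time the client spent beyond the server-side service time,
# i.e. network plus queuing (last occurrence wins on repeated addresses).
for key, cms in sorted(client.items()):
    if key in server:
        print("%s %s %s  client %.3f ms  server %.3f ms  delta %.3f ms"
              % (key[0], key[1], key[2], cms, server[key], cms - server[key]))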

 

Sven

 

 

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Aaron Knister <aaron.knister at gmail.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Sunday, February 17, 2019 at 6:26 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

 

Hi Kevin,

 

It's funny you bring this up because I was looking at this yesterday. My belief is that it's the time from when the I/O request was queued internally by the client to when the I/O response was received from the NSD server, which means it absolutely includes the network RTT. It would be great to get formal confirmation of this from someone who knows the code.

 

Here's some trace data showing a single 4K read from an NSD client. I've stripped out a bunch of uninteresting stuff. My belief is that the TRACE_IO entry marks the point at which the "i/o timer" reported by mmdiag --iohist begins ticking, and the trace data seems to support this. If I'm correct, it shows that the RDMA I/O to the NSD server occurs within the TRACE_IO timing window. The other thing that makes me believe this is that in my testing, mmdiag --iohist on the client shows an average latency of ~230us for a 4K read, whereas mmdiag --iohist on the NSD server shows an average latency of ~170us when servicing those 4K reads from the back-end disk (a DDN SFA14KX).

 

0.000218276     37005   TRACE_DISK: doReplicatedRead: da 34:490710888
0.000218424     37005   TRACE_IO: QIO: read data tag 743942 108137 ioVecSize 1 1st buf 0x122F000 nsdName S01_DMD_NSD_034 da 34:490710888 nSectors 8 align 0 by iocMBHandler (DioHandlerThread)
0.000218566     37005   TRACE_DLEASE: checkLeaseForIO: rc 0

 

0.000218628     37005   TRACE_IO: SIO: ioVecSize 1 1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nSectors 8 doioP 0x1806994D300 isDaemonAddr 0
0.000218672     37005   TRACE_FS: verify4KIO exit: code 4 err 0
0.000219106     37005   TRACE_NSD: nsdDoIO enter: read ioVecSize 1 1st bufAddr 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nBytes 4096 isDaemonAddr 0
0.000219408     37005   TRACE_GCRYPTO: EncBufPool::getTmpBuf(): bsize=4096 code=205 err=0 bufP=0x180656C1058 outBufP=0x180656C1058 index=0
0.000220105     37005   TRACE_TS: sendMessage msg_id 22993695: dest <c0n2> 10.3.17.3 sto03
0.000220436     37005   TRACE_RDMA: verbs::verbsSend_i: enter: index 11 cookie 12 node <c0n2> msg_id 22993695 len 92

0.000221111      37005   TRACE_RDMA: verbsConn::postSend: currRpcHeadP 0x7F6D7FA106E8 sendType SEND_BUFFER_RPC index 11 cookie 12 threadId 37005 bufferId.index 11
0.000221662     37005   TRACE_MUTEX: Waiting on fast condvar for signal 0x7F6D7FA106F8 RdmaSend_NSD_SVC


0.000426716     16691   TRACE_RDMA: handleRecvComp: success 1 of 1 nWrSuccess 1 index 11 cookie 12 wr_id 0xB0E6E00000000 bufferP 0x7F6D7EEE1700 byte_len 4144

0.000432140     37005   TRACE_NSD: nsdDoIO_ReadAndCheck: read complete, len 0 status 6 err 0 bufP 0x180656C1058 dioIsOverRdma 1 ioDataP 0x200000BC000 ckSumType NsdCksum_None
0.000432163     37005   TRACE_NSD: nsdDoIO_ReadAndCheck: exit err 0
0.000433707     37005   TRACE_GCRYPTO: EncBufPool::releaseTmpBuf(): exit bsize=8192 err=0 inBufP=0x180656C1058 bufP=0x180656C1058 index=0
0.000433777     37005   TRACE_NSD: nsdDoIO exit: err 0 0

 

0.000433844     37005   TRACE_IO: FIO: read data tag 743942 108137 ioVecSize 1 1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nSectors 8 err 0
0.000434236     37005   TRACE_DISK: postIO: qosid A00D91E read data disk FFFFFFFF ioVecSize 1 1st buf 0x122F000 err 0 duration 0.000215000 by iocMBHandler (DioHandlerThread)
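
To put numbers on that (plain arithmetic on the timestamps above, nothing GPFS-specific): the QIO-to-FIO interval works out to ~215 us, which matches the "duration 0.000215000" reported by postIO, and ~206 us of that sits between the RDMA send and the receive completion, i.e. network plus NSD server time.

qio, fio   = 0.000218424, 0.000433844   # TRACE_IO QIO / FIO timestamps above
send, recv = 0.000220436, 0.000426716   # verbsSend_i / handleRecvComp timestamps

print("QIO -> FIO : %.1f us" % ((fio - qio) * 1e6))    # ~215.4 us, matches postIO duration
print("RDMA window: %.1f us" % ((recv - send) * 1e6))  # ~206.3 us is network + NSD server time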

 

I'd suggest looking at "mmdiag --iohist" on the NSD server itself and seeing if/how it differs from the client. The other thing you could do is check whether your NSD server queues are backing up (e.g. run "mmfsadm saferdump nsd" and look for "requests pending" on queues where the "active" field is > 0). That doesn't necessarily mean you need to tune your queues, but if the disk I/O on your NSD server looks healthy (e.g. low latency, not overly taxed) and the queues are still backing up, you could benefit from queue tuning.
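
If you'd rather scan the dump than eyeball it, here's a quick sketch (my own throwaway, not an IBM tool; it assumes the "active" and "requests pending" counters appear on the same queue line, which may not match your release's formatting, so treat the regexes as a starting point):

# check_nsd_queues.py -- my own throwaway helper, not an IBM tool.
# Usage:  mmfsadm saferdump nsd | python check_nsd_queues.py
# The dump layout varies by release, so adjust the regexes to what your
# version actually prints.

import re
import sys

active_re  = re.compile(r"active\s+(\d+)")
pending_re = re.compile(r"requests pending\s+(\d+)")

for line in sys.stdin:
    a = active_re.search(line)
    p = pending_re.search(line)
    if a and p and int(a.group(1)) > 0 and int(p.group(1)) > 0:
        # queue has active threads and a backlog of pending requests
        print("possible backlog: " + line.rstrip())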

 

-Aaron

 

On Sat, Feb 16, 2019 at 9:47 AM Buterbaugh, Kevin L <Kevin.Buterbaugh at vanderbilt.edu> wrote:

Hi All, 

 

Been reading man pages, docs, and Googling, and haven’t found a definitive answer to this question, so I knew exactly where to turn… ;-)

 

I’m dealing with some slow I/O’s to certain storage arrays in our environments … like really, really slow I/O’s … here’s just one example from one of my NSD servers of a 10 second I/O:

 

08:49:34.943186  W        data   30:41615622144   2048 10115.192  srv   dm-92                  <client IP redacted>

 

So here’s my question … when mmdiag --iohist tells me that that I/O took slightly over 10 seconds, is that:

 

1.  The time from when the NSD server received the I/O request from the client until it shipped the data back onto the wire towards the client?

2.  The time from when the client issued the I/O request until it received the data back from the NSD server?

3.  Something else?

 

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious implications for our troubleshooting steps.  Thanks in advance…

 

Kevin

—

Kevin Buterbaugh - Senior System Administrator

Vanderbilt University - Advanced Computing Center for Research and Education

Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

 

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

