[gpfsug-discuss] Clarification of mmdiag --iohist output

Aaron Knister aaron.knister at gmail.com
Sun Feb 17 14:26:23 GMT 2019


Hi Kevin,

It's funny you bring this up because I was looking at this yesterday. My
belief is that it's the time from when the I/O request was queued
internally by the client to when the I/O response was received from the NSD
server, which means it absolutely includes the network RTT. It would be
great to get formal confirmation of this from someone who knows the code.

Here's some trace data showing a single 4K read from an NSD client. I've
stripped out a bunch of uninteresting stuff. It's my belief that the
TRACE_IO QIO event marks the point at which the "i/o timer" reported by
mmdiag --iohist begins ticking (and the FIO event marks where it stops).
The trace data seems to support this. If I'm correct, it shows that the
RDMA I/O to the NSD server occurs
within that QIO-FIO window. The other thing that makes me believe
this is that in my testing, mmdiag --iohist on the client shows an average
latency of ~230us for a 4K read, whereas mmdiag --iohist on the NSD server
appears to show an average latency of ~170us when servicing those 4K reads
from the back-end disk (a DDN SFA14KX). The ~60us gap is consistent with
the client-side number including the network round trip.
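
If you want to reproduce that comparison, here's a rough Python sketch
that averages iohist latencies per I/O size on whichever node you run it
on (run it as root on a client and on an NSD server and compare). The
column indices are an assumption based on iohist lines like the one in
Kevin's mail below; double-check them against your Scale version's output.

import subprocess
from collections import defaultdict

# Assumed column layout:
#   <start time> <R/W> <buf type> <disk:sector> <nSec> <time ms> ...
out = subprocess.run(["mmdiag", "--iohist"],
                     capture_output=True, text=True).stdout

sums, counts = defaultdict(float), defaultdict(int)
for line in out.splitlines():
    f = line.split()
    if len(f) < 6 or ":" not in f[0]:   # data lines start with HH:MM:SS.uuuuuu
        continue
    try:
        nsec, ms = int(f[4]), float(f[5])
    except ValueError:
        continue                        # header/separator line, skip it
    key = (f[1], nsec)                  # (R or W, I/O size in sectors)
    sums[key] += ms
    counts[key] += 1

for rw, nsec in sorted(counts):
    avg_us = sums[(rw, nsec)] / counts[(rw, nsec)] * 1000.0
    print("%s %6d sectors: %8.1f us avg over %d I/Os"
          % (rw, nsec, avg_us, counts[(rw, nsec)]))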

0.000218276     37005   TRACE_DISK: doReplicatedRead: da 34:490710888
0.000218424     37005   TRACE_IO: QIO: read data tag 743942 108137
ioVecSize 1 1st buf 0x122F000 nsdName S01_DMD_NSD_034 da 34:490710888
nSectors 8 align 0 by iocMBHandler (DioHandlerThread)
0.000218566     37005   TRACE_DLEASE: checkLeaseForIO: rc 0

0.000218628     37005   TRACE_IO: SIO: ioVecSize 1 1st buf 0x122F000 nsdId
0A011103:5C59DBAC da 34:490710888 nSectors 8 doioP 0x1806994D300
isDaemonAddr 0
0.000218672     37005   TRACE_FS: verify4KIO exit: code 4 err 0
0.000219106     37005   TRACE_NSD: nsdDoIO enter: read ioVecSize 1 1st
bufAddr 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nBytes 4096
isDaemonAddr 0
0.000219408     37005   TRACE_GCRYPTO: EncBufPool::getTmpBuf(): bsize=4096
code=205 err=0 bufP=0x180656C1058 outBufP=0x180656C1058 index=0
0.000220105     37005   TRACE_TS: sendMessage msg_id 22993695: dest <c0n2>
10.3.17.3 sto03
0.000220436     37005   TRACE_RDMA: verbs::verbsSend_i: enter: index 11
cookie 12 node <c0n2> msg_id 22993695 len 92
0.000221111     37005   TRACE_RDMA: verbsConn::postSend: currRpcHeadP
0x7F6D7FA106E8 sendType SEND_BUFFER_RPC index 11 cookie 12 threadId 37005
bufferId.index 11
0.000221662     37005   TRACE_MUTEX: Waiting on fast condvar for signal
0x7F6D7FA106F8 RdmaSend_NSD_SVC

0.000426716     16691   TRACE_RDMA: handleRecvComp: success 1 of 1
nWrSuccess 1 index 11 cookie 12 wr_id 0xB0E6E00000000 bufferP
0x7F6D7EEE1700 byte_len 4144

0.000432140     37005   TRACE_NSD: nsdDoIO_ReadAndCheck: read complete, len
0 status 6 err 0 bufP 0x180656C1058 dioIsOverRdma 1 ioDataP 0x200000BC000
ckSumType NsdCksum_None
0.000432163     37005   TRACE_NSD: nsdDoIO_ReadAndCheck: exit err 0
0.000433707     37005   TRACE_GCRYPTO: EncBufPool::releaseTmpBuf(): exit
bsize=8192 err=0 inBufP=0x180656C1058 bufP=0x180656C1058 index=0
0.000433777     37005   TRACE_NSD: nsdDoIO exit: err 0 0

0.000433844     37005   TRACE_IO: FIO: read data tag 743942 108137
ioVecSize 1 1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888
nSectors 8 err 0
0.000434236     37005   TRACE_DISK: postIO: qosid A00D91E read data disk
FFFFFFFF ioVecSize 1 1st buf 0x122F000 err 0 duration 0.000215000 by
iocMBHandler (DioHandlerThread)
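
As a sanity check, the QIO-to-FIO window in that trace lines up almost
exactly with the duration that postIO reports:

# Timestamps (seconds) copied from the QIO and FIO lines above.
qio = 0.000218424   # TRACE_IO QIO -- where I believe the iohist timer starts
fio = 0.000433844   # TRACE_IO FIO -- where I believe it stops
print("QIO->FIO window: %.1f us" % ((fio - qio) * 1e6))   # 215.4 us
# postIO reports "duration 0.000215000" (215.0 us), which matches nicely.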

I'd suggest looking at "mmdiag --iohist" on the NSD server itself and
seeing if/how it differs from the client. The other thing you could do is
check whether your NSD server queues are backing up (e.g. run "mmfsadm
saferdump nsd" and look for "requests pending" on queues where the
"active" field is > 0). Backed-up queues don't necessarily mean you need
to tune them, but if the disk I/O on your NSD server looks healthy (e.g.
low latency, not overly taxed) while requests are still pending, queue
tuning could well help.
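
If you'd rather not eyeball the whole dump, here's a quick Python sketch
to flag backed-up queues (the parsing is a guess at the dump layout, which
varies between releases):

import re
import subprocess

# Run as root on the NSD server. I'm assuming the queue stanzas carry
# "active N" and "requests pending N" tokens; adjust the patterns to
# whatever your release's dump actually prints.
dump = subprocess.run(["mmfsadm", "saferdump", "nsd"],
                      capture_output=True, text=True).stdout

for line in dump.splitlines():
    active = re.search(r"\bactive\s+(\d+)", line)
    pending = re.search(r"requests pending\s+(\d+)", line)
    if active and pending and int(active.group(1)) > 0 \
            and int(pending.group(1)) > 0:
        print(line.strip())   # queue with work in flight and requests waiting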

-Aaron

On Sat, Feb 16, 2019 at 9:47 AM Buterbaugh, Kevin L <
Kevin.Buterbaugh at vanderbilt.edu> wrote:

> Hi All,
>
> Been reading man pages, docs, and Googling, and haven’t found a definitive
> answer to this question, so I knew exactly where to turn… ;-)
>
> I’m dealing with some slow I/O’s to certain storage arrays in our
> environments … like really, really slow I/O’s … here’s just one example
> from one of my NSD servers of a 10 second I/O:
>
> 08:49:34.943186  W        data   30:41615622144   2048 10115.192  srv
> dm-92                  <client IP redacted>
>
> So here’s my question … when mmdiag --iohist tells me that that I/O took
> slightly over 10 seconds, is that:
>
> 1.  The time from when the NSD server received the I/O request from the
> client until it shipped the data back onto the wire towards the client?
> 2.  The time from when the client issued the I/O request until it received
> the data back from the NSD server?
> 3.  Something else?
>
> I’m thinking it’s #1, but want to confirm.  Which one it is has very
> obvious implications for our troubleshooting steps.  Thanks in advance…
>
> Kevin
>
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

