<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 04/09/14 01:50, Sven Oehme wrote:<br>
</div>
<blockquote
cite="mid:OFB11720D9.723B90AC-ON88257D49.00032BBC-88257D49.00049D93@us.ibm.com"
type="cite"><tt><font size="2">> Hello everybody,<br>
</font></tt>
<br>
<tt><font size="2">Hi</font></tt>
<br>
<br>
<tt><font size="2">> here i come here again, this time to ask
some
hint about how to monitor GPFS.<br>
> <br>
> I know about mmpmon, but the issue with its "fs_io_s" and
"io_s" is <br>
> that they return number based only on the request done in
the <br>
> current host, so i have to run them on all the clients (
over 600
<br>
> nodes) so its quite unpractical. Instead i would like to
know
from <br>
> the servers whats going on, and i came across the vio_s
statistics
<br>
> wich are less documented and i dont know exacly what they
mean. <br>
> There is also this script
"/usr/lpp/mmfs/samples/vdisk/viostat"
that<br>
> runs VIO_S.<br>
> <br>
> My problems with the output of this command:</font></tt>
<br>
<tt><font size="2">> echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon
-r 1<br>
> <br>
> mmpmon> mmpmon node 10.7.28.2 name gss01a vio_s OK
VIOPS per second<br>
> timestamp:
1409763206/477366<br>
> recovery group:
*<br>
> declustered array:
*<br>
> vdisk:
*<br>
> client reads:
2584229<br>
> client short writes:
55299693<br>
> client medium writes:
190071<br>
> client promoted full track writes: 465145<br>
> client full track writes:
9249<br>
> flushed update writes:
4187708<br>
> flushed promoted full track writes:
123<br>
> migrate operations:
114<br>
> scrub operations:
450590<br>
> log writes:
28509602</font></tt>
<br>
<tt><font size="2">> <br>
> it sais "VIOPS per second", but they seem to me just
counters
as <br>
> every time i re-run the command, the numbers increase by
a bit..
<br>
> Can anyone confirm if those numbers are counter or if
they are OPS/sec.<br>
</font></tt>
<br>
<tt><font size="2">the numbers are accumulative so everytime you
run
them they just show the value since start (or last reset)
time.</font></tt>
<br>
</blockquote>
OK, you confirmed my thoughts, thanks.<br>
<br>
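Since the counters are cumulative, a per-second rate can be derived by taking two samples and dividing the delta by the interval, which is essentially what the sample script /usr/lpp/mmfs/samples/vdisk/viostat does. A minimal hand-rolled sketch (the "client reads" field name is taken from the vio_s output above):<br>
<br>
<tt><font size="2">#!/bin/sh<br>
# Sketch: per-second rate of VIO "client reads" from two cumulative samples.<br>
INT=10<br>
a=$(echo vio_s | /usr/lpp/mmfs/bin/mmpmon -r 1 | awk -F: '/client reads/ {print $2}')<br>
sleep $INT<br>
b=$(echo vio_s | /usr/lpp/mmfs/bin/mmpmon -r 1 | awk -F: '/client reads/ {print $2}')<br>
echo "client reads/sec: $(( (b - a) / INT ))"</font></tt><br>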
<blockquote
cite="mid:OFB11720D9.723B90AC-ON88257D49.00032BBC-88257D49.00049D93@us.ibm.com"
type="cite">
<br>
<tt><font size="2">> <br>
> On a closer eye about i dont understand what most of
thosevalues <br>
> mean. For example, what exacly are "flushed promoted full
track
write" ?? <br>
> I tried to find a documentation about this output , but
could not
<br>
> find any. can anyone point me a link where output of
vio_s is explained?<br>
> <br>
> Another thing i dont understand about those numbers is if
they are
<br>
> just operations, or the number of blocks that was
read/write/etc .
<br>
</font></tt>
<br>
<tt><font size="2">its just operations and if i would explain what
the
numbers mean i might confuse you even more because this is not
what you
are really looking for. </font></tt>
<br>
<tt><font size="2">what you are looking for is what the client
io's look
like on the Server side, while the VIO layer is the Server
side to the
disks, so one lever lower than what you are looking for from
what i could
read out of the description above. <br>
</font></tt></blockquote>
No, what I'm looking for is exactly how busy the disks are in serving the requests. Obviously that's not the only thing I'm looking at, but I feel the need to monitor <u><b>also</b></u> those things. I'll explain why. <br>
<br>
It happens, when our storage is quite busy (180Gb/s of read/write), that the FS starts to be slow in normal <i><b>cd</b></i> or <i><b>ls</b></i> requests. This might be normal, but in those situations I want to know where the bottleneck is. Is it the server CPU? Memory? Network? Spindles? Knowing where the bottleneck is might help me understand whether we can tweak the system a bit more.<br>
<br>
If it's the CPU on the servers, then there is not much to do besides replacing them or adding more servers. If it's not the CPU, maybe more memory would help? Maybe it's just the network that filled up, so I can add more links. <br>
<br>
Or, if we have reached the point where the bottleneck is the spindles, then there is not much point in looking somewhere else; we have just reached the hardware limit.<br>
<br>
Sometimes it also happens that there is very low I/O (10Gb/s), almost no CPU usage on the servers, but huge slowness (ls can take 10 seconds). Why does that happen? There are not many data ops, but we think there is a huge amount of metadata ops. So what I want to know is whether the metadata vdisks are busy or not. If this is our problem, could some SSD disks dedicated to metadata help? <br>
<br>
<br>
In particular, I'm a bit puzzled by the design of our GSS storage.<br>
Each recovery group has 3 declustered arrays, and each declustered array has 1 data and 1 metadata vdisk, but in the end both metadata and data vdisks use the same spindles. The problem is that I don't understand whether we have a metadata bottleneck there. Maybe some SSD disks in a dedicated declustered array would perform much better, but this is just theory. I really would like to be able to monitor I/O activity on the metadata vdisks.<br>
<br>
<br>
<br>
<blockquote
cite="mid:OFB11720D9.723B90AC-ON88257D49.00032BBC-88257D49.00049D93@us.ibm.com"
type="cite">
<br>
<br>
<tt><font size="2">so the Layer you care about is the NSD Server
layer,
which sits on top of the VIO layer (which is essentially the
SW RAID Layer
in GNR) </font></tt>
<br>
<br>
<tt><font size="2">> I'm asking that because if they are just
ops,
i don't know how much <br>
> they could be usefull. For example one write operation
could eman
<br>
> write 1 block or write a file of 100GB. If those are
oprations, <br>
> there is a way to have the oupunt in bytes or blocks?</font></tt>
<br>
<br>
<tt><font size="2">there are multiple ways to get infos on the NSD
layer,
one would be to use the dstat plugin (see
/usr/lpp/mmfs/sample/util) but
thats counts again. </font></tt>
<br>
</blockquote>
<br>
Counters are not a problem. I can collect them and create some graphs in a monitoring tool. I will check that.<br>
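One way to feed them in: mmpmon's -p flag produces machine-parseable output, so a cron job can scrape the io_s byte counters and emit them in "metric value timestamp" form (graphite plaintext style here, just as an example; adjust for your tool). A sketch, where _br_/_bw_ are the bytes-read/bytes-written fields of the parseable output:<br>
<br>
<tt><font size="2">#!/bin/sh<br>
# Sketch: emit GPFS io_s byte counters as "metric value timestamp" lines.<br>
host=$(hostname -s)<br>
ts=$(date +%s)<br>
echo io_s | /usr/lpp/mmfs/bin/mmpmon -p -r 1 |<br>
awk -v h="$host" -v t="$ts" '/^_io_s_/ {<br>
for (i = 1; i &lt; NF; i++) {<br>
if ($i == "_br_") print "gpfs." h ".bytes_read " $(i+1) " " t;<br>
if ($i == "_bw_") print "gpfs." h ".bytes_written " $(i+1) " " t;<br>
}<br>
}'</font></tt><br>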
<br>
<blockquote
cite="mid:OFB11720D9.723B90AC-ON88257D49.00032BBC-88257D49.00049D93@us.ibm.com"
type="cite">
<br>
<tt><font size="2">the alternative option is to use mmdiag
--iohist.
this shows you a history of the last X numbers of io
operations on either
the client or the server side like on a client : </font></tt>
<br>
<br>
<tt><font size="2"># mmdiag --iohist</font></tt>
<br>
<br>
<tt><font size="2">=== mmdiag: iohist ===</font></tt>
<br>
<br>
<tt><font size="2">I/O history:</font></tt>
<br>
<br>
<tt><font size="2"> I/O start time RW Buf type disk:sectorNum
nSec time ms qTime ms RpcTimes
ms Type Device/NSD ID NSD server</font></tt>
<br>
<tt><font size="2">--------------- -- -----------
----------------- -----
------- -------- ----------------- ---- ------------------
---------------</font></tt>
<br>
<tt><font size="2">14:25:22.169617 R LLIndBlock 1:1075622848
64 13.073 0.000 12.959
0.063 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:22.182723 R inode
1:1071252480 8 6.970
0.000 6.908 0.038 cli C0A70401:53BEEA7F
192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:53.659918 R LLIndBlock 1:1081202176
64 8.309 0.000 8.210
0.046 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:53.668262 R inode
2:1081373696 8 14.117 0.000
14.032 0.058 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:53.682750 R LLIndBlock 1:1065508736
64 9.254 0.000 9.180
0.038 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:53.692019 R inode
2:1064356608 8 14.899 0.000
14.847 0.029 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:53.707100 R inode
2:1077830152 8 16.499 0.000
16.449 0.025 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:53.723788 R LLIndBlock 1:1081202432
64 4.280 0.000 4.203
0.040 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:53.728082 R inode
2:1081918976 8 7.760
0.000 7.710 0.027 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:57.877416 R metadata
2:678978560 16 13.343 0.000
13.254 0.053 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:57.891048 R LLIndBlock 1:1065508608
64 15.491 0.000 15.401
0.058 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:57.906556 R inode
2:1083476520 8 11.723 0.000
11.676 0.029 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:57.918516 R LLIndBlock 1:1075622720
64 8.062 0.000 8.001
0.032 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:57.926592 R inode
1:1076503480 8 8.087
0.000 8.043 0.026 cli C0A70401:53BEEA7F
192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:57.934856 R LLIndBlock 1:1071088512
64 6.572 0.000 6.510
0.033 cli C0A70401:53BEEA7F 192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:57.941441 R inode
2:1069885984 8 11.686 0.000
11.641 0.024 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:57.953294 R inode
2:1083476936 8 8.951
0.000 8.912 0.021 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:57.965475 R inode
1:1076503504 8 0.477
0.000 0.053 0.000 cli C0A70401:53BEEA7F
192.167.4.1</font></tt>
<br>
<tt><font size="2">14:25:57.965755 R inode
2:1083476488 8 0.410
0.000 0.061 0.321 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<tt><font size="2">14:25:57.965787 R inode
2:1083476512 8 0.439
0.000 0.053 0.342 cli C0A70402:53BEEA5E
192.167.4.2</font></tt>
<br>
<br>
<tt><font size="2">you basically see if its a inode , data block ,
what
size it has (in sectors) , which nsd server you did send this
request to,
etc. </font></tt>
<br>
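For instance, the average service time this client sees per NSD server can be pulled out of that output with a little awk. A sketch; the column positions follow the sample above (column 6 = time in ms, column 10 = type, column 12 = NSD server):<br>
<br>
<tt><font size="2">/usr/lpp/mmfs/bin/mmdiag --iohist |<br>
awk '$10 == "cli" { n[$12]++; ms[$12] += $6 }<br>
END { for (s in n) printf "%-15s %6d ops %8.3f ms avg\n", s, n[s], ms[s]/n[s] }'</font></tt><br>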
<br>
<tt><font size="2">on the Server side you see the type , which
physical
disk it goes to and also what size of disk i/o it causes like
: </font></tt>
<br>
<br>
<tt><font size="2">14:26:50.129995 R inode
12:3211886376 64 14.261 0.000
0.000 0.000 pd sdis</font></tt>
<br>
<tt><font size="2">14:26:50.137102 R inode
19:3003969520 64 9.004 0.000
0.000 0.000 pd sdad</font></tt>
<br>
<tt><font size="2">14:26:50.136116 R inode
55:3591710992 64 11.057 0.000
0.000 0.000 pd sdoh</font></tt>
<br>
<tt><font size="2">14:26:50.141510 R inode
21:3066810504 64 5.909 0.000
0.000 0.000 pd sdaf</font></tt>
<br>
<tt><font size="2">14:26:50.130529 R inode
89:2962370072 64 17.437 0.000
0.000 0.000 pd sddi</font></tt>
<br>
<tt><font size="2">14:26:50.131063 R inode
78:1889457000 64 17.062 0.000
0.000 0.000 pd sdsj</font></tt>
<br>
<tt><font size="2">14:26:50.143403 R inode
36:3323035688 64 4.807 0.000
0.000 0.000 pd sdmw</font></tt>
<br>
<tt><font size="2">14:26:50.131044 R inode
37:2513579736 128 17.181 0.000
0.000 0.000 pd sddv</font></tt>
<br>
<tt><font size="2">14:26:50.138181 R inode
72:3868810400 64 10.951 0.000
0.000 0.000 pd sdbz</font></tt>
<br>
<tt><font size="2">14:26:50.138188 R inode
131:2443484784 128 11.792 0.000
0.000 0.000 pd sdug</font></tt>
<br>
<tt><font size="2">14:26:50.138003 R inode
102:3696843872 64 11.994 0.000
0.000 0.000 pd sdgp</font></tt>
<br>
<tt><font size="2">14:26:50.137099 R inode
145:3370922504 64 13.225 0.000
0.000 0.000 pd sdmi</font></tt>
<br>
<tt><font size="2">14:26:50.141576 R inode
62:2668579904 64 9.313 0.000
0.000 0.000 pd sdou</font></tt>
<br>
<tt><font size="2">14:26:50.134689 R inode
159:2786164648 64 16.577 0.000
0.000 0.000 pd sdpq</font></tt>
<br>
<tt><font size="2">14:26:50.145034 R inode
34:2097217320 64 7.409 0.000
0.000 0.000 pd sdmt</font></tt>
<br>
<tt><font size="2">14:26:50.138140 R inode
139:2831038792 64 14.898 0.000
0.000 0.000 pd sdlw</font></tt>
<br>
<tt><font size="2">14:26:50.130954 R inode
164:282120312 64 22.274 0.000
0.000 0.000 pd sdzd</font></tt>
<br>
<tt><font size="2">14:26:50.137038 R inode
41:3421909608 64 16.314 0.000
0.000 0.000 pd sdef</font></tt>
<br>
<tt><font size="2">14:26:50.137606 R inode
104:1870962416 64 16.644 0.000
0.000 0.000 pd sdgx</font></tt>
<br>
<tt><font size="2">14:26:50.141306 R inode
65:2276184264 64 16.593 0.000
0.000 0.000 pd sdrk</font></tt>
<br>
<br>
<br>
</blockquote>
<br>
mmdiag --iohist is another thing I looked at, but I could not find a good explanation of all the "buf type" values (third column):<br>
<br>
<blockquote>
<blockquote>
<blockquote><tt>allocSeg</tt><br>
<tt>data</tt><br>
<tt>iallocSeg</tt><br>
<tt>indBlock</tt><br>
<tt>inode</tt><br>
<tt>LLIndBlock</tt><br>
<tt>logData</tt><br>
<tt>logDesc</tt><br>
<tt>logWrap</tt><br>
<tt>metadata</tt><br>
<tt>vdiskAULog</tt><br>
<tt>vdiskBuf</tt><br>
<tt>vdiskFWLog</tt><br>
<tt>vdiskMDLog</tt><br>
<tt>vdiskMeta</tt><br>
<tt>vdiskRGDesc</tt><br>
</blockquote>
</blockquote>
</blockquote>
If I want to monitor metadata operations, what should I look at? Just the metadata flag, or also inode? This command also takes long to run; especially when I run it a second time, it hangs for quite a while before running again, so I'm not sure that running it every 30 secs or every minute is viable, but I will look into that too. Is there any documentation that clearly describes the whole output? What I found is quite generic and doesn't go into details...<br>
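In the meantime, a rough way to watch only the metadata-related entries is to filter on the buf type column. Which of those types count as "metadata" is exactly the open question; the selection below is a guess covering the obvious inode/indirect-block/allocation types, so treat it as an assumption:<br>
<br>
<tt><font size="2"># Sketch: ops and average latency per metadata-related buf type.<br>
# The type list is an assumption; column 3 = buf type, column 6 = time in ms.<br>
/usr/lpp/mmfs/bin/mmdiag --iohist |<br>
awk '$3 ~ /^(inode|metadata|indBlock|LLIndBlock|allocSeg|iallocSeg)$/ {<br>
n[$3]++; ms[$3] += $6<br>
}<br>
END { for (t in n) printf "%-12s %8d ops %8.3f ms avg\n", t, n[t], ms[t]/n[t] }'</font></tt><br>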
<br>
<blockquote
cite="mid:OFB11720D9.723B90AC-ON88257D49.00032BBC-88257D49.00049D93@us.ibm.com"
type="cite"><tt><font size="2">> <br>
> Last but not least.. and this is what i really would like
to <br>
> accomplish, i would to be able to monitor the latency of
metadata
operations. <br>
</font></tt>
<br>
<tt><font size="2">you can't do this on the server side as you
don't
know how much time you spend on the client , network or
anything between
the app and the physical disk, so you can only reliably look
at this from
the client, the iohist output only shows you the Server disk
i/o processing
time, but that can be a fraction of the overall time (in other
cases this
obviously can also be the dominant part depending on your
workload).</font></tt>
<br>
<br>
<tt><font size="2">the easiest way on the client is to run </font></tt>
<br>
<br>
<tt><font size="2">mmfsadm vfsstats enable</font></tt>
<br>
<tt><font size="2">from now on vfs stats are collected until you
restart
GPFS. </font></tt>
<br>
<br>
<tt><font size="2">then run :</font></tt>
<br>
<br>
<tt><font size="2">vfs statistics currently enabled</font></tt>
<br>
<tt><font size="2">started at: Fri Aug 29 13:15:05.380 2014</font></tt>
<br>
<tt><font size="2"> duration: 448446.970 sec</font></tt>
<br>
<br>
<tt><font size="2"> name
calls time per call total
time</font></tt>
<br>
<tt><font size="2"> -------------------- -------- --------------
--------------</font></tt>
<br>
<tt><font size="2"> statfs
9 0.000002
0.000021</font></tt>
<br>
<tt><font size="2"> startIO
246191176 0.005853 1441049.976740</font></tt>
<br>
<br>
<tt><font size="2">to dump what ever you collected so far on this
node.
</font></tt>
<br>
<br>
</blockquote>
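For trending rather than one-off dumps, a small wrapper can snapshot those counters on an interval. A sketch; note that the dump subcommand used here, <tt>mmfsadm vfsstats show</tt>, is an assumption, since the exact command was not shown above:<br>
<br>
<tt><font size="2">#!/bin/sh<br>
# Sketch: timestamped VFS-statistics snapshots every 60s, for later graphing.<br>
# ASSUMPTION: "mmfsadm vfsstats show" is the command that prints the table above.<br>
while true; do<br>
date +%s<br>
/usr/lpp/mmfs/bin/mmfsadm vfsstats show<br>
sleep 60<br>
done >> /var/tmp/vfsstats.log</font></tt><br>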
<br>
We already do that, but as I said, I want to check specifically how the GSS servers are keeping up with the requests, to identify or exclude server-side bottlenecks.<br>
<br>
<br>
Thanks for your help; you definitely gave me a few things to look at.<br>
<br>
Salvatore<br>
<br>
</body>
</html>