[gpfsug-discuss] gpfs performance monitoring
Orlando Richards
orlando.richards at ed.ac.uk
Thu Sep 4 14:54:37 BST 2014
On 04/09/14 14:32, Salvatore Di Nardo wrote:
> Sorry to bother you again, but dstat has some issues with the plugin:
>
> [root at gss01a util]# dstat --gpfs
> /usr/bin/dstat:1672: DeprecationWarning: os.popen3 is
> deprecated. Use the subprocess module.
> pipes[cmd] = os.popen3(cmd, 't', 0)
> Module dstat_gpfs failed to load. (global name 'select' is not
> defined)
> None of the stats you selected are available.
>
> I found this solution, but it involves recompiling dstat:
>
> https://github.com/dagwieers/dstat/issues/44
>
> Are you aware of any easier solution (we use RHEL 6.3)?
>
This worked for me the other day on a dev box I was poking at:
# rm /usr/share/dstat/dstat_gpfsops*
# cp /usr/lpp/mmfs/samples/util/dstat_gpfsops.py.dstat.0.7 /usr/share/dstat/dstat_gpfsops.py
# dstat --gpfsops
/usr/bin/dstat:1672: DeprecationWarning: os.popen3 is deprecated. Use
the subprocess module.
pipes[cmd] = os.popen3(cmd, 't', 0)
---------------------------gpfs-vfs-ops--------------------------#-----------------------------gpfs-disk-i/o-----------------------------
 cr  del op/cl   rd   wr trunc fsync looku gattr sattr other mb_rd mb_wr  pref wrbeh steal clean  sync revok logwr logda oth_r oth_w
  0    0     0    0    0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
...
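
If you want a rolling view rather than a one-off sample, dstat takes the
usual delay (and optional count) arguments, so something along these
lines should work with the plugin too:

# refresh the gpfsops counters every 5 seconds, 12 samples in total
dstat --gpfsops 5 12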
>
> Regards,
> Salvatore
>
> On 04/09/14 01:50, Sven Oehme wrote:
>> > Hello everybody,
>>
>> Hi
>>
>> > here I come again, this time to ask for some hints about how to
>> > monitor GPFS.
>> >
>> > I know about mmpmon, but the issue with its "fs_io_s" and "io_s" is
>> > that they return numbers based only on the requests made on the
>> > current host, so I would have to run them on all the clients (over
>> > 600 nodes), which is quite impractical. Instead, I would like to see
>> > from the servers what is going on, and I came across the vio_s
>> > statistics, which are less documented, so I don't know exactly what
>> > they mean. There is also the script
>> > /usr/lpp/mmfs/samples/vdisk/viostat, which runs vio_s.
>> >
>> > My problem with the output of this command:
>> > echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon -r 1
>> >
>> > mmpmon> mmpmon node 10.7.28.2 name gss01a vio_s OK VIOPS per second
>> > timestamp: 1409763206/477366
>> > recovery group: *
>> > declustered array: *
>> > vdisk: *
>> > client reads: 2584229
>> > client short writes: 55299693
>> > client medium writes: 190071
>> > client promoted full track writes: 465145
>> > client full track writes: 9249
>> > flushed update writes: 4187708
>> > flushed promoted full track writes: 123
>> > migrate operations: 114
>> > scrub operations: 450590
>> > log writes: 28509602
>> >
>> > It says "VIOPS per second", but the numbers look to me like plain
>> > counters: every time I re-run the command, they increase a bit.
>> > Can anyone confirm whether those numbers are counters or ops/sec?
>>
>> the numbers are cumulative, so every time you run the command you just
>> see the totals since start (or last reset) time.
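>>
>> if you want per-second rates, you have to sample twice and subtract.
>> As a rough sketch, mmpmon can do the repeated sampling for you (-d is
>> the delay in milliseconds between requests, -r the repeat count); you
>> then diff each counter pair between the two samples yourself:
>>
>> # two vio_s samples 10 seconds apart; subtracting each counter pair
>> # and dividing by 10 gives the average ops/sec over that interval
>> echo vio_s | /usr/lpp/mmfs/bin/mmpmon -d 10000 -r 2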
>>
>> >
>> > On closer inspection, I don't understand what most of those values
>> > mean. For example, what exactly is a "flushed promoted full track
>> > write"? I tried to find documentation about this output, but could
>> > not find any. Can anyone point me to a link where the vio_s output
>> > is explained?
>> >
>> > Another thing I don't understand about those numbers is whether they
>> > count operations or the number of blocks read/written/etc.
>>
>> they are just operations, and if I explained what the numbers mean I
>> might confuse you even more, because this is not what you are really
>> looking for. What you are looking for is what the client I/Os look
>> like on the server side, while the VIO layer sits between the server
>> and the disks, so one level lower than what you are after, from what I
>> can read out of the description above.
>>
>> So the layer you care about is the NSD server layer, which sits on top
>> of the VIO layer (which is essentially the software RAID layer in GNR).
>>
>> > I'm asking because, if they are just ops, I don't know how useful
>> > they can be. For example, one write operation could mean writing one
>> > block or writing a 100 GB file. If those are operations, is there a
>> > way to get the output in bytes or blocks?
>>
>> there are multiple ways to get information on the NSD layer. One would
>> be to use the dstat plugin (see /usr/lpp/mmfs/samples/util), but those
>> are counters again.
>>
>> The alternative is to use mmdiag --iohist. This shows you a history of
>> the last X I/O operations on either the client or the server side.
>> For example, on a client:
>>
>> # mmdiag --iohist
>>
>> === mmdiag: iohist ===
>>
>> I/O history:
>>
>> I/O start time   RW  Buf type    disk:sectorNum  nSec  time ms  qTime ms    RpcTimes ms  Type  Device/NSD ID      NSD server
>> ---------------  --  ----------  --------------  ----  -------  --------  -------------  ----  -----------------  ---------------
>> 14:25:22.169617  R   LLIndBlock  1:1075622848      64   13.073     0.000  12.959  0.063  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:22.182723  R   inode       1:1071252480       8    6.970     0.000   6.908  0.038  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:53.659918  R   LLIndBlock  1:1081202176      64    8.309     0.000   8.210  0.046  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:53.668262  R   inode       2:1081373696       8   14.117     0.000  14.032  0.058  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:53.682750  R   LLIndBlock  1:1065508736      64    9.254     0.000   9.180  0.038  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:53.692019  R   inode       2:1064356608       8   14.899     0.000  14.847  0.029  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:53.707100  R   inode       2:1077830152       8   16.499     0.000  16.449  0.025  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:53.723788  R   LLIndBlock  1:1081202432      64    4.280     0.000   4.203  0.040  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:53.728082  R   inode       2:1081918976       8    7.760     0.000   7.710  0.027  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:57.877416  R   metadata    2:678978560       16   13.343     0.000  13.254  0.053  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:57.891048  R   LLIndBlock  1:1065508608      64   15.491     0.000  15.401  0.058  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:57.906556  R   inode       2:1083476520       8   11.723     0.000  11.676  0.029  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:57.918516  R   LLIndBlock  1:1075622720      64    8.062     0.000   8.001  0.032  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:57.926592  R   inode       1:1076503480       8    8.087     0.000   8.043  0.026  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:57.934856  R   LLIndBlock  1:1071088512      64    6.572     0.000   6.510  0.033  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:57.941441  R   inode       2:1069885984       8   11.686     0.000  11.641  0.024  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:57.953294  R   inode       2:1083476936       8    8.951     0.000   8.912  0.021  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:57.965475  R   inode       1:1076503504       8    0.477     0.000   0.053  0.000  cli   C0A70401:53BEEA7F  192.167.4.1
>> 14:25:57.965755  R   inode       2:1083476488       8    0.410     0.000   0.061  0.321  cli   C0A70402:53BEEA5E  192.167.4.2
>> 14:25:57.965787  R   inode       2:1083476512       8    0.439     0.000   0.053  0.342  cli   C0A70402:53BEEA5E  192.167.4.2
>>
>> You basically see whether it was an inode or a data block, what size
>> it has (in sectors), which NSD server the request was sent to, etc.
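>>
>> Since nSec is the transfer size in sectors, you can also turn the
>> history into bytes. A rough sketch, assuming 512-byte sectors and the
>> column layout above (column 2 is R/W, column 5 is nSec):
>>
>> # total MiB read and written in the current iohist window
>> /usr/lpp/mmfs/bin/mmdiag --iohist | \
>>   awk '$2 == "R" || $2 == "W" { mb[$2] += $5 * 512 / 1048576 }
>>        END { printf "read: %.1f MiB  write: %.1f MiB\n", mb["R"], mb["W"] }'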
>>
>> On the server side you see the type, which physical disk the request
>> goes to, and what size of disk I/O it causes, like:
>>
>> 14:26:50.129995  R   inode   12:3211886376    64   14.261  0.000  0.000  0.000  pd  sdis
>> 14:26:50.137102  R   inode   19:3003969520    64    9.004  0.000  0.000  0.000  pd  sdad
>> 14:26:50.136116  R   inode   55:3591710992    64   11.057  0.000  0.000  0.000  pd  sdoh
>> 14:26:50.141510  R   inode   21:3066810504    64    5.909  0.000  0.000  0.000  pd  sdaf
>> 14:26:50.130529  R   inode   89:2962370072    64   17.437  0.000  0.000  0.000  pd  sddi
>> 14:26:50.131063  R   inode   78:1889457000    64   17.062  0.000  0.000  0.000  pd  sdsj
>> 14:26:50.143403  R   inode   36:3323035688    64    4.807  0.000  0.000  0.000  pd  sdmw
>> 14:26:50.131044  R   inode   37:2513579736   128   17.181  0.000  0.000  0.000  pd  sddv
>> 14:26:50.138181  R   inode   72:3868810400    64   10.951  0.000  0.000  0.000  pd  sdbz
>> 14:26:50.138188  R   inode  131:2443484784  128   11.792  0.000  0.000  0.000  pd  sdug
>> 14:26:50.138003  R   inode  102:3696843872   64   11.994  0.000  0.000  0.000  pd  sdgp
>> 14:26:50.137099  R   inode  145:3370922504   64   13.225  0.000  0.000  0.000  pd  sdmi
>> 14:26:50.141576  R   inode   62:2668579904    64    9.313  0.000  0.000  0.000  pd  sdou
>> 14:26:50.134689  R   inode  159:2786164648   64   16.577  0.000  0.000  0.000  pd  sdpq
>> 14:26:50.145034  R   inode   34:2097217320    64    7.409  0.000  0.000  0.000  pd  sdmt
>> 14:26:50.138140  R   inode  139:2831038792   64   14.898  0.000  0.000  0.000  pd  sdlw
>> 14:26:50.130954  R   inode  164:282120312    64   22.274  0.000  0.000  0.000  pd  sdzd
>> 14:26:50.137038  R   inode   41:3421909608    64   16.314  0.000  0.000  0.000  pd  sdef
>> 14:26:50.137606  R   inode  104:1870962416   64   16.644  0.000  0.000  0.000  pd  sdgx
>> 14:26:50.141306  R   inode   65:2276184264    64   16.593  0.000  0.000  0.000  pd  sdrk
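>>
>> Again as a rough sketch (same column assumptions as before, with the
>> physical disk name in the last field), you can aggregate the history
>> per disk to spot hot or slow drives:
>>
>> # ops and MiB per physical disk in the current iohist window
>> /usr/lpp/mmfs/bin/mmdiag --iohist | \
>>   awk '$2 ~ /^[RW]$/ && / pd / { n[$NF]++; s[$NF] += $5 }
>>        END { for (d in n)
>>                printf "%-6s %6d ops %8.1f MiB\n", d, n[d], s[d]*512/1048576 }'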
>>
>>
>> >
>> > Last but not least, and this is what I really would like to
>> > accomplish: I would like to be able to monitor the latency of
>> > metadata operations.
>>
>> you can't do this on the server side, as you don't know how much time
>> is spent on the client, the network, or anything else between the
>> application and the physical disk, so you can only reliably look at
>> this from the client. The iohist output only shows you the server disk
>> I/O processing time, but that can be a fraction of the overall time
>> (in other cases it can obviously also be the dominant part, depending
>> on your workload).
>>
>> the easiest way on the client is to run:
>>
>> mmfsadm vfsstats enable
>>
>> From then on, VFS stats are collected until you restart GPFS. You can
>> then dump whatever has been collected so far on that node; the output
>> looks like this:
>>
>> vfs statistics currently enabled
>> started at: Fri Aug 29 13:15:05.380 2014
>>   duration: 448446.970 sec
>>
>>  name                    calls  time per call     total time
>>  -------------------- -------- -------------- --------------
>>  statfs                      9       0.000002       0.000021
>>  startIO             246191176       0.005853 1441049.976740
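>>
>> The "time per call" column is just total time divided by calls (e.g.
>> 1441049.976740 / 246191176 is roughly 0.005853 s for startIO). As a
>> rough sketch, if you capture the dump into a file (vfsstats.out is
>> just a hypothetical name here), you can rank operations by total time:
>>
>> # keep only the 4-column stats rows, sort by total time, show the top
>> awk 'NF == 4 && $2 ~ /^[0-9]+$/' vfsstats.out | sort -k4 -nr | head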
>>
>> > In my environment there are users that literally overwhelm our
>> > storage with metadata requests, so even if there is no massive
>> > throughput and there are no huge waiters, any "ls" can take ages. I
>> > would like to be able to monitor metadata behaviour. Is there a way
>> > to do that from the NSD servers?
>>
>> not in any simple way; as described above, this really needs to be
>> measured from the client side.
>>
>> >
>> > Thanks in advance for any tip/help.
>> >
>> > Regards,
>> > Salvatore
>>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Dr Orlando Richards
Research Facilities (ECDF) Systems Leader
Information Services
IT Infrastructure Division
Tel: 0131 650 4994
skype: orlando.richards
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.