<html><body>Hi Uwe -<BR>
<BR>
I hit the issue again today: no major waiters, nothing useful from the iohist report, and nothing interesting in the logs either.<BR>
<BR>
I was able to get a trace today while the issue was happening. I took two traces a few minutes apart.<BR>
<BR>
The beginning of each trace looks something like this:<BR>
Overwrite trace parameters:<BR>
buffer size: 134217728<BR>
64 kernel trace streams, indices 0-63 (selected by low bits of processor ID)<BR>
128 daemon trace streams, indices 64-191 (selected by low bits of thread ID)<BR>
Interval for calibrating clock rate was 100.019054 seconds and 260049296314 cycles<BR>
Measured cycle count update rate to be 2599997559 per second <---- using this value<BR>
OS reported cycle count update rate as 2599999000 per second<BR>
Trace milestones:<BR>
kernel trace enabled Thu Nov 12 20:56:40.114080000 2020 (TOD 1605232600.114080, cycles 20701293141385220)<BR>
daemon trace enabled Thu Nov 12 20:56:40.247430000 2020 (TOD 1605232600.247430, cycles 20701293488095152)<BR>
all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 1605232699.950515, cycles 20701552715873212) <---- useful part of trace extends from here<BR>
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, cycles 20701553190681534) <---- to here<BR>
Approximate number of times the trace buffer was filled: 553.529<BR>
<BR>
<BR>
Here is the output of<BR>
trsum.awk details=0<BR>
I'm not quite sure what to make of it; can you help me decipher it? The 'lookup' operations are taking a hell of a long time; what does that mean?<BR>
<BR>
Capture 1<BR>
<BR>
Unfinished operations:<BR>
<BR>
21234 ***************** lookup ************** 0.165851604<BR>
290020 ***************** lookup ************** 0.151032241<BR>
302757 ***************** lookup ************** 0.168723402<BR>
301677 ***************** lookup ************** 0.070016530<BR>
230983 ***************** lookup ************** 0.127699082<BR>
21233 ***************** lookup ************** 0.060357257<BR>
309046 ***************** lookup ************** 0.157124551<BR>
301643 ***************** lookup ************** 0.165543982<BR>
304042 ***************** lookup ************** 0.172513838<BR>
167794 ***************** lookup ************** 0.056056815<BR>
189680 ***************** lookup ************** 0.062022237<BR>
362216 ***************** lookup ************** 0.072063619<BR>
406314 ***************** lookup ************** 0.114121838<BR>
167776 ***************** lookup ************** 0.114899642<BR>
303016 ***************** lookup ************** 0.144491120<BR>
290021 ***************** lookup ************** 0.142311603<BR>
167762 ***************** lookup ************** 0.144240366<BR>
248530 ***************** lookup ************** 0.168728131<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\00000000FFFFFFFF<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFE<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\00000000FFFFFFFF<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 30018014000 14:48493092752^\FFFFFFFF<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFE<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 2000006B000 2:6058336^\FFFFFFFF<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFE<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\00000000FFFFFFFF<BR>
0 0.000000000 ********* Unfinished IO: buffer/disk 3000002E000 2:1917676744^\FFFFFFFF<BR>
<BR>
Elapsed trace time: 0.182617894 seconds<BR>
Elapsed trace time from first VFS call to last: 0.182617893<BR>
Time idle between VFS calls: 0.000006317 seconds<BR>
<BR>
Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs<BR>
rdwr 0.012021696 35 343.477<BR>
read_inode2 0.000100787 43 2.344<BR>
follow_link 0.000050609 8 6.326<BR>
pagein 0.000097806 10 9.781<BR>
revalidate 0.000010884 156 0.070<BR>
open 0.001001824 18 55.657<BR>
lookup 1.152449696 36 32012.492<BR>
delete_inode 0.000036816 38 0.969<BR>
permission 0.000080574 14 5.755<BR>
release 0.000470096 18 26.116<BR>
mmap 0.000340095 9 37.788<BR>
llseek 0.000001903 9 0.211<BR>
<BR>
<BR>
User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops<BR>
221919 0.000000244 0.050064080 0.00% 100.00% 4<BR>
167794 0.000011891 0.000069707 14.57% 85.43% 4<BR>
309046 0.147664569 0.000074663 99.95% 0.05% 9<BR>
349767 0.000000070 0.000000000 100.00% 0.00% 1<BR>
301677 0.017638372 0.048741086 26.57% 73.43% 12<BR>
84407 0.000010448 0.000016977 38.10% 61.90% 3<BR>
406314 0.000002279 0.000122367 1.83% 98.17% 7<BR>
25464 0.043270937 0.000006200 99.99% 0.01% 2<BR>
362216 0.000005617 0.000017498 24.30% 75.70% 2<BR>
379982 0.000000626 0.000000000 100.00% 0.00% 1<BR>
230983 0.123947465 0.000056796 99.95% 0.05% 6<BR>
21233 0.047877661 0.004887113 90.74% 9.26% 17<BR>
302757 0.154486003 0.010695642 93.52% 6.48% 24<BR>
248530 0.000006763 0.000035442 16.02% 83.98% 3<BR>
303016 0.014678039 0.000013098 99.91% 0.09% 2<BR>
301643 0.088025575 0.054036566 61.96% 38.04% 33<BR>
3339 0.000034997 0.178199426 0.02% 99.98% 35<BR>
21234 0.164240073 0.000262711 99.84% 0.16% 39<BR>
167762 0.000011886 0.000041865 22.11% 77.89% 3<BR>
336006 0.000001246 0.100519562 0.00% 100.00% 16<BR>
304042 0.121322325 0.019218406 86.33% 13.67% 33<BR>
301644 0.054325242 0.087715613 38.25% 61.75% 37<BR>
301680 0.000015005 0.020838281 0.07% 99.93% 9<BR>
290020 0.147713357 0.000121422 99.92% 0.08% 19<BR>
290021 0.000476072 0.000085833 84.72% 15.28% 10<BR>
44777 0.040819757 0.000010957 99.97% 0.03% 3<BR>
189680 0.000000044 0.000002376 1.82% 98.18% 1<BR>
241759 0.000000698 0.000000000 100.00% 0.00% 1<BR>
184839 0.000001621 0.150341986 0.00% 100.00% 28<BR>
362220 0.000010818 0.000020949 34.05% 65.95% 2<BR>
104687 0.000000495 0.000000000 100.00% 0.00% 1<BR>
<BR>
# total App-read/write = 45 Average duration = 0.000269322 sec<BR>
# time(sec) count % %ile read write avgBytesR avgBytesW<BR>
0.000500 34 0.755556 0.755556 34 0 32889 0<BR>
0.001000 10 0.222222 0.977778 10 0 108136 0<BR>
0.004000 1 0.022222 1.000000 1 0 8 0<BR>
<BR>
# max concurrant App-read/write = 2<BR>
# conc count % %ile<BR>
1 38 0.844444 0.844444<BR>
2 7 0.155556 1.000000<BR>
<BR>
<BR>
Capture 2<BR>
<BR>
Unfinished operations:<BR>
<BR>
335096 ***************** lookup ************** 0.289127895<BR>
334691 ***************** lookup ************** 0.225380797<BR>
362246 ***************** lookup ************** 0.052106493<BR>
334694 ***************** lookup ************** 0.048567769<BR>
362220 ***************** lookup ************** 0.054825580<BR>
333972 ***************** lookup ************** 0.275355791<BR>
406314 ***************** lookup ************** 0.283219905<BR>
334686 ***************** lookup ************** 0.285973208<BR>
289606 ***************** lookup ************** 0.064608288<BR>
21233 ***************** lookup ************** 0.074923689<BR>
189680 ***************** lookup ************** 0.089702578<BR>
335100 ***************** lookup ************** 0.151553955<BR>
334685 ***************** lookup ************** 0.117808430<BR>
167700 ***************** lookup ************** 0.119441314<BR>
336813 ***************** lookup ************** 0.120572137<BR>
334684 ***************** lookup ************** 0.124718126<BR>
21234 ***************** lookup ************** 0.131124745<BR>
84407 ***************** lookup ************** 0.132442945<BR>
334696 ***************** lookup ************** 0.140938740<BR>
335094 ***************** lookup ************** 0.201637910<BR>
167735 ***************** lookup ************** 0.164059859<BR>
334687 ***************** lookup ************** 0.252930745<BR>
334695 ***************** lookup ************** 0.278037098<BR>
341818 0.291815990 ********* Unfinished IO: buffer/disk 50000015000 3:439888512^\scratch_metadata_5<BR>
341818 0.291822084 MSG FSnd: nsdMsgReadExt msg_id 2894690129 Sduration 199.688 + us<BR>
100041 0.025061905 MSG FRep: nsdMsgReadExt msg_id 1012644746 Rduration 266959.867 us Rlen 0 Hduration 266963.954 + us<BR>
<BR>
Elapsed trace time: 0.292021772 seconds<BR>
Elapsed trace time from first VFS call to last: 0.292021771<BR>
Time idle between VFS calls: 0.001436519 seconds<BR>
<BR>
Operations stats: total time(s) count avg-usecs wait-time(s) avg-usecs<BR>
rdwr 0.000831801 4 207.950<BR>
read_inode2 0.000082347 31 2.656<BR>
pagein 0.000033905 3 11.302<BR>
revalidate 0.000013109 156 0.084<BR>
open 0.000237969 22 10.817<BR>
lookup 1.233407280 10 123340.728<BR>
delete_inode 0.000013877 33 0.421<BR>
permission 0.000046486 8 5.811<BR>
release 0.000172456 21 8.212<BR>
mmap 0.000064411 2 32.206<BR>
llseek 0.000000391 2 0.196<BR>
readdir 0.000213657 36 5.935<BR>
<BR>
<BR>
User thread stats: GPFS-time(sec) Appl-time GPFS-% Appl-% Ops<BR>
335094 0.053506265 0.000170270 99.68% 0.32% 16<BR>
167700 0.000008522 0.000027547 23.63% 76.37% 2<BR>
167776 0.000008293 0.000019462 29.88% 70.12% 2<BR>
334684 0.000023562 0.000160872 12.78% 87.22% 8<BR>
349767 0.000000467 0.250029787 0.00% 100.00% 5<BR>
84407 0.000000230 0.000017947 1.27% 98.73% 2<BR>
334685 0.000028543 0.000094147 23.26% 76.74% 8<BR>
406314 0.221755229 0.000009720 100.00% 0.00% 2<BR>
334694 0.000024913 0.000125229 16.59% 83.41% 10<BR>
335096 0.254359005 0.000240785 99.91% 0.09% 18<BR>
334695 0.000028966 0.000127823 18.47% 81.53% 10<BR>
334686 0.223770082 0.000267271 99.88% 0.12% 24<BR>
334687 0.000031265 0.000132905 19.04% 80.96% 9<BR>
334696 0.000033808 0.000131131 20.50% 79.50% 9<BR>
129075 0.000000102 0.000000000 100.00% 0.00% 1<BR>
341842 0.000000318 0.000000000 100.00% 0.00% 1<BR>
335100 0.059518133 0.000287934 99.52% 0.48% 19<BR>
224423 0.000000471 0.000000000 100.00% 0.00% 1<BR>
336812 0.000042720 0.000193294 18.10% 81.90% 10<BR>
21233 0.000556984 0.000083399 86.98% 13.02% 11<BR>
289606 0.000000088 0.000018043 0.49% 99.51% 2<BR>
362246 0.014440188 0.000046516 99.68% 0.32% 4<BR>
21234 0.000524848 0.000162353 76.37% 23.63% 13<BR>
336813 0.000046426 0.000175666 20.90% 79.10% 9<BR>
3339 0.000011816 0.272396876 0.00% 100.00% 29<BR>
341818 0.000000778 0.000000000 100.00% 0.00% 1<BR>
167735 0.000007866 0.000049468 13.72% 86.28% 3<BR>
175480 0.000000278 0.000000000 100.00% 0.00% 1<BR>
336006 0.000001170 0.250020470 0.00% 100.00% 16<BR>
44777 0.000000367 0.250149757 0.00% 100.00% 6<BR>
189680 0.000002717 0.000006518 29.42% 70.58% 1<BR>
184839 0.000003001 0.250144214 0.00% 100.00% 35<BR>
145858 0.000000687 0.000000000 100.00% 0.00% 1<BR>
333972 0.218656404 0.000043897 99.98% 0.02% 4<BR>
334691 0.187695040 0.000295117 99.84% 0.16% 25<BR>
<BR>
# total App-read/write = 7 Average duration = 0.000123672 sec<BR>
# time(sec) count % %ile read write avgBytesR avgBytesW<BR>
0.000500 7 1.000000 1.000000 7 0 1172 0<BR>
<BR>
<BR>
<BR>
<BR>
-----Original Message-----<BR>
From: <a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a> <gpfsug-discuss-bounces@spectrumscale.org> On Behalf Of Uwe Falke<BR>
Sent: Wednesday, November 11, 2020 8:57 PM<BR>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org><BR>
Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process<BR>
<BR>
Hi, Kamil,<BR>
I suppose you'd rather avoid such an issue altogether than pursue the ugly work-around of killing off processes.<BR>
In such situations, the first places to look are the GPFS log (on the client, on the cluster manager, and maybe on the file system manager) and the current waiters (that is, the list of currently waiting threads) on the hanging client.<BR>
<BR>
-> /var/adm/ras/mmfs.log.latest<BR>
mmdiag --waiters<BR>
<BR>
That might give you a first idea what is taking long and which components are involved.<BR>
<BR>
Also,<BR>
mmdiag --iohist<BR>
shows you the last IOs and some stats (service time, size) for them.<BR>
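<BR>
If the hang comes and goes, one simple option is to sample both repeatedly while it is happening and look through the captures afterwards, e.g. something like this (the interval and output file are arbitrary choices):<BR>
while true; do date; mmdiag --waiters; mmdiag --iohist; sleep 5; done >> /tmp/gpfs-hang-samples.txt<BR>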
<BR>
Either that clue is already sufficient, or you go on (if you see DIO somewhere, direct IO is in use, which might slow things down, for example).<BR>
GPFS has a nice tracing facility which you can configure, or you can just run the default trace.<BR>
<BR>
Running a dedicated (low-level) IO trace can be achieved by<BR>
mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N <your_critical_node><BR>
then, when the issue is seen, stop the trace by<BR>
mmtracectl --stop -N <your_critical_node><BR>
<BR>
Do not wait long to stop the trace once you've seen the issue; the trace file cyclically overwrites its output. If the issue lasts some time, you could also start the trace while you see it, run it for, say, 20 secs, and stop it again. On stopping the trace, the output gets converted into an ASCII trace file named trcrpt.* (usually in /tmp/mmfs; check the command output).<BR>
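<BR>
Put together, a rough capture sequence could look like this (the node name is just a placeholder, and 20 seconds is an arbitrary choice):<BR>
mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N client01<BR>
sleep 20   # while the issue is visible<BR>
mmtracectl --stop -N client01<BR>
ls /tmp/mmfs/trcrpt.*   # the converted ASCII trace report; check the mmtracectl output for the exact location<BR>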
<BR>
<BR>
There you should see lines with FIO, which carry the inode of the related file after the "tag" keyword.<BR>
Example:<BR>
0.000745100 25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150<BR>
<BR>
-> inode is 248415<BR>
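<BR>
If many FIO lines show up, something along these lines should list the busiest inodes (the position of the "tag" field is assumed from the example line above, so adjust it to your trcrpt format):<BR>
grep 'FIO:' /tmp/mmfs/trcrpt.* | awk '{ for (i = 1; i < NF; i++) if ($i == "tag") print $(i+1) }' | sort -n | uniq -c | sort -rn | head<BR>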
<BR>
There is a utility, tsfindinode, to translate that into the file path.<BR>
You need to build it first if not yet done:<BR>
cd /usr/lpp/mmfs/samples/util ; make<BR>
then run<BR>
./tsfindinode -i <inode_num> <fs_mount_point><BR>
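<BR>
Feeding the inode numbers from the trace into that could then be as simple as this (the mount point and the second inode number are made-up examples):<BR>
cd /usr/lpp/mmfs/samples/util && make<BR>
for ino in 248415 123456; do ./tsfindinode -i "$ino" /gpfs/fs1; done<BR>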
<BR>
For the IO trace analysis there is an older tool:<BR>
/usr/lpp/mmfs/samples/debugtools/trsum.awk.<BR>
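<BR>
An invocation along these lines should produce the summary (the trcrpt file name is a placeholder):<BR>
awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk /tmp/mmfs/trcrpt.<timestamp>.<node> > trsum.out<BR>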
<BR>
Then there is some new stuff I've not yet used in /usr/lpp/mmfs/samples/traceanz/ (always check the README).<BR>
<BR>
Hope that helps a bit.<BR>
<BR>
Mit freundlichen Grüßen / Kind regards<BR>
<BR>
Dr. Uwe Falke<BR>
IT Specialist<BR>
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services<BR>
+49 175 575 2877 Mobile<BR>
Rathausstr. 7, 09111 Chemnitz, Germany<BR>
<a href="mailto:uwefalke@de.ibm.com">uwefalke@de.ibm.com</a><BR>
<BR>
IBM Services<BR>
<BR>
IBM Data Privacy Statement<BR>
<BR>
IBM Deutschland Business & Technology Services GmbH<BR>
Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Gesellschaft: Ehningen<BR>
Registergericht: Amtsgericht Stuttgart, HRB 17122<BR>
<BR>
<BR>
<BR>
From: "Czauz, Kamil" <Kamil.Czauz@Squarepoint-Capital.com><BR>
To: "gpfsug-discuss@spectrumscale.org"<BR>
<gpfsug-discuss@spectrumscale.org><BR>
Date: 11/11/2020 23:36<BR>
Subject: [EXTERNAL] [gpfsug-discuss] Poor client performance with<BR>
high cpu usage of mmfsd process<BR>
Sent by: <a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a><BR>
<BR>
<BR>
<BR>
We regularly run into performance issues on our clients where the client seems to hang when accessing any gpfs mount; even something simple like an ls can take a few minutes to complete. This affects every gpfs mount on the client, but other clients are working just fine. Also, the mmfsd process at this point is spinning at something like 300-500% CPU.<BR>
<BR>
The only way I have found to solve this is by killing processes that may be doing heavy i/o to the gpfs mounts - but this is more of an art than a science. I often end up killing many processes before finding the offending one.<BR>
<BR>
My question is really about finding the offending process more easily. Is there something similar to iotop, or a trace I can enable, that can tell me what files/processes are being heavily used by the mmfsd process on the client?<BR>
<BR>
-Kamil<BR>
Confidentiality Note: This e-mail and any attachments are confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this e-mail or any attachment is prohibited. If you have received this e-mail in error, please notify us immediately by returning it to the sender and delete this copy from your system. We will use any personal information you give to us in accordance with our Privacy Policy which can be found in the Data Protection section on our corporate website <a href="http://www.squarepoint-capital.com" target="_blank">http://www.squarepoint-capital.com</a>. Please note that e-mails may be monitored for regulatory and compliance purposes. Thank you for your cooperation.<BR>
_______________________________________________<BR>
gpfsug-discuss mailing list<BR>
gpfsug-discuss at spectrumscale.org<BR>
<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" target="_blank">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a><BR>
<BR>
<br>
</body></html>