[gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

Uwe Falke UWEFALKE at de.ibm.com
Thu Nov 12 01:56:46 GMT 2020


Hi, Kamil, 
I suppose you'd rather avoid such an issue altogether than pursue the ugly 
work-around of killing off processes. 
In such situations, the first places to look are the GPFS log (on the 
client, on the cluster manager, and maybe on the file system manager) and 
the current waiters (that is, the list of currently waiting threads) on 
the hanging client. 

-> /var/adm/ras/mmfs.log.latest
mmdiag --waiters 

That might give you a first idea what is taking long and which components 
are involved. 
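In case you are not sure which nodes currently hold those manager roles, 
mmlsmgr should report them (if I recall correctly, -c gives the cluster 
manager), and it often pays to watch the waiters repeatedly rather than 
looking just once: 

mmlsmgr                          # file system manager per file system
mmlsmgr -c                       # cluster manager
watch -n 2 'mmdiag --waiters'    # on the hanging client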

Also, 
mmdiag --iohist 
shows you the last IOs and some stats (service time, size) for them. 
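If that history is long, sorting on the service-time column brings the 
slowest IOs to the top. The field number below is only an assumption (it 
may differ between releases), so check it against the header line first: 

mmdiag --iohist | sort -k6 -nr | head -20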

Either that clue is already sufficient, or you go on (if you see DIO 
somewhere, for example, direct IO is in use, which might slow things 
down). 
GPFS has a nice tracing facility which you can configure, or you can just 
run the default trace. 

Running a dedicated (low-level) I/O trace can be achieved by 
mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N <your_critical_node>
then, when the issue is seen, stop the trace by 
mmtracectl --stop -N <your_critical_node>

Do not wait to stop the trace once you've seen the issue; the trace 
cyclically overwrites its output. If the issue lasts some time, you could 
also start the trace while you see it, run it for, say, 20 seconds, and 
stop it again (a sketch follows below). On stopping the trace, the output 
gets converted into an ASCII trace file named trcrpt.* (usually in 
/tmp/mmfs, check the command output). 
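
A minimal sketch of such a timed trace window, using the same placeholder 
node name as above: 

mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N <your_critical_node>
sleep 20    # let the trace capture the problem for ~20 seconds
mmtracectl --stop -N <your_critical_node>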


There you should see lines with FIO which carry the inode of the related 
file after the "tag" keyword.
Example: 
0.000745100  25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 finishTime 1563473283.135212150

-> inode is 248415

There is a utility, tsfindinode, to translate that into the file path. 
You need to build it first if not yet done: 
cd /usr/lpp/mmfs/samples/util ; make
then run 
./tsfindinode -i <inode_num> <fs_mount_point>
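
To save some typing, something along these lines should pull all inodes 
seen in FIO records out of the trace and resolve them in one go; the 
trcrpt path, the awk field handling, and the mount point placeholder are 
assumptions on my side, so adapt them to what you actually see: 

grep 'FIO:' /tmp/mmfs/trcrpt.* \
  | awk '{ for (i = 1; i < NF; i++) if ($i == "tag") print $(i+1) }' \
  | sort -u \
  | while read ino; do ./tsfindinode -i "$ino" <fs_mount_point>; done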

For the I/O trace analysis there is an older tool: 
/usr/lpp/mmfs/samples/debugtools/trsum.awk
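It is a plain awk script, so an invocation along these lines should do 
(the trcrpt file name is whatever the trace conversion produced, and the 
script may take further options, so check the comments at its top): 

awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk /tmp/mmfs/trcrpt.* > trsum.out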

Then there is some newer stuff I've not yet used in 
/usr/lpp/mmfs/samples/traceanz/ (always check the README). 

Hope that helps a bit. 

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefalke at de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   "Czauz, Kamil" <Kamil.Czauz at Squarepoint-Capital.com>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   11/11/2020 23:36
Subject:        [EXTERNAL] [gpfsug-discuss] Poor client performance with 
high cpu usage of mmfsd process
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



We regularly run into performance issues on our clients where the client 
seems to hang when accessing any gpfs mount; even something simple like an 
ls can take a few minutes to complete. This affects every gpfs mount 
on the client, but other clients are working just fine. Also, the mmfsd 
process at this point is spinning at something like 300-500% CPU.
 
The only way I have found to solve this is by killing processes that may 
be doing heavy i/o to the gpfs mounts - but this is more of an art than a 
science.  I often end up killing many processes before finding the 
offending one.
 
My question is really about finding the offending process more easily. 
Is there something similar to iotop, or a trace that I can enable, that 
can tell me what files/processes are being heavily used by the mmfsd 
process on the client?
 
-Kamil