[gpfsug-discuss] How to properly debug CES / Ganesha?

Leonardo Sala leonardo.sala at psi.ch
Fri Aug 25 15:45:27 BST 2023


Hello,

For some time now we have been seeing seemingly random issues with a
particular customer accessing data over Ganesha / CES (5.1.8). The CES
server owning their IP gets a very high CPU load, and every operation on
the NFS clients becomes sluggish. It does not seem to be related to
throughput, and looking at the metrics [*] I do not see a correlation
with e.g. increased NFS ops. I see no events in GPFS, and nothing
suspicious in the ganesha and gpfs log files.

What would be a good procedure to identify the misbehaving client (I
suspect NFS, as there seems to be only one idle SMB client)? I have now
set LOGLEVEL=INFO in ganesha to see if I catch anything interesting, but
I would be curious how this kind of apparently random issue could be
better debugged and narrowed down to a single client.

Thanks a lot!

regards

leo

[*]

for i in read write; do
  for j in ops queue lat req err; do
    mmperfmon query "ces-server|NFSIO|/export/path|NFSv41|nfs_${i}_$j" \
      2023-08-25-14:40:00 2023-08-25-15:05:00 -b60
  done
done
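
The same key construction could be wrapped in a small function, in case
other nodes, exports or protocol versions need to be compared. This is
only a sketch: `nfs_metric_keys` is a hypothetical helper name, not part
of GPFS, and it only prints the metric keys without calling mmperfmon.

```shell
#!/bin/sh
# Hypothetical helper (not a GPFS command): print one mmperfmon metric key
# per line for every read/write x ops/queue/lat/req/err combination.
# The node name, export path and NFS version are placeholders.
nfs_metric_keys() {
    node=$1; export_path=$2; version=$3
    for i in read write; do
        for j in ops queue lat req err; do
            printf '%s|NFSIO|%s|%s|nfs_%s_%s\n' \
                "$node" "$export_path" "$version" "$i" "$j"
        done
    done
}

# Dry run: list the keys that would be queried.
nfs_metric_keys ces-server /export/path NFSv41
```

The printed keys can then be fed one by one to the mmperfmon query
command above, e.g. from a `while read` loop.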


-- 
Paul Scherrer Institut
Dr. Leonardo Sala
Group Leader Data Analysis and Research Infrastructure
Group Leader Data Curation a.i.
Deputy Department Head Science IT Infrastructure and Services department
Science IT Infrastructure and Services department (AWI)
WHGA/036
Forschungstrasse 111
5232 Villigen PSI
Switzerland

Phone: +41 56 310 3369
leonardo.sala at psi.ch
www.psi.ch