<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><font face="monospace">Hello,</font></p>
<p><font face="monospace">For some time we have been seeing seemingly
random issues with a particular customer accessing data over
Ganesha / CES (5.1.8). The CES server owning their IP gets a
very high CPU load, and every operation on the NFS clients
becomes sluggish. It does not seem to be related to throughput,
and looking at the metrics [*] I do not see a correlation with,
e.g., increased NFS ops. I see no events in GPFS, and nothing
suspicious in the Ganesha and GPFS log files.
<br>
</font></p>
<p><font face="monospace">What would be a good procedure to identify
the misbehaving client (I suspect NFS, as there seems to be only
one idle SMB client)? I have now set LOGLEVEL=INFO in Ganesha to
see if I catch anything interesting, but I would be curious how
this kind of apparently random issue could be better debugged
and narrowed down to a single client.</font></p>
<p><font face="monospace">Thanks a lot!</font></p>
<p><font face="monospace">regards</font></p>
<p><font face="monospace">leo<br>
</font></p>
<p><font face="monospace">[*]<br>
</font></p>
<p><font face="monospace">for i in read write; do for j in ops queue
lat req err; do mmperfmon query
"ces-server|NFSIO|/export/path|NFSv41|nfs_${i}_$j"
2023-08-25-14:40:00 2023-08-25-15:05:00 -b60; done; done<br>
</font><br>
<br>
</p>
<pre class="moz-signature" cols="72">--
Paul Scherrer Institut
Dr. Leonardo Sala
Group Leader Data Analysis and Research Infrastructure
Group Leader Data Curation a.i.
Deputy Department Head Science IT Infrastructure and Services department
Science IT Infrastructure and Services department (AWI)
WHGA/036
Forschungstrasse 111
5232 Villigen PSI
Switzerland
Phone: +41 56 310 3369
<a class="moz-txt-link-abbreviated" href="mailto:leonardo.sala@psi.ch">leonardo.sala@psi.ch</a>
<a class="moz-txt-link-abbreviated" href="http://www.psi.ch">www.psi.ch</a></pre>
</body>
</html>