[gpfsug-discuss] How to properly debug CES / Ganesha?

Tue Aug 29 10:15:09 BST 2023

Hi,

To identify which address sends the most packets to or from a protocol
node, I use a variation of this:

| tcpdump -c 20000 -i <interface> 2>/dev/null | grep IP | cut -d' ' -f3 | sort | uniq -c | sort -nr | head -10

(Collect 20,000 packets, pick out the sender address and port, sort and
count those, and make a top-10 list.)

You could limit the capture to NFS traffic by adding "port nfs" at the end
of the "tcpdump" command, but then you would not see e.g. SMB clients with
a lot of traffic, if there are any.
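
The pipeline above counts each address-and-port pair separately, so a
client that opens many connections from different source ports is spread
over several rows. Below is a minimal sketch of a per-host variant. It
assumes tcpdump is run with "-n", so that IPv4 senders print numerically as
a.b.c.d.port. The function name top_senders is hypothetical and not part of
any tool; it only post-processes tcpdump's text output, and it counts IPv4
lines only.

```shell
# top_senders [N]: hypothetical helper, not part of tcpdump or CES.
# Reads "tcpdump -n" text output on stdin and prints the N (default 10)
# IPv4 sender addresses with the most packets, ignoring the source port.
top_senders() {
    awk '$2 == "IP" {
             # Field 3 is the sender; with -n it is a.b.c.d.port for TCP/UDP.
             n = split($3, part, ".")
             if (n == 5)
                 print part[1] "." part[2] "." part[3] "." part[4]
             else
                 print $3          # no port present (e.g. ICMP)
         }' |
        sort | uniq -c | sort -nr | head -n "${1:-10}"
}

# Usage on a protocol node (needs capture privileges), all traffic:
#   tcpdump -n -c 20000 -i <interface> 2>/dev/null | top_senders
# NFS only, using the "port nfs" filter mentioned above:
#   tcpdump -n -c 20000 -i <interface> port nfs 2>/dev/null | top_senders 5
```

Since the sender field also covers the node's own replies, the protocol
node's address will typically show up near the top of the list; the rows
of interest are the client addresses next to it.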

> Hello,
>
> For some time we have had seemingly random issues with a particular
> customer accessing data over Ganesha / CES (5.1.8). What happens is
> that the CES server owning their IP gets a very high CPU load, and
> every operation on the NFS clients becomes sluggish. It does not seem
> to be related to throughput, and looking at the metrics [*] I do not
> see a correlation with e.g. increased NFS ops. I see no events in GPFS,
> and nothing suspicious in the ganesha and gpfs log files.
>
> What would be a good procedure to identify the misbehaving client (I
> suspect NFS, as it seems there is only 1 idle SMB client)? I have now
> set LOGLEVEL=INFO in ganesha to see if I catch anything interesting,
> but I would be curious how this kind of apparently random issue could
> be better debugged and narrowed down to a client.
>
> Thanks a lot!
>
> regards
>
> leo
>
> [*]
>
> for i in read write; do for j in ops queue lat req err; do mmperfmon
> query "ces-server|NFSIO|/export/path|NFSv41|nfs_${i}_$j"
> 2023-08-25-14:40:00 2023-08-25-15:05:00 -b60; done; done

-- 
Regards,

Helge Hauglin

----------------------------------------------------------------
Mr. Helge Hauglin, Senior Engineer
System administrator
Center for Information Technology, University of Oslo, Norway