[gpfsug-discuss] Strange performance issue on interface nodes

Thu Dec 19 17:00:17 GMT 2013

This morning we started getting complaints that NFS mounts from our GPFS filesystem were hanging.  On investigating, I found that our interface nodes (we have 2) had load averages over 800 with essentially no CPU usage (>99% idle).  Rebooting the nodes restored service, but in less than an hour the load averages have already climbed back above 70. I have no waiters:

[root@d1prpstg2nsd2 ~]# mmlsnode -N waiters -L 2>/dev/null
[root@d1prpstg2nsd2 ~]#

I also have a lot of nfsd and smbd processes in the 'D' (uninterruptible sleep) state, which would account for the high load averages despite the idle CPUs.  One of my interface servers also shows the following processes:
root     29901  0.0  0.0 105172   620 ?        D<   10:03   0:00 touch /rsrch1/cnfsSharedRoot/.ha/recovery/10.113.115.56/10.113.115.57.tmp
root     30076  0.0  0.0 115688   860 ?        D    10:03   0:00 ls -A -I *.tmp /rsrch1/cnfsSharedRoot/.ha/recovery/10.113.115.56

Those processes have been running almost an hour.
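
For anyone who wants to look for the same symptom on their own nodes, here is a minimal sketch of how the 'D' state processes can be listed. It assumes Linux with the procps version of ps; the function names list_dstate and show_dstate are illustrative only and are not part of GPFS.

```shell
# Filter ps-style output on stdin: keep the header line plus every row
# whose STAT column (second field) starts with D (uninterruptible sleep).
list_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Show PID, state, the kernel function the process is blocked in (wchan),
# elapsed time since start, and the full command line.
# Assumes procps ps, which accepts the "wchan:32" column-width syntax.
show_dstate() {
    ps -eo pid,stat,wchan:32,etime,args | list_dstate
}

# Usage: show_dstate
```

The wchan column gives a rough hint of where in the kernel each process is blocked, which may help tell a GPFS wait apart from an NFS or network one.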

The cluster is running GPFS 3.5.12, with 6 NSD servers and 2 interface servers.

Does anyone have thoughts as to what is going on?