<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; ">

<div>This morning we started getting complaints that NFS mounts from our GPFS filesystem were hanging. After investigating, I found that our two interface nodes had load averages over 800 but essentially no CPU usage (>99% idle). I rebooted the nodes, which restored service, but in less than an hour the load averages have already climbed above 70. mmlsnode shows no waiters:</div>

<div><br>

</div>

<div>

<div>[root@d1prpstg2nsd2 ~]# mmlsnode -N waiters -L 2>/dev/null</div>

<div>[root@d1prpstg2nsd2 ~]# </div>

</div>

<div><br>

</div>

<div>I also have a lot of nfsd and smbd processes in a 'D' state.  One of my interface servers also shows the following processes:</div>

<div>root     29901  0.0  0.0 105172   620 ?        D<   10:03   0:00 touch /rsrch1/cnfsSharedRoot/.ha/recovery/10.113.115.56/10.113.115.57.tmp</div>

<div>root     30076  0.0  0.0 115688   860 ?        D    10:03   0:00 ls -A -I *.tmp /rsrch1/cnfsSharedRoot/.ha/recovery/10.113.115.56</div>

<div><br>

</div>

<div>Those processes have been stuck in that state for almost an hour.</div>
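<div><br>
</div>
<div>For anyone who wants to compare numbers: as far as I know, the Linux load average counts tasks in uninterruptible sleep ('D') as well as runnable ones, which would fit a high load with idle CPUs. Below is a rough sketch of how the D-state processes can be tallied by command name and kernel wait channel. It assumes the procps version of ps on Linux; summarize_d_state is just a name I made up for the helper, not a GPFS command.</div>
<pre>
```shell
# Hypothetical helper: tally processes in uninterruptible sleep (state D)
# by command name and kernel wait channel.
# Expects "STAT WCHAN COMMAND" rows on stdin. The header row is skipped
# automatically because "STAT" does not start with "D".
# Only the first word of the command name is used.
summarize_d_state() {
  awk '$1 ~ /^D/ { n[$3 " " $2]++ } END { for (k in n) print n[k], k }' | sort -rn
}

# Usage on a live node (assumes procps ps):
#   ps -eo stat,wchan:32,comm | summarize_d_state
```
</pre>
<div>Each output line is a count followed by the command and the wait channel it is blocked in, largest count first.</div>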

<div><br>

</div>

<div>The cluster is running GPFS 3.5.12, with 6 NSD servers and 2 interface servers.</div>

<div><br>

</div>

<div>Does anyone have thoughts as to what is going on?</div>

</body>

</html>