<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" id="owaParaStyle">P {margin-top:0;margin-bottom:0;}</style>
</head>
<body fpstyle="1" ocsi="0">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">
<div>Dear experts,</div>
<div>I have an IBM GL2 system running Spectrum Scale v4.2.3-6 (RHEL 7.3).</div>
<div>The system is working properly, but mmhealth reports a DEGRADED status for the NETWORK component:</div>
<div><br>
</div>
<div>[root@sf-gssio1 ~]# mmhealth node show<br>
<br>
Node name: sf-gssio1.psi.ch<br>
Node status: DEGRADED<br>
Status Change: 23 min. ago<br>
<br>
Component Status Status Change Reasons<br>
-------------------------------------------------------------------------------------------------------------------------------------------<br>
GPFS HEALTHY 22 min. ago -<br>
NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2)</div>
<div>[...]<br>
</div>
<div><br>
</div>
<div>This event looks spurious, because the network, verbs and IB are all working correctly:</div>
<div><br>
</div>
<div>[root@sf-gssio1 ~]# mmfsadm test verbs status<br>
VERBS RDMA status: started</div>
<div><br>
</div>
<div>[root@sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1<br>
verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] <br>
</div>
<div><br>
</div>
<div>[root@sf-gssio1 ~]# mmdiag --config|grep verbsPorts<br>
! verbsPorts mlx5_0/1<br>
</div>
<div><br>
</div>
<div>[root@sf-gssio1 ~]# ibstat mlx5_0<br>
CA 'mlx5_0'<br>
CA type: MT4113<br>
Number of ports: 2<br>
Firmware version: 10.16.1020<br>
Hardware version: 0<br>
Node GUID: 0xec0d9a03002b5db0<br>
System image GUID: 0xec0d9a03002b5db0<br>
Port 1:<br>
State: Active<br>
Physical state: LinkUp<br>
Rate: 56<br>
Base lid: 42<br>
LMC: 0<br>
SM lid: 1<br>
Capability mask: 0x26516848<br>
Port GUID: 0xec0d9a03002b5db0<br>
Link layer: InfiniBand<br>
Port 2:<br>
State: Down<br>
Physical state: Disabled<br>
Rate: 10<br>
Base lid: 65535<br>
LMC: 0<br>
SM lid: 0<br>
Capability mask: 0x26516848<br>
Port GUID: 0xec0d9a03002b5db8<br>
Link layer: InfiniBand<br>
</div>
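<div><br>
</div>
<div>To make the comparison above explicit, here is a minimal Python sketch that cross-checks ibstat output against the configured verbsPorts. The function names (parse_ibstat, unconfigured_down_ports) and the abbreviated SAMPLE text are only illustrative assumptions, not part of any Spectrum Scale tooling; the sketch simply lists ports that are not Active and are not in verbsPorts, which is the case for mlx5_0/2 here.</div>
<div><br>
</div>

```python
import re

# Abbreviated copy of the ibstat output shown above (illustrative sample).
SAMPLE = """CA 'mlx5_0'
    CA type: MT4113
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
    Port 2:
        State: Down
        Physical state: Disabled
"""


def parse_ibstat(text):
    """Return {(device, port_number): {attribute: value}} from ibstat text."""
    ports = {}
    device = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # New adapter section; no port selected yet.
            device, port = m.group(1), None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            port = (device, int(m.group(1)))
            ports[port] = {}
            continue
        if port is not None and ":" in line:
            # Per-port attribute such as "State: Active".
            key, _, value = line.partition(":")
            ports[port][key.strip()] = value.strip()
    return ports


def unconfigured_down_ports(ibstat_text, verbs_ports):
    """List ports (as 'device/port') that are not Active and not in verbs_ports."""
    configured = set(verbs_ports)
    result = []
    for (dev, num), attrs in sorted(parse_ibstat(ibstat_text).items()):
        name = f"{dev}/{num}"
        if attrs.get("State") != "Active" and name not in configured:
            result.append(name)
    return result


if __name__ == "__main__":
    # verbsPorts value taken from the mmlsconfig output above.
    print(unconfigured_down_ports(SAMPLE, ["mlx5_0/1"]))
```

<div>With the sample above and verbsPorts set to mlx5_0/1, the only port reported is mlx5_0/2, i.e. the same port named in the three mmhealth events.</div>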
<div><br>
</div>
<div>The event has been there for 145 days, and it did not go away after a daemon restart (mmshutdown/mmstartup).</div>
<div>My question is: how can I get rid of this event and restore the mmhealth output to HEALTHY? This is important because I have Nagios sensors that periodically parse the "mmhealth -Y ..." output, and at the moment I have to disable their email notifications
(which is not good if a real failure occurs).</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div> Alvise<br>
</div>
</div>
</body>
</html>