[gpfsug-discuss] mmhealth with 4.2.3-5 gives many false alarms ib_rdma_nic_unrecognized

Billich Heinrich Rainer (PSI) heiner.billich at psi.ch
Tue Jan 9 08:24:22 GMT 2018


Hello,

I just upgraded to 4.2.3-5 and now see many ‘ib_rdma_nic_unrecognized’ failures in mmhealth, for example:


Component        Status        Status Change            Reasons
------------------------------------------------------------------------------------------
NETWORK          DEGRADED      2018-01-06 15:57:21      ib_rdma_nic_unrecognized(mlx4_0/1)
  mlx4_0/1       FAILED        2018-01-06 15:57:21      ib_rdma_nic_unrecognized


I didn’t see these messages with 4.2.3-4. The relevant lines in /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py changed between -4 and -5.

What seems to happen: I have Mellanox VPI cards with one port running InfiniBand and one port running Ethernet. mmhealth complains about the Ethernet port. I specified only the active InfiniBand ports in verbsPorts, so I don’t see why mmhealth cares about any other ports when it checks RDMA.
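
A minimal Python sketch of the behaviour expected here, for illustration only: it is not the code in NetworkService.py, and the function names (parse_verbs_ports, link_layers, split_ports) are hypothetical. It assumes verbsPorts holds whitespace-separated Device/Port[/Fabric] entries and that the Linux sysfs file /sys/class/infiniband/<device>/ports/<port>/link_layer reports "InfiniBand" or "Ethernet" for each port. The example values at the bottom are made up to mirror the VPI setup described above.

```python
from pathlib import Path


def parse_verbs_ports(value):
    """Turn a verbsPorts-style string into a set of (device, port) pairs.

    Assumes whitespace-separated Device/Port[/Fabric] entries. Treating a
    missing port number as port 1 is an assumption made for this sketch.
    """
    ports = set()
    for entry in value.split():
        fields = entry.split("/")
        port = int(fields[1]) if len(fields) > 1 else 1
        ports.add((fields[0], port))
    return ports


def link_layers(sysfs_root="/sys/class/infiniband"):
    """Read the link layer of every RDMA port from Linux sysfs."""
    layers = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return layers
    for link_file in root.glob("*/ports/*/link_layer"):
        # Path layout: <root>/<device>/ports/<port>/link_layer
        device, port = link_file.parts[-4], int(link_file.parts[-2])
        layers[(device, port)] = link_file.read_text().strip()
    return layers


def split_ports(layers, configured):
    """Separate ports listed in verbsPorts from ports that are not."""
    to_check = sorted(p for p in layers if p in configured)
    to_skip = sorted(p for p in layers if p not in configured)
    return to_check, to_skip


# Illustrative values only, not read from a real node.
example_layers = {("mlx4_0", 1): "Ethernet", ("mlx4_0", 2): "InfiniBand"}
example_config = parse_verbs_ports("mlx4_0/2")
to_check, to_skip = split_ports(example_layers, example_config)
print("check:", to_check)
print("skip:", to_skip)
```

On a real node, link_layers() with its default argument would replace example_layers. A check that only raised events for the ports in to_check would stay silent about the Ethernet port.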

So this is probably a bug; I’ll open a PMR unless somebody points me to a different solution. I tried, but I can’t hide this event in mmhealth.

Cheers,
Heiner

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI
056 310 36 02
https://www.psi.ch
 





