[gpfsug-discuss] mmhealth with 4.2.3-5 gives many false alarms ib_rdma_nic_unrecognized
Billich Heinrich Rainer (PSI)
heiner.billich at psi.ch
Tue Jan 9 08:24:22 GMT 2018
Hello,
I just upgraded to 4.2.3-5 and now see many ‘ib_rdma_nic_unrecognized’ failures in mmhealth, for example:
Component    Status      Status Change          Reasons
------------------------------------------------------------------------------------------
NETWORK      DEGRADED    2018-01-06 15:57:21    ib_rdma_nic_unrecognized(mlx4_0/1)
mlx4_0/1     FAILED      2018-01-06 15:57:21    ib_rdma_nic_unrecognized
I didn’t see these messages with 4.2.3-4. The relevant lines in /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py changed between -4 and -5.
What seems to happen: I have Mellanox VPI cards with one InfiniBand port and one Ethernet port, and mmhealth complains about the Ethernet port. I specified only the active InfiniBand ports in verbsPorts, so I don’t see why mmhealth cares about any other ports when it checks RDMA.
So this is probably a bug; I’ll open a PMR unless somebody points me to a different solution. I tried, but I couldn’t find a way to hide this event in mmhealth.
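To see on a given node which RDMA ports are listed in verbsPorts and which link layer each port runs, something like the following sketch may help. It assumes the standard Linux sysfs layout (/sys/class/infiniband/<device>/ports/<port>/link_layer) and that verbsPorts entries have the form device/port, optionally followed by /fabric. The function names and the example values are illustrative only; this is not how mmhealth or NetworkService.py performs its check.

```python
import os

SYSFS_IB = "/sys/class/infiniband"  # standard Linux sysfs location for RDMA devices


def read_link_layers(root=SYSFS_IB):
    """Return {"device/port": link_layer} for every RDMA port found in sysfs."""
    layers = {}
    if not os.path.isdir(root):
        return layers
    for dev in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            path = os.path.join(ports_dir, port, "link_layer")
            try:
                with open(path) as fh:
                    # Typically "InfiniBand" or "Ethernet"
                    layers[f"{dev}/{port}"] = fh.read().strip()
            except OSError:
                continue
    return layers


def split_by_verbs_ports(verbs_ports, link_layers):
    """Split ports into those listed in verbsPorts and those that are not.

    Assumption: each verbsPorts entry is "device/port" or "device/port/fabric";
    entries without an explicit port number are not handled here.
    """
    # Keep only device/port from each entry; a trailing /fabric is dropped.
    configured = {"/".join(entry.split("/")[:2]) for entry in verbs_ports.split()}
    listed = {p: l for p, l in link_layers.items() if p in configured}
    unlisted = {p: l for p, l in link_layers.items() if p not in configured}
    return listed, unlisted


if __name__ == "__main__":
    # Hypothetical value; paste the node's real setting from `mmlsconfig verbsPorts`.
    verbs_ports = "mlx4_0/2"
    listed, unlisted = split_by_verbs_ports(verbs_ports, read_link_layers())
    print("listed in verbsPorts:", listed)
    print("not listed:", unlisted)
```

If the ports reported as FAILED all show up in the "not listed" group with link layer Ethernet, that would support the reading that mmhealth is checking ports outside verbsPorts.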
Cheers,
Heiner
--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232 Villigen PSI
056 310 36 02
https://www.psi.ch