[gpfsug-discuss] mmhealth with 4.2.3-5 gives many false alarms ib_rdma_nic_unrecognized

Mathias Dietz MDIETZ at de.ibm.com
Tue Jan 9 09:43:58 GMT 2018


Hello Heiner,

With 4.2.3-5, mmhealth always monitors all ports of a configured IB 
adapter, even if a port is not specified in verbsPorts. 
Development has implemented a fix, which is planned to be part of 4.2.3-7 
(February).

To get rid of the false alarm in the meantime, you could disable 
InfiniBand monitoring altogether. 

To disable InfiniBand monitoring on a node:
1. Open the file /var/mmfs/mmsysmon/mmsysmonitor.conf
2. Locate the [network] section 
3. Add the following line below it: ib_rdma_enable_monitoring=False 
4. Save the file and run "mmsysmoncontrol restart"
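
For anyone who has to apply this on many nodes, steps 2 and 3 can be sketched in Python as below. This is only an illustration, not an IBM-provided tool: the function name disable_ib_monitoring is made up, and it assumes the file uses plain INI-style "[section]" headers and "key=value" lines. It works on the file content as a string, so reading and writing /var/mmfs/mmsysmon/mmsysmonitor.conf and running "mmsysmoncontrol restart" are left to the caller.

```python
def disable_ib_monitoring(conf_text: str) -> str:
    """Return conf_text with ib_rdma_enable_monitoring=False in [network].

    Hypothetical helper; assumes INI-style "[section]" headers and
    "key=value" lines. All other lines are passed through unchanged.
    """
    key = "ib_rdma_enable_monitoring"
    out = []
    in_network = False
    inserted = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts; remember whether it is [network].
            in_network = stripped.lower() == "[network]"
            out.append(line)
            if in_network and not inserted:
                # Step 3: add the setting right below the section header.
                out.append(f"{key}=False")
                inserted = True
            continue
        if in_network and stripped.split("=", 1)[0].strip() == key:
            # Drop an older value of the same key; it was replaced above.
            continue
        out.append(line)
    if not inserted:
        raise ValueError("no [network] section found")
    return "\n".join(out) + "\n"
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to re-apply. Keep a backup copy of the original file before writing the result back.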

If you have questions feel free to contact me directly by email. 

Mit freundlichen Grüßen / Kind regards

Mathias Dietz

Spectrum Scale RAS Architect & Release Lead Architect (4.2.3/5.0)
---------------------------------------------------------------------------
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdietz at de.ibm.com
-----------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk 
Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht 
Stuttgart, HRB 243294



From:   "Billich Heinrich Rainer (PSI)" <heiner.billich at psi.ch>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   01/09/2018 09:31 AM
Subject:        [gpfsug-discuss] mmhealth with 4.2.3-5 gives many false 
alarms ib_rdma_nic_unrecognized
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hello,

I just upgraded to 4.2.3-5 and now see many "ib_rdma_nic_unrecognized" 
failures in mmhealth, like


Component        Status        Status Change            Reasons
------------------------------------------------------------------------------------------
NETWORK          DEGRADED      2018-01-06 15:57:21      ib_rdma_nic_unrecognized(mlx4_0/1)
  mlx4_0/1       FAILED        2018-01-06 15:57:21      ib_rdma_nic_unrecognized


I didn't see these messages with 4.2.3-4. The relevant lines in 
/usr/lpp/mmfs/lib/mmsysmon/NetworkService.py changed between -4 and -5.

What seems to happen: I have Mellanox VPI cards with one port InfiniBand 
and one port Ethernet, and mmhealth complains about the Ethernet port. 
I specified only the active InfiniBand ports in verbsPorts, so I don't see 
why mmhealth cares about any other ports when it checks RDMA.

So probably a bug, I'll open a PMR unless somebody points me to a 
different solution.  I tried but I can't hide this event in mmhealth.

Cheers,
Heiner

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI
056 310 36 02
https://www.psi.ch
 




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



