[gpfsug-discuss] mmhealth with 4.2.3-5 gives many false alarms ib_rdma_nic_unrecognized

Bryan Banister bbanister at jumptrading.com
Tue Jan 9 15:51:03 GMT 2018


I can't help but comment that it's amazing that GPFS is using a txt config file instead of requiring you to run a command that stores config data in a non-editable (but still editable) flat-file database... Wow, 2018!!

Hahahahaha!
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Mathias Dietz
Sent: Tuesday, January 09, 2018 3:44 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] mmhealth with 4.2.3-5 gives many false alarms ib_rdma_nic_unrecognized

Hello Heiner,

with 4.2.3-5, mmhealth always monitors all ports of a configured IB adapter, even if a port is not specified in verbsPorts.
Development has implemented a fix which is planned to be part of 4.2.3-7 (February).

To get rid of the false alarm in the meantime, you could disable InfiniBand monitoring altogether.

To disable InfiniBand monitoring on a node:
1. Open the file /var/mmfs/mmsysmon/mmsysmonitor.conf
2. Locate the [network] section
3. Add below it: ib_rdma_enable_monitoring=False
4. Save the file and run "mmsysmoncontrol restart"
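Steps 2 and 3 can be scripted if you need to apply them on many nodes. The sketch below is a minimal, hypothetical helper (disable_ib_monitoring is not an IBM tool); it assumes mmsysmonitor.conf uses INI-style [section] headers with key=value lines, and it only rewrites the text, leaving the file I/O and the mmsysmoncontrol restart to you.

```python
import re

# ASSUMPTION: mmsysmonitor.conf uses INI-style "[section]" headers and key=value lines.
SECTION_HEADER = re.compile(r"^\s*\[[^\]]+\]\s*$")
NETWORK_HEADER = re.compile(r"^\s*\[network\]\s*$")
SETTING = "ib_rdma_enable_monitoring"


def disable_ib_monitoring(conf_text: str) -> str:
    """Return conf_text with ib_rdma_enable_monitoring=False under [network].

    Hypothetical helper for illustration only.
    """
    out = []
    in_network = False
    found_section = False
    for line in conf_text.splitlines():
        if SECTION_HEADER.match(line):
            in_network = bool(NETWORK_HEADER.match(line))
            out.append(line)
            if in_network:
                found_section = True
                # Step 3: add the setting directly below the [network] header.
                out.append(f"{SETTING}=False")
            continue
        if in_network and line.split("=", 1)[0].strip() == SETTING:
            # Drop any earlier value so the setting appears only once.
            continue
        out.append(line)
    if not found_section:
        raise ValueError("no [network] section found")
    return "\n".join(out) + "\n"
```

On a node you would back up /var/mmfs/mmsysmon/mmsysmonitor.conf, write disable_ib_monitoring(old_text) back to it, and then run "mmsysmoncontrol restart" as in step 4. Editing the text line by line, rather than round-tripping through configparser, keeps any comments and ordering in the file intact, and running it twice gives the same result.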

If you have questions feel free to contact me directly by email.

Mit freundlichen Grüßen / Kind regards

Mathias Dietz

Spectrum Scale RAS Architect & Release Lead Architect (4.2.3/5.0)
---------------------------------------------------------------------------
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdietz at de.ibm.com
-----------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz; Management: Dirk Wittkopp; Registered office: Böblingen; Court of registration: Amtsgericht Stuttgart, HRB 243294



From:        "Billich Heinrich Rainer (PSI)" <heiner.billich at psi.ch>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        01/09/2018 09:31 AM
Subject:        [gpfsug-discuss] mmhealth with 4.2.3-5 gives many false alarms ib_rdma_nic_unrecognized
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



Hello,

I just upgraded to 4.2.3-5 and now see many 'ib_rdma_nic_unrecognized' failures in mmhealth, like:


Component        Status        Status Change            Reasons
------------------------------------------------------------------------------------------
NETWORK          DEGRADED      2018-01-06 15:57:21      ib_rdma_nic_unrecognized(mlx4_0/1)
 mlx4_0/1       FAILED        2018-01-06 15:57:21      ib_rdma_nic_unrecognized


I didn't see these messages with 4.2.3-4. The relevant lines in /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py changed between 4.2.3-4 and 4.2.3-5.

What seems to happen: I have Mellanox VPI cards with one InfiniBand port and one Ethernet port, and mmhealth complains about the Ethernet port. Hmm - I specified only the active InfiniBand ports in verbsPorts, so I don't see why mmhealth cares about any other ports when it checks RDMA.
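The behavior expected here, where the RDMA check only looks at ports listed in verbsPorts, amounts to a simple filter. This is a hypothetical sketch (ports_to_monitor is not the code in NetworkService.py); it assumes verbsPorts entries of the form device/port with an optional /fabric suffix, and it treats an entry without a port as port 1, which is an assumption to check against the verbsPorts documentation.

```python
from __future__ import annotations


def ports_to_monitor(verbs_ports: str, discovered: list[str]) -> list[str]:
    """Return the discovered "device/port" names that verbsPorts actually lists.

    Hypothetical helper for illustration; not taken from NetworkService.py.
    """
    configured = set()
    for entry in verbs_ports.split():
        # Entries look like device[/port[/fabric]]; the fabric number does not
        # change which physical port is meant, so it is ignored here.
        parts = entry.split("/")
        device = parts[0]
        # ASSUMPTION: an entry without an explicit port refers to port 1.
        port = parts[1] if len(parts) > 1 and parts[1] else "1"
        configured.add(f"{device}/{port}")
    # Keep discovery order; ports not listed in verbsPorts are skipped.
    return [name for name in discovered if name in configured]
```

If, say, verbsPorts lists only mlx4_0/2, a discovered mlx4_0/1 is dropped from the list and would not be checked, so an Ethernet port on the same VPI card could not raise ib_rdma_nic_unrecognized.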

So it's probably a bug; I'll open a PMR unless somebody points me to a different solution. I tried, but I can't hide this event in mmhealth.

Cheers,
Heiner

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI
056 310 36 02
https://www.psi.ch





_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


