[gpfsug-discuss] rmda errors scatter thru gpfs logs

Simon Thompson (Research Computing - IT Services) S.J.Thompson at bham.ac.uk
Wed Jan 18 08:59:48 GMT 2017


I'd be inclined to look at something like:

ibqueryerrors -s PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors -c

And see if you have a high number of symbol errors; it might be that a cable needs replugging or replacing.
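
To decide which links to check first, it may help to tally the VERBS RDMA errors in the GPFS log per remote node and local port. The following is only a rough sketch: the regular expression is an assumption based solely on the two sample log lines quoted in the original message below, and the function name `tally_verbs_errors` is hypothetical.

```python
import re
from collections import Counter

# Assumed format, derived only from the sample "[E] VERBS RDMA rdma write error"
# line quoted in this thread; other GPFS releases may word it differently.
VERBS_ERR = re.compile(
    r"\[E\] VERBS RDMA rdma (?P<op>\w+) error (?P<status>IBV_WC_\w+) "
    r"to (?P<ip>\S+) \((?P<host>[^)]+)\) on (?P<dev>\S+) port (?P<port>\d+)"
)


def tally_verbs_errors(lines):
    """Count RDMA errors per (remote host, local device, local port, status)."""
    counts = Counter()
    for line in lines:
        m = VERBS_ERR.search(line)
        if m:
            key = (m["host"], m["dev"], int(m["port"]), m["status"])
            counts[key] += 1
    return counts


def report(path):
    """Print the tallies for one log file, most frequent first."""
    with open(path, errors="replace") as fh:
        for key, n in tally_verbs_errors(fh).most_common():
            print(n, *key)
```

Calling `report()` on the node's GPFS log (commonly `/var/adm/ras/mmfs.log.latest`, though the path is an assumption for your installation) prints one line per peer and port. If the errors cluster on a single remote node or port, that link is the first candidate to inspect with ibqueryerrors.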

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "J. Eric Wonderley" <eric.wonderley at vt.edu>
Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, 17 January 2017 at 21:16
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs

I frequently see messages like these in my logs:
Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136
Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23

Any ideas on the cause?
