[gpfsug-discuss] Infiniband connection rejected, ibv_create_qp err 13

Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] aaron.s.knister at nasa.gov
Tue Dec 5 13:23:43 GMT 2017



Looks like 13 is EACCES (permission denied; EPERM would be 1), which apparently means the process lacked permission to create a QP of the desired type. That is odd, since mmfsd runs as root. Is there any remote chance SELinux is enabled (check with sestatus)? I’d think mmfsd would run unconfined under the default policy, but maybe it didn’t transition correctly.
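
As a quick sanity check, here is a minimal Python sketch that decodes the errno value and reports the SELinux mode. On Linux, errno 13 is named EACCES, while EPERM is errno 1. The helper names describe_errno and selinux_mode are made up for illustration, and the sketch assumes selinuxfs is mounted at /sys/fs/selinux, which is the usual location but is not confirmed for the ESS nodes. Running sestatus remains the more complete check.

```python
import errno
import os
from pathlib import Path


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def selinux_mode(enforce_file: str = "/sys/fs/selinux/enforce") -> str:
    """Report the SELinux mode from the selinuxfs 'enforce' file.

    Assumption: selinuxfs is mounted at /sys/fs/selinux when SELinux is
    enabled; if the file is absent, SELinux is treated as disabled.
    """
    path = Path(enforce_file)
    if not path.exists():
        return "disabled"
    # The file holds "1" in enforcing mode and "0" in permissive mode.
    return "enforcing" if path.read_text().strip() == "1" else "permissive"


if __name__ == "__main__":
    print(describe_errno(13))
    print("SELinux:", selinux_mode())
```

If this reports "enforcing" on the affected node, the audit log would be the next place to look for denials involving mmfsd.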

On December 5, 2017 at 08:16:49 EST, Andreas Mattsson <andreas.mattsson at maxiv.lu.se> wrote:

Hi.



Has anyone here seen VERBS RDMA connection requests being rejected on Scale NSD servers with the error message “ibv_create_qp err 13”?

I’m having this issue on an IBM ESS system.



The error mostly affects only one of the two GSSIO nodes, and it follows the node even if I move all four of its InfiniBand links to the same InfiniBand switch that the working node is connected to.

The issue affects client nodes in different blade chassis, going through different InfiniBand switches and cables, and also non-blade nodes running a slightly different OS setup and different InfiniBand HCAs.

MPI jobs on the client nodes can communicate over the InfiniBand fabric without issues.

Upgrading all switches and HCAs to the latest firmware, and making sure that the client nodes have the same OFED version as the ESS, has had no impact on the issue.

While the issue is present, I can still run ibping between the nodes, ibroute shows a working and correct path between the nodes that get connection rejects, and if I set up IPoIB, IP traffic works on the affected interfaces.



I have opened a PMR with IBM on the issue, so asking here is a parallel track for trying to find a solution to this.



Any help or suggestions are appreciated.

Regards,

Andreas Mattsson

Systems Engineer



MAX IV Laboratory
Lund University
P.O. Box 118, SE-221 00 Lund, Sweden
Visiting address: Fotongatan 2, 225 94 Lund
Mobile: +46 706 64 95 44
andreas.mattsson at maxiv.se
www.maxiv.se

