[gpfsug-discuss] [EXTERNAL] Anyone using RoCE?

Bolinches, Luis (WorldQuant) Luis.Bolinches at worldquant.com
Fri Jan 23 19:23:06 GMT 2026


Hi

We are.

Something is odd there.

We have had pretty good results. Latency halved, which gave a big improvement in throughput.

We use two separate fabrics. Configuration was not straightforward, as this case is not covered by essgennetworks, but it was worth it.

In your case you are down to one fabric, as the clients have a single 25G port.
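
From memory, the GPFS side of RoCE mostly comes down to enabling RDMA plus RDMA_CM (RDMA_CM is what carries the RoCE addressing); a minimal sketch, where the node class name is just a placeholder:

    # Enable RDMA and RDMA_CM for RoCE on the relevant nodes (node class is an example)
    mmchconfig verbsRdma=enable,verbsRdmaCm=enable -N nsdNodes

The fabric split itself is then only the fabric number in verbsPorts, along the lines of the sketch further down the thread.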



________________________________
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Luke Sudbery <l.r.sudbery at bham.ac.uk>
Sent: Friday, January 23, 2026 5:38:43 PM
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] [gpfsug-discuss] Anyone using RoCE?


Is anyone using RoCE with good results? We are planning on it, but initial tests are not great – we get much better performance using plain Ethernet over the exact same links.



It’s up and working: I can see RDMA connections and counters, with no errors, but performance is unstable, and worse than Ethernet, which was only meant to be a sanity check!



Things I’ve looked at based on the Lenovo and IBM guides, which I think are all configured correctly (the NIC- and switch-side commands are sketched after the list):

  *   RoCE interfaces all on the same subnet
  *   They all have IPv6 enabled, with addresses using the eui64 addr-gen-mode
  *   DSCP trust mode on NICs
  *   PFC (priority flow control) on NICs
  *   Global Pause disabled on NICs
  *   ToS configured for RDMA_CM
  *   Source-based routing for multiple interfaces on the same subnet
  *   Switches (NVIDIA Cumulus) all enabled for RoCE QoS
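
For anyone wanting specifics, these are roughly the commands involved; the interface and device names, PFC priority and ToS value here are illustrative rather than our exact settings, and the Cumulus syntax depends on the release:

    # Trust DSCP markings on the NIC and enable PFC on one priority (priority 3 here)
    mlnx_qos -i ens1f0 --trust dscp
    mlnx_qos -i ens1f0 --pfc 0,0,0,1,0,0,0,0

    # Disable global pause in favour of PFC
    ethtool -A ens1f0 rx off tx off

    # Default ToS for RDMA_CM connections (value depends on your DSCP plan)
    cma_roce_tos -d mlx5_0 -t 106

    # Switch side (NVUE on recent Cumulus) - check against your release
    nv set qos roce mode lossless
    nv config apply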



iperf and GPFS over plain Ethernet both get nearly 3 GB/s, which is close to the line speed of the NIC in question – 25Gbps. Testing basic RDMA connections with ib_send_bw gets about the same (example command below). But GPFS over RoCE varies between 0.7 GB/s and 1.9 GB/s.
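
For reference, the RDMA baseline was measured with something along these lines (device name is an example; -R makes ib_send_bw use RDMA_CM for connection setup, as GPFS does for RoCE):

    # on the server
    ib_send_bw -d mlx5_0 -R --report_gbits

    # on the client
    ib_send_bw -d mlx5_0 -R --report_gbits <server-ip>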



The servers have 4x 200G Mellanox cards. The client has 1x 25G card. What’s frustrating and confusing is that we get better performance when we enable just 1 card at the server end, and we also get better performance with 1 fabric ID per NIC on the server (with all 4 fabric IDs on the same NIC at the client end, sketched below).
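
To be concrete, that fabric ID layout was expressed through verbsPorts roughly like this (device and node class names are examples, not our exact config):

    # NSD servers: one fabric number per 200G NIC
    mmchconfig verbsPorts="mlx5_0/1/1 mlx5_1/1/2 mlx5_2/1/3 mlx5_3/1/4" -N nsdNodes

    # Client: the single 25G port listed against all four fabric numbers
    mmchconfig verbsPorts="mlx5_0/1/1 mlx5_0/1/2 mlx5_0/1/3 mlx5_0/1/4" -N clientNodes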



I can go into more detail if anyone has experience with this! Does this sound familiar to anyone? I am planning to open a call with Lenovo and/or IBM, as I’m not quite sure where to look next.



Cheers,



Luke



--

Luke Sudbery

Principal Engineer (HPC and Storage).

Architecture, Infrastructure and Systems

Advanced Research Computing, IT Services

Room 132, Computer Centre G5, Elms Road



Please note I don’t work on Monday.



