[gpfsug-discuss] [EXTERNAL] Anyone using RoCE?
Bolinches, Luis (WorldQuant)
Luis.Bolinches at worldquant.com
Fri Jan 23 19:23:06 GMT 2026
Hi
We are
Something is odd there.
We have had pretty good results: latency halved, which gave a big improvement in throughput.
We use two separate fabrics. Configuration was not straightforward, since this case is not covered by essgennetworks, but it was worth it; a rough sketch of what that looks like on the Scale side follows.
In your case you are effectively down to one fabric, as the clients only have a single 25G port.
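For what it is worth, a two-fabric layout on the Scale side mostly comes down to the fabric numbers in verbsPorts; a minimal sketch, with placeholder device names and node class (check against your own ibdev2netdev output, and note that verbsPorts changes need a GPFS restart on the affected nodes):

  # Two RDMA ports, each on its own fabric (the third field is the fabric number),
  # so GPFS only pairs up ports that share a fabric number.
  mmchconfig verbsRdma=enable,verbsRdmaCm=enable -N nsdNodes
  mmchconfig verbsPorts="mlx5_0/1/1 mlx5_1/1/2" -N nsdNodes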
________________________________
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Luke Sudbery <l.r.sudbery at bham.ac.uk>
Sent: Friday, January 23, 2026 5:38:43 PM
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] [gpfsug-discuss] Anyone using RoCE?
Is anyone using RoCE with good results? We are planning on it, but initial tests are not great – we get much better performance using plain Ethernet over the exact same links.
It’s up and working: I can see RDMA connections and counters, with no errors, but performance is unstable and worse than plain Ethernet, which was only meant to be a sanity check!
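In case it helps anyone compare, this is roughly how I am checking that state; a rough sketch from memory, so verify the exact subcommands against your Scale release:

  # Is verbs RDMA actually active, and who is connected over it?
  mmfsadm test verbs status
  mmfsadm test verbs conn
  mmdiag --network | head -40
  # Any RDMA-related complaints at daemon startup?
  grep -i verbs /var/adm/ras/mmfs.log.latest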
Things I’ve looked at, based on the Lenovo and IBM guides, which I think are all configured correctly (rough commands in the sketch after this list):
* RoCE interfaces all on the same subnet
* They all have IPv6 enabled with addresses using eui64 addr-gen-mode
* DSCP trust mode on NICs
* PFC flow control on NICs
* Global Pause disabled on NICs
* ToS configured for RDMA_CM
* Source-based routing for multiple interfaces on the same subnet
* Switches (NVIDIA Cumulus) all enabled for RoCE QoS
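For reference, the host side of that list was applied along these lines; a rough sketch only, with eth1 and mlx5_0 standing in for the real interface and RDMA device names, and example addresses for the routing rules:

  # DSCP trust mode and PFC on priority 3 only (the lossless class used for RoCE)
  mlnx_qos -i eth1 --trust dscp
  mlnx_qos -i eth1 --pfc 0,0,0,1,0,0,0,0

  # Global pause off so only PFC applies
  ethtool -A eth1 rx off tx off

  # ToS for RDMA_CM-established QPs (106 = DSCP 26 plus ECN)
  cma_roce_tos -d mlx5_0 -p 1 -t 106

  # IPv6 with EUI-64 addresses for RDMA_CM
  nmcli connection modify eth1 ipv6.addr-gen-mode eui64

  # Source-based routing for multiple interfaces on one subnet:
  # a per-interface table plus a rule selecting it by source address
  ip route add 10.10.0.0/16 dev eth1 src 10.10.1.11 table 101
  ip rule add from 10.10.1.11 table 101

  # Switch side (Cumulus NVUE, from memory, so check your release):
  #   nv set qos roce mode lossless
  #   nv config apply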
iperf and GPFS over plain Ethernet both get nearly 3 GB/s, which is close to the line rate of the NIC in question (25 Gbps). Testing basic RDMA with ib_send_bw gets about the same. But GPFS over RoCE ranges from 0.7 GB/s to 1.9 GB/s.
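The comparison runs, for completeness; a rough sketch with a placeholder server hostname and the RDMA device assumed to be mlx5_0 (iperf3 shown here):

  # Plain TCP baseline, ~3 GB/s to the 25G client
  iperf3 -s                                # on the server
  iperf3 -c nsd01-roce -P 4 -t 30          # on the client

  # Raw RDMA over RoCE via rdma_cm, with the same ToS the cluster uses
  ib_send_bw -d mlx5_0 -R -T 106 -D 20                 # on the server
  ib_send_bw -d mlx5_0 -R -T 106 -D 20 nsd01-roce      # on the client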
The servers have 4x 200G Mellanox cards; the client has 1x 25G card. What’s frustrating and confusing is that we get better performance when we enable just one card at the server end, and also better performance if we have one fabric ID per NIC on the server (with all four fabric IDs on the same NIC at the client end). The two layouts are sketched below.
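To make the two layouts concrete, they look roughly like this; device names and node classes are placeholders, and the third field of each verbsPorts entry is the fabric ID:

  # (a) one fabric ID shared by all four server NICs and the client's single port
  mmchconfig verbsPorts="mlx5_0/1/1 mlx5_1/1/1 mlx5_2/1/1 mlx5_3/1/1" -N nsdNodes
  mmchconfig verbsPorts="mlx5_0/1/1" -N clientNodes

  # (b) one fabric ID per server NIC, with the client's port listed on all four
  mmchconfig verbsPorts="mlx5_0/1/1 mlx5_1/1/2 mlx5_2/1/3 mlx5_3/1/4" -N nsdNodes
  mmchconfig verbsPorts="mlx5_0/1/1 mlx5_0/1/2 mlx5_0/1/3 mlx5_0/1/4" -N clientNodes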
I can go into more details if anyone has experience! Does this sound familiar to anyone? I am planning to open a call with Lenovo and/or IBM as I’m not quite sure where to look next.
Cheers,
Luke
--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road
Please note I don’t work on Monday.