[gpfsug-discuss] Joining RDMA over different networks?

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Tue Aug 22 10:28:38 BST 2023


On 22/08/2023 00:27, Ryan Novosielski wrote:

> I still have the guide from that system, and I saved some of the routing 
> scripts and what not. But really, it wasn’t much more complicated than 
> Ethernet routing.
> 
> The routing nodes, I guess obviously, had both Omnipath and Infiniband 
> interfaces. Compute nodes themselves I believe used a supervisord 
> script, if I’m remembering that name right, to try to balance out which 
> routing node each one would use as a gateway. There were two as it was 
> configured when I got to it, but a larger number was possible.
> 

Having done it in a limited fashion previously I would recommend that 
you have two routing nodes and use keepalived on at least the Ethernet 
side with VRRP to try and maintain some redundancy.
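A minimal keepalived sketch of that (interface names, addresses and
priorities are all made up for illustration) would be the same
vrrp_instance block on both routing nodes, with a lower priority on
the backup:

    vrrp_instance gpfs_gw {
        state MASTER               # BACKUP on the second routing node
        interface eth0             # Ethernet-side interface
        virtual_router_id 51
        priority 150               # e.g. 100 on the backup node
        advert_int 1
        virtual_ipaddress {
            10.1.0.254/24          # floating gateway address
        }
    }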

Otherwise you get into a situation where you are entirely dependent on 
a single node which you can't reboot without a GPFS shutdown. Cyber 
security makes that an untenable position these days.

In our situation our DSS-G nodes were both Ethernet and Infiniband 
connected, and we had a bunch of nodes that were using Infiniband for 
the data traffic and Ethernet for the management interface at 1Gbps. 
Everything else was on 10Gbps or better Ethernet. We therefore needed 
the Ethernet-only connected nodes to be able to talk to the data 
interfaces of the Infiniband connected nodes.

Due to the way routing works on Linux, when the Infiniband nodes 
attempted to connect to the Ethernet-only connected nodes the traffic 
went via the 1Gbps Ethernet interface, because the kernel picks the 
route by destination address and those nodes were only reachable that 
way.
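To illustrate (addresses and interface names here are hypothetical), a 
route lookup on an Infiniband connected node for an Ethernet-only node 
lands on the slow management interface, since ib0 carries no route to 
it:

    $ ip route get 10.1.0.42          # an Ethernet-only node
    10.1.0.42 via 10.1.0.1 dev eno1 src 10.1.0.17
    # eno1 is the 1Gbps management interface, not ib0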

So after a while, and issues with a single gateway machine, we switched 
to making it redundant. Basically the Ethernet-only connected nodes had 
a custom route to reach the Infiniband network, the DSS-G nodes were 
doing the forwarding, and keepalived running VRRP moved the IP address 
around on the Ethernet side so there was redundancy in the gateway. The 
amount of traffic transiting the gateway was actually tiny because all 
the filesystem data was coming from the DSS-G nodes, which were 
Infiniband connected :-)
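In sketch form (subnets and the floating address are again made up), 
that amounts to enabling forwarding on the DSS-G nodes and pointing the 
Ethernet-only nodes at the VRRP-managed address:

    # On each DSS-G (routing) node: forward between the two fabrics
    sysctl -w net.ipv4.ip_forward=1

    # On each Ethernet-only node: reach the Infiniband data subnet
    # via the floating gateway address held by keepalived
    ip route add 10.2.0.0/24 via 10.1.0.254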

I have no idea if you can do the equivalent of VRRP on Infiniband and 
Omnipath.

In the end the Infiniband nodes (a bunch of C6220's used to support 
undergraduate/MSc projects and classes) had to be upgraded to 10Gbps 
Ethernet as Red Hat dropped support for the Intel TrueScale Infiniband 
adapters in RHEL8. We don't let the students run multinode jobs anyway, 
so the loss of the Infiniband was not an issue. Though the enforced 
move away from RHEL means we will get it back.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG



