[gpfsug-discuss] Joining RDMA over different networks?

Kidger, Daniel daniel.kidger at hpe.com
Tue Aug 22 10:51:07 BST 2023


Jonathan,

Thank you for the great answer!
Just to be clear though - are you talking about TCP/IP mounting of the filesystem(s) rather than RDMA?

I think routing of RDMA is perhaps something only Lustre can do?

Daniel

Daniel Kidger
HPC Storage Solutions Architect, EMEA
daniel.kidger at hpe.com

+44 (0)7818 522266  

hpe.com




-----Original Message-----
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> On Behalf Of Jonathan Buzzard
Sent: 22 August 2023 10:29
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] Joining RDMA over different networks?

On 22/08/2023 00:27, Ryan Novosielski wrote:

> I still have the guide from that system, and I saved some of the 
> routing scripts and what not. But really, it wasn’t much more 
> complicated than Ethernet routing.
> 
> The routing nodes, I guess obviously, had both Omnipath and Infiniband 
> interfaces. Compute nodes themselves I believe used a supervisord 
> script, if I’m remembering that name right, to try to balance out 
> which routing node one would use as a gateway. There were two as it 
> was configured when I got to it, but a larger number was possible.
> 

Having done it in a limited fashion previously, I would recommend that you have two routing nodes and use keepalived with VRRP on at least the Ethernet side to maintain some redundancy.

Otherwise you get in a situation where you are entirely dependent on a single node which you can't reboot without a GPFS shutdown. Cyber security makes that an untenable position these days.
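To give a flavour, a minimal keepalived VRRP stanza for such a pair of routing nodes might look something like the following (the interface name and addresses are invented purely for illustration):

    # /etc/keepalived/keepalived.conf on the first routing node
    vrrp_instance GPFS_GW {
        state MASTER            # set to BACKUP on the second routing node
        interface eth0          # Ethernet-side interface carrying the gateway IP
        virtual_router_id 51
        priority 150            # use a lower priority on the backup node
        advert_int 1
        virtual_ipaddress {
            10.0.0.254/24       # floating gateway address the other nodes route via
        }
    }

The floating gateway address then moves to the surviving node if the master fails, so the other nodes never lose their route.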

In our situation our DSS-G nodes were both Ethernet and Infiniband connected, and we had a bunch of nodes that were using Infiniband for the data traffic and 1Gbps Ethernet for the management interface. 
Everything else was on 10Gbps or better Ethernet. We therefore needed the Ethernet-only connected nodes to be able to talk to the data interfaces of the Infiniband-connected nodes.

Due to the way routing works on Linux, when the Infiniband nodes attempted to connect to the Ethernet-only connected nodes the traffic went via the 1Gbps Ethernet interface.
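If you want to see what the kernel will actually do for a given destination, "ip route get" shows the chosen route, for example (example address only):

    # show the route and outgoing interface the kernel would pick
    ip route get 10.20.0.15

That is an easy way to spot traffic leaving via the 1Gbps management interface rather than the interface you expected.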

So after a while, and issues with a single gateway machine, we switched to making it redundant. Basically the Ethernet-only connected nodes had a custom route to reach the Infiniband network, the DSS-G nodes were doing the forwarding, and keepalived running VRRP moved the IP address around on the Ethernet side so there was redundancy in the gateway. The amount of traffic transiting the gateway was actually tiny because all the filesystem data was coming from the DSS-G nodes that were Infiniband connected :-)
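As a rough sketch of what that involves (all addresses here are invented and will differ for you), the Ethernet-only nodes carry a static route via the floating VRRP address, along these lines:

    # on the Ethernet-only nodes: reach the Infiniband data network via the VRRP address
    ip route add 10.10.0.0/16 via 10.0.0.254

and the DSS-G routing nodes just need IP forwarding enabled so traffic can transit between the Ethernet and IPoIB interfaces:

    # on the DSS-G routing nodes
    sysctl -w net.ipv4.ip_forward=1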

I have no idea if you can do the equivalent of VRRP on Infiniband and Omnipath.

In the end the Infiniband nodes (a bunch of C6220's used to support undergraduate/MSc projects and classes) had to be upgraded to 10Gbps Ethernet as Red Hat dropped support for the Intel Truescale Infiniband adapters in RHEL8. We don't let the students run multinode jobs anyway, so the loss of the Infiniband was not an issue. Though with the enforced move away from RHEL we will get it back.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org 

