[gpfsug-discuss] WAS: alternative path; Now: RDMA

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Sun Dec 12 11:19:07 GMT 2021


On 12/12/2021 02:19, Alec wrote:

> I feel the need to respond here...  I see many responses on this
> User Group forum that are dismissive of the fringe / extreme use
> cases and of the "what do you need that for" mindset.  The thing is
> that Spectrum Scale is for the extreme, just take the word "Parallel"
> in the old moniker that was already an extreme use case.

I wasn't being dismissive; I was asking what the benefits of using RDMA
are. There is very little information about it out there, and not a lot
of comparative benchmarking either. Without the benefits being clearly
laid out I am unlikely to consider it, and I might be missing a trick.

IBM's literature on the topic is underwhelming to say the least.

[SNIP]


> I have an AIX LPAR that traverses more than 300TB+ of data a day on a
> Spectrum Scale file system, it is fully virtualized, and handles a 
> million files.  If that performance level drops, regulatory reports 
> will be late, business decisions won't be current. However, the 
> systems of today and the future have to traverse this much data and 
> if they are slow then they can't keep up with real-time data feeds.

I have this nagging suspicion that modern all-flash storage systems
could deliver that sort of performance without the overhead of a
parallel file system.

[SNIP]

> 
> Douglas's response is the right one, how much IO does the
> application / environment need, it's nice to see Spectrum Scale have
> the flexibility to deliver.  I'm pretty confident that if I can't
> deliver the required I/O performance on Spectrum Scale, nobody else
> can on any other storage platform within reasonable limits.
> 

I would note here that in our *shared HPC* environment I made a very
deliberate design decision to attach the compute nodes with 10Gbps
Ethernet for storage, though I would probably pick 25Gbps if we were
procuring the system today.

There were many reasons behind that, but the main ones were these.
First, historical file system performance showed that more than 99% of
the time the file system never got above 20% of its benchmarked speed,
so 10Gbps Ethernet was not going to be a problem.

Secondly, limiting the connection to 10Gbps stops one person hogging
the file system to the detriment of other users. We have seen
individual nodes peg their 10Gbps link from time to time, even several
nodes at once (jobs from the same user), and had they had access to a
100Gbps storage link that would have been curtains for everyone else's
file system usage.

At this juncture I would note that the GPFS admin traffic is handled
on a separate IP address space on a separate VLAN, which we prioritize
with QoS on the switches. So even when a node floods its 10Gbps link
for extended periods of time it doesn't get ejected from the cluster.
In my experience a separate physical network for admin traffic is not
necessary.
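
For anyone wanting to do something similar, a minimal sketch of the
idea looks like the following. The node names, host names and DSCP
class here are made-up examples and the switch-side QoS policy is site
specific; 1191/tcp is the standard GPFS daemon port.

   # put the GPFS daemon/admin traffic on the storage-VLAN interfaces
   # (node and host names below are examples only)
   mmchnode --daemon-interface=node001-gpfs.example.ac.uk -N node001
   mmchnode --admin-interface=node001-adm.example.ac.uk -N node001

   # mark outbound GPFS daemon traffic (1191/tcp) as CS6 so the switch
   # QoS policy can prioritize it over bulk data traffic
   iptables -t mangle -A OUTPUT -p tcp --dport 1191 \
            -j DSCP --set-dscp-class CS6

The switches then just need a QoS policy that honours the marking.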

That said, you can do RDMA over Ethernet (RoCE). Unfortunately the
teaching cluster and the protocol nodes are on Intel X520s, which I
don't think do RDMA. Everything else is on X710s or Mellanox
ConnectX-4s, which definitely do. I could upgrade the protocol nodes,
but the teaching cluster would be a problem.
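
If anyone does want to try RoCE, the Scale side of it looks fairly
simple, assuming the documentation is to be believed. A rough sketch,
where "rdmaNodes" is a hypothetical node class of the RDMA-capable
nodes and the device name depends on your card (check ibv_devinfo):

   # enable RDMA verbs for the daemon/NSD traffic on capable nodes
   mmchconfig verbsRdma=enable -N rdmaNodes
   # RoCE also needs the RDMA connection manager
   mmchconfig verbsRdmaCm=enable -N rdmaNodes
   # tell GPFS which RDMA device/port to use (a ConnectX-4 typically
   # shows up as mlx5_0)
   mmchconfig verbsPorts="mlx5_0/1" -N rdmaNodes
   # the settings only take effect when mmfsd is restarted
   mmshutdown -N rdmaNodes && mmstartup -N rdmaNodes

Plus whatever PFC/ECN configuration the Ethernet fabric needs for RoCE
to behave, which is the less pleasant part.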


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


