[gpfsug-discuss] WAS: alternative path; Now: RDMA

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Mon Dec 13 23:55:23 GMT 2021


On 13/12/2021 00:03, Andrew Beattie wrote:

> What is the main outcome or business requirement of the teaching cluster
> (I notice you're specific in defining it as a teaching cluster)?
> It is entirely possible that the use case for this cluster does not
> warrant the use of high speed low latency networking, and it simply
> needs the benefits of a parallel filesystem.

While we call it the "teaching cluster" it would be more appropriate to 
call them "teaching nodes" that share resources (storage and login 
nodes) with the main research cluster. It's mainly used by 
undergraduates doing final year projects and M.Sc. students. It's 
getting a bit long in the tooth now, but not many undergraduates have 
access to a 16 core machine with 64GB of RAM. Even if they did, being 
able to let something run flat out for 48 hours means their personal 
laptop is available for other things :-)

I was just musing that the cards in the teaching nodes, being Intel 
82599ES, would be a stumbling block for RDMA over Ethernet, but on 
checking, the Intel X710 doesn't do RDMA either, so it would all be a 
bust anyway. I was clearly on the crack pipe when I thought they did. So 
aside from the DSS-G and GPU nodes with ConnectX-4 cards, nothing does RDMA.

[SNIP]

> For some of my research clients this is the ability to run 20-30% more 
> compute jobs on the same HPC resources in the same 24H period, which 
> means that they can reduce the amount of time they need on the HPC 
> cluster to get the data results that they are looking for.

Except, as I said, in our cluster the storage servers have never been 
maxed out except when running benchmarks. Individual compute nodes have 
been maxed out (mainly Gaussian writing 800GB temporary files), but as I 
explained, that's a good thing from my perspective because I don't want 
one or two users to be able to pound the storage into oblivion and cause 
problems for everyone else.

We have enough problems with users tanking the login nodes by running 
computations on them. That should go away with our upgrade to RHEL8 and 
the wonders of per-user cgroups; me, I love systemd.
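For anyone curious, the sort of thing I mean is a drop-in applied to
every user-UID.slice at login. A rough sketch (the path is the standard
systemd drop-in location, but the limits are invented for illustration,
not what we will actually deploy):

  # /etc/systemd/system/user-.slice.d/50-limits.conf
  # Applies to every user-UID.slice systemd creates at login.
  [Slice]
  # Cap each user at the equivalent of 4 cores.
  CPUQuota=400%
  # MemoryMax= only takes effect on the unified (v2) cgroup
  # hierarchy; on the legacy hierarchy MemoryLimit= is the analogue.
  MemoryMax=16G

New login sessions then land in slices with those limits applied.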

In the end nobody has complained that storage speed is a problem yet, 
and putting the metadata on SSD would be my first port of call if they 
did and funds were available to make things go faster.
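For the record, in GPFS terms that would mean adding SSD NSDs flagged as
metadata-only to the system pool. An illustrative NSD stanza (device,
server and NSD names invented for the example, not our hardware):

  %nsd: device=/dev/nvme0n1
    nsd=ssd_meta01
    servers=dssg1,dssg2
    usage=metadataOnly
    failureGroup=10
    pool=system

fed through mmcrnsd and then mmadddisk on the filesystem (plus switching
the existing NSDs to dataOnly and restriping if you want the metadata
moved rather than just spread).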

To be honest I think the users are just happy that GPFS doesn't eat 
itself and end up out of action for a few weeks every couple of years 
like Lustre did on the previous system.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


