[gpfsug-discuss] gpfsug-discuss Digest, Vol 62, Issue 33

Jonathan Buzzard jonathan at buzzard.me.uk
Thu Mar 16 17:18:22 GMT 2017


On Thu, 2017-03-16 at 10:43 -0400, Aaron Knister wrote:
> Perhaps an environment where one has OPA and IB fabrics. Taken from here 
> (https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html):
> 
> RDMA is not supported on a node when both Mellanox HCAs and Intel 
> Omni-Path HFIs are enabled for RDMA.
> 
> The alternative being a situation where multiple IB fabrics exist that 
> require different OFED versions from each other (and most likely from 
> ESS) for support reasons (speaking from experience). That is to say if 
> $VENDOR supports OFED version X on an IB fabric, and ESS/GSS ships with 
> version Y and there's a problem on the IB fabric $VENDOR may point at 
> the different OFED version on the ESS/GSS and say they don't support it 
> and then one is in a bad spot.
> 

Or just use Ethernet for the GPFS traffic everywhere. It's 2017: 10GbE
to compute nodes is cheap enough to be the norm, and you can use 40GbE
and 100GbE everywhere else as required. Note that unless you are
running a pure SSD system (which must be vanishingly rare at the
moment) the latency is all in the disks anyway.

I can't imagine why you would use IB and/or Omni-Path on anything other
than compute clusters, so using Ethernet for your storage has
advantages too, especially if you are using IB. One of the features of
Omni-Path over IB is that it will prioritize MPI traffic over storage,
but given current core counts in CPUs, separating storage out onto
Ethernet is not a bad thing anyway as it keeps the MPI bandwidth per
core up.
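
As a rough illustration of the per-core point, here is a minimal
Python sketch; the link speed, core count and storage traffic share
are assumptions picked for illustration, not measurements:

    # Rough per-core MPI bandwidth estimate. The 100 Gb/s link, 28 cores
    # and 30% storage share are illustrative assumptions only.
    link_gbit = 100.0      # node fabric link, Gbit/s
    cores = 28             # cores per node
    storage_share = 0.3    # fraction of the link consumed by storage I/O

    mpi_shared = link_gbit * (1 - storage_share) / cores  # storage on fabric
    mpi_dedicated = link_gbit / cores                      # storage on Ethernet

    print(f"MPI per core, shared fabric:     {mpi_shared:.2f} Gbit/s")
    print(f"MPI per core, storage offloaded: {mpi_dedicated:.2f} Gbit/s")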

My feeling is that in a compute cluster 10Gb for storage is more than
enough per node, because with much more it would only take a handful
of nodes to denial-of-service your storage, which is not good. A 500
node compute cluster with 10GbE for storage can in theory hit your
storage for roughly 600GB/s (500 x 10Gb/s is about 625GB/s), which
very, very few storage systems will be able to keep up with.
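
A back-of-the-envelope version of that arithmetic in Python; the
storage backend figure is an arbitrary placeholder rather than any
real system:

    # Theoretical aggregate demand from 500 compute nodes on 10GbE versus
    # an assumed (hypothetical) 20 GB/s storage backend.
    nodes = 500
    node_link_gbit = 10.0        # 10GbE per compute node
    backend_gbyte_s = 20.0       # assumed backend capability, GB/s

    aggregate_gbyte_s = nodes * node_link_gbit / 8    # Gbit/s -> GB/s
    oversubscription = aggregate_gbyte_s / backend_gbyte_s

    print(f"Theoretical aggregate demand: {aggregate_gbyte_s:.0f} GB/s")
    print(f"Oversubscription vs backend:  {oversubscription:.1f}x")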

On a storage GPFS cluster you can then go up to 40/100GbE and put in
multiple links for redundancy, which is not possible with IB/Omni-Path
as far as I am aware, and that makes it a better solution. Even in a
compute cluster with Ethernet I can stick in multiple connections for
my NSD nodes for redundancy, which is an improvement over the
IB/Omni-Path options, especially given that in my experience IB links
go wonky orders of magnitude more often than Ethernet ones do. Lose a
link to a compute node and I lose one job on the cluster. Lose a link
to the NSD server and I face losing all the jobs on the cluster...

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.
