[gpfsug-discuss] Services on DSS/ESS nodes

Andrew Beattie abeattie at au1.ibm.com
Fri Oct 2 23:19:15 BST 2020



Jonathan,
I suggest you get a formal statement from Lenovo as the DSS-G Platform is
no longer an IBM platform.

But for ESS based platforms the answer would be that it is not supported to run
anything on the IO Servers other than GNR and the relevant Scale management
services. If you lose an IO Server, or if you are in an extended maintenance
window, the remaining Server needs to host all the work that would otherwise
be performed by both IO servers.

I don't know if Lenovo have a different point of view.

Regards,

Andrew


> On 3 Oct 2020, at 02:14, Jonathan Buzzard <jonathan.buzzard at strath.ac.uk> wrote:
>
> 
> What, if any, are the rules around running additional services on DSS/ESS
> nodes with regard to support? Let me outline our scenario.
>
> Our main cluster uses 10Gbps ethernet for storage with the DSS-G nodes
> hooked up with redundant 40Gbps ethernet.
>
> However we have an older cluster that is used for undergraduate teaching
> that only has 1Gbps ethernet and QDR Infiniband. With no money to
> upgrade this to 10Gbps ethernet, we flipped one of the ports on the
> ConnectX4 cards on each DSS-G node to Infiniband and run the teaching
> nodes over IPoIB.
>
> However it means that we need an Ethernet to Infiniband gateway as the
> ethernet only connected nodes want to talk to the Infiniband connected
> ones on their Infiniband address. Not a problem: we grabbed an old spare
> machine, installed CentOS, configured it to act as a bridge, and
> deployed a custom route to all the ethernet only connected nodes. It has
> been working fine for a couple of years now.
>
> The problem is firstly that this is a single point of failure, on
> hardware that is six years old now. Secondly, applying updates on the
> gateway machine means all the teaching nodes have to be drained and GPFS
> unmounted to reboot the machine after updates have been installed. It is
> currently not getting patched as frequently as I would like (and
> required by the Scottish government).
>
> So thinking about it I have come to the conclusion that the ideal
> situation would be to use the DSS-G nodes as the gateway and run
> keepalived to move the gateway ethernet IP address between the two
> machines. It is ideal because as long as one DSS-G node is up then there
> is a functioning gateway and nodes don't get ejected from the cluster.
> If both DSS-G nodes are down then there is no GPFS to mount anyway and
> lack of a gateway is a moot point.
>
> I grabbed a couple of the teaching compute nodes in the summer and
> trialed it out. It works a treat.
>
> I now need to check IBM are not going to throw a wobbler down the line
> if I need to get support before deploying it to the DSS-G nodes :-)
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG