[gpfsug-discuss] Services on DSS/ESS nodes

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Sat Oct 3 11:06:41 BST 2020

On 02/10/2020 23:19, Andrew Beattie wrote:
> Jonathan,
> I suggest you get a formal statement from Lenovo as the DSS-G Platform 
> is no longer an IBM platform.
> But for ESS based platforms the answer would be, it is not supported to 
> run anything on the IO Servers other than GNR and the relevant Scale 
> management services, due to the fact that if you lose an IO Server, or 
> if you are in an extended maintenance window, the remaining server needs
> to host all the work that would be being performed by both IO servers.

In the past ~500 days the Infiniband to Ethernet gateway has shifted 
~13GB of data, or about 25MB a day. Meanwhile in the last 470 days the 
DSS-G nodes have each shifted several PB. The proposed additional 
traffic is a drop in the ocean.

On my actual routers, which shift much more data (over 300TB externally)
and have an uptime of ~180 days at the moment, the CPU time consumed by
keepalived is just under 31 minutes, or about 10 seconds a day. These
have much punier CPUs too. The proposed additional CPU usage is another
drop in the ocean.
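
As a rough check of the per-day figures above, here is a minimal sketch.
It assumes decimal units (1GB = 1000MB) and takes the rounded totals
exactly as quoted, so the results are only order-of-magnitude estimates.
The per_day helper is purely illustrative.

```python
# Back-of-envelope check of the per-day figures quoted in this post.
# Assumptions: decimal units (1 GB = 1000 MB) and the rounded totals as posted.

def per_day(total, days):
    """Average amount per day over the given number of days."""
    return total / days

# ~13 GB through the gateway over ~500 days, expressed in MB
gateway_mb_per_day = per_day(13 * 1000, 500)

# just under 31 minutes of keepalived CPU time over ~180 days, in seconds
keepalived_s_per_day = per_day(31 * 60, 180)

print(f"gateway traffic: {gateway_mb_per_day:.0f} MB/day")
print(f"keepalived CPU:  {keepalived_s_per_day:.1f} s/day")
```

Either way the result is a few tens of MB of traffic and a handful of
CPU seconds per day.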

Given Lenovo sold the *same* configuration with x3650s and SR650s, the
"need all the CPU grunt" argument is somewhat fishy. Between the bid
being submitted and the actual tender award the SR650s came out, and we
paid a bit extra to uplift to the newer server hardware with exactly the
same disk configuration. I believe IBM have done the same with the
ESS/GNR servers over time, so the same applies there too.

IMHO, given keepalived is a base RHEL package, IBM/Lenovo should be
offering Infiniband to Ethernet gateways running on the DSS/ESS nodes as
a supported configuration for mixed network technology clusters :-)

Running a couple of extra servers for this purpose is obnoxious from an
environmental standpoint. That's IBM's green credentials out the window
if you ask me.

I would note that under those rules running a Nagios, Zabbix etc. client
on the nodes is not permitted either. I would suggest that most sites
would be rather unhappy about that :-)

> I don't know if Lenovo have a different point of view.

Problem is, when I ring up for support on my DSS-G I speak to an IBM
employee, not a Lenovo one :-)


Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
