[gpfsug-discuss] Services on DSS/ESS nodes

Wed Oct 7 13:14:45 BST 2020

On 07/10/2020 11:28, Simon Thompson wrote:
> Agreed ...
> 
> Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
> Tell me that kswapd is having one of those days.
> Tell me rsyslogd has stopped sending for some reason.
> Tell me if there are long waiters on the hosts.
> Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...
> 
> Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...
> 

The problem is the developers know as much about looking after a system 
in the real world as a tea leaf knows the history of the East India 
Company. IMHO to even ask the question shows a total lack of 
understanding of the issue.

Consequently developers in their ivory towers have a habit of developing 
things that are as useful as a chocolate teapot. Which, putting it 
bluntly, makes them look like a bunch of twits to any competent sysadmin. 
I would note this is not a problem unique to IBM, it's developers in general.

The appropriate course of action would be not for IBM to develop a 
monitoring tool of their own but to provide a bunch of plugins for the 
popular monitoring tools that customers will already be using to monitor 
their whole IT estate.

Heaven forbid they could even run a poll to find out which ones the 
actual customers of their products are interested in rather than wasting 
effort developing software their customers are not actually interested in.

For my purposes there is, I think, an alternative. The actual routing of 
the IP packets is not a service, it's a kernel configuration to have the 
kernel route the packets :-) Keepalived just manages a floating IP 
address. There are other options to achieve this. They are clunkier but 
they sidestep IBM's silly rules.
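
To illustrate the point that routing is kernel state rather than a 
running service, here is a minimal sketch that reads the standard Linux 
procfs switch for IPv4 forwarding (the net.ipv4.ip_forward sysctl). The 
function name forwarding_enabled is my own invention for illustration, 
and this only reports the setting, it does not change it.

```python
from pathlib import Path

# Standard Linux procfs location of the net.ipv4.ip_forward sysctl.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def forwarding_enabled(path: Path = IP_FORWARD) -> bool:
    """Return True if the file at `path` contains "1" (forwarding on).

    Hypothetical helper for illustration; `path` is a parameter only so
    the check can be pointed at another file when testing.
    """
    return path.read_text().strip() == "1"


if __name__ == "__main__":
    state = "enabled" if forwarding_enabled() else "disabled"
    print(f"IPv4 forwarding is {state}")
```

No daemon is involved in that check: the value is a kernel setting, which 
is why I would argue it does not count as a "service" on the node.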

I would however note at this point that at lots of sites all routing in 
the data centre is done using BGP. It comes in part out of the zero 
trust paradigm. I guess running fail2ban is apparently not permitted 
either. Can I even run firewalld? As you can see, a nothing-else policy 
quickly becomes unsustainable IMHO.

There is a disjuncture between the developers in their ivory towers and 
the real world.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG