[gpfsug-discuss] Services on DSS/ESS nodes
jonathan.buzzard at strath.ac.uk
Wed Oct 7 13:14:45 BST 2020
On 07/10/2020 11:28, Simon Thompson wrote:
> Agreed ...
> Report to me a pdisk is failing in my monitoring dashboard we use for *everything else*.
> Tell me that kswapd is having one of those days.
> Tell me rsyslogd has stopped sending for some reason.
> Tell me if there are long waiters on the hosts.
> Read the ipmi status of the host to tell me an OS drive is failed, or the CMOS battery is flat or ...
> Whilst the GUI has a bunch of this stuff, in the real world the rest of us have reporting and dashboarding from many more systems...
The problem is the developers know as much about looking after a system
in the real world as a tea leaf knows the history of the East India
Company. IMHO to even ask the question shows a total lack of
understanding of the issue.
Consequently developers in their ivory towers have a habit of developing
things that are as useful as a chocolate teapot. Which, putting it
bluntly, to a competent sysadmin makes them look like a bunch of twits. I
would note this is not a problem unique to IBM, it's developers in general.
The appropriate course of action would be not for IBM to develop a
monitoring tool of their own but to provide a bunch of plugins for the
popular monitoring tools that customers will already be using to monitor
their whole IT estate.
Heaven forbid they could even run a poll to find out which ones the
actual customers of their products are interested in rather than wasting
effort developing software their customers are not actually interested in.
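To illustrate the kind of plugin meant above, here is a minimal sketch in
Python. It assumes the widely used Nagios-style convention of exit codes
(0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line status message.
The function name, the pdisk names and the state strings are hypothetical
placeholders for illustration, not anything IBM ships.

```python
# Exit codes in the Nagios-style plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def pdisk_status(states):
    """Map a dict of pdisk name -> state string to (exit_code, message).

    `states` is hypothetical input: a real plugin would have to collect it
    from the storage system, which this sketch deliberately does not do.
    Any state other than the assumed healthy value "ok" counts as a fault.
    """
    if not states:
        return UNKNOWN, "UNKNOWN - no pdisk data"
    bad = sorted(name for name, state in states.items() if state != "ok")
    if bad:
        return CRITICAL, "CRITICAL - %d pdisk(s) not ok: %s" % (
            len(bad), ", ".join(bad))
    return OK, "OK - %d pdisk(s) ok" % len(states)
```

A wrapper script would print the message and exit with the code, which is
all that most monitoring systems of this style need in order to raise an
alert in the same dashboard as everything else.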
For my purposes there is I think an alternative. The actual routing of
the IP packets is not a service, it's a kernel configuration to have the
kernel route the packets :-) Keepalived just manages a floating IP
address. There are other options to achieve this. They are clunkier but
they side step IBM's silly rules.
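To show that routing really is just a kernel setting, here is a small
sketch in Python that reads the standard Linux IPv4 forwarding switch from
procfs. The function name is hypothetical, and the path parameter exists
only so the check can be pointed at a different file for testing.

```python
from pathlib import Path

# Standard Linux procfs location of the IPv4 forwarding switch.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def forwarding_enabled(path=IP_FORWARD):
    """Return True if the file at `path` contains "1" (forwarding on)."""
    return Path(path).read_text().strip() == "1"
```

On a Linux host, forwarding_enabled() with no arguments reports the
current state. Turning it on is a matter of setting the
net.ipv4.ip_forward sysctl to 1; no daemon is involved.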
I would however note at this point that at lots of sites all routing in
the data centre is done using BGP. It comes in part out of the zero
trust paradigm. I guess apparently running fail2ban is not permitted
either. Can I even run firewalld? As you can see, a nothing-else policy
quickly becomes unsustainable IMHO.
There is a disjuncture between the developers in their ivory towers and
the real world.
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG