[gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working

Kristy Kallback-Rose kkr at lbl.gov
Wed May 24 20:57:49 BST 2017


Hello,

  We have been experimenting with Zimon and the SS GUI on our dev cluster
under 4.2.3. Things work well with one collector, but I'm running into
issues when trying to use symmetric collector peers, i.e. federation.

  hostA and hostB are each set up as both a collector and a sensor, and
each is configured as a collector peer of the other. With this in place I
can use mmperfmon to query hostA from either hostA or hostB, and vice versa.
However, with this federation setup, the GUI fails to show data. The GUI is
running on hostB. From the collector candidate pool, hostA has been selected
(automatically, not manually), as can be seen in the sensor configuration
file. The GUI is unable to load data (it just shows "Loading" on the graph)
*unless* I change the ZIMonAddress setting in
/usr/lpp/mmfs/gui/conf/gpfsgui.properties
from localhost to hostA explicitly. It does not work if I change it to
hostB explicitly. The GUI also works fine if I remove the peer entries
altogether and just have one collector.
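
For anyone trying to reproduce this, here is a minimal Python sketch for checking which collector address the GUI is currently pointed at. It assumes gpfsgui.properties uses plain key=value lines (an assumption on my part, I have not checked every line of the file), and the function names read_properties and zimon_address are hypothetical helpers, not part of any Spectrum Scale tool.

```python
from pathlib import Path

# Path taken from the message above.
GUI_PROPERTIES = "/usr/lpp/mmfs/gui/conf/gpfsgui.properties"


def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments.

    Assumption: the file follows the basic Java-properties style with one
    key=value pair per line and no line continuations.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def zimon_address(path=GUI_PROPERTIES):
    """Return the ZIMonAddress value from the GUI properties file, or None."""
    return read_properties(Path(path).read_text()).get("ZIMonAddress")
```

Calling zimon_address() on the GUI node would return whatever is currently set, for example localhost or hostA in the cases described above.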

  I thought that federation meant that no matter which collector was
queried the data would be returned. This appears to work for mmperfmon, but
not the GUI. Can anyone advise? I also don't like the idea of having a pool
of collector candidates and hard-coding one into the GUI configuration. I
am including some output below to show the configs and query results.

Thanks,

Kristy


  The peers are added into the ZIMonCollector.cfg using the default port
9085:

 peers = {
        host = "hostA"
        port = "9085"
 },
 {
        host = "hostB"
        port = "9085"
 }
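
As a basic sanity check on the peer setup, a small Python sketch like the following can confirm that each collector accepts TCP connections on the peer port. The helper names can_connect and check_peers are hypothetical, and a successful connection only shows that the port is reachable; it says nothing about whether federated queries are actually being answered.

```python
import socket

# Peer hosts and port as listed in the ZIMonCollector.cfg excerpt above.
PEERS = [("hostA", 9085), ("hostB", 9085)]


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_peers(peers=PEERS, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in peers}
```

Running check_peers() from both hostA and hostB would show whether each collector can reach the other on the peer port.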


The nodes are also added as collector candidates. Looking at the config
file directly on hostA and hostB, /opt/IBM/zimon/ZIMonSensors.cfg contains:

colCandidates = "hostA.nersc.gov", "hostB.nersc.gov"
colRedundancy = 1
collectors = {
host = "hostA.nersc.gov"
port = "4739"
}
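
To compare the automatically selected collector with the candidate pool (and with whatever the GUI is pointed at), the sensor config can be parsed with a short Python sketch. This is only an illustration under assumptions: it relies on the layout shown in the excerpt above (colCandidates on a single line, one collectors block), and parse_sensor_config is a hypothetical helper name.

```python
import re

# Sample text mirroring the ZIMonSensors.cfg excerpt above.
SAMPLE = '''
colCandidates = "hostA.nersc.gov", "hostB.nersc.gov"
colRedundancy = 1
collectors = {
host = "hostA.nersc.gov"
port = "4739"
}
'''


def parse_sensor_config(text):
    """Return (candidates, selected) from sensor config text.

    candidates: list of hosts on the colCandidates line.
    selected:   host inside the first collectors block, or None.
    """
    candidates = []
    m = re.search(r'^\s*colCandidates\s*=\s*(.*)$', text, re.MULTILINE)
    if m:
        candidates = re.findall(r'"([^"]*)"', m.group(1))
    m = re.search(r'collectors\s*=\s*\{[^}]*?host\s*=\s*"([^"]*)"', text)
    selected = m.group(1) if m else None
    return candidates, selected


candidates, selected = parse_sensor_config(SAMPLE)
print("candidates:", candidates)
print("selected:  ", selected)
```

On a real node the text would come from reading /opt/IBM/zimon/ZIMonSensors.cfg instead of SAMPLE.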


Showing the config with mmperfmon config show:

colCandidates = "hostA.nersc.gov", "hostB.nersc.gov"
colRedundancy = 1
collectors = {
host = ""

Using mmperfmon I can query either host.


[root@hostA ~]# mmperfmon query cpu -N hostB

Legend:
 1: hostB.nersc.gov|CPU|cpu_system
 2: hostB.nersc.gov|CPU|cpu_user
 3: hostB.nersc.gov|CPU|cpu_contexts

Row           Timestamp cpu_system cpu_user cpu_contexts
  1 2017-05-23-17:03:54       0.54     3.67         4961
  2 2017-05-23-17:03:55       0.63     3.55         6199
  3 2017-05-23-17:03:56       1.59     3.76         7914
  4 2017-05-23-17:03:57       1.38     5.34         5393
  5 2017-05-23-17:03:58       0.54     2.21         2435
  6 2017-05-23-17:03:59       0.13     0.29         2519
  7 2017-05-23-17:04:00       0.13     0.25         2197
  8 2017-05-23-17:04:01       0.13     0.29         2473
  9 2017-05-23-17:04:02       0.08     0.21         2336
 10 2017-05-23-17:04:03       0.13     0.21         2312


[root@hostB ~]# mmperfmon query cpu -N hostB

Legend:
 1: hostB.nersc.gov|CPU|cpu_system
 2: hostB.nersc.gov|CPU|cpu_user
 3: hostB.nersc.gov|CPU|cpu_contexts

Row           Timestamp cpu_system cpu_user cpu_contexts
  1 2017-05-23-17:04:07       0.13     0.21         2010
  2 2017-05-23-17:04:08       0.04     0.21         2571
  3 2017-05-23-17:04:09       0.08     0.25         2766
  4 2017-05-23-17:04:10       0.13     0.29         3147
  5 2017-05-23-17:04:11       0.83     0.83         2596
  6 2017-05-23-17:04:12       0.33     0.54         2530
  7 2017-05-23-17:04:13       0.08     0.33         2428
  8 2017-05-23-17:04:14       0.13     0.25         2326
  9 2017-05-23-17:04:15       0.13     0.29         4190
 10 2017-05-23-17:04:16       0.58     1.92         5882


[root@hostB ~]# mmperfmon query cpu -N hostA

Legend:
 1: hostA.nersc.gov|CPU|cpu_system
 2: hostA.nersc.gov|CPU|cpu_user
 3: hostA.nersc.gov|CPU|cpu_contexts

Row           Timestamp cpu_system cpu_user cpu_contexts
  1 2017-05-23-17:05:45       0.33     0.46         7460
  2 2017-05-23-17:05:46       0.33     0.42         8993
  3 2017-05-23-17:05:47       0.42     0.54         8709
  4 2017-05-23-17:05:48       0.38      0.5         5923
  5 2017-05-23-17:05:49       0.54     1.46         7381
  6 2017-05-23-17:05:50       0.58     3.51        10381
  7 2017-05-23-17:05:51       1.05     1.13        10995
  8 2017-05-23-17:05:52       0.88     0.92        10855
  9 2017-05-23-17:05:53        0.5     0.63        10958
 10 2017-05-23-17:05:54        0.5     0.59        10285


[root@hostA ~]# mmperfmon query cpu -N hostA

Legend:
 1: hostA.nersc.gov|CPU|cpu_system
 2: hostA.nersc.gov|CPU|cpu_user
 3: hostA.nersc.gov|CPU|cpu_contexts

Row           Timestamp cpu_system cpu_user cpu_contexts
  1 2017-05-23-17:05:50       0.58     3.51        10381
  2 2017-05-23-17:05:51       1.05     1.13        10995
  3 2017-05-23-17:05:52       0.88     0.92        10855
  4 2017-05-23-17:05:53        0.5     0.63        10958
  5 2017-05-23-17:05:54        0.5     0.59        10285
  6 2017-05-23-17:05:55       0.46     0.63        11621
  7 2017-05-23-17:05:56       0.84     0.92        11477
  8 2017-05-23-17:05:57       1.47     1.88        11084
  9 2017-05-23-17:05:58       0.46     1.76         9125
 10 2017-05-23-17:05:59       0.42     0.63        11745