[gpfsug-discuss] [External] Pmcollector fails to start

Nicolas CALIMET ncalimet at lenovo.com
Mon Nov 15 21:31:03 GMT 2021


Hi,

I have run into this “start request repeated too quickly” error before, though IIRC for the pmsensors service rather than pmcollector, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. In other words, something was misconfigured at the cluster level, and not necessarily on the node where the service is failing. Your issue might point to something similar, but on the other end of the spectrum (sic).

In such cases the issue is usually resolved by deleting and recreating the performance monitoring configuration for the whole cluster:

mmchnode --noperfmon -N all   # required before deleting the perfmon config
mmperfmon config delete --all
mmperfmon config generate --collectors <GUINODES>  # start the pmcollector service on the GUI nodes
mmchnode --perfmon -N all  # start the pmsensors service on all nodes

It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies.
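
For convenience, the four steps above could be wrapped in a small shell function. This is only a sketch: the names run, recreate_perfmon and DRY_RUN are made up for illustration and are not part of Spectrum Scale, and the mm* commands are exactly the ones listed above. By default it only prints what it would do, so nothing is changed until DRY_RUN=0 is set explicitly.

```shell
#!/bin/sh
# Sketch only: wraps the four commands from this mail in one function.
# DRY_RUN, run and recreate_perfmon are hypothetical names for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recreate_perfmon() {
    collectors="$1"       # comma-separated list of GUI/collector nodes
    if [ -z "$collectors" ]; then
        echo "usage: recreate_perfmon <GUINODES>" >&2
        return 1
    fi
    # Stop at the first failing step so a half-recreated config is noticed.
    run mmchnode --noperfmon -N all &&
    run mmperfmon config delete --all &&
    run mmperfmon config generate --collectors "$collectors" &&
    run mmchnode --perfmon -N all
}
```

Calling recreate_perfmon gui1,gui2 (placeholder node names) with the default DRY_RUN=1 just lists the four commands in order, which is a cheap way to double-check the collector list before running it for real.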

HTH

--
Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Oesterlin, Robert
Sent: Monday, November 15, 2021 19:44
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [External] [gpfsug-discuss] Pmcollector fails to start

Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4

This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon

“service pmcollector start” - fails:

Redirecting to /bin/systemctl status pmcollector.service
● pmcollector.service - zimon collector daemon
   Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago
  Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC)
Main PID: 2055 (code=exited, status=203/EXEC)

Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state.
Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed.
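
[Note on the output above: status=203/EXEC is systemd's exit code for a failed execve() of the ExecStart binary, typically a missing or non-accessible executable, so the daemon itself was never started by systemd. The sketch below is not from the original post. The helper name exec_status is made up for illustration, and the commented commands are standard read-only checks, plus reset-failed to clear the start limit before retrying.]

```shell
# Sketch: extract the systemd exit status from `systemctl status` output.
# exec_status is a hypothetical helper name, not an existing tool.
exec_status() {
    # Prints e.g. "203/EXEC" for the first "status=..." field found on stdin.
    sed -n 's/.*status=\([0-9][0-9]*\/[A-Z_]*\).*/\1/p' | head -n 1
}

# Checks one might run on the collector node:
#   systemctl status pmcollector.service | exec_status
#   ls -lZ /opt/IBM/zimon/sbin/pmcollector      # present, executable, SELinux label?
#   journalctl -u pmcollector.service -b        # unit log for the current boot
#   systemctl reset-failed pmcollector.service  # clear the start-limit state before retrying
```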


Bob Oesterlin
Sr Principal Storage Engineer
Nuance Communications