From s.j.thompson at bham.ac.uk Mon Nov 1 14:50:54 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Nov 2021 14:50:54 +0000 Subject: [gpfsug-discuss] SSUG UK User Group Message-ID: Hi All, I'm planning to take a step back from running the Spectrum Scale user group in the UK later this year/early next year, and this means we need someone (or people) to step up to run the user group in the UK. I took over running the user group in 2015 and a lot has changed since then - the group got bigger, we moved to multi-day sessions, a pandemic struck and we moved online - now, as things are maybe returning to normal, I think it is time for someone else to take leadership of the group in the UK and work out how to take it forwards. If you are interested in taking up running the group in the UK, please drop me an email, or DM me on Slack and let me know. It doesn't necessarily need to be one person running the group, and having several would help with some of the logistics of running the events. To be truly independent, which we have always tried to be, I've always thought that the person/people running the group should come from the end-user community. I'll likely still be around at events, and happy to provide organisational support if needed - but I don't really have the time needed for the group at the moment. Hopefully there's someone interested in taking the group forwards in the future. Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Tue Nov 2 14:02:10 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Nov 2021 14:02:10 +0000 Subject: [gpfsug-discuss] Upcoming Events Message-ID: Hi All, We thought it would be a good time to send an update on some upcoming events. We have three events coming up over November/December, TWO of which are in person! IBM User's Group meeting - SC21 (15th November 2021, IN PERSON) The IBM Spectrum Scale Development and Product Management team will be attending Super Computing 2021 in person. We will be hosting our yearly gathering on Monday, November 15, from 3:00-5:00 PM. This global user meeting provides an opportunity for peer-to-peer learning and interaction with IBM's technical leadership team on the latest IBM Spectrum Scale roadmaps, latest features, ecosystem, and applications for AI. See: https://www.spectrumscaleug.org/event/sc21-users-group-meeting/ Register at: https://www.ibm.com/events/event/pages/ibm/nz48hgmb/1581037797007001PJAd.html SSUG::Digital (1st, 2nd December 2021, VIRTUAL) For the Spectrum Scale users who will not be able to attend the user meeting at Super Computing in St Louis, or SSUG at CIUK, we plan to host a digital user meeting on Dec 1 & Dec 2 from 10am - 12pm EDT (3pm-5pm GMT). In the digital user meeting, we will cover some of the content covered at St Louis plus additional expert talks from our development team and partners. See: https://www.spectrumscaleug.org/event/digital-user-group-dec-2021/ Joining link: To be confirmed SSUG @CIUK 2021 (10th December 2021, IN PERSON) This year we will be returning to our traditional user group home of CIUK and will be running a break-out session on the Friday of CIUK (10:00 - 12:00). We're currently lining up a few speakers for the event, but if you are attending CIUK in Manchester this year and are interested in speaking, please let me know - we have a few speaker slots available for user talks.
I'm sure it has been soooo long since anyone has had the opportunity to speak that I'll be inundated with user talks. See: https://www.spectrumscaleug.org/event/ssug-ciuk-2021/ As usual with the CIUK meeting, you must be a registered attendee of CIUK to attend this user group. CIUK Registration: https://www.scd.stfc.ac.uk/Pages/CIUK2021.aspx Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Thu Nov 4 21:17:33 2021 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 04 Nov 2021 17:17:33 -0400 Subject: [gpfsug-discuss] possible to rename a snapshot? Message-ID: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark From heinrich.billich at id.ethz.ch Mon Nov 8 09:20:24 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 8 Nov 2021 09:20:24 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Hello, We use /tmp/mmfs as our dataStructureDump directory. For a while now I have noticed that this directory randomly vanishes. Mmhealth does not complain, it just notes that it will no longer monitor the directory. Still, I doubt that trace collection and similar will create the directory when needed. Do you know of any Spectrum Scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes with a plain IBM installation, too. It happens just on one or two nodes at a time; it's no cluster-wide cleanup or similar. We run Scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From olaf.weiser at de.ibm.com Mon Nov 8 09:53:04 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 8 Nov 2021 09:53:04 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Nov 8 09:54:18 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Nov 2021 09:54:18 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > notice that this directory randomly vanishes. Mmhealth does not > complain but just notes that it will no longer monitor the directory. > Still I doubt that trace collection and similar will create the > directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > 6.0.2.2 and 6.0.2.2. > I know several Linux distributions clear the contents of /tmp at boot time. Could that explain it?
I would say using /tmp like you are doing is not a sensible idea anyway and that you should be using something under /var. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From lior at nyu.edu Mon Nov 8 14:38:35 2021 From: lior at nyu.edu (Lior Atar) Date: Mon, 8 Nov 2021 09:38:35 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: Hello all, /tmp/mmfs is being deleted every 10 days by a systemd service, "systemd-tmpfiles-setup.service". That service reads the configuration file /usr/lib/tmpfiles.d/tmp.conf. What we did was add a drop-in file at /etc/tmpfiles.d/tmp.conf to create the directory /tmp/mmfs and exclude it from deletion going forward. Here's our actual file and some commentary on what the options mean: # cat /etc/tmpfiles.d/tmp.conf # Create a /tmp/mmfs directory d /tmp/mmfs 0755 root root 1s <-------- the "d" creates the directory x /tmp/mmfs/* <-------- the "x" says to ignore it (exclude it from cleanup) That change stopped /tmp/mmfs from being deleted every 10 days. In addition I think we also ran a systemctl daemon-reload (I don't have it in my notes, but it wouldn't hurt to run it). Hope this helps, Lior On Mon, Nov 8, 2021 at 7:00 AM wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. /tmp/mmfs vanishes randomly? (Billich Heinrich Rainer (ID SD)) > 2. Re: /tmp/mmfs vanishes randomly? (Olaf Weiser) > 3. Re: /tmp/mmfs vanishes randomly? (Jonathan Buzzard) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 8 Nov 2021 09:20:24 +0000 > From: "Billich Heinrich Rainer (ID SD)" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D at id.ethz.ch> > Content-Type: text/plain; charset="utf-8" > > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I notice > that this directory randomly vanishes. Mmhealth does not complain but just > notes that it will no longer monitor the directory. Still I doubt that > trace collection and similar will create the directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, it's no > cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and > 6.0.2.2. > > Thank you, > > Mmhealth message: > local_fs_path_not_found INFO The configured dataStructureDump path > /tmp/mmfs does not exists. Skipping monitoring.
> > Kind regards, > > Heiner > --- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > > ------------------------------ > > Message: 2 > Date: Mon, 8 Nov 2021 09:53:04 +0000 > From: "Olaf Weiser" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: < > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20211108_1d32c09e_attachment-2D0001.html&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=zpe2MuRXotkV_yDkY-UQSIE68CEBIWsRoj4Qya85nJU&e= > > > > ------------------------------ > > Message: 3 > Date: Mon, 8 Nov 2021 09:54:18 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > > > Hello, > > > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > > notice that this directory randomly vanishes. Mmhealth does not > > complain but just notes that it will no longer monitor the directory. > > Still I doubt that trace collection and similar will create the > > directory when needed? > > > > Do you know of any spectrum scale internal mechanism that could cause > > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > > installation, too. It happens just on one or two nodes at a time, > > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > > 6.0.2.2 and 6.0.2.2. > > > > I know several Linux distributions clear the contents of /tmp at boot > time. Could that explain it? > > I would say using /tmp like you are doing is not a sensible idea anyway > and that you should be using something under /var. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > > > End of gpfsug-discuss Digest, Vol 118, Issue 4 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.r.sudbery at bham.ac.uk Tue Nov 9 16:55:36 2021 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Tue, 9 Nov 2021 16:55:36 +0000 Subject: [gpfsug-discuss] gplbin package filename changed in 5.1.2.0? Message-ID: mmbuildgpl in 5.1.2.0 has build me a package with the filename: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm Before it would have been: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64.rpm The RPM package name itself still appears to be gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64. Is this expected? Is this a permanent change? 
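For anyone wanting to double-check the same thing on their own build, the metadata recorded inside the rpm can be queried independently of its filename; a minimal sketch (assuming the freshly built package sits in the current directory, using the example filename from above):

# print the package NAME stored in the rpm header, which is what rpm/yum go by, regardless of the file name on disk
rpm -qp --queryformat '%{NAME}\n' gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm

If that still prints gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64, then only the artifact filename changed and anything that installs or queries the package by name should be unaffected.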
Just wondering whether to re-tool some of our existing build/install infrastructure or just create a symlink for this one... Many thanks, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday. -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Wed Nov 10 10:28:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 10 Nov 2021 10:28:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, have you ever received any reply to this or managed to solve it? We are seeing exactly the same error and it's filling up our logs. It seems all the monitoring data is still extracted, so I'm not sure when it started so not sure if this is related to any upgrade on our side, but it may have been going on for a while. We only noticed because the log file now is filling up the local log partition. Kind regards, Frederik On 26/08/2021 11:49, Ragho Mahalingam wrote: > We've been working on setting up mmperfmon; after creating a new > configuration with the new collector on the same manager node, mmsysmon > keeps throwing exceptions. > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > 123, in _getDataFromZimonSocket > sock.connect(SOCKET_PATH) > FileNotFoundError: [Errno 2] No such file or directory > > Tracing this a bit, it appears that SOCKET_PATH is > /var/run/perfmon/pmcollector.socket and this unix domain socket is absent, > even though pmcollector has started and is running successfully. > > Under what scenarios is pmcollector supposed to create this socket? I > don't see any configuration for this in /opt/IBM/zimon/ZIMonCollector.cfg, > so I'm assuming the socket is automatically created when pmcollector starts. > > Any thoughts on how to debug and resolve this? > > Thanks, Ragu -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From ragho.mahalingam+spectrumscaleug at pathai.com Wed Nov 10 14:00:19 2021 From: ragho.mahalingam+spectrumscaleug at pathai.com (Ragho Mahalingam) Date: Wed, 10 Nov 2021 09:00:19 -0500 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Frederick, In our case the issue started appearing after upgrading from 5.0.4 to 5.1.1. If you've recently upgraded, then the following may be useful. Turns out that mmsysmon (gpfs-base package) requires the new gpfs.gss.pmcollector (from zimon packages) to function correctly (the AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In our case, we'd upgraded all the mandatory packages but had not upgraded the optional ones; the mmsysmonc python libs appears to be updated by the pmcollector package from my study. If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* packages installed. If gpfs.gss.pmcollector isn't installed, you'd definitely need that to make this runaway logging stop. Hope that helps! Ragu On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < frederik.ferner at diamond.ac.uk> wrote: > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We are > seeing exactly the same error and it's filling up our logs. It seems all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, mmsysmon > > keeps throwing exceptions. > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > 123, in _getDataFromZimonSocket > > sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket? I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > SciComp Help Desk can be reached on x8596 > > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright and or > privileged material, and are for the use of the intended addressee only. If > you are not the intended addressee or an authorised recipient of the > addressee please notify us of receipt by returning the e-mail and do not > use, copy, retain, distribute or disclose the information in or attached to > the e-mail. 
> Any opinions expressed within this e-mail are those of the individual and > not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for any > damage which you may sustain as a result of software viruses which may be > transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England > and Wales with its registered office at Diamond House, Harwell Science and > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Disclaimer: This email and any corresponding attachments may contain confidential information. If you're not the intended recipient, any copying, distribution, disclosure, or use of any information contained in the email or its attachments is strictly prohibited. If you believe to have received this email in error, please email security at pathai.com immediately, then destroy the email and any attachments without reading or saving.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Nov 10 14:14:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 10 Nov 2021 14:14:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?mmsysmon_exception_with_pmcollector_so?= =?utf-8?q?cket=09being_absent?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Nov 11 13:38:56 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:38:56 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, many thanks for the response. That was indeed the problem. We missed it when we upgraded a while ago and because our normal monitoring continued to work, we didn't notice until now. Kind regards, Frederik On 10/11/2021 09:00, Ragho Mahalingam wrote: > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1. If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In > our case, we'd upgraded all the mandatory packages but had not upgraded the > optional ones; the mmsysmonc python libs appears to be updated by the > pmcollector package from my study. > > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed. If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > > Hope that helps! > > Ragu > > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < > frederik.ferner at diamond.ac.uk> wrote: > > > Hi Ragu, > > > > have you ever received any reply to this or managed to solve it? We are > > seeing exactly the same error and it's filling up our logs. It seems all > > the monitoring data is still extracted, so I'm not sure when it > > started so not sure if this is related to any upgrade on our side, but > > it may have been going on for a while. We only noticed because the log > > file now is filling up the local log partition. 
> > > > Kind regards, > > Frederik > > > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > > We've been working on setting up mmperfmon; after creating a new > > > configuration with the new collector on the same manager node, mmsysmon > > > keeps throwing exceptions. > > > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > > 123, in _getDataFromZimonSocket > > > sock.connect(SOCKET_PATH) > > > FileNotFoundError: [Errno 2] No such file or directory > > > > > > Tracing this a bit, it appears that SOCKET_PATH is > > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > > absent, > > > even though pmcollector has started and is running successfully. > > > > > > Under what scenarios is pmcollector supposed to create this socket? I > > > don't see any configuration for this in > > /opt/IBM/zimon/ZIMonCollector.cfg, > > > so I'm assuming the socket is automatically created when pmcollector > > starts. > > > > > > Any thoughts on how to debug and resolve this? > > > > > > Thanks, Ragu > > > > -- > > Frederik Ferner (he/him) > > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > > > SciComp Help Desk can be reached on x8596 > > > > > > (Apologies in advance for the lines below. Some bits are a legal > > requirement and I have no control over them.) > > > > -- > > This e-mail and any attachments may contain confidential, copyright and or > > privileged material, and are for the use of the intended addressee only. If > > you are not the intended addressee or an authorised recipient of the > > addressee please notify us of receipt by returning the e-mail and do not > > use, copy, retain, distribute or disclose the information in or attached to > > the e-mail. > > Any opinions expressed within this e-mail are those of the individual and > > not necessarily of Diamond Light Source Ltd. > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > > attachments are free from viruses and we cannot accept liability for any > > damage which you may sustain as a result of software viruses which may be > > transmitted in or with the message. > > Diamond Light Source Limited (company no. 4375679). Registered in England > > and Wales with its registered office at Diamond House, Harwell Science and > > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > *Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained in > the email or its attachments is strictly prohibited. If you believe to have > received this email in error, please email security at pathai.com > immediately, then destroy the email and any > attachments without reading or saving.* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) 
-- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From frederik.ferner at diamond.ac.uk Thu Nov 11 13:45:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:45:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket?being absent In-Reply-To: References: Message-ID: Hi Fred, we haven't used the deployement tool anywhere so far, we always apply/upgrade the RPMs directly. (Centrally managed via CFengine, promising that certain Spectrum Scale RPMs are installed. I haven't yet checked how the gpfs.gss.pmcollector RPM were installed initially as they weren't in our list of promised packages, which is why the upgrade was missed.) Kind regards, Frederik On 10/11/2021 14:14, Frederick Stock wrote: > I am curious to know if you upgraded by manually applying rpms or if you > used the Spectrum Scale deployment tool (spectrumscale command) to apply > the upgrade? > Fred > _______________________________________________________ > Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 > stockf at us.ibm.com > ? > ? > > ----- Original message ----- > From: "Ragho Mahalingam" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmsysmon exception with > pmcollector socket being absent > Date: Wed, Nov 10, 2021 9:00 AM > ? > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1.? If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1).? > In our case, we'd upgraded all the mandatory packages but had > not?upgraded the optional ones; the mmsysmonc?python libs appears to be > updated by the pmcollector package from my study. > ? > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed.? If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > ? > Hope that helps! > ? > Ragu > ? > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner > <[1]frederik.ferner at diamond.ac.uk> wrote: > > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We > are > seeing exactly the same error and it's filling up our logs. It seems > all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. 
We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, > mmsysmon > > keeps throwing exceptions. > > > >? ?File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", > line > > 123, in _getDataFromZimonSocket > >? ? ?sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > >? /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket?? > I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 > 8624 > Diamond Light Source Ltd.? ? ? ? ? ? ? ? ? ? ? ?mob:? ?+44 7917 08 > 5110 > > SciComp Help Desk can be reached on x8596 > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright > and or privileged material, and are for the use of the intended > addressee only. If you are not the intended addressee or an authorised > recipient of the addressee please notify us of receipt by returning > the e-mail and do not use, copy, retain, distribute or disclose the > information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual > and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for > any damage which you may sustain as a result of software viruses which > may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in > England and Wales with its registered office at Diamond House, Harwell > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United > Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at [2]spectrumscale.org > [3]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained > in the email or its attachments is strictly prohibited. If you believe > to have received this email in error, please email > [4]security at pathai.com immediately, then destroy the email and any > attachments without reading or saving. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > References > > Visible links > 1. mailto:frederik.ferner at diamond.ac.uk > 2. http://spectrumscale.org/ > 3. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 4. mailto:security at pathai.com > 5. 
http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From pinkesh.valdria at oracle.com Fri Nov 12 07:57:14 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 07:57:14 +0000 Subject: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Message-ID: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS S3 compatible) and it's failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out that it fails because it doesn't like the equals sign "=" in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quotes and double quotes around the secret key, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx "clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=" I also tried to add the key to the keyfile and it still fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause.
[root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 12 11:54:38 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 17:24:38 +0530 Subject: [gpfsug-discuss] =?utf-8?q?AFM_with_Object_Storage_-_fails_with_i?= =?utf-8?q?nvalid_skey=09=28secret_key=29?= In-Reply-To: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinkesh.valdria at oracle.com Fri Nov 12 12:26:44 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 12:26:44 +0000 Subject: [gpfsug-discuss] [External] : Re: AFM with Object Storage - fails with invalid skey (secret key) In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS ________________________________ From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? 
USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 12 12:50:48 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 18:20:48 +0530 Subject: [gpfsug-discuss] =?utf-8?q?=3A_Re=3A___AFM_with_Object_Storage_-_?= =?utf-8?q?fails_with_invalid_skey=09=28secret_key=29?= In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi Pinkesh, You could open a ticket to get the efix. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "Venkateswara R Puvvada" , "gpfsug main discussion list" Date: 11/12/2021 05:57 PM Subject: Re: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. 
mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx "clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=" I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect - HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) - USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Nov 15 18:44:04 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 15 Nov 2021 18:44:04 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Any idea why pmcollector fails to start via the service? If I start it manually, it runs just fine. Scale 5.1.1.4. This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon "service pmcollector start" fails: Redirecting to /bin/systemctl status pmcollector.service ● pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Mon Nov 15 21:31:03 2021 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Mon, 15 Nov 2021 21:31:03 +0000 Subject: [gpfsug-discuss] [External] Pmcollector fails to start In-Reply-To: References: Message-ID: Hi, I've been experiencing this "start request repeated too quickly" issue, but IIRC for the pmsensors service instead, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node for which the service is failing.
Your issue might point at something similar but on the other end of the spectrum (sic). In this case the issue is usually resolved by deleting/recreating the performance monitoring configuration for the whole cluster: mmchnode --noperfmon -N all # required before deleting the perfmon config mmperfmon config delete --all mmperfmon config generate --collectors # start the pmcollector service on the GUI nodes mmchnode --perfmon -N all # start the pmsensors service on all nodes It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies. HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Monday, November 15, 2021 19:44 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] Pmcollector fails to start Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This worksfrom the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon ?service pmcollector start? - fails: Redirecting to /bin/systemctl status pmcollector.service ? pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Nov 16 16:44:21 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 16 Nov 2021 16:44:21 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... 
it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. 
Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Nov 18 09:09:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 18 Nov 2021 17:09:25 +0800 Subject: [gpfsug-discuss] possible to rename a snapshot? In-Reply-To: <1825700-1636060653.986878@yfV0.OUFD.5EUE> References: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Message-ID: Mark, GPFS does not support to rename an existing snapshot. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: mark.bergman at uphs.upenn.edu To: "gpfsug main discussion list" Date: 2021/11/05 05:33 AM Subject: [EXTERNAL] [gpfsug-discuss] possible to rename a snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From HAUBRICH at de.ibm.com Thu Nov 18 13:01:39 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Thu, 18 Nov 2021 15:01:39 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
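A few quick checks along the lines Manfred describes can narrow a status=203/EXEC failure down before reinstalling anything. This is only a sketch; the paths and the gpfs.gss.pmcollector package name are taken from this thread, everything else is an assumption about a typical 5.1.1 install:

    id scalepm                                # does the 5.1.1 service account still exist?
    rpm -q --scripts gpfs.gss.pmcollector     # review the chown/chmod the package scripts perform
    ls -ld /opt/IBM/zimon /var/run/perfmon    # ownership and permissions along the collector's paths
    journalctl -u pmcollector --since today   # systemd's view of why ExecStart failed

If /opt/IBM/zimon has been relocated or symlinked elsewhere, the target directory needs the same scalepm ownership as the packaged path.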
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Thu Nov 18 13:53:47 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Nov 2021 13:53:47 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start In-Reply-To: References: Message-ID: That was indeed the issue! We?ve linked /opt/IBM/zimon to another directory due to database size. chown?ing that to scalepm.scalepm fixed it. Now, creating a user ?scalepm? on the sly and not telling me ? not good! Bob Oesterlin Sr Principal Storage Engineer Nuance Communications From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Manfred Haubrich Date: Thursday, November 18, 2021 at 7:01 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Pmcollector fails to start CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development ________________________________ Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 49 bytes Desc: ecblank.gif URL: From HAUBRICH at de.ibm.com Fri Nov 19 09:00:49 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Fri, 19 Nov 2021 11:00:49 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Sorry for that difficulty, but the new user for the performance monitoring tool was mentioned in the 5.1.1 summary of changes https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=summary-changes Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From PSAFRE at de.ibm.com Fri Nov 19 13:49:11 2021 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Fri, 19 Nov 2021 15:49:11 +0200 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? 
In-Reply-To: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Message-ID: Hello Heiner, just a heads up for you and the other storage admins, regularly cleaning up /tmp, regarding one aspect to keep in mind: - If you are using Spectrum Scale software call home (mmcallhome), it would be using the directory ${dataStructureDump}/callhome to save the copies of the uploaded data. This would be /tmp/mmfs/callhome/ in your case, which you would be automatically regularly removing. - These copies are used by one of the features of call home: "mmcallhome status diff" - This feature allows to see an overview of the Spectrum Scale configuration changes, that occurred between 2 different points in time. - This effectively allows to quickly find out if any config changes occurred prior to an outage, thereby helping to find the root cause of self-caused problems in the Scale cluster. - It was added in Scale 5.0.5.0 See IBM KC for more details: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=cch-use-cases-detecting-system-changes-by-using-mmcallhome-command - As a source of the "config snapshots", mmcallhome status diff is using the DC packages inside of ${dataStructureDump}/callhome, which you would be regularly deleting, thereby hugely reducing the usability of this particular feature. - Of course, software call home automatically makes sure, it will not use too much space in dataStructureDump and it automatically removes the oldest entries, keeping at most 2GB or 300 files inside (default values, configurable). Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Wilhelm-Fay-Stra?e 32 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: "gpfsug main discussion list" Date: 16.11.2021 17:44 Subject: [EXTERNAL] Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From novosirj at rutgers.edu Fri Nov 19 16:46:34 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 19 Nov 2021 16:46:34 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: <9A96D22E-7744-4E42-A0AD-6DDD06397E24@rutgers.edu> Has any progress been made here at all? I have the same problem as the user who opened this thread. I run xCAT on the server where I want to run the GUI. I?ve attempted to limit the xCAT IP addresses (changing httpd.conf and ssl.conf), but as you note, the UPDATE_IPTABLES setting causes this not to work right, as the GUI wants all interfaces. I could turn that off, but it?s not clear to me what rules I?d need to manually create. What I /really/ would like to do is limit the GPFS GUI to a single interface. I guess the only issue with that would be that maybe the remote machines/performance monitors might contact the machine on its main IP with data. Modifying the ports as I described elsewhere in the thread did work pretty well, but there were some lingering GUI update problems and lots of connections on 443 to "/scalemgmt/v2/info? and ?/CommonEventServlet" that I never was able to track down). Now, I?ve tried disabling xCAT?s httpd server, reinstalled the gpfs.gui RPM, and started the GUI and it doesn?t seem to have gotten any better, so maybe this wasn?t a real problem and I?ll go back to modifying the ports, but I?d really like to do this ?the right way? without having to provide another machine in order to do it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 23, 2018, at 7:50 AM, Markus Rohwedder wrote: > > Hello Juri, Keith, > > thank you for your responses. > > The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. > > Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. > However we have not set up, tested nor documented such a configuration as of today. > > Currently the GUI service manages the iptables redirect bring up and tear down. > If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. > > In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to > - not check for already used ports when the GUI service tries to start up > - don't create/destroy port forwarding rules during GUI service start and stop. > This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. > > The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. > > See this KC link: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm > > Mit freundlichen Gr??en / Kind regards > > Dr. 
Markus Rohwedder > > Spectrum Scale GUI Development > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > "Daniel Kidger" ---23.08.2018 12:13:36---Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case > > From: "Daniel Kidger" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 23.08.2018 12:13 > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Keith, > > I have another IBM customer who also wished to move Scale GUI's https ports. > In their case because they had their own web based management interface on the same https port. > Is this the same reason that you have? > If so I wonder how many other sites have the same issue? > > One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). > Then run the two different GUIs, one per IP address. > Is this an option, albeit a little ugly? > Daniel > > <17310450.gif> Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > ----- Original message ----- > From: "Markus Rohwedder" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Date: Thu, Aug 23, 2018 9:51 AM > Hello Keith, > > it is not so easy. > > The GUI receives events from other scale components using the currently defined ports. > Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). > Therefore at this point there is no procedure to change this behaviour across all components. > > Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. > Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. > The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. > If these ports are already used by another service, the GUI will not start up. > > Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. > If you want to emphasize your case as future development item, please let me know. > > I would also be interested in: > > Scale version you are running > > Do you need port 80 or 443 as well? > > Would it work for you if the xCAT service was bound to a single IP address? > > Mit freundlichen Gr??en / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? > > From: Keith Ball > To: gpfsug-discuss at spectrumscale.org > Date: 22.08.2018 21:33 > Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello All, > > Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. 
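For the single-interface case discussed above, the redirects the GUI normally installs itself can in principle be created by hand once UPDATE_IPTABLES is switched off. This is a rough sketch only, with an example address standing in for the GUI interface and the 47080/47443 ports taken from Markus' description; it is not a documented or tested procedure:

    # forward the privileged ports to the GUI's unprivileged listeners, but only for one local IP
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 443 -j REDIRECT --to-ports 47443
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 80  -j REDIRECT --to-ports 47080
    # repeat in the OUTPUT chain if connections from the node itself should be redirected as well
    iptables -t nat -A OUTPUT -d 192.0.2.10 -p tcp --dport 443 -j REDIRECT --to-ports 47443

Rules like these are not persistent across reboots and would have to be restored by something like the iptables-services package or a small systemd unit.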
The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: > > > > > > but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Spectrum Scale GUI configuration if I can. > > Many Thanks, > Keith > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Tue Nov 23 17:59:12 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 23 Nov 2021 17:59:12 +0000 Subject: [gpfsug-discuss] AFM does too small NFS writes, and I don't see parallel writes Message-ID: Hello, We currently move data to a new AFM fileset and I see poor performance and ask for advice and insight: The migration to afm home seems slow. I note: Afm writes a whole file of ~100MB in much too many small chunks. My assumption: The many small writes reduce performance as we have 100km between the sites and a higher latency. The writes are not fully sequential, but they aren't done heavily in parallel, either (like 10-100 outstanding writes at each time). In the afm queue I see 8100214 Write [563636091.563636091] inflight (0 @ 0) chunks 2938 bytes 170872410 vIdx 1 thread_id 67862 I guess this means afm will write 170,872,410 bytes in 2,938 chunks resulting in an average write size of 58k to inode 563636091. So if I'm right my question is: What can I change to make afm write fewer and larger chunks per file? Does it depend on how we copy data? We write through ganesha/nfs, hence even if we write sequentially ganesha may still do it differently? Another question: is there a way to dump the afm in-memory queue for a fileset? That would make it easier to see what's going on when we do changes. I could grep for the inode of a testfile ... We don't do parallel writes across afm gateways, the files are too small, our limit is 1GB. We configured two mounts from two ces servers at home for each fileset. Hence AFM could do writes in parallel to both mounts on the single gateway? A short tcpdump suggests: afm writes to a single ces server only and writes to a single inode at a time. But at each time a few writes (2-5) may overlap. Kind regards, Heiner Just to illustrate - what I see on the afm gateway - too many reads and writes. There are almost no open/close hence it's all to the same few files ------------nfs3-client------------ --------gpfs-file-operations------- --gpfs-i/o- -net/total- read? writ? rdir? inod?? fs?? cmmt| open? clos? read? writ? rdir? inod| read write| recv? send ?? 0? 1295???? 0???? 0???? 0???? 0 |?? 0???? 0? 1294???? 0???? 0???? 0 |89.8M??? 0 | 451k?? 94M ?? 0? 1248???? 0???? 0???? 0???? 0 |?? 0???? 0?
1248???? 0???? 0???? 8 |86.2M??? 0 | 432k?? 91M ?? 0? 1394???? 0???? 0???? 0???? 0 |?? 0???? 0? 1394???? 0???? 0???? 0 |96.8M??? 0 | 498k? 101M ?? 0? 1583???? 0???? 0???? 0???? 0 |?? 0???? 0? 1582???? 0???? 0???? 1 | 110M??? 0 | 560k? 115M ?? 0? 1543???? 0???? 1???? 0??? ?0 |?? 0???? 0? 1544???? 0???? 0???? 0 | 107M??? 0 | 540k? 112M -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5254 bytes Desc: not available URL: From scl at virginia.edu Tue Nov 30 12:47:46 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Tue, 30 Nov 2021 12:47:46 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop Message-ID: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Hi folks, Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt -rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp -rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt -rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc -rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp The core.*.dmp files are cores from the java command. And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log. Any suggestions? Thanks for any help. 2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped 2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572 2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694' 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed 2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED 2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN 2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True. 
New state=4 2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83 2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693'] 2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed 2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695' 2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors 2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors 2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed 2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694'] 2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119 2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed 2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696' 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors 2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695'] 2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119 2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed 2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version 
received, start new collectors update cycle 2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697' 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors 2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon 2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor 2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16 2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696'] 2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119 2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed 2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify And then gpfsgui apparently crashes and systemd automatically restarts it. Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From luis.bolinches at fi.ibm.com Tue Nov 30 13:30:06 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 30 Nov 2021 13:30:06 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> References: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Nov 30 13:34:17 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Nov 2021 13:34:17 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: References: , <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Mon Nov 1 14:50:54 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Nov 2021 14:50:54 +0000 Subject: [gpfsug-discuss] SSUG UK User Group Message-ID: Hi All, I?m planning to take a step-back from running the Spectrum Scale user group in the UK later this year/early next year and this means we need someone (or people) to step up to run the user group in the UK. I took over running the user group in 2015 and a lot has changed since then ? the group got bigger, we moved to multi-day sessions, a pandemic struck and we moved online ? now as things are maybe returning to normal, I think it is time for someone else to take leadership of the group in the UK and work out how to take it forwards. If you are interested in taking up running the group in the UK, please drop me an email, or DM on Slack and let me know. 
It doesn?t necessarily need to be one person running the group, and having several would help with some of the logistics of running the events. To be truly independent, which we have always tried to be, I?ve always thought that the person/people running the group should come from the end-user community? I?ll likely still be around at events, and happy to provide organisational support if needed ? but I don?t really have the time needed for the group at the moment. Hopefully there?s someone interested in taking the group forwards in the future ? Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Tue Nov 2 14:02:10 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Nov 2021 14:02:10 +0000 Subject: [gpfsug-discuss] Upcoming Events Message-ID: Hi All, We thought it would be a good time to send an update on some upcoming events. We have three events coming up over November/December TWO of which are in person! IBM User?s Group meeting ? SC21 (15th November 2021, IN PERSON) IBM Spectrum Scale Development and Product Management team will be attending Super Computing 2021 in person. We will be hosting our yearly gathering on Monday, November 15, from 3:00-5:00 PM. This global user meeting provides an opportunity for peer-to-peer learning and interaction with IBM?s technical leadership team on the latest IBM Spectrum Scale roadmaps, latest features, ecosystem, and applications for AI. See: https://www.spectrumscaleug.org/event/sc21-users-group-meeting/ Register at: https://www.ibm.com/events/event/pages/ibm/nz48hgmb/1581037797007001PJAd.html SSUG::Digital (1st, 2nd December 2021, VIRTUAL) For the Spectrum Scale Users who will not be able to attend user meeting at Super Computing in St Louis, or SSUG at CIUK, we plan to host Digital user meeting on Dec 1 & Dec 2 from 10am - 12pm EDT (3pm-5pm GMT). In the Digital user meeting, we will cover some of the contents covered at St Louis and additional expert talks from our development team and partners. See: https://www.spectrumscaleug.org/event/digital-user-group-dec-2021/ Joining link: To be confirmed SSUG @CIUK 2021 (10th December 2021, IN PERSON) This year we will be returning to our traditional user group home of CIUK and will be running a break-out session on the Friday of CIUK (10:00 ? 12:00). We?re currently lining up a few speakers for the event, but if you are attending CIUK in Manchester this year and are interested in speaking, please let me know ? we have a few speaker slots available for user talks. I?m sure it has been soooo long since anyone has had the opportunity to speak, that I?ll be inundated with user talks ? ? See: https://www.spectrumscaleug.org/event/ssug-ciuk-2021/ As usual with the CIUK meeting, you must be a registered attendee of CIUK to attend this user group. CIUK Registration: https://www.scd.stfc.ac.uk/Pages/CIUK2021.aspx Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Thu Nov 4 21:17:33 2021 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 04 Nov 2021 17:17:33 -0400 Subject: [gpfsug-discuss] possible to rename a snapshot? Message-ID: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? 
Thanks, Mark From heinrich.billich at id.ethz.ch Mon Nov 8 09:20:24 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 8 Nov 2021 09:20:24 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From olaf.weiser at de.ibm.com Mon Nov 8 09:53:04 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 8 Nov 2021 09:53:04 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Nov 8 09:54:18 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Nov 2021 09:54:18 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > notice that this directory randomly vanishes. Mmhealth does not > complain but just notes that it will no longer monitor the directory. > Still I doubt that trace collection and similar will create the > directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > 6.0.2.2 and 6.0.2.2. > I know several Linux distributions clear the contents of /tmp at boot time. Could that explain it? I would say using /tmp like you are doing is not a sensible idea anyway and that you should be using something under /var. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From lior at nyu.edu Mon Nov 8 14:38:35 2021 From: lior at nyu.edu (Lior Atar) Date: Mon, 8 Nov 2021 09:38:35 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: Hello all, /tmp/mmfs is being deleted every 10 days by a systemd service " systemd-tmpfiles-setup.service ". That service calls a configuration file " /usr/lib/tmpfiles.d/tmp.conf . What we did was add a drop in file in /etc/tmpfiles.d/tmp.conf to then create the directory /tmp/mmfs and then exclude deleting going forward. 
Here's our actual file and some commentary of what the options mean: # cat /etc/tmpfiles.d/tmp.conf # Create a /tmp/mmfs directory d /tmp/mmfs 0755 root root 1s <-------- the " d " is to create directory x /tmp/mmfs/* <-------- the " x " says to ignore it That change helped us avoid /tmp/mmfs from being deleted every 10 days. In addition I think also did a %systemctl daemon-reload ( but I don't have it in my notes, wouldn't hurt to run it ) Hope this helps, Lior On Mon, Nov 8, 2021 at 7:00 AM wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. /tmp/mmfs vanishes randomly? (Billich Heinrich Rainer (ID SD)) > 2. Re: /tmp/mmfs vanishes randomly? (Olaf Weiser) > 3. Re: /tmp/mmfs vanishes randomly? (Jonathan Buzzard) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 8 Nov 2021 09:20:24 +0000 > From: "Billich Heinrich Rainer (ID SD)" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D at id.ethz.ch> > Content-Type: text/plain; charset="utf-8" > > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I notice > that this directory randomly vanishes. Mmhealth does not complain but just > notes that it will no longer monitor the directory. Still I doubt that > trace collection and similar will create the directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, it's no > cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and > 6.0.2.2. > > Thank you, > > Mmhealth message: > local_fs_path_not_found INFO The configured dataStructureDump path > /tmp/mmfs does not exists. Skipping monitoring. > > Kind regards, > > Heiner > --- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > > ------------------------------ > > Message: 2 > Date: Mon, 8 Nov 2021 09:53:04 +0000 > From: "Olaf Weiser" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: < > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20211108_1d32c09e_attachment-2D0001.html&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=zpe2MuRXotkV_yDkY-UQSIE68CEBIWsRoj4Qya85nJU&e= > > > > ------------------------------ > > Message: 3 > Date: Mon, 8 Nov 2021 09:54:18 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > > > Hello, > > > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > > notice that this directory randomly vanishes. Mmhealth does not > > complain but just notes that it will no longer monitor the directory. > > Still I doubt that trace collection and similar will create the > > directory when needed? > > > > Do you know of any spectrum scale internal mechanism that could cause > > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > > installation, too. It happens just on one or two nodes at a time, > > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > > 6.0.2.2 and 6.0.2.2. > > > > I know several Linux distributions clear the contents of /tmp at boot > time. Could that explain it? > > I would say using /tmp like you are doing is not a sensible idea anyway > and that you should be using something under /var. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > > > End of gpfsug-discuss Digest, Vol 118, Issue 4 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.r.sudbery at bham.ac.uk Tue Nov 9 16:55:36 2021 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Tue, 9 Nov 2021 16:55:36 +0000 Subject: [gpfsug-discuss] gplbin package filename changed in 5.1.2.0? Message-ID: mmbuildgpl in 5.1.2.0 has build me a package with the filename: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm Before it would have been: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64.rpm The RPM package name itself still appears to be gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64. Is this expected? Is this a permanent change? Just wondering whether to re-tool some of our existing build/install infrastructure or just create a symlink for this one... Many thanks, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday. -------------- next part -------------- An HTML attachment was scrubbed... 
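For questions like Luke's, the package name embedded in the rpm (which repositories and install tooling key on) can be read straight from the file, independent of the filename mmbuildgpl chose. A small sketch, reusing the filename quoted above:

    rpm -qp --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm

If only the filename changed, scripts and repos that address the package by name should keep working; anything that globs on the old filename pattern would need the symlink Luke mentions or an updated glob.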
URL: From frederik.ferner at diamond.ac.uk Wed Nov 10 10:28:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 10 Nov 2021 10:28:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, have you ever received any reply to this or managed to solve it? We are seeing exactly the same error and it's filling up our logs. It seems all the monitoring data is still extracted, so I'm not sure when it started so not sure if this is related to any upgrade on our side, but it may have been going on for a while. We only noticed because the log file now is filling up the local log partition. Kind regards, Frederik On 26/08/2021 11:49, Ragho Mahalingam wrote: > We've been working on setting up mmperfmon; after creating a new > configuration with the new collector on the same manager node, mmsysmon > keeps throwing exceptions. > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > 123, in _getDataFromZimonSocket > sock.connect(SOCKET_PATH) > FileNotFoundError: [Errno 2] No such file or directory > > Tracing this a bit, it appears that SOCKET_PATH is > /var/run/perfmon/pmcollector.socket and this unix domain socket is absent, > even though pmcollector has started and is running successfully. > > Under what scenarios is pmcollector supposed to create this socket? I > don't see any configuration for this in /opt/IBM/zimon/ZIMonCollector.cfg, > so I'm assuming the socket is automatically created when pmcollector starts. > > Any thoughts on how to debug and resolve this? > > Thanks, Ragu -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From ragho.mahalingam+spectrumscaleug at pathai.com Wed Nov 10 14:00:19 2021 From: ragho.mahalingam+spectrumscaleug at pathai.com (Ragho Mahalingam) Date: Wed, 10 Nov 2021 09:00:19 -0500 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Frederick, In our case the issue started appearing after upgrading from 5.0.4 to 5.1.1. If you've recently upgraded, then the following may be useful. Turns out that mmsysmon (gpfs-base package) requires the new gpfs.gss.pmcollector (from zimon packages) to function correctly (the AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). 
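A quick way to check for the mismatch described here; only a sketch, and the package name patterns are an assumption based on the packages named in this thread:

    rpm -qa 'gpfs.gss.*' 'gpfs.base*'           # sensor/collector levels should match the base level
    ls -l /var/run/perfmon/pmcollector.socket   # the unix socket mmsysmon tries to connect to

If the socket is missing while pmcollector is running, the collector is very likely still a pre-5.1 package.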
In our case, we'd upgraded all the mandatory packages but had not upgraded the optional ones; the mmsysmonc python libs appears to be updated by the pmcollector package from my study. If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* packages installed. If gpfs.gss.pmcollector isn't installed, you'd definitely need that to make this runaway logging stop. Hope that helps! Ragu On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < frederik.ferner at diamond.ac.uk> wrote: > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We are > seeing exactly the same error and it's filling up our logs. It seems all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, mmsysmon > > keeps throwing exceptions. > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > 123, in _getDataFromZimonSocket > > sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket? I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > SciComp Help Desk can be reached on x8596 > > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright and or > privileged material, and are for the use of the intended addressee only. If > you are not the intended addressee or an authorised recipient of the > addressee please notify us of receipt by returning the e-mail and do not > use, copy, retain, distribute or disclose the information in or attached to > the e-mail. > Any opinions expressed within this e-mail are those of the individual and > not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for any > damage which you may sustain as a result of software viruses which may be > transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England > and Wales with its registered office at Diamond House, Harwell Science and > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Disclaimer: This email and any corresponding attachments may contain confidential information. 
If you're not the intended recipient, any copying, distribution, disclosure, or use of any information contained in the email or its attachments is strictly prohibited. If you believe to have received this email in error, please email security at pathai.com immediately, then destroy the email and any attachments without reading or saving.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Nov 10 14:14:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 10 Nov 2021 14:14:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?mmsysmon_exception_with_pmcollector_so?= =?utf-8?q?cket=09being_absent?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Nov 11 13:38:56 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:38:56 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, many thanks for the response. That was indeed the problem. We missed it when we upgraded a while ago and because our normal monitoring continued to work, we didn't notice until now. Kind regards, Frederik On 10/11/2021 09:00, Ragho Mahalingam wrote: > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1. If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In > our case, we'd upgraded all the mandatory packages but had not upgraded the > optional ones; the mmsysmonc python libs appears to be updated by the > pmcollector package from my study. > > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed. If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > > Hope that helps! > > Ragu > > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < > frederik.ferner at diamond.ac.uk> wrote: > > > Hi Ragu, > > > > have you ever received any reply to this or managed to solve it? We are > > seeing exactly the same error and it's filling up our logs. It seems all > > the monitoring data is still extracted, so I'm not sure when it > > started so not sure if this is related to any upgrade on our side, but > > it may have been going on for a while. We only noticed because the log > > file now is filling up the local log partition. > > > > Kind regards, > > Frederik > > > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > > We've been working on setting up mmperfmon; after creating a new > > > configuration with the new collector on the same manager node, mmsysmon > > > keeps throwing exceptions. > > > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > > 123, in _getDataFromZimonSocket > > > sock.connect(SOCKET_PATH) > > > FileNotFoundError: [Errno 2] No such file or directory > > > > > > Tracing this a bit, it appears that SOCKET_PATH is > > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > > absent, > > > even though pmcollector has started and is running successfully. > > > > > > Under what scenarios is pmcollector supposed to create this socket? 
I > > > don't see any configuration for this in > > /opt/IBM/zimon/ZIMonCollector.cfg, > > > so I'm assuming the socket is automatically created when pmcollector > > starts. > > > > > > Any thoughts on how to debug and resolve this? > > > > > > Thanks, Ragu > > > > -- > > Frederik Ferner (he/him) > > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > > > SciComp Help Desk can be reached on x8596 > > > > > > (Apologies in advance for the lines below. Some bits are a legal > > requirement and I have no control over them.) > > > > -- > > This e-mail and any attachments may contain confidential, copyright and or > > privileged material, and are for the use of the intended addressee only. If > > you are not the intended addressee or an authorised recipient of the > > addressee please notify us of receipt by returning the e-mail and do not > > use, copy, retain, distribute or disclose the information in or attached to > > the e-mail. > > Any opinions expressed within this e-mail are those of the individual and > > not necessarily of Diamond Light Source Ltd. > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > > attachments are free from viruses and we cannot accept liability for any > > damage which you may sustain as a result of software viruses which may be > > transmitted in or with the message. > > Diamond Light Source Limited (company no. 4375679). Registered in England > > and Wales with its registered office at Diamond House, Harwell Science and > > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > *Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained in > the email or its attachments is strictly prohibited. If you believe to have > received this email in error, please email security at pathai.com > immediately, then destroy the email and any > attachments without reading or saving.* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From frederik.ferner at diamond.ac.uk Thu Nov 11 13:45:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:45:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket?being absent In-Reply-To: References: Message-ID: Hi Fred, we haven't used the deployement tool anywhere so far, we always apply/upgrade the RPMs directly. (Centrally managed via CFengine, promising that certain Spectrum Scale RPMs are installed. I haven't yet checked how the gpfs.gss.pmcollector RPM were installed initially as they weren't in our list of promised packages, which is why the upgrade was missed.) Kind regards, Frederik On 10/11/2021 14:14, Frederick Stock wrote: > I am curious to know if you upgraded by manually applying rpms or if you > used the Spectrum Scale deployment tool (spectrumscale command) to apply > the upgrade? > Fred > _______________________________________________________ > Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 > stockf at us.ibm.com > ? > ? > > ----- Original message ----- > From: "Ragho Mahalingam" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmsysmon exception with > pmcollector socket being absent > Date: Wed, Nov 10, 2021 9:00 AM > ? > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1.? If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1).? > In our case, we'd upgraded all the mandatory packages but had > not?upgraded the optional ones; the mmsysmonc?python libs appears to be > updated by the pmcollector package from my study. > ? > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed.? If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > ? > Hope that helps! > ? > Ragu > ? > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner > <[1]frederik.ferner at diamond.ac.uk> wrote: > > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We > are > seeing exactly the same error and it's filling up our logs. It seems > all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, > mmsysmon > > keeps throwing exceptions. > > > >? ?File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", > line > > 123, in _getDataFromZimonSocket > >? ? ?sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > >? /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. 
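For anyone hitting the same symptom, a quick way to check whether the collector is actually serving the UNIX socket that mmsysmon expects; the socket path is taken from the traceback above, and the commands are standard Linux tooling rather than anything Scale-specific:

ls -l /var/run/perfmon/pmcollector.socket   # does the socket file exist at all?
ss -xlp | grep -i perfmon                   # is any process listening on it?
systemctl status pmcollector                # and is the collector service itself healthy?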
> > > > Under what scenarios is pmcollector supposed to create this socket?? > I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 > 8624 > Diamond Light Source Ltd.? ? ? ? ? ? ? ? ? ? ? ?mob:? ?+44 7917 08 > 5110 > > SciComp Help Desk can be reached on x8596 > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright > and or privileged material, and are for the use of the intended > addressee only. If you are not the intended addressee or an authorised > recipient of the addressee please notify us of receipt by returning > the e-mail and do not use, copy, retain, distribute or disclose the > information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual > and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for > any damage which you may sustain as a result of software viruses which > may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in > England and Wales with its registered office at Diamond House, Harwell > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United > Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at [2]spectrumscale.org > [3]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained > in the email or its attachments is strictly prohibited. If you believe > to have received this email in error, please email > [4]security at pathai.com immediately, then destroy the email and any > attachments without reading or saving. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > References > > Visible links > 1. mailto:frederik.ferner at diamond.ac.uk > 2. http://spectrumscale.org/ > 3. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 4. mailto:security at pathai.com > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From pinkesh.valdria at oracle.com Fri Nov 12 07:57:14 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 07:57:14 +0000 Subject: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Message-ID: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI -------------- next part -------------- An HTML attachment was scrubbed... 
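As a side note, one way to sanity-check the key pair outside of AFM is the standard AWS CLI against the same S3-compatible endpoint. This is a sketch only: the endpoint and region are copied from the commands above, while the bucket name afm-ocios and the credentials shown are placeholders:

export AWS_ACCESS_KEY_ID=22f79xxxx
export AWS_SECRET_ACCESS_KEY='clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg='
aws s3 ls s3://afm-ocios \
    --endpoint-url https://hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com \
    --region us-ashburn-1
# quoting the secret key keeps the shell from touching the trailing '=' padding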
URL: From vpuvvada at in.ibm.com Fri Nov 12 11:54:38 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 17:24:38 +0530 Subject: [gpfsug-discuss] =?utf-8?q?AFM_with_Object_Storage_-_fails_with_i?= =?utf-8?q?nvalid_skey=09=28secret_key=29?= In-Reply-To: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinkesh.valdria at oracle.com Fri Nov 12 12:26:44 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 12:26:44 +0000 Subject: [gpfsug-discuss] [External] : Re: AFM with Object Storage - fails with invalid skey (secret key) In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Thanks Venkat for quick response. 
Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS ________________________________ From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
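For reference, the acceptance test Venkat quoted can be reproduced as a standalone bash check, which shows that only the trailing '=' (the base64 padding) falls outside the allowed character class. Sketch only, with the key value copied from the example above:

KEY='clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg='
if [[ "$KEY" =~ ^[0-9a-zA-Z/+._]+$ ]]; then
    echo "key accepted"
else
    echo "key rejected"    # this branch is taken, purely because of the final '='
fi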
URL: From vpuvvada at in.ibm.com Fri Nov 12 12:50:48 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 18:20:48 +0530 Subject: [gpfsug-discuss] =?utf-8?q?=3A_Re=3A___AFM_with_Object_Storage_-_?= =?utf-8?q?fails_with_invalid_skey=09=28secret_key=29?= In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi Pinkesh, You could open a ticket to get the efix. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "Venkateswara R Puvvada" , "gpfsug main discussion list" Date: 11/12/2021 05:57 PM Subject: Re: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? 
I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Nov 15 18:44:04 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 15 Nov 2021 18:44:04 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This worksfrom the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon ?service pmcollector start? ? fails: Redirecting to /bin/systemctl status pmcollector.service ? pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Mon Nov 15 21:31:03 2021 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Mon, 15 Nov 2021 21:31:03 +0000 Subject: [gpfsug-discuss] [External] Pmcollector fails to start In-Reply-To: References: Message-ID: Hi, I?ve been experiencing this ?start request repeated too quickly? issue, but IIRC for the pmsensors service instead, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node for which the service is failing. Your issue might point at something similar but on the other end of the spectrum (sic). 
In this case the issue is usually resolved by deleting/recreating the performance monitoring configuration for the whole cluster: mmchnode --noperfmon -N all # required before deleting the perfmon config mmperfmon config delete --all mmperfmon config generate --collectors # start the pmcollector service on the GUI nodes mmchnode --perfmon -N all # start the pmsensors service on all nodes It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies. HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Monday, November 15, 2021 19:44 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] Pmcollector fails to start Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This worksfrom the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon ?service pmcollector start? - fails: Redirecting to /bin/systemctl status pmcollector.service ? pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Nov 16 16:44:21 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 16 Nov 2021 16:44:21 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... 
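Before rebuilding the whole perfmon configuration it may also be worth a round of plain systemd triage: status=203/EXEC generally means systemd could not exec() the configured binary at all (missing, not executable, or not permitted for the service user), which is independent of the Scale configuration. The commands below are standard systemd, not Scale-specific:

journalctl -u pmcollector -n 50 --no-pager   # the real error is usually logged above the start-limit message
systemctl cat pmcollector                    # check ExecStart and any User=/Group= settings in the unit
systemctl reset-failed pmcollector           # clear the start-limit counter
systemctl start pmcollector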
it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. 
Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Nov 18 09:09:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 18 Nov 2021 17:09:25 +0800 Subject: [gpfsug-discuss] possible to rename a snapshot? In-Reply-To: <1825700-1636060653.986878@yfV0.OUFD.5EUE> References: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Message-ID: Mark, GPFS does not support to rename an existing snapshot. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: mark.bergman at uphs.upenn.edu To: "gpfsug main discussion list" Date: 2021/11/05 05:33 AM Subject: [EXTERNAL] [gpfsug-discuss] possible to rename a snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From HAUBRICH at de.ibm.com Thu Nov 18 13:01:39 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Thu, 18 Nov 2021 15:01:39 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Thu Nov 18 13:53:47 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Nov 2021 13:53:47 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start In-Reply-To: References: Message-ID: That was indeed the issue! We?ve linked /opt/IBM/zimon to another directory due to database size. chown?ing that to scalepm.scalepm fixed it. Now, creating a user ?scalepm? on the sly and not telling me ? not good! Bob Oesterlin Sr Principal Storage Engineer Nuance Communications From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Manfred Haubrich Date: Thursday, November 18, 2021 at 7:01 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Pmcollector fails to start CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development ________________________________ Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 49 bytes Desc: ecblank.gif URL: From HAUBRICH at de.ibm.com Fri Nov 19 09:00:49 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Fri, 19 Nov 2021 11:00:49 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Sorry for that difficulty, but the new user for the performance monitoring tool was mentioned in the 5.1.1 summary of changes https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=summary-changes Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From PSAFRE at de.ibm.com Fri Nov 19 13:49:11 2021 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Fri, 19 Nov 2021 15:49:11 +0200 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? 
In-Reply-To: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Message-ID: Hello Heiner, just a heads up for you and the other storage admins, regularly cleaning up /tmp, regarding one aspect to keep in mind: - If you are using Spectrum Scale software call home (mmcallhome), it would be using the directory ${dataStructureDump}/callhome to save the copies of the uploaded data. This would be /tmp/mmfs/callhome/ in your case, which you would be automatically regularly removing. - These copies are used by one of the features of call home: "mmcallhome status diff" - This feature allows to see an overview of the Spectrum Scale configuration changes, that occurred between 2 different points in time. - This effectively allows to quickly find out if any config changes occurred prior to an outage, thereby helping to find the root cause of self-caused problems in the Scale cluster. - It was added in Scale 5.0.5.0 See IBM KC for more details: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=cch-use-cases-detecting-system-changes-by-using-mmcallhome-command - As a source of the "config snapshots", mmcallhome status diff is using the DC packages inside of ${dataStructureDump}/callhome, which you would be regularly deleting, thereby hugely reducing the usability of this particular feature. - Of course, software call home automatically makes sure, it will not use too much space in dataStructureDump and it automatically removes the oldest entries, keeping at most 2GB or 300 files inside (default values, configurable). Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Wilhelm-Fay-Stra?e 32 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: "gpfsug main discussion list" Date: 16.11.2021 17:44 Subject: [EXTERNAL] Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From novosirj at rutgers.edu Fri Nov 19 16:46:34 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 19 Nov 2021 16:46:34 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: <9A96D22E-7744-4E42-A0AD-6DDD06397E24@rutgers.edu> Has any progress been made here at all? I have the same problem as the user who opened this thread. I run xCAT on the server where I want to run the GUI. I?ve attempted to limit the xCAT IP addresses (changing httpd.conf and ssl.conf), but as you note, the UPDATE_IPTABLES setting causes this not to work right, as the GUI wants all interfaces. I could turn that off, but it?s not clear to me what rules I?d need to manually create. What I /really/ would like to do is limit the GPFS GUI to a single interface. I guess the only issue with that would be that maybe the remote machines/performance monitors might contact the machine on its main IP with data. Modifying the ports as I described elsewhere in the thread did work pretty well, but there were some lingering GUI update problems and lots of connections on 443 to "/scalemgmt/v2/info? and ?/CommonEventServlet" that I never was able to track down). Now, I?ve tried disabling xCAT?s httpd server, reinstalled the gpfs.gui RPM, and started the GUI and it doesn?t seem to have gotten any better, so maybe this wasn?t a real problem and I?ll go back to modifying the ports, but I?d really like to do this ?the right way? without having to provide another machine in order to do it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 23, 2018, at 7:50 AM, Markus Rohwedder wrote: > > Hello Juri, Keith, > > thank you for your responses. > > The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. > > Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. > However we have not set up, tested nor documented such a configuration as of today. > > Currently the GUI service manages the iptables redirect bring up and tear down. > If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. > > In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to > - not check for already used ports when the GUI service tries to start up > - don't create/destroy port forwarding rules during GUI service start and stop. > This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. > > The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. > > See this KC link: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm > > Mit freundlichen Gr??en / Kind regards > > Dr. 
Markus Rohwedder > > Spectrum Scale GUI Development > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > "Daniel Kidger" ---23.08.2018 12:13:36---Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case > > From: "Daniel Kidger" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 23.08.2018 12:13 > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Keith, > > I have another IBM customer who also wished to move Scale GUI's https ports. > In their case because they had their own web based management interface on the same https port. > Is this the same reason that you have? > If so I wonder how many other sites have the same issue? > > One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). > Then run the two different GUIs, one per IP address. > Is this an option, albeit a little ugly? > Daniel > > <17310450.gif> Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > ----- Original message ----- > From: "Markus Rohwedder" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Date: Thu, Aug 23, 2018 9:51 AM > Hello Keith, > > it is not so easy. > > The GUI receives events from other scale components using the currently defined ports. > Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). > Therefore at this point there is no procedure to change this behaviour across all components. > > Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. > Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. > The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. > If these ports are already used by another service, the GUI will not start up. > > Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. > If you want to emphasize your case as future development item, please let me know. > > I would also be interested in: > > Scale version you are running > > Do you need port 80 or 443 as well? > > Would it work for you if the xCAT service was bound to a single IP address? > > Mit freundlichen Gr??en / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? > > From: Keith Ball > To: gpfsug-discuss at spectrumscale.org > Date: 22.08.2018 21:33 > Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello All, > > Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. 
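Sketch only, not an IBM-documented procedure: with UPDATE_IPTABLES switched off, roughly equivalent NAT redirects limited to a single GUI address might look like the lines below. GUI_IP is a placeholder, and 47080/47443 are the native GUI ports named above; local clients on the node itself go through the OUTPUT chain rather than PREROUTING:

GUI_IP=192.0.2.10    # placeholder address of the interface the GUI should answer on
iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 80  -j REDIRECT --to-ports 47080
iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443
iptables -t nat -A OUTPUT     -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443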
The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: > > > > > > but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. > > Many Thanks, > Keith > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Tue Nov 23 17:59:12 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 23 Nov 2021 17:59:12 +0000 Subject: [gpfsug-discuss] AFM does too small NFS writes, and I don't see parallel writes Message-ID: Hello, We currently move data to a new AFM fileset and I see poor performance and ask for advice and insight: The migration to afm home seems slow. I note: Afm writes a whole file of ~100MB in much too many small chunks My assumption: The many small writes reduce performance as we have 100km between the sites and a higher latency.? The writes are not fully sequentially, but they aren?t done heavily parallel, either (like 10-100 outstanding writes at each time). I the afm queue I see 8100214 Write [563636091.563636091] inflight (0 @ 0) chunks 2938 bytes 170872410 vIdx 1 thread_id 67862 I guess this means afm will write 170?872?410 bytes in 2?938chunks resulting in an average write size of 58k to inode 563636091. So if I?m right my question is: What can I change to make afm ?write less and larger chunks per file? Does it depend on how we copy data? We write through ganesha/nfs, hence even if we write sequentially ganesha may still do it differently? Another question ? is there a way to dump the? afm in-memory queue for a fileset? That would make it easier to see what?s going on when we do changes. I could grep for the inode of a testfile ? We don?t do parallel writes across afm gateways, the files are too small, our limit is 1GB. We configured two mounts from two ces servers at home for each filesets. Hence AFM could do writes in parallel to both mounts on the single gateway? A short tcpdump suggests: afm writes to a single ces server only and writes to a single inode at a time. But at each time a few writes (2-5) may overlap. Kind regards, Heiner Just to illustrate ? what I see on the afm gateway ? too many reads and writes. There are almost no open/close hence its all to the same few files ------------nfs3-client------------ --------gpfs-file-operations------- --gpfs-i/o- -net/total- read? writ? rdir? inod?? fs?? cmmt| open? clos? read? writ? rdir? inod| read write| recv? send ?? 0? 1295???? 0???? 0???? 0???? 0 |?? 0???? 0? 1294???? 0???? 0???? 0 |89.8M??? 0 | 451k?? 94M ?? 0? 1248???? 0???? 0???? 0???? 0 |?? 0???? 0? 
1248???? 0???? 0???? 8 |86.2M??? 0 | 432k?? 91M ?? 0? 1394???? 0???? 0???? 0???? 0 |?? 0???? 0? 1394???? 0???? 0???? 0 |96.8M??? 0 | 498k? 101M ?? 0? 1583???? 0???? 0???? 0???? 0 |?? 0???? 0? 1582???? 0???? 0???? 1 | 110M??? 0 | 560k? 115M ?? 0? 1543???? 0???? 1???? 0??? ?0 |?? 0???? 0? 1544???? 0???? 0???? 0 | 107M??? 0 | 540k? 112M -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5254 bytes Desc: not available URL: From scl at virginia.edu Tue Nov 30 12:47:46 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Tue, 30 Nov 2021 12:47:46 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop Message-ID: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Hi folks, Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt -rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp -rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt -rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc -rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp The core.*.dmp files are cores from the java command. And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log. Any suggestions? Thanks for any help. 2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped 2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572 2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694' 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed 2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED 2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN 2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True. 
New state=4 2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83 2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693'] 2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed 2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695' 2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors 2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors 2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed 2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694'] 2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119 2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed 2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696' 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors 2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695'] 2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119 2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed 2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version 
received, start new collectors update cycle 2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697' 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors 2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon 2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor 2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16 2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696'] 2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119 2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed 2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify And then gpfsgui apparently crashes and systemd automatically restarts it. Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From luis.bolinches at fi.ibm.com Tue Nov 30 13:30:06 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 30 Nov 2021 13:30:06 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> References: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Nov 30 13:34:17 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Nov 2021 13:34:17 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: References: , <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Mon Nov 1 14:50:54 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Nov 2021 14:50:54 +0000 Subject: [gpfsug-discuss] SSUG UK User Group Message-ID: Hi All, I?m planning to take a step-back from running the Spectrum Scale user group in the UK later this year/early next year and this means we need someone (or people) to step up to run the user group in the UK. I took over running the user group in 2015 and a lot has changed since then ? the group got bigger, we moved to multi-day sessions, a pandemic struck and we moved online ? now as things are maybe returning to normal, I think it is time for someone else to take leadership of the group in the UK and work out how to take it forwards. If you are interested in taking up running the group in the UK, please drop me an email, or DM on Slack and let me know. 
It doesn?t necessarily need to be one person running the group, and having several would help with some of the logistics of running the events. To be truly independent, which we have always tried to be, I?ve always thought that the person/people running the group should come from the end-user community? I?ll likely still be around at events, and happy to provide organisational support if needed ? but I don?t really have the time needed for the group at the moment. Hopefully there?s someone interested in taking the group forwards in the future ? Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Tue Nov 2 14:02:10 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Nov 2021 14:02:10 +0000 Subject: [gpfsug-discuss] Upcoming Events Message-ID: Hi All, We thought it would be a good time to send an update on some upcoming events. We have three events coming up over November/December TWO of which are in person! IBM User?s Group meeting ? SC21 (15th November 2021, IN PERSON) IBM Spectrum Scale Development and Product Management team will be attending Super Computing 2021 in person. We will be hosting our yearly gathering on Monday, November 15, from 3:00-5:00 PM. This global user meeting provides an opportunity for peer-to-peer learning and interaction with IBM?s technical leadership team on the latest IBM Spectrum Scale roadmaps, latest features, ecosystem, and applications for AI. See: https://www.spectrumscaleug.org/event/sc21-users-group-meeting/ Register at: https://www.ibm.com/events/event/pages/ibm/nz48hgmb/1581037797007001PJAd.html SSUG::Digital (1st, 2nd December 2021, VIRTUAL) For the Spectrum Scale Users who will not be able to attend user meeting at Super Computing in St Louis, or SSUG at CIUK, we plan to host Digital user meeting on Dec 1 & Dec 2 from 10am - 12pm EDT (3pm-5pm GMT). In the Digital user meeting, we will cover some of the contents covered at St Louis and additional expert talks from our development team and partners. See: https://www.spectrumscaleug.org/event/digital-user-group-dec-2021/ Joining link: To be confirmed SSUG @CIUK 2021 (10th December 2021, IN PERSON) This year we will be returning to our traditional user group home of CIUK and will be running a break-out session on the Friday of CIUK (10:00 ? 12:00). We?re currently lining up a few speakers for the event, but if you are attending CIUK in Manchester this year and are interested in speaking, please let me know ? we have a few speaker slots available for user talks. I?m sure it has been soooo long since anyone has had the opportunity to speak, that I?ll be inundated with user talks ? ? See: https://www.spectrumscaleug.org/event/ssug-ciuk-2021/ As usual with the CIUK meeting, you must be a registered attendee of CIUK to attend this user group. CIUK Registration: https://www.scd.stfc.ac.uk/Pages/CIUK2021.aspx Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Thu Nov 4 21:17:33 2021 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 04 Nov 2021 17:17:33 -0400 Subject: [gpfsug-discuss] possible to rename a snapshot? Message-ID: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? 
Thanks, Mark From heinrich.billich at id.ethz.ch Mon Nov 8 09:20:24 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 8 Nov 2021 09:20:24 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From olaf.weiser at de.ibm.com Mon Nov 8 09:53:04 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 8 Nov 2021 09:53:04 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Nov 8 09:54:18 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Nov 2021 09:54:18 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > notice that this directory randomly vanishes. Mmhealth does not > complain but just notes that it will no longer monitor the directory. > Still I doubt that trace collection and similar will create the > directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > 6.0.2.2 and 6.0.2.2. > I know several Linux distributions clear the contents of /tmp at boot time. Could that explain it? I would say using /tmp like you are doing is not a sensible idea anyway and that you should be using something under /var. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From lior at nyu.edu Mon Nov 8 14:38:35 2021 From: lior at nyu.edu (Lior Atar) Date: Mon, 8 Nov 2021 09:38:35 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: Hello all, /tmp/mmfs is being deleted every 10 days by a systemd service " systemd-tmpfiles-setup.service ". That service calls a configuration file " /usr/lib/tmpfiles.d/tmp.conf . What we did was add a drop in file in /etc/tmpfiles.d/tmp.conf to then create the directory /tmp/mmfs and then exclude deleting going forward. 
Here's our actual file and some commentary of what the options mean:

# cat /etc/tmpfiles.d/tmp.conf
# Create a /tmp/mmfs directory
d /tmp/mmfs 0755 root root 1s    <-------- the " d " is to create directory
x /tmp/mmfs/*                    <-------- the " x " says to ignore it

That change helped us avoid /tmp/mmfs from being deleted every 10 days. In addition I think also did a %systemctl daemon-reload ( but I don't have it in my notes, wouldn't hurt to run it )

Hope this helps,

Lior

On Mon, Nov 8, 2021 at 7:00 AM wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. /tmp/mmfs vanishes randomly? (Billich Heinrich Rainer (ID SD)) > 2. Re: /tmp/mmfs vanishes randomly? (Olaf Weiser) > 3. Re: /tmp/mmfs vanishes randomly? (Jonathan Buzzard) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 8 Nov 2021 09:20:24 +0000 > From: "Billich Heinrich Rainer (ID SD)" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D at id.ethz.ch> > Content-Type: text/plain; charset="utf-8" > > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I notice > that this directory randomly vanishes. Mmhealth does not complain but just > notes that it will no longer monitor the directory. Still I doubt that > trace collection and similar will create the directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, it's no > cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and > 6.0.2.2. > > Thank you, > > Mmhealth message: > local_fs_path_not_found INFO The configured dataStructureDump path > /tmp/mmfs does not exists. Skipping monitoring. > > Kind regards, > > Heiner > --- > ======================= > Heinrich Billich > ETH Zürich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > > ------------------------------ > > Message: 2 > Date: Mon, 8 Nov 2021 09:53:04 +0000 > From: "Olaf Weiser" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: < > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20211108_1d32c09e_attachment-2D0001.html&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=zpe2MuRXotkV_yDkY-UQSIE68CEBIWsRoj4Qya85nJU&e= > > > > ------------------------------ > > Message: 3 > Date: Mon, 8 Nov 2021 09:54:18 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > > > Hello, > > > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > > notice that this directory randomly vanishes. Mmhealth does not > > complain but just notes that it will no longer monitor the directory. > > Still I doubt that trace collection and similar will create the > > directory when needed? > > > > Do you know of any spectrum scale internal mechanism that could cause > > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > > installation, too. It happens just on one or two nodes at a time, > > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > > 6.0.2.2 and 6.0.2.2. > > > > I know several Linux distributions clear the contents of /tmp at boot > time. Could that explain it? > > I would say using /tmp like you are doing is not a sensible idea anyway > and that you should be using something under /var. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > > > End of gpfsug-discuss Digest, Vol 118, Issue 4 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.r.sudbery at bham.ac.uk Tue Nov 9 16:55:36 2021 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Tue, 9 Nov 2021 16:55:36 +0000 Subject: [gpfsug-discuss] gplbin package filename changed in 5.1.2.0? Message-ID: mmbuildgpl in 5.1.2.0 has build me a package with the filename: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm Before it would have been: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64.rpm The RPM package name itself still appears to be gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64. Is this expected? Is this a permanent change? Just wondering whether to re-tool some of our existing build/install infrastructure or just create a symlink for this one... Many thanks, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday. -------------- next part -------------- An HTML attachment was scrubbed... 
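(A quick way to compare the old and new naming is to read the package name recorded inside the rpm header, independent of the file name mmbuildgpl wrote. This is only a sketch - the file name below is the one quoted above and the exact output depends on the local build:

    # Show the NAME-VERSION-RELEASE.ARCH embedded in the rpm file itself
    rpm -qp --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm

    # Once installed, the gplbin package for the running kernel is still found by package name
    rpm -q gpfs.gplbin-$(uname -r)

If the embedded package name is unchanged, tooling that installs or queries by package name should keep working; only scripts that construct the file name would need adjusting.)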
URL: From frederik.ferner at diamond.ac.uk Wed Nov 10 10:28:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 10 Nov 2021 10:28:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, have you ever received any reply to this or managed to solve it? We are seeing exactly the same error and it's filling up our logs. It seems all the monitoring data is still extracted, so I'm not sure when it started so not sure if this is related to any upgrade on our side, but it may have been going on for a while. We only noticed because the log file now is filling up the local log partition. Kind regards, Frederik On 26/08/2021 11:49, Ragho Mahalingam wrote: > We've been working on setting up mmperfmon; after creating a new > configuration with the new collector on the same manager node, mmsysmon > keeps throwing exceptions. > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > 123, in _getDataFromZimonSocket > sock.connect(SOCKET_PATH) > FileNotFoundError: [Errno 2] No such file or directory > > Tracing this a bit, it appears that SOCKET_PATH is > /var/run/perfmon/pmcollector.socket and this unix domain socket is absent, > even though pmcollector has started and is running successfully. > > Under what scenarios is pmcollector supposed to create this socket? I > don't see any configuration for this in /opt/IBM/zimon/ZIMonCollector.cfg, > so I'm assuming the socket is automatically created when pmcollector starts. > > Any thoughts on how to debug and resolve this? > > Thanks, Ragu -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From ragho.mahalingam+spectrumscaleug at pathai.com Wed Nov 10 14:00:19 2021 From: ragho.mahalingam+spectrumscaleug at pathai.com (Ragho Mahalingam) Date: Wed, 10 Nov 2021 09:00:19 -0500 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Frederick, In our case the issue started appearing after upgrading from 5.0.4 to 5.1.1. If you've recently upgraded, then the following may be useful. Turns out that mmsysmon (gpfs-base package) requires the new gpfs.gss.pmcollector (from zimon packages) to function correctly (the AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). 
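(A rough sketch of that check - the package names are the ones mentioned in this thread and the socket path is the one from the traceback above, so adjust for your own release:

    # Are the perfmon sensor/collector rpms at the same level as the base packages?
    rpm -qa | grep -E '^gpfs\.(base|gss\.pmsensors|gss\.pmcollector)' | sort

    # Does the AF_UNIX socket mmsysmon tries to connect to exist on the collector node?
    ls -l /var/run/perfmon/pmcollector.socket
    systemctl status pmsensors pmcollector
)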
In our case, we'd upgraded all the mandatory packages but had not upgraded the optional ones; the mmsysmonc python libs appears to be updated by the pmcollector package from my study. If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* packages installed. If gpfs.gss.pmcollector isn't installed, you'd definitely need that to make this runaway logging stop. Hope that helps! Ragu On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < frederik.ferner at diamond.ac.uk> wrote: > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We are > seeing exactly the same error and it's filling up our logs. It seems all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, mmsysmon > > keeps throwing exceptions. > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > 123, in _getDataFromZimonSocket > > sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket? I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > SciComp Help Desk can be reached on x8596 > > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright and or > privileged material, and are for the use of the intended addressee only. If > you are not the intended addressee or an authorised recipient of the > addressee please notify us of receipt by returning the e-mail and do not > use, copy, retain, distribute or disclose the information in or attached to > the e-mail. > Any opinions expressed within this e-mail are those of the individual and > not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for any > damage which you may sustain as a result of software viruses which may be > transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England > and Wales with its registered office at Diamond House, Harwell Science and > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Disclaimer: This email and any corresponding attachments may contain confidential information. 
If you're not the intended recipient, any copying, distribution, disclosure, or use of any information contained in the email or its attachments is strictly prohibited. If you believe to have received this email in error, please email security at pathai.com immediately, then destroy the email and any attachments without reading or saving.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Nov 10 14:14:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 10 Nov 2021 14:14:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?mmsysmon_exception_with_pmcollector_so?= =?utf-8?q?cket=09being_absent?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Nov 11 13:38:56 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:38:56 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, many thanks for the response. That was indeed the problem. We missed it when we upgraded a while ago and because our normal monitoring continued to work, we didn't notice until now. Kind regards, Frederik On 10/11/2021 09:00, Ragho Mahalingam wrote: > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1. If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In > our case, we'd upgraded all the mandatory packages but had not upgraded the > optional ones; the mmsysmonc python libs appears to be updated by the > pmcollector package from my study. > > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed. If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > > Hope that helps! > > Ragu > > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < > frederik.ferner at diamond.ac.uk> wrote: > > > Hi Ragu, > > > > have you ever received any reply to this or managed to solve it? We are > > seeing exactly the same error and it's filling up our logs. It seems all > > the monitoring data is still extracted, so I'm not sure when it > > started so not sure if this is related to any upgrade on our side, but > > it may have been going on for a while. We only noticed because the log > > file now is filling up the local log partition. > > > > Kind regards, > > Frederik > > > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > > We've been working on setting up mmperfmon; after creating a new > > > configuration with the new collector on the same manager node, mmsysmon > > > keeps throwing exceptions. > > > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > > 123, in _getDataFromZimonSocket > > > sock.connect(SOCKET_PATH) > > > FileNotFoundError: [Errno 2] No such file or directory > > > > > > Tracing this a bit, it appears that SOCKET_PATH is > > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > > absent, > > > even though pmcollector has started and is running successfully. > > > > > > Under what scenarios is pmcollector supposed to create this socket? 
I > > > don't see any configuration for this in > > /opt/IBM/zimon/ZIMonCollector.cfg, > > > so I'm assuming the socket is automatically created when pmcollector > > starts. > > > > > > Any thoughts on how to debug and resolve this? > > > > > > Thanks, Ragu > > > > -- > > Frederik Ferner (he/him) > > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > > > SciComp Help Desk can be reached on x8596 > > > > > > (Apologies in advance for the lines below. Some bits are a legal > > requirement and I have no control over them.) > > > > -- > > This e-mail and any attachments may contain confidential, copyright and or > > privileged material, and are for the use of the intended addressee only. If > > you are not the intended addressee or an authorised recipient of the > > addressee please notify us of receipt by returning the e-mail and do not > > use, copy, retain, distribute or disclose the information in or attached to > > the e-mail. > > Any opinions expressed within this e-mail are those of the individual and > > not necessarily of Diamond Light Source Ltd. > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > > attachments are free from viruses and we cannot accept liability for any > > damage which you may sustain as a result of software viruses which may be > > transmitted in or with the message. > > Diamond Light Source Limited (company no. 4375679). Registered in England > > and Wales with its registered office at Diamond House, Harwell Science and > > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > *Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained in > the email or its attachments is strictly prohibited. If you believe to have > received this email in error, please email security at pathai.com > immediately, then destroy the email and any > attachments without reading or saving.* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From frederik.ferner at diamond.ac.uk Thu Nov 11 13:45:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:45:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket?being absent In-Reply-To: References: Message-ID: Hi Fred, we haven't used the deployement tool anywhere so far, we always apply/upgrade the RPMs directly. (Centrally managed via CFengine, promising that certain Spectrum Scale RPMs are installed. I haven't yet checked how the gpfs.gss.pmcollector RPM were installed initially as they weren't in our list of promised packages, which is why the upgrade was missed.) Kind regards, Frederik On 10/11/2021 14:14, Frederick Stock wrote: > I am curious to know if you upgraded by manually applying rpms or if you > used the Spectrum Scale deployment tool (spectrumscale command) to apply > the upgrade? > Fred > _______________________________________________________ > Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 > stockf at us.ibm.com > ? > ? > > ----- Original message ----- > From: "Ragho Mahalingam" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmsysmon exception with > pmcollector socket being absent > Date: Wed, Nov 10, 2021 9:00 AM > ? > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1.? If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1).? > In our case, we'd upgraded all the mandatory packages but had > not?upgraded the optional ones; the mmsysmonc?python libs appears to be > updated by the pmcollector package from my study. > ? > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed.? If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > ? > Hope that helps! > ? > Ragu > ? > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner > <[1]frederik.ferner at diamond.ac.uk> wrote: > > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We > are > seeing exactly the same error and it's filling up our logs. It seems > all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, > mmsysmon > > keeps throwing exceptions. > > > >? ?File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", > line > > 123, in _getDataFromZimonSocket > >? ? ?sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > >? /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. 
> > > > Under what scenarios is pmcollector supposed to create this socket?? > I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 > 8624 > Diamond Light Source Ltd.? ? ? ? ? ? ? ? ? ? ? ?mob:? ?+44 7917 08 > 5110 > > SciComp Help Desk can be reached on x8596 > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright > and or privileged material, and are for the use of the intended > addressee only. If you are not the intended addressee or an authorised > recipient of the addressee please notify us of receipt by returning > the e-mail and do not use, copy, retain, distribute or disclose the > information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual > and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for > any damage which you may sustain as a result of software viruses which > may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in > England and Wales with its registered office at Diamond House, Harwell > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United > Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at [2]spectrumscale.org > [3]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained > in the email or its attachments is strictly prohibited. If you believe > to have received this email in error, please email > [4]security at pathai.com immediately, then destroy the email and any > attachments without reading or saving. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > References > > Visible links > 1. mailto:frederik.ferner at diamond.ac.uk > 2. http://spectrumscale.org/ > 3. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 4. mailto:security at pathai.com > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From pinkesh.valdria at oracle.com Fri Nov 12 07:57:14 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 07:57:14 +0000 Subject: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Message-ID: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vpuvvada at in.ibm.com Fri Nov 12 11:54:38 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 17:24:38 +0530 Subject: [gpfsug-discuss] =?utf-8?q?AFM_with_Object_Storage_-_fails_with_i?= =?utf-8?q?nvalid_skey=09=28secret_key=29?= In-Reply-To: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinkesh.valdria at oracle.com Fri Nov 12 12:26:44 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 12:26:44 +0000 Subject: [gpfsug-discuss] [External] : Re: AFM with Object Storage - fails with invalid skey (secret key) In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Thanks Venkat for quick response. 
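(To see why a trailing '=' trips this check, here is a small bash sketch using the pattern quoted above - the key is the truncated example from this thread, not a real credential:

    KEY='clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg='
    if [[ "$KEY" =~ ^[0-9a-zA-Z/+._]+$ ]]; then
        echo "key accepted"
    else
        echo "key rejected: contains a character outside [0-9a-zA-Z/+._]"
    fi
    # prints "key rejected ..." because the padding character '=' is not in the allowed class
)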
Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS ________________________________ From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vpuvvada at in.ibm.com Fri Nov 12 12:50:48 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 18:20:48 +0530 Subject: [gpfsug-discuss] =?utf-8?q?=3A_Re=3A___AFM_with_Object_Storage_-_?= =?utf-8?q?fails_with_invalid_skey=09=28secret_key=29?= In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi Pinkesh, You could open a ticket to get the efix. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "Venkateswara R Puvvada" , "gpfsug main discussion list" Date: 11/12/2021 05:57 PM Subject: Re: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? 
I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect - HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) - USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Nov 15 18:44:04 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 15 Nov 2021 18:44:04 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon "service pmcollector start" - fails: Redirecting to /bin/systemctl status pmcollector.service ● pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Mon Nov 15 21:31:03 2021 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Mon, 15 Nov 2021 21:31:03 +0000 Subject: [gpfsug-discuss] [External] Pmcollector fails to start In-Reply-To: References: Message-ID: Hi, I've been experiencing this "start request repeated too quickly" issue, but IIRC for the pmsensors service instead, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node for which the service is failing. Your issue might point at something similar but on the other end of the spectrum (sic).
In this case the issue is usually resolved by deleting/recreating the performance monitoring configuration for the whole cluster: mmchnode --noperfmon -N all # required before deleting the perfmon config mmperfmon config delete --all mmperfmon config generate --collectors # start the pmcollector service on the GUI nodes mmchnode --perfmon -N all # start the pmsensors service on all nodes It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies. HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Monday, November 15, 2021 19:44 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] Pmcollector fails to start Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon "service pmcollector start" - fails: Redirecting to /bin/systemctl status pmcollector.service ● pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Nov 16 16:44:21 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 16 Nov 2021 16:44:21 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn't present in RHEL7, at least not on our nodes. I am considering modifying the configuration a bit to keep the directory /tmp/mmfs - or even create it - but to clean its content. Best regards, Heiner
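One possible shape for such a change, as a sketch only: a local tmpfiles.d override that recreates /tmp/mmfs at boot and protects the directory itself from cleaning, while leaving its contents subject to the existing 10-day /tmp rule shown in the tmp.conf Olaf quotes below (the file name, the 0700 mode and the exact stanza types are assumptions based on tmpfiles.d(5) semantics, not a tested IBM recommendation):

    # /etc/tmpfiles.d/mmfs.conf
    # create /tmp/mmfs at boot if it is missing (0700 root is an assumed mode)
    d /tmp/mmfs 0700 root root -
    # never age out the directory itself; files inside still follow the "q /tmp ... 10d" rule
    X /tmp/mmfs

Running "systemd-tmpfiles --create" afterwards (or rebooting) should apply the new entries.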
From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... if the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Grüßen / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Ursprüngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring.
Kind regards, Heiner --- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Nov 18 09:09:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 18 Nov 2021 17:09:25 +0800 Subject: [gpfsug-discuss] possible to rename a snapshot? In-Reply-To: <1825700-1636060653.986878@yfV0.OUFD.5EUE> References: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Message-ID: Mark, GPFS does not support renaming an existing snapshot. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: mark.bergman at uphs.upenn.edu To: "gpfsug main discussion list" Date: 2021/11/05 05:33 AM Subject: [EXTERNAL] [gpfsug-discuss] possible to rename a snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From HAUBRICH at de.ibm.com Thu Nov 18 13:01:39 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Thu, 18 Nov 2021 15:01:39 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Grüßen / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL:
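A quick sanity check for the condition Manfred describes above - whether the scalepm account still exists and owns the collector's working directory (the paths are the defaults; if /opt/IBM/zimon has been relocated or symlinked, check the real target as well; the chown line is a sketch, not an official repair procedure):

    id scalepm                                  # the account the 5.1.1 pmcollector runs as
    ls -ld /opt/IBM/zimon /opt/IBM/zimon/data   # expected owner is scalepm (the data subdirectory is an assumption)
    # if ownership was lost, e.g. after relinking the directory:
    chown -R scalepm:scalepm /opt/IBM/zimon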
From Robert.Oesterlin at nuance.com Thu Nov 18 13:53:47 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Nov 2021 13:53:47 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start In-Reply-To: References: Message-ID: That was indeed the issue! We've linked /opt/IBM/zimon to another directory due to database size. chown'ing that to scalepm.scalepm fixed it. Now, creating a user 'scalepm' on the sly and not telling me - not good! Bob Oesterlin Sr Principal Storage Engineer Nuance Communications From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Manfred Haubrich Date: Thursday, November 18, 2021 at 7:01 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Pmcollector fails to start ________________________________ status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Grüßen / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development ________________________________ Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 49 bytes Desc: ecblank.gif URL: From HAUBRICH at de.ibm.com Fri Nov 19 09:00:49 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Fri, 19 Nov 2021 11:00:49 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Sorry for that difficulty, but the new user for the performance monitoring tool was mentioned in the 5.1.1 summary of changes https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=summary-changes Mit freundlichen Grüßen / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From PSAFRE at de.ibm.com Fri Nov 19 13:49:11 2021 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Fri, 19 Nov 2021 15:49:11 +0200 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly?
In-Reply-To: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Message-ID: Hello Heiner, just a heads up for you and the other storage admins, regularly cleaning up /tmp, regarding one aspect to keep in mind: - If you are using Spectrum Scale software call home (mmcallhome), it would be using the directory ${dataStructureDump}/callhome to save the copies of the uploaded data. This would be /tmp/mmfs/callhome/ in your case, which you would be automatically regularly removing. - These copies are used by one of the features of call home: "mmcallhome status diff" - This feature allows to see an overview of the Spectrum Scale configuration changes, that occurred between 2 different points in time. - This effectively allows to quickly find out if any config changes occurred prior to an outage, thereby helping to find the root cause of self-caused problems in the Scale cluster. - It was added in Scale 5.0.5.0 See IBM KC for more details: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=cch-use-cases-detecting-system-changes-by-using-mmcallhome-command - As a source of the "config snapshots", mmcallhome status diff is using the DC packages inside of ${dataStructureDump}/callhome, which you would be regularly deleting, thereby hugely reducing the usability of this particular feature. - Of course, software call home automatically makes sure, it will not use too much space in dataStructureDump and it automatically removes the oldest entries, keeping at most 2GB or 300 files inside (default values, configurable). Mit freundlichen Grüßen / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Wilhelm-Fay-Straße 32 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: "gpfsug main discussion list" Date: 16.11.2021 17:44 Subject: [EXTERNAL] Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn't present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it - but to clean it's content. Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... if the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
[root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Grüßen / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Ursprüngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From novosirj at rutgers.edu Fri Nov 19 16:46:34 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 19 Nov 2021 16:46:34 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: <9A96D22E-7744-4E42-A0AD-6DDD06397E24@rutgers.edu> Has any progress been made here at all? I have the same problem as the user who opened this thread. I run xCAT on the server where I want to run the GUI. I've attempted to limit the xCAT IP addresses (changing httpd.conf and ssl.conf), but as you note, the UPDATE_IPTABLES setting causes this not to work right, as the GUI wants all interfaces. I could turn that off, but it's not clear to me what rules I'd need to manually create. What I /really/ would like to do is limit the GPFS GUI to a single interface. I guess the only issue with that would be that maybe the remote machines/performance monitors might contact the machine on its main IP with data. Modifying the ports as I described elsewhere in the thread did work pretty well, but there were some lingering GUI update problems and lots of connections on 443 to "/scalemgmt/v2/info" and "/CommonEventServlet" that I never was able to track down. Now, I've tried disabling xCAT's httpd server, reinstalled the gpfs.gui RPM, and started the GUI and it doesn't seem to have gotten any better, so maybe this wasn't a real problem and I'll go back to modifying the ports, but I'd really like to do this "the right way" without having to provide another machine in order to do it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `'
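On the "what rules I'd need to manually create" question: if the GUI's own iptables handling is switched off via the UPDATE_IPTABLES flag described in the quoted reply below, the manual equivalent would be redirect rules that only match the one address the GUI should answer on - roughly along these lines (untested sketch; GUI_IP is a placeholder, and 47443/47080 are the GUI's native ports mentioned further down in the thread):

    GUI_IP=192.0.2.10    # the one interface/address the GUI should serve
    iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443
    iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 80  -j REDIRECT --to-ports 47080
    # local access on the node itself goes through OUTPUT rather than PREROUTING
    iptables -t nat -A OUTPUT -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443

Whether the GUI process itself then also needs to be bound to that single address is a separate question the thread leaves open.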
> On Aug 23, 2018, at 7:50 AM, Markus Rohwedder wrote: > > Hello Juri, Keith, > > thank you for your responses. > > The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. > > Running two services at the same port on different IP addresses could be an option to consider for co-existence of the GUI and another service on the same node. > However we have not set up, tested nor documented such a configuration as of today. > > Currently the GUI service manages the iptables redirect bring up and tear down. > If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. > > In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to > - not check for already used ports when the GUI service tries to start up > - don't create/destroy port forwarding rules during GUI service start and stop. > This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. > > The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. > > See this KC link: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm > > Mit freundlichen Grüßen / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > "Daniel Kidger" ---23.08.2018 12:13:36---Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case > > From: "Daniel Kidger" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 23.08.2018 12:13 > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Keith, > > I have another IBM customer who also wished to move Scale GUI's https ports. > In their case because they had their own web based management interface on the same https port. > Is this the same reason that you have? > If so I wonder how many other sites have the same issue? > > One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). > Then run the two different GUIs, one per IP address. > Is this an option, albeit a little ugly? > Daniel > > <17310450.gif> Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > ----- Original message ----- > From: "Markus Rohwedder" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Date: Thu, Aug 23, 2018 9:51 AM > Hello Keith, > > it is not so easy. > > The GUI receives events from other scale components using the currently defined ports. > Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). > Therefore at this point there is no procedure to change this behaviour across all components. > > Because the GUI service does not run as root, the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. > Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. > The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. > If these ports are already used by another service, the GUI will not start up. > > Making the GUI ports freely configurable is therefore not a straightforward change, and currently not on our roadmap. > If you want to emphasize your case as future development item, please let me know. > > I would also be interested in: > > Scale version you are running > > Do you need port 80 or 443 as well? > > Would it work for you if the xCAT service was bound to a single IP address? > > Mit freundlichen Grüßen / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? > > From: Keith Ball > To: gpfsug-discuss at spectrumscale.org > Date: 22.08.2018 21:33 > Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello All, > > Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this.
The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: > > > > > > but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Spectrum Scale GUI configuration if I can. > > Many Thanks, > Keith > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Tue Nov 23 17:59:12 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 23 Nov 2021 17:59:12 +0000 Subject: [gpfsug-discuss] AFM does too small NFS writes, and I don't see parallel writes Message-ID: Hello, We currently move data to a new AFM fileset and I see poor performance and ask for advice and insight: The migration to afm home seems slow. I note: Afm writes a whole file of ~100MB in much too many small chunks My assumption: The many small writes reduce performance as we have 100km between the sites and a higher latency. The writes are not fully sequential, but they aren't done heavily in parallel, either (like 10-100 outstanding writes at each time). In the afm queue I see 8100214 Write [563636091.563636091] inflight (0 @ 0) chunks 2938 bytes 170872410 vIdx 1 thread_id 67862 I guess this means afm will write 170,872,410 bytes in 2,938 chunks resulting in an average write size of 58k to inode 563636091. So if I'm right my question is: What can I change to make afm write fewer and larger chunks per file? Does it depend on how we copy data? We write through ganesha/nfs, hence even if we write sequentially ganesha may still do it differently? Another question - is there a way to dump the afm in-memory queue for a fileset? That would make it easier to see what's going on when we do changes. I could grep for the inode of a testfile. We don't do parallel writes across afm gateways, the files are too small, our limit is 1GB. We configured two mounts from two ces servers at home for each fileset. Hence AFM could do writes in parallel to both mounts on the single gateway? A short tcpdump suggests: afm writes to a single ces server only and writes to a single inode at a time. But at each time a few writes (2-5) may overlap. Kind regards, Heiner Just to illustrate - what I see on the afm gateway - too many reads and writes. There are almost no open/close hence it's all to the same few files
------------nfs3-client------------ --------gpfs-file-operations------- --gpfs-i/o- -net/total-
 read  writ  rdir  inod   fs  cmmt| open  clos  read  writ  rdir  inod| read write| recv  send
    0  1295     0     0     0    0|    0     0  1294     0     0     0|89.8M    0 | 451k   94M
    0  1248     0     0     0    0|    0     0  1248     0     0     8|86.2M    0 | 432k   91M
    0  1394     0     0     0    0|    0     0  1394     0     0     0|96.8M    0 | 498k  101M
    0  1583     0     0     0    0|    0     0  1582     0     0     1| 110M    0 | 560k  115M
    0  1543     0     1     0    0|    0     0  1544     0     0     0| 107M    0 | 540k  112M
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5254 bytes Desc: not available URL:
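On the "dump the AFM in-memory queue" question above: two commands commonly used for this kind of inspection on the gateway node, shown here only as a sketch (mmafmctl getstate is documented; the low-level mmfsadm dump is the sort of thing normally run on request of IBM support, and its exact output format should be treated as an assumption - "fs1" and the fileset name are placeholders):

    mmafmctl fs1 getstate -j migratefileset     # per-fileset state, gateway node and queue length
    mmfsadm dump afm > /tmp/afm-queue.txt       # raw dump of the AFM queues on the gateway
    grep 563636091 /tmp/afm-queue.txt           # pick out the entries for one inode, as suggested above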
From scl at virginia.edu Tue Nov 30 12:47:46 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Tue, 30 Nov 2021 12:47:46 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop Message-ID: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Hi folks, Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt -rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp -rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt -rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc -rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp The core.*.dmp files are cores from the java command. And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log. Any suggestions? Thanks for any help. 2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped 2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572 2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694' 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed 2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED 2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN 2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True.
New state=4 2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83 2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693'] 2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed 2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695' 2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors 2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors 2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed 2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694'] 2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119 2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed 2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696' 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors 2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695'] 2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119 2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed 2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version 
received, start new collectors update cycle 2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697' 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors 2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon 2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor 2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16 2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696'] 2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119 2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed 2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify And then gpfsgui apparently crashes and systemd automatically restarts it. Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From luis.bolinches at fi.ibm.com Tue Nov 30 13:30:06 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 30 Nov 2021 13:30:06 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> References: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Nov 30 13:34:17 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Nov 2021 13:34:17 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: References: , <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: