From s.j.thompson at bham.ac.uk Mon Nov 1 14:50:54 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Nov 2021 14:50:54 +0000 Subject: [gpfsug-discuss] SSUG UK User Group Message-ID: Hi All, I'm planning to take a step back from running the Spectrum Scale user group in the UK later this year/early next year, and this means we need someone (or people) to step up to run the user group in the UK. I took over running the user group in 2015 and a lot has changed since then - the group got bigger, we moved to multi-day sessions, a pandemic struck and we moved online - now, as things are maybe returning to normal, I think it is time for someone else to take leadership of the group in the UK and work out how to take it forwards. If you are interested in taking up running the group in the UK, please drop me an email, or DM me on Slack and let me know. It doesn't necessarily need to be one person running the group, and having several would help with some of the logistics of running the events. To be truly independent, which we have always tried to be, I've always thought that the person/people running the group should come from the end-user community. I'll likely still be around at events, and happy to provide organisational support if needed - but I don't really have the time needed for the group at the moment. Hopefully there's someone interested in taking the group forwards in the future. Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Tue Nov 2 14:02:10 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Nov 2021 14:02:10 +0000 Subject: [gpfsug-discuss] Upcoming Events Message-ID: Hi All, We thought it would be a good time to send an update on some upcoming events. We have three events coming up over November/December, TWO of which are in person! IBM User's Group meeting - SC21 (15th November 2021, IN PERSON) The IBM Spectrum Scale Development and Product Management team will be attending Super Computing 2021 in person. We will be hosting our yearly gathering on Monday, November 15, from 3:00-5:00 PM. This global user meeting provides an opportunity for peer-to-peer learning and interaction with IBM's technical leadership team on the latest IBM Spectrum Scale roadmaps, latest features, ecosystem, and applications for AI. See: https://www.spectrumscaleug.org/event/sc21-users-group-meeting/ Register at: https://www.ibm.com/events/event/pages/ibm/nz48hgmb/1581037797007001PJAd.html SSUG::Digital (1st, 2nd December 2021, VIRTUAL) For the Spectrum Scale users who will not be able to attend the user meeting at Super Computing in St Louis, or SSUG at CIUK, we plan to host a digital user meeting on Dec 1 & Dec 2 from 10am - 12pm EDT (3pm-5pm GMT). In the digital user meeting, we will cover some of the content covered at St Louis plus additional expert talks from our development team and partners. See: https://www.spectrumscaleug.org/event/digital-user-group-dec-2021/ Joining link: To be confirmed SSUG @CIUK 2021 (10th December 2021, IN PERSON) This year we will be returning to our traditional user group home of CIUK and will be running a break-out session on the Friday of CIUK (10:00 - 12:00). We're currently lining up a few speakers for the event, but if you are attending CIUK in Manchester this year and are interested in speaking, please let me know - we have a few speaker slots available for user talks.
I'm sure it has been soooo long since anyone has had the opportunity to speak that I'll be inundated with user talks. See: https://www.spectrumscaleug.org/event/ssug-ciuk-2021/ As usual with the CIUK meeting, you must be a registered attendee of CIUK to attend this user group. CIUK Registration: https://www.scd.stfc.ac.uk/Pages/CIUK2021.aspx Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Thu Nov 4 21:17:33 2021 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 04 Nov 2021 17:17:33 -0400 Subject: [gpfsug-discuss] possible to rename a snapshot? Message-ID: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark From heinrich.billich at id.ethz.ch Mon Nov 8 09:20:24 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 8 Nov 2021 09:20:24 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Hello, We use /tmp/mmfs as our dataStructureDump directory. For a while now I have noticed that this directory randomly vanishes. Mmhealth does not complain, it just notes that it will no longer monitor the directory. Still, I doubt that trace collection and similar will create the directory when needed. Do you know of any Spectrum Scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes with a plain IBM installation, too. It happens just on one or two nodes at a time; it's no cluster-wide cleanup or similar. We run Scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From olaf.weiser at de.ibm.com Mon Nov 8 09:53:04 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 8 Nov 2021 09:53:04 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Nov 8 09:54:18 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Nov 2021 09:54:18 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > notice that this directory randomly vanishes. Mmhealth does not > complain but just notes that it will no longer monitor the directory. > Still I doubt that trace collection and similar will create the > directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > 6.0.2.2 and 6.0.2.2. > I know several Linux distributions clear the contents of /tmp at boot time. Could that explain it?
I would say using /tmp like you are doing is not a sensible idea anyway and that you should be using something under /var. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From lior at nyu.edu Mon Nov 8 14:38:35 2021 From: lior at nyu.edu (Lior Atar) Date: Mon, 8 Nov 2021 09:38:35 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: Hello all, /tmp/mmfs is being deleted every 10 days by a systemd service, "systemd-tmpfiles-setup.service". That service reads the configuration file /usr/lib/tmpfiles.d/tmp.conf. What we did was add a drop-in file at /etc/tmpfiles.d/tmp.conf to create the directory /tmp/mmfs and exclude it from deletion going forward. Here's our actual file and some commentary on what the options mean: # cat /etc/tmpfiles.d/tmp.conf # Create a /tmp/mmfs directory d /tmp/mmfs 0755 root root 1s <-------- the "d" creates the directory x /tmp/mmfs/* <-------- the "x" says to ignore it (exclude it from cleanup) That change stopped /tmp/mmfs from being deleted every 10 days. In addition I think we also ran a systemctl daemon-reload (I don't have it in my notes, but it wouldn't hurt to run it). Hope this helps, Lior On Mon, Nov 8, 2021 at 7:00 AM wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. /tmp/mmfs vanishes randomly? (Billich Heinrich Rainer (ID SD)) > 2. Re: /tmp/mmfs vanishes randomly? (Olaf Weiser) > 3. Re: /tmp/mmfs vanishes randomly? (Jonathan Buzzard) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 8 Nov 2021 09:20:24 +0000 > From: "Billich Heinrich Rainer (ID SD)" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D at id.ethz.ch> > Content-Type: text/plain; charset="utf-8" > > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I notice > that this directory randomly vanishes. Mmhealth does not complain but just > notes that it will no longer monitor the directory. Still I doubt that > trace collection and similar will create the directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, it's no > cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and > 6.0.2.2. > > Thank you, > > Mmhealth message: > local_fs_path_not_found INFO The configured dataStructureDump path > /tmp/mmfs does not exists. Skipping monitoring.
> > Kind regards, > > Heiner > --- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > > ------------------------------ > > Message: 2 > Date: Mon, 8 Nov 2021 09:53:04 +0000 > From: "Olaf Weiser" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: < > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20211108_1d32c09e_attachment-2D0001.html&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=zpe2MuRXotkV_yDkY-UQSIE68CEBIWsRoj4Qya85nJU&e= > > > > ------------------------------ > > Message: 3 > Date: Mon, 8 Nov 2021 09:54:18 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > > > Hello, > > > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > > notice that this directory randomly vanishes. Mmhealth does not > > complain but just notes that it will no longer monitor the directory. > > Still I doubt that trace collection and similar will create the > > directory when needed? > > > > Do you know of any spectrum scale internal mechanism that could cause > > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > > installation, too. It happens just on one or two nodes at a time, > > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > > 6.0.2.2 and 6.0.2.2. > > > > I know several Linux distributions clear the contents of /tmp at boot > time. Could that explain it? > > I would say using /tmp like you are doing is not a sensible idea anyway > and that you should be using something under /var. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > > > End of gpfsug-discuss Digest, Vol 118, Issue 4 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.r.sudbery at bham.ac.uk Tue Nov 9 16:55:36 2021 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Tue, 9 Nov 2021 16:55:36 +0000 Subject: [gpfsug-discuss] gplbin package filename changed in 5.1.2.0? Message-ID: mmbuildgpl in 5.1.2.0 has build me a package with the filename: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm Before it would have been: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64.rpm The RPM package name itself still appears to be gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64. Is this expected? Is this a permanent change? 
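For anyone wanting to double-check the same thing on their own build, the metadata recorded inside the rpm can be queried independently of its filename; a minimal sketch (assuming the freshly built package sits in the current directory, using the example filename from above):

# print the package NAME stored in the rpm header, which is what rpm/yum go by, regardless of the file name on disk
rpm -qp --queryformat '%{NAME}\n' gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm

If that still prints gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64, then only the artifact filename changed and anything that installs or queries the package by name should be unaffected.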
Just wondering whether to re-tool some of our existing build/install infrastructure or just create a symlink for this one... Many thanks, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday. -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Wed Nov 10 10:28:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 10 Nov 2021 10:28:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, have you ever received any reply to this or managed to solve it? We are seeing exactly the same error and it's filling up our logs. It seems all the monitoring data is still extracted, so I'm not sure when it started so not sure if this is related to any upgrade on our side, but it may have been going on for a while. We only noticed because the log file now is filling up the local log partition. Kind regards, Frederik On 26/08/2021 11:49, Ragho Mahalingam wrote: > We've been working on setting up mmperfmon; after creating a new > configuration with the new collector on the same manager node, mmsysmon > keeps throwing exceptions. > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > 123, in _getDataFromZimonSocket > sock.connect(SOCKET_PATH) > FileNotFoundError: [Errno 2] No such file or directory > > Tracing this a bit, it appears that SOCKET_PATH is > /var/run/perfmon/pmcollector.socket and this unix domain socket is absent, > even though pmcollector has started and is running successfully. > > Under what scenarios is pmcollector supposed to create this socket? I > don't see any configuration for this in /opt/IBM/zimon/ZIMonCollector.cfg, > so I'm assuming the socket is automatically created when pmcollector starts. > > Any thoughts on how to debug and resolve this? > > Thanks, Ragu -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From ragho.mahalingam+spectrumscaleug at pathai.com Wed Nov 10 14:00:19 2021 From: ragho.mahalingam+spectrumscaleug at pathai.com (Ragho Mahalingam) Date: Wed, 10 Nov 2021 09:00:19 -0500 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Frederick, In our case the issue started appearing after upgrading from 5.0.4 to 5.1.1. If you've recently upgraded, then the following may be useful. Turns out that mmsysmon (gpfs-base package) requires the new gpfs.gss.pmcollector (from zimon packages) to function correctly (the AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In our case, we'd upgraded all the mandatory packages but had not upgraded the optional ones; the mmsysmonc python libs appears to be updated by the pmcollector package from my study. If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* packages installed. If gpfs.gss.pmcollector isn't installed, you'd definitely need that to make this runaway logging stop. Hope that helps! Ragu On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < frederik.ferner at diamond.ac.uk> wrote: > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We are > seeing exactly the same error and it's filling up our logs. It seems all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, mmsysmon > > keeps throwing exceptions. > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > 123, in _getDataFromZimonSocket > > sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket? I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > SciComp Help Desk can be reached on x8596 > > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright and or > privileged material, and are for the use of the intended addressee only. If > you are not the intended addressee or an authorised recipient of the > addressee please notify us of receipt by returning the e-mail and do not > use, copy, retain, distribute or disclose the information in or attached to > the e-mail. 
> Any opinions expressed within this e-mail are those of the individual and > not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for any > damage which you may sustain as a result of software viruses which may be > transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England > and Wales with its registered office at Diamond House, Harwell Science and > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Disclaimer: This email and any corresponding attachments may contain confidential information. If you're not the intended recipient, any copying, distribution, disclosure, or use of any information contained in the email or its attachments is strictly prohibited. If you believe to have received this email in error, please email security at pathai.com immediately, then destroy the email and any attachments without reading or saving.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Nov 10 14:14:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 10 Nov 2021 14:14:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?mmsysmon_exception_with_pmcollector_so?= =?utf-8?q?cket=09being_absent?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Nov 11 13:38:56 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:38:56 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, many thanks for the response. That was indeed the problem. We missed it when we upgraded a while ago and because our normal monitoring continued to work, we didn't notice until now. Kind regards, Frederik On 10/11/2021 09:00, Ragho Mahalingam wrote: > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1. If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In > our case, we'd upgraded all the mandatory packages but had not upgraded the > optional ones; the mmsysmonc python libs appears to be updated by the > pmcollector package from my study. > > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed. If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > > Hope that helps! > > Ragu > > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < > frederik.ferner at diamond.ac.uk> wrote: > > > Hi Ragu, > > > > have you ever received any reply to this or managed to solve it? We are > > seeing exactly the same error and it's filling up our logs. It seems all > > the monitoring data is still extracted, so I'm not sure when it > > started so not sure if this is related to any upgrade on our side, but > > it may have been going on for a while. We only noticed because the log > > file now is filling up the local log partition. 
> > > > Kind regards, > > Frederik > > > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > > We've been working on setting up mmperfmon; after creating a new > > > configuration with the new collector on the same manager node, mmsysmon > > > keeps throwing exceptions. > > > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > > 123, in _getDataFromZimonSocket > > > sock.connect(SOCKET_PATH) > > > FileNotFoundError: [Errno 2] No such file or directory > > > > > > Tracing this a bit, it appears that SOCKET_PATH is > > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > > absent, > > > even though pmcollector has started and is running successfully. > > > > > > Under what scenarios is pmcollector supposed to create this socket? I > > > don't see any configuration for this in > > /opt/IBM/zimon/ZIMonCollector.cfg, > > > so I'm assuming the socket is automatically created when pmcollector > > starts. > > > > > > Any thoughts on how to debug and resolve this? > > > > > > Thanks, Ragu > > > > -- > > Frederik Ferner (he/him) > > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > > > SciComp Help Desk can be reached on x8596 > > > > > > (Apologies in advance for the lines below. Some bits are a legal > > requirement and I have no control over them.) > > > > -- > > This e-mail and any attachments may contain confidential, copyright and or > > privileged material, and are for the use of the intended addressee only. If > > you are not the intended addressee or an authorised recipient of the > > addressee please notify us of receipt by returning the e-mail and do not > > use, copy, retain, distribute or disclose the information in or attached to > > the e-mail. > > Any opinions expressed within this e-mail are those of the individual and > > not necessarily of Diamond Light Source Ltd. > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > > attachments are free from viruses and we cannot accept liability for any > > damage which you may sustain as a result of software viruses which may be > > transmitted in or with the message. > > Diamond Light Source Limited (company no. 4375679). Registered in England > > and Wales with its registered office at Diamond House, Harwell Science and > > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > *Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained in > the email or its attachments is strictly prohibited. If you believe to have > received this email in error, please email security at pathai.com > immediately, then destroy the email and any > attachments without reading or saving.* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) 
-- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From frederik.ferner at diamond.ac.uk Thu Nov 11 13:45:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:45:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket?being absent In-Reply-To: References: Message-ID: Hi Fred, we haven't used the deployement tool anywhere so far, we always apply/upgrade the RPMs directly. (Centrally managed via CFengine, promising that certain Spectrum Scale RPMs are installed. I haven't yet checked how the gpfs.gss.pmcollector RPM were installed initially as they weren't in our list of promised packages, which is why the upgrade was missed.) Kind regards, Frederik On 10/11/2021 14:14, Frederick Stock wrote: > I am curious to know if you upgraded by manually applying rpms or if you > used the Spectrum Scale deployment tool (spectrumscale command) to apply > the upgrade? > Fred > _______________________________________________________ > Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 > stockf at us.ibm.com > ? > ? > > ----- Original message ----- > From: "Ragho Mahalingam" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmsysmon exception with > pmcollector socket being absent > Date: Wed, Nov 10, 2021 9:00 AM > ? > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1.? If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1).? > In our case, we'd upgraded all the mandatory packages but had > not?upgraded the optional ones; the mmsysmonc?python libs appears to be > updated by the pmcollector package from my study. > ? > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed.? If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > ? > Hope that helps! > ? > Ragu > ? > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner > <[1]frederik.ferner at diamond.ac.uk> wrote: > > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We > are > seeing exactly the same error and it's filling up our logs. It seems > all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. 
We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, > mmsysmon > > keeps throwing exceptions. > > > >? ?File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", > line > > 123, in _getDataFromZimonSocket > >? ? ?sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > >? /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket?? > I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 > 8624 > Diamond Light Source Ltd.? ? ? ? ? ? ? ? ? ? ? ?mob:? ?+44 7917 08 > 5110 > > SciComp Help Desk can be reached on x8596 > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright > and or privileged material, and are for the use of the intended > addressee only. If you are not the intended addressee or an authorised > recipient of the addressee please notify us of receipt by returning > the e-mail and do not use, copy, retain, distribute or disclose the > information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual > and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for > any damage which you may sustain as a result of software viruses which > may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in > England and Wales with its registered office at Diamond House, Harwell > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United > Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at [2]spectrumscale.org > [3]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained > in the email or its attachments is strictly prohibited. If you believe > to have received this email in error, please email > [4]security at pathai.com immediately, then destroy the email and any > attachments without reading or saving. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > References > > Visible links > 1. mailto:frederik.ferner at diamond.ac.uk > 2. http://spectrumscale.org/ > 3. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 4. mailto:security at pathai.com > 5. 
http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From pinkesh.valdria at oracle.com Fri Nov 12 07:57:14 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 07:57:14 +0000 Subject: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Message-ID: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS S3 compatible) and it's failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out that it fails because it doesn't like the equals sign "=" in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quotes and double quotes around the secret key, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx "clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=" I also tried to add the key to the keyfile and it still fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause.
[root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 12 11:54:38 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 17:24:38 +0530 Subject: [gpfsug-discuss] =?utf-8?q?AFM_with_Object_Storage_-_fails_with_i?= =?utf-8?q?nvalid_skey=09=28secret_key=29?= In-Reply-To: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinkesh.valdria at oracle.com Fri Nov 12 12:26:44 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 12:26:44 +0000 Subject: [gpfsug-discuss] [External] : Re: AFM with Object Storage - fails with invalid skey (secret key) In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS ________________________________ From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? 
USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Nov 12 12:50:48 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 18:20:48 +0530 Subject: [gpfsug-discuss] =?utf-8?q?=3A_Re=3A___AFM_with_Object_Storage_-_?= =?utf-8?q?fails_with_invalid_skey=09=28secret_key=29?= In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi Pinkesh, You could open a ticket to get the efix. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "Venkateswara R Puvvada" , "gpfsug main discussion list" Date: 11/12/2021 05:57 PM Subject: Re: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. 
mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx "clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=" I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect - HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) - USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Nov 15 18:44:04 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 15 Nov 2021 18:44:04 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Any idea why pmcollector fails to start via the service? If I start it manually, it runs just fine. Scale 5.1.1.4. This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon "service pmcollector start" fails: Redirecting to /bin/systemctl status pmcollector.service ● pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Mon Nov 15 21:31:03 2021 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Mon, 15 Nov 2021 21:31:03 +0000 Subject: [gpfsug-discuss] [External] Pmcollector fails to start In-Reply-To: References: Message-ID: Hi, I've been experiencing this "start request repeated too quickly" issue, but IIRC for the pmsensors service instead, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node for which the service is failing.
Your issue might point at something similar but on the other end of the spectrum (sic). In this case the issue is usually resolved by deleting/recreating the performance monitoring configuration for the whole cluster: mmchnode --noperfmon -N all # required before deleting the perfmon config mmperfmon config delete --all mmperfmon config generate --collectors # start the pmcollector service on the GUI nodes mmchnode --perfmon -N all # start the pmsensors service on all nodes It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies. HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Monday, November 15, 2021 19:44 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] Pmcollector fails to start Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This worksfrom the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon ?service pmcollector start? - fails: Redirecting to /bin/systemctl status pmcollector.service ? pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Nov 16 16:44:21 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 16 Nov 2021 16:44:21 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... 
it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. 
Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Nov 18 09:09:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 18 Nov 2021 17:09:25 +0800 Subject: [gpfsug-discuss] possible to rename a snapshot? In-Reply-To: <1825700-1636060653.986878@yfV0.OUFD.5EUE> References: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Message-ID: Mark, GPFS does not support to rename an existing snapshot. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: mark.bergman at uphs.upenn.edu To: "gpfsug main discussion list" Date: 2021/11/05 05:33 AM Subject: [EXTERNAL] [gpfsug-discuss] possible to rename a snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From HAUBRICH at de.ibm.com Thu Nov 18 13:01:39 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Thu, 18 Nov 2021 15:01:39 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
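A few quick checks along the lines Manfred describes can narrow a status=203/EXEC failure down before reinstalling anything. This is only a sketch; the paths and the gpfs.gss.pmcollector package name are taken from this thread, everything else is an assumption about a typical 5.1.1 install:

    id scalepm                                # does the 5.1.1 service account still exist?
    rpm -q --scripts gpfs.gss.pmcollector     # review the chown/chmod the package scripts perform
    ls -ld /opt/IBM/zimon /var/run/perfmon    # ownership and permissions along the collector's paths
    journalctl -u pmcollector --since today   # systemd's view of why ExecStart failed

If /opt/IBM/zimon has been relocated or symlinked elsewhere, the target directory needs the same scalepm ownership as the packaged path.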
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Thu Nov 18 13:53:47 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Nov 2021 13:53:47 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start In-Reply-To: References: Message-ID: That was indeed the issue! We?ve linked /opt/IBM/zimon to another directory due to database size. chown?ing that to scalepm.scalepm fixed it. Now, creating a user ?scalepm? on the sly and not telling me ? not good! Bob Oesterlin Sr Principal Storage Engineer Nuance Communications From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Manfred Haubrich Date: Thursday, November 18, 2021 at 7:01 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Pmcollector fails to start CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development ________________________________ Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 49 bytes Desc: ecblank.gif URL: From HAUBRICH at de.ibm.com Fri Nov 19 09:00:49 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Fri, 19 Nov 2021 11:00:49 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Sorry for that difficulty, but the new user for the performance monitoring tool was mentioned in the 5.1.1 summary of changes https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=summary-changes Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From PSAFRE at de.ibm.com Fri Nov 19 13:49:11 2021 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Fri, 19 Nov 2021 15:49:11 +0200 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? 
In-Reply-To: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Message-ID: Hello Heiner, just a heads up for you and the other storage admins, regularly cleaning up /tmp, regarding one aspect to keep in mind: - If you are using Spectrum Scale software call home (mmcallhome), it would be using the directory ${dataStructureDump}/callhome to save the copies of the uploaded data. This would be /tmp/mmfs/callhome/ in your case, which you would be automatically regularly removing. - These copies are used by one of the features of call home: "mmcallhome status diff" - This feature allows to see an overview of the Spectrum Scale configuration changes, that occurred between 2 different points in time. - This effectively allows to quickly find out if any config changes occurred prior to an outage, thereby helping to find the root cause of self-caused problems in the Scale cluster. - It was added in Scale 5.0.5.0 See IBM KC for more details: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=cch-use-cases-detecting-system-changes-by-using-mmcallhome-command - As a source of the "config snapshots", mmcallhome status diff is using the DC packages inside of ${dataStructureDump}/callhome, which you would be regularly deleting, thereby hugely reducing the usability of this particular feature. - Of course, software call home automatically makes sure, it will not use too much space in dataStructureDump and it automatically removes the oldest entries, keeping at most 2GB or 300 files inside (default values, configurable). Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Wilhelm-Fay-Stra?e 32 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: "gpfsug main discussion list" Date: 16.11.2021 17:44 Subject: [EXTERNAL] Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From novosirj at rutgers.edu Fri Nov 19 16:46:34 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 19 Nov 2021 16:46:34 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: <9A96D22E-7744-4E42-A0AD-6DDD06397E24@rutgers.edu> Has any progress been made here at all? I have the same problem as the user who opened this thread. I run xCAT on the server where I want to run the GUI. I?ve attempted to limit the xCAT IP addresses (changing httpd.conf and ssl.conf), but as you note, the UPDATE_IPTABLES setting causes this not to work right, as the GUI wants all interfaces. I could turn that off, but it?s not clear to me what rules I?d need to manually create. What I /really/ would like to do is limit the GPFS GUI to a single interface. I guess the only issue with that would be that maybe the remote machines/performance monitors might contact the machine on its main IP with data. Modifying the ports as I described elsewhere in the thread did work pretty well, but there were some lingering GUI update problems and lots of connections on 443 to "/scalemgmt/v2/info? and ?/CommonEventServlet" that I never was able to track down). Now, I?ve tried disabling xCAT?s httpd server, reinstalled the gpfs.gui RPM, and started the GUI and it doesn?t seem to have gotten any better, so maybe this wasn?t a real problem and I?ll go back to modifying the ports, but I?d really like to do this ?the right way? without having to provide another machine in order to do it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 23, 2018, at 7:50 AM, Markus Rohwedder wrote: > > Hello Juri, Keith, > > thank you for your responses. > > The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. > > Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. > However we have not set up, tested nor documented such a configuration as of today. > > Currently the GUI service manages the iptables redirect bring up and tear down. > If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. > > In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to > - not check for already used ports when the GUI service tries to start up > - don't create/destroy port forwarding rules during GUI service start and stop. > This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. > > The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. > > See this KC link: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm > > Mit freundlichen Gr??en / Kind regards > > Dr. 
Markus Rohwedder > > Spectrum Scale GUI Development > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > "Daniel Kidger" ---23.08.2018 12:13:36---Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case > > From: "Daniel Kidger" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 23.08.2018 12:13 > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Keith, > > I have another IBM customer who also wished to move Scale GUI's https ports. > In their case because they had their own web based management interface on the same https port. > Is this the same reason that you have? > If so I wonder how many other sites have the same issue? > > One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). > Then run the two different GUIs, one per IP address. > Is this an option, albeit a little ugly? > Daniel > > <17310450.gif> Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > ----- Original message ----- > From: "Markus Rohwedder" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Date: Thu, Aug 23, 2018 9:51 AM > Hello Keith, > > it is not so easy. > > The GUI receives events from other scale components using the currently defined ports. > Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). > Therefore at this point there is no procedure to change this behaviour across all components. > > Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. > Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. > The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. > If these ports are already used by another service, the GUI will not start up. > > Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. > If you want to emphasize your case as future development item, please let me know. > > I would also be interested in: > > Scale version you are running > > Do you need port 80 or 443 as well? > > Would it work for you if the xCAT service was bound to a single IP address? > > Mit freundlichen Gr??en / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? > > From: Keith Ball > To: gpfsug-discuss at spectrumscale.org > Date: 22.08.2018 21:33 > Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello All, > > Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. 
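For the single-interface case discussed above, the redirects the GUI normally installs itself can in principle be created by hand once UPDATE_IPTABLES is switched off. This is a rough sketch only, with an example address standing in for the GUI interface and the 47080/47443 ports taken from Markus' description; it is not a documented or tested procedure:

    # forward the privileged ports to the GUI's unprivileged listeners, but only for one local IP
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 443 -j REDIRECT --to-ports 47443
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 80  -j REDIRECT --to-ports 47080
    # repeat in the OUTPUT chain if connections from the node itself should be redirected as well
    iptables -t nat -A OUTPUT -d 192.0.2.10 -p tcp --dport 443 -j REDIRECT --to-ports 47443

Rules like these are not persistent across reboots and would have to be restored by something like the iptables-services package or a small systemd unit.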
The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: > > > > > > but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Spectrum Scale GUI configuration if I can. > > Many Thanks, > Keith > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Tue Nov 23 17:59:12 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 23 Nov 2021 17:59:12 +0000 Subject: [gpfsug-discuss] AFM does too small NFS writes, and I don't see parallel writes Message-ID: Hello, We currently move data to a new AFM fileset and I see poor performance and ask for advice and insight: The migration to afm home seems slow. I note: Afm writes a whole file of ~100MB in much too many small chunks. My assumption: The many small writes reduce performance as we have 100km between the sites and a higher latency. The writes are not fully sequential, but they aren't done heavily in parallel, either (like 10-100 outstanding writes at each time). In the afm queue I see 8100214 Write [563636091.563636091] inflight (0 @ 0) chunks 2938 bytes 170872410 vIdx 1 thread_id 67862 I guess this means afm will write 170,872,410 bytes in 2,938 chunks resulting in an average write size of 58k to inode 563636091. So if I'm right my question is: What can I change to make afm write fewer and larger chunks per file? Does it depend on how we copy data? We write through ganesha/nfs, hence even if we write sequentially ganesha may still do it differently? Another question: is there a way to dump the afm in-memory queue for a fileset? That would make it easier to see what's going on when we do changes. I could grep for the inode of a testfile ... We don't do parallel writes across afm gateways, the files are too small, our limit is 1GB. We configured two mounts from two ces servers at home for each fileset. Hence AFM could do writes in parallel to both mounts on the single gateway? A short tcpdump suggests: afm writes to a single ces server only and writes to a single inode at a time. But at each time a few writes (2-5) may overlap. Kind regards, Heiner Just to illustrate - what I see on the afm gateway - too many reads and writes. There are almost no open/close hence it's all to the same few files ------------nfs3-client------------ --------gpfs-file-operations------- --gpfs-i/o- -net/total- read? writ? rdir? inod?? fs?? cmmt| open? clos? read? writ? rdir? inod| read write| recv? send ?? 0? 1295???? 0???? 0???? 0???? 0 |?? 0???? 0? 1294???? 0???? 0???? 0 |89.8M??? 0 | 451k?? 94M ?? 0? 1248???? 0???? 0???? 0???? 0 |?? 0???? 0?
1248???? 0???? 0???? 8 |86.2M??? 0 | 432k?? 91M ?? 0? 1394???? 0???? 0???? 0???? 0 |?? 0???? 0? 1394???? 0???? 0???? 0 |96.8M??? 0 | 498k? 101M ?? 0? 1583???? 0???? 0???? 0???? 0 |?? 0???? 0? 1582???? 0???? 0???? 1 | 110M??? 0 | 560k? 115M ?? 0? 1543???? 0???? 1???? 0??? ?0 |?? 0???? 0? 1544???? 0???? 0???? 0 | 107M??? 0 | 540k? 112M -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5254 bytes Desc: not available URL: From scl at virginia.edu Tue Nov 30 12:47:46 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Tue, 30 Nov 2021 12:47:46 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop Message-ID: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Hi folks, Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt -rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp -rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt -rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc -rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp The core.*.dmp files are cores from the java command. And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log. Any suggestions? Thanks for any help. 2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped 2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572 2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694' 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed 2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED 2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN 2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True. 
New state=4 2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83 2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693'] 2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed 2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695' 2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors 2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors 2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed 2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694'] 2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119 2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed 2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696' 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors 2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695'] 2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119 2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed 2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version 
received, start new collectors update cycle 2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697' 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors 2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon 2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor 2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16 2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696'] 2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119 2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed 2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify And then gpfsgui apparently crashes and systemd automatically restarts it. Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From luis.bolinches at fi.ibm.com Tue Nov 30 13:30:06 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 30 Nov 2021 13:30:06 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> References: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Nov 30 13:34:17 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Nov 2021 13:34:17 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: References: , <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Mon Nov 1 14:50:54 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Nov 2021 14:50:54 +0000 Subject: [gpfsug-discuss] SSUG UK User Group Message-ID: Hi All, I?m planning to take a step-back from running the Spectrum Scale user group in the UK later this year/early next year and this means we need someone (or people) to step up to run the user group in the UK. I took over running the user group in 2015 and a lot has changed since then ? the group got bigger, we moved to multi-day sessions, a pandemic struck and we moved online ? now as things are maybe returning to normal, I think it is time for someone else to take leadership of the group in the UK and work out how to take it forwards. If you are interested in taking up running the group in the UK, please drop me an email, or DM on Slack and let me know. 
It doesn?t necessarily need to be one person running the group, and having several would help with some of the logistics of running the events. To be truly independent, which we have always tried to be, I?ve always thought that the person/people running the group should come from the end-user community? I?ll likely still be around at events, and happy to provide organisational support if needed ? but I don?t really have the time needed for the group at the moment. Hopefully there?s someone interested in taking the group forwards in the future ? Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Tue Nov 2 14:02:10 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Nov 2021 14:02:10 +0000 Subject: [gpfsug-discuss] Upcoming Events Message-ID: Hi All, We thought it would be a good time to send an update on some upcoming events. We have three events coming up over November/December TWO of which are in person! IBM User?s Group meeting ? SC21 (15th November 2021, IN PERSON) IBM Spectrum Scale Development and Product Management team will be attending Super Computing 2021 in person. We will be hosting our yearly gathering on Monday, November 15, from 3:00-5:00 PM. This global user meeting provides an opportunity for peer-to-peer learning and interaction with IBM?s technical leadership team on the latest IBM Spectrum Scale roadmaps, latest features, ecosystem, and applications for AI. See: https://www.spectrumscaleug.org/event/sc21-users-group-meeting/ Register at: https://www.ibm.com/events/event/pages/ibm/nz48hgmb/1581037797007001PJAd.html SSUG::Digital (1st, 2nd December 2021, VIRTUAL) For the Spectrum Scale Users who will not be able to attend user meeting at Super Computing in St Louis, or SSUG at CIUK, we plan to host Digital user meeting on Dec 1 & Dec 2 from 10am - 12pm EDT (3pm-5pm GMT). In the Digital user meeting, we will cover some of the contents covered at St Louis and additional expert talks from our development team and partners. See: https://www.spectrumscaleug.org/event/digital-user-group-dec-2021/ Joining link: To be confirmed SSUG @CIUK 2021 (10th December 2021, IN PERSON) This year we will be returning to our traditional user group home of CIUK and will be running a break-out session on the Friday of CIUK (10:00 ? 12:00). We?re currently lining up a few speakers for the event, but if you are attending CIUK in Manchester this year and are interested in speaking, please let me know ? we have a few speaker slots available for user talks. I?m sure it has been soooo long since anyone has had the opportunity to speak, that I?ll be inundated with user talks ? ? See: https://www.spectrumscaleug.org/event/ssug-ciuk-2021/ As usual with the CIUK meeting, you must be a registered attendee of CIUK to attend this user group. CIUK Registration: https://www.scd.stfc.ac.uk/Pages/CIUK2021.aspx Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Thu Nov 4 21:17:33 2021 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 04 Nov 2021 17:17:33 -0400 Subject: [gpfsug-discuss] possible to rename a snapshot? Message-ID: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? 
Thanks, Mark From heinrich.billich at id.ethz.ch Mon Nov 8 09:20:24 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 8 Nov 2021 09:20:24 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From olaf.weiser at de.ibm.com Mon Nov 8 09:53:04 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 8 Nov 2021 09:53:04 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Nov 8 09:54:18 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Nov 2021 09:54:18 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > notice that this directory randomly vanishes. Mmhealth does not > complain but just notes that it will no longer monitor the directory. > Still I doubt that trace collection and similar will create the > directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > 6.0.2.2 and 6.0.2.2. > I know several Linux distributions clear the contents of /tmp at boot time. Could that explain it? I would say using /tmp like you are doing is not a sensible idea anyway and that you should be using something under /var. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From lior at nyu.edu Mon Nov 8 14:38:35 2021 From: lior at nyu.edu (Lior Atar) Date: Mon, 8 Nov 2021 09:38:35 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: Hello all, /tmp/mmfs is being deleted every 10 days by a systemd service " systemd-tmpfiles-setup.service ". That service calls a configuration file " /usr/lib/tmpfiles.d/tmp.conf . What we did was add a drop in file in /etc/tmpfiles.d/tmp.conf to then create the directory /tmp/mmfs and then exclude deleting going forward. 
Here's our actual file and some commentary of what the options mean: # cat /etc/tmpfiles.d/tmp.conf # Create a /tmp/mmfs directory d /tmp/mmfs 0755 root root 1s <-------- the " d " is to create directory x /tmp/mmfs/* <-------- the " x " says to ignore it That change helped us avoid /tmp/mmfs from being deleted every 10 days. In addition I think also did a %systemctl daemon-reload ( but I don't have it in my notes, wouldn't hurt to run it ) Hope this helps, Lior On Mon, Nov 8, 2021 at 7:00 AM wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. /tmp/mmfs vanishes randomly? (Billich Heinrich Rainer (ID SD)) > 2. Re: /tmp/mmfs vanishes randomly? (Olaf Weiser) > 3. Re: /tmp/mmfs vanishes randomly? (Jonathan Buzzard) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 8 Nov 2021 09:20:24 +0000 > From: "Billich Heinrich Rainer (ID SD)" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D at id.ethz.ch> > Content-Type: text/plain; charset="utf-8" > > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I notice > that this directory randomly vanishes. Mmhealth does not complain but just > notes that it will no longer monitor the directory. Still I doubt that > trace collection and similar will create the directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, it's no > cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and > 6.0.2.2. > > Thank you, > > Mmhealth message: > local_fs_path_not_found INFO The configured dataStructureDump path > /tmp/mmfs does not exists. Skipping monitoring. > > Kind regards, > > Heiner > --- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > > ------------------------------ > > Message: 2 > Date: Mon, 8 Nov 2021 09:53:04 +0000 > From: "Olaf Weiser" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: < > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20211108_1d32c09e_attachment-2D0001.html&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=zpe2MuRXotkV_yDkY-UQSIE68CEBIWsRoj4Qya85nJU&e= > > > > ------------------------------ > > Message: 3 > Date: Mon, 8 Nov 2021 09:54:18 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > > > Hello, > > > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > > notice that this directory randomly vanishes. Mmhealth does not > > complain but just notes that it will no longer monitor the directory. > > Still I doubt that trace collection and similar will create the > > directory when needed? > > > > Do you know of any spectrum scale internal mechanism that could cause > > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > > installation, too. It happens just on one or two nodes at a time, > > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > > 6.0.2.2 and 6.0.2.2. > > > > I know several Linux distributions clear the contents of /tmp at boot > time. Could that explain it? > > I would say using /tmp like you are doing is not a sensible idea anyway > and that you should be using something under /var. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > > > End of gpfsug-discuss Digest, Vol 118, Issue 4 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.r.sudbery at bham.ac.uk Tue Nov 9 16:55:36 2021 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Tue, 9 Nov 2021 16:55:36 +0000 Subject: [gpfsug-discuss] gplbin package filename changed in 5.1.2.0? Message-ID: mmbuildgpl in 5.1.2.0 has build me a package with the filename: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm Before it would have been: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64.rpm The RPM package name itself still appears to be gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64. Is this expected? Is this a permanent change? Just wondering whether to re-tool some of our existing build/install infrastructure or just create a symlink for this one... Many thanks, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday. -------------- next part -------------- An HTML attachment was scrubbed... 
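For questions like Luke's, the package name embedded in the rpm (which repositories and install tooling key on) can be read straight from the file, independent of the filename mmbuildgpl chose. A small sketch, reusing the filename quoted above:

    rpm -qp --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm

If only the filename changed, scripts and repos that address the package by name should keep working; anything that globs on the old filename pattern would need the symlink Luke mentions or an updated glob.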
URL: From frederik.ferner at diamond.ac.uk Wed Nov 10 10:28:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 10 Nov 2021 10:28:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, have you ever received any reply to this or managed to solve it? We are seeing exactly the same error and it's filling up our logs. It seems all the monitoring data is still extracted, so I'm not sure when it started so not sure if this is related to any upgrade on our side, but it may have been going on for a while. We only noticed because the log file now is filling up the local log partition. Kind regards, Frederik On 26/08/2021 11:49, Ragho Mahalingam wrote: > We've been working on setting up mmperfmon; after creating a new > configuration with the new collector on the same manager node, mmsysmon > keeps throwing exceptions. > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > 123, in _getDataFromZimonSocket > sock.connect(SOCKET_PATH) > FileNotFoundError: [Errno 2] No such file or directory > > Tracing this a bit, it appears that SOCKET_PATH is > /var/run/perfmon/pmcollector.socket and this unix domain socket is absent, > even though pmcollector has started and is running successfully. > > Under what scenarios is pmcollector supposed to create this socket? I > don't see any configuration for this in /opt/IBM/zimon/ZIMonCollector.cfg, > so I'm assuming the socket is automatically created when pmcollector starts. > > Any thoughts on how to debug and resolve this? > > Thanks, Ragu -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From ragho.mahalingam+spectrumscaleug at pathai.com Wed Nov 10 14:00:19 2021 From: ragho.mahalingam+spectrumscaleug at pathai.com (Ragho Mahalingam) Date: Wed, 10 Nov 2021 09:00:19 -0500 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Frederick, In our case the issue started appearing after upgrading from 5.0.4 to 5.1.1. If you've recently upgraded, then the following may be useful. Turns out that mmsysmon (gpfs-base package) requires the new gpfs.gss.pmcollector (from zimon packages) to function correctly (the AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). 
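A quick way to check for the mismatch described here; only a sketch, and the package name patterns are an assumption based on the packages named in this thread:

    rpm -qa 'gpfs.gss.*' 'gpfs.base*'           # sensor/collector levels should match the base level
    ls -l /var/run/perfmon/pmcollector.socket   # the unix socket mmsysmon tries to connect to

If the socket is missing while pmcollector is running, the collector is very likely still a pre-5.1 package.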
In our case, we'd upgraded all the mandatory packages but had not upgraded the optional ones; the mmsysmonc python libs appears to be updated by the pmcollector package from my study. If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* packages installed. If gpfs.gss.pmcollector isn't installed, you'd definitely need that to make this runaway logging stop. Hope that helps! Ragu On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < frederik.ferner at diamond.ac.uk> wrote: > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We are > seeing exactly the same error and it's filling up our logs. It seems all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, mmsysmon > > keeps throwing exceptions. > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > 123, in _getDataFromZimonSocket > > sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket? I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > SciComp Help Desk can be reached on x8596 > > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright and or > privileged material, and are for the use of the intended addressee only. If > you are not the intended addressee or an authorised recipient of the > addressee please notify us of receipt by returning the e-mail and do not > use, copy, retain, distribute or disclose the information in or attached to > the e-mail. > Any opinions expressed within this e-mail are those of the individual and > not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for any > damage which you may sustain as a result of software viruses which may be > transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England > and Wales with its registered office at Diamond House, Harwell Science and > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Disclaimer: This email and any corresponding attachments may contain confidential information. 
If you're not the intended recipient, any copying, distribution, disclosure, or use of any information contained in the email or its attachments is strictly prohibited. If you believe to have received this email in error, please email security at pathai.com immediately, then destroy the email and any attachments without reading or saving.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Nov 10 14:14:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 10 Nov 2021 14:14:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?mmsysmon_exception_with_pmcollector_so?= =?utf-8?q?cket=09being_absent?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Nov 11 13:38:56 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:38:56 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, many thanks for the response. That was indeed the problem. We missed it when we upgraded a while ago and because our normal monitoring continued to work, we didn't notice until now. Kind regards, Frederik On 10/11/2021 09:00, Ragho Mahalingam wrote: > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1. If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In > our case, we'd upgraded all the mandatory packages but had not upgraded the > optional ones; the mmsysmonc python libs appears to be updated by the > pmcollector package from my study. > > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed. If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > > Hope that helps! > > Ragu > > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < > frederik.ferner at diamond.ac.uk> wrote: > > > Hi Ragu, > > > > have you ever received any reply to this or managed to solve it? We are > > seeing exactly the same error and it's filling up our logs. It seems all > > the monitoring data is still extracted, so I'm not sure when it > > started so not sure if this is related to any upgrade on our side, but > > it may have been going on for a while. We only noticed because the log > > file now is filling up the local log partition. > > > > Kind regards, > > Frederik > > > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > > We've been working on setting up mmperfmon; after creating a new > > > configuration with the new collector on the same manager node, mmsysmon > > > keeps throwing exceptions. > > > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > > 123, in _getDataFromZimonSocket > > > sock.connect(SOCKET_PATH) > > > FileNotFoundError: [Errno 2] No such file or directory > > > > > > Tracing this a bit, it appears that SOCKET_PATH is > > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > > absent, > > > even though pmcollector has started and is running successfully. > > > > > > Under what scenarios is pmcollector supposed to create this socket? 
I > > > don't see any configuration for this in > > /opt/IBM/zimon/ZIMonCollector.cfg, > > > so I'm assuming the socket is automatically created when pmcollector > > starts. > > > > > > Any thoughts on how to debug and resolve this? > > > > > > Thanks, Ragu > > > > -- > > Frederik Ferner (he/him) > > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > > > SciComp Help Desk can be reached on x8596 > > > > > > (Apologies in advance for the lines below. Some bits are a legal > > requirement and I have no control over them.) > > > > -- > > This e-mail and any attachments may contain confidential, copyright and or > > privileged material, and are for the use of the intended addressee only. If > > you are not the intended addressee or an authorised recipient of the > > addressee please notify us of receipt by returning the e-mail and do not > > use, copy, retain, distribute or disclose the information in or attached to > > the e-mail. > > Any opinions expressed within this e-mail are those of the individual and > > not necessarily of Diamond Light Source Ltd. > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > > attachments are free from viruses and we cannot accept liability for any > > damage which you may sustain as a result of software viruses which may be > > transmitted in or with the message. > > Diamond Light Source Limited (company no. 4375679). Registered in England > > and Wales with its registered office at Diamond House, Harwell Science and > > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > *Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained in > the email or its attachments is strictly prohibited. If you believe to have > received this email in error, please email security at pathai.com > immediately, then destroy the email and any > attachments without reading or saving.* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From frederik.ferner at diamond.ac.uk Thu Nov 11 13:45:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:45:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket?being absent In-Reply-To: References: Message-ID: Hi Fred, we haven't used the deployement tool anywhere so far, we always apply/upgrade the RPMs directly. (Centrally managed via CFengine, promising that certain Spectrum Scale RPMs are installed. I haven't yet checked how the gpfs.gss.pmcollector RPM were installed initially as they weren't in our list of promised packages, which is why the upgrade was missed.) Kind regards, Frederik On 10/11/2021 14:14, Frederick Stock wrote: > I am curious to know if you upgraded by manually applying rpms or if you > used the Spectrum Scale deployment tool (spectrumscale command) to apply > the upgrade? > Fred > _______________________________________________________ > Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 > stockf at us.ibm.com > ? > ? > > ----- Original message ----- > From: "Ragho Mahalingam" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmsysmon exception with > pmcollector socket being absent > Date: Wed, Nov 10, 2021 9:00 AM > ? > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1.? If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1).? > In our case, we'd upgraded all the mandatory packages but had > not?upgraded the optional ones; the mmsysmonc?python libs appears to be > updated by the pmcollector package from my study. > ? > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed.? If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > ? > Hope that helps! > ? > Ragu > ? > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner > <[1]frederik.ferner at diamond.ac.uk> wrote: > > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We > are > seeing exactly the same error and it's filling up our logs. It seems > all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, > mmsysmon > > keeps throwing exceptions. > > > >? ?File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", > line > > 123, in _getDataFromZimonSocket > >? ? ?sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > >? /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. 
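For anyone hitting the same symptom, a quick way to check whether the collector is actually serving the UNIX socket that mmsysmon expects; the socket path is taken from the traceback above, and the commands are standard Linux tooling rather than anything Scale-specific:

ls -l /var/run/perfmon/pmcollector.socket   # does the socket file exist at all?
ss -xlp | grep -i perfmon                   # is any process listening on it?
systemctl status pmcollector                # and is the collector service itself healthy?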
> > > > Under what scenarios is pmcollector supposed to create this socket?? > I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 > 8624 > Diamond Light Source Ltd.? ? ? ? ? ? ? ? ? ? ? ?mob:? ?+44 7917 08 > 5110 > > SciComp Help Desk can be reached on x8596 > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright > and or privileged material, and are for the use of the intended > addressee only. If you are not the intended addressee or an authorised > recipient of the addressee please notify us of receipt by returning > the e-mail and do not use, copy, retain, distribute or disclose the > information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual > and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for > any damage which you may sustain as a result of software viruses which > may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in > England and Wales with its registered office at Diamond House, Harwell > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United > Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at [2]spectrumscale.org > [3]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained > in the email or its attachments is strictly prohibited. If you believe > to have received this email in error, please email > [4]security at pathai.com immediately, then destroy the email and any > attachments without reading or saving. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > References > > Visible links > 1. mailto:frederik.ferner at diamond.ac.uk > 2. http://spectrumscale.org/ > 3. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 4. mailto:security at pathai.com > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From pinkesh.valdria at oracle.com Fri Nov 12 07:57:14 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 07:57:14 +0000 Subject: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Message-ID: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI -------------- next part -------------- An HTML attachment was scrubbed... 
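As a side note, one way to sanity-check the key pair outside of AFM is the standard AWS CLI against the same S3-compatible endpoint. This is a sketch only: the endpoint and region are copied from the commands above, while the bucket name afm-ocios and the credentials shown are placeholders:

export AWS_ACCESS_KEY_ID=22f79xxxx
export AWS_SECRET_ACCESS_KEY='clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg='
aws s3 ls s3://afm-ocios \
    --endpoint-url https://hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com \
    --region us-ashburn-1
# quoting the secret key keeps the shell from touching the trailing '=' padding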
URL: From vpuvvada at in.ibm.com Fri Nov 12 11:54:38 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 17:24:38 +0530 Subject: [gpfsug-discuss] =?utf-8?q?AFM_with_Object_Storage_-_fails_with_i?= =?utf-8?q?nvalid_skey=09=28secret_key=29?= In-Reply-To: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinkesh.valdria at oracle.com Fri Nov 12 12:26:44 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 12:26:44 +0000 Subject: [gpfsug-discuss] [External] : Re: AFM with Object Storage - fails with invalid skey (secret key) In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Thanks Venkat for quick response. 
Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS ________________________________ From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
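For reference, the acceptance test Venkat quoted can be reproduced as a standalone bash check, which shows that only the trailing '=' (the base64 padding) falls outside the allowed character class. Sketch only, with the key value copied from the example above:

KEY='clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg='
if [[ "$KEY" =~ ^[0-9a-zA-Z/+._]+$ ]]; then
    echo "key accepted"
else
    echo "key rejected"    # this branch is taken, purely because of the final '='
fi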
URL: From vpuvvada at in.ibm.com Fri Nov 12 12:50:48 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 18:20:48 +0530 Subject: [gpfsug-discuss] =?utf-8?q?=3A_Re=3A___AFM_with_Object_Storage_-_?= =?utf-8?q?fails_with_invalid_skey=09=28secret_key=29?= In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi Pinkesh, You could open a ticket to get the efix. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "Venkateswara R Puvvada" , "gpfsug main discussion list" Date: 11/12/2021 05:57 PM Subject: Re: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? 
I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Nov 15 18:44:04 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 15 Nov 2021 18:44:04 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This worksfrom the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon ?service pmcollector start? ? fails: Redirecting to /bin/systemctl status pmcollector.service ? pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Mon Nov 15 21:31:03 2021 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Mon, 15 Nov 2021 21:31:03 +0000 Subject: [gpfsug-discuss] [External] Pmcollector fails to start In-Reply-To: References: Message-ID: Hi, I?ve been experiencing this ?start request repeated too quickly? issue, but IIRC for the pmsensors service instead, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node for which the service is failing. Your issue might point at something similar but on the other end of the spectrum (sic). 
In this case the issue is usually resolved by deleting/recreating the performance monitoring configuration for the whole cluster: mmchnode --noperfmon -N all # required before deleting the perfmon config mmperfmon config delete --all mmperfmon config generate --collectors # start the pmcollector service on the GUI nodes mmchnode --perfmon -N all # start the pmsensors service on all nodes It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies. HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Monday, November 15, 2021 19:44 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] Pmcollector fails to start Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This worksfrom the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon ?service pmcollector start? - fails: Redirecting to /bin/systemctl status pmcollector.service ? pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Nov 16 16:44:21 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 16 Nov 2021 16:44:21 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... 
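Before rebuilding the whole perfmon configuration it may also be worth a round of plain systemd triage: status=203/EXEC generally means systemd could not exec() the configured binary at all (missing, not executable, or not permitted for the service user), which is independent of the Scale configuration. The commands below are standard systemd, not Scale-specific:

journalctl -u pmcollector -n 50 --no-pager   # the real error is usually logged above the start-limit message
systemctl cat pmcollector                    # check ExecStart and any User=/Group= settings in the unit
systemctl reset-failed pmcollector           # clear the start-limit counter
systemctl start pmcollector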
it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. 
Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Nov 18 09:09:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 18 Nov 2021 17:09:25 +0800 Subject: [gpfsug-discuss] possible to rename a snapshot? In-Reply-To: <1825700-1636060653.986878@yfV0.OUFD.5EUE> References: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Message-ID: Mark, GPFS does not support to rename an existing snapshot. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: mark.bergman at uphs.upenn.edu To: "gpfsug main discussion list" Date: 2021/11/05 05:33 AM Subject: [EXTERNAL] [gpfsug-discuss] possible to rename a snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From HAUBRICH at de.ibm.com Thu Nov 18 13:01:39 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Thu, 18 Nov 2021 15:01:39 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Thu Nov 18 13:53:47 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Nov 2021 13:53:47 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start In-Reply-To: References: Message-ID: That was indeed the issue! We?ve linked /opt/IBM/zimon to another directory due to database size. chown?ing that to scalepm.scalepm fixed it. Now, creating a user ?scalepm? on the sly and not telling me ? not good! Bob Oesterlin Sr Principal Storage Engineer Nuance Communications From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Manfred Haubrich Date: Thursday, November 18, 2021 at 7:01 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Pmcollector fails to start CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development ________________________________ Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 49 bytes Desc: ecblank.gif URL: From HAUBRICH at de.ibm.com Fri Nov 19 09:00:49 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Fri, 19 Nov 2021 11:00:49 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Sorry for that difficulty, but the new user for the performance monitoring tool was mentioned in the 5.1.1 summary of changes https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=summary-changes Mit freundlichen Gr??en / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From PSAFRE at de.ibm.com Fri Nov 19 13:49:11 2021 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Fri, 19 Nov 2021 15:49:11 +0200 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? 
In-Reply-To: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Message-ID: Hello Heiner, just a heads up for you and the other storage admins, regularly cleaning up /tmp, regarding one aspect to keep in mind: - If you are using Spectrum Scale software call home (mmcallhome), it would be using the directory ${dataStructureDump}/callhome to save the copies of the uploaded data. This would be /tmp/mmfs/callhome/ in your case, which you would be automatically regularly removing. - These copies are used by one of the features of call home: "mmcallhome status diff" - This feature allows to see an overview of the Spectrum Scale configuration changes, that occurred between 2 different points in time. - This effectively allows to quickly find out if any config changes occurred prior to an outage, thereby helping to find the root cause of self-caused problems in the Scale cluster. - It was added in Scale 5.0.5.0 See IBM KC for more details: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=cch-use-cases-detecting-system-changes-by-using-mmcallhome-command - As a source of the "config snapshots", mmcallhome status diff is using the DC packages inside of ${dataStructureDump}/callhome, which you would be regularly deleting, thereby hugely reducing the usability of this particular feature. - Of course, software call home automatically makes sure, it will not use too much space in dataStructureDump and it automatically removes the oldest entries, keeping at most 2GB or 300 files inside (default values, configurable). Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Wilhelm-Fay-Stra?e 32 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: "gpfsug main discussion list" Date: 16.11.2021 17:44 Subject: [EXTERNAL] Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn?t present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it ? but to clean it?s content . Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... it the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Gr??en / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Urspr?ngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From novosirj at rutgers.edu Fri Nov 19 16:46:34 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 19 Nov 2021 16:46:34 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: <9A96D22E-7744-4E42-A0AD-6DDD06397E24@rutgers.edu> Has any progress been made here at all? I have the same problem as the user who opened this thread. I run xCAT on the server where I want to run the GUI. I?ve attempted to limit the xCAT IP addresses (changing httpd.conf and ssl.conf), but as you note, the UPDATE_IPTABLES setting causes this not to work right, as the GUI wants all interfaces. I could turn that off, but it?s not clear to me what rules I?d need to manually create. What I /really/ would like to do is limit the GPFS GUI to a single interface. I guess the only issue with that would be that maybe the remote machines/performance monitors might contact the machine on its main IP with data. Modifying the ports as I described elsewhere in the thread did work pretty well, but there were some lingering GUI update problems and lots of connections on 443 to "/scalemgmt/v2/info? and ?/CommonEventServlet" that I never was able to track down). Now, I?ve tried disabling xCAT?s httpd server, reinstalled the gpfs.gui RPM, and started the GUI and it doesn?t seem to have gotten any better, so maybe this wasn?t a real problem and I?ll go back to modifying the ports, but I?d really like to do this ?the right way? without having to provide another machine in order to do it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Aug 23, 2018, at 7:50 AM, Markus Rohwedder wrote: > > Hello Juri, Keith, > > thank you for your responses. > > The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. > > Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. > However we have not set up, tested nor documented such a configuration as of today. > > Currently the GUI service manages the iptables redirect bring up and tear down. > If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. > > In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to > - not check for already used ports when the GUI service tries to start up > - don't create/destroy port forwarding rules during GUI service start and stop. > This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. > > The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. > > See this KC link: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm > > Mit freundlichen Gr??en / Kind regards > > Dr. 
Markus Rohwedder > > Spectrum Scale GUI Development > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > "Daniel Kidger" ---23.08.2018 12:13:36---Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case > > From: "Daniel Kidger" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 23.08.2018 12:13 > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Keith, > > I have another IBM customer who also wished to move Scale GUI's https ports. > In their case because they had their own web based management interface on the same https port. > Is this the same reason that you have? > If so I wonder how many other sites have the same issue? > > One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). > Then run the two different GUIs, one per IP address. > Is this an option, albeit a little ugly? > Daniel > > <17310450.gif> Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > ----- Original message ----- > From: "Markus Rohwedder" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Date: Thu, Aug 23, 2018 9:51 AM > Hello Keith, > > it is not so easy. > > The GUI receives events from other scale components using the currently defined ports. > Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). > Therefore at this point there is no procedure to change this behaviour across all components. > > Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. > Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. > The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. > If these ports are already used by another service, the GUI will not start up. > > Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. > If you want to emphasize your case as future development item, please let me know. > > I would also be interested in: > > Scale version you are running > > Do you need port 80 or 443 as well? > > Would it work for you if the xCAT service was bound to a single IP address? > > Mit freundlichen Gr??en / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? > > From: Keith Ball > To: gpfsug-discuss at spectrumscale.org > Date: 22.08.2018 21:33 > Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello All, > > Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. 
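Sketch only, not an IBM-documented procedure: with UPDATE_IPTABLES switched off, roughly equivalent NAT redirects limited to a single GUI address might look like the lines below. GUI_IP is a placeholder, and 47080/47443 are the native GUI ports named above; local clients on the node itself go through the OUTPUT chain rather than PREROUTING:

GUI_IP=192.0.2.10    # placeholder address of the interface the GUI should answer on
iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 80  -j REDIRECT --to-ports 47080
iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443
iptables -t nat -A OUTPUT     -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443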
The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: > > > > > > but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. > > Many Thanks, > Keith > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Tue Nov 23 17:59:12 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 23 Nov 2021 17:59:12 +0000 Subject: [gpfsug-discuss] AFM does too small NFS writes, and I don't see parallel writes Message-ID: Hello, We currently move data to a new AFM fileset and I see poor performance and ask for advice and insight: The migration to afm home seems slow. I note: Afm writes a whole file of ~100MB in much too many small chunks My assumption: The many small writes reduce performance as we have 100km between the sites and a higher latency.? The writes are not fully sequentially, but they aren?t done heavily parallel, either (like 10-100 outstanding writes at each time). I the afm queue I see 8100214 Write [563636091.563636091] inflight (0 @ 0) chunks 2938 bytes 170872410 vIdx 1 thread_id 67862 I guess this means afm will write 170?872?410 bytes in 2?938chunks resulting in an average write size of 58k to inode 563636091. So if I?m right my question is: What can I change to make afm ?write less and larger chunks per file? Does it depend on how we copy data? We write through ganesha/nfs, hence even if we write sequentially ganesha may still do it differently? Another question ? is there a way to dump the? afm in-memory queue for a fileset? That would make it easier to see what?s going on when we do changes. I could grep for the inode of a testfile ? We don?t do parallel writes across afm gateways, the files are too small, our limit is 1GB. We configured two mounts from two ces servers at home for each filesets. Hence AFM could do writes in parallel to both mounts on the single gateway? A short tcpdump suggests: afm writes to a single ces server only and writes to a single inode at a time. But at each time a few writes (2-5) may overlap. Kind regards, Heiner Just to illustrate ? what I see on the afm gateway ? too many reads and writes. There are almost no open/close hence its all to the same few files ------------nfs3-client------------ --------gpfs-file-operations------- --gpfs-i/o- -net/total- read? writ? rdir? inod?? fs?? cmmt| open? clos? read? writ? rdir? inod| read write| recv? send ?? 0? 1295???? 0???? 0???? 0???? 0 |?? 0???? 0? 1294???? 0???? 0???? 0 |89.8M??? 0 | 451k?? 94M ?? 0? 1248???? 0???? 0???? 0???? 0 |?? 0???? 0? 
1248???? 0???? 0???? 8 |86.2M??? 0 | 432k?? 91M ?? 0? 1394???? 0???? 0???? 0???? 0 |?? 0???? 0? 1394???? 0???? 0???? 0 |96.8M??? 0 | 498k? 101M ?? 0? 1583???? 0???? 0???? 0???? 0 |?? 0???? 0? 1582???? 0???? 0???? 1 | 110M??? 0 | 560k? 115M ?? 0? 1543???? 0???? 1???? 0??? ?0 |?? 0???? 0? 1544???? 0???? 0???? 0 | 107M??? 0 | 540k? 112M -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5254 bytes Desc: not available URL: From scl at virginia.edu Tue Nov 30 12:47:46 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Tue, 30 Nov 2021 12:47:46 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop Message-ID: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Hi folks, Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt -rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp -rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt -rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc -rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp The core.*.dmp files are cores from the java command. And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log. Any suggestions? Thanks for any help. 2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped 2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572 2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694' 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed 2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED 2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN 2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True. 
New state=4 2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83 2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693'] 2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed 2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695' 2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors 2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors 2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed 2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694'] 2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119 2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed 2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696' 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors 2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695'] 2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119 2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed 2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version 
received, start new collectors update cycle 2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697' 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors 2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon 2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor 2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16 2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696'] 2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119 2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed 2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify And then gpfsgui apparently crashes and systemd automatically restarts it. Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From luis.bolinches at fi.ibm.com Tue Nov 30 13:30:06 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 30 Nov 2021 13:30:06 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> References: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Nov 30 13:34:17 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Nov 2021 13:34:17 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: References: , <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Mon Nov 1 14:50:54 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Nov 2021 14:50:54 +0000 Subject: [gpfsug-discuss] SSUG UK User Group Message-ID: Hi All, I?m planning to take a step-back from running the Spectrum Scale user group in the UK later this year/early next year and this means we need someone (or people) to step up to run the user group in the UK. I took over running the user group in 2015 and a lot has changed since then ? the group got bigger, we moved to multi-day sessions, a pandemic struck and we moved online ? now as things are maybe returning to normal, I think it is time for someone else to take leadership of the group in the UK and work out how to take it forwards. If you are interested in taking up running the group in the UK, please drop me an email, or DM on Slack and let me know. 
It doesn?t necessarily need to be one person running the group, and having several would help with some of the logistics of running the events. To be truly independent, which we have always tried to be, I?ve always thought that the person/people running the group should come from the end-user community? I?ll likely still be around at events, and happy to provide organisational support if needed ? but I don?t really have the time needed for the group at the moment. Hopefully there?s someone interested in taking the group forwards in the future ? Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.j.thompson at bham.ac.uk Tue Nov 2 14:02:10 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Nov 2021 14:02:10 +0000 Subject: [gpfsug-discuss] Upcoming Events Message-ID: Hi All, We thought it would be a good time to send an update on some upcoming events. We have three events coming up over November/December TWO of which are in person! IBM User?s Group meeting ? SC21 (15th November 2021, IN PERSON) IBM Spectrum Scale Development and Product Management team will be attending Super Computing 2021 in person. We will be hosting our yearly gathering on Monday, November 15, from 3:00-5:00 PM. This global user meeting provides an opportunity for peer-to-peer learning and interaction with IBM?s technical leadership team on the latest IBM Spectrum Scale roadmaps, latest features, ecosystem, and applications for AI. See: https://www.spectrumscaleug.org/event/sc21-users-group-meeting/ Register at: https://www.ibm.com/events/event/pages/ibm/nz48hgmb/1581037797007001PJAd.html SSUG::Digital (1st, 2nd December 2021, VIRTUAL) For the Spectrum Scale Users who will not be able to attend user meeting at Super Computing in St Louis, or SSUG at CIUK, we plan to host Digital user meeting on Dec 1 & Dec 2 from 10am - 12pm EDT (3pm-5pm GMT). In the Digital user meeting, we will cover some of the contents covered at St Louis and additional expert talks from our development team and partners. See: https://www.spectrumscaleug.org/event/digital-user-group-dec-2021/ Joining link: To be confirmed SSUG @CIUK 2021 (10th December 2021, IN PERSON) This year we will be returning to our traditional user group home of CIUK and will be running a break-out session on the Friday of CIUK (10:00 ? 12:00). We?re currently lining up a few speakers for the event, but if you are attending CIUK in Manchester this year and are interested in speaking, please let me know ? we have a few speaker slots available for user talks. I?m sure it has been soooo long since anyone has had the opportunity to speak, that I?ll be inundated with user talks ? ? See: https://www.spectrumscaleug.org/event/ssug-ciuk-2021/ As usual with the CIUK meeting, you must be a registered attendee of CIUK to attend this user group. CIUK Registration: https://www.scd.stfc.ac.uk/Pages/CIUK2021.aspx Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Thu Nov 4 21:17:33 2021 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Thu, 04 Nov 2021 17:17:33 -0400 Subject: [gpfsug-discuss] possible to rename a snapshot? Message-ID: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? 
Thanks, Mark From heinrich.billich at id.ethz.ch Mon Nov 8 09:20:24 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 8 Nov 2021 09:20:24 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From olaf.weiser at de.ibm.com Mon Nov 8 09:53:04 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 8 Nov 2021 09:53:04 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Nov 8 09:54:18 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Nov 2021 09:54:18 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > notice that this directory randomly vanishes. Mmhealth does not > complain but just notes that it will no longer monitor the directory. > Still I doubt that trace collection and similar will create the > directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > 6.0.2.2 and 6.0.2.2. > I know several Linux distributions clear the contents of /tmp at boot time. Could that explain it? I would say using /tmp like you are doing is not a sensible idea anyway and that you should be using something under /var. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From lior at nyu.edu Mon Nov 8 14:38:35 2021 From: lior at nyu.edu (Lior Atar) Date: Mon, 8 Nov 2021 09:38:35 -0500 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 118, Issue 4 In-Reply-To: References: Message-ID: Hello all, /tmp/mmfs is being deleted every 10 days by a systemd service " systemd-tmpfiles-setup.service ". That service calls a configuration file " /usr/lib/tmpfiles.d/tmp.conf . What we did was add a drop in file in /etc/tmpfiles.d/tmp.conf to then create the directory /tmp/mmfs and then exclude deleting going forward. 
Here's our actual file and some commentary of what the options mean:

# cat /etc/tmpfiles.d/tmp.conf
# Create a /tmp/mmfs directory
d /tmp/mmfs 0755 root root 1s    <-------- the " d " is to create directory
x /tmp/mmfs/*                    <-------- the " x " says to ignore it

That change helped us avoid /tmp/mmfs from being deleted every 10 days. In addition I think also did a %systemctl daemon-reload ( but I don't have it in my notes, wouldn't hurt to run it )

Hope this helps,

Lior

On Mon, Nov 8, 2021 at 7:00 AM wrote: > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. /tmp/mmfs vanishes randomly? (Billich Heinrich Rainer (ID SD)) > 2. Re: /tmp/mmfs vanishes randomly? (Olaf Weiser) > 3. Re: /tmp/mmfs vanishes randomly? (Jonathan Buzzard) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 8 Nov 2021 09:20:24 +0000 > From: "Billich Heinrich Rainer (ID SD)" > To: gpfsug main discussion list > Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: <739922FB-051D-4239-A6F6-3B7782E9849D at id.ethz.ch> > Content-Type: text/plain; charset="utf-8" > > Hello, > > We use /tmp/mmfs as dataStructureDump directory. Since a while I notice > that this directory randomly vanishes. Mmhealth does not complain but just > notes that it will no longer monitor the directory. Still I doubt that > trace collection and similar will create the directory when needed? > > Do you know of any spectrum scale internal mechanism that could cause > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > installation, too. It happens just on one or two nodes at a time, it's no > cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and > 6.0.2.2. > > Thank you, > > Mmhealth message: > local_fs_path_not_found INFO The configured dataStructureDump path > /tmp/mmfs does not exists. Skipping monitoring. > > Kind regards, > > Heiner > --- > ======================= > Heinrich Billich > ETH Zürich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > > ------------------------------ > > Message: 2 > Date: Mon, 8 Nov 2021 09:53:04 +0000 > From: "Olaf Weiser" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: < > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20211108_1d32c09e_attachment-2D0001.html&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=zpe2MuRXotkV_yDkY-UQSIE68CEBIWsRoj4Qya85nJU&e= > > > > ------------------------------ > > Message: 3 > Date: Mon, 8 Nov 2021 09:54:18 +0000 > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 08/11/2021 09:20, Billich Heinrich Rainer (ID SD) wrote: > > > Hello, > > > > We use /tmp/mmfs as dataStructureDump directory. Since a while I > > notice that this directory randomly vanishes. Mmhealth does not > > complain but just notes that it will no longer monitor the directory. > > Still I doubt that trace collection and similar will create the > > directory when needed? > > > > Do you know of any spectrum scale internal mechanism that could cause > > /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM > > installation, too. It happens just on one or two nodes at a time, > > it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS > > 6.0.2.2 and 6.0.2.2. > > > > I know several Linux distributions clear the contents of /tmp at boot > time. Could that explain it? > > I would say using /tmp like you are doing is not a sensible idea anyway > and that you should be using something under /var. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=slrrB7dE8n7gBJbeO0g-IQ&r=mpcjMHidaF8RcWRPB_iRCw&m=9QxnPQt1bSZxcCSYNtyRayTlYJXf34X5KKh3De5IgMDu-nH9CJqmaDSWLT8a55c6&s=vChJle7IBS3KbsRXb2h7akGKeDm_cjQUD6xeLHLSyDs&e= > > > End of gpfsug-discuss Digest, Vol 118, Issue 4 > ********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.r.sudbery at bham.ac.uk Tue Nov 9 16:55:36 2021 From: l.r.sudbery at bham.ac.uk (Luke Sudbery) Date: Tue, 9 Nov 2021 16:55:36 +0000 Subject: [gpfsug-discuss] gplbin package filename changed in 5.1.2.0? Message-ID: mmbuildgpl in 5.1.2.0 has build me a package with the filename: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm Before it would have been: gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64.rpm The RPM package name itself still appears to be gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64. Is this expected? Is this a permanent change? Just wondering whether to re-tool some of our existing build/install infrastructure or just create a symlink for this one... Many thanks, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday. -------------- next part -------------- An HTML attachment was scrubbed... 
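(A quick way to compare the old and new naming is to read the package name recorded inside the rpm header, independent of the file name mmbuildgpl wrote. This is only a sketch - the file name below is the one quoted above and the exact output depends on the local build:

    # Show the NAME-VERSION-RELEASE.ARCH embedded in the rpm file itself
    rpm -qp --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        gpfs.gplbin-4.18.0-305.12.1.el8_4.x86_64-5.1.2-0.x86_64.rpm

    # Once installed, the gplbin package for the running kernel is still found by package name
    rpm -q gpfs.gplbin-$(uname -r)

If the embedded package name is unchanged, tooling that installs or queries by package name should keep working; only scripts that construct the file name would need adjusting.)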
URL: From frederik.ferner at diamond.ac.uk Wed Nov 10 10:28:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 10 Nov 2021 10:28:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, have you ever received any reply to this or managed to solve it? We are seeing exactly the same error and it's filling up our logs. It seems all the monitoring data is still extracted, so I'm not sure when it started so not sure if this is related to any upgrade on our side, but it may have been going on for a while. We only noticed because the log file now is filling up the local log partition. Kind regards, Frederik On 26/08/2021 11:49, Ragho Mahalingam wrote: > We've been working on setting up mmperfmon; after creating a new > configuration with the new collector on the same manager node, mmsysmon > keeps throwing exceptions. > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > 123, in _getDataFromZimonSocket > sock.connect(SOCKET_PATH) > FileNotFoundError: [Errno 2] No such file or directory > > Tracing this a bit, it appears that SOCKET_PATH is > /var/run/perfmon/pmcollector.socket and this unix domain socket is absent, > even though pmcollector has started and is running successfully. > > Under what scenarios is pmcollector supposed to create this socket? I > don't see any configuration for this in /opt/IBM/zimon/ZIMonCollector.cfg, > so I'm assuming the socket is automatically created when pmcollector starts. > > Any thoughts on how to debug and resolve this? > > Thanks, Ragu -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From ragho.mahalingam+spectrumscaleug at pathai.com Wed Nov 10 14:00:19 2021 From: ragho.mahalingam+spectrumscaleug at pathai.com (Ragho Mahalingam) Date: Wed, 10 Nov 2021 09:00:19 -0500 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Frederick, In our case the issue started appearing after upgrading from 5.0.4 to 5.1.1. If you've recently upgraded, then the following may be useful. Turns out that mmsysmon (gpfs-base package) requires the new gpfs.gss.pmcollector (from zimon packages) to function correctly (the AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). 
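(A rough sketch of that check - the package names are the ones mentioned in this thread and the socket path is the one from the traceback above, so adjust for your own release:

    # Are the perfmon sensor/collector rpms at the same level as the base packages?
    rpm -qa | grep -E '^gpfs\.(base|gss\.pmsensors|gss\.pmcollector)' | sort

    # Does the AF_UNIX socket mmsysmon tries to connect to exist on the collector node?
    ls -l /var/run/perfmon/pmcollector.socket
    systemctl status pmsensors pmcollector
)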
In our case, we'd upgraded all the mandatory packages but had not upgraded the optional ones; the mmsysmonc python libs appears to be updated by the pmcollector package from my study. If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* packages installed. If gpfs.gss.pmcollector isn't installed, you'd definitely need that to make this runaway logging stop. Hope that helps! Ragu On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < frederik.ferner at diamond.ac.uk> wrote: > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We are > seeing exactly the same error and it's filling up our logs. It seems all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, mmsysmon > > keeps throwing exceptions. > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > 123, in _getDataFromZimonSocket > > sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. > > > > Under what scenarios is pmcollector supposed to create this socket? I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > SciComp Help Desk can be reached on x8596 > > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright and or > privileged material, and are for the use of the intended addressee only. If > you are not the intended addressee or an authorised recipient of the > addressee please notify us of receipt by returning the e-mail and do not > use, copy, retain, distribute or disclose the information in or attached to > the e-mail. > Any opinions expressed within this e-mail are those of the individual and > not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for any > damage which you may sustain as a result of software viruses which may be > transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England > and Wales with its registered office at Diamond House, Harwell Science and > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Disclaimer: This email and any corresponding attachments may contain confidential information. 
If you're not the intended recipient, any copying, distribution, disclosure, or use of any information contained in the email or its attachments is strictly prohibited. If you believe to have received this email in error, please email security at pathai.com immediately, then destroy the email and any attachments without reading or saving.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Nov 10 14:14:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 10 Nov 2021 14:14:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?mmsysmon_exception_with_pmcollector_so?= =?utf-8?q?cket=09being_absent?= In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Nov 11 13:38:56 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:38:56 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket being absent In-Reply-To: References: Message-ID: Hi Ragu, many thanks for the response. That was indeed the problem. We missed it when we upgraded a while ago and because our normal monitoring continued to work, we didn't notice until now. Kind regards, Frederik On 10/11/2021 09:00, Ragho Mahalingam wrote: > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1. If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1). In > our case, we'd upgraded all the mandatory packages but had not upgraded the > optional ones; the mmsysmonc python libs appears to be updated by the > pmcollector package from my study. > > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed. If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > > Hope that helps! > > Ragu > > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner < > frederik.ferner at diamond.ac.uk> wrote: > > > Hi Ragu, > > > > have you ever received any reply to this or managed to solve it? We are > > seeing exactly the same error and it's filling up our logs. It seems all > > the monitoring data is still extracted, so I'm not sure when it > > started so not sure if this is related to any upgrade on our side, but > > it may have been going on for a while. We only noticed because the log > > file now is filling up the local log partition. > > > > Kind regards, > > Frederik > > > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > > We've been working on setting up mmperfmon; after creating a new > > > configuration with the new collector on the same manager node, mmsysmon > > > keeps throwing exceptions. > > > > > > File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", line > > > 123, in _getDataFromZimonSocket > > > sock.connect(SOCKET_PATH) > > > FileNotFoundError: [Errno 2] No such file or directory > > > > > > Tracing this a bit, it appears that SOCKET_PATH is > > > /var/run/perfmon/pmcollector.socket and this unix domain socket is > > absent, > > > even though pmcollector has started and is running successfully. > > > > > > Under what scenarios is pmcollector supposed to create this socket? 
I > > > don't see any configuration for this in > > /opt/IBM/zimon/ZIMonCollector.cfg, > > > so I'm assuming the socket is automatically created when pmcollector > > starts. > > > > > > Any thoughts on how to debug and resolve this? > > > > > > Thanks, Ragu > > > > -- > > Frederik Ferner (he/him) > > Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 > > Diamond Light Source Ltd. mob: +44 7917 08 5110 > > > > SciComp Help Desk can be reached on x8596 > > > > > > (Apologies in advance for the lines below. Some bits are a legal > > requirement and I have no control over them.) > > > > -- > > This e-mail and any attachments may contain confidential, copyright and or > > privileged material, and are for the use of the intended addressee only. If > > you are not the intended addressee or an authorised recipient of the > > addressee please notify us of receipt by returning the e-mail and do not > > use, copy, retain, distribute or disclose the information in or attached to > > the e-mail. > > Any opinions expressed within this e-mail are those of the individual and > > not necessarily of Diamond Light Source Ltd. > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > > attachments are free from viruses and we cannot accept liability for any > > damage which you may sustain as a result of software viruses which may be > > transmitted in or with the message. > > Diamond Light Source Limited (company no. 4375679). Registered in England > > and Wales with its registered office at Diamond House, Harwell Science and > > Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > *Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained in > the email or its attachments is strictly prohibited. If you believe to have > received this email in error, please email security at pathai.com > immediately, then destroy the email and any > attachments without reading or saving.* > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). 
Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From frederik.ferner at diamond.ac.uk Thu Nov 11 13:45:16 2021 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 11 Nov 2021 13:45:16 +0000 Subject: [gpfsug-discuss] mmsysmon exception with pmcollector socket?being absent In-Reply-To: References: Message-ID: Hi Fred, we haven't used the deployement tool anywhere so far, we always apply/upgrade the RPMs directly. (Centrally managed via CFengine, promising that certain Spectrum Scale RPMs are installed. I haven't yet checked how the gpfs.gss.pmcollector RPM were installed initially as they weren't in our list of promised packages, which is why the upgrade was missed.) Kind regards, Frederik On 10/11/2021 14:14, Frederick Stock wrote: > I am curious to know if you upgraded by manually applying rpms or if you > used the Spectrum Scale deployment tool (spectrumscale command) to apply > the upgrade? > Fred > _______________________________________________________ > Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 > stockf at us.ibm.com > ? > ? > > ----- Original message ----- > From: "Ragho Mahalingam" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] mmsysmon exception with > pmcollector socket being absent > Date: Wed, Nov 10, 2021 9:00 AM > ? > Hi Frederick, > > In our case the issue started appearing after upgrading from 5.0.4 to > 5.1.1.? If you've recently upgraded, then the following may be useful. > > Turns out that mmsysmon (gpfs-base package) requires the new > gpfs.gss.pmcollector (from zimon packages) to function correctly (the > AF_INET -> AF_UNIX switch seems to have happened between 5.0 and 5.1).? > In our case, we'd upgraded all the mandatory packages but had > not?upgraded the optional ones; the mmsysmonc?python libs appears to be > updated by the pmcollector package from my study. > ? > If you're running >5.1, I'd suggest checking the versions of gpfs.gss.* > packages installed.? If gpfs.gss.pmcollector isn't installed, you'd > definitely need that to make this runaway logging stop. > ? > Hope that helps! > ? > Ragu > ? > On Wed, Nov 10, 2021 at 5:40 AM Frederik Ferner > <[1]frederik.ferner at diamond.ac.uk> wrote: > > Hi Ragu, > > have you ever received any reply to this or managed to solve it? We > are > seeing exactly the same error and it's filling up our logs. It seems > all > the monitoring data is still extracted, so I'm not sure when it > started so not sure if this is related to any upgrade on our side, but > it may have been going on for a while. We only noticed because the log > file now is filling up the local log partition. > > Kind regards, > Frederik > > On 26/08/2021 11:49, Ragho Mahalingam wrote: > > We've been working on setting up mmperfmon; after creating a new > > configuration with the new collector on the same manager node, > mmsysmon > > keeps throwing exceptions. > > > >? ?File "/usr/lpp/mmfs/lib/mmsysmon/container/PerfmonController.py", > line > > 123, in _getDataFromZimonSocket > >? ? ?sock.connect(SOCKET_PATH) > > FileNotFoundError: [Errno 2] No such file or directory > > > > Tracing this a bit, it appears that SOCKET_PATH is > >? /var/run/perfmon/pmcollector.socket and this unix domain socket is > absent, > > even though pmcollector has started and is running successfully. 
> > > > Under what scenarios is pmcollector supposed to create this socket?? > I > > don't see any configuration for this in > /opt/IBM/zimon/ZIMonCollector.cfg, > > so I'm assuming the socket is automatically created when pmcollector > starts. > > > > Any thoughts on how to debug and resolve this? > > > > Thanks, Ragu > > -- > Frederik Ferner (he/him) > Senior Computer Systems Administrator (storage) phone: +44 1235 77 > 8624 > Diamond Light Source Ltd.? ? ? ? ? ? ? ? ? ? ? ?mob:? ?+44 7917 08 > 5110 > > SciComp Help Desk can be reached on x8596 > > (Apologies in advance for the lines below. Some bits are a legal > requirement and I have no control over them.) > > -- > This e-mail and any attachments may contain confidential, copyright > and or privileged material, and are for the use of the intended > addressee only. If you are not the intended addressee or an authorised > recipient of the addressee please notify us of receipt by returning > the e-mail and do not use, copy, retain, distribute or disclose the > information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual > and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any > attachments are free from viruses and we cannot accept liability for > any damage which you may sustain as a result of software viruses which > may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in > England and Wales with its registered office at Diamond House, Harwell > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United > Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at [2]spectrumscale.org > [3]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Disclaimer: This email and any corresponding attachments may contain > confidential information. If you're not the intended recipient, any > copying, distribution, disclosure, or use of any information contained > in the email or its attachments is strictly prohibited. If you believe > to have received this email in error, please email > [4]security at pathai.com immediately, then destroy the email and any > attachments without reading or saving. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > References > > Visible links > 1. mailto:frederik.ferner at diamond.ac.uk > 2. http://spectrumscale.org/ > 3. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 4. mailto:security at pathai.com > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Frederik Ferner (he/him) Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 SciComp Help Desk can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From pinkesh.valdria at oracle.com Fri Nov 12 07:57:14 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 07:57:14 +0000 Subject: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Message-ID: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vpuvvada at in.ibm.com Fri Nov 12 11:54:38 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 17:24:38 +0530 Subject: [gpfsug-discuss] =?utf-8?q?AFM_with_Object_Storage_-_fails_with_i?= =?utf-8?q?nvalid_skey=09=28secret_key=29?= In-Reply-To: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinkesh.valdria at oracle.com Fri Nov 12 12:26:44 2021 From: pinkesh.valdria at oracle.com (Pinkesh Valdria) Date: Fri, 12 Nov 2021 12:26:44 +0000 Subject: [gpfsug-discuss] [External] : Re: AFM with Object Storage - fails with invalid skey (secret key) In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Thanks Venkat for quick response. 
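(To see why a trailing '=' trips this check, here is a small bash sketch using the pattern quoted above - the key is the truncated example from this thread, not a real credential:

    KEY='clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg='
    if [[ "$KEY" =~ ^[0-9a-zA-Z/+._]+$ ]]; then
        echo "key accepted"
    else
        echo "key rejected: contains a character outside [0-9a-zA-Z/+._]"
    fi
    # prints "key rejected ..." because the padding character '=' is not in the allowed class
)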
Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS ________________________________ From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.comset --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect ? HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) ? USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vpuvvada at in.ibm.com Fri Nov 12 12:50:48 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 12 Nov 2021 18:20:48 +0530 Subject: [gpfsug-discuss] =?utf-8?q?=3A_Re=3A___AFM_with_Object_Storage_-_?= =?utf-8?q?fails_with_invalid_skey=09=28secret_key=29?= In-Reply-To: References: <858E8034-B226-40A0-95D0-F20617697E69@oracle.com> Message-ID: Hi Pinkesh, You could open a ticket to get the efix. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "Venkateswara R Puvvada" , "gpfsug main discussion list" Date: 11/12/2021 05:57 PM Subject: Re: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Thanks Venkat for quick response. Unfortunately secret keys are auto generated and all of them have = at the end :-(. Is there a way to receive a patch fix or unofficial fix to unblock . Do you have a rough estimate (1 month, 3 months, 6 months) of when the next release with such a fix might be available? Get Outlook for iOS From: Venkateswara R Puvvada Sent: Friday, November 12, 2021 7:54:38 PM To: gpfsug main discussion list ; Pinkesh Valdria Subject: [External] : Re: [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Hi, AFM does not accept character '=' as part of access and secret keys. It matches the keys with below expression "$KEY" =~ ^[0-9a-zA-Z/+._]+$ We will fix it to accept other allowed characters in future releases including char '=', for now generate secret key without '=' char. ~Venkat (vpuvvada at in.ibm.com) From: "Pinkesh Valdria" To: "gpfsug-discuss at spectrumscale.org" Date: 11/12/2021 02:31 PM Subject: [EXTERNAL] [gpfsug-discuss] AFM with Object Storage - fails with invalid skey (secret key) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello GPFS experts, Today I was trying to configure AFM with Object Storage (AWS s3 compatible) and its failing for me. I was wondering if you can help me or introduce me to the person/team who can help. Failed: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg= invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. I figured out, it fails because it doesn?t like the equal to ?=? sign in the secret key. Proof: mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg Works mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com get 22f79xxxx:clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg I tried to use single quote, double quote around the secret keys, but it still fails. mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx 'clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=' mmafmcoskeys afm-ocios: us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set 22f79xxxx ?clTQ1t4bGL57ca+kFKgJgKrteAwnzhj0854Zeg=? 
I also tried to add the key in the keyfile and still it fails. [root at dr-compute-1 ras]# mmafmcoskeys afm-ocios:us-ashburn-1 at hpc_limited_availability.compat.objectstorage.us-ashburn-1.oraclecloud.com set --keyfile /var/adm/ras/keyfile invalid skey (secret key) mmafmcoskeys: Command failed. Examine previous error messages to determine cause. [root at dr-compute-1 ras]# Thanks, Pinkesh Valdria Head of HPC Storage Master Principal Solutions Architect - HPC Oracle Cloud Infrastructure +65-8932-3639 (m) - Singapore +1-425-205-7834 (m) - USA Blogs on File Systems on OCI _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Nov 15 18:44:04 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 15 Nov 2021 18:44:04 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon "service pmcollector start" - fails: Redirecting to /bin/systemctl status pmcollector.service ● pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncalimet at lenovo.com Mon Nov 15 21:31:03 2021 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Mon, 15 Nov 2021 21:31:03 +0000 Subject: [gpfsug-discuss] [External] Pmcollector fails to start In-Reply-To: References: Message-ID: Hi, I've been experiencing this "start request repeated too quickly" issue, but IIRC for the pmsensors service instead, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node for which the service is failing. Your issue might point at something similar but on the other end of the spectrum (sic).
In this case the issue is usually resolved by deleting/recreating the performance monitoring configuration for the whole cluster: mmchnode --noperfmon -N all # required before deleting the perfmon config mmperfmon config delete --all mmperfmon config generate --collectors # start the pmcollector service on the GUI nodes mmchnode --perfmon -N all # start the pmsensors service on all nodes It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies. HTH -- Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Monday, November 15, 2021 19:44 To: gpfsug main discussion list Subject: [External] [gpfsug-discuss] Pmcollector fails to start Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4 This works from the command line: /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon "service pmcollector start" - fails: Redirecting to /bin/systemctl status pmcollector.service ● pmcollector.service - zimon collector daemon Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC) Main PID: 2055 (code=exited, status=203/EXEC) Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state. Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed. Bob Oesterlin Sr Principal Storage Engineer Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Nov 16 16:44:21 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 16 Nov 2021 16:44:21 +0000 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly? In-Reply-To: References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> Message-ID: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn't present in RHEL7, at least not on our nodes. I am considering modifying the configuration a bit to keep the directory /tmp/mmfs - or even create it - but to clean its content. Best regards, Heiner
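One possible shape for such a change, as a sketch only: a local tmpfiles.d override that recreates /tmp/mmfs at boot and protects the directory itself from cleaning, while leaving its contents subject to the existing 10-day /tmp rule shown in the tmp.conf Olaf quotes below (the file name, the 0700 mode and the exact stanza types are assumptions based on tmpfiles.d(5) semantics, not a tested IBM recommendation):

    # /etc/tmpfiles.d/mmfs.conf
    # create /tmp/mmfs at boot if it is missing (0700 root is an assumed mode)
    d /tmp/mmfs 0700 root root -
    # never age out the directory itself; files inside still follow the "q /tmp ... 10d" rule
    X /tmp/mmfs

Running "systemd-tmpfiles --create" afterwards (or rebooting) should apply the new entries.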
From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... if the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Grüßen / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Ursprüngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring.
Kind regards, Heiner --- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Nov 18 09:09:25 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 18 Nov 2021 17:09:25 +0800 Subject: [gpfsug-discuss] possible to rename a snapshot? In-Reply-To: <1825700-1636060653.986878@yfV0.OUFD.5EUE> References: <1825700-1636060653.986878@yfV0.OUFD.5EUE> Message-ID: Mark, GPFS does not support renaming an existing snapshot. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: mark.bergman at uphs.upenn.edu To: "gpfsug main discussion list" Date: 2021/11/05 05:33 AM Subject: [EXTERNAL] [gpfsug-discuss] possible to rename a snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone know if it is possible to rename an existing snapshot under GPFS 5.0.5.7? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From HAUBRICH at de.ibm.com Thu Nov 18 13:01:39 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Thu, 18 Nov 2021 15:01:39 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Grüßen / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL:
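A quick sanity check for the condition Manfred describes above - whether the scalepm account still exists and owns the collector's working directory (the paths are the defaults; if /opt/IBM/zimon has been relocated or symlinked, check the real target as well; the chown line is a sketch, not an official repair procedure):

    id scalepm                                  # the account the 5.1.1 pmcollector runs as
    ls -ld /opt/IBM/zimon /opt/IBM/zimon/data   # expected owner is scalepm (the data subdirectory is an assumption)
    # if ownership was lost, e.g. after relinking the directory:
    chown -R scalepm:scalepm /opt/IBM/zimon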
From Robert.Oesterlin at nuance.com Thu Nov 18 13:53:47 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Nov 2021 13:53:47 +0000 Subject: [gpfsug-discuss] Pmcollector fails to start In-Reply-To: References: Message-ID: That was indeed the issue! We've linked /opt/IBM/zimon to another directory due to database size. chown'ing that to scalepm.scalepm fixed it. Now, creating a user 'scalepm' on the sly and not telling me - not good! Bob Oesterlin Sr Principal Storage Engineer Nuance Communications From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Manfred Haubrich Date: Thursday, November 18, 2021 at 7:01 AM To: gpfsug-discuss at spectrumscale.org Subject: [EXTERNAL] [gpfsug-discuss] Pmcollector fails to start ________________________________ status=203/EXEC could be a permission issue. Starting manually from command line (most likely as root) did work. With 5.1.1, pmcollector runs as user scalepm. The package scripts create the user and apply according access with chmod/chown. The commands can be reviewed with rpm -ql gpfs.gss.pmcollector --scripts Maybe user scalepm is gone or there was an issue during package install/upgrade. Mit freundlichen Grüßen / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development ________________________________ Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main ________________________________ IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 49 bytes Desc: ecblank.gif URL: From HAUBRICH at de.ibm.com Fri Nov 19 09:00:49 2021 From: HAUBRICH at de.ibm.com (Manfred Haubrich) Date: Fri, 19 Nov 2021 11:00:49 +0200 Subject: [gpfsug-discuss] Pmcollector fails to start Message-ID: Sorry for that difficulty, but the new user for the performance monitoring tool was mentioned in the 5.1.1 summary of changes https://www.ibm.com/docs/en/spectrum-scale/5.1.1?topic=summary-changes Mit freundlichen Grüßen / Best regards / Saludos Manfred Haubrich IBM Spectrum Scale Development Phone: +49 162 4159 706 IBM Deutschland Research & Development GmbH Email: haubrich at de.ibm.com Wilhelm-Fay-Str. 34 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From PSAFRE at de.ibm.com Fri Nov 19 13:49:11 2021 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Fri, 19 Nov 2021 15:49:11 +0200 Subject: [gpfsug-discuss] /tmp/mmfs vanishes randomly?
In-Reply-To: <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> References: <739922FB-051D-4239-A6F6-3B7782E9849D@id.ethz.ch> <4A219904-880E-4646-BE92-15741153355A@id.ethz.ch> Message-ID: Hello Heiner, just a heads up for you and the other storage admins, regularly cleaning up /tmp, regarding one aspect to keep in mind: - If you are using Spectrum Scale software call home (mmcallhome), it would be using the directory ${dataStructureDump}/callhome to save the copies of the uploaded data. This would be /tmp/mmfs/callhome/ in your case, which you would be automatically regularly removing. - These copies are used by one of the features of call home: "mmcallhome status diff" - This feature allows to see an overview of the Spectrum Scale configuration changes, that occurred between 2 different points in time. - This effectively allows to quickly find out if any config changes occurred prior to an outage, thereby helping to find the root cause of self-caused problems in the Scale cluster. - It was added in Scale 5.0.5.0 See IBM KC for more details: https://www.ibm.com/docs/en/spectrum-scale/5.1.0?topic=cch-use-cases-detecting-system-changes-by-using-mmcallhome-command - As a source of the "config snapshots", mmcallhome status diff is using the DC packages inside of ${dataStructureDump}/callhome, which you would be regularly deleting, thereby hugely reducing the usability of this particular feature. - Of course, software call home automatically makes sure, it will not use too much space in dataStructureDump and it automatically removes the oldest entries, keeping at most 2GB or 300 files inside (default values, configurable). Mit freundlichen Grüßen / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Wilhelm-Fay-Straße 32 65936 Frankfurt am Main IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: "gpfsug main discussion list" Date: 16.11.2021 17:44 Subject: [EXTERNAL] Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello Olaf, Thank you, you are right. I was ignorant about the systemd-tmpfiles* services and timers. The cleanup in /tmp wasn't present in RHEL7, at least not on our nodes. I consider to modify the configuration a bit to keep the directory /tmp/mmfs - or even create it - but to clean it's content. Best regards, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 8 November 2021 at 10:53 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] /tmp/mmfs vanishes randomly? Hallo Heiner, multiple levels of answers.. (1st) ... if the directory is not there, the gpfs trace would create it automatically - just like this: [root at ess5-ems1 ~]# ls -l /tmp/mmfs ls: cannot access '/tmp/mmfs': No such file or directory [root at ess5-ems1 ~]# mmtracectl --start -N ems5k.mmfsd.net mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
[root at ess5-ems1 ~]# [root at ess5-ems1 ~]# [root at ess5-ems1 ~]# ls -l /tmp/mmfs total 0 -rw-r--r-- 1 root root 0 Nov 8 10:47 lxtrace.trcerr.ems5k [root at ess5-ems1 ~]# (2nd) I think - the cleaning of /tmp is something done by the OS - please check - systemctl status systemd-tmpfiles-setup.service or look at this config file [root at ess5-ems1 ~]# cat /usr/lib/tmpfiles.d/tmp.conf # This file is part of systemd. # # systemd is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by # the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # See tmpfiles.d(5) for details # Clear tmp directories separately, to make them easier to override q /tmp 1777 root root 10d q /var/tmp 1777 root root 30d # Exclude namespace mountpoints created with PrivateTmp=yes x /tmp/systemd-private-%b-* X /tmp/systemd-private-%b-*/tmp x /var/tmp/systemd-private-%b-* X /var/tmp/systemd-private-%b-*/tmp # Remove top-level private temporary directories on each boot R! /tmp/systemd-private-* R! /var/tmp/systemd-private-* [root at ess5-ems1 ~]# hope this helps - cheers Mit freundlichen Grüßen / Kind regards Olaf Weiser IBM Systems, SpectrumScale Client Adoption ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 ----- Ursprüngliche Nachricht ----- Von: "Billich Heinrich Rainer (ID SD)" Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "gpfsug main discussion list" CC: Betreff: [EXTERNAL] [gpfsug-discuss] /tmp/mmfs vanishes randomly? Datum: Mo, 8. Nov 2021 10:35 Hello, We use /tmp/mmfs as dataStructureDump directory. Since a while I notice that this directory randomly vanishes. Mmhealth does not complain but just notes that it will no longer monitor the directory. Still I doubt that trace collection and similar will create the directory when needed? Do you know of any spectrum scale internal mechanism that could cause /tmp/mmfs to get deleted? It happens on ESS nodes, with a plain IBM installation, too. It happens just on one or two nodes at a time, it's no cluster-wide cleanup or similar. We run scale 5.0.5 and ESS 6.0.2.2 and 6.0.2.2. Thank you, Mmhealth message: local_fs_path_not_found INFO The configured dataStructureDump path /tmp/mmfs does not exists. Skipping monitoring. Kind regards, Heiner --- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From novosirj at rutgers.edu Fri Nov 19 16:46:34 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 19 Nov 2021 16:46:34 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: <9A96D22E-7744-4E42-A0AD-6DDD06397E24@rutgers.edu> Has any progress been made here at all? I have the same problem as the user who opened this thread. I run xCAT on the server where I want to run the GUI. I've attempted to limit the xCAT IP addresses (changing httpd.conf and ssl.conf), but as you note, the UPDATE_IPTABLES setting causes this not to work right, as the GUI wants all interfaces. I could turn that off, but it's not clear to me what rules I'd need to manually create. What I /really/ would like to do is limit the GPFS GUI to a single interface. I guess the only issue with that would be that maybe the remote machines/performance monitors might contact the machine on its main IP with data. Modifying the ports as I described elsewhere in the thread did work pretty well, but there were some lingering GUI update problems and lots of connections on 443 to "/scalemgmt/v2/info" and "/CommonEventServlet" that I never was able to track down. Now, I've tried disabling xCAT's httpd server, reinstalled the gpfs.gui RPM, and started the GUI and it doesn't seem to have gotten any better, so maybe this wasn't a real problem and I'll go back to modifying the ports, but I'd really like to do this "the right way" without having to provide another machine in order to do it. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `'
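On the "what rules I'd need to manually create" question: if the GUI's own iptables handling is switched off via the UPDATE_IPTABLES flag described in the quoted reply below, the manual equivalent would be redirect rules that only match the one address the GUI should answer on - roughly along these lines (untested sketch; GUI_IP is a placeholder, and 47443/47080 are the GUI's native ports mentioned further down in the thread):

    GUI_IP=192.0.2.10    # the one interface/address the GUI should serve
    iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443
    iptables -t nat -A PREROUTING -d "$GUI_IP" -p tcp --dport 80  -j REDIRECT --to-ports 47080
    # local access on the node itself goes through OUTPUT rather than PREROUTING
    iptables -t nat -A OUTPUT -d "$GUI_IP" -p tcp --dport 443 -j REDIRECT --to-ports 47443

Whether the GUI process itself then also needs to be bound to that single address is a separate question the thread leaves open.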
> On Aug 23, 2018, at 7:50 AM, Markus Rohwedder wrote: > > Hello Juri, Keith, > > thank you for your responses. > > The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. > > Running two services at the same port on different IP addresses could be an option to consider for co-existence of the GUI and another service on the same node. > However we have not set up, tested nor documented such a configuration as of today. > > Currently the GUI service manages the iptables redirect bring up and tear down. > If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. > > In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to > - not check for already used ports when the GUI service tries to start up > - don't create/destroy port forwarding rules during GUI service start and stop. > This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. > > The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. > > See this KC link: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm > > Mit freundlichen Grüßen / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > "Daniel Kidger" ---23.08.2018 12:13:36---Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case > > From: "Daniel Kidger" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 23.08.2018 12:13 > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Keith, > > I have another IBM customer who also wished to move Scale GUI's https ports. > In their case because they had their own web based management interface on the same https port. > Is this the same reason that you have? > If so I wonder how many other sites have the same issue? > > One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). > Then run the two different GUIs, one per IP address. > Is this an option, albeit a little ugly? > Daniel > > <17310450.gif> Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-(0)7818 522 266 > daniel.kidger at uk.ibm.com > > > > ----- Original message ----- > From: "Markus Rohwedder" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Date: Thu, Aug 23, 2018 9:51 AM > Hello Keith, > > it is not so easy. > > The GUI receives events from other scale components using the currently defined ports. > Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). > Therefore at this point there is no procedure to change this behaviour across all components. > > Because the GUI service does not run as root, the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. > Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. > The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. > If these ports are already used by another service, the GUI will not start up. > > Making the GUI ports freely configurable is therefore not a straightforward change, and currently not on our roadmap. > If you want to emphasize your case as future development item, please let me know. > > I would also be interested in: > > Scale version you are running > > Do you need port 80 or 443 as well? > > Would it work for you if the xCAT service was bound to a single IP address? > > Mit freundlichen Grüßen / Kind regards > > Dr. Markus Rohwedder > > Spectrum Scale GUI Development > > > Phone: +49 7034 6430190 IBM Deutschland Research & Development > <17153317.gif> > E-Mail: rohwedder at de.ibm.com Am Weiher 24 > 65451 Kelsterbach > Germany > > > Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? > > From: Keith Ball > To: gpfsug-discuss at spectrumscale.org > Date: 22.08.2018 21:33 > Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Hello All, > > Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this.
The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: > > > > > > but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Spectrum Scale GUI configuration if I can. > > Many Thanks, > Keith > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Tue Nov 23 17:59:12 2021 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 23 Nov 2021 17:59:12 +0000 Subject: [gpfsug-discuss] AFM does too small NFS writes, and I don't see parallel writes Message-ID: Hello, We currently move data to a new AFM fileset and I see poor performance and ask for advice and insight: The migration to afm home seems slow. I note: Afm writes a whole file of ~100MB in much too many small chunks My assumption: The many small writes reduce performance as we have 100km between the sites and a higher latency. The writes are not fully sequential, but they aren't done heavily in parallel, either (like 10-100 outstanding writes at each time). In the afm queue I see 8100214 Write [563636091.563636091] inflight (0 @ 0) chunks 2938 bytes 170872410 vIdx 1 thread_id 67862 I guess this means afm will write 170,872,410 bytes in 2,938 chunks resulting in an average write size of 58k to inode 563636091. So if I'm right my question is: What can I change to make afm write fewer and larger chunks per file? Does it depend on how we copy data? We write through ganesha/nfs, hence even if we write sequentially ganesha may still do it differently? Another question - is there a way to dump the afm in-memory queue for a fileset? That would make it easier to see what's going on when we do changes. I could grep for the inode of a testfile. We don't do parallel writes across afm gateways, the files are too small, our limit is 1GB. We configured two mounts from two ces servers at home for each fileset. Hence AFM could do writes in parallel to both mounts on the single gateway? A short tcpdump suggests: afm writes to a single ces server only and writes to a single inode at a time. But at each time a few writes (2-5) may overlap. Kind regards, Heiner Just to illustrate - what I see on the afm gateway - too many reads and writes. There are almost no open/close hence it's all to the same few files
------------nfs3-client------------ --------gpfs-file-operations------- --gpfs-i/o- -net/total-
 read  writ  rdir  inod   fs  cmmt| open  clos  read  writ  rdir  inod| read write| recv  send
    0  1295     0     0     0    0|    0     0  1294     0     0     0|89.8M    0 | 451k   94M
    0  1248     0     0     0    0|    0     0  1248     0     0     8|86.2M    0 | 432k   91M
    0  1394     0     0     0    0|    0     0  1394     0     0     0|96.8M    0 | 498k  101M
    0  1583     0     0     0    0|    0     0  1582     0     0     1| 110M    0 | 560k  115M
    0  1543     0     1     0    0|    0     0  1544     0     0     0| 107M    0 | 540k  112M
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5254 bytes Desc: not available URL:
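On the "dump the AFM in-memory queue" question above: two commands commonly used for this kind of inspection on the gateway node, shown here only as a sketch (mmafmctl getstate is documented; the low-level mmfsadm dump is the sort of thing normally run on request of IBM support, and its exact output format should be treated as an assumption - "fs1" and the fileset name are placeholders):

    mmafmctl fs1 getstate -j migratefileset     # per-fileset state, gateway node and queue length
    mmfsadm dump afm > /tmp/afm-queue.txt       # raw dump of the AFM queues on the gateway
    grep 563636091 /tmp/afm-queue.txt           # pick out the entries for one inode, as suggested above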
From scl at virginia.edu Tue Nov 30 12:47:46 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Tue, 30 Nov 2021 12:47:46 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop Message-ID: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Hi folks, Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt -rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp -rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt -rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc -rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp The core.*.dmp files are cores from the java command. And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log. Any suggestions? Thanks for any help. 2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped 2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572 2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694' 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors 2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results 2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed 2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED 2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN 2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True.
New state=4 2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83 2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693'] 2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed 2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695' 2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors 2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors 2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results 2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting 2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed 2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694'] 2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119 2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed 2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696' 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors 2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors 2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695'] 2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119 2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed 2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version 
received, start new collectors update cycle 2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697' 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors 2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors 2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon 2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor 2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16 2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696'] 2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805) - CCRProxy._run_ccr_command:256 2021-09-29_20:03:53.322-0500: [I] MainThread --------------------------------- 2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119 2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119 2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed 2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle 2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors 2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors 2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify And then gpfsgui apparently crashes and systemd automatically restarts it. Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From luis.bolinches at fi.ibm.com Tue Nov 30 13:30:06 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 30 Nov 2021 13:30:06 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> References: <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Nov 30 13:34:17 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Nov 2021 13:34:17 +0000 Subject: [gpfsug-discuss] gpfsgui in a core dump/restart loop In-Reply-To: References: , <37F3A608-291B-4B71-92D7-0A150EFE469A@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: