From chair at spectrumscale.org Tue Sep 1 09:17:12 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 01 Sep 2020 09:17:12 +0100 Subject: [gpfsug-discuss] Update: [NEW DATE] SSUG::Digital Update on File Create and MMAP performance Message-ID: <> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: meeting.ics Type: text/calendar Size: 2596 bytes Desc: not available URL: 
From joe at excelero.com Tue Sep 1 14:39:47 2020 From: joe at excelero.com (joe at excelero.com) Date: Tue, 1 Sep 2020 08:39:47 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 1 Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: 
From russell at nordquist.info Wed Sep 2 15:38:35 2020 From: russell at nordquist.info (Russell Nordquist) Date: Wed, 2 Sep 2020 10:38:35 -0400 Subject: [gpfsug-discuss] data replicas and metadata space used Message-ID: <12198E19-AC4A-44C9-BE54-8482E85CE32B@nordquist.info> 
I was reading this slide deck on GPFS metadata sizing and I ran across something: http://files.gpfsug.org/presentations/2016/south-bank/D2_P2_A_spectrum_scale_metadata_dark_V2a.pdf 
On slide 51 it says "Max replicas for Data multiplies the MD capacity used - reserves space in MD for the replicas even if no files are replicated!" This is something I did not realize - setting data replicas to 2 or even 3 consumes metadata space even if you are not using the data replicas. For metadata replicas it says unused replicas have little impact - great. 
I like to set data and metadata replicas to 3 when I make a filesystem, even when the initial replicas used are 1, since you never know what will change down the road. However this makes me wonder about that idea for the data replicas - it's really expensive metadata-space-wise. This information was written prior to GPFS v5, when the number of subblocks per block was still fixed at 32. Does it still hold true that unused data replicas use metadata space with v5? 
thanks Russell 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From giovanni.bracco at enea.it Wed Sep 2 21:28:55 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Wed, 2 Sep 2020 22:28:55 +0200 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> 
I am curious to know about AMD epyc support by GPFS: what is the status? 
Giovanni Bracco 
On 28/08/20 14:25, Frederick Stock wrote: > Not sure that Spectrum Scale has stated it supports the AMD epyc (Rome?) > processors. You may want to open a help case to determine the cause of > this problem. > Note that Spectrum Scale 4.2.x goes out of service on September 30, 2020 > so you may want to consider upgrading your cluster. And should Scale > officially support the AMD epyc processor it would not be on Scale 4.2.x.
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > ----- Original message ----- > From: Philipp Helo Rehs > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck > Date: Fri, Aug 28, 2020 5:52 AM > Hello, > > we have a gpfs v4 cluster running with 4 nsds and i am trying to add > some clients: > > mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 > > this commands hangs and do not finish > > When i look into the server, i can see the following processes which > never finish: > > root???? 38138? 0.0? 0.0 123048 10376 ???????? Ss?? 11:32?? 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote checkNewClusterNode3 > lc/setupClient > %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: > %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 > root???? 38169? 0.0? 0.0 123564 10892 ???????? S??? 11:32?? 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl setupClient 2 > 21479 > 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 > 0 1191 > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off > > The node is an AMD epyc. > > Any idea what could cause the issue? > > ssh is possible in both directions and firewall is disabled. > > > Kind regards > > ?Philipp Rehs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Wed Sep 2 23:28:34 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 2 Sep 2020 22:28:34 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it>, <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Sep 3 05:00:38 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 3 Sep 2020 04:00:38 +0000 Subject: [gpfsug-discuss] data replicas and metadata space used In-Reply-To: <12198E19-AC4A-44C9-BE54-8482E85CE32B@nordquist.info> Message-ID: An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Sep 3 08:44:29 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 3 Sep 2020 09:44:29 +0200 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? 
In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> OK from client side, but I would like to know if the same is also for NSD servers with AMD EPYC, do they operate with good performance compared to Intel CPUs? Giovanni On 03/09/20 00:28, Andrew Beattie wrote: > Giovanni, > I have clients in Australia that are running AMD ROME processors in > their Visualisation nodes connected to scale 5.0.4 clusters with no issues. > Spectrum Scale doesn't differentiate between x86 processor technologies > -- it only looks at x86_64 (OS support more than anything else) > Andrew Beattie > File and Object Storage Technical Specialist - A/NZ > IBM Systems - Storage > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list , > Frederick Stock > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what > about AMD epyc support in GPFS? > Date: Thu, Sep 3, 2020 7:29 AM > I am curious to know about AMD epyc support by GPFS: what is the status? > Giovanni Bracco > > On 28/08/20 14:25, Frederick Stock wrote: > > Not sure that Spectrum Scale has stated it supports the AMD epyc > (Rome?) > > processors.? You may want to open a help case to determine the > cause of > > this problem. > > Note that Spectrum Scale 4.2.x goes out of service on September > 30, 2020 > > so you may want to consider upgrading your cluster.? And should Scale > > officially support the AMD epyc processor it would not be on > Scale 4.2.x. > > > > Fred > > __________________________________________________ > > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > > stockf at us.ibm.com > > > > ? ? ----- Original message ----- > > ? ? From: Philipp Helo Rehs > > ? ? Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? To: gpfsug main discussion list > > > ? ? Cc: > > ? ? Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck > > ? ? Date: Fri, Aug 28, 2020 5:52 AM > > ? ? Hello, > > > > ? ? we have a gpfs v4 cluster running with 4 nsds and i am trying > to add > > ? ? some clients: > > > > ? ? mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 > > > > ? ? this commands hangs and do not finish > > > > ? ? When i look into the server, i can see the following > processes which > > ? ? never finish: > > > > ? ? root???? 38138? 0.0? 0.0 123048 10376 ???????? Ss?? 11:32?? 0:00 > > ? ? /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote > checkNewClusterNode3 > > ? ? lc/setupClient > > > %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: > > ? ? %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 > > ? ? root???? 38169? 0.0? 0.0 123564 10892 ???????? S??? 11:32?? 0:00 > > ? ? /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl > setupClient 2 > > ? ? 21479 > > > 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 > > ? ? 0 1191 > > ? ? root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > > ? ? /usr/lpp/mmfs/bin/tsgskkm store --cert > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > > ? ? 
/var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off > > > > ? ? The node is an AMD epyc. > > > > ? ? Any idea what could cause the issue? > > > > ? ? ssh is possible in both directions and firewall is disabled. > > > > > > ? ? Kind regards > > > > ? ? ??Philipp Rehs > > > > > > ? ? _______________________________________________ > > ? ? gpfsug-discuss mailing list > > ? ? gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Thu Sep 3 09:10:38 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 3 Sep 2020 08:10:38 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> Message-ID: I don?t currently have any x86 based servers to do that kind of performance testing, But the PCI-Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are however limited storage controllers and Network adapters that can help utilise the full benefits of PCI-gen4. In terms of NSD architecture there are many variables that you also have to take into consideration. Are you looking at storage rich servers? Are you looking at SAN attached Flash Are you looking at scale ECE type deployment? As an IBM employee and someone familiar with ESS 5000, and the differences / benefits of the 5K architecture, Unless your planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules. I would seriously consider the ESS 5k over an x86 based NL-SAS storage topology Including AMD. Sent from my iPhone > On 3 Sep 2020, at 17:44, Giovanni Bracco wrote: > > ?OK from client side, but I would like to know if the same is also for > NSD servers with AMD EPYC, do they operate with good performance > compared to Intel CPUs? > > Giovanni > >> On 03/09/20 00:28, Andrew Beattie wrote: >> Giovanni, >> I have clients in Australia that are running AMD ROME processors in >> their Visualisation nodes connected to scale 5.0.4 clusters with no issues. >> Spectrum Scale doesn't differentiate between x86 processor technologies >> -- it only looks at x86_64 (OS support more than anything else) >> Andrew Beattie >> File and Object Storage Technical Specialist - A/NZ >> IBM Systems - Storage >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list , >> Frederick Stock >> Cc: >> Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what >> about AMD epyc support in GPFS? 
>> Date: Thu, Sep 3, 2020 7:29 AM >> I am curious to know about AMD epyc support by GPFS: what is the status? >> Giovanni Bracco >> >>> On 28/08/20 14:25, Frederick Stock wrote: >>> Not sure that Spectrum Scale has stated it supports the AMD epyc >> (Rome?) >>> processors. You may want to open a help case to determine the >> cause of >>> this problem. >>> Note that Spectrum Scale 4.2.x goes out of service on September >> 30, 2020 >>> so you may want to consider upgrading your cluster. And should Scale >>> officially support the AMD epyc processor it would not be on >> Scale 4.2.x. >>> >>> Fred >>> __________________________________________________ >>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >>> stockf at us.ibm.com >>> >>> ----- Original message ----- >>> From: Philipp Helo Rehs >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >> >>> Cc: >>> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck >>> Date: Fri, Aug 28, 2020 5:52 AM >>> Hello, >>> >>> we have a gpfs v4 cluster running with 4 nsds and i am trying >> to add >>> some clients: >>> >>> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 >>> >>> this commands hangs and do not finish >>> >>> When i look into the server, i can see the following >> processes which >>> never finish: >>> >>> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote >> checkNewClusterNode3 >>> lc/setupClient >>> >> %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: >>> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 >>> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl >> setupClient 2 >>> 21479 >>> >> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 >>> 0 1191 >>> root 38212 100 0.0 35544 5752 ? R 11:32 9:40 >>> /usr/lpp/mmfs/bin/tsgskkm store --cert >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off >>> >>> The node is an AMD epyc. >>> >>> Any idea what could cause the issue? >>> >>> ssh is possible in both directions and firewall is disabled. 
>>> >>> >>> Kind regards >>> >>> Philipp Rehs >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 4 08:56:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 4 Sep 2020 08:56:41 +0100 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 4 10:02:29 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 4 Sep 2020 09:02:29 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> Message-ID: <8BA85682-C84F-4AF3-9A3D-6077E0715892@bham.ac.uk> Of course, you might also be interested in our upcoming Webinar on 22nd September (which I haven't advertised yet): https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ ... This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. 
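For anyone who wants a quick sanity check against the core-count limits Jonathan quotes below from the Scale FAQ, a rough sketch might look like the following; the 192/1536 figures come from that FAQ and may differ for your release, so treat this purely as an illustration:

cores=$(nproc --all)                      # logical cores visible to the OS
echo "node has ${cores} logical cores"
if [ "${cores}" -gt 192 ]; then           # 192 = largest tested core count per the FAQ
    echo "above the largest tested core count (192)"
fi
if [ "${cores}" -ge 1536 ]; then          # 1536 = hard limit per the FAQ
    echo "at or above the hard limit (1536)"
fi
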
Simon 
On 04/09/2020, 08:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: 
On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
From oluwasijibomi.saula at ndsu.edu Fri Sep 4 17:03:17 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 4 Sep 2020 16:03:17 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server Message-ID: 
Hello GPFS Experts, Say, is there any way to disable a particular NSD server outside of shutting down GPFS on the server, or shutting down the entire cluster and removing the NSD server from the list of NSD servers? I'm finding that TSM activity on one of our NSD servers is stifling IO traffic through the server and resulting in intermittent latency for clients. If we could restrict cluster IO from going through this NSD server, we might be able to minimize or eliminate the latencies experienced by the clients while TSM activity is ongoing. Thoughts? 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From heinrich.billich at id.ethz.ch Mon Sep 7 14:29:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 7 Sep 2020 13:29:59 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> 
Hi, 
just came across this: 
/usr/lpp/mmfs/bin/mmafmctl fs3101 getstate 
mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. 
It's like a bus driver telling you that the brakes don't work and then speeding up even more. Honestly, why not just fail with a nice error message? Don't tell me some customer asked for this to make the command more resilient ... 
Cheers, Heiner -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== 
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From knop at us.ibm.com Tue Sep 8 04:09:07 2020 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 8 Sep 2020 03:09:07 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 8 14:04:26 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 09:04:26 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think a better metaphor is that the bridge we just crossed has collapsed and as long as we do not need to cross it again our journey should reach its intended destination :-) As I understand the intent of this message is to alert the user (and our support teams) that the directory from which a command was executed no longer exist. Should that be of consequence to the execution of the command then failure is not unexpected, however, many commands do not make use of the current directory so they likely will succeed. If you consider the view point of a command failing because the working directory was removed, but not knowing that was the root cause, I think you can see why this message was added into the administration infrastructure. It allows this odd failure scenario to be quickly recognized saving time for both the user and IBM support, in tracking down the root cause. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/07/2020 09:29 AM Subject: [EXTERNAL] [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Sep 8 17:10:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Sep 2020 17:10:59 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) ?As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. ?Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. ?If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. ?It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Sep 8 18:37:59 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 13:37:59 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think it is incorrect to assume that a command that continues after detecting the working directory has been removed is going to cause damage to the file system. Further, there is no a priori means to confirm if the lack of a working directory will cause the command to fail. I will agree that there may be admins that would prefer the command fail fast and allow them to restart the command anew, but I suspect there are admins that prefer the command press ahead in hopes that it can complete successfully and not require another execution. I'm sure we can conjure scenarios that support both points of view. Perhaps what is desired is a message that more clearly describes what is being undertaken. For example, "The current working directory, , no longer exists. Execution continues." 
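Purely as an illustration of the two behaviours being debated (this is not the actual mm-command implementation, and "mmexample" is just a placeholder name), a shell wrapper might handle a vanished working directory roughly like this: 

# hypothetical sketch only; not taken from the Scale code base
if ! /bin/pwd > /dev/null 2>&1; then
    echo "mmexample: current working directory no longer exists." >&2
    # fail-fast behaviour: stop here and let the admin cd to a valid directory
    # exit 1
    # permissive behaviour: warn, move to a known-good directory, and continue
    cd / || exit 1
    echo "mmexample: execution continues from /." >&2
fi
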
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 09/08/2020 12:10 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From ewahl at osc.edu Tue Sep 8 23:46:08 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 8 Sep 2020 22:46:08 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Message-ID: 
Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. 
Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12 million out of a 500MB file' or 'we get zero', or something like that. 
Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8-digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL 
Ed Wahl Ohio Supercomputer Center 
Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From jonathan.buzzard at strath.ac.uk Wed Sep 9 12:02:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 9 Sep 2020 12:02:53 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> 
On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think it is incorrect to assume that a command that continues > after detecting the working directory has been removed is going to > cause damage to the file system. 
No I am not assuming it will cause damage. I am making the fairly reasonable assumption that any command which fails has an increased probability of causing damage to the file system over one that completes successfully. 
> Further, there is no a priori means to confirm if the lack of a > working directory will cause the command to fail. 
Which is why bailing out is a more sensible default than ploughing on regardless. 
> I will agree that there may be admins that would prefer the command > fail fast and allow them to restart the command anew, but I suspect > there are admins that prefer the command press ahead in hopes that > it can complete successfully and not require another execution. 
I am sure that there are inexperienced admins who have yet to be battle-scarred that would want such reckless default behaviour. Pandering to their naivety is not a sensible approach IMHO. 
The downside of a large file system (and production GPFS file systems tend to be large) going "puff" is so massive that the precautionary principle should apply. One wonders if we are seeing the difference between a US and European mindset here.
> I'm sure we can conjure scenarios that support both points of view. > Perhaps what is desired is a message that more clearly describes what > is being undertaken. For example, "The current working directory, > , no longer exists. Execution continues." > That is what --force is for. If you are sufficiently reckless that you want something to continue in the event of a possible error you have the option to stick that on every command you run. Meanwhile the sane admins get a system that defaults to proceeding in the safer manner possible. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Sep 9 15:04:27 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Sep 2020 07:04:27 -0700 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> Message-ID: <20200909140427.aint6lhyqgz7jlk7@thargelion> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: > On 08/09/2020 18:37, IBM Spectrum Scale wrote: > > I think it is incorrect to assume that a command that continues > > after detecting the working directory has been removed is going to > > cause damage to the file system. > > No I am not assuming it will cause damage. I am making the fairly reasonable > assumption that any command which fails has an increased probability of > causing damage to the file system over one that completes successfully. I think there is another angle here, which is that this command's output has the possibility of triggering an "oh ----" (fill in your preferred colorful metaphor here) moment, followed up by a panicked Ctrl-C. That reaction has the possibility of causing its own problems (i.e. not sure if mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). I'm with Jonathan here: the command should fail with an informative message, and the admin can correct the problem (just cd somewhere else). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From carlz at us.ibm.com Thu Sep 10 13:55:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 10 Sep 2020 12:55:25 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <188B4B5D-8670-4071-85E6-AF13E087E8E1@us.ibm.com> Jonathan, Can I ask you to file an RFE for this? And post the number here so others can vote for it if they wish. I don?t see any reason to defend an error message that is basically a shrug, and the fix should be straightforward (i.e. bail out). However, email threads tend to get lost, whereas RFEs are tracked, managed, and monitored (and there is now a new Systems-wide initiative to report and measure responsiveness.) Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1291474181] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From cblack at nygenome.org Thu Sep 10 16:55:46 2020 From: cblack at nygenome.org (Christopher Black) Date: Thu, 10 Sep 2020 15:55:46 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We run sklm for tape encryption for spectrum archive ? no encryption in gpfs filesystem on disk pools. We see no grep hits for ?not trust? in our last few sklm_audit.log files. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Tuesday, September 8, 2020 at 7:10 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From ulmer at ulmer.org Fri Sep 11 15:25:55 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 11 Sep 2020 10:25:55 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <20200909140427.aint6lhyqgz7jlk7@thargelion> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> Message-ID: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> 
> On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote: > > On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>> I think it is incorrect to assume that a command that continues >>> after detecting the working directory has been removed is going to >>> cause damage to the file system. >> >> No I am not assuming it will cause damage. I am making the fairly reasonable >> assumption that any command which fails has an increased probability of >> causing damage to the file system over one that completes successfully. > > I think there is another angle here, which is that this command's output > has the possibility of triggering an "oh ----" (fill in your preferred > colorful metaphor here) moment, followed up by a panicked Ctrl-C. That > reaction has the possibility of causing its own problems (i.e. not sure if > mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). > I'm with Jonathan here: the command should fail with an informative > message, and the admin can correct the problem (just cd somewhere else). > 
I'm now (genuinely) curious as to what Spectrum Scale commands *actually* depend on the working directory existing and why. They shouldn't depend on anything but existing well-known directories (logs, SDR, /tmp, et cetera) and any file or directories passed as arguments to the command. This is the Unix way. 
It seems like the *right* solution is to armor commands against doing something "bad" if they lose a resource required to complete their task. If $PWD goes away because an admin's home goes away in the middle of a long restripe, it's better to complete the work and let them look in the logs. It's not Scale's problem if something not affecting its work happens. 
Maybe I've got a blind spot here... 
-- Stephen 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From eric.wonderley at vt.edu Fri Sep 11 19:47:52 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 11 Sep 2020 14:47:52 -0400 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: 
We have spectrum archive with encryption on disk and tape. We get maybe a hundred or so messages like this daily. It would be nice if the message had some information about which client is the issue. 
We have had client certs expire in the past. The root cause of the outage was a network outage...iirc the certs are cached in the clients. 
I don't know what to make of these messages...they do concern me. I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. 
Eric Wonderley 
On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: > Ran into something a good while back and I'm curious how many others this > affects. If folks with encryption enabled could run a quick word count on > their SKLM server and reply with a rough count I'd appreciate it. > I've gone round and round with IBM SKLM support over the last year on this > and it just has me wondering.
This is one of those "morbidly curious about > making the sausage" things. > > Looking to see if this is a normal error message folks are seeing. Just > find your daily, rotating audit log and search it. I'll trust most folks > to figure this out, but let me know if you need help. > Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit > If you are on a normal linux box try something like: "locate > sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client > certificate" {} |wc " or whatever works for you. If your audit log is > fairly fresh, you might want to check the previous one. I do NOT need > exact information, just 'yeah we get 12million out a 500MB file' or ' we > get zero', or something like that. > > Mostly I'm curious if folks get zero, or a large number. I've got my > logs adjusted to 500MB and I get 8 digit numbers out of the previous log. > Yet things work perfectly. I've talked to two other SS sites I know the > admins personally, and they get larger numbers than I do. But it's such a > tiny sample size! LOL > > Ed Wahl > Ohio Supercomputer Center > > Apologies for the message formatting issues. Outlook fought tooth and > nail against sending it with the path as is, and kept breaking my > paragraphs. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 11 20:53:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 11 Sep 2020 20:53:45 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> Message-ID: <049f7e23-fb72-019f-a7b0-f9d0f1d189dc@strath.ac.uk> On 11/09/2020 15:25, Stephen Ulmer wrote: > >> On Sep 9, 2020, at 10:04 AM, Skylar Thompson > > wrote: >> >> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >>> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>>> I think it is incorrect to assume that a command that continues >>>> after detecting the working directory has been removed is going to >>>> cause damage to the file system. >>> >>> No I am not assuming it will cause damage. I am making the fairly >>> reasonable >>> assumption that any command which fails has an increased probability of >>> causing damage to the file system over one that completes successfully. >> >> I think there is another angle here, which is that this command's output >> has the possibility of triggering an "oh ----" (fill in your preferred >> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That >> reaction has the possibility of causing its own problems (i.e. not sure if >> mmafmctl touches CCR, but aborting it midway could leave CCR >> inconsistent). >> I'm with Jonathan here: the command should fail with an informative >> message, and the admin can correct the problem (just cd somewhere else). >> > > I?m now (genuinely) curious as to?what?Spectrum Scale commands > *actually* depend on the working directory existing and why. They > shouldn?t depend on anything but existing well-known directories (logs, > SDR, /tmp, et cetera) and any file or directories passed as arguments to > the command. 
This is the Unix way. > > It seems like the *right* solution is to armor commands against doing > something ?bad? if they lose a resource required to complete their task. > If $PWD goes away because an admin?s home goes away in the middle of a > long restripe, it?s better to complete the work and let them look in the > logs. It's not Scale?s problem if something not affecting its work happens. > > Maybe I?ve got a blind spot here... > This jogged my memory that best practice would be to have a call to chdir to set the working directory to "/" very early on. Before anything critical is started. I am 99.999% sure that its covered in Steven's (can't check as I am away for the weekend) so really there is no excuse. If / goes away then really really bad things have happened and it all sort of becomes moot anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Mon Sep 14 06:27:58 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 14 Sep 2020 13:27:58 +0800 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: Hi Eric, Please help me to understand your question. You have Spectrum Archive and Spectrum Scale in your system, and both of them are connected to IBM SKLM for encryption. Now you got lots of error/warning message from SKLM log. Now you want to understand which component, Scale or Archive, makes the SKLM print those error message, right? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 2020/09/12 02:47 Subject: [EXTERNAL] Re: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Sent by: gpfsug-discuss-bounces at spectrumscale.org We have spectrum archive with encryption on disk and tape.? ?We get maybe a 100 or so messages like this daily.? It would be nice if message had some information about which client is the issue. We have had client certs expire in the past.? The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me.? I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: ?Ran into something a good while back and I'm curious how many others this affects.?? If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate?it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering.? 
This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing.? Just find your daily, rotating audit log and search it.? I'll trust most folks to figure this?out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like:? "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc "? or whatever works for you.?? If your audit log is fairly fresh, you might want to check the previous one.?? I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. ?Mostly I'm curious if folks get zero, or a large number.? I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log.?? Yet things work perfectly.??? I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues.? Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Mon Sep 14 13:09:12 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 14 Sep 2020 14:09:12 +0200 Subject: [gpfsug-discuss] tsgskkm stuck In-Reply-To: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: On 8/28/20 11:43 AM, Philipp Helo Rehs wrote: > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off Judging from the command line tsgskkm will generate a certificate which normally involves a random number generator. If such a process hangs it might be due to a lack of entropy. So I suggest trying to generate some I/O on the node. Or run something like haveged (https://wiki.archlinux.org/index.php/Haveged). Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From S.J.Thompson at bham.ac.uk Fri Sep 18 11:52:51 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Sep 2020 10:52:51 +0000 Subject: [gpfsug-discuss] SSUG::Digital inode management, VCPU scaling and considerations for NUMA Message-ID: <5c6175fb949c4a30bcc94a2bbe986178@bham.ac.uk> Number 5 in the SSUG::Digital talks set takes place 22 September 2020 Spectrum Scale is a highly scalable, high-performance storage solution for file and object storage. It started more than 20 years ago as research project and is now used by thousands of customers. IBM continues to enhance Spectrum Scale, in response to recent hardware advancements and evolving workloads. This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2126 bytes Desc: not available URL: From joe at excelero.com Fri Sep 18 13:38:51 2020 From: joe at excelero.com (joe at excelero.com) Date: Fri, 18 Sep 2020 07:38:51 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 16 Message-ID: <92e304d9-de58-4bdc-aae5-95a9dfc03a44@Spark> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:11:31 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:11:31 +0000 Subject: [gpfsug-discuss] CCR errors Message-ID: Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sat Sep 19 21:23:01 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 20:23:01 +0000 Subject: [gpfsug-discuss] CCR errors In-Reply-To: References: Message-ID: I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:52:19 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:52:19 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: Message-ID: Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sun Sep 20 00:45:41 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 23:45:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: , Message-ID: I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sun Sep 20 00:59:28 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 23:59:28 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 19 In-Reply-To: References: Message-ID: Ryan, I appreciate your support - I finally got some on a WebEx now. I'll share any useful information I glean from the session. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 6:45:47 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Saula, Oluwasijibomi) 2. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:52:19 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="us-ascii" Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 23:45:41 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="utf-8" I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. 
Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 19 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
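A side note on the CCR lock error in the thread above: before retrying mmchnode it is usually worth confirming that CCR actually has quorum and that no interrupted mm-command is still holding the configuration lock, since "Unable to obtain the GPFS configuration file lock" is, in my experience, most often a symptom of too few reachable quorum nodes. A minimal, untested check sequence might be:

  mmgetstate -a            # quorum nodes should show 'active'
  mmlscluster              # confirm which nodes carry the quorum designation
  mmhealth cluster show    # look for CCR / quorum related events
  mmccr check              # on a quorum node; I believe this reports CCR file and lock state

Only the first three are everyday commands; mmccr is the low-level tool the manual repair procedure uses, so treat that last line as an assumption to verify against the documentation for your level rather than a recommendation.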
URL: From alvise.dorigo at psi.ch Mon Sep 21 09:35:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 08:35:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 21 10:55:29 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 21 Sep 2020 09:55:29 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 11:32:25 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 10:32:25 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
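One more angle on the AFM question above: the fileset-level view from mmafmctl shows whether the cache still has anything queued for home at all. A rough sketch, with made-up file system and fileset names:

  mmafmctl fs1 getstate -j cachefileset

As far as I understand the output, a Queue Length of 0 with the fileset in Active state means everything queued so far has been flushed to home. It is only a coarse check and says nothing about one particular file, which is where the per-file checks discussed next come in.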
URL: From vpuvvada at in.ibm.com Mon Sep 21 11:57:30 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 21 Sep 2020 16:27:30 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Checking_if_a_AFM-managed_file_is_stil?= =?utf-8?q?l=09inflight?= In-Reply-To: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? 
Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 12:17:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 11:17:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C@psi.ch> Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. 
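For scanning in bulk, the policy route mentioned above could look roughly like the following untested sketch. The 'P' and 'w' letters are the MISC_ATTRIBUTES flags quoted earlier in this thread (AFM-managed and being transferred); the file names and paths are made up:

  /* afm-pending.pol */
  RULE EXTERNAL LIST 'afmpending' EXEC ''
  RULE 'find-pending' LIST 'afmpending'
       SHOW(MISC_ATTRIBUTES)
       WHERE MISC_ATTRIBUTES LIKE '%P%'
         AND MISC_ATTRIBUTES LIKE '%w%'

  mmapplypolicy /gpfs/gpfs1/cachefileset -P afm-pending.pol -I defer -f /tmp/afm

With -I defer and -f, mmapplypolicy only writes the candidate list (here /tmp/afm.list.afmpending) and executes nothing, which keeps the load well below that of a full migration run.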
Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 22 10:18:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Sep 2020 10:18:05 +0100 Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Sep 22 11:47:46 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Sep 2020 10:47:46 +0000 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <1696CA15-9ACC-474F-99F5-DC031951A131@bham.ac.uk> We've always taken it to mean .. RHEL != CentOS 7.1 != 7.2 (though mostly down to the kernel). 
ppc64le != x86_64 But never differentiated by microarchitecture. That doesn't mean to say we are correct in these assumptions __ Simon ?On 22/09/2020, 10:17, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Tue Sep 22 14:50:34 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 22 Sep 2020 06:50:34 -0700 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <20200922135034.6be42ykveio654sm@thargelion> We've used the same built RPMs (generally built on Intel) on Intel and AMD x86-64 CPUs, and definitely have a mix of ISAs from both vendors, and haven't run into any problems. On Tue, Sep 22, 2020 at 10:18:05AM +0100, Jonathan Buzzard wrote: > > I have a question about using RPM's for the portability interface on > different CPU's. > > According to /usr/lpp/mmfs/src/README > > The generated RPM can ONLY be deployed to the machine with > identical architecture, distribution level, Linux kernel version > and GPFS version. > > So does this mean that if I have a heterogeneous cluster with some machines > on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all > using GPFS 5.0.5 I have to have different RPM's for the two CPU's? > > Or when it says "identical architecture" does it mean x86-64, ppc etc. and > not variations with the x86-64, ppc class? Assuming some minimum level is > met. > > Obviously the actual Linux kernel being stock RedHat would be the same on > every machine regardless of whether it's Skylake or Sandy Bridge, or even > for that matter an AMD processor. > > Consequently it seems strange that I would need different portability > interfaces. Would it help to generate the portability layer RPM's on a Sandy > Bridge machine and work no the presumption anything that runs on Sandy > Bridge will run on Skylake? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From truongv at us.ibm.com Tue Sep 22 16:47:09 2020 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 22 Sep 2020 11:47:09 -0400 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: You are correct, the "identical architecture" means the same machine hardware name as shown by the -m option of the uname command. Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/22/2020 05:18 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 104, Issue 23 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Checking if a AFM-managed file is still inflight (Dorigo Alvise (PSI)) 2. Portability interface (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Mon, 21 Sep 2020 11:17:35 +0000 From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C at psi.ch> Content-Type: text/plain; charset="utf-8" Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. 
~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200921/62d55b7e/attachment-0001.html > ------------------------------ Message: 2 Date: Tue, 22 Sep 2020 10:18:05 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6 at strath.ac.uk> Content-Type: text/plain; charset=utf-8; format=flowed I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. 
Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 23 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Sep 23 15:57:00 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Sep 2020 15:57:00 +0100 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: <678f9ba0-0e3a-5ea1-7aac-74def4046f6f@strath.ac.uk> On 22/09/2020 16:47, Truong Vu wrote: > You are correct, the "identical architecture" means the same machine > hardware name as shown by the -m option of the uname command. > Thanks for clearing that up. It just seemed something of a blindly obvious statement; surely nobody would expect an RPM for an Intel based machine to install on a PowerPC machine? that I though it might be referring to something else. I mean you can't actually install an x86_64 RPM on a ppc64le machine as the rpm command will bomb out telling you it is from an incompatible architecture if you try. It's why you have noarch packages which can be installed on anything. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 25 16:53:12 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Sep 2020 15:53:12 +0000 Subject: [gpfsug-discuss] SSUG::Digital: Persistent Storage for Kubernetes and OpenShift environments with Spectrum Scale Message-ID: <6e22851b42b54be8b6fa58376c738fea@bham.ac.uk> Episode 6 in the SSUG::Digital series will discuss the Spectrum Scale Container Storage Interface (CSI). CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes and OpenShift. Spectrum Scale CSI provides your containers fast access to files stored in Spectrum Scale with capabilities such as dynamic provisioning of volumes and read-write-many access. https://www.spectrumscaleug.org/event/ssugdigital-persistent-storage-for-containers-with-spectrum-scale/ SSUG Host: Bill Anderson Speakers: Smita Raut (IBM) Harald Seipp (IBM) Renar Grunenberg Simon Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2233 bytes Desc: not available URL: From joe at excelero.com Sat Sep 26 16:43:15 2020 From: joe at excelero.com (joe at excelero.com) Date: Sat, 26 Sep 2020 10:43:15 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 27 Message-ID: An HTML attachment was scrubbed... 
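Back on the portability-layer question from earlier in the month: given Tru's answer that "identical architecture" simply means the machine type from uname -m, one gplbin package per combination of distro level, kernel, Scale version and uname -m output is enough, regardless of Sandy Bridge, Skylake or AMD. A typical build on one representative node looks roughly like this (the resulting package name is from memory, so check what rpmbuild actually produces):

  uname -m      # e.g. x86_64, the only 'architecture' that matters here
  uname -r      # kernel release the module gets built against
  /usr/lpp/mmfs/bin/mmbuildgpl --build-package
  # expect something like gpfs.gplbin-<kernel-release>-<scale-version>.<arch>.rpm,
  # installable on any node with the same kernel, distro level and Scale version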
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From NISHAAN at za.ibm.com Mon Sep 28 09:09:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Mon, 28 Sep 2020 10:09:29 +0200 Subject: [gpfsug-discuss] Spectrum Scale Object - Need to support Amazon S3 DNS-style (Virtual hosted) Bucket Addressing Message-ID: Hi All I need to find out if anyone has successfully been able to get our Openstack Swift implementation of the object protocol to support the AWS DNS-style bucket naming convention. See here for an explanation https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html. AWS DNS-style bucket naming includes the bucket in the DNS name (eg. mybucket1.ssobject.mycompany.com). Openstack Swift supports PATH style bucket naming (eg. https://swift-cluster.example.com/v1/my_account/container/object). From what I can tell, I need to enable the domain_remap function in the proxy-server.conf file and also statically resolve the DNS name to a specific bucket by inserting the correct AUTH account. See here for the domain_remap middleware explanation.. https://docs.openstack.org/swift/latest/middleware.html And here for additional DNS work that needs to be done.. https://docs.ovh.com/gb/en/public-cloud/place-an-object-storage-container-behind-domain-name/ Obviously a wildcard DNS server is required for this which is easy enough to implement. However, the steps for Openstack Swift to support this are not very clear. I'm hoping someone else went through the pain of figuring this out already :) Any help with this would be greatly appreciated! Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18044196.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Sep 30 22:52:39 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 30 Sep 2020 23:52:39 +0200 Subject: [gpfsug-discuss] put_cred bug Message-ID: <20200930215239.GU1440758@ics.muni.cz> Hello, is this bug already resolved? https://access.redhat.com/solutions/3132971 I think I'm seeing it even with the latest gpfs 5.0.5.2 [1204205.886192] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff8821c16cdad0 with usage -530190256 maybe also related: [ 1384.404355] GPFS logAssertFailed: oiP->vinfoP->oiP == oiP file /project/spreltac505/build/rtac505s002a/src/avs/fs/mmfs/ts/kernext/gpfsops.C line 5168 [ 1397.657845] <5>kp 28416: cxiPanic: gpfsops.C:5168:0:0:FFFFFFFFC0D15240::oiP->vinfoP->oiP == oiP -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title
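On Nishaan's virtual-hosted-style S3 question further up: I have not done this on a Scale object cluster, so the following is only a sketch of the domain_remap piece he mentions, with his example domain filled in and the option names taken from the upstream Swift middleware documentation; double-check them against the Swift level shipped with your Scale release:

  [filter:domain_remap]
  use = egg:swift#domain_remap
  storage_domain = ssobject.mycompany.com
  path_root = v1
  reseller_prefixes = AUTH
  default_reseller_prefix = AUTH

domain_remap also has to appear in the proxy pipeline ahead of the auth middleware. On CES object the proxy-server.conf is kept in CCR, so the change should go through mmobj config change (roughly: mmobj config change --ccrfile proxy-server.conf --section filter:domain_remap --property storage_domain --value ssobject.mycompany.com) rather than being edited on a single protocol node. None of this is verified end to end, and the wildcard DNS entry still has to resolve the bucket host names to the CES IPs, so treat it as a starting point rather than a recipe.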
gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Thu Sep 3 09:10:38 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 3 Sep 2020 08:10:38 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> Message-ID: I don?t currently have any x86 based servers to do that kind of performance testing, But the PCI-Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are however limited storage controllers and Network adapters that can help utilise the full benefits of PCI-gen4. In terms of NSD architecture there are many variables that you also have to take into consideration. Are you looking at storage rich servers? Are you looking at SAN attached Flash Are you looking at scale ECE type deployment? As an IBM employee and someone familiar with ESS 5000, and the differences / benefits of the 5K architecture, Unless your planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules. I would seriously consider the ESS 5k over an x86 based NL-SAS storage topology Including AMD. Sent from my iPhone > On 3 Sep 2020, at 17:44, Giovanni Bracco wrote: > > ?OK from client side, but I would like to know if the same is also for > NSD servers with AMD EPYC, do they operate with good performance > compared to Intel CPUs? > > Giovanni > >> On 03/09/20 00:28, Andrew Beattie wrote: >> Giovanni, >> I have clients in Australia that are running AMD ROME processors in >> their Visualisation nodes connected to scale 5.0.4 clusters with no issues. >> Spectrum Scale doesn't differentiate between x86 processor technologies >> -- it only looks at x86_64 (OS support more than anything else) >> Andrew Beattie >> File and Object Storage Technical Specialist - A/NZ >> IBM Systems - Storage >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list , >> Frederick Stock >> Cc: >> Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what >> about AMD epyc support in GPFS? >> Date: Thu, Sep 3, 2020 7:29 AM >> I am curious to know about AMD epyc support by GPFS: what is the status? >> Giovanni Bracco >> >>> On 28/08/20 14:25, Frederick Stock wrote: >>> Not sure that Spectrum Scale has stated it supports the AMD epyc >> (Rome?) >>> processors. You may want to open a help case to determine the >> cause of >>> this problem. 
>>> Note that Spectrum Scale 4.2.x goes out of service on September >> 30, 2020 >>> so you may want to consider upgrading your cluster. And should Scale >>> officially support the AMD epyc processor it would not be on >> Scale 4.2.x. >>> >>> Fred >>> __________________________________________________ >>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >>> stockf at us.ibm.com >>> >>> ----- Original message ----- >>> From: Philipp Helo Rehs >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >> >>> Cc: >>> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck >>> Date: Fri, Aug 28, 2020 5:52 AM >>> Hello, >>> >>> we have a gpfs v4 cluster running with 4 nsds and i am trying >> to add >>> some clients: >>> >>> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 >>> >>> this commands hangs and do not finish >>> >>> When i look into the server, i can see the following >> processes which >>> never finish: >>> >>> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote >> checkNewClusterNode3 >>> lc/setupClient >>> >> %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: >>> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 >>> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl >> setupClient 2 >>> 21479 >>> >> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 >>> 0 1191 >>> root 38212 100 0.0 35544 5752 ? R 11:32 9:40 >>> /usr/lpp/mmfs/bin/tsgskkm store --cert >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off >>> >>> The node is an AMD epyc. >>> >>> Any idea what could cause the issue? >>> >>> ssh is possible in both directions and firewall is disabled. >>> >>> >>> Kind regards >>> >>> Philipp Rehs >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 4 08:56:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 4 Sep 2020 08:56:41 +0100 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? 
In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de>
Message-ID: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk>

On 02/09/2020 23:28, Andrew Beattie wrote:
> Giovanni, I have clients in Australia that are running AMD ROME
> processors in their Visualisation nodes connected to scale 5.0.4
> clusters with no issues. Spectrum Scale doesn't differentiate between
> x86 processor technologies -- it only looks at x86_64 (OS support
> more than anything else)

While true, bear in mind there are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-)

See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf

192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS. So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From S.J.Thompson at bham.ac.uk Fri Sep 4 10:02:29 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Fri, 4 Sep 2020 09:02:29 +0000
Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
In-Reply-To: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk>
References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk>
Message-ID: <8BA85682-C84F-4AF3-9A3D-6077E0715892@bham.ac.uk>

Of course, you might also be interested in our upcoming Webinar on 22nd September (which I haven't advertised yet):
https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/
... This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA.

Simon

On 04/09/2020, 08:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote:

On 02/09/2020 23:28, Andrew Beattie wrote:
> Giovanni, I have clients in Australia that are running AMD ROME
> processors in their Visualisation nodes connected to scale 5.0.4
> clusters with no issues. Spectrum Scale doesn't differentiate between
> x86 processor technologies -- it only looks at x86_64 (OS support
> more than anything else)

While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-)

See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf

192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oluwasijibomi.saula at ndsu.edu Fri Sep 4 17:03:17 2020
From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi)
Date: Fri, 4 Sep 2020 16:03:17 +0000
Subject: [gpfsug-discuss] Short-term Deactivation of NSD server
Message-ID: 

Hello GPFS Experts,

Say, is there any way to disable a particular NSD server outside of shutting down GPFS on the server, or shutting down the entire cluster and removing the NSD server from the list of NSD servers?

I'm finding that TSM activity on one of our NSD servers is stifling IO traffic through the server and resulting in intermittent latency for clients. If we could restrict cluster IO from going through this NSD server, we might be able to minimize or eliminate the latencies experienced by the clients while TSM activity is ongoing.

Thoughts?

Thanks,
Oluwasijibomi (Siji) Saula
HPC Systems Administrator / Information Technology
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
[cid:image001.gif at 01D57DE0.91C300C0]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From heinrich.billich at id.ethz.ch Mon Sep 7 14:29:59 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Mon, 7 Sep 2020 13:29:59 +0000
Subject: [gpfsug-discuss] Best of spectrum scale
Message-ID: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>

Hi,

just came across this:

/usr/lpp/mmfs/bin/mmafmctl fs3101 getstate
mmafmctl: Invalid current working directory detected: /tmp/A
The command may fail in an unexpected way. Processing continues ..

It's like a bus driver telling you that the brakes don't work and next speeding up even more. Honestly, why not just fail with a nice error message? Don't tell me some customer asked for this to make the command more resilient ...

Cheers,
Heiner
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knop at us.ibm.com Tue Sep 8 04:09:07 2020
From: knop at us.ibm.com (Felipe Knop)
Date: Tue, 8 Sep 2020 03:09:07 +0000
Subject: [gpfsug-discuss] Short-term Deactivation of NSD server
In-Reply-To: References: Message-ID: 

An HTML attachment was scrubbed...
URL: 

From scale at us.ibm.com Tue Sep 8 14:04:26 2020
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Tue, 8 Sep 2020 09:04:26 -0400
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
Message-ID: 

I think a better metaphor is that the bridge we just crossed has collapsed and as long as we do not need to cross it again our journey should reach its intended destination :-) As I understand it, the intent of this message is to alert the user (and our support teams) that the directory from which a command was executed no longer exists. Should that be of consequence to the execution of the command then failure is not unexpected, however, many commands do not make use of the current directory so they likely will succeed. If you consider the view point of a command failing because the working directory was removed, but not knowing that was the root cause, I think you can see why this message was added into the administration infrastructure.
It allows this odd failure scenario to be quickly recognized saving time for both the user and IBM support, in tracking down the root cause. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/07/2020 09:29 AM Subject: [EXTERNAL] [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 8 17:10:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Sep 2020 17:10:59 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) ?As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. ?Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. ?If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. ?It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. 
Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exited saying something like

Working directory vanished, exiting command

If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists.

The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights.

Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can quickly be put back with minimal interruption.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From scale at us.ibm.com Tue Sep 8 18:37:59 2020
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Tue, 8 Sep 2020 13:37:59 -0400
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
Message-ID: 

I think it is incorrect to assume that a command that continues after detecting the working directory has been removed is going to cause damage to the file system. Further, there is no a priori means to confirm if the lack of a working directory will cause the command to fail. I will agree that there may be admins that would prefer the command fail fast and allow them to restart the command anew, but I suspect there are admins that prefer the command press ahead in hopes that it can complete successfully and not require another execution. I'm sure we can conjure scenarios that support both points of view. Perhaps what is desired is a message that more clearly describes what is being undertaken. For example, "The current working directory, , no longer exists. Execution continues."

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: Jonathan Buzzard
To: gpfsug-discuss at spectrumscale.org
Date: 09/08/2020 12:10 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Best of spectrum scale
Sent by: gpfsug-discuss-bounces at spectrumscale.org

On 08/09/2020 14:04, IBM Spectrum Scale wrote:
> I think a better metaphor is that the bridge we just crossed has
> collapsed and as long as we do not need to cross it again our journey
> should reach its intended destination :-) As I understand the intent of
> this message is to alert the user (and our support teams) that the
> directory from which a command was executed no longer exist.
Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Tue Sep 8 23:46:08 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 8 Sep 2020 22:46:08 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Message-ID: Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. 
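If it helps anyone reproduce the numbers, here is a rough per-file variant of the same check. Treat it as an untested sketch: it assumes the default audit directory above and that rotated logs keep the sklm_audit.log prefix, so adjust the path and pattern to whatever your install actually uses.

#!/usr/bin/env bash
# Count lines containing the "does not trust" message in each SKLM audit log.
# Assumes the default SKLM audit directory and sklm_audit.log* naming; adjust as needed.
AUDIT_DIR=/opt/IBM/WebSphere/AppServer/products/sklm/logs/audit
MSG="Server does not trust the client certificate"

for f in "$AUDIT_DIR"/sklm_audit.log*; do
    [ -f "$f" ] || continue                      # skip if the glob did not match anything
    printf '%s: %s matching lines\n' "$f" "$(grep -c "$MSG" "$f")"
done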
I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL

Ed Wahl
Ohio Supercomputer Center

Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathan.buzzard at strath.ac.uk Wed Sep 9 12:02:53 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 9 Sep 2020 12:02:53 +0100
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
Message-ID: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk>

On 08/09/2020 18:37, IBM Spectrum Scale wrote:
> I think it is incorrect to assume that a command that continues
> after detecting the working directory has been removed is going to
> cause damage to the file system.

No I am not assuming it will cause damage. I am making the fairly reasonable assumption that any command which fails has an increased probability of causing damage to the file system over one that completes successfully.

> Further, there is no a priori means to confirm if the lack of a
> working directory will cause the command to fail.

Which is why bailing out is a more sensible default than ploughing on regardless.

> I will agree that there may be admins that would prefer the command
> fail fast and allow them to restart the command anew, but I suspect
> there are admins that prefer the command press ahead in hopes that
> it can complete successfully and not require another execution.

I am sure that there are inexperienced admins who have yet to be battle scarred who would want such reckless default behaviour. Pandering to their naivety is not a sensible approach IMHO.

The downside of a large file system (and production GPFS file systems tend to be large) going "puff" is so massive that the precautionary principle should apply. One wonders if we are seeing the difference between a US and European mindset here.

> I'm sure we can conjure scenarios that support both points of view.
> Perhaps what is desired is a message that more clearly describes what
> is being undertaken. For example, "The current working directory,
> , no longer exists. Execution continues."

That is what --force is for. If you are sufficiently reckless that you want something to continue in the event of a possible error you have the option to stick that on every command you run. Meanwhile the sane admins get a system that defaults to proceeding in the safest manner possible.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From skylar2 at uw.edu Wed Sep 9 15:04:27 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Wed, 9 Sep 2020 07:04:27 -0700
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk>
References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk>
Message-ID: <20200909140427.aint6lhyqgz7jlk7@thargelion>

On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote:
> On 08/09/2020 18:37, IBM Spectrum Scale wrote:
> > I think it is incorrect to assume that a command that continues
> > after detecting the working directory has been removed is going to
> > cause damage to the file system.
>
> No I am not assuming it will cause damage.
I am making the fairly reasonable > assumption that any command which fails has an increased probability of > causing damage to the file system over one that completes successfully. I think there is another angle here, which is that this command's output has the possibility of triggering an "oh ----" (fill in your preferred colorful metaphor here) moment, followed up by a panicked Ctrl-C. That reaction has the possibility of causing its own problems (i.e. not sure if mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). I'm with Jonathan here: the command should fail with an informative message, and the admin can correct the problem (just cd somewhere else). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From carlz at us.ibm.com Thu Sep 10 13:55:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 10 Sep 2020 12:55:25 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <188B4B5D-8670-4071-85E6-AF13E087E8E1@us.ibm.com> Jonathan, Can I ask you to file an RFE for this? And post the number here so others can vote for it if they wish. I don?t see any reason to defend an error message that is basically a shrug, and the fix should be straightforward (i.e. bail out). However, email threads tend to get lost, whereas RFEs are tracked, managed, and monitored (and there is now a new Systems-wide initiative to report and measure responsiveness.) Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1291474181] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From cblack at nygenome.org Thu Sep 10 16:55:46 2020 From: cblack at nygenome.org (Christopher Black) Date: Thu, 10 Sep 2020 15:55:46 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We run sklm for tape encryption for spectrum archive ? no encryption in gpfs filesystem on disk pools. We see no grep hits for ?not trust? in our last few sklm_audit.log files. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Tuesday, September 8, 2020 at 7:10 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. 
If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Fri Sep 11 15:25:55 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 11 Sep 2020 10:25:55 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <20200909140427.aint6lhyqgz7jlk7@thargelion> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> Message-ID: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> > On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote: > > On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>> I think it is incorrect to assume that a command that continues >>> after detecting the working directory has been removed is going to >>> cause damage to the file system. >> >> No I am not assuming it will cause damage. I am making the fairly reasonable >> assumption that any command which fails has an increased probability of >> causing damage to the file system over one that completes successfully. > > I think there is another angle here, which is that this command's output > has the possibility of triggering an "oh ----" (fill in your preferred > colorful metaphor here) moment, followed up by a panicked Ctrl-C. That > reaction has the possibility of causing its own problems (i.e. not sure if > mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). > I'm with Jonathan here: the command should fail with an informative > message, and the admin can correct the problem (just cd somewhere else). > I?m now (genuinely) curious as to what Spectrum Scale commands *actually* depend on the working directory existing and why. They shouldn?t depend on anything but existing well-known directories (logs, SDR, /tmp, et cetera) and any file or directories passed as arguments to the command. This is the Unix way. It seems like the *right* solution is to armor commands against doing something ?bad? if they lose a resource required to complete their task. If $PWD goes away because an admin?s home goes away in the middle of a long restripe, it?s better to complete the work and let them look in the logs. It's not Scale?s problem if something not affecting its work happens. Maybe I?ve got a blind spot here... 
-- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Sep 11 19:47:52 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 11 Sep 2020 14:47:52 -0400 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We have spectrum archive with encryption on disk and tape. We get maybe a 100 or so messages like this daily. It would be nice if message had some information about which client is the issue. We have had client certs expire in the past. The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me. I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: > Ran into something a good while back and I'm curious how many others this > affects. If folks with encryption enabled could run a quick word count on > their SKLM server and reply with a rough count I'd appreciate it. > I've gone round and round with IBM SKLM support over the last year on this > and it just has me wondering. This is one of those "morbidly curious about > making the sausage" things. > > Looking to see if this is a normal error message folks are seeing. Just > find your daily, rotating audit log and search it. I'll trust most folks > to figure this out, but let me know if you need help. > Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit > If you are on a normal linux box try something like: "locate > sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client > certificate" {} |wc " or whatever works for you. If your audit log is > fairly fresh, you might want to check the previous one. I do NOT need > exact information, just 'yeah we get 12million out a 500MB file' or ' we > get zero', or something like that. > > Mostly I'm curious if folks get zero, or a large number. I've got my > logs adjusted to 500MB and I get 8 digit numbers out of the previous log. > Yet things work perfectly. I've talked to two other SS sites I know the > admins personally, and they get larger numbers than I do. But it's such a > tiny sample size! LOL > > Ed Wahl > Ohio Supercomputer Center > > Apologies for the message formatting issues. Outlook fought tooth and > nail against sending it with the path as is, and kept breaking my > paragraphs. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Fri Sep 11 20:53:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 11 Sep 2020 20:53:45 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> Message-ID: <049f7e23-fb72-019f-a7b0-f9d0f1d189dc@strath.ac.uk> On 11/09/2020 15:25, Stephen Ulmer wrote: > >> On Sep 9, 2020, at 10:04 AM, Skylar Thompson > > wrote: >> >> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >>> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>>> I think it is incorrect to assume that a command that continues >>>> after detecting the working directory has been removed is going to >>>> cause damage to the file system. >>> >>> No I am not assuming it will cause damage. I am making the fairly >>> reasonable >>> assumption that any command which fails has an increased probability of >>> causing damage to the file system over one that completes successfully. >> >> I think there is another angle here, which is that this command's output >> has the possibility of triggering an "oh ----" (fill in your preferred >> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That >> reaction has the possibility of causing its own problems (i.e. not sure if >> mmafmctl touches CCR, but aborting it midway could leave CCR >> inconsistent). >> I'm with Jonathan here: the command should fail with an informative >> message, and the admin can correct the problem (just cd somewhere else). >> > > I?m now (genuinely) curious as to?what?Spectrum Scale commands > *actually* depend on the working directory existing and why. They > shouldn?t depend on anything but existing well-known directories (logs, > SDR, /tmp, et cetera) and any file or directories passed as arguments to > the command. This is the Unix way. > > It seems like the *right* solution is to armor commands against doing > something ?bad? if they lose a resource required to complete their task. > If $PWD goes away because an admin?s home goes away in the middle of a > long restripe, it?s better to complete the work and let them look in the > logs. It's not Scale?s problem if something not affecting its work happens. > > Maybe I?ve got a blind spot here... > This jogged my memory that best practice would be to have a call to chdir to set the working directory to "/" very early on. Before anything critical is started. I am 99.999% sure that its covered in Steven's (can't check as I am away for the weekend) so really there is no excuse. If / goes away then really really bad things have happened and it all sort of becomes moot anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Mon Sep 14 06:27:58 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 14 Sep 2020 13:27:58 +0800 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: Hi Eric, Please help me to understand your question. You have Spectrum Archive and Spectrum Scale in your system, and both of them are connected to IBM SKLM for encryption. Now you got lots of error/warning message from SKLM log. 
Now you want to understand which component, Scale or Archive, makes the SKLM print those error message, right? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 2020/09/12 02:47 Subject: [EXTERNAL] Re: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Sent by: gpfsug-discuss-bounces at spectrumscale.org We have spectrum archive with encryption on disk and tape.? ?We get maybe a 100 or so messages like this daily.? It would be nice if message had some information about which client is the issue. We have had client certs expire in the past.? The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me.? I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: ?Ran into something a good while back and I'm curious how many others this affects.?? If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate?it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering.? This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing.? Just find your daily, rotating audit log and search it.? I'll trust most folks to figure this?out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like:? "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc "? or whatever works for you.?? If your audit log is fairly fresh, you might want to check the previous one.?? I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. ?Mostly I'm curious if folks get zero, or a large number.? I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log.?? Yet things work perfectly.??? I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues.? Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Mon Sep 14 13:09:12 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 14 Sep 2020 14:09:12 +0200 Subject: [gpfsug-discuss] tsgskkm stuck In-Reply-To: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: On 8/28/20 11:43 AM, Philipp Helo Rehs wrote: > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off Judging from the command line tsgskkm will generate a certificate which normally involves a random number generator. If such a process hangs it might be due to a lack of entropy. So I suggest trying to generate some I/O on the node. Or run something like haveged (https://wiki.archlinux.org/index.php/Haveged). Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From S.J.Thompson at bham.ac.uk Fri Sep 18 11:52:51 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Sep 2020 10:52:51 +0000 Subject: [gpfsug-discuss] SSUG::Digital inode management, VCPU scaling and considerations for NUMA Message-ID: <5c6175fb949c4a30bcc94a2bbe986178@bham.ac.uk> Number 5 in the SSUG::Digital talks set takes place 22 September 2020 Spectrum Scale is a highly scalable, high-performance storage solution for file and object storage. It started more than 20 years ago as research project and is now used by thousands of customers. IBM continues to enhance Spectrum Scale, in response to recent hardware advancements and evolving workloads. This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2126 bytes Desc: not available URL: From joe at excelero.com Fri Sep 18 13:38:51 2020 From: joe at excelero.com (joe at excelero.com) Date: Fri, 18 Sep 2020 07:38:51 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 16 Message-ID: <92e304d9-de58-4bdc-aae5-95a9dfc03a44@Spark> An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:11:31 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:11:31 +0000 Subject: [gpfsug-discuss] CCR errors Message-ID: Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sat Sep 19 21:23:01 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 20:23:01 +0000 Subject: [gpfsug-discuss] CCR errors In-Reply-To: References: Message-ID: I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:52:19 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:52:19 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: Message-ID: Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sun Sep 20 00:45:41 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 23:45:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: , Message-ID: I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. 
If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sun Sep 20 00:59:28 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 23:59:28 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 19 In-Reply-To: References: Message-ID: Ryan, I appreciate your support - I finally got some on a WebEx now. I'll share any useful information I glean from the session. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 6:45:47 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Saula, Oluwasijibomi) 2. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:52:19 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="us-ascii" Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 23:45:41 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="utf-8" I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 19 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 09:35:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 08:35:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 21 10:55:29 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 21 Sep 2020 09:55:29 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 11:32:25 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 10:32:25 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. 
Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Sep 21 11:57:30 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 21 Sep 2020 16:27:30 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Checking_if_a_AFM-managed_file_is_stil?= =?utf-8?q?l=09inflight?= In-Reply-To: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. 
Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 12:17:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 11:17:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C@psi.ch> Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. 
Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 22 10:18:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Sep 2020 10:18:05 +0100 Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. 
Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Sep 22 11:47:46 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Sep 2020 10:47:46 +0000 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <1696CA15-9ACC-474F-99F5-DC031951A131@bham.ac.uk> We've always taken it to mean .. RHEL != CentOS 7.1 != 7.2 (though mostly down to the kernel). ppc64le != x86_64 But never differentiated by microarchitecture. That doesn't mean to say we are correct in these assumptions __ Simon ?On 22/09/2020, 10:17, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Tue Sep 22 14:50:34 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 22 Sep 2020 06:50:34 -0700 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <20200922135034.6be42ykveio654sm@thargelion> We've used the same built RPMs (generally built on Intel) on Intel and AMD x86-64 CPUs, and definitely have a mix of ISAs from both vendors, and haven't run into any problems. On Tue, Sep 22, 2020 at 10:18:05AM +0100, Jonathan Buzzard wrote: > > I have a question about using RPM's for the portability interface on > different CPU's. 
> > According to /usr/lpp/mmfs/src/README > > The generated RPM can ONLY be deployed to the machine with > identical architecture, distribution level, Linux kernel version > and GPFS version. > > So does this mean that if I have a heterogeneous cluster with some machines > on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all > using GPFS 5.0.5 I have to have different RPM's for the two CPU's? > > Or when it says "identical architecture" does it mean x86-64, ppc etc. and > not variations with the x86-64, ppc class? Assuming some minimum level is > met. > > Obviously the actual Linux kernel being stock RedHat would be the same on > every machine regardless of whether it's Skylake or Sandy Bridge, or even > for that matter an AMD processor. > > Consequently it seems strange that I would need different portability > interfaces. Would it help to generate the portability layer RPM's on a Sandy > Bridge machine and work no the presumption anything that runs on Sandy > Bridge will run on Skylake? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From truongv at us.ibm.com Tue Sep 22 16:47:09 2020 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 22 Sep 2020 11:47:09 -0400 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: You are correct, the "identical architecture" means the same machine hardware name as shown by the -m option of the uname command. Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/22/2020 05:18 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 104, Issue 23 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Checking if a AFM-managed file is still inflight (Dorigo Alvise (PSI)) 2. Portability interface (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Mon, 21 Sep 2020 11:17:35 +0000 From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C at psi.ch> Content-Type: text/plain; charset="utf-8" Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. 
You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200921/62d55b7e/attachment-0001.html > ------------------------------ Message: 2 Date: Tue, 22 Sep 2020 10:18:05 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6 at strath.ac.uk> Content-Type: text/plain; charset=utf-8; format=flowed I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 23 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Sep 23 15:57:00 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Sep 2020 15:57:00 +0100 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: <678f9ba0-0e3a-5ea1-7aac-74def4046f6f@strath.ac.uk> On 22/09/2020 16:47, Truong Vu wrote: > You are correct, the "identical architecture" means the same machine > hardware name as shown by the -m option of the uname command. > Thanks for clearing that up. It just seemed something of a blindly obvious statement; surely nobody would expect an RPM for an Intel based machine to install on a PowerPC machine? that I though it might be referring to something else. I mean you can't actually install an x86_64 RPM on a ppc64le machine as the rpm command will bomb out telling you it is from an incompatible architecture if you try. It's why you have noarch packages which can be installed on anything. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 25 16:53:12 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Sep 2020 15:53:12 +0000 Subject: [gpfsug-discuss] SSUG::Digital: Persistent Storage for Kubernetes and OpenShift environments with Spectrum Scale Message-ID: <6e22851b42b54be8b6fa58376c738fea@bham.ac.uk> Episode 6 in the SSUG::Digital series will discuss the Spectrum Scale Container Storage Interface (CSI). CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes and OpenShift. Spectrum Scale CSI provides your containers fast access to files stored in Spectrum Scale with capabilities such as dynamic provisioning of volumes and read-write-many access. https://www.spectrumscaleug.org/event/ssugdigital-persistent-storage-for-containers-with-spectrum-scale/ SSUG Host: Bill Anderson Speakers: Smita Raut (IBM) Harald Seipp (IBM) Renar Grunenberg Simon Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2233 bytes Desc: not available URL: From joe at excelero.com Sat Sep 26 16:43:15 2020 From: joe at excelero.com (joe at excelero.com) Date: Sat, 26 Sep 2020 10:43:15 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 27 Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From NISHAAN at za.ibm.com Mon Sep 28 09:09:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Mon, 28 Sep 2020 10:09:29 +0200 Subject: [gpfsug-discuss] Spectrum Scale Object - Need to support Amazon S3 DNS-style (Virtual hosted) Bucket Addressing Message-ID: Hi All I need to find out if anyone has successfully been able to get our Openstack Swift implementation of the object protocol to support the AWS DNS-syle bucket naming convention. See here for an explanation https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html. AWS DNS-style bucket naming includes the bucket in the DNS name (eg. mybucket1.ssobject.mycompany.com). Openstack Swift supports PATH style bucket naming (eg. https://swift-cluster.example.com/v1/my_account/ container/object). >From what I can tell, I need to enable the domain_remap function in the proxy-server.conf file and also statically resolve the DNS name to a specific bucket by inserting the correct AUTH account. See here for the domain_remap middleware explanation.. https://docs.openstack.org/swift/latest/middleware.html And here for additional DNS work that needs to be done.. https://docs.ovh.com/gb/en/public-cloud/place-an-object-storage-container-behind-domain-name/ Obviously a wildcard DNS server is required for this which is easy enough to implement. However, the steps for Openstack Swift to support this are not very clear. I'm hoping someone else went through the pain of figuring this out already :) Any help with this would be greatly appreciated! Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
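For reference, the Swift side of what Nishaan describes usually comes down to adding domain_remap early in the proxy pipeline (before the auth middleware) plus the wildcard DNS record he mentions. A hedged sketch of the proxy-server.conf fragment, reusing the ssobject.mycompany.com name from his example; option names are the standard Swift domain_remap ones, but the values are illustrative only:

# /etc/swift/proxy-server.conf (fragment)
# add "domain_remap" to the [pipeline:main] pipeline, before the auth filter
[filter:domain_remap]
use = egg:swift#domain_remap
storage_domain = ssobject.mycompany.com
path_root = v1
default_reseller_prefix = AUTH

With *.ssobject.mycompany.com resolving to the protocol nodes, a request for mybucket1.AUTH_account.ssobject.mycompany.com/object is rewritten to /v1/AUTH_account/mybucket1/object, which matches the "insert the correct AUTH account" step described above.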
Name: 18044196.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Sep 30 22:52:39 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 30 Sep 2020 23:52:39 +0200 Subject: [gpfsug-discuss] put_cred bug Message-ID: <20200930215239.GU1440758@ics.muni.cz> Hello, is this bug already resolved? https://access.redhat.com/solutions/3132971 I think, I'm seeing it even with latest gpfs 5.0.5.2 [1204205.886192] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff8821c16cdad0 with usage -530190256 maybe also related: [ 1384.404355] GPFS logAssertFailed: oiP->vinfoP->oiP == oiP file /project/spreltac505/build/rtac505s002a/src/avs/fs/mmfs/ts/kernext/gpfsops.C line 5168 [ 1397.657845] <5>kp 28416: cxiPanic: gpfsops.C:5168:0:0:FFFFFFFFC0D15240::oiP->vinfoP->oiP == oiP -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title
URL: From knop at us.ibm.com Thu Sep 3 05:00:38 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 3 Sep 2020 04:00:38 +0000 Subject: [gpfsug-discuss] data replicas and metadata space used In-Reply-To: <12198E19-AC4A-44C9-BE54-8482E85CE32B@nordquist.info> Message-ID: An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Sep 3 08:44:29 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 3 Sep 2020 09:44:29 +0200 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> OK from client side, but I would like to know if the same is also for NSD servers with AMD EPYC, do they operate with good performance compared to Intel CPUs? Giovanni On 03/09/20 00:28, Andrew Beattie wrote: > Giovanni, > I have clients in Australia that are running AMD ROME processors in > their Visualisation nodes connected to scale 5.0.4 clusters with no issues. > Spectrum Scale doesn't differentiate between x86 processor technologies > -- it only looks at x86_64 (OS support more than anything else) > Andrew Beattie > File and Object Storage Technical Specialist - A/NZ > IBM Systems - Storage > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list , > Frederick Stock > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what > about AMD epyc support in GPFS? > Date: Thu, Sep 3, 2020 7:29 AM > I am curious to know about AMD epyc support by GPFS: what is the status? > Giovanni Bracco > > On 28/08/20 14:25, Frederick Stock wrote: > > Not sure that Spectrum Scale has stated it supports the AMD epyc > (Rome?) > > processors.? You may want to open a help case to determine the > cause of > > this problem. > > Note that Spectrum Scale 4.2.x goes out of service on September > 30, 2020 > > so you may want to consider upgrading your cluster.? And should Scale > > officially support the AMD epyc processor it would not be on > Scale 4.2.x. > > > > Fred > > __________________________________________________ > > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > > stockf at us.ibm.com > > > > ? ? ----- Original message ----- > > ? ? From: Philipp Helo Rehs > > ? ? Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? To: gpfsug main discussion list > > > ? ? Cc: > > ? ? Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck > > ? ? Date: Fri, Aug 28, 2020 5:52 AM > > ? ? Hello, > > > > ? ? we have a gpfs v4 cluster running with 4 nsds and i am trying > to add > > ? ? some clients: > > > > ? ? mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 > > > > ? ? this commands hangs and do not finish > > > > ? ? When i look into the server, i can see the following > processes which > > ? ? never finish: > > > > ? ? root???? 38138? 0.0? 0.0 123048 10376 ???????? Ss?? 11:32?? 0:00 > > ? ? /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote > checkNewClusterNode3 > > ? ? lc/setupClient > > > %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: > > ? ? %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 > > ? ? root???? 38169? 0.0? 0.0 123564 10892 ???????? S??? 11:32?? 0:00 > > ? ? 
/usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl > setupClient 2 > > ? ? 21479 > > > 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 > > ? ? 0 1191 > > ? ? root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > > ? ? /usr/lpp/mmfs/bin/tsgskkm store --cert > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off > > > > ? ? The node is an AMD epyc. > > > > ? ? Any idea what could cause the issue? > > > > ? ? ssh is possible in both directions and firewall is disabled. > > > > > > ? ? Kind regards > > > > ? ? ??Philipp Rehs > > > > > > ? ? _______________________________________________ > > ? ? gpfsug-discuss mailing list > > ? ? gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Thu Sep 3 09:10:38 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 3 Sep 2020 08:10:38 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> Message-ID: I don?t currently have any x86 based servers to do that kind of performance testing, But the PCI-Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are however limited storage controllers and Network adapters that can help utilise the full benefits of PCI-gen4. In terms of NSD architecture there are many variables that you also have to take into consideration. Are you looking at storage rich servers? Are you looking at SAN attached Flash Are you looking at scale ECE type deployment? As an IBM employee and someone familiar with ESS 5000, and the differences / benefits of the 5K architecture, Unless your planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules. I would seriously consider the ESS 5k over an x86 based NL-SAS storage topology Including AMD. Sent from my iPhone > On 3 Sep 2020, at 17:44, Giovanni Bracco wrote: > > ?OK from client side, but I would like to know if the same is also for > NSD servers with AMD EPYC, do they operate with good performance > compared to Intel CPUs? > > Giovanni > >> On 03/09/20 00:28, Andrew Beattie wrote: >> Giovanni, >> I have clients in Australia that are running AMD ROME processors in >> their Visualisation nodes connected to scale 5.0.4 clusters with no issues. 
>> Spectrum Scale doesn't differentiate between x86 processor technologies >> -- it only looks at x86_64 (OS support more than anything else) >> Andrew Beattie >> File and Object Storage Technical Specialist - A/NZ >> IBM Systems - Storage >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list , >> Frederick Stock >> Cc: >> Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what >> about AMD epyc support in GPFS? >> Date: Thu, Sep 3, 2020 7:29 AM >> I am curious to know about AMD epyc support by GPFS: what is the status? >> Giovanni Bracco >> >>> On 28/08/20 14:25, Frederick Stock wrote: >>> Not sure that Spectrum Scale has stated it supports the AMD epyc >> (Rome?) >>> processors. You may want to open a help case to determine the >> cause of >>> this problem. >>> Note that Spectrum Scale 4.2.x goes out of service on September >> 30, 2020 >>> so you may want to consider upgrading your cluster. And should Scale >>> officially support the AMD epyc processor it would not be on >> Scale 4.2.x. >>> >>> Fred >>> __________________________________________________ >>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >>> stockf at us.ibm.com >>> >>> ----- Original message ----- >>> From: Philipp Helo Rehs >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >> >>> Cc: >>> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck >>> Date: Fri, Aug 28, 2020 5:52 AM >>> Hello, >>> >>> we have a gpfs v4 cluster running with 4 nsds and i am trying >> to add >>> some clients: >>> >>> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 >>> >>> this commands hangs and do not finish >>> >>> When i look into the server, i can see the following >> processes which >>> never finish: >>> >>> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote >> checkNewClusterNode3 >>> lc/setupClient >>> >> %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: >>> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 >>> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl >> setupClient 2 >>> 21479 >>> >> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 >>> 0 1191 >>> root 38212 100 0.0 35544 5752 ? R 11:32 9:40 >>> /usr/lpp/mmfs/bin/tsgskkm store --cert >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off >>> >>> The node is an AMD epyc. >>> >>> Any idea what could cause the issue? >>> >>> ssh is possible in both directions and firewall is disabled. 
>>> >>> >>> Kind regards >>> >>> Philipp Rehs >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 4 08:56:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 4 Sep 2020 08:56:41 +0100 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 4 10:02:29 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 4 Sep 2020 09:02:29 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> Message-ID: <8BA85682-C84F-4AF3-9A3D-6077E0715892@bham.ac.uk> Of course, you might also be interested in our upcoming Webinar on 22nd September (which I haven't advertised yet): https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ ... This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. 
Simon ?On 04/09/2020, 08:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oluwasijibomi.saula at ndsu.edu Fri Sep 4 17:03:17 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 4 Sep 2020 16:03:17 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server Message-ID: Hello GPFS Experts, Say, is there any way to disable a particular NSD server outside of shutting down GPFS on the server, or shutting down the entire cluster and removing the NSD server from the list of NSD servers? I'm finding that TSM activity on one of our NSD servers is stifling IO traffic through the server and resulting in intermittent latency for clients. If we could restrict cluster IO from going through this NSD server, we might be able to minimize or eliminate the latencies experienced by the clients while TSM activity is ongoing. Thoughts? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Sep 7 14:29:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 7 Sep 2020 13:29:59 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knop at us.ibm.com Tue Sep 8 04:09:07 2020 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 8 Sep 2020 03:09:07 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 8 14:04:26 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 09:04:26 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think a better metaphor is that the bridge we just crossed has collapsed and as long as we do not need to cross it again our journey should reach its intended destination :-) As I understand the intent of this message is to alert the user (and our support teams) that the directory from which a command was executed no longer exist. Should that be of consequence to the execution of the command then failure is not unexpected, however, many commands do not make use of the current directory so they likely will succeed. If you consider the view point of a command failing because the working directory was removed, but not knowing that was the root cause, I think you can see why this message was added into the administration infrastructure. It allows this odd failure scenario to be quickly recognized saving time for both the user and IBM support, in tracking down the root cause. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/07/2020 09:29 AM Subject: [EXTERNAL] [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Sep 8 17:10:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Sep 2020 17:10:59 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) ?As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. ?Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. ?If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. ?It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Sep 8 18:37:59 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 13:37:59 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think it is incorrect to assume that a command that continues after detecting the working directory has been removed is going to cause damage to the file system. Further, there is no a priori means to confirm if the lack of a working directory will cause the command to fail. I will agree that there may be admins that would prefer the command fail fast and allow them to restart the command anew, but I suspect there are admins that prefer the command press ahead in hopes that it can complete successfully and not require another execution. I'm sure we can conjure scenarios that support both points of view. Perhaps what is desired is a message that more clearly describes what is being undertaken. For example, "The current working directory, , no longer exists. Execution continues." 
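In the meantime the workaround on the admin side is simple, and worth spelling out. A minimal sketch, assuming only that the command needs some existing working directory (fs3101 is just the file system name from the earlier example, not anything special):

# Re-run the command from a directory that is guaranteed to exist, so the
# "Invalid current working directory" warning cannot be triggered.
cd / && /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate

Long-running wrapper scripts can do the same chdir to / once at the top, so a home or scratch directory that disappears part-way through cannot affect them.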
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 09/08/2020 12:10 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ewahl at osc.edu Tue Sep 8 23:46:08 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 8 Sep 2020 22:46:08 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Message-ID: Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Sep 9 12:02:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 9 Sep 2020 12:02:53 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> On 08/09/2020 18:37, IBM Spectrum Scale wrote: > I think it is incorrect to assume that a command that continues > after detecting the working directory has been removed is going to > cause damage to the file system. No I am not assuming it will cause damage. I am making the fairly reasonable assumption that any command which fails has an increased probability of causing damage to the file system over one that completes successfully. > Further, there is no a priori means to confirm if the lack of a > working directory will cause the command to fail. Which is why baling out is a more sensible default that ploughing on regardless. > I will agree that there may be admins that would prefer the command > fail fast and allow them to restart the command anew, but I suspect > there are admins that prefer the command press ahead in hopes that > it can complete successfully and not require another execution. I am sure that there are inexperienced admins who have yet to be battle scared that would want such reckless default behaviour. Pandering to their naivety is not a sensible approach IMHO. The downside if a large file system (and production GPFS file systems tend to be large) going "puff" is so massive that the precaution principle should apply. One wonders if we are seeing the difference between a US and European mindset here. 
> I'm sure we can conjure scenarios that support both points of view. > Perhaps what is desired is a message that more clearly describes what > is being undertaken. For example, "The current working directory, > , no longer exists. Execution continues." > That is what --force is for. If you are sufficiently reckless that you want something to continue in the event of a possible error you have the option to stick that on every command you run. Meanwhile the sane admins get a system that defaults to proceeding in the safer manner possible. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Sep 9 15:04:27 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Sep 2020 07:04:27 -0700 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> Message-ID: <20200909140427.aint6lhyqgz7jlk7@thargelion> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: > On 08/09/2020 18:37, IBM Spectrum Scale wrote: > > I think it is incorrect to assume that a command that continues > > after detecting the working directory has been removed is going to > > cause damage to the file system. > > No I am not assuming it will cause damage. I am making the fairly reasonable > assumption that any command which fails has an increased probability of > causing damage to the file system over one that completes successfully. I think there is another angle here, which is that this command's output has the possibility of triggering an "oh ----" (fill in your preferred colorful metaphor here) moment, followed up by a panicked Ctrl-C. That reaction has the possibility of causing its own problems (i.e. not sure if mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). I'm with Jonathan here: the command should fail with an informative message, and the admin can correct the problem (just cd somewhere else). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From carlz at us.ibm.com Thu Sep 10 13:55:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 10 Sep 2020 12:55:25 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <188B4B5D-8670-4071-85E6-AF13E087E8E1@us.ibm.com> Jonathan, Can I ask you to file an RFE for this? And post the number here so others can vote for it if they wish. I don?t see any reason to defend an error message that is basically a shrug, and the fix should be straightforward (i.e. bail out). However, email threads tend to get lost, whereas RFEs are tracked, managed, and monitored (and there is now a new Systems-wide initiative to report and measure responsiveness.) Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1291474181] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From cblack at nygenome.org Thu Sep 10 16:55:46 2020 From: cblack at nygenome.org (Christopher Black) Date: Thu, 10 Sep 2020 15:55:46 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We run sklm for tape encryption for spectrum archive ? no encryption in gpfs filesystem on disk pools. We see no grep hits for ?not trust? in our last few sklm_audit.log files. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Tuesday, September 8, 2020 at 7:10 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ulmer at ulmer.org Fri Sep 11 15:25:55 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 11 Sep 2020 10:25:55 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <20200909140427.aint6lhyqgz7jlk7@thargelion> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> Message-ID: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> > On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote: > > On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>> I think it is incorrect to assume that a command that continues >>> after detecting the working directory has been removed is going to >>> cause damage to the file system. >> >> No I am not assuming it will cause damage. I am making the fairly reasonable >> assumption that any command which fails has an increased probability of >> causing damage to the file system over one that completes successfully. > > I think there is another angle here, which is that this command's output > has the possibility of triggering an "oh ----" (fill in your preferred > colorful metaphor here) moment, followed up by a panicked Ctrl-C. That > reaction has the possibility of causing its own problems (i.e. not sure if > mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). > I'm with Jonathan here: the command should fail with an informative > message, and the admin can correct the problem (just cd somewhere else). > I?m now (genuinely) curious as to what Spectrum Scale commands *actually* depend on the working directory existing and why. They shouldn?t depend on anything but existing well-known directories (logs, SDR, /tmp, et cetera) and any file or directories passed as arguments to the command. This is the Unix way. It seems like the *right* solution is to armor commands against doing something ?bad? if they lose a resource required to complete their task. If $PWD goes away because an admin?s home goes away in the middle of a long restripe, it?s better to complete the work and let them look in the logs. It's not Scale?s problem if something not affecting its work happens. Maybe I?ve got a blind spot here... -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Sep 11 19:47:52 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 11 Sep 2020 14:47:52 -0400 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We have spectrum archive with encryption on disk and tape. We get maybe a 100 or so messages like this daily. It would be nice if message had some information about which client is the issue. We have had client certs expire in the past. The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me. I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: > Ran into something a good while back and I'm curious how many others this > affects. If folks with encryption enabled could run a quick word count on > their SKLM server and reply with a rough count I'd appreciate it. > I've gone round and round with IBM SKLM support over the last year on this > and it just has me wondering. 
This is one of those "morbidly curious about > making the sausage" things. > > Looking to see if this is a normal error message folks are seeing. Just > find your daily, rotating audit log and search it. I'll trust most folks > to figure this out, but let me know if you need help. > Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit > If you are on a normal linux box try something like: "locate > sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client > certificate" {} |wc " or whatever works for you. If your audit log is > fairly fresh, you might want to check the previous one. I do NOT need > exact information, just 'yeah we get 12million out a 500MB file' or ' we > get zero', or something like that. > > Mostly I'm curious if folks get zero, or a large number. I've got my > logs adjusted to 500MB and I get 8 digit numbers out of the previous log. > Yet things work perfectly. I've talked to two other SS sites I know the > admins personally, and they get larger numbers than I do. But it's such a > tiny sample size! LOL > > Ed Wahl > Ohio Supercomputer Center > > Apologies for the message formatting issues. Outlook fought tooth and > nail against sending it with the path as is, and kept breaking my > paragraphs. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 11 20:53:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 11 Sep 2020 20:53:45 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> Message-ID: <049f7e23-fb72-019f-a7b0-f9d0f1d189dc@strath.ac.uk> On 11/09/2020 15:25, Stephen Ulmer wrote: > >> On Sep 9, 2020, at 10:04 AM, Skylar Thompson > > wrote: >> >> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >>> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>>> I think it is incorrect to assume that a command that continues >>>> after detecting the working directory has been removed is going to >>>> cause damage to the file system. >>> >>> No I am not assuming it will cause damage. I am making the fairly >>> reasonable >>> assumption that any command which fails has an increased probability of >>> causing damage to the file system over one that completes successfully. >> >> I think there is another angle here, which is that this command's output >> has the possibility of triggering an "oh ----" (fill in your preferred >> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That >> reaction has the possibility of causing its own problems (i.e. not sure if >> mmafmctl touches CCR, but aborting it midway could leave CCR >> inconsistent). >> I'm with Jonathan here: the command should fail with an informative >> message, and the admin can correct the problem (just cd somewhere else). >> > > I?m now (genuinely) curious as to?what?Spectrum Scale commands > *actually* depend on the working directory existing and why. They > shouldn?t depend on anything but existing well-known directories (logs, > SDR, /tmp, et cetera) and any file or directories passed as arguments to > the command. 
This is the Unix way. > > It seems like the *right* solution is to armor commands against doing > something ?bad? if they lose a resource required to complete their task. > If $PWD goes away because an admin?s home goes away in the middle of a > long restripe, it?s better to complete the work and let them look in the > logs. It's not Scale?s problem if something not affecting its work happens. > > Maybe I?ve got a blind spot here... > This jogged my memory that best practice would be to have a call to chdir to set the working directory to "/" very early on. Before anything critical is started. I am 99.999% sure that its covered in Steven's (can't check as I am away for the weekend) so really there is no excuse. If / goes away then really really bad things have happened and it all sort of becomes moot anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Mon Sep 14 06:27:58 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 14 Sep 2020 13:27:58 +0800 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: Hi Eric, Please help me to understand your question. You have Spectrum Archive and Spectrum Scale in your system, and both of them are connected to IBM SKLM for encryption. Now you got lots of error/warning message from SKLM log. Now you want to understand which component, Scale or Archive, makes the SKLM print those error message, right? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 2020/09/12 02:47 Subject: [EXTERNAL] Re: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Sent by: gpfsug-discuss-bounces at spectrumscale.org We have spectrum archive with encryption on disk and tape.? ?We get maybe a 100 or so messages like this daily.? It would be nice if message had some information about which client is the issue. We have had client certs expire in the past.? The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me.? I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: ?Ran into something a good while back and I'm curious how many others this affects.?? If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate?it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering.? 
This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing.? Just find your daily, rotating audit log and search it.? I'll trust most folks to figure this?out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like:? "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc "? or whatever works for you.?? If your audit log is fairly fresh, you might want to check the previous one.?? I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. ?Mostly I'm curious if folks get zero, or a large number.? I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log.?? Yet things work perfectly.??? I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues.? Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Mon Sep 14 13:09:12 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 14 Sep 2020 14:09:12 +0200 Subject: [gpfsug-discuss] tsgskkm stuck In-Reply-To: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: On 8/28/20 11:43 AM, Philipp Helo Rehs wrote: > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off Judging from the command line tsgskkm will generate a certificate which normally involves a random number generator. If such a process hangs it might be due to a lack of entropy. So I suggest trying to generate some I/O on the node. Or run something like haveged (https://wiki.archlinux.org/index.php/Haveged). Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From S.J.Thompson at bham.ac.uk Fri Sep 18 11:52:51 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Sep 2020 10:52:51 +0000 Subject: [gpfsug-discuss] SSUG::Digital inode management, VCPU scaling and considerations for NUMA Message-ID: <5c6175fb949c4a30bcc94a2bbe986178@bham.ac.uk> Number 5 in the SSUG::Digital talks set takes place 22 September 2020 Spectrum Scale is a highly scalable, high-performance storage solution for file and object storage. It started more than 20 years ago as research project and is now used by thousands of customers. IBM continues to enhance Spectrum Scale, in response to recent hardware advancements and evolving workloads. This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2126 bytes Desc: not available URL: From joe at excelero.com Fri Sep 18 13:38:51 2020 From: joe at excelero.com (joe at excelero.com) Date: Fri, 18 Sep 2020 07:38:51 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 16 Message-ID: <92e304d9-de58-4bdc-aae5-95a9dfc03a44@Spark> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:11:31 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:11:31 +0000 Subject: [gpfsug-discuss] CCR errors Message-ID: Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sat Sep 19 21:23:01 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 20:23:01 +0000 Subject: [gpfsug-discuss] CCR errors In-Reply-To: References: Message-ID: I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:52:19 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:52:19 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: Message-ID: Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sun Sep 20 00:45:41 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 23:45:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: , Message-ID: I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sun Sep 20 00:59:28 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 23:59:28 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 19 In-Reply-To: References: Message-ID: Ryan, I appreciate your support - I finally got some on a WebEx now. I'll share any useful information I glean from the session. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 6:45:47 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Saula, Oluwasijibomi) 2. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:52:19 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="us-ascii" Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 23:45:41 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="utf-8" I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. 
Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 19 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alvise.dorigo at psi.ch Mon Sep 21 09:35:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 08:35:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr, which seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion? Thanks in advance, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 21 10:55:29 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 21 Sep 2020 09:55:29 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 11:32:25 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 10:32:25 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> The information reported by that command (on both the cache and home side) is size, blocks, block size, and times. I don't think that is enough to decide whether AFM has completed the transfer of a file. Did I possibly miss something else? It would be nice to have a flag (like the ones reported by the policy, flags 'P' (managed by AFM) and 'w' (being transferred)) that can help us to know whether AFM considers the file synced to home or not yet. Alvise From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 21 September 2020 11:55 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight are you looking for something like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr, which seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vpuvvada at in.ibm.com Mon Sep 21 11:57:30 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 21 Sep 2020 16:27:30 +0530 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: tspcacheutil: this command provides information about the file's replication state. You can also run a policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - the file is cached. For a directory, readdir+lookup is completed. hasState - the file/dir has remote attributes for the replication. local - the file/dir is local; it won't be replicated to home or revalidated with home. Create - the file/dir is newly created, not yet replicated. Setattr - attributes (chown, chmod, mmchattr, setACL, setEA, etc.) have been changed on the dir/file, but not replicated yet. Dirty - the file has been changed in the cache, but not replicated yet. For a directory this means that files inside it have been removed or renamed. Link - a hard link for the file has been created, but not replicated yet. Append - the file has been appended, but not replicated yet. For a directory this is the complete bit, which indicates that readdir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org The information reported by that command (on both the cache and home side) is size, blocks, block size, and times. I don't think that is enough to decide whether AFM has completed the transfer of a file. Did I possibly miss something else? It would be nice to have a flag (like the ones reported by the policy, flags 'P' (managed by AFM) and 'w' (being transferred)) that can help us to know whether AFM considers the file synced to home or not yet. Alvise From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 21 September 2020 11:55 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight are you looking for something like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr, which seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion?
Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 12:17:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 11:17:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C@psi.ch> Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. 
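Since the policy engine exposes exactly those letters through the MISC_ATTRIBUTES file attribute, a single deferred policy run can list everything AFM still has to play back to home, complementing the per-file tspcacheutil check described above. This is only a minimal sketch: the fileset path, list name and output prefix are placeholders, and the exact MISC_ATTRIBUTES letters should be confirmed against the policy documentation for the release in use:

    /* afm-pending.pol -- list files the policy engine reports as
       AFM-managed ('P') and still being transferred ('w'). */
    RULE 'afm-pending-ext' EXTERNAL LIST 'afm-pending' EXEC ''
    RULE 'afm-pending-list' LIST 'afm-pending'
         SHOW(VARCHAR(MISC_ATTRIBUTES))
         WHERE MISC_ATTRIBUTES LIKE '%P%' AND MISC_ATTRIBUTES LIKE '%w%'

    # Run it in defer mode so it only writes the candidate list
    # (e.g. /tmp/afm.list.afm-pending) instead of calling any external script:
    mmapplypolicy /gpfs/cachefs/fileset1 -P afm-pending.pol -I defer -f /tmp/afm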
Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 22 10:18:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Sep 2020 10:18:05 +0100 Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Sep 22 11:47:46 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Sep 2020 10:47:46 +0000 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <1696CA15-9ACC-474F-99F5-DC031951A131@bham.ac.uk> We've always taken it to mean .. RHEL != CentOS 7.1 != 7.2 (though mostly down to the kernel). 
ppc64le != x86_64 But never differentiated by microarchitecture. That doesn't mean to say we are correct in these assumptions. Simon On 22/09/2020, 10:17, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work on the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Tue Sep 22 14:50:34 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 22 Sep 2020 06:50:34 -0700 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <20200922135034.6be42ykveio654sm@thargelion> We've used the same built RPMs (generally built on Intel) on Intel and AMD x86-64 CPUs, and definitely have a mix of ISAs from both vendors, and haven't run into any problems. On Tue, Sep 22, 2020 at 10:18:05AM +0100, Jonathan Buzzard wrote: > > I have a question about using RPM's for the portability interface on > different CPU's. > > According to /usr/lpp/mmfs/src/README > > The generated RPM can ONLY be deployed to the machine with > identical architecture, distribution level, Linux kernel version > and GPFS version. > > So does this mean that if I have a heterogeneous cluster with some machines > on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all > using GPFS 5.0.5 I have to have different RPM's for the two CPU's? > > Or when it says "identical architecture" does it mean x86-64, ppc etc. and > not variations with the x86-64, ppc class? Assuming some minimum level is > met. > > Obviously the actual Linux kernel being stock RedHat would be the same on > every machine regardless of whether it's Skylake or Sandy Bridge, or even > for that matter an AMD processor. > > Consequently it seems strange that I would need different portability > interfaces. Would it help to generate the portability layer RPM's on a Sandy > Bridge machine and work on the presumption anything that runs on Sandy > Bridge will run on Skylake? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From truongv at us.ibm.com Tue Sep 22 16:47:09 2020 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 22 Sep 2020 11:47:09 -0400 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: You are correct, the "identical architecture" means the same machine hardware name as shown by the -m option of the uname command. Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/22/2020 05:18 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 104, Issue 23 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Checking if a AFM-managed file is still inflight (Dorigo Alvise (PSI)) 2. Portability interface (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Mon, 21 Sep 2020 11:17:35 +0000 From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C at psi.ch> Content-Type: text/plain; charset="utf-8" Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. 
~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200921/62d55b7e/attachment-0001.html > ------------------------------ Message: 2 Date: Tue, 22 Sep 2020 10:18:05 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6 at strath.ac.uk> Content-Type: text/plain; charset=utf-8; format=flowed I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. 
Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work on the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 23 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Sep 23 15:57:00 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Sep 2020 15:57:00 +0100 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: <678f9ba0-0e3a-5ea1-7aac-74def4046f6f@strath.ac.uk> On 22/09/2020 16:47, Truong Vu wrote: > You are correct, the "identical architecture" means the same machine > hardware name as shown by the -m option of the uname command. > Thanks for clearing that up. It just seemed something of a blindingly obvious statement (surely nobody would expect an RPM for an Intel based machine to install on a PowerPC machine?), so I thought it might be referring to something else. I mean you can't actually install an x86_64 RPM on a ppc64le machine as the rpm command will bomb out telling you it is from an incompatible architecture if you try. It's why you have noarch packages which can be installed on anything. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 25 16:53:12 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Sep 2020 15:53:12 +0000 Subject: [gpfsug-discuss] SSUG::Digital: Persistent Storage for Kubernetes and OpenShift environments with Spectrum Scale Message-ID: <6e22851b42b54be8b6fa58376c738fea@bham.ac.uk> Episode 6 in the SSUG::Digital series will discuss the Spectrum Scale Container Storage Interface (CSI). CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes and OpenShift. Spectrum Scale CSI provides your containers fast access to files stored in Spectrum Scale with capabilities such as dynamic provisioning of volumes and read-write-many access. https://www.spectrumscaleug.org/event/ssugdigital-persistent-storage-for-containers-with-spectrum-scale/ SSUG Host: Bill Anderson Speakers: Smita Raut (IBM) Harald Seipp (IBM) Renar Grunenberg Simon Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2233 bytes Desc: not available URL: From joe at excelero.com Sat Sep 26 16:43:15 2020 From: joe at excelero.com (joe at excelero.com) Date: Sat, 26 Sep 2020 10:43:15 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 27 Message-ID: An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From NISHAAN at za.ibm.com Mon Sep 28 09:09:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Mon, 28 Sep 2020 10:09:29 +0200 Subject: [gpfsug-discuss] Spectrum Scale Object - Need to support Amazon S3 DNS-style (Virtual hosted) Bucket Addressing Message-ID: Hi All I need to find out if anyone has successfully been able to get our OpenStack Swift implementation of the object protocol to support the AWS DNS-style bucket naming convention. See here for an explanation: https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html. AWS DNS-style bucket naming includes the bucket in the DNS name (eg. mybucket1.ssobject.mycompany.com). OpenStack Swift supports PATH-style bucket naming (eg. https://swift-cluster.example.com/v1/my_account/container/object). From what I can tell, I need to enable the domain_remap function in the proxy-server.conf file and also statically resolve the DNS name to a specific bucket by inserting the correct AUTH account. See here for the domain_remap middleware explanation: https://docs.openstack.org/swift/latest/middleware.html And here for additional DNS work that needs to be done: https://docs.ovh.com/gb/en/public-cloud/place-an-object-storage-container-behind-domain-name/ Obviously a wildcard DNS server is required for this, which is easy enough to implement. However, the steps for OpenStack Swift to support this are not very clear (a rough sketch of the relevant proxy-server.conf pieces is included at the end of this digest). I'm hoping someone else went through the pain of figuring this out already :) Any help with this would be greatly appreciated! Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18044196.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Sep 30 22:52:39 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 30 Sep 2020 23:52:39 +0200 Subject: [gpfsug-discuss] put_cred bug Message-ID: <20200930215239.GU1440758@ics.muni.cz> Hello, is this bug already resolved? https://access.redhat.com/solutions/3132971 I think I'm seeing it even with the latest GPFS 5.0.5.2: [1204205.886192] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff8821c16cdad0 with usage -530190256 maybe also related: [ 1384.404355] GPFS logAssertFailed: oiP->vinfoP->oiP == oiP file /project/spreltac505/build/rtac505s002a/src/avs/fs/mmfs/ts/kernext/gpfsops.C line 5168 [ 1397.657845] <5>kp 28416: cxiPanic: gpfsops.C:5168:0:0:FFFFFFFFC0D15240::oiP->vinfoP->oiP == oiP -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title
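Coming back to the DNS-style bucket question above, here is a rough sketch of the proxy-server.conf pieces that the OpenStack middleware documentation describes. The option names and values are assumptions to check against the Swift level actually shipped with the Scale object protocol (and the CES object configuration is normally managed through the Scale tooling rather than edited by hand), so this illustrates the shape of the change rather than a tested recipe:

    # proxy-server.conf (excerpt) -- cname_lookup and domain_remap have to be
    # added early in [pipeline:main], before authentication and the S3 middleware.

    [filter:cname_lookup]
    use = egg:swift#cname_lookup
    # the wildcard zone from the example above, *.ssobject.mycompany.com
    storage_domain = ssobject.mycompany.com

    [filter:domain_remap]
    use = egg:swift#domain_remap
    storage_domain = ssobject.mycompany.com
    path_root = v1
    # assumes the stock AUTH reseller prefix
    default_reseller_prefix = AUTH

The static mapping from the OVH write-up then becomes a DNS CNAME from the public bucket name to <container>.AUTH_<account>.ssobject.mycompany.com; cname_lookup follows the CNAME and domain_remap rewrites the request to the path style /v1/AUTH_<account>/<container>/<object> that Swift already understands.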