From chair at spectrumscale.org Tue Sep 1 09:17:12 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 01 Sep 2020 09:17:12 +0100 Subject: [gpfsug-discuss] Update: [NEW DATE] SSUG::Digital Update on File Create and MMAP performance Message-ID: <> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: meeting.ics Type: text/calendar Size: 2596 bytes Desc: not available URL: 
From joe at excelero.com Tue Sep 1 14:39:47 2020 From: joe at excelero.com (joe at excelero.com) Date: Tue, 1 Sep 2020 08:39:47 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 1 Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: 
From russell at nordquist.info Wed Sep 2 15:38:35 2020 From: russell at nordquist.info (Russell Nordquist) Date: Wed, 2 Sep 2020 10:38:35 -0400 Subject: [gpfsug-discuss] data replicas and metadata space used Message-ID: <12198E19-AC4A-44C9-BE54-8482E85CE32B@nordquist.info> 
I was reading this slide deck on GPFS metadata sizing and I ran across something: http://files.gpfsug.org/presentations/2016/south-bank/D2_P2_A_spectrum_scale_metadata_dark_V2a.pdf 
On slide 51 it says "Max replicas for Data multiplies the MD capacity used - reserves space in MD for the replicas even if no files are replicated!" This is something I did not realize - setting data replicas to 2 or even 3 consumes metadata space even if you are not using the data replicas. For metadata replicas it says unused replicas have little impact - great. 
I like to set data and metadata replicas to 3 when I make a filesystem, even when the initial replicas used are 1, since you never know what will change down the road. However this makes me wonder about that idea for the data replicas - it's really expensive metadata-space-wise. This information was written prior to GPFS v5, when the number of subblocks per block was still fixed at 32. Does it still hold true that unused data replicas use metadata space with v5? 
thanks Russell 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From giovanni.bracco at enea.it Wed Sep 2 21:28:55 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Wed, 2 Sep 2020 22:28:55 +0200 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> 
I am curious to know about AMD epyc support by GPFS: what is the status? 
Giovanni Bracco 
On 28/08/20 14:25, Frederick Stock wrote: > Not sure that Spectrum Scale has stated it supports the AMD epyc (Rome?) > processors. You may want to open a help case to determine the cause of > this problem. > Note that Spectrum Scale 4.2.x goes out of service on September 30, 2020 > so you may want to consider upgrading your cluster. And should Scale > officially support the AMD epyc processor it would not be on Scale 4.2.x.
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > ----- Original message ----- > From: Philipp Helo Rehs > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck > Date: Fri, Aug 28, 2020 5:52 AM > Hello, > > we have a gpfs v4 cluster running with 4 nsds and i am trying to add > some clients: > > mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 > > this commands hangs and do not finish > > When i look into the server, i can see the following processes which > never finish: > > root???? 38138? 0.0? 0.0 123048 10376 ???????? Ss?? 11:32?? 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote checkNewClusterNode3 > lc/setupClient > %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: > %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 > root???? 38169? 0.0? 0.0 123564 10892 ???????? S??? 11:32?? 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl setupClient 2 > 21479 > 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 > 0 1191 > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off > > The node is an AMD epyc. > > Any idea what could cause the issue? > > ssh is possible in both directions and firewall is disabled. > > > Kind regards > > ?Philipp Rehs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Wed Sep 2 23:28:34 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 2 Sep 2020 22:28:34 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it>, <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Sep 3 05:00:38 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 3 Sep 2020 04:00:38 +0000 Subject: [gpfsug-discuss] data replicas and metadata space used In-Reply-To: <12198E19-AC4A-44C9-BE54-8482E85CE32B@nordquist.info> Message-ID: An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Sep 3 08:44:29 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 3 Sep 2020 09:44:29 +0200 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? 
In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> OK from client side, but I would like to know if the same is also for NSD servers with AMD EPYC, do they operate with good performance compared to Intel CPUs? Giovanni On 03/09/20 00:28, Andrew Beattie wrote: > Giovanni, > I have clients in Australia that are running AMD ROME processors in > their Visualisation nodes connected to scale 5.0.4 clusters with no issues. > Spectrum Scale doesn't differentiate between x86 processor technologies > -- it only looks at x86_64 (OS support more than anything else) > Andrew Beattie > File and Object Storage Technical Specialist - A/NZ > IBM Systems - Storage > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list , > Frederick Stock > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what > about AMD epyc support in GPFS? > Date: Thu, Sep 3, 2020 7:29 AM > I am curious to know about AMD epyc support by GPFS: what is the status? > Giovanni Bracco > > On 28/08/20 14:25, Frederick Stock wrote: > > Not sure that Spectrum Scale has stated it supports the AMD epyc > (Rome?) > > processors.? You may want to open a help case to determine the > cause of > > this problem. > > Note that Spectrum Scale 4.2.x goes out of service on September > 30, 2020 > > so you may want to consider upgrading your cluster.? And should Scale > > officially support the AMD epyc processor it would not be on > Scale 4.2.x. > > > > Fred > > __________________________________________________ > > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > > stockf at us.ibm.com > > > > ? ? ----- Original message ----- > > ? ? From: Philipp Helo Rehs > > ? ? Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? To: gpfsug main discussion list > > > ? ? Cc: > > ? ? Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck > > ? ? Date: Fri, Aug 28, 2020 5:52 AM > > ? ? Hello, > > > > ? ? we have a gpfs v4 cluster running with 4 nsds and i am trying > to add > > ? ? some clients: > > > > ? ? mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 > > > > ? ? this commands hangs and do not finish > > > > ? ? When i look into the server, i can see the following > processes which > > ? ? never finish: > > > > ? ? root???? 38138? 0.0? 0.0 123048 10376 ???????? Ss?? 11:32?? 0:00 > > ? ? /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote > checkNewClusterNode3 > > ? ? lc/setupClient > > > %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: > > ? ? %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 > > ? ? root???? 38169? 0.0? 0.0 123564 10892 ???????? S??? 11:32?? 0:00 > > ? ? /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl > setupClient 2 > > ? ? 21479 > > > 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 > > ? ? 0 1191 > > ? ? root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > > ? ? /usr/lpp/mmfs/bin/tsgskkm store --cert > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > > ? ? 
/var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off > > > > ? ? The node is an AMD epyc. > > > > ? ? Any idea what could cause the issue? > > > > ? ? ssh is possible in both directions and firewall is disabled. > > > > > > ? ? Kind regards > > > > ? ? ??Philipp Rehs > > > > > > ? ? _______________________________________________ > > ? ? gpfsug-discuss mailing list > > ? ? gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Thu Sep 3 09:10:38 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 3 Sep 2020 08:10:38 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> Message-ID: I don?t currently have any x86 based servers to do that kind of performance testing, But the PCI-Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are however limited storage controllers and Network adapters that can help utilise the full benefits of PCI-gen4. In terms of NSD architecture there are many variables that you also have to take into consideration. Are you looking at storage rich servers? Are you looking at SAN attached Flash Are you looking at scale ECE type deployment? As an IBM employee and someone familiar with ESS 5000, and the differences / benefits of the 5K architecture, Unless your planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules. I would seriously consider the ESS 5k over an x86 based NL-SAS storage topology Including AMD. Sent from my iPhone > On 3 Sep 2020, at 17:44, Giovanni Bracco wrote: > > ?OK from client side, but I would like to know if the same is also for > NSD servers with AMD EPYC, do they operate with good performance > compared to Intel CPUs? > > Giovanni > >> On 03/09/20 00:28, Andrew Beattie wrote: >> Giovanni, >> I have clients in Australia that are running AMD ROME processors in >> their Visualisation nodes connected to scale 5.0.4 clusters with no issues. >> Spectrum Scale doesn't differentiate between x86 processor technologies >> -- it only looks at x86_64 (OS support more than anything else) >> Andrew Beattie >> File and Object Storage Technical Specialist - A/NZ >> IBM Systems - Storage >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list , >> Frederick Stock >> Cc: >> Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what >> about AMD epyc support in GPFS? 
>> Date: Thu, Sep 3, 2020 7:29 AM >> I am curious to know about AMD epyc support by GPFS: what is the status? >> Giovanni Bracco >> >>> On 28/08/20 14:25, Frederick Stock wrote: >>> Not sure that Spectrum Scale has stated it supports the AMD epyc >> (Rome?) >>> processors. You may want to open a help case to determine the >> cause of >>> this problem. >>> Note that Spectrum Scale 4.2.x goes out of service on September >> 30, 2020 >>> so you may want to consider upgrading your cluster. And should Scale >>> officially support the AMD epyc processor it would not be on >> Scale 4.2.x. >>> >>> Fred >>> __________________________________________________ >>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >>> stockf at us.ibm.com >>> >>> ----- Original message ----- >>> From: Philipp Helo Rehs >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >> >>> Cc: >>> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck >>> Date: Fri, Aug 28, 2020 5:52 AM >>> Hello, >>> >>> we have a gpfs v4 cluster running with 4 nsds and i am trying >> to add >>> some clients: >>> >>> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 >>> >>> this commands hangs and do not finish >>> >>> When i look into the server, i can see the following >> processes which >>> never finish: >>> >>> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote >> checkNewClusterNode3 >>> lc/setupClient >>> >> %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: >>> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 >>> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl >> setupClient 2 >>> 21479 >>> >> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 >>> 0 1191 >>> root 38212 100 0.0 35544 5752 ? R 11:32 9:40 >>> /usr/lpp/mmfs/bin/tsgskkm store --cert >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off >>> >>> The node is an AMD epyc. >>> >>> Any idea what could cause the issue? >>> >>> ssh is possible in both directions and firewall is disabled. 
>>> >>> >>> Kind regards >>> >>> Philipp Rehs >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 4 08:56:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 4 Sep 2020 08:56:41 +0100 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 4 10:02:29 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 4 Sep 2020 09:02:29 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> Message-ID: <8BA85682-C84F-4AF3-9A3D-6077E0715892@bham.ac.uk> Of course, you might also be interested in our upcoming Webinar on 22nd September (which I haven't advertised yet): https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ ... This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. 
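For anyone who wants a quick sanity check against the core-count limits Jonathan quotes below from the Scale FAQ, a rough sketch might look like the following; the 192/1536 figures come from that FAQ and may differ for your release, so treat this purely as an illustration:

cores=$(nproc --all)                      # logical cores visible to the OS
echo "node has ${cores} logical cores"
if [ "${cores}" -gt 192 ]; then           # 192 = largest tested core count per the FAQ
    echo "above the largest tested core count (192)"
fi
if [ "${cores}" -ge 1536 ]; then          # 1536 = hard limit per the FAQ
    echo "at or above the hard limit (1536)"
fi
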
Simon 
On 04/09/2020, 08:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: 
On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss 
From oluwasijibomi.saula at ndsu.edu Fri Sep 4 17:03:17 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 4 Sep 2020 16:03:17 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server Message-ID: 
Hello GPFS Experts, Say, is there any way to disable a particular NSD server outside of shutting down GPFS on the server, or shutting down the entire cluster and removing the NSD server from the list of NSD servers? I'm finding that TSM activity on one of our NSD servers is stifling IO traffic through the server and resulting in intermittent latency for clients. If we could restrict cluster IO from going through this NSD server, we might be able to minimize or eliminate the latencies experienced by the clients while TSM activity is ongoing. Thoughts? 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From heinrich.billich at id.ethz.ch Mon Sep 7 14:29:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 7 Sep 2020 13:29:59 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> 
Hi, 
just came across this: 
/usr/lpp/mmfs/bin/mmafmctl fs3101 getstate 
mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. 
It's like a bus driver telling you that the brakes don't work and then speeding up even more. Honestly, why not just fail with a nice error message? Don't tell me some customer asked for this to make the command more resilient ... 
Cheers, Heiner -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== 
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From knop at us.ibm.com Tue Sep 8 04:09:07 2020 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 8 Sep 2020 03:09:07 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 8 14:04:26 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 09:04:26 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think a better metaphor is that the bridge we just crossed has collapsed and as long as we do not need to cross it again our journey should reach its intended destination :-) As I understand the intent of this message is to alert the user (and our support teams) that the directory from which a command was executed no longer exist. Should that be of consequence to the execution of the command then failure is not unexpected, however, many commands do not make use of the current directory so they likely will succeed. If you consider the view point of a command failing because the working directory was removed, but not knowing that was the root cause, I think you can see why this message was added into the administration infrastructure. It allows this odd failure scenario to be quickly recognized saving time for both the user and IBM support, in tracking down the root cause. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/07/2020 09:29 AM Subject: [EXTERNAL] [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Sep 8 17:10:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Sep 2020 17:10:59 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) ?As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. ?Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. ?If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. ?It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Sep 8 18:37:59 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 13:37:59 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think it is incorrect to assume that a command that continues after detecting the working directory has been removed is going to cause damage to the file system. Further, there is no a priori means to confirm if the lack of a working directory will cause the command to fail. I will agree that there may be admins that would prefer the command fail fast and allow them to restart the command anew, but I suspect there are admins that prefer the command press ahead in hopes that it can complete successfully and not require another execution. I'm sure we can conjure scenarios that support both points of view. Perhaps what is desired is a message that more clearly describes what is being undertaken. For example, "The current working directory, , no longer exists. Execution continues." 
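Purely as an illustration of the two behaviours being debated (this is not the actual mm-command implementation, and "mmexample" is just a placeholder name), a shell wrapper might handle a vanished working directory roughly like this: 

# hypothetical sketch only; not taken from the Scale code base
if ! /bin/pwd > /dev/null 2>&1; then
    echo "mmexample: current working directory no longer exists." >&2
    # fail-fast behaviour: stop here and let the admin cd to a valid directory
    # exit 1
    # permissive behaviour: warn, move to a known-good directory, and continue
    cd / || exit 1
    echo "mmexample: execution continues from /." >&2
fi
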
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 09/08/2020 12:10 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From ewahl at osc.edu Tue Sep 8 23:46:08 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 8 Sep 2020 22:46:08 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Message-ID: 
Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. 
Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12 million out of a 500MB file' or 'we get zero', or something like that. 
Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8-digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL 
Ed Wahl Ohio Supercomputer Center 
Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From jonathan.buzzard at strath.ac.uk Wed Sep 9 12:02:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 9 Sep 2020 12:02:53 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> 
On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think it is incorrect to assume that a command that continues > after detecting the working directory has been removed is going to > cause damage to the file system. 
No I am not assuming it will cause damage. I am making the fairly reasonable assumption that any command which fails has an increased probability of causing damage to the file system over one that completes successfully. 
> Further, there is no a priori means to confirm if the lack of a > working directory will cause the command to fail. 
Which is why bailing out is a more sensible default than ploughing on regardless. 
> I will agree that there may be admins that would prefer the command > fail fast and allow them to restart the command anew, but I suspect > there are admins that prefer the command press ahead in hopes that > it can complete successfully and not require another execution. 
I am sure that there are inexperienced admins who have yet to be battle-scarred that would want such reckless default behaviour. Pandering to their naivety is not a sensible approach IMHO. 
The downside of a large file system (and production GPFS file systems tend to be large) going "puff" is so massive that the precautionary principle should apply. One wonders if we are seeing the difference between a US and European mindset here.
> I'm sure we can conjure scenarios that support both points of view. > Perhaps what is desired is a message that more clearly describes what > is being undertaken. For example, "The current working directory, > , no longer exists. Execution continues." > That is what --force is for. If you are sufficiently reckless that you want something to continue in the event of a possible error you have the option to stick that on every command you run. Meanwhile the sane admins get a system that defaults to proceeding in the safer manner possible. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Sep 9 15:04:27 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Sep 2020 07:04:27 -0700 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> Message-ID: <20200909140427.aint6lhyqgz7jlk7@thargelion> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: > On 08/09/2020 18:37, IBM Spectrum Scale wrote: > > I think it is incorrect to assume that a command that continues > > after detecting the working directory has been removed is going to > > cause damage to the file system. > > No I am not assuming it will cause damage. I am making the fairly reasonable > assumption that any command which fails has an increased probability of > causing damage to the file system over one that completes successfully. I think there is another angle here, which is that this command's output has the possibility of triggering an "oh ----" (fill in your preferred colorful metaphor here) moment, followed up by a panicked Ctrl-C. That reaction has the possibility of causing its own problems (i.e. not sure if mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). I'm with Jonathan here: the command should fail with an informative message, and the admin can correct the problem (just cd somewhere else). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From carlz at us.ibm.com Thu Sep 10 13:55:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 10 Sep 2020 12:55:25 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <188B4B5D-8670-4071-85E6-AF13E087E8E1@us.ibm.com> Jonathan, Can I ask you to file an RFE for this? And post the number here so others can vote for it if they wish. I don?t see any reason to defend an error message that is basically a shrug, and the fix should be straightforward (i.e. bail out). However, email threads tend to get lost, whereas RFEs are tracked, managed, and monitored (and there is now a new Systems-wide initiative to report and measure responsiveness.) Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1291474181] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From cblack at nygenome.org Thu Sep 10 16:55:46 2020 From: cblack at nygenome.org (Christopher Black) Date: Thu, 10 Sep 2020 15:55:46 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We run sklm for tape encryption for spectrum archive ? no encryption in gpfs filesystem on disk pools. We see no grep hits for ?not trust? in our last few sklm_audit.log files. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Tuesday, September 8, 2020 at 7:10 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From ulmer at ulmer.org Fri Sep 11 15:25:55 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 11 Sep 2020 10:25:55 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <20200909140427.aint6lhyqgz7jlk7@thargelion> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> Message-ID: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> 
> On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote: > > On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>> I think it is incorrect to assume that a command that continues >>> after detecting the working directory has been removed is going to >>> cause damage to the file system. >> >> No I am not assuming it will cause damage. I am making the fairly reasonable >> assumption that any command which fails has an increased probability of >> causing damage to the file system over one that completes successfully. > > I think there is another angle here, which is that this command's output > has the possibility of triggering an "oh ----" (fill in your preferred > colorful metaphor here) moment, followed up by a panicked Ctrl-C. That > reaction has the possibility of causing its own problems (i.e. not sure if > mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). > I'm with Jonathan here: the command should fail with an informative > message, and the admin can correct the problem (just cd somewhere else). > 
I'm now (genuinely) curious as to what Spectrum Scale commands *actually* depend on the working directory existing and why. They shouldn't depend on anything but existing well-known directories (logs, SDR, /tmp, et cetera) and any file or directories passed as arguments to the command. This is the Unix way. 
It seems like the *right* solution is to armor commands against doing something "bad" if they lose a resource required to complete their task. If $PWD goes away because an admin's home goes away in the middle of a long restripe, it's better to complete the work and let them look in the logs. It's not Scale's problem if something not affecting its work happens. 
Maybe I've got a blind spot here... 
-- Stephen 
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From eric.wonderley at vt.edu Fri Sep 11 19:47:52 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 11 Sep 2020 14:47:52 -0400 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: 
We have spectrum archive with encryption on disk and tape. We get maybe a hundred or so messages like this daily. It would be nice if the message had some information about which client is the issue. 
We have had client certs expire in the past. The root cause of the outage was a network outage...iirc the certs are cached in the clients. 
I don't know what to make of these messages...they do concern me. I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. 
Eric Wonderley 
On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: > Ran into something a good while back and I'm curious how many others this > affects. If folks with encryption enabled could run a quick word count on > their SKLM server and reply with a rough count I'd appreciate it. > I've gone round and round with IBM SKLM support over the last year on this > and it just has me wondering.
This is one of those "morbidly curious about > making the sausage" things. > > Looking to see if this is a normal error message folks are seeing. Just > find your daily, rotating audit log and search it. I'll trust most folks > to figure this out, but let me know if you need help. > Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit > If you are on a normal linux box try something like: "locate > sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client > certificate" {} |wc " or whatever works for you. If your audit log is > fairly fresh, you might want to check the previous one. I do NOT need > exact information, just 'yeah we get 12million out a 500MB file' or ' we > get zero', or something like that. > > Mostly I'm curious if folks get zero, or a large number. I've got my > logs adjusted to 500MB and I get 8 digit numbers out of the previous log. > Yet things work perfectly. I've talked to two other SS sites I know the > admins personally, and they get larger numbers than I do. But it's such a > tiny sample size! LOL > > Ed Wahl > Ohio Supercomputer Center > > Apologies for the message formatting issues. Outlook fought tooth and > nail against sending it with the path as is, and kept breaking my > paragraphs. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 11 20:53:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 11 Sep 2020 20:53:45 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> Message-ID: <049f7e23-fb72-019f-a7b0-f9d0f1d189dc@strath.ac.uk> On 11/09/2020 15:25, Stephen Ulmer wrote: > >> On Sep 9, 2020, at 10:04 AM, Skylar Thompson > > wrote: >> >> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >>> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>>> I think it is incorrect to assume that a command that continues >>>> after detecting the working directory has been removed is going to >>>> cause damage to the file system. >>> >>> No I am not assuming it will cause damage. I am making the fairly >>> reasonable >>> assumption that any command which fails has an increased probability of >>> causing damage to the file system over one that completes successfully. >> >> I think there is another angle here, which is that this command's output >> has the possibility of triggering an "oh ----" (fill in your preferred >> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That >> reaction has the possibility of causing its own problems (i.e. not sure if >> mmafmctl touches CCR, but aborting it midway could leave CCR >> inconsistent). >> I'm with Jonathan here: the command should fail with an informative >> message, and the admin can correct the problem (just cd somewhere else). >> > > I?m now (genuinely) curious as to?what?Spectrum Scale commands > *actually* depend on the working directory existing and why. They > shouldn?t depend on anything but existing well-known directories (logs, > SDR, /tmp, et cetera) and any file or directories passed as arguments to > the command. 
This is the Unix way. > > It seems like the *right* solution is to armor commands against doing > something ?bad? if they lose a resource required to complete their task. > If $PWD goes away because an admin?s home goes away in the middle of a > long restripe, it?s better to complete the work and let them look in the > logs. It's not Scale?s problem if something not affecting its work happens. > > Maybe I?ve got a blind spot here... > This jogged my memory that best practice would be to have a call to chdir to set the working directory to "/" very early on. Before anything critical is started. I am 99.999% sure that its covered in Steven's (can't check as I am away for the weekend) so really there is no excuse. If / goes away then really really bad things have happened and it all sort of becomes moot anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Mon Sep 14 06:27:58 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 14 Sep 2020 13:27:58 +0800 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: Hi Eric, Please help me to understand your question. You have Spectrum Archive and Spectrum Scale in your system, and both of them are connected to IBM SKLM for encryption. Now you got lots of error/warning message from SKLM log. Now you want to understand which component, Scale or Archive, makes the SKLM print those error message, right? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 2020/09/12 02:47 Subject: [EXTERNAL] Re: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Sent by: gpfsug-discuss-bounces at spectrumscale.org We have spectrum archive with encryption on disk and tape.? ?We get maybe a 100 or so messages like this daily.? It would be nice if message had some information about which client is the issue. We have had client certs expire in the past.? The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me.? I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: ?Ran into something a good while back and I'm curious how many others this affects.?? If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate?it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering.? 
This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing.? Just find your daily, rotating audit log and search it.? I'll trust most folks to figure this?out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like:? "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc "? or whatever works for you.?? If your audit log is fairly fresh, you might want to check the previous one.?? I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. ?Mostly I'm curious if folks get zero, or a large number.? I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log.?? Yet things work perfectly.??? I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues.? Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Mon Sep 14 13:09:12 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 14 Sep 2020 14:09:12 +0200 Subject: [gpfsug-discuss] tsgskkm stuck In-Reply-To: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: On 8/28/20 11:43 AM, Philipp Helo Rehs wrote: > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off Judging from the command line tsgskkm will generate a certificate which normally involves a random number generator. If such a process hangs it might be due to a lack of entropy. So I suggest trying to generate some I/O on the node. Or run something like haveged (https://wiki.archlinux.org/index.php/Haveged). Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From S.J.Thompson at bham.ac.uk Fri Sep 18 11:52:51 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Sep 2020 10:52:51 +0000 Subject: [gpfsug-discuss] SSUG::Digital inode management, VCPU scaling and considerations for NUMA Message-ID: <5c6175fb949c4a30bcc94a2bbe986178@bham.ac.uk> Number 5 in the SSUG::Digital talks set takes place 22 September 2020 Spectrum Scale is a highly scalable, high-performance storage solution for file and object storage. It started more than 20 years ago as research project and is now used by thousands of customers. IBM continues to enhance Spectrum Scale, in response to recent hardware advancements and evolving workloads. This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2126 bytes Desc: not available URL: From joe at excelero.com Fri Sep 18 13:38:51 2020 From: joe at excelero.com (joe at excelero.com) Date: Fri, 18 Sep 2020 07:38:51 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 16 Message-ID: <92e304d9-de58-4bdc-aae5-95a9dfc03a44@Spark> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:11:31 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:11:31 +0000 Subject: [gpfsug-discuss] CCR errors Message-ID: Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sat Sep 19 21:23:01 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 20:23:01 +0000 Subject: [gpfsug-discuss] CCR errors In-Reply-To: References: Message-ID: I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:52:19 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:52:19 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: Message-ID: Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sun Sep 20 00:45:41 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 23:45:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: , Message-ID: I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sun Sep 20 00:59:28 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 23:59:28 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 19 In-Reply-To: References: Message-ID: Ryan, I appreciate your support - I finally got some on a WebEx now. I'll share any useful information I glean from the session. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 6:45:47 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Saula, Oluwasijibomi) 2. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:52:19 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="us-ascii" Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 23:45:41 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="utf-8" I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. 
Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 19 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
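A side note on the CCR lock error in the thread above: before retrying mmchnode it is usually worth confirming that CCR actually has quorum and that no interrupted mm-command is still holding the configuration lock, since "Unable to obtain the GPFS configuration file lock" is, in my experience, most often a symptom of too few reachable quorum nodes. A minimal, untested check sequence might be:

  mmgetstate -a            # quorum nodes should show 'active'
  mmlscluster              # confirm which nodes carry the quorum designation
  mmhealth cluster show    # look for CCR / quorum related events
  mmccr check              # on a quorum node; I believe this reports CCR file and lock state

Only the first three are everyday commands; mmccr is the low-level tool the manual repair procedure uses, so treat that last line as an assumption to verify against the documentation for your level rather than a recommendation.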
URL: From alvise.dorigo at psi.ch Mon Sep 21 09:35:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 08:35:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 21 10:55:29 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 21 Sep 2020 09:55:29 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 11:32:25 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 10:32:25 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
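One more angle on the AFM question above: the fileset-level view from mmafmctl shows whether the cache still has anything queued for home at all. A rough sketch, with made-up file system and fileset names:

  mmafmctl fs1 getstate -j cachefileset

As far as I understand the output, a Queue Length of 0 with the fileset in Active state means everything queued so far has been flushed to home. It is only a coarse check and says nothing about one particular file, which is where the per-file checks discussed next come in.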
URL: From vpuvvada at in.ibm.com Mon Sep 21 11:57:30 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 21 Sep 2020 16:27:30 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Checking_if_a_AFM-managed_file_is_stil?= =?utf-8?q?l=09inflight?= In-Reply-To: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? 
Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 12:17:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 11:17:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C@psi.ch> Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. 
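For scanning in bulk, the policy route mentioned above could look roughly like the following untested sketch. The 'P' and 'w' letters are the MISC_ATTRIBUTES flags quoted earlier in this thread (AFM-managed and being transferred); the file names and paths are made up:

  /* afm-pending.pol */
  RULE EXTERNAL LIST 'afmpending' EXEC ''
  RULE 'find-pending' LIST 'afmpending'
       SHOW(MISC_ATTRIBUTES)
       WHERE MISC_ATTRIBUTES LIKE '%P%'
         AND MISC_ATTRIBUTES LIKE '%w%'

  mmapplypolicy /gpfs/gpfs1/cachefileset -P afm-pending.pol -I defer -f /tmp/afm

With -I defer and -f, mmapplypolicy only writes the candidate list (here /tmp/afm.list.afmpending) and executes nothing, which keeps the load well below that of a full migration run.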
Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 22 10:18:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Sep 2020 10:18:05 +0100 Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Sep 22 11:47:46 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Sep 2020 10:47:46 +0000 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <1696CA15-9ACC-474F-99F5-DC031951A131@bham.ac.uk> We've always taken it to mean .. RHEL != CentOS 7.1 != 7.2 (though mostly down to the kernel). 
ppc64le != x86_64 But never differentiated by microarchitecture. That doesn't mean to say we are correct in these assumptions __ Simon ?On 22/09/2020, 10:17, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Tue Sep 22 14:50:34 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 22 Sep 2020 06:50:34 -0700 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <20200922135034.6be42ykveio654sm@thargelion> We've used the same built RPMs (generally built on Intel) on Intel and AMD x86-64 CPUs, and definitely have a mix of ISAs from both vendors, and haven't run into any problems. On Tue, Sep 22, 2020 at 10:18:05AM +0100, Jonathan Buzzard wrote: > > I have a question about using RPM's for the portability interface on > different CPU's. > > According to /usr/lpp/mmfs/src/README > > The generated RPM can ONLY be deployed to the machine with > identical architecture, distribution level, Linux kernel version > and GPFS version. > > So does this mean that if I have a heterogeneous cluster with some machines > on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all > using GPFS 5.0.5 I have to have different RPM's for the two CPU's? > > Or when it says "identical architecture" does it mean x86-64, ppc etc. and > not variations with the x86-64, ppc class? Assuming some minimum level is > met. > > Obviously the actual Linux kernel being stock RedHat would be the same on > every machine regardless of whether it's Skylake or Sandy Bridge, or even > for that matter an AMD processor. > > Consequently it seems strange that I would need different portability > interfaces. Would it help to generate the portability layer RPM's on a Sandy > Bridge machine and work no the presumption anything that runs on Sandy > Bridge will run on Skylake? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From truongv at us.ibm.com Tue Sep 22 16:47:09 2020 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 22 Sep 2020 11:47:09 -0400 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: You are correct, the "identical architecture" means the same machine hardware name as shown by the -m option of the uname command. Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/22/2020 05:18 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 104, Issue 23 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Checking if a AFM-managed file is still inflight (Dorigo Alvise (PSI)) 2. Portability interface (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Mon, 21 Sep 2020 11:17:35 +0000 From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C at psi.ch> Content-Type: text/plain; charset="utf-8" Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. 
~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200921/62d55b7e/attachment-0001.html > ------------------------------ Message: 2 Date: Tue, 22 Sep 2020 10:18:05 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6 at strath.ac.uk> Content-Type: text/plain; charset=utf-8; format=flowed I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. 
Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 23 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Sep 23 15:57:00 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Sep 2020 15:57:00 +0100 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: <678f9ba0-0e3a-5ea1-7aac-74def4046f6f@strath.ac.uk> On 22/09/2020 16:47, Truong Vu wrote: > You are correct, the "identical architecture" means the same machine > hardware name as shown by the -m option of the uname command. > Thanks for clearing that up. It just seemed something of a blindly obvious statement; surely nobody would expect an RPM for an Intel based machine to install on a PowerPC machine? that I though it might be referring to something else. I mean you can't actually install an x86_64 RPM on a ppc64le machine as the rpm command will bomb out telling you it is from an incompatible architecture if you try. It's why you have noarch packages which can be installed on anything. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 25 16:53:12 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Sep 2020 15:53:12 +0000 Subject: [gpfsug-discuss] SSUG::Digital: Persistent Storage for Kubernetes and OpenShift environments with Spectrum Scale Message-ID: <6e22851b42b54be8b6fa58376c738fea@bham.ac.uk> Episode 6 in the SSUG::Digital series will discuss the Spectrum Scale Container Storage Interface (CSI). CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes and OpenShift. Spectrum Scale CSI provides your containers fast access to files stored in Spectrum Scale with capabilities such as dynamic provisioning of volumes and read-write-many access. https://www.spectrumscaleug.org/event/ssugdigital-persistent-storage-for-containers-with-spectrum-scale/ SSUG Host: Bill Anderson Speakers: Smita Raut (IBM) Harald Seipp (IBM) Renar Grunenberg Simon Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2233 bytes Desc: not available URL: From joe at excelero.com Sat Sep 26 16:43:15 2020 From: joe at excelero.com (joe at excelero.com) Date: Sat, 26 Sep 2020 10:43:15 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 27 Message-ID: An HTML attachment was scrubbed... 
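Back on the portability-layer question from earlier in the month: given Tru's answer that "identical architecture" simply means the machine type from uname -m, one gplbin package per combination of distro level, kernel, Scale version and uname -m output is enough, regardless of Sandy Bridge, Skylake or AMD. A typical build on one representative node looks roughly like this (the resulting package name is from memory, so check what rpmbuild actually produces):

  uname -m      # e.g. x86_64, the only 'architecture' that matters here
  uname -r      # kernel release the module gets built against
  /usr/lpp/mmfs/bin/mmbuildgpl --build-package
  # expect something like gpfs.gplbin-<kernel-release>-<scale-version>.<arch>.rpm,
  # installable on any node with the same kernel, distro level and Scale version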
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From NISHAAN at za.ibm.com Mon Sep 28 09:09:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Mon, 28 Sep 2020 10:09:29 +0200 Subject: [gpfsug-discuss] Spectrum Scale Object - Need to support Amazon S3 DNS-style (Virtual hosted) Bucket Addressing Message-ID: Hi All I need to find out if anyone has successfully been able to get our Openstack Swift implementation of the object protocol to support the AWS DNS-style bucket naming convention. See here for an explanation https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html. AWS DNS-style bucket naming includes the bucket in the DNS name (eg. mybucket1.ssobject.mycompany.com). Openstack Swift supports PATH style bucket naming (eg. https://swift-cluster.example.com/v1/my_account/container/object). From what I can tell, I need to enable the domain_remap function in the proxy-server.conf file and also statically resolve the DNS name to a specific bucket by inserting the correct AUTH account. See here for the domain_remap middleware explanation.. https://docs.openstack.org/swift/latest/middleware.html And here for additional DNS work that needs to be done.. https://docs.ovh.com/gb/en/public-cloud/place-an-object-storage-container-behind-domain-name/ Obviously a wildcard DNS server is required for this which is easy enough to implement. However, the steps for Openstack Swift to support this are not very clear. I'm hoping someone else went through the pain of figuring this out already :) Any help with this would be greatly appreciated! Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18044196.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Sep 30 22:52:39 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 30 Sep 2020 23:52:39 +0200 Subject: [gpfsug-discuss] put_cred bug Message-ID: <20200930215239.GU1440758@ics.muni.cz> Hello, is this bug already resolved? https://access.redhat.com/solutions/3132971 I think I'm seeing it even with the latest gpfs 5.0.5.2 [1204205.886192] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff8821c16cdad0 with usage -530190256 maybe also related: [ 1384.404355] GPFS logAssertFailed: oiP->vinfoP->oiP == oiP file /project/spreltac505/build/rtac505s002a/src/avs/fs/mmfs/ts/kernext/gpfsops.C line 5168 [ 1397.657845] <5>kp 28416: cxiPanic: gpfsops.C:5168:0:0:FFFFFFFFC0D15240::oiP->vinfoP->oiP == oiP -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title
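On Nishaan's virtual-hosted-style S3 question further up: I have not done this on a Scale object cluster, so the following is only a sketch of the domain_remap piece he mentions, with his example domain filled in and the option names taken from the upstream Swift middleware documentation; double-check them against the Swift level shipped with your Scale release:

  [filter:domain_remap]
  use = egg:swift#domain_remap
  storage_domain = ssobject.mycompany.com
  path_root = v1
  reseller_prefixes = AUTH
  default_reseller_prefix = AUTH

domain_remap also has to appear in the proxy pipeline ahead of the auth middleware. On CES object the proxy-server.conf is kept in CCR, so the change should go through mmobj config change (roughly: mmobj config change --ccrfile proxy-server.conf --section filter:domain_remap --property storage_domain --value ssobject.mycompany.com) rather than being edited on a single protocol node. None of this is verified end to end, and the wildcard DNS entry still has to resolve the bucket host names to the CES IPs, so treat it as a starting point rather than a recipe.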
gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Thu Sep 3 09:10:38 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 3 Sep 2020 08:10:38 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> Message-ID: I don?t currently have any x86 based servers to do that kind of performance testing, But the PCI-Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are however limited storage controllers and Network adapters that can help utilise the full benefits of PCI-gen4. In terms of NSD architecture there are many variables that you also have to take into consideration. Are you looking at storage rich servers? Are you looking at SAN attached Flash Are you looking at scale ECE type deployment? As an IBM employee and someone familiar with ESS 5000, and the differences / benefits of the 5K architecture, Unless your planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules. I would seriously consider the ESS 5k over an x86 based NL-SAS storage topology Including AMD. Sent from my iPhone > On 3 Sep 2020, at 17:44, Giovanni Bracco wrote: > > ?OK from client side, but I would like to know if the same is also for > NSD servers with AMD EPYC, do they operate with good performance > compared to Intel CPUs? > > Giovanni > >> On 03/09/20 00:28, Andrew Beattie wrote: >> Giovanni, >> I have clients in Australia that are running AMD ROME processors in >> their Visualisation nodes connected to scale 5.0.4 clusters with no issues. >> Spectrum Scale doesn't differentiate between x86 processor technologies >> -- it only looks at x86_64 (OS support more than anything else) >> Andrew Beattie >> File and Object Storage Technical Specialist - A/NZ >> IBM Systems - Storage >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list , >> Frederick Stock >> Cc: >> Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what >> about AMD epyc support in GPFS? >> Date: Thu, Sep 3, 2020 7:29 AM >> I am curious to know about AMD epyc support by GPFS: what is the status? >> Giovanni Bracco >> >>> On 28/08/20 14:25, Frederick Stock wrote: >>> Not sure that Spectrum Scale has stated it supports the AMD epyc >> (Rome?) >>> processors. You may want to open a help case to determine the >> cause of >>> this problem. 
>>> Note that Spectrum Scale 4.2.x goes out of service on September >> 30, 2020 >>> so you may want to consider upgrading your cluster. And should Scale >>> officially support the AMD epyc processor it would not be on >> Scale 4.2.x. >>> >>> Fred >>> __________________________________________________ >>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >>> stockf at us.ibm.com >>> >>> ----- Original message ----- >>> From: Philipp Helo Rehs >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >> >>> Cc: >>> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck >>> Date: Fri, Aug 28, 2020 5:52 AM >>> Hello, >>> >>> we have a gpfs v4 cluster running with 4 nsds and i am trying >> to add >>> some clients: >>> >>> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 >>> >>> this commands hangs and do not finish >>> >>> When i look into the server, i can see the following >> processes which >>> never finish: >>> >>> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote >> checkNewClusterNode3 >>> lc/setupClient >>> >> %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: >>> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 >>> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl >> setupClient 2 >>> 21479 >>> >> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 >>> 0 1191 >>> root 38212 100 0.0 35544 5752 ? R 11:32 9:40 >>> /usr/lpp/mmfs/bin/tsgskkm store --cert >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off >>> >>> The node is an AMD epyc. >>> >>> Any idea what could cause the issue? >>> >>> ssh is possible in both directions and firewall is disabled. >>> >>> >>> Kind regards >>> >>> Philipp Rehs >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 4 08:56:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 4 Sep 2020 08:56:41 +0100 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? 
In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de>
Message-ID: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk>

On 02/09/2020 23:28, Andrew Beattie wrote:
> Giovanni, I have clients in Australia that are running AMD ROME
> processors in their Visualisation nodes connected to scale 5.0.4
> clusters with no issues. Spectrum Scale doesn't differentiate between
> x86 processor technologies -- it only looks at x86_64 (OS support
> more than anything else)

While true, bear in mind there are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-)

See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf

192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS. So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From S.J.Thompson at bham.ac.uk Fri Sep 4 10:02:29 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Fri, 4 Sep 2020 09:02:29 +0000
Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
In-Reply-To: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk>
References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk>
Message-ID: <8BA85682-C84F-4AF3-9A3D-6077E0715892@bham.ac.uk>

Of course, you might also be interested in our upcoming Webinar on 22nd September (which I haven't advertised yet):
https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/
... This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA.

Simon

On 04/09/2020, 08:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote:

On 02/09/2020 23:28, Andrew Beattie wrote:
> Giovanni, I have clients in Australia that are running AMD ROME
> processors in their Visualisation nodes connected to scale 5.0.4
> clusters with no issues. Spectrum Scale doesn't differentiate between
> x86 processor technologies -- it only looks at x86_64 (OS support
> more than anything else)

While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-)

See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf

192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oluwasijibomi.saula at ndsu.edu Fri Sep 4 17:03:17 2020
From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi)
Date: Fri, 4 Sep 2020 16:03:17 +0000
Subject: [gpfsug-discuss] Short-term Deactivation of NSD server
Message-ID: 

Hello GPFS Experts,

Say, is there any way to disable a particular NSD server outside of shutting down GPFS on the server, or shutting down the entire cluster and removing the NSD server from the list of NSD servers?

I'm finding that TSM activity on one of our NSD servers is stifling IO traffic through the server and resulting in intermittent latency for clients. If we could restrict cluster IO from going through this NSD server, we might be able to minimize or eliminate the latencies experienced by the clients while TSM activity is ongoing.

Thoughts?

Thanks,
Oluwasijibomi (Siji) Saula
HPC Systems Administrator / Information Technology
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
[cid:image001.gif at 01D57DE0.91C300C0]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From heinrich.billich at id.ethz.ch Mon Sep 7 14:29:59 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Mon, 7 Sep 2020 13:29:59 +0000
Subject: [gpfsug-discuss] Best of spectrum scale
Message-ID: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>

Hi,

just came across this:

/usr/lpp/mmfs/bin/mmafmctl fs3101 getstate
mmafmctl: Invalid current working directory detected: /tmp/A
The command may fail in an unexpected way. Processing continues ..

It's like a bus driver telling you that the brakes don't work and next speeding up even more. Honestly, why not just fail with a nice error message? Don't tell me some customer asked for this to make the command more resilient ...

Cheers,
Heiner
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knop at us.ibm.com Tue Sep 8 04:09:07 2020
From: knop at us.ibm.com (Felipe Knop)
Date: Tue, 8 Sep 2020 03:09:07 +0000
Subject: [gpfsug-discuss] Short-term Deactivation of NSD server
In-Reply-To: References: Message-ID: 

An HTML attachment was scrubbed...
URL: 

From scale at us.ibm.com Tue Sep 8 14:04:26 2020
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Tue, 8 Sep 2020 09:04:26 -0400
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
Message-ID: 

I think a better metaphor is that the bridge we just crossed has collapsed and as long as we do not need to cross it again our journey should reach its intended destination :-) As I understand it, the intent of this message is to alert the user (and our support teams) that the directory from which a command was executed no longer exists. Should that be of consequence to the execution of the command then failure is not unexpected, however, many commands do not make use of the current directory so they likely will succeed. If you consider the view point of a command failing because the working directory was removed, but not knowing that was the root cause, I think you can see why this message was added into the administration infrastructure.
It allows this odd failure scenario to be quickly recognized saving time for both the user and IBM support, in tracking down the root cause. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/07/2020 09:29 AM Subject: [EXTERNAL] [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 8 17:10:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Sep 2020 17:10:59 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) ?As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. ?Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. ?If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. ?It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. 
Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exited saying something like

Working directory vanished, exiting command

If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists.

The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights.

Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can quickly be put back with minimal interruption.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From scale at us.ibm.com Tue Sep 8 18:37:59 2020
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Tue, 8 Sep 2020 13:37:59 -0400
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
Message-ID: 

I think it is incorrect to assume that a command that continues after detecting the working directory has been removed is going to cause damage to the file system. Further, there is no a priori means to confirm if the lack of a working directory will cause the command to fail. I will agree that there may be admins that would prefer the command fail fast and allow them to restart the command anew, but I suspect there are admins that prefer the command press ahead in hopes that it can complete successfully and not require another execution. I'm sure we can conjure scenarios that support both points of view. Perhaps what is desired is a message that more clearly describes what is being undertaken. For example, "The current working directory, , no longer exists. Execution continues."

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: Jonathan Buzzard
To: gpfsug-discuss at spectrumscale.org
Date: 09/08/2020 12:10 PM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Best of spectrum scale
Sent by: gpfsug-discuss-bounces at spectrumscale.org

On 08/09/2020 14:04, IBM Spectrum Scale wrote:
> I think a better metaphor is that the bridge we just crossed has
> collapsed and as long as we do not need to cross it again our journey
> should reach its intended destination :-) As I understand the intent of
> this message is to alert the user (and our support teams) that the
> directory from which a command was executed no longer exist.
Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Tue Sep 8 23:46:08 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 8 Sep 2020 22:46:08 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Message-ID: Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. 
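If it helps anyone reproduce the numbers, here is a rough per-file variant of the same check. Treat it as an untested sketch: it assumes the default audit directory above and that rotated logs keep the sklm_audit.log prefix, so adjust the path and pattern to whatever your install actually uses.

#!/usr/bin/env bash
# Count lines containing the "does not trust" message in each SKLM audit log.
# Assumes the default SKLM audit directory and sklm_audit.log* naming; adjust as needed.
AUDIT_DIR=/opt/IBM/WebSphere/AppServer/products/sklm/logs/audit
MSG="Server does not trust the client certificate"

for f in "$AUDIT_DIR"/sklm_audit.log*; do
    [ -f "$f" ] || continue                      # skip if the glob did not match anything
    printf '%s: %s matching lines\n' "$f" "$(grep -c "$MSG" "$f")"
done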
I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL

Ed Wahl
Ohio Supercomputer Center

Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathan.buzzard at strath.ac.uk Wed Sep 9 12:02:53 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 9 Sep 2020 12:02:53 +0100
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch>
Message-ID: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk>

On 08/09/2020 18:37, IBM Spectrum Scale wrote:
> I think it is incorrect to assume that a command that continues
> after detecting the working directory has been removed is going to
> cause damage to the file system.

No I am not assuming it will cause damage. I am making the fairly reasonable assumption that any command which fails has an increased probability of causing damage to the file system over one that completes successfully.

> Further, there is no a priori means to confirm if the lack of a
> working directory will cause the command to fail.

Which is why bailing out is a more sensible default than ploughing on regardless.

> I will agree that there may be admins that would prefer the command
> fail fast and allow them to restart the command anew, but I suspect
> there are admins that prefer the command press ahead in hopes that
> it can complete successfully and not require another execution.

I am sure that there are inexperienced admins who have yet to be battle scarred who would want such reckless default behaviour. Pandering to their naivety is not a sensible approach IMHO.

The downside of a large file system (and production GPFS file systems tend to be large) going "puff" is so massive that the precautionary principle should apply. One wonders if we are seeing the difference between a US and European mindset here.

> I'm sure we can conjure scenarios that support both points of view.
> Perhaps what is desired is a message that more clearly describes what
> is being undertaken. For example, "The current working directory,
> , no longer exists. Execution continues."

That is what --force is for. If you are sufficiently reckless that you want something to continue in the event of a possible error you have the option to stick that on every command you run. Meanwhile the sane admins get a system that defaults to proceeding in the safest manner possible.

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From skylar2 at uw.edu Wed Sep 9 15:04:27 2020
From: skylar2 at uw.edu (Skylar Thompson)
Date: Wed, 9 Sep 2020 07:04:27 -0700
Subject: [gpfsug-discuss] Best of spectrum scale
In-Reply-To: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk>
References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk>
Message-ID: <20200909140427.aint6lhyqgz7jlk7@thargelion>

On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote:
> On 08/09/2020 18:37, IBM Spectrum Scale wrote:
> > I think it is incorrect to assume that a command that continues
> > after detecting the working directory has been removed is going to
> > cause damage to the file system.
>
> No I am not assuming it will cause damage.
I am making the fairly reasonable > assumption that any command which fails has an increased probability of > causing damage to the file system over one that completes successfully. I think there is another angle here, which is that this command's output has the possibility of triggering an "oh ----" (fill in your preferred colorful metaphor here) moment, followed up by a panicked Ctrl-C. That reaction has the possibility of causing its own problems (i.e. not sure if mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). I'm with Jonathan here: the command should fail with an informative message, and the admin can correct the problem (just cd somewhere else). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From carlz at us.ibm.com Thu Sep 10 13:55:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 10 Sep 2020 12:55:25 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <188B4B5D-8670-4071-85E6-AF13E087E8E1@us.ibm.com> Jonathan, Can I ask you to file an RFE for this? And post the number here so others can vote for it if they wish. I don?t see any reason to defend an error message that is basically a shrug, and the fix should be straightforward (i.e. bail out). However, email threads tend to get lost, whereas RFEs are tracked, managed, and monitored (and there is now a new Systems-wide initiative to report and measure responsiveness.) Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1291474181] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From cblack at nygenome.org Thu Sep 10 16:55:46 2020 From: cblack at nygenome.org (Christopher Black) Date: Thu, 10 Sep 2020 15:55:46 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We run sklm for tape encryption for spectrum archive ? no encryption in gpfs filesystem on disk pools. We see no grep hits for ?not trust? in our last few sklm_audit.log files. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Tuesday, September 8, 2020 at 7:10 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. 
If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Fri Sep 11 15:25:55 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 11 Sep 2020 10:25:55 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <20200909140427.aint6lhyqgz7jlk7@thargelion> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> Message-ID: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> > On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote: > > On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>> I think it is incorrect to assume that a command that continues >>> after detecting the working directory has been removed is going to >>> cause damage to the file system. >> >> No I am not assuming it will cause damage. I am making the fairly reasonable >> assumption that any command which fails has an increased probability of >> causing damage to the file system over one that completes successfully. > > I think there is another angle here, which is that this command's output > has the possibility of triggering an "oh ----" (fill in your preferred > colorful metaphor here) moment, followed up by a panicked Ctrl-C. That > reaction has the possibility of causing its own problems (i.e. not sure if > mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). > I'm with Jonathan here: the command should fail with an informative > message, and the admin can correct the problem (just cd somewhere else). > I?m now (genuinely) curious as to what Spectrum Scale commands *actually* depend on the working directory existing and why. They shouldn?t depend on anything but existing well-known directories (logs, SDR, /tmp, et cetera) and any file or directories passed as arguments to the command. This is the Unix way. It seems like the *right* solution is to armor commands against doing something ?bad? if they lose a resource required to complete their task. If $PWD goes away because an admin?s home goes away in the middle of a long restripe, it?s better to complete the work and let them look in the logs. It's not Scale?s problem if something not affecting its work happens. Maybe I?ve got a blind spot here... 
-- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Sep 11 19:47:52 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 11 Sep 2020 14:47:52 -0400 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We have spectrum archive with encryption on disk and tape. We get maybe a 100 or so messages like this daily. It would be nice if message had some information about which client is the issue. We have had client certs expire in the past. The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me. I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: > Ran into something a good while back and I'm curious how many others this > affects. If folks with encryption enabled could run a quick word count on > their SKLM server and reply with a rough count I'd appreciate it. > I've gone round and round with IBM SKLM support over the last year on this > and it just has me wondering. This is one of those "morbidly curious about > making the sausage" things. > > Looking to see if this is a normal error message folks are seeing. Just > find your daily, rotating audit log and search it. I'll trust most folks > to figure this out, but let me know if you need help. > Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit > If you are on a normal linux box try something like: "locate > sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client > certificate" {} |wc " or whatever works for you. If your audit log is > fairly fresh, you might want to check the previous one. I do NOT need > exact information, just 'yeah we get 12million out a 500MB file' or ' we > get zero', or something like that. > > Mostly I'm curious if folks get zero, or a large number. I've got my > logs adjusted to 500MB and I get 8 digit numbers out of the previous log. > Yet things work perfectly. I've talked to two other SS sites I know the > admins personally, and they get larger numbers than I do. But it's such a > tiny sample size! LOL > > Ed Wahl > Ohio Supercomputer Center > > Apologies for the message formatting issues. Outlook fought tooth and > nail against sending it with the path as is, and kept breaking my > paragraphs. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Fri Sep 11 20:53:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 11 Sep 2020 20:53:45 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> Message-ID: <049f7e23-fb72-019f-a7b0-f9d0f1d189dc@strath.ac.uk> On 11/09/2020 15:25, Stephen Ulmer wrote: > >> On Sep 9, 2020, at 10:04 AM, Skylar Thompson > > wrote: >> >> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >>> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>>> I think it is incorrect to assume that a command that continues >>>> after detecting the working directory has been removed is going to >>>> cause damage to the file system. >>> >>> No I am not assuming it will cause damage. I am making the fairly >>> reasonable >>> assumption that any command which fails has an increased probability of >>> causing damage to the file system over one that completes successfully. >> >> I think there is another angle here, which is that this command's output >> has the possibility of triggering an "oh ----" (fill in your preferred >> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That >> reaction has the possibility of causing its own problems (i.e. not sure if >> mmafmctl touches CCR, but aborting it midway could leave CCR >> inconsistent). >> I'm with Jonathan here: the command should fail with an informative >> message, and the admin can correct the problem (just cd somewhere else). >> > > I?m now (genuinely) curious as to?what?Spectrum Scale commands > *actually* depend on the working directory existing and why. They > shouldn?t depend on anything but existing well-known directories (logs, > SDR, /tmp, et cetera) and any file or directories passed as arguments to > the command. This is the Unix way. > > It seems like the *right* solution is to armor commands against doing > something ?bad? if they lose a resource required to complete their task. > If $PWD goes away because an admin?s home goes away in the middle of a > long restripe, it?s better to complete the work and let them look in the > logs. It's not Scale?s problem if something not affecting its work happens. > > Maybe I?ve got a blind spot here... > This jogged my memory that best practice would be to have a call to chdir to set the working directory to "/" very early on. Before anything critical is started. I am 99.999% sure that its covered in Steven's (can't check as I am away for the weekend) so really there is no excuse. If / goes away then really really bad things have happened and it all sort of becomes moot anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Mon Sep 14 06:27:58 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 14 Sep 2020 13:27:58 +0800 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: Hi Eric, Please help me to understand your question. You have Spectrum Archive and Spectrum Scale in your system, and both of them are connected to IBM SKLM for encryption. Now you got lots of error/warning message from SKLM log. 
Now you want to understand which component, Scale or Archive, makes the SKLM print those error message, right? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 2020/09/12 02:47 Subject: [EXTERNAL] Re: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Sent by: gpfsug-discuss-bounces at spectrumscale.org We have spectrum archive with encryption on disk and tape.? ?We get maybe a 100 or so messages like this daily.? It would be nice if message had some information about which client is the issue. We have had client certs expire in the past.? The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me.? I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: ?Ran into something a good while back and I'm curious how many others this affects.?? If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate?it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering.? This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing.? Just find your daily, rotating audit log and search it.? I'll trust most folks to figure this?out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like:? "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc "? or whatever works for you.?? If your audit log is fairly fresh, you might want to check the previous one.?? I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. ?Mostly I'm curious if folks get zero, or a large number.? I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log.?? Yet things work perfectly.??? I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues.? Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Mon Sep 14 13:09:12 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 14 Sep 2020 14:09:12 +0200 Subject: [gpfsug-discuss] tsgskkm stuck In-Reply-To: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: On 8/28/20 11:43 AM, Philipp Helo Rehs wrote: > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off Judging from the command line tsgskkm will generate a certificate which normally involves a random number generator. If such a process hangs it might be due to a lack of entropy. So I suggest trying to generate some I/O on the node. Or run something like haveged (https://wiki.archlinux.org/index.php/Haveged). Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From S.J.Thompson at bham.ac.uk Fri Sep 18 11:52:51 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Sep 2020 10:52:51 +0000 Subject: [gpfsug-discuss] SSUG::Digital inode management, VCPU scaling and considerations for NUMA Message-ID: <5c6175fb949c4a30bcc94a2bbe986178@bham.ac.uk> Number 5 in the SSUG::Digital talks set takes place 22 September 2020 Spectrum Scale is a highly scalable, high-performance storage solution for file and object storage. It started more than 20 years ago as research project and is now used by thousands of customers. IBM continues to enhance Spectrum Scale, in response to recent hardware advancements and evolving workloads. This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2126 bytes Desc: not available URL: From joe at excelero.com Fri Sep 18 13:38:51 2020 From: joe at excelero.com (joe at excelero.com) Date: Fri, 18 Sep 2020 07:38:51 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 16 Message-ID: <92e304d9-de58-4bdc-aae5-95a9dfc03a44@Spark> An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:11:31 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:11:31 +0000 Subject: [gpfsug-discuss] CCR errors Message-ID: Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sat Sep 19 21:23:01 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 20:23:01 +0000 Subject: [gpfsug-discuss] CCR errors In-Reply-To: References: Message-ID: I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:52:19 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:52:19 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: Message-ID: Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sun Sep 20 00:45:41 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 23:45:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: , Message-ID: I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. 
If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sun Sep 20 00:59:28 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 23:59:28 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 19 In-Reply-To: References: Message-ID: Ryan, I appreciate your support - I finally got some on a WebEx now. I'll share any useful information I glean from the session. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 6:45:47 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Saula, Oluwasijibomi) 2. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:52:19 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="us-ascii" Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 23:45:41 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="utf-8" I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 19 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 09:35:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 08:35:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 21 10:55:29 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 21 Sep 2020 09:55:29 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 11:32:25 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 10:32:25 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. 
Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Sep 21 11:57:30 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 21 Sep 2020 16:27:30 +0530 Subject: [gpfsug-discuss] =?utf-8?q?Checking_if_a_AFM-managed_file_is_stil?= =?utf-8?q?l=09inflight?= In-Reply-To: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. 
Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 12:17:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 11:17:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C@psi.ch> Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. 
Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 22 10:18:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Sep 2020 10:18:05 +0100 Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. 
Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Sep 22 11:47:46 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Sep 2020 10:47:46 +0000 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <1696CA15-9ACC-474F-99F5-DC031951A131@bham.ac.uk> We've always taken it to mean .. RHEL != CentOS 7.1 != 7.2 (though mostly down to the kernel). ppc64le != x86_64 But never differentiated by microarchitecture. That doesn't mean to say we are correct in these assumptions __ Simon ?On 22/09/2020, 10:17, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Tue Sep 22 14:50:34 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 22 Sep 2020 06:50:34 -0700 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <20200922135034.6be42ykveio654sm@thargelion> We've used the same built RPMs (generally built on Intel) on Intel and AMD x86-64 CPUs, and definitely have a mix of ISAs from both vendors, and haven't run into any problems. On Tue, Sep 22, 2020 at 10:18:05AM +0100, Jonathan Buzzard wrote: > > I have a question about using RPM's for the portability interface on > different CPU's. 
> > According to /usr/lpp/mmfs/src/README > > The generated RPM can ONLY be deployed to the machine with > identical architecture, distribution level, Linux kernel version > and GPFS version. > > So does this mean that if I have a heterogeneous cluster with some machines > on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all > using GPFS 5.0.5 I have to have different RPM's for the two CPU's? > > Or when it says "identical architecture" does it mean x86-64, ppc etc. and > not variations with the x86-64, ppc class? Assuming some minimum level is > met. > > Obviously the actual Linux kernel being stock RedHat would be the same on > every machine regardless of whether it's Skylake or Sandy Bridge, or even > for that matter an AMD processor. > > Consequently it seems strange that I would need different portability > interfaces. Would it help to generate the portability layer RPM's on a Sandy > Bridge machine and work no the presumption anything that runs on Sandy > Bridge will run on Skylake? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From truongv at us.ibm.com Tue Sep 22 16:47:09 2020 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 22 Sep 2020 11:47:09 -0400 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: You are correct, the "identical architecture" means the same machine hardware name as shown by the -m option of the uname command. Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/22/2020 05:18 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 104, Issue 23 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Checking if a AFM-managed file is still inflight (Dorigo Alvise (PSI)) 2. Portability interface (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Mon, 21 Sep 2020 11:17:35 +0000 From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C at psi.ch> Content-Type: text/plain; charset="utf-8" Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. 
You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200921/62d55b7e/attachment-0001.html > ------------------------------ Message: 2 Date: Tue, 22 Sep 2020 10:18:05 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6 at strath.ac.uk> Content-Type: text/plain; charset=utf-8; format=flowed I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 23 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Sep 23 15:57:00 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Sep 2020 15:57:00 +0100 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: <678f9ba0-0e3a-5ea1-7aac-74def4046f6f@strath.ac.uk> On 22/09/2020 16:47, Truong Vu wrote: > You are correct, the "identical architecture" means the same machine > hardware name as shown by the -m option of the uname command. > Thanks for clearing that up. It just seemed something of a blindly obvious statement; surely nobody would expect an RPM for an Intel based machine to install on a PowerPC machine? that I though it might be referring to something else. I mean you can't actually install an x86_64 RPM on a ppc64le machine as the rpm command will bomb out telling you it is from an incompatible architecture if you try. It's why you have noarch packages which can be installed on anything. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 25 16:53:12 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Sep 2020 15:53:12 +0000 Subject: [gpfsug-discuss] SSUG::Digital: Persistent Storage for Kubernetes and OpenShift environments with Spectrum Scale Message-ID: <6e22851b42b54be8b6fa58376c738fea@bham.ac.uk> Episode 6 in the SSUG::Digital series will discuss the Spectrum Scale Container Storage Interface (CSI). CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes and OpenShift. Spectrum Scale CSI provides your containers fast access to files stored in Spectrum Scale with capabilities such as dynamic provisioning of volumes and read-write-many access. https://www.spectrumscaleug.org/event/ssugdigital-persistent-storage-for-containers-with-spectrum-scale/ SSUG Host: Bill Anderson Speakers: Smita Raut (IBM) Harald Seipp (IBM) Renar Grunenberg Simon Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2233 bytes Desc: not available URL: From joe at excelero.com Sat Sep 26 16:43:15 2020 From: joe at excelero.com (joe at excelero.com) Date: Sat, 26 Sep 2020 10:43:15 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 27 Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From NISHAAN at za.ibm.com Mon Sep 28 09:09:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Mon, 28 Sep 2020 10:09:29 +0200 Subject: [gpfsug-discuss] Spectrum Scale Object - Need to support Amazon S3 DNS-style (Virtual hosted) Bucket Addressing Message-ID: Hi All I need to find out if anyone has successfully been able to get our Openstack Swift implementation of the object protocol to support the AWS DNS-syle bucket naming convention. See here for an explanation https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html. AWS DNS-style bucket naming includes the bucket in the DNS name (eg. mybucket1.ssobject.mycompany.com). Openstack Swift supports PATH style bucket naming (eg. https://swift-cluster.example.com/v1/my_account/ container/object). >From what I can tell, I need to enable the domain_remap function in the proxy-server.conf file and also statically resolve the DNS name to a specific bucket by inserting the correct AUTH account. See here for the domain_remap middleware explanation.. https://docs.openstack.org/swift/latest/middleware.html And here for additional DNS work that needs to be done.. https://docs.ovh.com/gb/en/public-cloud/place-an-object-storage-container-behind-domain-name/ Obviously a wildcard DNS server is required for this which is easy enough to implement. However, the steps for Openstack Swift to support this are not very clear. I'm hoping someone else went through the pain of figuring this out already :) Any help with this would be greatly appreciated! Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
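For reference, the Swift side of what Nishaan describes usually comes down to adding domain_remap early in the proxy pipeline (before the auth middleware) plus the wildcard DNS record he mentions. A hedged sketch of the proxy-server.conf fragment, reusing the ssobject.mycompany.com name from his example; option names are the standard Swift domain_remap ones, but the values are illustrative only:

# /etc/swift/proxy-server.conf (fragment)
# add "domain_remap" to the [pipeline:main] pipeline, before the auth filter
[filter:domain_remap]
use = egg:swift#domain_remap
storage_domain = ssobject.mycompany.com
path_root = v1
default_reseller_prefix = AUTH

With *.ssobject.mycompany.com resolving to the protocol nodes, a request for mybucket1.AUTH_account.ssobject.mycompany.com/object is rewritten to /v1/AUTH_account/mybucket1/object, which matches the "insert the correct AUTH account" step described above.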
Name: 18044196.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Sep 30 22:52:39 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 30 Sep 2020 23:52:39 +0200 Subject: [gpfsug-discuss] put_cred bug Message-ID: <20200930215239.GU1440758@ics.muni.cz> Hello, is this bug already resolved? https://access.redhat.com/solutions/3132971 I think, I'm seeing it even with latest gpfs 5.0.5.2 [1204205.886192] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff8821c16cdad0 with usage -530190256 maybe also related: [ 1384.404355] GPFS logAssertFailed: oiP->vinfoP->oiP == oiP file /project/spreltac505/build/rtac505s002a/src/avs/fs/mmfs/ts/kernext/gpfsops.C line 5168 [ 1397.657845] <5>kp 28416: cxiPanic: gpfsops.C:5168:0:0:FFFFFFFFC0D15240::oiP->vinfoP->oiP == oiP -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title
URL: From knop at us.ibm.com Thu Sep 3 05:00:38 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 3 Sep 2020 04:00:38 +0000 Subject: [gpfsug-discuss] data replicas and metadata space used In-Reply-To: <12198E19-AC4A-44C9-BE54-8482E85CE32B@nordquist.info> Message-ID: An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Sep 3 08:44:29 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 3 Sep 2020 09:44:29 +0200 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> OK from client side, but I would like to know if the same is also for NSD servers with AMD EPYC, do they operate with good performance compared to Intel CPUs? Giovanni On 03/09/20 00:28, Andrew Beattie wrote: > Giovanni, > I have clients in Australia that are running AMD ROME processors in > their Visualisation nodes connected to scale 5.0.4 clusters with no issues. > Spectrum Scale doesn't differentiate between x86 processor technologies > -- it only looks at x86_64 (OS support more than anything else) > Andrew Beattie > File and Object Storage Technical Specialist - A/NZ > IBM Systems - Storage > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list , > Frederick Stock > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what > about AMD epyc support in GPFS? > Date: Thu, Sep 3, 2020 7:29 AM > I am curious to know about AMD epyc support by GPFS: what is the status? > Giovanni Bracco > > On 28/08/20 14:25, Frederick Stock wrote: > > Not sure that Spectrum Scale has stated it supports the AMD epyc > (Rome?) > > processors.? You may want to open a help case to determine the > cause of > > this problem. > > Note that Spectrum Scale 4.2.x goes out of service on September > 30, 2020 > > so you may want to consider upgrading your cluster.? And should Scale > > officially support the AMD epyc processor it would not be on > Scale 4.2.x. > > > > Fred > > __________________________________________________ > > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > > stockf at us.ibm.com > > > > ? ? ----- Original message ----- > > ? ? From: Philipp Helo Rehs > > ? ? Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? To: gpfsug main discussion list > > > ? ? Cc: > > ? ? Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck > > ? ? Date: Fri, Aug 28, 2020 5:52 AM > > ? ? Hello, > > > > ? ? we have a gpfs v4 cluster running with 4 nsds and i am trying > to add > > ? ? some clients: > > > > ? ? mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 > > > > ? ? this commands hangs and do not finish > > > > ? ? When i look into the server, i can see the following > processes which > > ? ? never finish: > > > > ? ? root???? 38138? 0.0? 0.0 123048 10376 ???????? Ss?? 11:32?? 0:00 > > ? ? /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote > checkNewClusterNode3 > > ? ? lc/setupClient > > > %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: > > ? ? %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 > > ? ? root???? 38169? 0.0? 0.0 123564 10892 ???????? S??? 11:32?? 0:00 > > ? ? 
/usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl > setupClient 2 > > ? ? 21479 > > > 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 > > ? ? 0 1191 > > ? ? root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > > ? ? /usr/lpp/mmfs/bin/tsgskkm store --cert > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > > ? ? /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off > > > > ? ? The node is an AMD epyc. > > > > ? ? Any idea what could cause the issue? > > > > ? ? ssh is possible in both directions and firewall is disabled. > > > > > > ? ? Kind regards > > > > ? ? ??Philipp Rehs > > > > > > ? ? _______________________________________________ > > ? ? gpfsug-discuss mailing list > > ? ? gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From abeattie at au1.ibm.com Thu Sep 3 09:10:38 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 3 Sep 2020 08:10:38 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <364e4801-0052-e427-832a-15cfd2ae1c3f@enea.it> Message-ID: I don?t currently have any x86 based servers to do that kind of performance testing, But the PCI-Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are however limited storage controllers and Network adapters that can help utilise the full benefits of PCI-gen4. In terms of NSD architecture there are many variables that you also have to take into consideration. Are you looking at storage rich servers? Are you looking at SAN attached Flash Are you looking at scale ECE type deployment? As an IBM employee and someone familiar with ESS 5000, and the differences / benefits of the 5K architecture, Unless your planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules. I would seriously consider the ESS 5k over an x86 based NL-SAS storage topology Including AMD. Sent from my iPhone > On 3 Sep 2020, at 17:44, Giovanni Bracco wrote: > > ?OK from client side, but I would like to know if the same is also for > NSD servers with AMD EPYC, do they operate with good performance > compared to Intel CPUs? > > Giovanni > >> On 03/09/20 00:28, Andrew Beattie wrote: >> Giovanni, >> I have clients in Australia that are running AMD ROME processors in >> their Visualisation nodes connected to scale 5.0.4 clusters with no issues. 
>> Spectrum Scale doesn't differentiate between x86 processor technologies >> -- it only looks at x86_64 (OS support more than anything else) >> Andrew Beattie >> File and Object Storage Technical Specialist - A/NZ >> IBM Systems - Storage >> Phone: 614-2133-7927 >> E-mail: abeattie at au1.ibm.com >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug main discussion list , >> Frederick Stock >> Cc: >> Subject: [EXTERNAL] Re: [gpfsug-discuss] tsgskkm stuck---> what >> about AMD epyc support in GPFS? >> Date: Thu, Sep 3, 2020 7:29 AM >> I am curious to know about AMD epyc support by GPFS: what is the status? >> Giovanni Bracco >> >>> On 28/08/20 14:25, Frederick Stock wrote: >>> Not sure that Spectrum Scale has stated it supports the AMD epyc >> (Rome?) >>> processors. You may want to open a help case to determine the >> cause of >>> this problem. >>> Note that Spectrum Scale 4.2.x goes out of service on September >> 30, 2020 >>> so you may want to consider upgrading your cluster. And should Scale >>> officially support the AMD epyc processor it would not be on >> Scale 4.2.x. >>> >>> Fred >>> __________________________________________________ >>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >>> stockf at us.ibm.com >>> >>> ----- Original message ----- >>> From: Philipp Helo Rehs >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >> >>> Cc: >>> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck >>> Date: Fri, Aug 28, 2020 5:52 AM >>> Hello, >>> >>> we have a gpfs v4 cluster running with 4 nsds and i am trying >> to add >>> some clients: >>> >>> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1 >>> >>> this commands hangs and do not finish >>> >>> When i look into the server, i can see the following >> processes which >>> never finish: >>> >>> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote >> checkNewClusterNode3 >>> lc/setupClient >>> >> %%9999%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0: >>> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1 >>> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 >>> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl >> setupClient 2 >>> 21479 >>> >> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191 >>> 0 1191 >>> root 38212 100 0.0 35544 5752 ? R 11:32 9:40 >>> /usr/lpp/mmfs/bin/tsgskkm store --cert >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out >>> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off >>> >>> The node is an AMD epyc. >>> >>> Any idea what could cause the issue? >>> >>> ssh is possible in both directions and firewall is disabled. 
>>> >>> >>> Kind regards >>> >>> Philipp Rehs >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 4 08:56:41 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 4 Sep 2020 08:56:41 +0100 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 4 10:02:29 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 4 Sep 2020 09:02:29 +0000 Subject: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS? In-Reply-To: <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> References: <72830a61-3bb6-ff19-fedd-2dd389664f15@enea.it> <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> <06deef0f-3a2e-68a1-a948-fa07a54ffa55@strath.ac.uk> Message-ID: <8BA85682-C84F-4AF3-9A3D-6077E0715892@bham.ac.uk> Of course, you might also be interested in our upcoming Webinar on 22nd September (which I haven't advertised yet): https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ ... This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. 
Simon ?On 04/09/2020, 08:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 02/09/2020 23:28, Andrew Beattie wrote: > Giovanni, I have clients in Australia that are running AMD ROME > processors in their Visualisation nodes connected to scale 5.0.4 > clusters with no issues. Spectrum Scale doesn't differentiate between > x86 processor technologies -- it only looks at x86_64 (OS support > more than anything else) While true bear in mind their are limits on the number of cores that it might be quite easy to pass on a high end multi CPU AMD machine :-) See question 5.3 https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf 192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory these limits are lower in older versions of GPFS.So I think the "tested" limit in 4.2 is 64 cores from memory (or was at the time of release), but works just fine on 80 cores as far as I can tell. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oluwasijibomi.saula at ndsu.edu Fri Sep 4 17:03:17 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 4 Sep 2020 16:03:17 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server Message-ID: Hello GPFS Experts, Say, is there any way to disable a particular NSD server outside of shutting down GPFS on the server, or shutting down the entire cluster and removing the NSD server from the list of NSD servers? I'm finding that TSM activity on one of our NSD servers is stifling IO traffic through the server and resulting in intermittent latency for clients. If we could restrict cluster IO from going through this NSD server, we might be able to minimize or eliminate the latencies experienced by the clients while TSM activity is ongoing. Thoughts? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Sep 7 14:29:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 7 Sep 2020 13:29:59 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knop at us.ibm.com Tue Sep 8 04:09:07 2020 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 8 Sep 2020 03:09:07 +0000 Subject: [gpfsug-discuss] Short-term Deactivation of NSD server In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 8 14:04:26 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 09:04:26 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think a better metaphor is that the bridge we just crossed has collapsed and as long as we do not need to cross it again our journey should reach its intended destination :-) As I understand the intent of this message is to alert the user (and our support teams) that the directory from which a command was executed no longer exist. Should that be of consequence to the execution of the command then failure is not unexpected, however, many commands do not make use of the current directory so they likely will succeed. If you consider the view point of a command failing because the working directory was removed, but not knowing that was the root cause, I think you can see why this message was added into the administration infrastructure. It allows this odd failure scenario to be quickly recognized saving time for both the user and IBM support, in tracking down the root cause. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/07/2020 09:29 AM Subject: [EXTERNAL] [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, just came across this: /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate mmafmctl: Invalid current working directory detected: /tmp/A The command may fail in an unexpected way. Processing continues .. It?s like a bus driver telling you that the brakes don?t work and next speeding up even more. Honestly, why not just fail with a nice error messages ?. Don?t tell some customer asked for this to make the command more resilient ? Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Sep 8 17:10:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Sep 2020 17:10:59 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) ?As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. ?Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. ?If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. ?It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Sep 8 18:37:59 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Sep 2020 13:37:59 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: I think it is incorrect to assume that a command that continues after detecting the working directory has been removed is going to cause damage to the file system. Further, there is no a priori means to confirm if the lack of a working directory will cause the command to fail. I will agree that there may be admins that would prefer the command fail fast and allow them to restart the command anew, but I suspect there are admins that prefer the command press ahead in hopes that it can complete successfully and not require another execution. I'm sure we can conjure scenarios that support both points of view. Perhaps what is desired is a message that more clearly describes what is being undertaken. For example, "The current working directory, , no longer exists. Execution continues." 
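In the meantime the workaround on the admin side is simple, and worth spelling out. A minimal sketch, assuming only that the command needs some existing working directory (fs3101 is just the file system name from the earlier example, not anything special):

# Re-run the command from a directory that is guaranteed to exist, so the
# "Invalid current working directory" warning cannot be triggered.
cd / && /usr/lpp/mmfs/bin/mmafmctl fs3101 getstate

Long-running wrapper scripts can do the same chdir to / once at the top, so a home or scratch directory that disappears part-way through cannot affect them.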
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 09/08/2020 12:10 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Best of spectrum scale Sent by: gpfsug-discuss-bounces at spectrumscale.org On 08/09/2020 14:04, IBM Spectrum Scale wrote: > I think a better metaphor is that the bridge we just crossed has > collapsed and as long as we do not need to cross it again our journey > should reach its intended destination :-) As I understand the intent of > this message is to alert the user (and our support teams) that the > directory from which a command was executed no longer exist. Should > that be of consequence to the execution of the command then failure is > not unexpected, however, many commands do not make use of the current > directory so they likely will succeed. If you consider the view point > of a command failing because the working directory was removed, but not > knowing that was the root cause, I think you can see why this message > was added into the administration infrastructure. It allows this odd > failure scenario to be quickly recognized saving time for both the user > and IBM support, in tracking down the root cause. > I think the issue being taken is that you get an error message of The command may fail in an unexpected way. Processing continues .. Now to my mind that is an instant WTF, and if your description is correct the command should IMHO have exiting saying something like Working directory vanished, exiting command If there is any chance of the command failing then it should not be executed IMHO. I would rather issue it again from a directory that exists. The way I look at it is that file systems have "state", that is if something goes wrong then you could be looking at extended downtime as you break the backup out and start restoring. GPFS file systems have a tendency to be large, so even if you have a backup it is not a pleasant process and could easily take weeks to get things back to rights. Consequently most system admins would prefer the command does not continue if there is any possibility of it failing and messing up the "state" of my file system. That's unlike say the configuration on a network switch that can be quickly be put back with minimal interruption. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ewahl at osc.edu Tue Sep 8 23:46:08 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Tue, 8 Sep 2020 22:46:08 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Message-ID: Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Sep 9 12:02:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 9 Sep 2020 12:02:53 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> Message-ID: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> On 08/09/2020 18:37, IBM Spectrum Scale wrote: > I think it is incorrect to assume that a command that continues > after detecting the working directory has been removed is going to > cause damage to the file system. No I am not assuming it will cause damage. I am making the fairly reasonable assumption that any command which fails has an increased probability of causing damage to the file system over one that completes successfully. > Further, there is no a priori means to confirm if the lack of a > working directory will cause the command to fail. Which is why baling out is a more sensible default that ploughing on regardless. > I will agree that there may be admins that would prefer the command > fail fast and allow them to restart the command anew, but I suspect > there are admins that prefer the command press ahead in hopes that > it can complete successfully and not require another execution. I am sure that there are inexperienced admins who have yet to be battle scared that would want such reckless default behaviour. Pandering to their naivety is not a sensible approach IMHO. The downside if a large file system (and production GPFS file systems tend to be large) going "puff" is so massive that the precaution principle should apply. One wonders if we are seeing the difference between a US and European mindset here. 
> I'm sure we can conjure scenarios that support both points of view. > Perhaps what is desired is a message that more clearly describes what > is being undertaken. For example, "The current working directory, > , no longer exists. Execution continues." > That is what --force is for. If you are sufficiently reckless that you want something to continue in the event of a possible error you have the option to stick that on every command you run. Meanwhile the sane admins get a system that defaults to proceeding in the safer manner possible. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Sep 9 15:04:27 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Sep 2020 07:04:27 -0700 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> Message-ID: <20200909140427.aint6lhyqgz7jlk7@thargelion> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: > On 08/09/2020 18:37, IBM Spectrum Scale wrote: > > I think it is incorrect to assume that a command that continues > > after detecting the working directory has been removed is going to > > cause damage to the file system. > > No I am not assuming it will cause damage. I am making the fairly reasonable > assumption that any command which fails has an increased probability of > causing damage to the file system over one that completes successfully. I think there is another angle here, which is that this command's output has the possibility of triggering an "oh ----" (fill in your preferred colorful metaphor here) moment, followed up by a panicked Ctrl-C. That reaction has the possibility of causing its own problems (i.e. not sure if mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). I'm with Jonathan here: the command should fail with an informative message, and the admin can correct the problem (just cd somewhere else). -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From carlz at us.ibm.com Thu Sep 10 13:55:25 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 10 Sep 2020 12:55:25 +0000 Subject: [gpfsug-discuss] Best of spectrum scale Message-ID: <188B4B5D-8670-4071-85E6-AF13E087E8E1@us.ibm.com> Jonathan, Can I ask you to file an RFE for this? And post the number here so others can vote for it if they wish. I don?t see any reason to defend an error message that is basically a shrug, and the fix should be straightforward (i.e. bail out). However, email threads tend to get lost, whereas RFEs are tracked, managed, and monitored (and there is now a new Systems-wide initiative to report and measure responsiveness.) Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1291474181] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From cblack at nygenome.org Thu Sep 10 16:55:46 2020 From: cblack at nygenome.org (Christopher Black) Date: Thu, 10 Sep 2020 15:55:46 +0000 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We run sklm for tape encryption for spectrum archive ? no encryption in gpfs filesystem on disk pools. We see no grep hits for ?not trust? in our last few sklm_audit.log files. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Tuesday, September 8, 2020 at 7:10 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Ran into something a good while back and I'm curious how many others this affects. If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering. This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing. Just find your daily, rotating audit log and search it. I'll trust most folks to figure this out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like: "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc " or whatever works for you. If your audit log is fairly fresh, you might want to check the previous one. I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. Mostly I'm curious if folks get zero, or a large number. I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log. Yet things work perfectly. I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues. Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ulmer at ulmer.org Fri Sep 11 15:25:55 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Fri, 11 Sep 2020 10:25:55 -0400 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <20200909140427.aint6lhyqgz7jlk7@thargelion> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> Message-ID: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> > On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote: > > On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>> I think it is incorrect to assume that a command that continues >>> after detecting the working directory has been removed is going to >>> cause damage to the file system. >> >> No I am not assuming it will cause damage. I am making the fairly reasonable >> assumption that any command which fails has an increased probability of >> causing damage to the file system over one that completes successfully. > > I think there is another angle here, which is that this command's output > has the possibility of triggering an "oh ----" (fill in your preferred > colorful metaphor here) moment, followed up by a panicked Ctrl-C. That > reaction has the possibility of causing its own problems (i.e. not sure if > mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent). > I'm with Jonathan here: the command should fail with an informative > message, and the admin can correct the problem (just cd somewhere else). > I?m now (genuinely) curious as to what Spectrum Scale commands *actually* depend on the working directory existing and why. They shouldn?t depend on anything but existing well-known directories (logs, SDR, /tmp, et cetera) and any file or directories passed as arguments to the command. This is the Unix way. It seems like the *right* solution is to armor commands against doing something ?bad? if they lose a resource required to complete their task. If $PWD goes away because an admin?s home goes away in the middle of a long restripe, it?s better to complete the work and let them look in the logs. It's not Scale?s problem if something not affecting its work happens. Maybe I?ve got a blind spot here... -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Fri Sep 11 19:47:52 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 11 Sep 2020 14:47:52 -0400 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: We have spectrum archive with encryption on disk and tape. We get maybe a 100 or so messages like this daily. It would be nice if message had some information about which client is the issue. We have had client certs expire in the past. The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me. I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: > Ran into something a good while back and I'm curious how many others this > affects. If folks with encryption enabled could run a quick word count on > their SKLM server and reply with a rough count I'd appreciate it. > I've gone round and round with IBM SKLM support over the last year on this > and it just has me wondering. 
This is one of those "morbidly curious about > making the sausage" things. > > Looking to see if this is a normal error message folks are seeing. Just > find your daily, rotating audit log and search it. I'll trust most folks > to figure this out, but let me know if you need help. > Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit > If you are on a normal linux box try something like: "locate > sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client > certificate" {} |wc " or whatever works for you. If your audit log is > fairly fresh, you might want to check the previous one. I do NOT need > exact information, just 'yeah we get 12million out a 500MB file' or ' we > get zero', or something like that. > > Mostly I'm curious if folks get zero, or a large number. I've got my > logs adjusted to 500MB and I get 8 digit numbers out of the previous log. > Yet things work perfectly. I've talked to two other SS sites I know the > admins personally, and they get larger numbers than I do. But it's such a > tiny sample size! LOL > > Ed Wahl > Ohio Supercomputer Center > > Apologies for the message formatting issues. Outlook fought tooth and > nail against sending it with the path as is, and kept breaking my > paragraphs. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 11 20:53:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 11 Sep 2020 20:53:45 +0100 Subject: [gpfsug-discuss] Best of spectrum scale In-Reply-To: <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> References: <26AD5F36-2E5D-478B-9AFF-0948770C2EBE@id.ethz.ch> <61815473-11ff-d29e-a352-b8d3f1842b97@strath.ac.uk> <20200909140427.aint6lhyqgz7jlk7@thargelion> <3D38DC36-233A-405C-B5CE-B22962ED8E80@ulmer.org> Message-ID: <049f7e23-fb72-019f-a7b0-f9d0f1d189dc@strath.ac.uk> On 11/09/2020 15:25, Stephen Ulmer wrote: > >> On Sep 9, 2020, at 10:04 AM, Skylar Thompson > > wrote: >> >> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote: >>> On 08/09/2020 18:37, IBM Spectrum Scale wrote: >>>> I think it is incorrect to assume that a command that continues >>>> after detecting the working directory has been removed is going to >>>> cause damage to the file system. >>> >>> No I am not assuming it will cause damage. I am making the fairly >>> reasonable >>> assumption that any command which fails has an increased probability of >>> causing damage to the file system over one that completes successfully. >> >> I think there is another angle here, which is that this command's output >> has the possibility of triggering an "oh ----" (fill in your preferred >> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That >> reaction has the possibility of causing its own problems (i.e. not sure if >> mmafmctl touches CCR, but aborting it midway could leave CCR >> inconsistent). >> I'm with Jonathan here: the command should fail with an informative >> message, and the admin can correct the problem (just cd somewhere else). >> > > I?m now (genuinely) curious as to?what?Spectrum Scale commands > *actually* depend on the working directory existing and why. They > shouldn?t depend on anything but existing well-known directories (logs, > SDR, /tmp, et cetera) and any file or directories passed as arguments to > the command. 
This is the Unix way. > > It seems like the *right* solution is to armor commands against doing > something ?bad? if they lose a resource required to complete their task. > If $PWD goes away because an admin?s home goes away in the middle of a > long restripe, it?s better to complete the work and let them look in the > logs. It's not Scale?s problem if something not affecting its work happens. > > Maybe I?ve got a blind spot here... > This jogged my memory that best practice would be to have a call to chdir to set the working directory to "/" very early on. Before anything critical is started. I am 99.999% sure that its covered in Steven's (can't check as I am away for the weekend) so really there is no excuse. If / goes away then really really bad things have happened and it all sort of becomes moot anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Mon Sep 14 06:27:58 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 14 Sep 2020 13:27:58 +0800 Subject: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count In-Reply-To: References: Message-ID: Hi Eric, Please help me to understand your question. You have Spectrum Archive and Spectrum Scale in your system, and both of them are connected to IBM SKLM for encryption. Now you got lots of error/warning message from SKLM log. Now you want to understand which component, Scale or Archive, makes the SKLM print those error message, right? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "J. Eric Wonderley" To: gpfsug main discussion list Date: 2020/09/12 02:47 Subject: [EXTERNAL] Re: [gpfsug-discuss] Request for folks using encryption on SKLM, run a word count Sent by: gpfsug-discuss-bounces at spectrumscale.org We have spectrum archive with encryption on disk and tape.? ?We get maybe a 100 or so messages like this daily.? It would be nice if message had some information about which client is the issue. We have had client certs expire in the past.? The root cause of the outage was a network outage...iirc the certs are cached in the clients. I don't know what to make of these messages...they do concern me.? I don't have a very good opinion of the sklm code...key replication between the key servers has never worked as expected. Eric Wonderley On Tue, Sep 8, 2020 at 7:10 PM Wahl, Edward wrote: ?Ran into something a good while back and I'm curious how many others this affects.?? If folks with encryption enabled could run a quick word count on their SKLM server and reply with a rough count I'd appreciate?it. I've gone round and round with IBM SKLM support over the last year on this and it just has me wondering.? 
This is one of those "morbidly curious about making the sausage" things. Looking to see if this is a normal error message folks are seeing.? Just find your daily, rotating audit log and search it.? I'll trust most folks to figure this?out, but let me know if you need help. Normal location is /opt/IBM/WebSphere/AppServer/products/sklm/logs/audit If you are on a normal linux box try something like:? "locate sklm_audit.log |head -1 |xargs -i grep "Server does not trust the client certificate" {} |wc "? or whatever works for you.?? If your audit log is fairly fresh, you might want to check the previous one.?? I do NOT need exact information, just 'yeah we get 12million out a 500MB file' or ' we get zero', or something like that. ?Mostly I'm curious if folks get zero, or a large number.? I've got my logs adjusted to 500MB and I get 8 digit numbers out of the previous log.?? Yet things work perfectly.??? I've talked to two other SS sites I know the admins personally, and they get larger numbers than I do. But it's such a tiny sample size! LOL Ed Wahl Ohio Supercomputer Center Apologies for the message formatting issues.? Outlook fought tooth and nail against sending it with the path as is, and kept breaking my paragraphs. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From u.sibiller at science-computing.de Mon Sep 14 13:09:12 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 14 Sep 2020 14:09:12 +0200 Subject: [gpfsug-discuss] tsgskkm stuck In-Reply-To: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> References: <90e8ffba-a00a-95b9-c65b-1cda9ffc8c4c@uni-duesseldorf.de> Message-ID: On 8/28/20 11:43 AM, Philipp Helo Rehs wrote: > root???? 38212? 100? 0.0? 35544? 5752 ???????? R??? 11:32?? 9:40 > /usr/lpp/mmfs/bin/tsgskkm store --cert > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out > /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off Judging from the command line tsgskkm will generate a certificate which normally involves a random number generator. If such a process hangs it might be due to a lack of entropy. So I suggest trying to generate some I/O on the node. Or run something like haveged (https://wiki.archlinux.org/index.php/Haveged). Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From S.J.Thompson at bham.ac.uk Fri Sep 18 11:52:51 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Sep 2020 10:52:51 +0000 Subject: [gpfsug-discuss] SSUG::Digital inode management, VCPU scaling and considerations for NUMA Message-ID: <5c6175fb949c4a30bcc94a2bbe986178@bham.ac.uk> Number 5 in the SSUG::Digital talks set takes place 22 September 2020 Spectrum Scale is a highly scalable, high-performance storage solution for file and object storage. It started more than 20 years ago as research project and is now used by thousands of customers. IBM continues to enhance Spectrum Scale, in response to recent hardware advancements and evolving workloads. This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA. https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2126 bytes Desc: not available URL: From joe at excelero.com Fri Sep 18 13:38:51 2020 From: joe at excelero.com (joe at excelero.com) Date: Fri, 18 Sep 2020 07:38:51 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 16 Message-ID: <92e304d9-de58-4bdc-aae5-95a9dfc03a44@Spark> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:11:31 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:11:31 +0000 Subject: [gpfsug-discuss] CCR errors Message-ID: Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sat Sep 19 21:23:01 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 20:23:01 +0000 Subject: [gpfsug-discuss] CCR errors In-Reply-To: References: Message-ID: I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. 
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sat Sep 19 21:52:19 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 20:52:19 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: Message-ID: Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Sun Sep 20 00:45:41 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Sat, 19 Sep 2020 23:45:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 In-Reply-To: References: , Message-ID: I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Sun Sep 20 00:59:28 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Sat, 19 Sep 2020 23:59:28 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 19 In-Reply-To: References: Message-ID: Ryan, I appreciate your support - I finally got some on a WebEx now. I'll share any useful information I glean from the session. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 6:45:47 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Saula, Oluwasijibomi) 2. Re: gpfsug-discuss Digest, Vol 104, Issue 18 (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:52:19 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="us-ascii" Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 23:45:41 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 104, Issue 18 Message-ID: Content-Type: text/plain; charset="utf-8" I?d call in/click the lack of response thing (if I?m remembering that button right for the current system). That?s unusual. Maybe it failed to transition time zones. I?d help you, but I don?t know how to fix that one. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:52, Saula, Oluwasijibomi wrote: ? Ryan, We've been at severity 1 since about 4am with only a single response all day. Got me a bit concerned now... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Saturday, September 19, 2020 3:23 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 104, Issue 18 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. CCR errors (Saula, Oluwasijibomi) 2. Re: CCR errors (Ryan Novosielski) ---------------------------------------------------------------------- Message: 1 Date: Sat, 19 Sep 2020 20:11:31 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. 
Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sat, 19 Sep 2020 20:23:01 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR errors Message-ID: Content-Type: text/plain; charset="utf-8" I find them to be pretty fast and very experienced at severity 1. Don?t hesitate to use it (unless that?s already where you?re at). -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 19, 2020, at 16:11, Saula, Oluwasijibomi wrote: ? Hello, Anyone available to assist with CCR errors: [root at nsd02 ~]# mmchnode -N nsd04-ib --quorum --manager mmchnode: Unable to obtain the GPFS configuration file lock. Retrying ... Per IBM support's direction, I already followed the Manual Repair Procedure, but now I'm back to square one with the same issue. Also, I have a ticket over to IBM for troubleshooting this issue during our downtime this weekend, but support's response is really slow. If possible, I'd prefer a webex session to facilitate closure, but I can send emails back and forth if necessary. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 18 *********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 19 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alvise.dorigo at psi.ch Mon Sep 21 09:35:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 08:35:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr, which seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion? Thanks in advance, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 21 10:55:29 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 21 Sep 2020 09:55:29 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 11:32:25 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 10:32:25 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> Message-ID: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> The information reported by that command (on both the cache and home side) is size, blocks, block size, and times. I don't think that is enough to decide whether AFM has completed the transfer of a file. Did I possibly miss something else? It would be nice to have a flag (like the ones reported by the policy, flags 'P' (managed by AFM) and 'w' (being transferred)) that can help us to know whether AFM considers the file synced to home or not yet. Alvise From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 21 September 2020 11:55 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight are you looking for something like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr, which seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vpuvvada at in.ibm.com Mon Sep 21 11:57:30 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 21 Sep 2020 16:27:30 +0530 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: tspcacheutil: this command provides information about the file's replication state. You can also run a policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - the file is cached. For a directory, readdir+lookup is completed. hasState - the file/dir has remote attributes for the replication. local - the file/dir is local; it won't be replicated to home or revalidated with home. Create - the file/dir is newly created, not yet replicated. Setattr - attributes (chown, chmod, mmchattr, setACL, setEA, etc.) have been changed on the dir/file, but not replicated yet. Dirty - the file has been changed in the cache, but not replicated yet. For a directory this means that files inside it have been removed or renamed. Link - a hard link for the file has been created, but not replicated yet. Append - the file has been appended, but not replicated yet. For a directory this is the complete bit, which indicates that readdir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org The information reported by that command (on both the cache and home side) is size, blocks, block size, and times. I don't think that is enough to decide whether AFM has completed the transfer of a file. Did I possibly miss something else? It would be nice to have a flag (like the ones reported by the policy, flags 'P' (managed by AFM) and 'w' (being transferred)) that can help us to know whether AFM considers the file synced to home or not yet. Alvise From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 21 September 2020 11:55 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight are you looking for something like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr, which seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion?
Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Sep 21 12:17:35 2020 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 21 Sep 2020 11:17:35 +0000 Subject: [gpfsug-discuss] Checking if a AFM-managed file is still inflight In-Reply-To: References: <71A01D57-2DE5-4980-8E15-C8802A1E89ED@psi.ch> <81B79A74-B3D7-4A0F-A7AF-4A4EB497E634@psi.ch> Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C@psi.ch> Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. ~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. 
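Since the policy engine exposes exactly those letters through the MISC_ATTRIBUTES file attribute, a single deferred policy run can list everything AFM still has to play back to home, complementing the per-file tspcacheutil check described above. This is only a minimal sketch: the fileset path, list name and output prefix are placeholders, and the exact MISC_ATTRIBUTES letters should be confirmed against the policy documentation for the release in use:

    /* afm-pending.pol -- list files the policy engine reports as
       AFM-managed ('P') and still being transferred ('w'). */
    RULE 'afm-pending-ext' EXTERNAL LIST 'afm-pending' EXEC ''
    RULE 'afm-pending-list' LIST 'afm-pending'
         SHOW(VARCHAR(MISC_ATTRIBUTES))
         WHERE MISC_ATTRIBUTES LIKE '%P%' AND MISC_ATTRIBUTES LIKE '%w%'

    # Run it in defer mode so it only writes the candidate list
    # (e.g. /tmp/afm.list.afm-pending) instead of calling any external script:
    mmapplypolicy /gpfs/cachefs/fileset1 -P afm-pending.pol -I defer -f /tmp/afm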
Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Sep 22 10:18:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Sep 2020 10:18:05 +0100 Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work no the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Sep 22 11:47:46 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Sep 2020 10:47:46 +0000 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <1696CA15-9ACC-474F-99F5-DC031951A131@bham.ac.uk> We've always taken it to mean .. RHEL != CentOS 7.1 != 7.2 (though mostly down to the kernel). 
ppc64le != x86_64 But never differentiated by microarchitecture. That doesn't mean to say we are correct in these assumptions. Simon On 22/09/2020, 10:17, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work on the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Tue Sep 22 14:50:34 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 22 Sep 2020 06:50:34 -0700 Subject: [gpfsug-discuss] Portability interface In-Reply-To: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> References: <4b586251-d208-8535-925a-311023af3dd6@strath.ac.uk> Message-ID: <20200922135034.6be42ykveio654sm@thargelion> We've used the same built RPMs (generally built on Intel) on Intel and AMD x86-64 CPUs, and definitely have a mix of ISAs from both vendors, and haven't run into any problems. On Tue, Sep 22, 2020 at 10:18:05AM +0100, Jonathan Buzzard wrote: > > I have a question about using RPM's for the portability interface on > different CPU's. > > According to /usr/lpp/mmfs/src/README > > The generated RPM can ONLY be deployed to the machine with > identical architecture, distribution level, Linux kernel version > and GPFS version. > > So does this mean that if I have a heterogeneous cluster with some machines > on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all > using GPFS 5.0.5 I have to have different RPM's for the two CPU's? > > Or when it says "identical architecture" does it mean x86-64, ppc etc. and > not variations with the x86-64, ppc class? Assuming some minimum level is > met. > > Obviously the actual Linux kernel being stock RedHat would be the same on > every machine regardless of whether it's Skylake or Sandy Bridge, or even > for that matter an AMD processor. > > Consequently it seems strange that I would need different portability > interfaces. Would it help to generate the portability layer RPM's on a Sandy > Bridge machine and work on the presumption anything that runs on Sandy > Bridge will run on Skylake? > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department (UW Medicine), System Administrator -- Foege Building S046, (206)-685-7354 -- Pronouns: He/Him/His From truongv at us.ibm.com Tue Sep 22 16:47:09 2020 From: truongv at us.ibm.com (Truong Vu) Date: Tue, 22 Sep 2020 11:47:09 -0400 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: You are correct, the "identical architecture" means the same machine hardware name as shown by the -m option of the uname command. Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/22/2020 05:18 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 104, Issue 23 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Checking if a AFM-managed file is still inflight (Dorigo Alvise (PSI)) 2. Portability interface (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Mon, 21 Sep 2020 11:17:35 +0000 From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Message-ID: <7BB1DD94-E99C-4A66-BAF2-BE287EE5752C at psi.ch> Content-Type: text/plain; charset="utf-8" Thank you Venkat, the ?dirty? and ?append? flags seem quite useful. A Da: per conto di Venkateswara R Puvvada Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 12:57 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight tspcacheutil , this command provides information about the file's replication state. You can also run policy to find these files. Example: tspcacheutil /gpfs/gpfs1/sw2/1.txt inode: ino=524290 gen=235142808 uid=1000 gid=0 size=3 mode=0200100777 nlink=1 ctime=1600366912.382081156 mtime=1600275424.692786000 cached 1 hasState 1 local 0 create 0 setattr 0 dirty 0 link 0 append 0 pcache: parent ino=524291 foldval=0x6AE011D4 nlink=1 remote: ino=56076 size=3 nlink=1 fhsize=24 version=0 ctime=1600376836.408694099 mtime=1600275424.692786000 Cached - File is cached. For directory, readdir+lookup is completed. hashState - file/dir have remote attributes for the replication. local - file/dir is local, won't be replicated to home or not revalidated with home. Create - file/dir is newly created, not yet replicated Setattr - Attributes (chown, chmod, mmchattr , setACL, setEA etc..) are changed on dir/file, but not replicated yet. Dirty - file have been changed in the cache, but not replicated yet. For directory this means that files inside it have been removed or renamed. Link - hard link for the file have been created, but not replicated yet. Append - file have been appended, but not replicated yet. For directory this is complete bit which indicates that readddir was performed. 
~Venkat (vpuvvada at in.ibm.com) From: "Dorigo Alvise (PSI)" To: gpfsug main discussion list Date: 09/21/2020 04:02 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Information reported by that command (both at cache and home side) are size, blocks, block size, and times. I think it cannot be enough to decide that AFM completed the transfer of a file. Did I possibly miss something else ? It would be nice to have a flag (like that one reported by the policy, flags ?P? ? managed by AFM ? and ?w? ? beeing transferred -) that can help us to know if AFM considers the file synced to home or not yet. Alvise Da: per conto di Olaf Weiser Risposta: gpfsug main discussion list Data: luned?, 21 settembre 2020 11:55 A: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Oggetto: Re: [gpfsug-discuss] Checking if a AFM-managed file is still inflight do you looking fo smth like this: mmafmlocal ls filename or stat filename ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Checking if a AFM-managed file is still inflight Date: Mon, Sep 21, 2020 10:45 AM Dear GPFS users, I know that through a policy one can know if a file is still being transferred from the cache to your home by AFM. I wonder if there is another method @cache or @home side, faster and less invasive (a policy, as far as I know, can put some pressure on the system when there are many files). I quickly checked mmlsattr that seems not to be AFM-aware (but there is a flags field that can show several things, like compression status, archive, etc). Any suggestion ? Thanks in advance, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200921/62d55b7e/attachment-0001.html > ------------------------------ Message: 2 Date: Tue, 22 Sep 2020 10:18:05 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: [gpfsug-discuss] Portability interface Message-ID: <4b586251-d208-8535-925a-311023af3dd6 at strath.ac.uk> Content-Type: text/plain; charset=utf-8; format=flowed I have a question about using RPM's for the portability interface on different CPU's. According to /usr/lpp/mmfs/src/README The generated RPM can ONLY be deployed to the machine with identical architecture, distribution level, Linux kernel version and GPFS version. So does this mean that if I have a heterogeneous cluster with some machines on Skylake and some on Sandy Bridge but all running on say RHEL 7.8 and all using GPFS 5.0.5 I have to have different RPM's for the two CPU's? Or when it says "identical architecture" does it mean x86-64, ppc etc. and not variations with the x86-64, ppc class? Assuming some minimum level is met. Obviously the actual Linux kernel being stock RedHat would be the same on every machine regardless of whether it's Skylake or Sandy Bridge, or even for that matter an AMD processor. 
Consequently it seems strange that I would need different portability interfaces. Would it help to generate the portability layer RPM's on a Sandy Bridge machine and work on the presumption anything that runs on Sandy Bridge will run on Skylake? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 104, Issue 23 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Wed Sep 23 15:57:00 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Sep 2020 15:57:00 +0100 Subject: [gpfsug-discuss] Portability interface In-Reply-To: References: Message-ID: <678f9ba0-0e3a-5ea1-7aac-74def4046f6f@strath.ac.uk> On 22/09/2020 16:47, Truong Vu wrote: > You are correct, the "identical architecture" means the same machine > hardware name as shown by the -m option of the uname command. > Thanks for clearing that up. It just seemed something of a blindingly obvious statement (surely nobody would expect an RPM for an Intel based machine to install on a PowerPC machine?), so I thought it might be referring to something else. I mean you can't actually install an x86_64 RPM on a ppc64le machine as the rpm command will bomb out telling you it is from an incompatible architecture if you try. It's why you have noarch packages which can be installed on anything. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Fri Sep 25 16:53:12 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Sep 2020 15:53:12 +0000 Subject: [gpfsug-discuss] SSUG::Digital: Persistent Storage for Kubernetes and OpenShift environments with Spectrum Scale Message-ID: <6e22851b42b54be8b6fa58376c738fea@bham.ac.uk> Episode 6 in the SSUG::Digital series will discuss the Spectrum Scale Container Storage Interface (CSI). CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes and OpenShift. Spectrum Scale CSI provides your containers fast access to files stored in Spectrum Scale with capabilities such as dynamic provisioning of volumes and read-write-many access. https://www.spectrumscaleug.org/event/ssugdigital-persistent-storage-for-containers-with-spectrum-scale/ SSUG Host: Bill Anderson Speakers: Smita Raut (IBM) Harald Seipp (IBM) Renar Grunenberg Simon Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2233 bytes Desc: not available URL: From joe at excelero.com Sat Sep 26 16:43:15 2020 From: joe at excelero.com (joe at excelero.com) Date: Sat, 26 Sep 2020 10:43:15 -0500 Subject: [gpfsug-discuss] Accepted: gpfsug-discuss Digest, Vol 104, Issue 27 Message-ID: An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: reply.ics Type: application/ics Size: 0 bytes Desc: not available URL: From NISHAAN at za.ibm.com Mon Sep 28 09:09:29 2020 From: NISHAAN at za.ibm.com (Nishaan Docrat) Date: Mon, 28 Sep 2020 10:09:29 +0200 Subject: [gpfsug-discuss] Spectrum Scale Object - Need to support Amazon S3 DNS-style (Virtual hosted) Bucket Addressing Message-ID: Hi All I need to find out if anyone has successfully been able to get our OpenStack Swift implementation of the object protocol to support the AWS DNS-style bucket naming convention. See here for an explanation: https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html. AWS DNS-style bucket naming includes the bucket in the DNS name (eg. mybucket1.ssobject.mycompany.com). OpenStack Swift supports PATH-style bucket naming (eg. https://swift-cluster.example.com/v1/my_account/container/object). From what I can tell, I need to enable the domain_remap function in the proxy-server.conf file and also statically resolve the DNS name to a specific bucket by inserting the correct AUTH account. See here for the domain_remap middleware explanation: https://docs.openstack.org/swift/latest/middleware.html And here for additional DNS work that needs to be done: https://docs.ovh.com/gb/en/public-cloud/place-an-object-storage-container-behind-domain-name/ Obviously a wildcard DNS server is required for this, which is easy enough to implement. However, the steps for OpenStack Swift to support this are not very clear (a rough sketch of the relevant proxy-server.conf pieces is included at the end of this digest). I'm hoping someone else went through the pain of figuring this out already :) Any help with this would be greatly appreciated! Kind Regards Nishaan Docrat Client Technical Specialist - Storage Systems IBM Systems Hardware Work: +27 (0)11 302 5001 Mobile: +27 (0)81 040 3793 Email: nishaan at za.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18044196.jpg Type: image/jpeg Size: 35643 bytes Desc: not available URL: From xhejtman at ics.muni.cz Wed Sep 30 22:52:39 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 30 Sep 2020 23:52:39 +0200 Subject: [gpfsug-discuss] put_cred bug Message-ID: <20200930215239.GU1440758@ics.muni.cz> Hello, is this bug already resolved? https://access.redhat.com/solutions/3132971 I think I'm seeing it even with the latest GPFS 5.0.5.2: [1204205.886192] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff8821c16cdad0 with usage -530190256 maybe also related: [ 1384.404355] GPFS logAssertFailed: oiP->vinfoP->oiP == oiP file /project/spreltac505/build/rtac505s002a/src/avs/fs/mmfs/ts/kernext/gpfsops.C line 5168 [ 1397.657845] <5>kp 28416: cxiPanic: gpfsops.C:5168:0:0:FFFFFFFFC0D15240::oiP->vinfoP->oiP == oiP -- Lukáš Hejtmánek Linux Administrator only because Full Time Multitasking Ninja is not an official job title
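Coming back to the DNS-style bucket question above, here is a rough sketch of the proxy-server.conf pieces that the OpenStack middleware documentation describes. The option names and values are assumptions to check against the Swift level actually shipped with the Scale object protocol (and the CES object configuration is normally managed through the Scale tooling rather than edited by hand), so this illustrates the shape of the change rather than a tested recipe:

    # proxy-server.conf (excerpt) -- cname_lookup and domain_remap have to be
    # added early in [pipeline:main], before authentication and the S3 middleware.

    [filter:cname_lookup]
    use = egg:swift#cname_lookup
    # the wildcard zone from the example above, *.ssobject.mycompany.com
    storage_domain = ssobject.mycompany.com

    [filter:domain_remap]
    use = egg:swift#domain_remap
    storage_domain = ssobject.mycompany.com
    path_root = v1
    # assumes the stock AUTH reseller prefix
    default_reseller_prefix = AUTH

The static mapping from the OVH write-up then becomes a DNS CNAME from the public bucket name to <container>.AUTH_<account>.ssobject.mycompany.com; cname_lookup follows the CNAME and domain_remap rewrites the request to the path style /v1/AUTH_<account>/<container>/<object> that Swift already understands.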