From heinrich.billich at id.ethz.ch  Mon Feb  3 08:56:09 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Mon, 3 Feb 2020 08:56:09 +0000
Subject: [gpfsug-discuss] When is a file system log recovery triggered
Message-ID:

Hello,

Does mmshutdown or mmumount trigger a file system log recovery, the same as a node failure or daemon crash does? Last week we got this advisory:

IBM Spectrum Scale (GPFS) 5.0.4 levels: possible metadata or data corruption during file system log recovery
https://www.ibm.com/support/pages/node/1274428?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E

You need a file system log recovery running to potentially trigger the issue. When does a file system log recovery run? For sure on any unexpected mmfsd/os crash for mounted filesystems, or on connection loss. But what if we do a clean 'mmshutdown' or 'mmumount' - I assume this will cause the client to nicely finish all outstanding transactions and return the empty logfile, hence no log recovery will take place if we do a normal os shutdown/reboot, too? Or am I wrong and Spectrum Scale treats all cases the same way?

I ask because the advisory states that a node reboot will trigger a log recovery - until we have upgraded to 5.0.4-2 we'll try to avoid log recoveries:

> Log recovery happens after a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot).

Thank you,

Heiner
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================

From heinrich.billich at id.ethz.ch  Mon Feb  3 10:02:06 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Mon, 3 Feb 2020 10:02:06 +0000
Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes
In-Reply-To:
References:
Message-ID: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch>

Thank you. I wonder if there is any ESS version which deploys FW860.70 for ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4?

Cheers,

Heiner

From:  on behalf of Jan-Frode Myklebust
Reply to: gpfsug main discussion list
Date: Thursday, 30 January 2020 at 18:00
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

I *think* this was a known bug in the Power firmware included with 5.3.4, and that it was fixed in FW860.70. Something hanging/crashing in IPMI.

-jf

tor. 30. jan. 2020 kl. 17:10 skrev Wahl, Edward:

Interesting. We just deployed an ESS here and are running into a very similar problem with the gui refresh, it appears. Takes my ppc64le's about 45 seconds to run rinv when they are idle. I had just opened a support case on this last evening. We're on ESS 5.3.4 as well. I will wait to see what support says.

Ed Wahl
Ohio Supercomputer Center

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ulrich Sibiller
Sent: Thursday, January 30, 2020 9:44 AM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote:
> Hello,
>
> Can I change the times at which the GUI runs HW_INVENTORY and related tasks?
>
> We frequently get messages like
>
> gui_refresh_task_failed GUI WARNING 12 hours ago
> The following GUI refresh task(s) failed: HW_INVENTORY
>
> The tasks fail due to timeouts. Running the task manually most times
> succeeds.
We do run two gui nodes per cluster and I noted that both > servers seem run the HW_INVENTORY at the exact same time which may > lead to locking or congestion issues, actually the logs show messages > like > > EFSSA0194I Waiting for concurrent operation to complete. > > The gui calls ?rinv? on the xCat servers. Rinv for a single > little-endian server takes a long time ? about 2-3 minutes , while it finishes in about 15s for big-endian server. > > Hence the long runtime of rinv on little-endian systems may be an > issue, too > > We run 5.0.4-1 efix9 on the gui and ESS 5.3.4.1 on the GNR systems > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, which a separate xCat/ems server for each type. The GUI nodes are ppc64le. > > We did see this issue with several gpfs version on the gui and with at least two ESS/xCat versions. > > Just to be sure I did purge the Posgresql tables. > > I did try > > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY > > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY ?debug > > And also tried to read the logs in /var/log/cnlog/mgtsrv/ - but they are difficult. I have seen the same on ppc64le. From time to time it recovers but then it starts again. The timeouts are okay, it is the hardware. I haven opened a call at IBM and they suggested upgrading to ESS 5.3.5 because of the new firmwares which I am currently doing. I can dig out more details if you want. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Feb 3 10:45:43 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 3 Feb 2020 11:45:43 +0100 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> References: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Message-ID: <98640bc8-ecb7-d050-ea38-da47cf1b9ea4@science-computing.de> On 2/3/20 11:02 AM, Billich Heinrich Rainer (ID SD) wrote: > Thank you. I wonder if there is any ESS version which deploys FW860.70 for ppc64le. The Readme for > 5.3.5 lists FW860.60 again, same as 5.3.4? I have done the upgrade to 5.3.5 last week and gssinstallcheck now reports 860.70: [...] 
Installed version: 5.3.5-20191205T142815Z_ppc64le_datamanagement [OK] Linux kernel installed: 3.10.0-957.35.2.el7.ppc64le [OK] Systemd installed: 219-67.el7_7.2.ppc64le [OK] Networkmgr installed: 1.18.0-5.el7_7.1.ppc64le [OK] OFED level: MLNX_OFED_LINUX-4.6-3.1.9.1 [OK] IPR SAS FW: 19512300 [OK] ipraid RAID level: 10 [OK] ipraid RAID Status: Optimized [OK] IPR SAS queue depth: 64 [OK] System Firmware: FW860.70 (SV860_205) [OK] System profile setting: scale [OK] System profile verification PASSED. [OK] Host adapter driver: 16.100.01.00 [OK] Kernel sysrq level is: kernel.sysrq = 1 [OK] GNR Level: 5.0.4.1 efix6 [...] Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Feb 3 19:41:31 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 3 Feb 2020 20:41:31 +0100 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> References: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Message-ID: I think both 5.3.4.2 and 5.3.5 includes FW860.70, but the readme doesn?t show this correctly. -jf man. 3. feb. 2020 kl. 11:02 skrev Billich Heinrich Rainer (ID SD) < heinrich.billich at id.ethz.ch>: > Thank you. I wonder if there is any ESS version which deploys FW860.70 for > ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4? > > > > Cheers, > > > > Heiner > > *From: * on behalf of Jan-Frode > Myklebust > *Reply to: *gpfsug main discussion list > *Date: *Thursday, 30 January 2020 at 18:00 > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY > with two active GUI nodes > > > > > > I *think* this was a known bug in the Power firmware included with 5.3.4, > and that it was fixed in the FW860.70. Something hanging/crashing in IPMI. > > > > > > > > -jf > > > > tor. 30. jan. 2020 kl. 17:10 skrev Wahl, Edward : > > Interesting. We just deployed an ESS here and are running into a very > similar problem with the gui refresh it appears. Takes my ppc64le's about > 45 seconds to run rinv when they are idle. > I had just opened a support case on this last evening. We're on ESS > 5.3.4 as well. I will wait to see what support says. > > Ed Wahl > Ohio Supercomputer Center > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ulrich Sibiller > Sent: Thursday, January 30, 2020 9:44 AM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY > with two active GUI nodes > > On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote: > > Hello, > > > > Can I change the times at which the GUI runs HW_INVENTORY and related > tasks? > > > > we frequently get messages like > > > > gui_refresh_task_failed GUI WARNING 12 hours > ago > > The following GUI refresh task(s) failed: HW_INVENTORY > > > > The tasks fail due to timeouts. Running the task manually most times > > succeeds. 
We do run two gui nodes per cluster and I noted that both > > servers seem run the HW_INVENTORY at the exact same time which may > > lead to locking or congestion issues, actually the logs show messages > > like > > > > EFSSA0194I Waiting for concurrent operation to complete. > > > > The gui calls ?rinv? on the xCat servers. Rinv for a single > > little-endian server takes a long time ? about 2-3 minutes , while it > finishes in about 15s for big-endian server. > > > > Hence the long runtime of rinv on little-endian systems may be an > > issue, too > > > > We run 5.0.4-1 efix9 on the gui and ESS 5.3.4.1 on the GNR systems > > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, which a > separate xCat/ems server for each type. The GUI nodes are ppc64le. > > > > We did see this issue with several gpfs version on the gui and with at > least two ESS/xCat versions. > > > > Just to be sure I did purge the Posgresql tables. > > > > I did try > > > > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY > > > > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY ?debug > > > > And also tried to read the logs in /var/log/cnlog/mgtsrv/ - but they are > difficult. > > > I have seen the same on ppc64le. From time to time it recovers but then it > starts again. The timeouts are okay, it is the hardware. I haven opened a > call at IBM and they suggested upgrading to ESS 5.3.5 because of the new > firmwares which I am currently doing. I can dig out more details if you > want. > > Uli > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart Registernummer/Commercial > Register No.: HRB 382196 _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Feb 6 05:02:29 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 6 Feb 2020 05:02:29 +0000 Subject: [gpfsug-discuss] When is a file system log recovery triggered In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sat Feb 8 11:33:21 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 8 Feb 2020 11:33:21 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Message-ID: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Hello! 
We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems.

What we saw was that the execution time of mmdf can last 5-7 min.

We opened a case and they said that during commands like mmdf, and also mmfsck, mmdefragfs and mmrestripefs, all regions must be scanned, and this is the reason why it takes so long.

The technician also said that as a "rule of thumb" there should be (-n)*32 regions, this would then be enough ( N=5000 --> 160000 regions per pool ?)
(also, block size has influence on regions ?)

#mmfsadm saferdump stripe
gives the regions number:

storage pools: max 8
alloc map type 'scatter'
0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0
   regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192

We also saw that when creating the filesystem with a specific (-n) set very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value, this does not influence the behavior any more.

My question is: Is the rule (Number of Nodes)x5000 for the number of regions in a pool a good estimation? Is it better to overestimate the number of nodes (longer running commands), or is it unrealistic to get into problems when not reaching the calculated number of regions?

Does anybody have experience with a high number of nodes (>>3000) and how to design the filesystems for such large clusters?

Thank you very much in advance!

Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jose.filipe.higino at gmail.com  Sat Feb  8 11:59:54 2020
From: jose.filipe.higino at gmail.com (José Filipe Higino)
Date: Sun, 9 Feb 2020 00:59:54 +1300
Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions
In-Reply-To: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia>
Message-ID:

How many back end nodes for that cluster? And how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster; probably no other filesystem can do more than that).

On GPFS there are so many different ways to access the data that it is sometimes hard to start a conversation. And you did a very great job of introducing it. =)

We (I am a customer too) do not have that many nodes, but from experience I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such an amount of nodes.

There is never a design by the book that can safely tell something will work 100% of the time. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do.

Let us know more...

On Sun, 9 Feb 2020 at 00:40, Walter Sklenka wrote:

> Hello!
> > We are designing two fs where we cannot anticipate if there will be 3000, > or maybe 5000 or more nodes totally accessing these filesystems > > What we saw, was that execution time of mmdf can last 5-7min > > We openend a case and they said, that during such commands like mmdf or > also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is > the reason why it takes so long > > The technichian also said, that it is ?rule of thumb? that there should be > > (-n)*32 regions , this would then be enough ( N=5000 ? 160000 regions per > pool ?) > > (also Block size has influence on regions ?) > > > > #mmfsadm saferdump stripe > > Gives the regions number > > storage pools: max 8 > > > > alloc map type 'scatter' > > > > 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 > thinProvision reserved inode -1, reserved nBlocks 0 > > > > *regns 170413* segs 1 size 4096 FBlks 0 MBlks 3145728 subblock > size 8192 > > > > > > > > > > > > We also saw when creating the filesystem with a speciicic (-n) very high > (5000) (where mmdf execution time was some minutes) and then changing (-n) > to a lower value this does not influence the behavior any more > > > > My question is: Is the rule (Number of Nodes)x5000 for number of regios in > a pool an good estimation , > > Is it better to overestimate the number of Nodes (lnger running commands) > or is it unrealistic to get into problems when not reaching the regions > number calculated ? > > > > Does anybody have experience with high number of nodes (>>3000) and how > to design the filesystems for such large clusters ? > > > > Thank you very much in advance ! > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sun Feb 9 09:59:32 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sun, 9 Feb 2020 09:59:32 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Message-ID: <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? 
consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka > wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 --> 160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
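For reference, a minimal sketch of the checks behind the numbers above. The device name "data" is taken from the listings in this thread; the grep pattern and the commented-out mmcrfs line are only illustrations of an overestimated -n at creation time, not commands that were run here, and "diskstanzas.txt" is just a placeholder name:

# time a full allocation-map scan
time /usr/lpp/mmfs/bin/mmdf data

# show the -n estimate currently recorded for the file system
/usr/lpp/mmfs/bin/mmlsfs data -n

# count the allocation regions ("regns") per storage pool
/usr/lpp/mmfs/bin/mmfsadm saferdump stripe | grep -E "name '|regns"

# illustration only: overestimating -n when the file system is created,
# since regions are sized at pool creation and are not rebuilt later
# /usr/lpp/mmfs/bin/mmcrfs data -F diskstanzas.txt -B 4M -n 5000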
Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From heinrich.billich at id.ethz.ch  Mon Feb 10 11:09:56 2020
From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD))
Date: Mon, 10 Feb 2020 11:09:56 +0000
Subject: [gpfsug-discuss] Spectrum scale yum repos - any chance to the number of repos
Message-ID: <1B9A9988-7347-41B4-A881-4300F8F9E5BF@id.ethz.ch>

Hello,

Does it work to merge "all" Spectrum Scale rpms of one version into one yum repo? Can I merge rpms from different versions in the same repo, even different architectures?

Yum repos for RedHat, Suse, Debian or application repos like EPEL all manage to keep many rpms and all different versions in a few repos. Spreading the few Spectrum Scale rpms for rhel across about 11 repos for each architecture and version seems overly complicated - and makes it difficult to use RedHat Satellite to distribute the software ;-(

Does anyone have experience with or opinions on this "single repo" approach? Does something break if we use it?

We run a few clusters where up to now each runs its own yum server. We want to consolidate with RedHat Satellite for os and scale provisioning/updates. RedHat Satellite having just one repo for _all_ versions would fit much better. And maybe just separate repos for base (including protocols), object and hdfs (which we don't use).

My wish: The number of repos should not grow with the number of versions provided, and adding a new version should not require setting up new yum repos. I know you can work around this with scripts, but it would be easier if I didn't need to.

Regards,

Heiner

From nfalk at us.ibm.com  Mon Feb 10 14:57:13 2020
From: nfalk at us.ibm.com (Nathan Falk)
Date: Mon, 10 Feb 2020 14:57:13 +0000
Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions
In-Reply-To: <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia>
References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia>
Message-ID:

Hello Walter,

If you anticipate that the number of clients accessing this file system may grow as high as 5000, then that is probably the value you should use when creating the file system. The data structures (regions, for example) are allocated at file system creation time (more precisely, at storage pool creation time) and are not changed later.

The mmcrfs doc explains this:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm

-n NumNodes
The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created but it does not change the existing data structures. Only the newly created data structure is affected by the new value. For example, new storage pool.

When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system.
GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (For more information, see GPFS architecture ). If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems Phone: 1-720-349-9538 | Mobile: 1-845-546-4930 E-mail: nfalk at us.ibm.com Find me on: From: Walter Sklenka To: gpfsug main discussion list Date: 02/09/2020 04:59 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 ? 160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Walter.Sklenka at EDV-Design.at  Mon Feb 10 18:34:45 2020
From: Walter.Sklenka at EDV-Design.at (Walter Sklenka)
Date: Mon, 10 Feb 2020 18:34:45 +0000
Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions
In-Reply-To:
References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia>
Message-ID: <92ca7c73eb314667be51d79f97f34c9c@Mail.EDVDesign.cloudia>

Hello Nate!

Thank you very much for the response. Do you know if the rule of thumb for "enough regions" is N*32 per pool? And isn't there another way to increase the number of regions (maybe by reducing the block size)?

It's only because the command execution time of a couple of minutes makes me nervous - or is the reason more a poor metadata performance for the long running command?

But if you say so, we will change it to N=5000.

Best regards
Walter

Mit freundlichen Grüßen
Walter Sklenka
Technical Consultant

EDV-Design Informationstechnologie GmbH
Giefinggasse 6/1/2, A-1210 Wien
Tel: +43 1 29 22 165-31
Fax: +43 1 29 22 165-90
E-Mail: sklenka at edv-design.at
Internet: www.edv-design.at

Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Nathan Falk
Gesendet: Monday, February 10, 2020 3:57 PM
An: gpfsug main discussion list
Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions

Hello Walter,

If you anticipate that the number of clients accessing this file system may grow as high as 5000, then that is probably the value you should use when creating the file system. The data structures (regions, for example) are allocated at file system creation time (more precisely, at storage pool creation time) and are not changed later.

The mmcrfs doc explains this:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm

-n NumNodes
The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created but it does not change the existing data structures. Only the newly created data structure is affected by the new value. For example, new storage pool.

When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (For more information, see GPFS architecture).
If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems ________________________________ Phone:1-720-349-9538| Mobile:1-845-546-4930 E-mail:nfalk at us.ibm.com Find me on:[LinkedIn: https://www.linkedin.com/in/nathan-falk-078ba5125] [Twitter: https://twitter.com/natefalk922] [IBM] From: Walter Sklenka > To: gpfsug main discussion list > Date: 02/09/2020 04:59 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von:gpfsug-discuss-bounces at spectrumscale.org > Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka > wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 -->160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Feb 11 21:44:07 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 11 Feb 2020 16:44:07 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca> <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> Message-ID: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Hi Mark, Just a follow up to your suggestion few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each, when serialized. I now just would like to use one mmbackup instance to 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option? Thanks Jaime # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ??tsm?servers option (it thinks it's some kind of argument): # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] 
| NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] I defined the 2 servers stanzas as follows: # cat dsm.sys SERVERNAME TAPENODE3 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 10.20.205.51 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 SERVERNAME TAPENODE4 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 192.168.94.128 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 TCPBuffsize 512 On 2019-11-03 8:56 p.m., Jaime Pinto wrote: > > > On 11/3/2019 20:24:35, Marc A Kaplan wrote: >> Please show us the 2 or 3 mmbackup commands that you would like to run concurrently. > > Hey Marc, > They would be pretty similar, with the only different being the target TSM server, determined by sourcing a different dsmenv1(2 or 3) prior to the > start of each instance, each with its own dsm.sys (3 wrappers). > (source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg1? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg2? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg3? --scope inodespace -v -a 8 -L 2) > > I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for > multiple target servers. It may be just what I need. I'll try this next. > > Thank you very much, > Jaime > >> >> Peeking into the script, I find: >> >> if [[ $scope == "inode-space" ]] >> then >> deviceSuffix="${deviceName}.${filesetName}" >> else >> deviceSuffix="${deviceName}" >> >> >> I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, Or different filesystems... >> >> And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file: >> >> # --tsm-servers TSMserver[,TSMserver...] >> # List of TSM servers to use instead of the servers in the dsm.sys file. >> >> >> >> Inactive hide details for Jaime Pinto ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instrucJaime Pinto >> ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instructed by the -g option? 
I started a mmbac >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 11/01/2019 07:40 PM >> Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ?g GlobalWorkDirectory not being followed >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> How can I force secondary processes to use the folder instructed by the -g option? >> >> I started a mmbackup with ?g /gpfs/fs1/home/.mmbackupCfg1 and another with ?g /gpfs/fs1/home/.mmbackupCfg2 (and another with ?g >> /gpfs/fs1/home/.mmbackupCfg3 ...) >> >> However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not >> happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. >> >> See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): >> >> DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance >> -a 8 ?-P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope >> inodespace >> >> >> Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the >> different mmbackup instances tripping over each other. The serializing is very undesirable. >> >> Thanks >> Jaime >> >> >> ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From scale at us.ibm.com Wed Feb 12 12:48:42 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 12 Feb 2020 07:48:42 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca><5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Message-ID: Hi Jaime, When I copy & paste your command to try, this is what I got. 
/usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jaime Pinto To: gpfsug main discussion list , Marc A Kaplan Date: 02/11/2020 05:26 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Mark, Just a follow up to your suggestion few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each, when serialized. I now just would like to use one mmbackup instance to 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option? Thanks Jaime # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer [,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ??tsm?servers option (it thinks it's some kind of argument): # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] 
| NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer [,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] I defined the 2 servers stanzas as follows: # cat dsm.sys SERVERNAME TAPENODE3 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 10.20.205.51 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 SERVERNAME TAPENODE4 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 192.168.94.128 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 TCPBuffsize 512 On 2019-11-03 8:56 p.m., Jaime Pinto wrote: > > > On 11/3/2019 20:24:35, Marc A Kaplan wrote: >> Please show us the 2 or 3 mmbackup commands that you would like to run concurrently. > > Hey Marc, > They would be pretty similar, with the only different being the target TSM server, determined by sourcing a different dsmenv1(2 or 3) prior to the > start of each instance, each with its own dsm.sys (3 wrappers). > (source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg1? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg2? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg3? --scope inodespace -v -a 8 -L 2) > > I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for > multiple target servers. It may be just what I need. I'll try this next. > > Thank you very much, > Jaime > >> >> Peeking into the script, I find: >> >> if [[ $scope == "inode-space" ]] >> then >> deviceSuffix="${deviceName}.${filesetName}" >> else >> deviceSuffix="${deviceName}" >> >> >> I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, Or different filesystems... >> >> And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file: >> >> # --tsm-servers TSMserver[,TSMserver...] >> # List of TSM servers to use instead of the servers in the dsm.sys file. >> >> >> >> Inactive hide details for Jaime Pinto ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instrucJaime Pinto >> ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instructed by the -g option? 
I started a mmbac >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 11/01/2019 07:40 PM >> Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ?g GlobalWorkDirectory not being followed >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> How can I force secondary processes to use the folder instructed by the -g option? >> >> I started a mmbackup with ?g /gpfs/fs1/home/.mmbackupCfg1 and another with ?g /gpfs/fs1/home/.mmbackupCfg2 (and another with ?g >> /gpfs/fs1/home/.mmbackupCfg3 ...) >> >> However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not >> happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. >> >> See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): >> >> DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance >> -a 8 ?-P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope >> inodespace >> >> >> Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the >> different mmbackup instances tripping over each other. The serializing is very undesirable. >> >> Thanks >> Jaime >> >> >> ************************************ TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=or2HFYOoCdTJ5x-rCnVcq8cFo3SsnpCzODVHNLp7jlA&s=vCTEqk_OPEgrWnqq9bJpzD-pn5QnNNNo3citEqiTsEY&e= ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=or2HFYOoCdTJ5x-rCnVcq8cFo3SsnpCzODVHNLp7jlA&s=76T6OenS_DXfRVD5Xh02vz8qnWOyhmv7yWeawZKYmWA&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Thu Feb 13 19:37:12 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 13 Feb 2020 11:37:12 -0800 Subject: [gpfsug-discuss] NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> Message-ID: <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. Thanks, Kristy > On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose wrote: > > Thanks for your responses to the poll. > > We?re still working on a venue, but working towards: > > March 30 - New User Day (Tuesday) > April 1&2 - Regular User Group Meeting (Wednesday & Thursday) > > Once it?s confirmed we?ll post something again. > > Best, > Kristy. > >> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose > wrote: >> >> Thank you to the 18 wonderful people who filled out the survey. >> >> However, there are well more than 18 people at any given UG meeting. >> >> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >> >> Happy New Year. >> >> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >> >> Thanks, >> Kristy >> >>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose > wrote: >>> >>> Hello, >>> >>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>> >>> Best wishes to all in the new year. >>> >>> -Kristy >>> >>> >>> Please give us 2 minutes of your time here: ?https://forms.gle/NFk5q4djJWvmDurW7 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Fri Feb 14 13:25:08 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 14 Feb 2020 14:25:08 +0100 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? Message-ID: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From S.J.Thompson at bham.ac.uk Fri Feb 14 14:56:30 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 14 Feb 2020 14:56:30 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. 
If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils And use MPI to parallelise the find and copy. Simon ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Fri Feb 14 16:24:40 2020 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 14 Feb 2020 16:24:40 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> References: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> Message-ID: Some (perhaps obvious) points to consider: - There are some corner cases (e.g. preserving hard-linked files or sparseness) which require special options. - Depending on your level of churn, it may be helpful to pre-stage the sync before your cutover so that there is less data movement required, and you're primarily comparing metadata. - Files on the source filesysytem might change (and become internally inconsistent) during your rsync, so you should generally sync from a snapshot on the source. - If users can still modify the source filesystem, then you might not get everything. For the final sync, you may need to make the source read-only, or unmount it on clients, kill user processes, or some combination to prevent all new writes from succeeding. (If you're going to use the clients for MPI sync, you obviously need the filesystem to remain mounted there so you may need to take other measures to keep users away.) - If you decide to do a final "offline" sync, you want it to be fast so users can get back to work sooner, so parallelism is usually a must. If you have lots of filesets, then that's a convenient way to split the work. - If you have any filesets with many more inodes than the others, keep in mind that those will likely take the longest to complete. - Test, test, test. You usually won't get this right on the first go or know how long a full sync takes without practice. Remember that you'll need to employ options to delete extraneous files on the target when you're syncing over the top of a previous attempt, since files intentionally deleted on the source aren't usually welcome if they reappear after a migration. - Verify. Whether you use rsync of dsync, repeating the process with dry-run/no-op flags which report differences can be helpful to increase your confidence in the process. If you don't have time to verify after the final offline sync, hopefully you were able to fit this in during testing. Some thoughts about whether it's appropriate to use NSD servers as sync hosts... 
- If they are the managers and they have the best (direct) connectivity to the metadata NSDs, then I would at least consider them before ruling this out, with caveats... - do they have enough available RAM and CPU? - where do they get their software? Do you trust the version of kernel/libc/rsync there to behave as you expect? - if the data NSDs aren't local to these NSD servers, do they have sufficient network connectivity to not cause other problems during the sync? - Test at low parallelism and work your way up. You can also compare performance of this method with any other, on a small scale, in your environment to see what you can expect from each. Good luck, Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: Friday, February 14, 2020 09:57 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? This message was sent by an external party. I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils And use MPI to parallelise the find and copy. Simon ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Fri Feb 14 16:13:30 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 14 Feb 2020 16:13:30 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. Ed -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco Sent: Friday, February 14, 2020 8:25 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? 
The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW https://urldefense.com/v3/__http://www.afs.enea.it/bracco__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTgu9tB2V$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTn2QwFQn$ From valdis.kletnieks at vt.edu Fri Feb 14 17:28:27 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Fri, 14 Feb 2020 12:28:27 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca> <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Message-ID: <61512.1581701307@turing-police> On Tue, 11 Feb 2020 16:44:07 -0500, Jaime Pinto said: > # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog I got bit by this when cut-n-pasting from IBM documentation - the problem is that the web version has characters that *look* like the command-line hyphen character but are actually something different. It's the same problem as cut-n-pasting a command line where the command *should* have the standard ascii double-quote, but the webpage has "smart quotes" where there's different open and close quote characters. Just even less visually obvious... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From skylar2 at uw.edu Fri Feb 14 17:24:46 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Fri, 14 Feb 2020 17:24:46 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <20200214172446.gwzd332efrkpcuxp@utumno.gs.washington.edu> Our experience matches Ed. I have a vague memory that clients will balance traffic across all NSD servers based on the preferred list for each NSD, whereas NSD servers will just read from each NSD directly. On Fri, Feb 14, 2020 at 04:13:30PM +0000, Wahl, Edward wrote: > Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. > > Ed > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco > Sent: Friday, February 14, 2020 8:25 AM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From bhill at physics.ucsd.edu Fri Feb 14 18:10:04 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Fri, 14 Feb 2020 10:10:04 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 Message-ID: Hi All: I'm performing a rolling upgrade of one of our GPFS clusters. This particular cluster has 2 CNFS servers for some of our NFS clients. I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this). The thing is, it actually does start nfsd processes when running mmstartup on the node. Doing a "ps" shows 32 nfsd threads are running. Below is the CNFS-specific output from an attempt to start the node: CNFS[27243]: Restarting lockd to start grace CNFS[27588]: Enabling 172.16.69.76 CNFS[27694]: Restarting lockd to start grace CNFS[27699]: Starting NFS services CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[27910]: Monitor has started pid=27787 CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it CNFS[28705]: Starting NFS services CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it CNFS[28758]: Starting NFS services CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it CNFS[28816]: Starting NFS services CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured. CNFS[28924]: Unexporting all GPFS filesystems Any thoughts? My other CNFS node is handling everything for the time being, thankfully! Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Feb 14 21:09:14 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 14 Feb 2020 21:09:14 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> Message-ID: <072a3754-5160-09da-0c14-54e08ecefef7@strath.ac.uk> On 14/02/2020 16:24, Sanchez, Paul wrote: > Some (perhaps obvious) points to consider: > > - There are some corner cases (e.g. preserving hard-linked files or > sparseness) which require special options. > > - Depending on your level of churn, it may be helpful to pre-stage > the sync before your cutover so that there is less data movement > required, and you're primarily comparing metadata. 
> > - Files on the source filesysytem might change (and become internally > inconsistent) during your rsync, so you should generally sync from a > snapshot on the source. In my experience this causes an rsync to exit with a none zero error code. See later as to why this is useful. Also it will likely have a different mtime that will cause it be resynced on a subsequent run, the final one will be with the file system in a "read only" state. Not necessarily mounted read only but without anything running that might change stuff. [SNIP] > > - If you decide to do a final "offline" sync, you want it to be fast > so users can get back to work sooner, so parallelism is usually a > must. If you have lots of filesets, then that's a convenient way to > split the work. This final "offline" sync is an absolute must, in my experience unless you are able to be rather woolly about preserving data. > > - If you have any filesets with many more inodes than the others, > keep in mind that those will likely take the longest to complete. > Indeed. We found last time that we did an rsync which was for a HPC system from the put of woe that is Lustre to GPFS there was huge mileage to be hand from telling users that they would get on the new system once their data was synced, it would be done on a "per user" basis with the priority given to the users with a combination of the smallest amount of data and the smallest number of files. Did unbelievable wonders for the users to clean up their files. One user went from over 17 million files to under 50 thousand! The amount of data needing syncing nearly halved. It shrank to ~60% of the pre-announcement size. > - Test, test, test. You usually won't get this right on the first go > or know how long a full sync takes without practice. Remember that > you'll need to employ options to delete extraneous files on the > target when you're syncing over the top of a previous attempt, since > files intentionally deleted on the source aren't usually welcome if > they reappear after a migration. > rsync has a --delete option for that. I am going to add that if you do any sort of ILM/HSM then an rsync is going to destroy you ability to identify old files that have not been accessed, as the rsync will up date the atime of everything (don't ask how I know). If you have a backup (of course you do) I would strongly recommend considering getting your first "pass" from a restore. Firstly it won't impact the source file system while it is still in use and second it allows you to check your backup actually works :-) Finally when rsyncing systems like this I use a Perl script with an sqlite DB. Basically a list of directories to sync, you can have both source and destination to make wonderful things happen if wanted, along with a flag field. The way I use that is -1 means not synced, -2 means the folder in question is currently been synced, and anything else is the exit code of rsync. If you write the Perl script correctly you can start it on any number of nodes, just dump the sqlite DB on a shared folder somewhere (either the source or destination file systems work well here). If you are doing it in parallel record the node which did the rsync as well it can be useful in finding any issues in my experience. Once everything is done you can quickly check the sqlite DB for none zero flag fields to find out what if anything has failed, which gives you the confidence that your sync has completed accurately. Also any flag fields less than zero show you it's not finished. 
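For illustration, a minimal shell sketch of that claim-and-record bookkeeping, assuming sqlite3 is available and the database sits on a path every copy node can see (this is not the original Perl script; the table layout, paths and rsync options are made up to show the idea of -1 = not synced, -2 = in progress, anything else = the rsync exit code):

DB=/gpfs/target/.sync/rsync.db    # shared location, visible to all nodes doing the copy

# one-off setup: one row per top-level directory, flag starts at -1 (not synced)
# sqlite3 "$DB" "CREATE TABLE dirs (path TEXT PRIMARY KEY, flag INTEGER DEFAULT -1, node TEXT);"
# (cd /gpfs/source && for d in */; do sqlite3 "$DB" "INSERT INTO dirs(path) VALUES ('${d%/}');"; done)

while :; do
    # claim one unsynced directory; BEGIN IMMEDIATE plus a busy timeout stops
    # two nodes from grabbing the same row (paths are assumed to be quote-free)
    dir=$(sqlite3 -cmd '.timeout 60000' "$DB" "
        BEGIN IMMEDIATE;
        SELECT path FROM dirs WHERE flag = -1 ORDER BY path LIMIT 1;
        UPDATE dirs SET flag = -2, node = '$(hostname -s)'
            WHERE path = (SELECT path FROM dirs WHERE flag = -1 ORDER BY path LIMIT 1);
        COMMIT;")
    [ -z "$dir" ] && break
    rsync -aHS --delete "/gpfs/source/$dir/" "/gpfs/target/$dir/"
    sqlite3 "$DB" "UPDATE dirs SET flag = $? WHERE path = '$dir';"
done

# afterwards, anything that is not 0 needs another pass or a closer look:
# sqlite3 "$DB" "SELECT node, flag, path FROM dirs WHERE flag <> 0;"

Started on several nodes at once, each against the same database, this gives the same parallelism and the same "check for non-zero flag fields" verification described above.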
Finally you might want to record the time each individual rsync took, it's handy for working out that ordering I mentioned :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From chris.schlipalius at pawsey.org.au Fri Feb 14 22:47:00 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Sat, 15 Feb 2020 06:47:00 +0800 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <168C52FC-4942-4D66-8762-EAEFC4655021@pawsey.org.au> We have used DCP for this, with mmdsh as DCP is MPI and multi node with auto resume. You can also customise threads numbers etc. DDN in fact ran it for us first on our NSD servers for a multi petabyte migration project. It?s in git. For client side, we recommend and use bbcp, our users use this to sync data. It?s fast and reliable and supports resume also. If you do use rsync, as suggested, do dryruns and then a sync and then final copy, as is often run on Isilons to keep geographically separate Isilons in sync. Newest version of rsync also. Regards, Chris Schlipalius Team Lead Data and Storage The Pawsey Supercomputing Centre Australia > On 15 Feb 2020, at 1:28 am, gpfsug-discuss-request at spectrumscale.org wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. naive question about rsync: run it on a client or on NSD > server? (Giovanni Bracco) > 2. Re: naive question about rsync: run it on a client or on NSD > server? (Simon Thompson) > 3. Re: naive question about rsync: run it on a client or on NSD > server? (Sanchez, Paul) > 4. Re: naive question about rsync: run it on a client or on NSD > server? (Wahl, Edward) > 5. Re: mmbackup [--tsm-servers TSMServer[, TSMServer...]] > (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 14 Feb 2020 14:25:08 +0100 > From: Giovanni Bracco > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ------------------------------ > > Message: 2 > Date: Fri, 14 Feb 2020 14:56:30 +0000 > From: Simon Thompson > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: <404B2B75-C094-43CC-9146-C00410F31578 at bham.ac.uk> > Content-Type: text/plain; charset="utf-8" > > I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. > > If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils > > And use MPI to parallelise the find and copy. > > Simon > > ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? > > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Message: 3 > Date: Fri, 14 Feb 2020 16:24:40 +0000 > From: "Sanchez, Paul" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Some (perhaps obvious) points to consider: > > - There are some corner cases (e.g. preserving hard-linked files or sparseness) which require special options. > > - Depending on your level of churn, it may be helpful to pre-stage the sync before your cutover so that there is less data movement required, and you're primarily comparing metadata. > > - Files on the source filesysytem might change (and become internally inconsistent) during your rsync, so you should generally sync from a snapshot on the source. > > - If users can still modify the source filesystem, then you might not get everything. For the final sync, you may need to make the source read-only, or unmount it on clients, kill user processes, or some combination to prevent all new writes from succeeding. (If you're going to use the clients for MPI sync, you obviously need the filesystem to remain mounted there so you may need to take other measures to keep users away.) > > - If you decide to do a final "offline" sync, you want it to be fast so users can get back to work sooner, so parallelism is usually a must. If you have lots of filesets, then that's a convenient way to split the work. 
> > - If you have any filesets with many more inodes than the others, keep in mind that those will likely take the longest to complete. > > - Test, test, test. You usually won't get this right on the first go or know how long a full sync takes without practice. Remember that you'll need to employ options to delete extraneous files on the target when you're syncing over the top of a previous attempt, since files intentionally deleted on the source aren't usually welcome if they reappear after a migration. > > - Verify. Whether you use rsync of dsync, repeating the process with dry-run/no-op flags which report differences can be helpful to increase your confidence in the process. If you don't have time to verify after the final offline sync, hopefully you were able to fit this in during testing. > > > Some thoughts about whether it's appropriate to use NSD servers as sync hosts... > > - If they are the managers and they have the best (direct) connectivity to the metadata NSDs, then I would at least consider them before ruling this out, with caveats... > - do they have enough available RAM and CPU? > - where do they get their software? Do you trust the version of kernel/libc/rsync there to behave as you expect? > - if the data NSDs aren't local to these NSD servers, do they have sufficient network connectivity to not cause other problems during the sync? > > - Test at low parallelism and work your way up. You can also compare performance of this method with any other, on a small scale, in your environment to see what you can expect from each. > > Good luck, > Paul > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson > Sent: Friday, February 14, 2020 09:57 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > This message was sent by an external party. > > > I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. > > If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils > > And use MPI to parallelise the find and copy. > > Simon > > ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------ > > Message: 4 > Date: Fri, 14 Feb 2020 16:13:30 +0000 > From: "Wahl, Edward" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. > > Ed > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco > Sent: Friday, February 14, 2020 8:25 AM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? > > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW https://urldefense.com/v3/__http://www.afs.enea.it/bracco__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTgu9tB2V$ > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTn2QwFQn$ > > > ------------------------------ > > Message: 5 > Date: Fri, 14 Feb 2020 12:28:27 -0500 > From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" > To: gpfsug main discussion list > Cc: Marc A Kaplan > Subject: Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, > TSMServer...]] > Message-ID: <61512.1581701307 at turing-police> > Content-Type: text/plain; charset="utf-8" > > On Tue, 11 Feb 2020 16:44:07 -0500, Jaime Pinto said: > >> # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog > > I got bit by this when cut-n-pasting from IBM documentation - the problem is that > the web version has characters that *look* like the command-line hyphen character > but are actually something different. 
> > It's the same problem as cut-n-pasting a command line where the command > *should* have the standard ascii double-quote, but the webpage has "smart quotes" > where there's different open and close quote characters. Just even less visually > obvious... > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/pgp-signature > Size: 832 bytes > Desc: not available > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 97, Issue 12 > ********************************************** From mnaineni at in.ibm.com Sat Feb 15 10:03:20 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Sat, 15 Feb 2020 10:03:20 +0000 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Sun Feb 16 18:19:00 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Sun, 16 Feb 2020 10:19:00 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Hi Malahal: Just to clarify, are you saying that on your VM pidof is missing? Or that it is there and not working as it did prior to RHEL/CentOS 8? pidof is returning pid numbers on my system. I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet. Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni wrote: > I am not familiar with CNFS but looking at git source seems to indicate > that it uses 'pidof' to check if a program is running or not. "pidof nfsd" > works on RHEL7.x but it fails on my centos8.1 I just created. So either we > need to make sure pidof works on kernel threads or fix CNFS scripts. > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from > 4.2.3.11 to 5.0.4.2 > Date: Fri, Feb 14, 2020 11:40 PM > > Hi All: > > I'm performing a rolling upgrade of one of our GPFS clusters. This > particular cluster has 2 CNFS servers for some of our NFS clients. I wiped > one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem > mounts fine on the node when I disable CNFS on the node, but with it > enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd > has started, so it assumes the worst and shuts down the file system (I > currently have reboot on failure disabled to debug this). The thing is, it > actually does start nfsd processes when running mmstartup on the node. > Doing a "ps" shows 32 nfsd threads are running. 
> > Below is the CNFS-specific output from an attempt to start the node: > > CNFS[27243]: Restarting lockd to start grace > CNFS[27588]: Enabling 172.16.69.76 > CNFS[27694]: Restarting lockd to start grace > CNFS[27699]: Starting NFS services > CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[27910]: Monitor has started pid=27787 > CNFS[28702]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28705]: Starting NFS services > CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28755]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28758]: Starting NFS services > CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28813]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28816]: Starting NFS services > CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28867]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS > server is not running or responding. Node failure initiated as configured. > CNFS[28924]: Unexporting all GPFS filesystems > > Any thoughts? My other CNFS node is handling everything for the time > being, thankfully! > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Mon Feb 17 02:56:24 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Sun, 16 Feb 2020 18:56:24 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Ah wait, I see what you might mean. pidof works but not specifically for processes like nfsd. That is odd. Thanks, Bryan On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill wrote: > Hi Malahal: > > Just to clarify, are you saying that on your VM pidof is missing? Or > that it is there and not working as it did prior to RHEL/CentOS 8? pidof > is returning pid numbers on my system. I've been looking at the > mmnfsmonitor script and trying to see where the check for nfsd might be > failing, but I've not been able to figure it out yet. > > > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > > > On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni > wrote: > >> I am not familiar with CNFS but looking at git source seems to indicate >> that it uses 'pidof' to check if a program is running or not. "pidof nfsd" >> works on RHEL7.x but it fails on my centos8.1 I just created. So either we >> need to make sure pidof works on kernel threads or fix CNFS scripts. >> >> Regards, Malahal. 
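For illustration, a few quick checks that show the nfsd kernel threads really are there even when pidof comes back empty (assumes a stock RHEL/CentOS 8 nfs-utils setup; this is only a way to confirm the symptom, not what the CNFS monitor itself runs):

# the pidof shipped with RHEL/CentOS 8 does not appear to report kernel threads such as nfsd (as observed above)
pidof nfsd || echo "pidof sees no nfsd"

# ps still finds them by command name - this should match the 32 threads seen earlier
ps -C nfsd --no-headers | wc -l

# and the kernel NFS server keeps its own thread count, if /proc/fs/nfsd is mounted
cat /proc/fs/nfsd/threads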
>> >> >> ----- Original message ----- >> From: Bryan Hill >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from >> 4.2.3.11 to 5.0.4.2 >> Date: Fri, Feb 14, 2020 11:40 PM >> >> Hi All: >> >> I'm performing a rolling upgrade of one of our GPFS clusters. This >> particular cluster has 2 CNFS servers for some of our NFS clients. I wiped >> one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem >> mounts fine on the node when I disable CNFS on the node, but with it >> enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd >> has started, so it assumes the worst and shuts down the file system (I >> currently have reboot on failure disabled to debug this). The thing is, it >> actually does start nfsd processes when running mmstartup on the node. >> Doing a "ps" shows 32 nfsd threads are running. >> >> Below is the CNFS-specific output from an attempt to start the node: >> >> CNFS[27243]: Restarting lockd to start grace >> CNFS[27588]: Enabling 172.16.69.76 >> CNFS[27694]: Restarting lockd to start grace >> CNFS[27699]: Starting NFS services >> CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[27910]: Monitor has started pid=27787 >> CNFS[28702]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28705]: Starting NFS services >> CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28755]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28758]: Starting NFS services >> CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28813]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28816]: Starting NFS services >> CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28867]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS >> server is not running or responding. Node failure initiated as configured. >> CNFS[28924]: Unexporting all GPFS filesystems >> >> Any thoughts? My other CNFS node is handling everything for the time >> being, thankfully! >> >> Thanks, >> Bryan >> >> --- >> Bryan Hill >> Lead System Administrator >> UCSD Physics Computing Facility >> >> 9500 Gilman Dr. # 0319 >> La Jolla, CA 92093 >> +1-858-534-5538 >> bhill at ucsd.edu >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Mon Feb 17 08:02:19 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Mon, 17 Feb 2020 08:02:19 +0000 Subject: [gpfsug-discuss] =?utf-8?q?CNFS_issue_after_upgrading_from_4=2E2?= =?utf-8?b?LjMuMTEgdG8JNS4wLjQuMg==?= In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: 

From rp2927 at gsb.columbia.edu Mon Feb 17 18:42:51 2020
From: rp2927 at gsb.columbia.edu (Popescu, Razvan)
Date: Mon, 17 Feb 2020 18:42:51 +0000
Subject: [gpfsug-discuss] Dataless nodes as GPFS clients
Message-ID: 

Hi,

Here at CBS we run our compute cluster as dataless nodes loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server, which does not include any node-specific runtime files. The (Debian10) system is based on debian-live, with a few in-house modifications, a major feature being that we nfs mount the bottom r/o root layer such that we can make live updates (within certain limits).

I'm trying to add native (GPL) GPFS access to it. (So far, we've used NFS to gain access to the GPFS resident data.)

I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. I installed on the root server all required GPFS (client) packages and manually built the GPL chroot'ed in the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition. I successfully added the (client) node to the GPFS cluster. It seems to work fine.

I've copied some of the captured node data to the node-specific overlay to try to run without any persistency: the critical piece seems to be the one in /var/mmfs/gen (I copied all of /var/mmfs in fact). It runs fine without persistency.

My questions are:

1. Am I insane to take the risk of compromising the cluster's data integrity? (...by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!)
2. Would such a configuration run safely through a proper reboot? How about a forced power-off and restart?
3. Is there a properly identified minimum set of files that must be added to the node-specific overlay to make this work? (For now, I've used my "knowledge" and guesswork to decide what to retain and what not: e.g. keep startup links, certificates and config dumps; drop logs, pids, etc.)

Thanks!!

Razvan N. Popescu
Research Computing Director
Office: (212) 851-9298
razvan.popescu at columbia.edu

Columbia Business School
At the Very Center of Business

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From novosirj at rutgers.edu Mon Feb 17 18:57:47 2020
From: novosirj at rutgers.edu (Ryan Novosielski)
Date: Mon, 17 Feb 2020 18:57:47 +0000
Subject: [gpfsug-discuss] Dataless nodes as GPFS clients
In-Reply-To: References: Message-ID: 

We do this. We provision only the GPFS key files - /var/mmfs/ssl/stage/genkeyData* - and the appropriate SSH key files needed, and use the following systemd override to the mmsdrserv.service. Where the appropriate place to do that override is will depend somewhat on your version of GPFS, as the systemd setup for GPFS has changed in 5.x, but I've rigged this up for any of the 4.x and 5.x that exist so far if you need pointers.
We use CentOS, FYI, but I don?t think any of this should be different on Debian; our current version of GPFS on nodes where we do this is 5.0.4-1: [root at master ~]# wwsh file print mmsdrserv-override.conf #### mmsdrserv-override.conf ################################################## mmsdrserv-override.conf: ID = 1499 mmsdrserv-override.conf: NAME = mmsdrserv-override.conf mmsdrserv-override.conf: PATH = /etc/systemd/system/mmsdrserv.service.d/override.conf mmsdrserv-override.conf: ORIGIN = /root/clusters/amarel/mmsdrserv-override.conf mmsdrserv-override.conf: FORMAT = data mmsdrserv-override.conf: CHECKSUM = ee7c28f0eee075a014f7a1a5add65b1e mmsdrserv-override.conf: INTERPRETER = UNDEF mmsdrserv-override.conf: SIZE = 210 mmsdrserv-override.conf: MODE = 0644 mmsdrserv-override.conf: UID = 0 mmsdrserv-override.conf: GID = 0 [root at master ~]# wwsh file show mmsdrserv-override.conf [Unit] After=sys-subsystem-net-devices-ib0.device [Service] ExecStartPre=/usr/lpp/mmfs/bin/mmsdrrestore -p $SERVER -R /usr/bin/scp ExecStartPre=/usr/lpp/mmfs/bin/mmauth genkey propagate -N %{NODENAME}-ib0 ?where $SERVER above has been changed for this e-mail; the actual override file contains the hostname of our cluster manager, or other appropriate config server. %{NODENAME} is filled in by Warewulf, which is our cluster manager, and will contain any given node?s short hostname. I?ve since found that we can also set an object that I could use to make the first line include %{CLUSTERMGR} or other arbitrary variable and make this file more cluster-agnostic, but we just haven?t done that yet. Other than that, we build/install the appropriate gpfs.gplbin- RPM, which we build by doing ? on a node with an identical OS ? or you can manually modify the config and have the appropriate kernel source handy: "cd /usr/lpp/mmfs/src; make Autoconfig; make World; make rpm?. You?d do make deb instead. Also obviously installed is the rest of GPFS and you join the node to the cluster while it?s booted up one of the times. Warewulf starts a node off with a nearly empty /var, so anything we need to be in there has to be populated on boot. It?s required a little tweaking from time to time on OS upgrades or GPFS upgrades, but other than that, we?ve been running clusters like this without incident for years. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Feb 17, 2020, at 1:42 PM, Popescu, Razvan wrote: > > Hi, > > Here at CBS we run our compute cluster as dataless nodes loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writtable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server which does not include any node specific runtime files. The (Debian10) system is based on debian-live, with a few in-house modification, a major feature being that we nfs mount the bottom r/o root layer such that we can make live updates (within certain limits). > > I?m trying to add native (GPL) GPFS access to it. (so far, we?ve used NFS to gain access to the GPFS resident data) > > I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. 
I installed on the root server all required GPFS (client) packages and manually built the GPL chroot?ed in the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition. I successfully added the (client) node to the GPFS cluster. It seems to work fine. > > I?ve copied some the captured node data to the node specific overlay to try to run without any persistency: the critical one seems to be the one in /var/mmfs/gen. (copied all the /var/mmfs in fact). It runs fine without persistency. > > My questions are: > ? Am I insane and take the risk of compromising the cluster?s data integrity? (?by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!) > ? Would such a configuration run safely through a proper reboot? How about a forced power-off and restart? > ? Is there a properly identified minimum set of files that must be added to the node specific overlay to make this work? (for now, I?ve used my ?knowledge? and guesswork to decide what to retain and what not: e.g. keep startup links, certificates and config dumps, drop: logs, pids. etc?.). > > Thanks!! > > Razvan N. Popescu > Research Computing Director > Office: (212) 851-9298 > razvan.popescu at columbia.edu > > Columbia Business School > At the Very Center of Business > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.turner at ed.ac.uk Tue Feb 18 09:28:31 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 09:28:31 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space Message-ID: Dear All, This has happened more than once with both 4.2.3 and 5.0. The instances may not be related. In the first instance, usage was high (over 90%) and so users were encouraged to delete files. One user deleted a considerable number of files equal to around 10% of the total storage. Reported usage did not fall. There were not obviously any waiters. Has anyone seen anything similar? Regards Aaron Turner The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Tue Feb 18 09:36:57 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 18 Feb 2020 09:36:57 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.turner at ed.ac.uk Tue Feb 18 09:41:24 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 09:41:24 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: No, we weren?t using snapshots. This is from a location I have just moved from so I can?t do any active investigation now, but I am curious. In the end we had a power outage and the system was fine on reboot. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Luis Bolinches Sent: 18 February 2020 09:37 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reported free space Hi Do you have snapshots? 
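(For reference, a quick way to check that on the affected file system is to list its snapshots and compare against the free-space report, e.g.

# mmlssnapshot gpfs0
# mmdf gpfs0

where gpfs0 is only a placeholder for the real device name; blocks still captured by a snapshot are not returned as free space when the files are deleted.)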
-- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous ----- Original message ----- From: TURNER Aaron > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] Odd behaviour with regards to reported free space Date: Tue, Feb 18, 2020 11:28 Dear All, This has happened more than once with both 4.2.3 and 5.0. The instances may not be related. In the first instance, usage was high (over 90%) and so users were encouraged to delete files. One user deleted a considerable number of files equal to around 10% of the total storage. Reported usage did not fall. There were not obviously any waiters. Has anyone seen anything similar? Regards Aaron Turner The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Feb 18 10:50:10 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 18 Feb 2020 10:50:10 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > Dear All, > > This has happened more than once with both 4.2.3 and 5.0. The > instances may not be related. > > In the first instance, usage was high (over 90%) and so users were > encouraged to delete files. One user deleted a considerable number of > files equal to around 10% of the total storage. Reported usage did > not fall. There were not obviously any waiters. Has anyone seen > anything similar? > I have seen similar behaviour a number of times. I my experience it is because a process somewhere has an open file handle on one or more files/directories. So you can delete the file and it goes from a directory listing; it's no long visible when you do ls. However the file has not actually gone, and will continue to count towards total file system usage, user/group/fileset quota's etc. Once the errant process is found and killed magically the space becomes free. I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.turner at ed.ac.uk Tue Feb 18 11:05:41 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 11:05:41 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> References: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> Message-ID: Dear Jonathan, This is what I had assumed was the case. 
Since the system ended up with an enforced reboot before we had time for further investigation I wasn't able to confirm this. > I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. Yes, that's very likely to have been the case. Regards Aaron Turner -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 18 February 2020 10:50 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reported free space On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > Dear All, > > This has happened more than once with both 4.2.3 and 5.0. The > instances may not be related. > > In the first instance, usage was high (over 90%) and so users were > encouraged to delete files. One user deleted a considerable number of > files equal to around 10% of the total storage. Reported usage did not > fall. There were not obviously any waiters. Has anyone seen anything > similar? > I have seen similar behaviour a number of times. I my experience it is because a process somewhere has an open file handle on one or more files/directories. So you can delete the file and it goes from a directory listing; it's no long visible when you do ls. However the file has not actually gone, and will continue to count towards total file system usage, user/group/fileset quota's etc. Once the errant process is found and killed magically the space becomes free. I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From bevans at pixitmedia.com Tue Feb 18 13:30:14 2020 From: bevans at pixitmedia.com (Barry Evans) Date: Tue, 18 Feb 2020 13:30:14 +0000 Subject: [gpfsug-discuss] Spectrum Scale Jobs Message-ID: ArcaStream/Pixit Media are hiring! We?re on the hunt for Senior Systems Architects, Systems Engineers and DevOps Engineers to be part of our amazing growth in North America. Do you believe that coming up with innovative ways of solving complex workflow challenges is the truth path to storage happiness? Does the thought of knowing you played a small role in producing a blockbuster film, saving lives by reducing diagnosis times, or even discovering new planets excite you? Have you ever thought ?wouldn?t it be cool if?? while working with Spectrum Scale but never had the sponsorship or time to implement it? Do you want to make a lasting legacy of your awesome skills by building software defined solutions that will be used by hundreds of customers, doing thousands of amazing things? Do you have solid Spectrum Scale experience in either a deployment, development, architectural, support or sales capacity? Do you enjoy taking complex concepts and communicating them in a way that is easy for anyone to understand? If the answers to the above are ?yes?, we?d love to hear from you! Send us your CV/Resume to careers at arcastream.com to find out more information and let us know what your ideal position is! 
Regards, Barry Evans Chief Innovation Officer/Co-Founder Pixit Media/ArcaStream http://pixitmedia.com http://arcastream.com http://arcapix.com -- ? This email is confidential in that it is? intended for the exclusive attention of?the addressee(s) indicated. If you are?not the intended recipient, this email?should not be read or disclosed to?any other person. Please notify the?sender immediately and delete this? email from your computer system.?Any opinions expressed are not?necessarily those of the company?from which this email was sent and,?whilst to the best of our knowledge no?viruses or defects exist, no?responsibility can be accepted for any?loss or damage arising from its?receipt or subsequent use of this?email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Tue Feb 18 17:37:41 2020 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Tue, 18 Feb 2020 11:37:41 -0600 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> Message-ID: Deleting a file is a two stage process. The original user thread unlinks the file from the directory and reduces the link count. If the count is zero and the file is not open, then it gets queued for the background deletion thread. The background thread then deletes the blocks and frees the space. If there is a snapshot, the data blocks may be captured and not actually freed. After a crash, the recovery code looks for files that were being deleted and restarts the deletion if necessary. -Wayne gpfsug-discuss-bounces at spectrumscale.org wrote on 02/18/2020 06:05:41 AM: > From: TURNER Aaron > To: gpfsug main discussion list > Date: 02/18/2020 06:05 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Odd behaviour with regards > to reported free space > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Dear Jonathan, > > This is what I had assumed was the case. Since the system ended up > with an enforced reboot before we had time for further investigation > I wasn't able to confirm this. > > > I can be very confusing for end users, especially when what is > holding onto the file is some random zombie process on another node > that died last month. > > Yes, that's very likely to have been the case. > > Regards > > Aaron Turner > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org bounces at spectrumscale.org> On Behalf Of Jonathan Buzzard > Sent: 18 February 2020 10:50 > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reportedfree space > > On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > > Dear All, > > > > This has happened more than once with both 4.2.3 and 5.0. The > > instances may not be related. > > > > In the first instance, usage was high (over 90%) and so users were > > encouraged to delete files. One user deleted a considerable number of > > files equal to around 10% of the total storage. Reported usage did not > > fall. There were not obviously any waiters. Has anyone seen anything > > similar? > > > > I have seen similar behaviour a number of times. > > I my experience it is because a process somewhere has an open file > handle on one or more files/directories. So you can delete the file > and it goes from a directory listing; it's no long visible when you do ls. 
> > However the file has not actually gone, and will continue to count > towards total file system usage, user/group/fileset quota's etc. > > Once the errant process is found and killed magically the space becomes free. > > I can be very confusing for end users, especially when what is > holding onto the file is some random zombie process on another node > that died last month. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=GtPIT10cORUM6qwFnTVtIiDUFmESkxW3I0wu8GDxmgc&m=QkF9KAzl1dxqONkEkh7ZLNsDYktsFHJCkI2oGi6qyHk&s=_Z- > E_VtMDAiXmR8oSZym4G9OIzxRhcs5rJxMEjxK1RI&e= > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=GtPIT10cORUM6qwFnTVtIiDUFmESkxW3I0wu8GDxmgc&m=QkF9KAzl1dxqONkEkh7ZLNsDYktsFHJCkI2oGi6qyHk&s=_Z- > E_VtMDAiXmR8oSZym4G9OIzxRhcs5rJxMEjxK1RI&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Feb 19 15:24:42 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Feb 2020 15:24:42 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) Message-ID: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Feb 19 18:49:51 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 19 Feb 2020 10:49:51 -0800 Subject: [gpfsug-discuss] CANCELLED - Re: NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> Message-ID: <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> I?m sad to report we were unable to find a suitable venue for the spring meeting in the NYC area. Given the date is nearing, we will cancel this event. If you are willing to host a UG meeting later this year, please let us know. Best, Kristy > On Feb 13, 2020, at 11:37 AM, Kristy Kallback-Rose wrote: > > All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. > > Thanks, > Kristy > > >> On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose > wrote: >> >> Thanks for your responses to the poll. 
>> >> We?re still working on a venue, but working towards: >> >> March 30 - New User Day (Tuesday) >> April 1&2 - Regular User Group Meeting (Wednesday & Thursday) >> >> Once it?s confirmed we?ll post something again. >> >> Best, >> Kristy. >> >>> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose > wrote: >>> >>> Thank you to the 18 wonderful people who filled out the survey. >>> >>> However, there are well more than 18 people at any given UG meeting. >>> >>> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >>> >>> Happy New Year. >>> >>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>> >>> Thanks, >>> Kristy >>> >>>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose > wrote: >>>> >>>> Hello, >>>> >>>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>>> >>>> Best wishes to all in the new year. >>>> >>>> -Kristy >>>> >>>> >>>> Please give us 2 minutes of your time here: ?https://forms.gle/NFk5q4djJWvmDurW7 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Feb 19 19:31:36 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 19 Feb 2020 19:31:36 +0000 Subject: [gpfsug-discuss] CANCELLED - Re: NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> Message-ID: I believe we could do it at Rutgers in either Newark or New Brunswick. I?m not sure if that meets most people?s definitions for NYC-area, but I do consider Newark to be. Both are fairly easily accessible by public transportation (and about as close to midtown as some uptown location choices anyway). We had planned to attend the 4/1-2 meeting. Not sure what?s involved to know whether keeping the 4/1-2 date is a viable option if we were able to host. We?d have to make sure we didn?t run afoul of any vendor-ethics guidelines. We recently hosted Ray Paden for a GPFS day, though. We had some trouble with remote participation, but that could be dealt with and I actually don?t think these meetings have that as an option anyway. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Feb 19, 2020, at 1:49 PM, Kristy Kallback-Rose wrote: > > I?m sad to report we were unable to find a suitable venue for the spring meeting in the NYC area. Given the date is nearing, we will cancel this event. > > If you are willing to host a UG meeting later this year, please let us know. > > Best, > Kristy > >> On Feb 13, 2020, at 11:37 AM, Kristy Kallback-Rose wrote: >> >> All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. >> >> Thanks, >> Kristy >> >> >>> On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose wrote: >>> >>> Thanks for your responses to the poll. 
>>> >>> We?re still working on a venue, but working towards: >>> >>> March 30 - New User Day (Tuesday) >>> April 1&2 - Regular User Group Meeting (Wednesday & Thursday) >>> >>> Once it?s confirmed we?ll post something again. >>> >>> Best, >>> Kristy. >>> >>>> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose wrote: >>>> >>>> Thank you to the 18 wonderful people who filled out the survey. >>>> >>>> However, there are well more than 18 people at any given UG meeting. >>>> >>>> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >>>> >>>> Happy New Year. >>>> >>>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>>> >>>> Thanks, >>>> Kristy >>>> >>>>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose wrote: >>>>> >>>>> Hello, >>>>> >>>>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>>>> >>>>> Best wishes to all in the new year. >>>>> >>>>> -Kristy >>>>> >>>>> >>>>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>>> >>> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Feb 19 19:58:59 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 19 Feb 2020 19:58:59 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> References: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: I?m extremely curious as to this answer as well. At one point a while back I started looking into this via the KMIP side with things, but ran out of time to continue. http://docs.oasis-open.org/kmip/testcases/v1.4/kmip-testcases-v1.4.html http://docs.oasis-open.org/kmip/testcases/v1.4/cnprd01/test-cases/kmip-v1.4/ Ed From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Wednesday, February 19, 2020 10:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Feb 19 22:07:50 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 19 Feb 2020 22:07:50 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Wed Feb 19 23:34:37 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 19 Feb 2020 15:34:37 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS Message-ID: Hi, I understand gpfs 4.2.3 is end of support this coming September. 
The support page https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable indicates that gpfs version 5.0 will not run on rhel6 and is unsupported.

1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients?
2. Is gpfs 5.0 unsupported for both rhel6 servers and clients?

Thanks,
Renata

From YARD at il.ibm.com Thu Feb 20 06:46:17 2020
From: YARD at il.ibm.com (Yaron Daniel)
Date: Thu, 20 Feb 2020 08:46:17 +0200
Subject: Re: [gpfsug-discuss] Encryption - checking key server health (SKLM)
In-Reply-To: References: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com>
Message-ID: 

Hi

Also, in case you configure 3 SKLM servers (1 primary, 2 slaves): if the primary is not responding you will see these messages in the logs:

Regards

Yaron Daniel 94 Em Ha'Moshavot Rd
Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527
IBM Global Markets, Systems HW Sales Israel
Phone: +972-3-916-5672
Fax: +972-3-916-5672
Mobile: +972-52-8395593
e-mail: yard at il.ibm.com
Webex: https://ibm.webex.com/meet/yard
IBM Israel

From: "Felipe Knop" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 20/02/2020 00:08 Subject: [EXTERNAL] Re: [gpfsug-discuss] Encryption - checking key server health (SKLM) Sent by: gpfsug-discuss-bounces at spectrumscale.org

Bob,

Scale does not yet have a tool to perform a health-check on a key server, or an independent mechanism to retrieve keys. One can use a command such as 'mmkeyserv key show' to retrieve the list of keys from a given SKLM server (and use that to determine whether the key server is responsive), but being able to retrieve a list of keys does not necessarily mean being able to retrieve the actual keys, as the latter goes through the KMIP port/protocol, and the former uses the REST port/API:

# mmkeyserv key show --server 192.168.105.146 --server-pwd /tmp/configKeyServ_pid11403914_keyServPass --tenant sklm3Tenant
KEY-ad4f3a9-01397ebf-601b-41fb-89bf-6c4ac333290b
KEY-ad4f3a9-019465da-edc8-49d4-b183-80ae89635cbc
KEY-ad4f3a9-0509893d-cf2a-40d3-8f79-67a444ff14d5
KEY-ad4f3a9-08d514af-ebb2-4d72-aa5c-8df46fe4c282
KEY-ad4f3a9-0d3487cb-a674-44ab-a7d0-1f68e86e2fc9
[...]

Having a tool that can retrieve keys independently from mmfsd would be a useful capability to have. Could you submit an RFE to request such function?

Thanks,

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

----- Original message -----
From: "Oesterlin, Robert"
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug main discussion list
Cc:
Subject: [EXTERNAL] [gpfsug-discuss] Encryption - checking key server health (SKLM)
Date: Wed, Feb 19, 2020 11:35 AM

I'm looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can't serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn't returning the key. I can't seem to find a way to check if the server will actually return a key. Any ideas?
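One coarse client-side probe (a sketch only - the host name, port and certificate paths below are placeholders, and 5696 is merely the usual KMIP default) is to drive a TLS handshake against the key server's KMIP listener from a Scale node:

# openssl s_client -connect sklmserver.example.com:5696 -cert client.pem -key client.key </dev/null

A completed handshake shows the KMIP listener is at least answering, but as the report above illustrates it does not prove that a key can actually be fetched; a true end-to-end check needs a real KMIP retrieve of a known key, for instance along the lines of the KMIP test cases linked elsewhere in this thread.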
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathan.buzzard at strath.ac.uk Thu Feb 20 10:33:57 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Thu, 20 Feb 2020 10:33:57 +0000
Subject: Re: [gpfsug-discuss] GPFS 5 and supported rhel OS
In-Reply-To: References: Message-ID: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk>

On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September. The support page
>
> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable
>
> indicates that gpfs version 5.0 will not run on rhel6 and is unsupported.
>
> 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients?
> 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients?
>

Given RHEL6 expires in November anyway you would only be buying yourself a couple of months which seems pointless. You need to be moving away from both.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG From S.J.Thompson at bham.ac.uk Thu Feb 20 10:41:17 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 10:41:17 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> Message-ID: <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Well, if you were buying some form of extended Life Support for Scale, then you might also be expecting to buy extended life for RedHat. RHEL6 has extended life support until June 2024. Sure its an add on subscription cost, but some people might be prepared to do that over OS upgrades. Simon ?On 20/02/2020, 10:34, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 19/02/2020 23:34, Renata Maria Dart wrote: > Hi, I understand gpfs 4.2.3 is end of support this coming September. The support page > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable > > indicates that gpfs version 5.0 will not run on rhel6 and is unsupported. > > 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? > 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients? > Given RHEL6 expires in November anyway you would only be buying yourself a couple of months which seems pointless. You need to be moving away from both. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Feb 20 11:23:52 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 11:23:52 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> On 20/02/2020 10:41, Simon Thompson wrote: > Well, if you were buying some form of extended Life Support for > Scale, then you might also be expecting to buy extended life for > RedHat. RHEL6 has extended life support until June 2024. Sure its an > add on subscription cost, but some people might be prepared to do > that over OS upgrades. I would recommend anyone going down that to route to take a *very* close look at what you get for the extended support. Not all of the OS is supported, with large chunks being moved to unsupported even if you pay for the extended support. Consequently extended support is not suitable for HPC usage in my view, so start planning the upgrade now. It's not like you haven't had 10 years notice. If your GPFS is just a storage thing serving out on protocol nodes, upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. It's a relatively easy invisible to the users upgrade. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG

From knop at us.ibm.com Thu Feb 20 13:27:47 2020
From: knop at us.ibm.com (Felipe Knop)
Date: Thu, 20 Feb 2020 13:27:47 +0000
Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS
In-Reply-To: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk>
References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk>,
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From carlz at us.ibm.com Thu Feb 20 14:17:58 2020
From: carlz at us.ibm.com (Carl Zetie - carlz at us.ibm.com)
Date: Thu, 20 Feb 2020 14:17:58 +0000
Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS
Message-ID: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com>

To reiterate what's been said on this thread, and to reaffirm the official IBM position:

* Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and we decided that we should support a version of Scale essentially as long as RHEL6 is supported.
* You can purchase Extended Support for both Scale 4.2 and RHEL6, but (as Jonathan said) you need to look closely at what you are getting from both sides. For Scale, do not expect any fixes after EOS (unless something like a truly critical security issue with no workaround arises).
* There is no possibility of IBM supporting Scale 5.0 on RHEL6. I want to make this as clear as I possibly can so that people can focus on feasible alternatives, rather than lose precious time asking for a change to this plan and waiting on a response that will absolutely, definitely be No.

I would like to add: In general, in the future the "span" of the Scale/RHEL matrix is going to get tighter than it perhaps has been in the past. You should anticipate that broadly speaking, we're not going to support Scale on out-of-support OS versions; and we're not going to test out-of-support (or soon-to-be out-of-support) Scale on new OS versions.

The impact of this will be mitigated by our introduction of EUS releases, starting with 5.0.5, which will allow you to stay on a Scale release across multiple OS releases; and the combination of Scale EUS and RHEL EUS will allow you to stay on a stable environment for a long time.

EUS for Scale is no-charge, it is included as a standard part of your S&S.

Regards,

Carl Zetie
Program Director
Offering Management
Spectrum Scale & Spectrum Discover
----
(919) 473 3318 ][ Research Triangle Park
carlz at us.ibm.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stockf at us.ibm.com Thu Feb 20 14:34:49 2020
From: stockf at us.ibm.com (Frederick Stock)
Date: Thu, 20 Feb 2020 14:34:49 +0000
Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS
In-Reply-To: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>
References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>, <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk><07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk>
Message-ID: 

An HTML attachment was scrubbed...
URL: From skylar2 at uw.edu Thu Feb 20 15:19:09 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 15:19:09 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: <20200220151909.7rbljupfl27whdtu@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 11:23:52AM +0000, Jonathan Buzzard wrote: > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. > > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. I agree, we're having increasing difficulty running CentOS 6, not because of the lack of support from IBM/RedHat, but because the software our customers want to run has started depending on OS features that simply don't exist in CentOS 6. In particular, modern gcc and glibc, and containers are all features that many of our customers are expecting that we provide. The newer kernel available in CentOS 7 (and now 8) supports large numbers of CPUs and large amounts of memory far better than the ancient CentOS 6 kernel as well. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From renata at slac.stanford.edu Thu Feb 20 15:58:08 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Thu, 20 Feb 2020 07:58:08 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> References: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> Message-ID: Thanks very much for your response Carl, this is the information I was looking for. Renata On Thu, 20 Feb 2020, Carl Zetie - carlz at us.ibm.com wrote: >To reiterate what?s been said on this thread, and to reaffirm the official IBM position: > > > * Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and we decided that we should support a version of Scale essentially as long as RHEL6 is supported. > * You can purchase Extended Support for both Scale 4.2 and RHEL6, but (as Jonathan said) you need to look closely at what you are getting from both sides. For Scale, do not expect any fixes after EOS (unless something like a truly critical security issue with no workaround arises). > * There is no possibility of IBM supporting Scale 5.0 on RHEL6. 
I want to make this as clear as I possibly can so that people can focus on feasible alternatives, rather than lose precious time asking for a change to this plan and waiting on a response that will absolutely, definitely be No. > > >I would like to add: In general, in the future the ?span? of the Scale/RHEL matrix is going to get tighter than it perhaps has been in the past. You should anticipate that broadly speaking, we?re not going to support Scale on out-of-support OS versions; and we?re not going to test out-of-support (or soon-to-be out-of-support) Scale on new OS versions. > >The impact of this will be mitigated by our introduction of EUS releases, starting with 5.0.5, which will allow you to stay on a Scale release across multiple OS releases; and the combination of Scale EUS and RHEL EUS will allow you to stay on a stable environment for a long time. > >EUS for Scale is no-charge, it is included as a standard part of your S&S. > > >Regards, > > > >Carl Zetie >Program Director >Offering Management >Spectrum Scale & Spectrum Discover >---- >(919) 473 3318 ][ Research Triangle Park >carlz at us.ibm.com > >[signature_2106701756] > > > From hpc.ken.tw25qn at gmail.com Thu Feb 20 16:29:40 2020 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Thu, 20 Feb 2020 16:29:40 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: Fred, It may be that some HPC users "have to" reverify the results of their computations as being exactly the same as a previous software stack and that is not a minor task. Any change may require this verification process..... Ken Atkjnson On Thu, 20 Feb 2020, 14:35 Frederick Stock, wrote: > This is a bit off the point of this discussion but it seemed like an > appropriate context for me to post this question. IMHO the state of > software is such that it is expected to change rather frequently, for > example the OS on your laptop/tablet/smartphone and your web browser. It > is correct to say those devices are not running an HPC or enterprise > environment but I mention them because I expect none of us would think of > running those devices on software that is a version far from the latest > available. With that as background I am curious to understand why folks > would continue to run systems on software like RHEL 6.x which is now two > major releases(and many years) behind the current version of that product? > Is it simply the effort required to upgrade 100s/1000s of nodes and the > disruption that causes, or are there other factors that make keeping > current with OS releases problematic? I do understand it is not just a > matter of upgrading the OS but all the software, like Spectrum Scale, that > runs atop that OS in your environment. While they all do not remain in > lock step I would think that in some window of time, say 12-18 months > after an OS release, all software in your environment would support a > new/recent OS release that would technically permit the system to be > upgraded. > > I should add that I think you want to be on or near the latest release of > any software with the presumption that newer versions should be an > improvement over older versions, albeit with the usual caveats of new > defects. 
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS 5 and supported rhel OS > Date: Thu, Feb 20, 2020 6:24 AM > > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. > > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Feb 20 16:41:59 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 20 Feb 2020 16:41:59 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Message-ID: <50DD3E29-5CDC-4FCB-9080-F39DE4532761@us.ibm.com> Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... How deep does ?any change? go? Mod level? PTF? Efix? OS errata? Many of our enterprise customers also have validation requirements, although not as strict as typical HPC users e.g. they require some level of testing if they take a Mod but not a PTF. Mind you, with more HPC-like workloads showing up in the enterprise, that too might change? Thanks, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_510537050] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69557 bytes Desc: image001.png URL: From renata at slac.stanford.edu Thu Feb 20 16:57:47 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Thu, 20 Feb 2020 08:57:47 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>, <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk><07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: Hi Frederick, ours is a physics research lab with a mix of new eperiments and ongoing research. While some users embrace and desire the latest that tech has to offer and are actively writing code to take advantage of it, we also have users running older code on data from older experiments which depends on features of older OS releases and they are often not the ones who wrote the code. We have a mix of systems to accomodate both groups. Renata On Thu, 20 Feb 2020, Frederick Stock wrote: >This is a bit off the point of this discussion but it seemed like an appropriate context for me to post this question.? IMHO the state of software is such that >it is expected to change rather frequently, for example the OS on your laptop/tablet/smartphone and your web browser.? It is correct to say those devices are >not running an HPC or enterprise environment but I mention them because I expect none of us would think of running those devices on software that is a version >far from the latest available.? With that as background I am curious to understand why folks would continue to run systems on software like RHEL 6.x which is >now two major releases(and many years) behind the current version of that product?? Is it simply the effort required to upgrade 100s/1000s of nodes and the >disruption that causes, or are there other factors that make keeping current with OS releases problematic?? I do understand it is not just a matter of upgrading >the OS but all the software, like Spectrum Scale, that runs atop that OS in your environment.? While they all do not remain in lock step I would? think that in >some window of time, say 12-18 months after an OS release, all software in your environment would support a new/recent OS release that would technically permit >the system to be upgraded. >? >I should add that I think you want to be on or near the latest release of any software with the presumption that newer versions should be an improvement over >older versions, albeit with the usual caveats of new defects. > >Fred >__________________________________________________ >Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >stockf at us.ibm.com >? >? > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS 5 and supported rhel OS > Date: Thu, Feb 20, 2020 6:24 AM > ? On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. 
> > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. > > JAB. > > -- > Jonathan A. Buzzard ? ? ? ? ? ? ? ? ? ? ? ? Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > ? > >? > > > From skylar2 at uw.edu Thu Feb 20 16:59:53 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 16:59:53 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: > Fred, > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... > Ken Atkjnson We have this problem too, but at the same time the same people require us to run supported software and remove software versions with known vulnerabilities. The compromise we've worked out for the researchers is to have them track which software versions they used for a particular run/data release. The researchers who care more will have a validation suite that will (hopefully) call out problems as we do required upgrades. At some point, it's simply unrealistic to keep legacy systems around, though we do have a lab that needs a Solaris/SPARC system just to run a 15-year-old component of a pipeline for which they don't have source code... -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From malone12 at illinois.edu Thu Feb 20 17:00:46 2020 From: malone12 at illinois.edu (Maloney, J.D.) Date: Thu, 20 Feb 2020 17:00:46 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Message-ID: <2D960263-2CF3-4834-85CE-EB0F977169CB@illinois.edu> I assisted in a migration a couple years ago when we pushed teams to RHEL 7 and the science pipeline folks weren?t really concerned with the version of Scale we were using, but more what the new OS did to their code stack with the newer version of things like gcc and other libraries. They ended up re-running pipelines from prior data releases to compare the outputs of the pipelines to make sure they were within tolerance and matched prior results. Best, J.D. 
Maloney HPC Storage Engineer | Storage Enabling Technologies Group National Center for Supercomputing Applications (NCSA) From: on behalf of "Carl Zetie - carlz at us.ibm.com" Reply-To: gpfsug main discussion list Date: Thursday, February 20, 2020 at 10:42 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... How deep does ?any change? go? Mod level? PTF? Efix? OS errata? Many of our enterprise customers also have validation requirements, although not as strict as typical HPC users e.g. they require some level of testing if they take a Mod but not a PTF. Mind you, with more HPC-like workloads showing up in the enterprise, that too might change? Thanks, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_510537050] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From david_johnson at brown.edu Thu Feb 20 17:14:40 2020 From: david_johnson at brown.edu (David Johnson) Date: Thu, 20 Feb 2020 12:14:40 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> Instead of keeping whole legacy systems around, could they achieve the same with a container built from the legacy software? > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: >> Fred, >> It may be that some HPC users "have to" >> reverify the results of their computations as being exactly the same as a >> previous software stack and that is not a minor task. Any change may >> require this verification process..... >> Ken Atkjnson > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. The compromise we've worked out for the researchers is to > have them track which software versions they used for a particular run/data > release. The researchers who care more will have a validation suite that > will (hopefully) call out problems as we do required upgrades. > > At some point, it's simply unrealistic to keep legacy systems around, > though we do have a lab that needs a Solaris/SPARC system just to run a > 15-year-old component of a pipeline for which they don't have source code... 
> > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Thu Feb 20 17:20:09 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 17:20:09 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> Message-ID: <20200220172009.gtkek3nlohathrro@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 12:14:40PM -0500, David Johnson wrote: > Instead of keeping whole legacy systems around, could they achieve the same > with a container built from the legacy software? That is our hope, at least once we can get off CentOS 6 and run containers. :) Though containers aren't quite a panacea; there's still the issue of insecure software being baked into the container, but at least we can limit what the container can access more easily than running outside a container. > > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > > > On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: > >> Fred, > >> It may be that some HPC users "have to" > >> reverify the results of their computations as being exactly the same as a > >> previous software stack and that is not a minor task. Any change may > >> require this verification process..... > >> Ken Atkjnson > > > > We have this problem too, but at the same time the same people require us > > to run supported software and remove software versions with known > > vulnerabilities. The compromise we've worked out for the researchers is to > > have them track which software versions they used for a particular run/data > > release. The researchers who care more will have a validation suite that > > will (hopefully) call out problems as we do required upgrades. > > > > At some point, it's simply unrealistic to keep legacy systems around, > > though we do have a lab that needs a Solaris/SPARC system just to run a > > 15-year-old component of a pipeline for which they don't have source code... 
> > > > -- > > -- Skylar Thompson (skylar2 at u.washington.edu) > > -- Genome Sciences Department, System Administrator > > -- Foege Building S046, (206)-685-7354 > > -- University of Washington School of Medicine > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From S.J.Thompson at bham.ac.uk Thu Feb 20 19:45:02 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 19:45:02 +0000 Subject: [gpfsug-discuss] Unkillable snapshots Message-ID: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Thu Feb 20 19:46:53 2020 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 20 Feb 2020 19:46:53 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Message-ID: Move the file system manager :) On Thu, 20 Feb 2020, 19:45 Simon Thompson, wrote: > Hi, > > > We have a snapshot which is stuck in the state "DeleteRequired". When > deleting, it goes through the motions but eventually gives up with: > > Unable to quiesce all nodes; some processes are busy or holding required > resources. > mmdelsnapshot: Command failed. Examine previous error messages to > determine cause. > > And in the mmfslog on the FS manager there are a bunch of retries and > "failure to quesce" on nodes. However in each retry its never the same set > of nodes. I suspect we have one HPC job somewhere killing us. > > > What's interesting is that we can delete other snapshots OK, it appears to > be one particular fileset. > > > My old goto "mmfsadm dump tscomm" isn't showing any particular node, and > waiters around just tend to point to the FS manager node. > > > So ... any suggestions? I'm assuming its some workload holding a lock open > or some such, but tracking it down is proving elusive! > > > Generally the FS is also "lumpy" ... 
at times it feels like a wifi > connection on a train using a terminal, I guess its all related though. > > > Thanks > > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Feb 20 20:13:14 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 20:13:14 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Message-ID: <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 20 20:29:44 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 20 Feb 2020 15:29:44 -0500 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: References: Message-ID: <13747.1582230584@turing-police> On Wed, 19 Feb 2020 22:07:50 +0000, "Felipe Knop" said: > Having a tool that can retrieve keys independently from mmfsd would be useful > capability to have. Could you submit an RFE to request such function? Note that care needs to be taken to do this in a secure manner. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ulmer at ulmer.org Thu Feb 20 20:43:11 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 20 Feb 2020 15:43:11 -0500 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <13747.1582230584@turing-police> References: <13747.1582230584@turing-police> Message-ID: It seems like this belongs in mmhealth if it were to be bundled. 
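In the meantime, a bare-bones reachability probe (one that never touches key material) could be as simple as checking that the KMIP port completes a TLS handshake. A rough sketch, assuming the default SKLM KMIP port 5696 and a placeholder host name:

  # placeholder host; 5696 is the usual SKLM KMIP listener
  echo | openssl s_client -connect sklm01.example.com:5696

That only proves the server is up and presenting a certificate, though, and says nothing about whether keys can actually be served.
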
If you need to use a third party tool, maybe fetch a particular key that is only used for fetching, so it?s compromise would represent no risk. -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Feb 20, 2020, at 3:11 PM, Valdis Kl?tnieks wrote: > > ?On Wed, 19 Feb 2020 22:07:50 +0000, "Felipe Knop" said: > >> Having a tool that can retrieve keys independently from mmfsd would be useful >> capability to have. Could you submit an RFE to request such function? > > Note that care needs to be taken to do this in a secure manner. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From truston at mbari.org Thu Feb 20 20:43:03 2020 From: truston at mbari.org (Todd Ruston) Date: Thu, 20 Feb 2020 12:43:03 -0800 Subject: [gpfsug-discuss] Policy REGEX question Message-ID: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Greetings, I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') The Scale policy documentation for REGEX only mentions 2 parameters, not 3: REGEX(String,'Pattern') Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). -- Todd E. Ruston Information Systems Manager Monterey Bay Aquarium Research Institute (MBARI) 7700 Sandholdt Road, Moss Landing, CA, 95039 Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfalk at us.ibm.com Thu Feb 20 21:26:39 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Thu, 20 Feb 2020 16:26:39 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> Message-ID: Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. 
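Concretely, a rough sketch of that two-window collection (the file system, snapshot, and fileset names below are only placeholders):

  # window 1: capture open files across the cluster just before the delete
  mmdsh -N all lsof /path/to/fileset > /tmp/lsof.all 2>&1

  # window 2: the deletion itself (fileset snapshot syntax)
  mmdelsnapshot fsname snapname -j filesetname

  # afterwards, on the file system manager, pull out the quiesce failures
  grep -i quiesce /var/adm/ras/mmfs.log.latest
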
It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=rIyEAXKyzwEj_pyM9DRQ1mL3x5gHjoqSpnhqxP6Oj-8&s=ZRXJm9u1_WLClH0Xua2PeIr-cWHj8YasvQCwndgdyns&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Feb 20 21:39:10 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 21:39:10 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> <93bdde85530d41bebbe24b7530e70592@bham.ac.uk>, Message-ID: <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. 
Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nfalk at us.ibm.com Thu Feb 20 22:13:56 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Thu, 20 Feb 2020 17:13:56 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk><93bdde85530d41bebbe24b7530e70592@bham.ac.uk>, <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> Message-ID: Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 04:39 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hmm ... 
mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=eGuD3K3Va_jMinEQHJN-FU1-fi2V-VpqWjHiTVUK-L8&s=fX3QMwGX7-yxSM4VSqPqBUbkT41ntfZFRZnalg9PZBI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Thu Feb 20 22:17:41 2020 From: peserocka at gmail.com (Peter Serocka) Date: Thu, 20 Feb 2020 23:17:41 +0100 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Message-ID: Looking at the example '*/xy_survey_*/name/*.tif': that's not a "real" (POSIX) regular expression but a use of a much simpler "wildcard pattern" as commonly used in the UNIX shell when matching filenames. So I would assume that the 'f' parameter just mandates that REGEX() must apply "filename matching" rules here instead of POSIX regular expressions. makes sense? -- Peter > On Feb 20, 2020, at 21:43, Todd Ruston wrote: > > Greetings, > > I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: > > REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') > > The Scale policy documentation for REGEX only mentions 2 parameters, not 3: > > REGEX(String,'Pattern') > Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. 
> > (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) > > Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). > > -- > Todd E. Ruston > Information Systems Manager > Monterey Bay Aquarium Research Institute (MBARI) > 7700 Sandholdt Road, Moss Landing, CA, 95039 > Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Thu Feb 20 22:25:35 2020 From: peserocka at gmail.com (Peter Serocka) Date: Thu, 20 Feb 2020 23:25:35 +0100 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Message-ID: <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> Sorry, I believe you had nailed it already -- I didn't read carefully to the end. > On Feb 20, 2020, at 23:17, Peter Serocka wrote: > > Looking at the example '*/xy_survey_*/name/*.tif': > that's not a "real" (POSIX) regular expression but a use of > a much simpler "wildcard pattern" as commonly used in the UNIX shell > when matching filenames. > > So I would assume that the 'f' parameter just mandates that > REGEX() must apply "filename matching" rules here instead > of POSIX regular expressions. > > makes sense? > > -- Peter > > >> On Feb 20, 2020, at 21:43, Todd Ruston > wrote: >> >> Greetings, >> >> I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: >> >> REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') >> >> The Scale policy documentation for REGEX only mentions 2 parameters, not 3: >> >> REGEX(String,'Pattern') >> Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. >> >> (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) >> >> Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). >> >> -- >> Todd E. Ruston >> Information Systems Manager >> Monterey Bay Aquarium Research Institute (MBARI) >> 7700 Sandholdt Road, Moss Landing, CA, 95039 >> Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Thu Feb 20 22:28:43 2020 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 20 Feb 2020 17:28:43 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: Message-ID: Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn?t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by to high maxfilestocache or pagepool settings. To give you an simplified example (its more complex than that, but good enough to make the point) - assume you have 100 nodes, each has 16 GB pagepool and your storage system can write data out at 10 GB/sec, it will take 160 seconds to flush all data data (assuming you did normal buffered I/O. If i remember correct (talking out of memory here) the default timeout is 60 seconds, given that you can?t write that fast it will always timeout under this scenario. There is one case where this can also happen which is a client is connected badly (flaky network or slow connection) and even your storage system is fast enough the node is too slow that it can?t de-stage within that time while everybody else can and the storage is not the bottleneck. Other than that only solutions are to a) buy faster storage or b) reduce pagepool and maxfilestocache which will reduce overall performance of the system. Sven Sent from my iPad > On Feb 20, 2020, at 5:14 PM, Nathan Falk wrote: > > ?Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. > > > > Nate Falk > IBM Spectrum Scale Level 2 Support > Software Defined Infrastructure, IBM Systems > > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 04:39 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > Hi Nate, > So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? > But yes, essentially running this by hand to clean up. > What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: > rds-projects-b which is mounted as /rds/projects/b > But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: > /rds/projects/g which is a different file-set. > What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? > Does that sound plausible? > Thanks > Simon > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com > Sent: 20 February 2020 21:26:39 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Unkillable snapshots > > Hello Simon, > > Sadly, that "1036" is not a node ID, but just a counter. > > These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. 
> > Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. > > You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. > > It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. > > Thanks, > Nate Falk > IBM Spectrum Scale Level 2 Support > Software Defined Infrastructure, IBM Systems > > > > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 03:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hmm ... mmdiag --tokenmgr shows: > > > Server stats: requests 195417431 ServerSideRevokes 120140 > nTokens 2146923 nranges 4124507 > designated mnode appointed 55481 mnode thrashing detected 1036 > So how do I convert "1036" to a node? > Simon > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson > Sent: 20 February 2020 19:45:02 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Unkillable snapshots > > Hi, > We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: > > > Unable to quiesce all nodes; some processes are busy or holding required resources. > mmdelsnapshot: Command failed. Examine previous error messages to determine cause. > And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. > What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. > My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. > So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! > Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. > Thanks > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Feb 20 23:38:15 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 23:38:15 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: On 20/02/2020 16:59, Skylar Thompson wrote: [SNIP] > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. For us, it is a Scottish government mandate that all public funded bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days from a critical vulnerability till your patched. No if's no buts, just do it. So while where are not their yet (its a work in progress to make this as seamless as possible) frankly running unpatched systems for years on end because we are too busy/lazy to validate a new system is completely unacceptable. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valdis.kletnieks at vt.edu Fri Feb 21 02:00:59 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 20 Feb 2020 21:00:59 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: <36675.1582250459@turing-police> On Thu, 20 Feb 2020 23:38:15 +0000, Jonathan Buzzard said: > For us, it is a Scottish government mandate that all public funded > bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days > from a critical vulnerability till your patched. No if's no buts, just > do it. Is that 10 days from vuln dislosure, or from patch availability? The latter can be a headache, especially if 24-48 hours pass between when the patch actually hits the streets and you get the e-mail, or if you have other legal mandates that patches be tested before production deployment. The former is simply unworkable - you *might* be able to deploy mitigations or other work-arounds, but if it's something complicated that requires a lot of re-work of code, you may be waiting a lot more than 10 days for a patch.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From Paul.Sanchez at deshaw.com Fri Feb 21 02:05:12 2020 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 21 Feb 2020 02:05:12 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: Message-ID: <9ca16f7634354e4db8bed681a306b714@deshaw.com> Another possibility is to try increasing the timeouts. We used to have problems with this all of the time on clusters with thousands of nodes, but now we run with the following settings increased from their [defaults]? sqtBusyThreadTimeout [10] = 120 sqtCommandRetryDelay [60] = 120 sqtCommandTimeout [300] = 500 These are in the category of undocumented configurables, so you may wish to accompany this with a PMR. 
And you?ll need to know the secret handshake that follows this? mmchconfig: Attention: Unknown attribute specified: sqtBusyThreadTimeout. Press the ENTER key to continue. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sven Oehme Sent: Thursday, February 20, 2020 17:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots This message was sent by an external party. Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn?t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by to high maxfilestocache or pagepool settings. To give you an simplified example (its more complex than that, but good enough to make the point) - assume you have 100 nodes, each has 16 GB pagepool and your storage system can write data out at 10 GB/sec, it will take 160 seconds to flush all data data (assuming you did normal buffered I/O. If i remember correct (talking out of memory here) the default timeout is 60 seconds, given that you can?t write that fast it will always timeout under this scenario. There is one case where this can also happen which is a client is connected badly (flaky network or slow connection) and even your storage system is fast enough the node is too slow that it can?t de-stage within that time while everybody else can and the storage is not the bottleneck. Other than that only solutions are to a) buy faster storage or b) reduce pagepool and maxfilestocache which will reduce overall performance of the system. Sven Sent from my iPad On Feb 20, 2020, at 5:14 PM, Nathan Falk > wrote: ?Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 04:39 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of nfalk at us.ibm.com > Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. 
Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Simon Thompson > Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Fri Feb 21 11:04:32 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 21 Feb 2020 11:04:32 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <36675.1582250459@turing-police> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> Message-ID: <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> On 21/02/2020 02:00, Valdis Kl?tnieks wrote: > On Thu, 20 Feb 2020 23:38:15 +0000, Jonathan Buzzard said: >> For us, it is a Scottish government mandate that all public funded >> bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days >> from a critical vulnerability till your patched. No if's no buts, just >> do it. > > Is that 10 days from vuln dislosure, or from patch availability? > Patch availability. Basically it's a response to the issue a couple of years ago now where large parts of the NHS in Scotland had serious problems due to some Windows vulnerability for which a patch had been available for some months. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Fri Feb 21 13:07:01 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Fri, 21 Feb 2020 14:07:01 +0100 (CET) Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Message-ID: <270013029.95562.1582290421465@privateemail.com> An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Feb 21 14:14:49 2020 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 21 Feb 2020 15:14:49 +0100 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES Message-ID: Dear all, I was wondering if anybody recently encountered a similar issue (I found a related thread from 2018, but it was inconclusive). I just found that one of our production CES nodes have 28k CLOSE_WAIT tcp6 connections, I do not understand why... the second node in the same cluster does not have this issue. Both are: - GPFS 5.0.4.2 - RHEL 7.4 has anybody else encountered anything similar? In the last few days it seems it happened once on one node, and twice on the other, but never on both... Thanks for any feedback! cheers leo -- Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From truston at mbari.org Fri Feb 21 16:15:54 2020 From: truston at mbari.org (Todd Ruston) Date: Fri, 21 Feb 2020 08:15:54 -0800 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> Message-ID: <9E104C63-9C6D-4E46-BEFF-AEF7E1AF8EC9@mbari.org> Thanks Peter, and no worries; great minds think alike. ;-) - Todd > On Feb 20, 2020, at 2:25 PM, Peter Serocka wrote: > > Sorry, I believe you had nailed it already -- I didn't > read carefully to the end. 
> >> On Feb 20, 2020, at 23:17, Peter Serocka > wrote: >> >> Looking at the example '*/xy_survey_*/name/*.tif': >> that's not a "real" (POSIX) regular expression but a use of >> a much simpler "wildcard pattern" as commonly used in the UNIX shell >> when matching filenames. >> >> So I would assume that the 'f' parameter just mandates that >> REGEX() must apply "filename matching" rules here instead >> of POSIX regular expressions. >> >> makes sense? >> >> -- Peter >> >> >>> On Feb 20, 2020, at 21:43, Todd Ruston > wrote: >>> >>> Greetings, >>> >>> I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: >>> >>> REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') >>> >>> The Scale policy documentation for REGEX only mentions 2 parameters, not 3: >>> >>> REGEX(String,'Pattern') >>> Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. >>> >>> (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) >>> >>> Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). >>> >>> -- >>> Todd E. Ruston >>> Information Systems Manager >>> Monterey Bay Aquarium Research Institute (MBARI) >>> 7700 Sandholdt Road, Moss Landing, CA, 95039 >>> Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gterryc at vmsupport.com Fri Feb 21 17:18:11 2020 From: gterryc at vmsupport.com (George Terry) Date: Fri, 21 Feb 2020 11:18:11 -0600 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 Message-ID: Hello, I've a question about upgrade of GPFS 3.5. We have an infrastructure with GSPF 3.5.0.33 and we need upgrade to Spectrum Scale 5.0.3. Can we upgrade from 3.5 to 4.1, 4.2 and 5.0.3 or can we do something additional like unistall GPFS 3.5 and install Spectrum Scale 5.0.3? Thank you George -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Fri Feb 21 17:25:12 2020 From: TOMP at il.ibm.com (Tomer Perry) Date: Fri, 21 Feb 2020 19:25:12 +0200 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: <270013029.95562.1582290421465@privateemail.com> References: <270013029.95562.1582290421465@privateemail.com> Message-ID: Hi, I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. 
So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. After that, you can start looking into "how can I get multiple streams?" - for that there are two options: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm and https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 21/02/2020 15:25 Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=XKMIdSqQ76jf_FrIRFtAhMsgU-MkPFhxBJjte8AdeYs&s=vih7W_XcatoqN_MhS3gEK9RR6RxpNrfB2UvvQeXqyH8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Feb 21 18:50:49 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 21 Feb 2020 18:50:49 +0000 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Feb 21 21:15:28 2020 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 21 Feb 2020 21:15:28 +0000 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Fri Feb 21 23:32:13 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Sat, 22 Feb 2020 00:32:13 +0100 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: <270013029.95562.1582290421465@privateemail.com> Message-ID: Hi, Thanks for answering! 
Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. Best Regards Andi Christiansen Sendt fra min iPhone > Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : > > Hi, > > I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. > So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. > After that, you can start looking into "how can I get multiple streams?" - for that there are two options: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm > and > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm > > The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 21/02/2020 15:25 > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. > > We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. > > On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? > > We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Sat Feb 22 00:08:19 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 22 Feb 2020 00:08:19 +0000 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: Message-ID: Andi, You may want to reach out to Jake Carrol at the University of Queensland, When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. >From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. Regards Andrew Sent from my iPhone > On 22 Feb 2020, at 09:32, Andi Christiansen wrote: > > ?Hi, > > Thanks for answering! > > Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. > > I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. > > Best Regards > Andi Christiansen > > > > Sendt fra min iPhone > >> Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : >> >> Hi, >> >> I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. >> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. >> After that, you can start looking into "how can I get multiple streams?" - for that there are two options: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm >> and >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm >> >> The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. >> >> >> >> Regards, >> >> Tomer Perry >> Scalable I/O Development (Spectrum Scale) >> email: tomp at il.ibm.com >> 1 Azrieli Center, Tel Aviv 67021, Israel >> Global Tel: +1 720 3422758 >> Israel Tel: +972 3 9188625 >> Mobile: +972 52 2554625 >> >> >> >> >> From: Andi Christiansen >> To: "gpfsug-discuss at spectrumscale.org" >> Date: 21/02/2020 15:25 >> Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi all, >> >> i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. >> >> We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. 
But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. >> >> On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? >> >> We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). >> >> Best Regards >> Andi Christiansen _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Feb 22 05:55:54 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 22 Feb 2020 05:55:54 +0000 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: Message-ID: Hi While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. -- Cheers > On 22. Feb 2020, at 3.08, Andrew Beattie wrote: > > ?Andi, > > You may want to reach out to Jake Carrol at the University of Queensland, > > When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. > > From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. > > We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. > > > > > > Regards > > Andrew > > Sent from my iPhone > >>> On 22 Feb 2020, at 09:32, Andi Christiansen wrote: >>> >> ?Hi, >> >> Thanks for answering! >> >> Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. >> >> I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. >> >> Best Regards >> Andi Christiansen >> >> >> >> Sendt fra min iPhone >> >>> Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : >>> >>> Hi, >>> >>> I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. >>> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. >>> After that, you can start looking into "how can I get multiple streams?" 
- for that there are two options: >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm >>> and >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm >>> >>> The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. >>> >>> >>> >>> Regards, >>> >>> Tomer Perry >>> Scalable I/O Development (Spectrum Scale) >>> email: tomp at il.ibm.com >>> 1 Azrieli Center, Tel Aviv 67021, Israel >>> Global Tel: +1 720 3422758 >>> Israel Tel: +972 3 9188625 >>> Mobile: +972 52 2554625 >>> >>> >>> >>> >>> From: Andi Christiansen >>> To: "gpfsug-discuss at spectrumscale.org" >>> Date: 21/02/2020 15:25 >>> Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi all, >>> >>> i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. >>> >>> We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. >>> >>> On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? >>> >>> We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). >>> >>> Best Regards >>> Andi Christiansen _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Sat Feb 22 09:35:32 2020 From: TOMP at il.ibm.com (Tomer Perry) Date: Sat, 22 Feb 2020 11:35:32 +0200 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: Message-ID: Hi, Its implied in the tcp tuning suggestions ( as one needs bandwidth and latency in order to calculate the BDP). The overall theory is documented in multiple places (tcp window, congestion control etc.) - nice place to start is https://en.wikipedia.org/wiki/TCP_tuning . I tend to use this calculator in order to find out the right values https://www.switch.ch/network/tools/tcp_throughput/ The parallel IO and multiple mounts are on top of the above - not instead ( even though it could be seen that it makes things better - but multiple of the small numbers we're getting initially). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Luis Bolinches" To: "gpfsug main discussion list" Cc: Jake Carrol Date: 22/02/2020 07:56 Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. -- Cheers On 22. Feb 2020, at 3.08, Andrew Beattie wrote: Andi, You may want to reach out to Jake Carrol at the University of Queensland, When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. >From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. Regards Andrew Sent from my iPhone On 22 Feb 2020, at 09:32, Andi Christiansen wrote: Hi, Thanks for answering! Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. Best Regards Andi Christiansen Sendt fra min iPhone Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : Hi, I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. After that, you can start looking into "how can I get multiple streams?" - for that there are two options: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm and https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 21/02/2020 15:25 Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. 
But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=vPbqr3ME98a_M4VrB5IPihvzTzG8CQUAuI0eR-kqXcs&s=kIM8S1pVtYFsFxXT3gGQ0DmcwRGBWS9IqtoYTtcahM8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sun Feb 23 04:43:37 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sat, 22 Feb 2020 23:43:37 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> Message-ID: <208376.1582433017@turing-police> On Fri, 21 Feb 2020 11:04:32 +0000, Jonathan Buzzard said: > > Is that 10 days from vuln dislosure, or from patch availability? > > > > Patch availability. Basically it's a response to the issue a couple of That's not *quite* so bad. As long as you trust *all* your vendors to notify you when they release a patch for an issue you hadn't heard about. (And that no e-mail servers along the way don't file it under 'spam') -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Sun Feb 23 12:20:48 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 23 Feb 2020 12:20:48 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <208376.1582433017@turing-police> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> Message-ID: <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> On 23/02/2020 04:43, Valdis Kl?tnieks wrote: > On Fri, 21 Feb 2020 11:04:32 +0000, Jonathan Buzzard said: > >>> Is that 10 days from vuln dislosure, or from patch availability? >>> >> >> Patch availability. 
Basically it's a response to the issue a couple of > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > you when they release a patch for an issue you hadn't heard about. > Er, what do you think I am paid for? Specifically it is IMHO the job of any systems administrator to know when any critical patch becomes available for any software/hardware that they are using. To not be actively monitoring it is IMHO a dereliction of duty, worthy of a verbal and then written warning. I also feel that the old practice of leaving HPC systems unpatched for years on end is no longer acceptable. From a personal perspective I have in now over 20 years never had a system that I have been responsible for knowingly compromised. I would like it to stay that way because I have no desire to be explaining to higher ups why the HPC facility was hacked. The fact that the Scottish government have mandated I apply patches just makes my life easier because any push back from the users is killed dead instantly; I have too, go moan at your elective representative if you want it changed. In the meantime suck it up :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valdis.kletnieks at vt.edu Sun Feb 23 21:58:03 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sun, 23 Feb 2020 16:58:03 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> Message-ID: <272151.1582495083@turing-police> On Sun, 23 Feb 2020 12:20:48 +0000, Jonathan Buzzard said: > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > > you when they release a patch for an issue you hadn't heard about. > Er, what do you think I am paid for? Specifically it is IMHO the job of > any systems administrator to know when any critical patch becomes > available for any software/hardware that they are using. You missed the point. Unless you spend your time constantly e-mailing *all* of your vendors "Are there new patches I don't know about?", you're relying on them to notify you when there's a known issue, and when a patch comes out. Redhat is good about notification. IBM is. But how about things like your Infiniband stack? OFED? The firmware in all your devices? The BIOS/UEFI on the servers? If you're an Intel shop, how do you get notified about security issues in the Management Engine stuff (and there's been plenty of them). Do *all* of those vendors have security lists? Are you subscribed to *all* of them? Do *all* of them actually post to those lists? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From andi at christiansen.xxx Mon Feb 24 22:31:45 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Mon, 24 Feb 2020 23:31:45 +0100 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: Message-ID: Hi all, Thank you for all your suggestions! 
The latency is 30ms between the sites (1600km to be exact). So if I have entered correctly in the calculator 1Gb is actually what is expected on that distance. I had a meeting today with IBM where we were able to push that from the 1Gb to about 4Gb on one link with minimal tuning, more tuning will come the next few days! We are also looking to implement the feature afmParallelMounts which should give us the full bandwidth we have between the sites :-) Thanks! Best Regards Andi Christiansen Sendt fra min iPhone > Den 22. feb. 2020 kl. 10.35 skrev Tomer Perry : > > Hi, > > Its implied in the tcp tuning suggestions ( as one needs bandwidth and latency in order to calculate the BDP). > The overall theory is documented in multiple places (tcp window, congestion control etc.) - nice place to start is https://en.wikipedia.org/wiki/TCP_tuning. > I tend to use this calculator in order to find out the right values https://www.switch.ch/network/tools/tcp_throughput/ > > The parallel IO and multiple mounts are on top of the above - not instead ( even though it could be seen that it makes things better - but multiple of the small numbers we're getting initially). > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Luis Bolinches" > To: "gpfsug main discussion list" > Cc: Jake Carrol > Date: 22/02/2020 07:56 > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. > > -- > Cheers > > On 22. Feb 2020, at 3.08, Andrew Beattie wrote: > > Andi, > > You may want to reach out to Jake Carrol at the University of Queensland, > > When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. > > From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. > > We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. > > > > > > Regards > > Andrew > > Sent from my iPhone > > On 22 Feb 2020, at 09:32, Andi Christiansen wrote: > > Hi, > > Thanks for answering! > > Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. > > I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. > > Best Regards > Andi Christiansen > > > > Sendt fra min iPhone > > Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : > > Hi, > > I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. 
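A quick sanity check of the figures quoted above (30 ms round trip, roughly 50-60 MB/s per NFS stream, several gateways/mounts in parallel). This is only back-of-envelope arithmetic, and the assumption that parallel mounts scale close to linearly holds only until the link, the gateways, or the disks saturate.

```python
rtt_s = 0.030            # 30 ms round trip, as stated in the thread
observed_mb_s = 55.0     # roughly the per-stream rate reported

# A single stream doing ~55 MB/s at 30 ms implies about this much data
# in flight, i.e. the effective TCP window actually being used:
implied_window_mib = observed_mb_s * 1e6 * rtt_s / 2**20
print(f"implied effective window: ~{implied_window_mib:.1f} MiB")   # ~1.6 MiB

# Aggregate of N independent streams (parallel mounts / gateways),
# assuming near-linear scaling up to the link limit:
for n in (1, 4, 8, 16):
    print(f"{n:>2} streams -> ~{n * observed_mb_s:6.0f} MB/s "
          f"({n * observed_mb_s * 8 / 1000:4.1f} Gbit/s)")
```

The implied window of under 2 MiB is far below the ~36 MiB a 10 Gbit/s link needs at 30 ms, which is consistent with both the calculator result and the gain seen from adding parallel streams.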
> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm- and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. > After that, you can start looking into "how can I get multiple streams?" - for that there are two options: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm > and > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm > > The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 21/02/2020 15:25 > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. > > We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. > > On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? > > We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skylar2 at uw.edu Mon Feb 24 23:58:15 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 24 Feb 2020 15:58:15 -0800 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <272151.1582495083@turing-police> References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> <272151.1582495083@turing-police> Message-ID: <20200224235815.mjecsge35rqseoq5@hithlum> On Sun, Feb 23, 2020 at 04:58:03PM -0500, Valdis Kl?tnieks wrote: > On Sun, 23 Feb 2020 12:20:48 +0000, Jonathan Buzzard said: > > > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > > > you when they release a patch for an issue you hadn't heard about. > > > Er, what do you think I am paid for? Specifically it is IMHO the job of > > any systems administrator to know when any critical patch becomes > > available for any software/hardware that they are using. > > You missed the point. > > Unless you spend your time constantly e-mailing *all* of your vendors > "Are there new patches I don't know about?", you're relying on them to > notify you when there's a known issue, and when a patch comes out. > > Redhat is good about notification. IBM is. > > But how about things like your Infiniband stack? OFED? The firmware in all > your devices? The BIOS/UEFI on the servers? If you're an Intel shop, how do you > get notified about security issues in the Management Engine stuff (and there's > been plenty of them). Do *all* of those vendors have security lists? Are you > subscribed to *all* of them? Do *all* of them actually post to those lists? We put our notification sources (Nessus, US-CERT, etc.) into our response plan. Of course it's still a problem if we don't get notified, but part of the plan is to make it clear where we're willing to accept risk, and to limit our own liability. No process is going to be perfect, but we at least know and accept where those imperfections are. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From stockf at us.ibm.com Tue Feb 25 14:01:20 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 25 Feb 2020 14:01:20 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Tue Feb 25 20:32:10 2020 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Tue, 25 Feb 2020 21:32:10 +0100 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES In-Reply-To: References: Message-ID: Hi Frederick, thanks for the answer! Unfortunately it seems not the case :( [root at xbl-ces-4 ~]# netstat -ntp | grep "\:9094 .*CLOSE_WAIT" | wc -l 0 In our case, Zimon does not directly interact with Grafana over the bridge, but we have a small python script that (through Telegraf) polls the collector and ingest data into InfluxDB, which acts as data source for Grafana. An example of the opened port is: tcp6?????? 1????? 0 129.129.95.84:40038 129.129.99.247:39707??? CLOSE_WAIT? 39131/gpfs.ganesha. We opened a PMR to check what's happening, let's see :) But possibly first thing to do is to disable IPv6 cheers leo Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 25.02.20 15:01, Frederick Stock wrote: > netstat -ntp | grep "\:9094 .*CLOSE_WAIT" | wc -l -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 12:58:40 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 13:58:40 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? Message-ID: <313052288.162314.1582721920742@privateemail.com> An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Feb 26 13:04:52 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 26 Feb 2020 13:04:52 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <313052288.162314.1582721920742@privateemail.com> Message-ID: Why don?t you look at packaging your small files into larger files which will be handled more effectively. There is no simple way to replicate / move billions of small files, But surely you can build your work flow to package the files up into a zip or tar format which will simplify not only the number of IO transactions but also make the whole process more palatable to the NFS protocol Sent from my iPhone > On 26 Feb 2020, at 22:58, Andi Christiansen wrote: > > ? > Hi all, > > Does anyone know of an alternative to AFM ? > > We have been working on tuning AFM for a few weeks now and see little to no improvement.. And now we are searching for an alternative.. So if anyone knows of a product that can implement with Spectrum Scale i am open to any suggestions :) > > We have a good mix of files but primarily billions of very small files which AFM does not handle well on long distances. > > > Best Regards > A. Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=BDsYqP0is2zoDGYU5Ej1lSJ4s9DJhMsW40equi5dqCs&s=22KcLJbUqsq3nfr3qWnxDqA3kuHnFxSDeiENVUITmdA&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Feb 26 13:27:32 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 26 Feb 2020 13:27:32 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <313052288.162314.1582721920742@privateemail.com> References: <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Feb 26 13:33:51 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 26 Feb 2020 13:33:51 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: , <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 13:38:18 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:38:18 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> Message-ID: <688463139.162864.1582724298905@privateemail.com> An HTML attachment was scrubbed... 
URL: From andi at christiansen.xxx Wed Feb 26 13:38:59 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:38:59 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> Message-ID: <673673077.162875.1582724339498@privateemail.com> An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 13:39:22 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:39:22 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: , <313052288.162314.1582721920742@privateemail.com> Message-ID: <262580944.162883.1582724362722@privateemail.com> An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Feb 26 14:24:32 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 26 Feb 2020 14:24:32 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <262580944.162883.1582724362722@privateemail.com> References: <262580944.162883.1582724362722@privateemail.com>, , <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Feb 26 15:49:45 2020 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Feb 2020 08:49:45 -0700 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> <262580944.162883.1582724362722@privateemail.com> Message-ID: if you are looking for a commercial supported solution, our Dataflow product is purpose build for this kind of task. a presentation that covers some high level aspects of it was given by me last year at one of the spectrum scale meetings in the UK --> https://www.spectrumscaleug.org/wp-content/uploads/2019/05/SSUG19UK-Day-1-05-DDN-Optimizing-storage-stacks-for-AI.pdf. its at the end of the deck. if you want more infos, please let me know and i can get you in contact with the right person. Sven On Wed, Feb 26, 2020 at 7:24 AM Frederick Stock wrote: > > What sources are you using to help you with configuring AFM? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Andi Christiansen > To: Frederick Stock , gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] RE: [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 8:39 AM > > 5.0.4-2.1 (home and cache) > > On February 26, 2020 2:33 PM Frederick Stock wrote: > > > Andi, what version of Spectrum Scale do you have installed? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: andi at christiansen.xxx, gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 8:27 AM > > you may consider WatchFolder ... (cluster wider inotify --> kafka) .. and then you go from there > > > > ----- Original message ----- > From: Andi Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 1:59 PM > > Hi all, > > Does anyone know of an alternative to AFM ? > > We have been working on tuning AFM for a few weeks now and see little to no improvement.. 
And now we are searching for an alternative.. So if anyone knows of a product that can implement with Spectrum Scale i am open to any suggestions :) > > We have a good mix of files but primarily billions of very small files which AFM does not handle well on long distances. > > > Best Regards > A. Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.schlipalius at pawsey.org.au Thu Feb 27 00:23:56 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 27 Feb 2020 08:23:56 +0800 Subject: [gpfsug-discuss] AFM Alternative? Aspera? Message-ID: Maybe the following would assist? I do think tarring up files first is best, but you could always check out: http://www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf https://www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf Aspera sync integration (non html links added for your use ? how they don?t get scrubbed: www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf ) Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au On 26/2/20, 9:39 pm, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Re: AFM Alternative? From vpuvvada at in.ibm.com Fri Feb 28 05:22:56 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 28 Feb 2020 10:52:56 +0530 Subject: [gpfsug-discuss] AFM Alternative? Aspera? In-Reply-To: References: Message-ID: Transferring the small files with AFM + NFS over high latency networks is always a challenge. For example, for each small file replication AFM performs a lookup, create, write and set mtime operation. If the latency is 10ms, replication of each file takes minimum (10 * 4 = 40 ms) amount of time. AFM is not a network acceleration tool and also it does not use compression. If the file sizes are big, AFM parallel IO and parallel mounts feature can be used. Aspera can be used to transfer the small files over high latency network with better utilization of the network bandwidth. https://www.ibm.com/support/knowledgecenter/no/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm https://www.ibm.com/support/knowledgecenter/no/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm ~Venkat (vpuvvada at in.ibm.com) From: Chris Schlipalius To: Date: 02/27/2020 05:54 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM Alternative? Aspera? Sent by: gpfsug-discuss-bounces at spectrumscale.org Maybe the following would assist? 
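To put the per-file arithmetic above into perspective for very large file counts, here is a crude extrapolation. It assumes the four operations per small file are strictly serialized on one queue and nothing overlaps, which is the worst case the stated minimum implies; real AFM gateways use multiple flush threads and queues, so treat the numbers as an upper bound on the pain, not a prediction. The 30 ms RTT and the billion-file count are assumptions for illustration.

```python
ops_per_file = 4   # lookup, create, write, set mtime, per the explanation above

for rtt_ms in (10, 30):
    per_file_s = ops_per_file * rtt_ms / 1000
    files_per_hour = 3600 / per_file_s
    print(f"RTT {rtt_ms:>2} ms: >= {per_file_s * 1000:.0f} ms/file, "
          f"~{files_per_hour:,.0f} files/hour on one serialized queue")

# A billion small files at 30 ms, fully serialized:
per_file_s = ops_per_file * 30 / 1000
print(f"1e9 files: ~{1e9 * per_file_s / 86400 / 365:.1f} years on a single queue")
```

Numbers like these are why packing many small files into larger archives before replication, or using a WAN-optimized transfer tool such as Aspera, is repeatedly suggested in this thread.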
I do think tarring up files first is best, but you could always check out: http://www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spectrumscaleug.org_wp-2Dcontent_uploads_2019_05_SSSD19DE-2DDay-2D2-2DB02-2DIntegration-2Dof-2DSpectrum-2DScale-2Dand-2DAspera-2DSync.pdf&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=tRCxd4SimJH_eycqekhzM0Qp3TB3NtaIYWBvyQnrIiM&e= Aspera sync integration (non html links added for your use ? how they don?t get scrubbed: www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf ) Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au < https://urldefense.proofpoint.com/v2/url?u=http-3A__www.pawsey.org.au_&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=Xkm8VFy3l6nyD40yhONihsKcqmwRhy4SZyd0lwHf1GA&e= > On 26/2/20, 9:39 pm, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Re: AFM Alternative? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=mYK1ZsVgtsM6HntRMLPS49tKvEhhgGAdWF2qniyn9Ko&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Fri Feb 28 08:55:06 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 28 Feb 2020 08:55:06 +0000 Subject: [gpfsug-discuss] SSUG Events 2020 update Message-ID: <780D9B15-E329-45B7-B62E-1F880512CE7E@spectrumscale.org> Hi All, I thought it might be giving a little bit of an update on where we are with events this year. As you may know, SCAsia was cancelled in its entirety due to Covid-19 in Singapore and so there was no SSUG meeting. In the US, we struggled to find a venue to host the spring meeting and now time is a little short to arrange something for the end of March planned date. The IBM Spectrum Scale Strategy Days in Germany in March are currently still planned to happen next week. For the UK meeting (May), we haven?t yet opened registration but are planning to do so next week. We currently believe that as an event with 120-130 attendees, this is probably very low risk, but we?ll keep the current government advice under review as we approach the date. I would suggest that if you are planning to travel internationally to the UK event that you delay booking flights/book refundable transport and ensure you have adequate insurance in place in the event we have to cancel the event. For ISC in June, we currently don?t have a date, nor any firm plans to run an event this year. Simon Thompson UK group chair -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valleru at cbio.mskcc.org Fri Feb 28 15:12:31 2020 From: valleru at cbio.mskcc.org (Valleru, Lohit/Information Systems) Date: Fri, 28 Feb 2020 10:12:31 -0500 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers Message-ID: Hello Everyone, I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter. One of our users run a deep learning application on GPUs, that does the following IO pattern: It needs to read random small sections about 4K in size from about 20,000 to 100,000 files of each 100M to 200M size. When performance tuning for the above application on a 16M filesystem and comparing it to various other file system block sizes - I realized that the performance degradation that I see might be related to the number of buffers. I observed that the performance varies widely depending on what maxblocksize parameter I use. For example, using a 16M maxblocksize for a 512K or a 1M block size filesystem differs widely from using a 512K or 1M maxblocksize for a 512K or a 1M block size filesystem. The reason I believe might be related to the number of buffers that I could keep on the client side, but I am not sure if that is the all that the maxblocksize is affecting. We have different file system block sizes in our environment ranging from 512K, 1M and 16M. We also use storage clusters and compute clusters design. Now in order to mount the 16M filesystem along with the other filesystems on compute clusters - we had to keep the maxblocksize to be 16M - no matter what the file system block size. I see that I get maximum performance for this application from a 512K block size filesystem and a 512K maxblocksize. However, I will not be able to mount this filesystem along with the other filesystems because I will need to change the maxblocksize to 16M in order to mount the other filesystems of 16M block size. I am thinking if there is anything else that can do the same job as maxblocksize parameter. I was thinking about the parameters like maxBufferDescs for a 16M maxblocksize, but I believe it would need a lot more pagepool to keep the same number of buffers as would be needed for a 512k maxblocksize. May I know if there is any other parameter that could help me the same as maxblocksize, and the side effects of the same? Thank you, Lohit From anobre at br.ibm.com Fri Feb 28 17:58:22 2020 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 28 Feb 2020 17:58:22 +0000 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 28 21:53:25 2020 From: valleru at cbio.mskcc.org (Valleru, Lohit/Information Systems) Date: Fri, 28 Feb 2020 16:53:25 -0500 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers In-Reply-To: References: Message-ID: <2B1F9901-0712-44EB-9D0A-8B40F7BE58EA@cbio.mskcc.org> Hello Anderson, This application requires minimum throughput of about 10-13MB/s initially and almost no IOPS during first phase where it opens all the files and reads the headers and about 30MB/s throughput during the second phase. The issue that I face is during the second phase where it tries to randomly read about 4K of block size from random files from 20000 to about 100000. In this phase - I see a big difference in maxblocksize parameter changing the performance of the reads, with almost no throughput and may be around 2-4K IOPS. 
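A simplistic way to see why maxblocksize could matter for this random 4K read pattern: if cached file data ends up in block-sized buffers, the pagepool can hold far fewer distinct blocks when the buffers are large. Real GPFS buffer management is more involved (maxBufferDescs, sub-block handling, and so on), so the sketch below is an illustration of the buffer-count intuition only; the 16 GiB pagepool figure is an assumption, not a value from this thread.

```python
pagepool_bytes = 16 * 2**30   # assumed 16 GiB client pagepool

sizes = {"512K": 512 * 2**10, "1M": 2**20, "16M": 16 * 2**20}
for name, size in sizes.items():
    print(f"{name:>4} buffers: ~{pagepool_bytes // size:>6,} "
          f"distinct blocks cacheable in the pagepool")
```

With 16M buffers only about a thousand distinct blocks fit, versus tens of thousands with 512K buffers, which would limit how many of the 20,000 to 100,000 files can keep a hot block cached at once under this model.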
This issue is a follow up to the previous issue that I had mentioned about an year ago - where I see differences in performance - ?though there is practically no IO to the storage? I mean - I see a difference in performance between different FS block-sizes even if all data is cached in pagepool. Sven had replied to that thread mentioning that it could be because of buffer locking issue. The info requested is as below: 4 Storage clusters: Storage cluster for compute: 5.0.3-2 GPFS version FS version: 19.01 (5.0.1.0) Subblock size: 16384 Blocksize : 16M Flash Storage Cluster for compute: 5.0.4-2 GPFS version FS version: 18.00 (5.0.0.0) Subblock size: 8192 Blocksize: 512K Storage cluster for admin tools: 5.0.4-2 GPFS version FS version: 16.00 (4.2.2.0) Subblock size: 131072 Blocksize: 4M Storage cluster for archival: 5.0.3-2 GPFS version FS version: 16.00 (4.2.2.0) Subblock size: 32K Blocksize: 1M The only two clusters that users do/will do compute on is the 16M filesystem and the 512K Filesystem. When you ask what is the throughput/IOPS and block size - it varies a lot and has not been recorded. The 16M FS is capable of doing about 27GB/s seq read for about 1.8 PB of storage. The 512K FS is capable of doing about 10-12GB/s seq read for about 100T of storage. Now as I mentioned previously - the issue that I am seeing has been related to different FS block sizes on the same storage. For example: On the Flash Storage cluster: Block size of 512K with maxblocksize of 16M gives worse performance than Block size of 512K with maxblocksize of 512K. It is the maxblocksize that is affecting the performance, on the same storage with same block size and everything else being the same. I am thinking the above is because of the number of buffers involved, but would like to learn if it happens to be anything else. I have debugged the same with IBM GPFS techs and it has been found that there is no issue with the storage itself or any of the other GPFS tuning parameters. Now since we do know that maxblocksize is making a big difference. I would like to keep it as low as possible but still be able to mount other remote GPFS filesystems with higher block sizes. Or since it is required to keep the maxblocksize the same across all storage - I would like to know if there is any other parameters that could do the same change as maxblocksize. Thank you, Lohit > On Feb 28, 2020, at 12:58 PM, Anderson Ferreira Nobre wrote: > > Hi Lohit, > > First, a few questions to understand better your problem: > - What is the minimum release level of both clusters? > - What is the version of filesystem layout for 16MB, 1MB and 512KB? > - What is the subblocksize of each filesystem? > - How many IOPS, block size and throughput are you doing on each filesystem? > > Abra?os / Regards / Saludos, > > Anderson Nobre > Power and Storage Consultant > IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services > > > > Phone: 55-19-2132-4317 > E-mail: anobre at br.ibm.com > > > ----- Original message ----- > From: "Valleru, Lohit/Information Systems" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers > Date: Fri, Feb 28, 2020 12:30 > > Hello Everyone, > > I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter. 
> > One of our users run a deep learning application on GPUs, that does the following IO pattern: > > It needs to read random small sections about 4K in size from about 20,000 to 100,000 files of each 100M to 200M size. > > When performance tuning for the above application on a 16M filesystem and comparing it to various other file system block sizes - I realized that the performance degradation that I see might be related to the number of buffers. > > I observed that the performance varies widely depending on what maxblocksize parameter I use. > For example, using a 16M maxblocksize for a 512K or a 1M block size filesystem differs widely from using a 512K or 1M maxblocksize for a 512K or a 1M block size filesystem. > > The reason I believe might be related to the number of buffers that I could keep on the client side, but I am not sure if that is the all that the maxblocksize is affecting. > > We have different file system block sizes in our environment ranging from 512K, 1M and 16M. > > We also use storage clusters and compute clusters design. > > Now in order to mount the 16M filesystem along with the other filesystems on compute clusters - we had to keep the maxblocksize to be 16M - no matter what the file system block size. > > I see that I get maximum performance for this application from a 512K block size filesystem and a 512K maxblocksize. > However, I will not be able to mount this filesystem along with the other filesystems because I will need to change the maxblocksize to 16M in order to mount the other filesystems of 16M block size. > > I am thinking if there is anything else that can do the same job as maxblocksize parameter. > > I was thinking about the parameters like maxBufferDescs for a 16M maxblocksize, but I believe it would need a lot more pagepool to keep the same number of buffers as would be needed for a 512k maxblocksize. > > May I know if there is any other parameter that could help me the same as maxblocksize, and the side effects of the same? > > Thank you, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Feb 3 08:56:09 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 3 Feb 2020 08:56:09 +0000 Subject: [gpfsug-discuss] When is a file system log recovery triggered Message-ID: Hello, Does mmshutdown or mmumount trigger a file system log recovery, same as a node failure or daemon crash do? Last week we got this advisory: IBM Spectrum Scale (GPFS) 5.0.4 levels: possible metadata or data corruption during file system log recovery https://www.ibm.com/support/pages/node/1274428?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E You need a file system log recovery running to potentially trigger the issue. When does a file system log recovery run? For sure on any unexpected mmfsd/os crash for mounted filesystems, or on connection loss, but what if we do a clean 'mmshutdown' or 'mmumount' - I assume this will cause the client to nicely finish all outstanding transactions and return the empty logfile, hence non log recovery will take place is we do a normal os shutdown/reboot, too? 
Or am I wrong and Spectrum Scale treats all cases the same way? I asked because the advisory states that a node reboot will trigger a log recovery - until we upgraded to 5.0.4-2 we'll try to avoid log recoveries: > Log recovery happens after a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot). Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From heinrich.billich at id.ethz.ch Mon Feb 3 10:02:06 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 3 Feb 2020 10:02:06 +0000 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: References: Message-ID: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Thank you. I wonder if there is any ESS version which deploys FW860.70 for ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4? Cheers, Heiner From: on behalf of Jan-Frode Myklebust Reply to: gpfsug main discussion list Date: Thursday, 30 January 2020 at 18:00 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes I *think* this was a known bug in the Power firmware included with 5.3.4, and that it was fixed in the FW860.70. Something hanging/crashing in IPMI. -jf tor. 30. jan. 2020 kl. 17:10 skrev Wahl, Edward >: Interesting. We just deployed an ESS here and are running into a very similar problem with the gui refresh it appears. Takes my ppc64le's about 45 seconds to run rinv when they are idle. I had just opened a support case on this last evening. We're on ESS 5.3.4 as well. I will wait to see what support says. Ed Wahl Ohio Supercomputer Center -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Ulrich Sibiller Sent: Thursday, January 30, 2020 9:44 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Can I change the times at which the GUI runs HW_INVENTORY and related tasks? > > we frequently get messages like > > gui_refresh_task_failed GUI WARNING 12 hours ago > The following GUI refresh task(s) failed: HW_INVENTORY > > The tasks fail due to timeouts. Running the task manually most times > succeeds. We do run two gui nodes per cluster and I noted that both > servers seem run the HW_INVENTORY at the exact same time which may > lead to locking or congestion issues, actually the logs show messages > like > > EFSSA0194I Waiting for concurrent operation to complete. > > The gui calls ?rinv? on the xCat servers. Rinv for a single > little-endian server takes a long time ? about 2-3 minutes , while it finishes in about 15s for big-endian server. > > Hence the long runtime of rinv on little-endian systems may be an > issue, too > > We run 5.0.4-1 efix9 on the gui and ESS 5.3.4.1 on the GNR systems > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, which a separate xCat/ems server for each type. The GUI nodes are ppc64le. > > We did see this issue with several gpfs version on the gui and with at least two ESS/xCat versions. > > Just to be sure I did purge the Posgresql tables. 
> > I did try > > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY > > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY ?debug > > And also tried to read the logs in /var/log/cnlog/mgtsrv/ - but they are difficult. I have seen the same on ppc64le. From time to time it recovers but then it starts again. The timeouts are okay, it is the hardware. I haven opened a call at IBM and they suggested upgrading to ESS 5.3.5 because of the new firmwares which I am currently doing. I can dig out more details if you want. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Feb 3 10:45:43 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 3 Feb 2020 11:45:43 +0100 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> References: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Message-ID: <98640bc8-ecb7-d050-ea38-da47cf1b9ea4@science-computing.de> On 2/3/20 11:02 AM, Billich Heinrich Rainer (ID SD) wrote: > Thank you. I wonder if there is any ESS version which deploys FW860.70 for ppc64le. The Readme for > 5.3.5 lists FW860.60 again, same as 5.3.4? I have done the upgrade to 5.3.5 last week and gssinstallcheck now reports 860.70: [...] Installed version: 5.3.5-20191205T142815Z_ppc64le_datamanagement [OK] Linux kernel installed: 3.10.0-957.35.2.el7.ppc64le [OK] Systemd installed: 219-67.el7_7.2.ppc64le [OK] Networkmgr installed: 1.18.0-5.el7_7.1.ppc64le [OK] OFED level: MLNX_OFED_LINUX-4.6-3.1.9.1 [OK] IPR SAS FW: 19512300 [OK] ipraid RAID level: 10 [OK] ipraid RAID Status: Optimized [OK] IPR SAS queue depth: 64 [OK] System Firmware: FW860.70 (SV860_205) [OK] System profile setting: scale [OK] System profile verification PASSED. [OK] Host adapter driver: 16.100.01.00 [OK] Kernel sysrq level is: kernel.sysrq = 1 [OK] GNR Level: 5.0.4.1 efix6 [...] Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Feb 3 19:41:31 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 3 Feb 2020 20:41:31 +0100 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> References: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Message-ID: I think both 5.3.4.2 and 5.3.5 includes FW860.70, but the readme doesn?t show this correctly. -jf man. 3. feb. 2020 kl. 11:02 skrev Billich Heinrich Rainer (ID SD) < heinrich.billich at id.ethz.ch>: > Thank you. I wonder if there is any ESS version which deploys FW860.70 for > ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4? > > > > Cheers, > > > > Heiner > > *From: * on behalf of Jan-Frode > Myklebust > *Reply to: *gpfsug main discussion list > *Date: *Thursday, 30 January 2020 at 18:00 > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY > with two active GUI nodes > > > > > > I *think* this was a known bug in the Power firmware included with 5.3.4, > and that it was fixed in the FW860.70. Something hanging/crashing in IPMI. > > > > > > > > -jf > > > > tor. 30. jan. 2020 kl. 17:10 skrev Wahl, Edward : > > Interesting. We just deployed an ESS here and are running into a very > similar problem with the gui refresh it appears. Takes my ppc64le's about > 45 seconds to run rinv when they are idle. > I had just opened a support case on this last evening. We're on ESS > 5.3.4 as well. I will wait to see what support says. > > Ed Wahl > Ohio Supercomputer Center > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ulrich Sibiller > Sent: Thursday, January 30, 2020 9:44 AM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY > with two active GUI nodes > > On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote: > > Hello, > > > > Can I change the times at which the GUI runs HW_INVENTORY and related > tasks? > > > > we frequently get messages like > > > > gui_refresh_task_failed GUI WARNING 12 hours > ago > > The following GUI refresh task(s) failed: HW_INVENTORY > > > > The tasks fail due to timeouts. Running the task manually most times > > succeeds. We do run two gui nodes per cluster and I noted that both > > servers seem run the HW_INVENTORY at the exact same time which may > > lead to locking or congestion issues, actually the logs show messages > > like > > > > EFSSA0194I Waiting for concurrent operation to complete. > > > > The gui calls ?rinv? on the xCat servers. Rinv for a single > > little-endian server takes a long time ? about 2-3 minutes , while it > finishes in about 15s for big-endian server. > > > > Hence the long runtime of rinv on little-endian systems may be an > > issue, too > > > > We run 5.0.4-1 efix9 on the gui and ESS 5.3.4.1 on the GNR systems > > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, which a > separate xCat/ems server for each type. The GUI nodes are ppc64le. 
> > > > We did see this issue with several gpfs version on the gui and with at > least two ESS/xCat versions. > > > > Just to be sure I did purge the Posgresql tables. > > > > I did try > > > > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY > > > > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY ?debug > > > > And also tried to read the logs in /var/log/cnlog/mgtsrv/ - but they are > difficult. > > > I have seen the same on ppc64le. From time to time it recovers but then it > starts again. The timeouts are okay, it is the hardware. I haven opened a > call at IBM and they suggested upgrading to ESS 5.3.5 because of the new > firmwares which I am currently doing. I can dig out more details if you > want. > > Uli > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart Registernummer/Commercial > Register No.: HRB 382196 _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Feb 6 05:02:29 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 6 Feb 2020 05:02:29 +0000 Subject: [gpfsug-discuss] When is a file system log recovery triggered In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sat Feb 8 11:33:21 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 8 Feb 2020 11:33:21 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Message-ID: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is "rule of thumb" that there should be (-n)*32 regions , this would then be enough ( N=5000 --> 160000 regions per pool ?) (also Block size has influence on regions ?) 
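As a quick sanity check of that rule of thumb (my own back-of-the-envelope arithmetic, based only on the N*32 figure support quoted to us):

    -n 3000:  3000 * 32 =  96000 regions per pool (target)
    -n 5000:  5000 * 32 = 160000 regions per pool (target)

The dump below shows the region count we actually have: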
#mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw that creating the filesystem with a specific (-n) set very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value does not influence the behavior any more My question is: Is the rule (Number of Nodes)*32 for the number of regions in a pool a good estimation? Is it better to overestimate the number of Nodes (longer running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! Mit freundlichen Grüßen Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Sat Feb 8 11:59:54 2020 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Sun, 9 Feb 2020 00:59:54 +1300 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Message-ID: How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data that it is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such an amount of nodes. There is never a design by the book that can safely tell something will work 100% of the time. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka wrote: > Hello! > > We are designing two fs where we cannot anticipate if there will be 3000, > or maybe 5000 or more nodes totally accessing these filesystems > > What we saw, was that execution time of mmdf can last 5-7min > > We openend a case and they said, that during such commands like mmdf or > also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is > the reason why it takes so long > > The technichian also said, that it is "rule of thumb" that there should be > > (-n)*32 regions , this would then be enough ( N=5000 --> 160000 regions per > pool ?) > > (also Block size has influence on regions ?)
> > > > #mmfsadm saferdump stripe > > Gives the regions number > > storage pools: max 8 > > > > alloc map type 'scatter' > > > > 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 > thinProvision reserved inode -1, reserved nBlocks 0 > > > > *regns 170413* segs 1 size 4096 FBlks 0 MBlks 3145728 subblock > size 8192 > > > > > > > > > > > > We also saw when creating the filesystem with a speciicic (-n) very high > (5000) (where mmdf execution time was some minutes) and then changing (-n) > to a lower value this does not influence the behavior any more > > > > My question is: Is the rule (Number of Nodes)x5000 for number of regios in > a pool an good estimation , > > Is it better to overestimate the number of Nodes (lnger running commands) > or is it unrealistic to get into problems when not reaching the regions > number calculated ? > > > > Does anybody have experience with high number of nodes (>>3000) and how > to design the filesystems for such large clusters ? > > > > Thank you very much in advance ! > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sun Feb 9 09:59:32 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sun, 9 Feb 2020 09:59:32 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Message-ID: <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? 
-V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? 
-L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka > wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 --> 160000 regions per pool ?) (also Block size has influence on regions ?) 
#mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Feb 10 11:09:56 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 10 Feb 2020 11:09:56 +0000 Subject: [gpfsug-discuss] Spectrum scale yum repos - any chance to the number of repos Message-ID: <1B9A9988-7347-41B4-A881-4300F8F9E5BF@id.ethz.ch> Hello, Does it work to merge ?all? Spectrum Scale rpms of one version in one yum repo, can I merge rpms from different versions in the same repo, even different architectures? Yum repos for RedHat, Suse, Debian or application repos like EPEL all manage to keep many rpms and all different versions in a few repos. Spreading the few Spectrum Scale rpms for rhel across about 11 repos for each architecture and version seems overly complicated ? and makes it difficult to use RedHat Satellite to distribute the software ;-( Does anyone have experiences or opinions with this ?single repo? approach ? Does something break if we use it? We run a few clusters where up to now each runs its own yum server. We want to consolidate with RedHat Satellite for os and scale provisioning/updates. RedHat Satellite having just one repo for _all_ versions would fit much better. And may just separate repos for base (including protocols), object and hdfs (which we don?t use). My wish: The number of repos should no grow with the number of versions provided and adding a new version should not require to setup new yum repos. I know you can workaround and script, but would be easier if I wouldn?t need to. 
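For illustration, the kind of single merged repo I have in mind would be something like this (the /srv path is made up, the /usr/lpp/mmfs/... paths assume the installer's default extraction location; an untested sketch, not a recommendation):

    # collect the Scale rpms of every release we deploy into one directory
    mkdir -p /srv/repos/spectrumscale/rhel7-x86_64
    cp /usr/lpp/mmfs/5.0.3.2/gpfs_rpms/*.rpm /srv/repos/spectrumscale/rhel7-x86_64/
    cp /usr/lpp/mmfs/5.0.4.2/gpfs_rpms/*.rpm /srv/repos/spectrumscale/rhel7-x86_64/
    # rebuild the metadata; adding a new version is then just another cp plus createrepo run
    createrepo --update /srv/repos/spectrumscale/rhel7-x86_64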
Regards, Heiner From nfalk at us.ibm.com Mon Feb 10 14:57:13 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Mon, 10 Feb 2020 14:57:13 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Message-ID: Hello Walter, If you anticipate that the number of clients accessing this file system may grow as high as 5000, then that is probably the value you should use when creating the file system. The data structures (regions for example) are allocated at file system creation time (more precisely at storage pool creation time) and are not changed later. The mmcrfs doc explains this: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm -n NumNodes The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created but it does not change the existing data structures. Only the newly created data structure is affected by the new value. For example, new storage pool. When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (For more information, see GPFS architecture ). If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems Phone: 1-720-349-9538 | Mobile: 1-845-546-4930 E-mail: nfalk at us.ibm.com Find me on: From: Walter Sklenka To: gpfsug main discussion list Date: 02/09/2020 04:59 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? 
consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 ? 160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
Mit freundlichen Grüßen Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=bgNFbl7WeRbpQtvfu8K1GC1HVGofxoeEehWJXVM6H0c&s=BRQWKQ--3xw8g_2o9-RD-XsRdMon6iIy31iSstzRRAw&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Mon Feb 10 18:34:45 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Mon, 10 Feb 2020 18:34:45 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Message-ID: <92ca7c73eb314667be51d79f97f34c9c@Mail.EDVDesign.cloudia> Hello Nate! Thank you very much for the response. Do you know if the rule of thumb for "enough regions" is N*32 per pool? And isn't there another way to increase the number of regions (maybe by reducing block-size)? It's only because the command execution time of a couple of minutes makes me nervous, or is the reason more a poor metadata perf for the long running command? But if you say so we will change it to N=5000 Best regards Walter Mit freundlichen Grüßen Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Nathan Falk Gesendet: Monday, February 10, 2020 3:57 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Hello Walter, If you anticipate that the number of clients accessing this file system may grow as high as 5000, then that is probably the value you should use when creating the file system. The data structures (regions for example) are allocated at file system creation time (more precisely at storage pool creation time) and are not changed later. The mmcrfs doc explains this: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm -n NumNodes The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created but it does not change the existing data structures. Only the newly created data structure is affected by the new value. For example, new storage pool. When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (For more information, see GPFS architecture ).
If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems ________________________________ Phone:1-720-349-9538| Mobile:1-845-546-4930 E-mail:nfalk at us.ibm.com Find me on:[LinkedIn: https://www.linkedin.com/in/nathan-falk-078ba5125] [Twitter: https://twitter.com/natefalk922] [IBM] From: Walter Sklenka > To: gpfsug main discussion list > Date: 02/09/2020 04:59 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von:gpfsug-discuss-bounces at spectrumscale.org > Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka > wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 -->160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Feb 11 21:44:07 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 11 Feb 2020 16:44:07 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca> <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> Message-ID: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Hi Mark, Just a follow up to your suggestion few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each, when serialized. I now just would like to use one mmbackup instance to 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option? Thanks Jaime # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ??tsm?servers option (it thinks it's some kind of argument): # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] 
| NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] I defined the 2 servers stanzas as follows: # cat dsm.sys SERVERNAME TAPENODE3 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 10.20.205.51 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 SERVERNAME TAPENODE4 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 192.168.94.128 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 TCPBuffsize 512 On 2019-11-03 8:56 p.m., Jaime Pinto wrote: > > > On 11/3/2019 20:24:35, Marc A Kaplan wrote: >> Please show us the 2 or 3 mmbackup commands that you would like to run concurrently. > > Hey Marc, > They would be pretty similar, with the only different being the target TSM server, determined by sourcing a different dsmenv1(2 or 3) prior to the > start of each instance, each with its own dsm.sys (3 wrappers). > (source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg1? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg2? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg3? --scope inodespace -v -a 8 -L 2) > > I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for > multiple target servers. It may be just what I need. I'll try this next. > > Thank you very much, > Jaime > >> >> Peeking into the script, I find: >> >> if [[ $scope == "inode-space" ]] >> then >> deviceSuffix="${deviceName}.${filesetName}" >> else >> deviceSuffix="${deviceName}" >> >> >> I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, Or different filesystems... >> >> And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file: >> >> # --tsm-servers TSMserver[,TSMserver...] >> # List of TSM servers to use instead of the servers in the dsm.sys file. >> >> >> >> Inactive hide details for Jaime Pinto ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instrucJaime Pinto >> ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instructed by the -g option? 
I started a mmbac >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 11/01/2019 07:40 PM >> Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ?g GlobalWorkDirectory not being followed >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> How can I force secondary processes to use the folder instructed by the -g option? >> >> I started a mmbackup with ?g /gpfs/fs1/home/.mmbackupCfg1 and another with ?g /gpfs/fs1/home/.mmbackupCfg2 (and another with ?g >> /gpfs/fs1/home/.mmbackupCfg3 ...) >> >> However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not >> happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. >> >> See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): >> >> DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance >> -a 8 ?-P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope >> inodespace >> >> >> Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the >> different mmbackup instances tripping over each other. The serializing is very undesirable. >> >> Thanks >> Jaime >> >> >> ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From scale at us.ibm.com Wed Feb 12 12:48:42 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 12 Feb 2020 07:48:42 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca><5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Message-ID: Hi Jaime, When I copy & paste your command to try, this is what I got. 
/usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jaime Pinto To: gpfsug main discussion list , Marc A Kaplan Date: 02/11/2020 05:26 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Mark, Just a follow up to your suggestion few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each, when serialized. I now just would like to use one mmbackup instance to 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option? Thanks Jaime # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer [,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ??tsm?servers option (it thinks it's some kind of argument): # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] 
| NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer [,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] I defined the 2 servers stanzas as follows: # cat dsm.sys SERVERNAME TAPENODE3 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 10.20.205.51 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 SERVERNAME TAPENODE4 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 192.168.94.128 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 TCPBuffsize 512 On 2019-11-03 8:56 p.m., Jaime Pinto wrote: > > > On 11/3/2019 20:24:35, Marc A Kaplan wrote: >> Please show us the 2 or 3 mmbackup commands that you would like to run concurrently. > > Hey Marc, > They would be pretty similar, with the only different being the target TSM server, determined by sourcing a different dsmenv1(2 or 3) prior to the > start of each instance, each with its own dsm.sys (3 wrappers). > (source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg1? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg2? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg3? --scope inodespace -v -a 8 -L 2) > > I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for > multiple target servers. It may be just what I need. I'll try this next. > > Thank you very much, > Jaime > >> >> Peeking into the script, I find: >> >> if [[ $scope == "inode-space" ]] >> then >> deviceSuffix="${deviceName}.${filesetName}" >> else >> deviceSuffix="${deviceName}" >> >> >> I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, Or different filesystems... >> >> And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file: >> >> # --tsm-servers TSMserver[,TSMserver...] >> # List of TSM servers to use instead of the servers in the dsm.sys file. >> >> >> >> Inactive hide details for Jaime Pinto ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instrucJaime Pinto >> ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instructed by the -g option? 
I started a mmbac >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 11/01/2019 07:40 PM >> Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ?g GlobalWorkDirectory not being followed >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> How can I force secondary processes to use the folder instructed by the -g option? >> >> I started a mmbackup with ?g /gpfs/fs1/home/.mmbackupCfg1 and another with ?g /gpfs/fs1/home/.mmbackupCfg2 (and another with ?g >> /gpfs/fs1/home/.mmbackupCfg3 ...) >> >> However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not >> happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. >> >> See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): >> >> DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance >> -a 8 ?-P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope >> inodespace >> >> >> Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the >> different mmbackup instances tripping over each other. The serializing is very undesirable. >> >> Thanks >> Jaime >> >> >> ************************************ TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=or2HFYOoCdTJ5x-rCnVcq8cFo3SsnpCzODVHNLp7jlA&s=vCTEqk_OPEgrWnqq9bJpzD-pn5QnNNNo3citEqiTsEY&e= ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=or2HFYOoCdTJ5x-rCnVcq8cFo3SsnpCzODVHNLp7jlA&s=76T6OenS_DXfRVD5Xh02vz8qnWOyhmv7yWeawZKYmWA&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Thu Feb 13 19:37:12 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 13 Feb 2020 11:37:12 -0800 Subject: [gpfsug-discuss] NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> Message-ID: <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. Thanks, Kristy > On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose wrote: > > Thanks for your responses to the poll. > > We?re still working on a venue, but working towards: > > March 30 - New User Day (Tuesday) > April 1&2 - Regular User Group Meeting (Wednesday & Thursday) > > Once it?s confirmed we?ll post something again. > > Best, > Kristy. > >> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose > wrote: >> >> Thank you to the 18 wonderful people who filled out the survey. >> >> However, there are well more than 18 people at any given UG meeting. >> >> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >> >> Happy New Year. >> >> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >> >> Thanks, >> Kristy >> >>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose > wrote: >>> >>> Hello, >>> >>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>> >>> Best wishes to all in the new year. >>> >>> -Kristy >>> >>> >>> Please give us 2 minutes of your time here: ?https://forms.gle/NFk5q4djJWvmDurW7 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Fri Feb 14 13:25:08 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 14 Feb 2020 14:25:08 +0100 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? Message-ID: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From S.J.Thompson at bham.ac.uk Fri Feb 14 14:56:30 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 14 Feb 2020 14:56:30 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. 
If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils And use MPI to parallelise the find and copy. Simon ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Fri Feb 14 16:24:40 2020 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 14 Feb 2020 16:24:40 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> References: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> Message-ID: Some (perhaps obvious) points to consider: - There are some corner cases (e.g. preserving hard-linked files or sparseness) which require special options. - Depending on your level of churn, it may be helpful to pre-stage the sync before your cutover so that there is less data movement required, and you're primarily comparing metadata. - Files on the source filesysytem might change (and become internally inconsistent) during your rsync, so you should generally sync from a snapshot on the source. - If users can still modify the source filesystem, then you might not get everything. For the final sync, you may need to make the source read-only, or unmount it on clients, kill user processes, or some combination to prevent all new writes from succeeding. (If you're going to use the clients for MPI sync, you obviously need the filesystem to remain mounted there so you may need to take other measures to keep users away.) - If you decide to do a final "offline" sync, you want it to be fast so users can get back to work sooner, so parallelism is usually a must. If you have lots of filesets, then that's a convenient way to split the work. - If you have any filesets with many more inodes than the others, keep in mind that those will likely take the longest to complete. - Test, test, test. You usually won't get this right on the first go or know how long a full sync takes without practice. Remember that you'll need to employ options to delete extraneous files on the target when you're syncing over the top of a previous attempt, since files intentionally deleted on the source aren't usually welcome if they reappear after a migration. - Verify. Whether you use rsync of dsync, repeating the process with dry-run/no-op flags which report differences can be helpful to increase your confidence in the process. If you don't have time to verify after the final offline sync, hopefully you were able to fit this in during testing. Some thoughts about whether it's appropriate to use NSD servers as sync hosts... 
- If they are the managers and they have the best (direct) connectivity to the metadata NSDs, then I would at least consider them before ruling this out, with caveats... - do they have enough available RAM and CPU? - where do they get their software? Do you trust the version of kernel/libc/rsync there to behave as you expect? - if the data NSDs aren't local to these NSD servers, do they have sufficient network connectivity to not cause other problems during the sync? - Test at low parallelism and work your way up. You can also compare performance of this method with any other, on a small scale, in your environment to see what you can expect from each. Good luck, Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: Friday, February 14, 2020 09:57 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? This message was sent by an external party. I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils And use MPI to parallelise the find and copy. Simon ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Fri Feb 14 16:13:30 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 14 Feb 2020 16:13:30 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. Ed -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco Sent: Friday, February 14, 2020 8:25 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? 
The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW https://urldefense.com/v3/__http://www.afs.enea.it/bracco__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTgu9tB2V$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTn2QwFQn$ From valdis.kletnieks at vt.edu Fri Feb 14 17:28:27 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Fri, 14 Feb 2020 12:28:27 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca> <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Message-ID: <61512.1581701307@turing-police> On Tue, 11 Feb 2020 16:44:07 -0500, Jaime Pinto said: > # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog I got bit by this when cut-n-pasting from IBM documentation - the problem is that the web version has characters that *look* like the command-line hyphen character but are actually something different. It's the same problem as cut-n-pasting a command line where the command *should* have the standard ascii double-quote, but the webpage has "smart quotes" where there's different open and close quote characters. Just even less visually obvious... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From skylar2 at uw.edu Fri Feb 14 17:24:46 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Fri, 14 Feb 2020 17:24:46 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <20200214172446.gwzd332efrkpcuxp@utumno.gs.washington.edu> Our experience matches Ed. I have a vague memory that clients will balance traffic across all NSD servers based on the preferred list for each NSD, whereas NSD servers will just read from each NSD directly. On Fri, Feb 14, 2020 at 04:13:30PM +0000, Wahl, Edward wrote: > Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. > > Ed > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco > Sent: Friday, February 14, 2020 8:25 AM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From bhill at physics.ucsd.edu Fri Feb 14 18:10:04 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Fri, 14 Feb 2020 10:10:04 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 Message-ID: Hi All: I'm performing a rolling upgrade of one of our GPFS clusters. This particular cluster has 2 CNFS servers for some of our NFS clients. I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this). The thing is, it actually does start nfsd processes when running mmstartup on the node. Doing a "ps" shows 32 nfsd threads are running. Below is the CNFS-specific output from an attempt to start the node: CNFS[27243]: Restarting lockd to start grace CNFS[27588]: Enabling 172.16.69.76 CNFS[27694]: Restarting lockd to start grace CNFS[27699]: Starting NFS services CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[27910]: Monitor has started pid=27787 CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it CNFS[28705]: Starting NFS services CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it CNFS[28758]: Starting NFS services CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it CNFS[28816]: Starting NFS services CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured. CNFS[28924]: Unexporting all GPFS filesystems Any thoughts? My other CNFS node is handling everything for the time being, thankfully! Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Feb 14 21:09:14 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 14 Feb 2020 21:09:14 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> Message-ID: <072a3754-5160-09da-0c14-54e08ecefef7@strath.ac.uk> On 14/02/2020 16:24, Sanchez, Paul wrote: > Some (perhaps obvious) points to consider: > > - There are some corner cases (e.g. preserving hard-linked files or > sparseness) which require special options. > > - Depending on your level of churn, it may be helpful to pre-stage > the sync before your cutover so that there is less data movement > required, and you're primarily comparing metadata. 
>
> - Files on the source filesysytem might change (and become internally
> inconsistent) during your rsync, so you should generally sync from a
> snapshot on the source.

In my experience this causes an rsync to exit with a non-zero error code. See later as to why this is useful. Also it will likely have a different mtime that will cause it to be resynced on a subsequent run; the final one will be with the file system in a "read only" state. Not necessarily mounted read only, but without anything running that might change stuff.

[SNIP]

>
> - If you decide to do a final "offline" sync, you want it to be fast
> so users can get back to work sooner, so parallelism is usually a
> must. If you have lots of filesets, then that's a convenient way to
> split the work.

This final "offline" sync is an absolute must, in my experience, unless you are able to be rather woolly about preserving data.

>
> - If you have any filesets with many more inodes than the others,
> keep in mind that those will likely take the longest to complete.
>

Indeed. We found last time we did an rsync like this (for an HPC system, out of the pit of woe that is Lustre and into GPFS) that there was huge mileage to be had from telling users that they would get on the new system once their data was synced, and that it would be done on a "per user" basis with priority given to the users with a combination of the smallest amount of data and the smallest number of files. Did unbelievable wonders for getting users to clean up their files. One user went from over 17 million files to under 50 thousand! The amount of data needing syncing nearly halved; it shrank to ~60% of the pre-announcement size.

> - Test, test, test. You usually won't get this right on the first go
> or know how long a full sync takes without practice. Remember that
> you'll need to employ options to delete extraneous files on the
> target when you're syncing over the top of a previous attempt, since
> files intentionally deleted on the source aren't usually welcome if
> they reappear after a migration.
>

rsync has a --delete option for that. I am going to add that if you do any sort of ILM/HSM then an rsync is going to destroy your ability to identify old files that have not been accessed, as the rsync will update the atime of everything (don't ask how I know). If you have a backup (of course you do) I would strongly recommend considering getting your first "pass" from a restore. Firstly it won't impact the source file system while it is still in use, and secondly it allows you to check your backup actually works :-)

Finally, when rsyncing systems like this I use a Perl script with an sqlite DB: basically a list of directories to sync (you can have both source and destination in there to make wonderful things happen if wanted) along with a flag field. The way I use that is -1 means not synced, -2 means the folder in question is currently being synced, and anything else is the exit code of rsync. If you write the Perl script correctly you can start it on any number of nodes; just dump the sqlite DB on a shared folder somewhere (either the source or destination file systems work well here). If you are doing it in parallel, record the node which did the rsync as well; it can be useful in finding any issues, in my experience.

Once everything is done you can quickly check the sqlite DB for non-zero flag fields to find out what if anything has failed, which gives you the confidence that your sync has completed accurately. Also any flag fields less than zero show you it's not finished.
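A much simplified sketch of that idea in plain shell, with a status file per directory on a shared path standing in for the SQLite table (the list file, paths and rsync options below are only placeholders; -a/-H/-S/--delete cover the archive, hard-link, sparse-file and deleted-file points raised above):

DIRLIST=/gpfs/new/.sync/dirs.txt      # one directory per line, relative to $SRC (placeholder path)
STATE=/gpfs/new/.sync/state           # shared by every node running the loop
SRC=/gpfs/old
DST=/gpfs/new
mkdir -p "$STATE"

while read -r dir; do
    [ -z "$dir" ] && continue
    tag=$(printf '%s' "$dir" | tr '/' '_')

    # atomically claim this directory; mkdir fails if another node got there first
    mkdir "$STATE/$tag.claim" 2>/dev/null || continue
    echo "-2 $(hostname)" > "$STATE/$tag.status"      # -2 = currently being synced

    rsync -aHS --delete "$SRC/$dir/" "$DST/$dir/"
    rc=$?
    echo "$rc $(hostname)" > "$STATE/$tag.status"     # 0 = clean, anything else = rsync exit code
done < "$DIRLIST"

# afterwards, list any directory that did not finish cleanly
grep -L '^0 ' "$STATE"/*.status

The real script obviously wants logging and retries on top of that, but the claim-then-record-exit-code shape is the important bit.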
Finally you might want to record the time each individual rsync took, it's handy for working out that ordering I mentioned :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From chris.schlipalius at pawsey.org.au Fri Feb 14 22:47:00 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Sat, 15 Feb 2020 06:47:00 +0800 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <168C52FC-4942-4D66-8762-EAEFC4655021@pawsey.org.au> We have used DCP for this, with mmdsh as DCP is MPI and multi node with auto resume. You can also customise threads numbers etc. DDN in fact ran it for us first on our NSD servers for a multi petabyte migration project. It?s in git. For client side, we recommend and use bbcp, our users use this to sync data. It?s fast and reliable and supports resume also. If you do use rsync, as suggested, do dryruns and then a sync and then final copy, as is often run on Isilons to keep geographically separate Isilons in sync. Newest version of rsync also. Regards, Chris Schlipalius Team Lead Data and Storage The Pawsey Supercomputing Centre Australia > On 15 Feb 2020, at 1:28 am, gpfsug-discuss-request at spectrumscale.org wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. naive question about rsync: run it on a client or on NSD > server? (Giovanni Bracco) > 2. Re: naive question about rsync: run it on a client or on NSD > server? (Simon Thompson) > 3. Re: naive question about rsync: run it on a client or on NSD > server? (Sanchez, Paul) > 4. Re: naive question about rsync: run it on a client or on NSD > server? (Wahl, Edward) > 5. Re: mmbackup [--tsm-servers TSMServer[, TSMServer...]] > (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 14 Feb 2020 14:25:08 +0100 > From: Giovanni Bracco > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ------------------------------ > > Message: 2 > Date: Fri, 14 Feb 2020 14:56:30 +0000 > From: Simon Thompson > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: <404B2B75-C094-43CC-9146-C00410F31578 at bham.ac.uk> > Content-Type: text/plain; charset="utf-8" > > I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. > > If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils > > And use MPI to parallelise the find and copy. > > Simon > > ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? > > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Message: 3 > Date: Fri, 14 Feb 2020 16:24:40 +0000 > From: "Sanchez, Paul" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Some (perhaps obvious) points to consider: > > - There are some corner cases (e.g. preserving hard-linked files or sparseness) which require special options. > > - Depending on your level of churn, it may be helpful to pre-stage the sync before your cutover so that there is less data movement required, and you're primarily comparing metadata. > > - Files on the source filesysytem might change (and become internally inconsistent) during your rsync, so you should generally sync from a snapshot on the source. > > - If users can still modify the source filesystem, then you might not get everything. For the final sync, you may need to make the source read-only, or unmount it on clients, kill user processes, or some combination to prevent all new writes from succeeding. (If you're going to use the clients for MPI sync, you obviously need the filesystem to remain mounted there so you may need to take other measures to keep users away.) > > - If you decide to do a final "offline" sync, you want it to be fast so users can get back to work sooner, so parallelism is usually a must. If you have lots of filesets, then that's a convenient way to split the work. 
> > - If you have any filesets with many more inodes than the others, keep in mind that those will likely take the longest to complete. > > - Test, test, test. You usually won't get this right on the first go or know how long a full sync takes without practice. Remember that you'll need to employ options to delete extraneous files on the target when you're syncing over the top of a previous attempt, since files intentionally deleted on the source aren't usually welcome if they reappear after a migration. > > - Verify. Whether you use rsync of dsync, repeating the process with dry-run/no-op flags which report differences can be helpful to increase your confidence in the process. If you don't have time to verify after the final offline sync, hopefully you were able to fit this in during testing. > > > Some thoughts about whether it's appropriate to use NSD servers as sync hosts... > > - If they are the managers and they have the best (direct) connectivity to the metadata NSDs, then I would at least consider them before ruling this out, with caveats... > - do they have enough available RAM and CPU? > - where do they get their software? Do you trust the version of kernel/libc/rsync there to behave as you expect? > - if the data NSDs aren't local to these NSD servers, do they have sufficient network connectivity to not cause other problems during the sync? > > - Test at low parallelism and work your way up. You can also compare performance of this method with any other, on a small scale, in your environment to see what you can expect from each. > > Good luck, > Paul > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson > Sent: Friday, February 14, 2020 09:57 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > This message was sent by an external party. > > > I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. > > If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils > > And use MPI to parallelise the find and copy. > > Simon > > ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------ > > Message: 4 > Date: Fri, 14 Feb 2020 16:13:30 +0000 > From: "Wahl, Edward" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. > > Ed > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco > Sent: Friday, February 14, 2020 8:25 AM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? > > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW https://urldefense.com/v3/__http://www.afs.enea.it/bracco__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTgu9tB2V$ > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTn2QwFQn$ > > > ------------------------------ > > Message: 5 > Date: Fri, 14 Feb 2020 12:28:27 -0500 > From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" > To: gpfsug main discussion list > Cc: Marc A Kaplan > Subject: Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, > TSMServer...]] > Message-ID: <61512.1581701307 at turing-police> > Content-Type: text/plain; charset="utf-8" > > On Tue, 11 Feb 2020 16:44:07 -0500, Jaime Pinto said: > >> # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog > > I got bit by this when cut-n-pasting from IBM documentation - the problem is that > the web version has characters that *look* like the command-line hyphen character > but are actually something different. 
> > It's the same problem as cut-n-pasting a command line where the command > *should* have the standard ascii double-quote, but the webpage has "smart quotes" > where there's different open and close quote characters. Just even less visually > obvious... > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/pgp-signature > Size: 832 bytes > Desc: not available > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 97, Issue 12 > ********************************************** From mnaineni at in.ibm.com Sat Feb 15 10:03:20 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Sat, 15 Feb 2020 10:03:20 +0000 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Sun Feb 16 18:19:00 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Sun, 16 Feb 2020 10:19:00 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Hi Malahal: Just to clarify, are you saying that on your VM pidof is missing? Or that it is there and not working as it did prior to RHEL/CentOS 8? pidof is returning pid numbers on my system. I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet. Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni wrote: > I am not familiar with CNFS but looking at git source seems to indicate > that it uses 'pidof' to check if a program is running or not. "pidof nfsd" > works on RHEL7.x but it fails on my centos8.1 I just created. So either we > need to make sure pidof works on kernel threads or fix CNFS scripts. > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from > 4.2.3.11 to 5.0.4.2 > Date: Fri, Feb 14, 2020 11:40 PM > > Hi All: > > I'm performing a rolling upgrade of one of our GPFS clusters. This > particular cluster has 2 CNFS servers for some of our NFS clients. I wiped > one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem > mounts fine on the node when I disable CNFS on the node, but with it > enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd > has started, so it assumes the worst and shuts down the file system (I > currently have reboot on failure disabled to debug this). The thing is, it > actually does start nfsd processes when running mmstartup on the node. > Doing a "ps" shows 32 nfsd threads are running. 
> > Below is the CNFS-specific output from an attempt to start the node: > > CNFS[27243]: Restarting lockd to start grace > CNFS[27588]: Enabling 172.16.69.76 > CNFS[27694]: Restarting lockd to start grace > CNFS[27699]: Starting NFS services > CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[27910]: Monitor has started pid=27787 > CNFS[28702]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28705]: Starting NFS services > CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28755]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28758]: Starting NFS services > CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28813]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28816]: Starting NFS services > CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28867]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS > server is not running or responding. Node failure initiated as configured. > CNFS[28924]: Unexporting all GPFS filesystems > > Any thoughts? My other CNFS node is handling everything for the time > being, thankfully! > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Mon Feb 17 02:56:24 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Sun, 16 Feb 2020 18:56:24 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Ah wait, I see what you might mean. pidof works but not specifically for processes like nfsd. That is odd. Thanks, Bryan On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill wrote: > Hi Malahal: > > Just to clarify, are you saying that on your VM pidof is missing? Or > that it is there and not working as it did prior to RHEL/CentOS 8? pidof > is returning pid numbers on my system. I've been looking at the > mmnfsmonitor script and trying to see where the check for nfsd might be > failing, but I've not been able to figure it out yet. > > > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > > > On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni > wrote: > >> I am not familiar with CNFS but looking at git source seems to indicate >> that it uses 'pidof' to check if a program is running or not. "pidof nfsd" >> works on RHEL7.x but it fails on my centos8.1 I just created. So either we >> need to make sure pidof works on kernel threads or fix CNFS scripts. >> >> Regards, Malahal. 
>> >> >> ----- Original message ----- >> From: Bryan Hill >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from >> 4.2.3.11 to 5.0.4.2 >> Date: Fri, Feb 14, 2020 11:40 PM >> >> Hi All: >> >> I'm performing a rolling upgrade of one of our GPFS clusters. This >> particular cluster has 2 CNFS servers for some of our NFS clients. I wiped >> one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem >> mounts fine on the node when I disable CNFS on the node, but with it >> enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd >> has started, so it assumes the worst and shuts down the file system (I >> currently have reboot on failure disabled to debug this). The thing is, it >> actually does start nfsd processes when running mmstartup on the node. >> Doing a "ps" shows 32 nfsd threads are running. >> >> Below is the CNFS-specific output from an attempt to start the node: >> >> CNFS[27243]: Restarting lockd to start grace >> CNFS[27588]: Enabling 172.16.69.76 >> CNFS[27694]: Restarting lockd to start grace >> CNFS[27699]: Starting NFS services >> CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[27910]: Monitor has started pid=27787 >> CNFS[28702]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28705]: Starting NFS services >> CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28755]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28758]: Starting NFS services >> CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28813]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28816]: Starting NFS services >> CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28867]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS >> server is not running or responding. Node failure initiated as configured. >> CNFS[28924]: Unexporting all GPFS filesystems >> >> Any thoughts? My other CNFS node is handling everything for the time >> being, thankfully! >> >> Thanks, >> Bryan >> >> --- >> Bryan Hill >> Lead System Administrator >> UCSD Physics Computing Facility >> >> 9500 Gilman Dr. # 0319 >> La Jolla, CA 92093 >> +1-858-534-5538 >> bhill at ucsd.edu >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Mon Feb 17 08:02:19 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Mon, 17 Feb 2020 08:02:19 +0000 Subject: [gpfsug-discuss] =?utf-8?q?CNFS_issue_after_upgrading_from_4=2E2?= =?utf-8?b?LjMuMTEgdG8JNS4wLjQuMg==?= In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From rp2927 at gsb.columbia.edu Mon Feb 17 18:42:51 2020 From: rp2927 at gsb.columbia.edu (Popescu, Razvan) Date: Mon, 17 Feb 2020 18:42:51 +0000 Subject: [gpfsug-discuss] Dataless nodes as GPFS clients Message-ID: Hi, Here at CBS we run our compute cluster as dataless nodes loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writtable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server which does not include any node specific runtime files. The (Debian10) system is based on debian-live, with a few in-house modification, a major feature being that we nfs mount the bottom r/o root layer such that we can make live updates (within certain limits). I?m trying to add native (GPL) GPFS access to it. (so far, we?ve used NFS to gain access to the GPFS resident data) I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. I installed on the root server all required GPFS (client) packages and manually built the GPL chroot?ed in the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition. I successfully added the (client) node to the GPFS cluster. It seems to work fine. I?ve copied some the captured node data to the node specific overlay to try to run without any persistency: the critical one seems to be the one in /var/mmfs/gen. (copied all the /var/mmfs in fact). It runs fine without persistency. My questions are: 1. Am I insane and take the risk of compromising the cluster?s data integrity? (?by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!) 2. Would such a configuration run safely through a proper reboot? How about a forced power-off and restart? 3. Is there a properly identified minimum set of files that must be added to the node specific overlay to make this work? (for now, I?ve used my ?knowledge? and guesswork to decide what to retain and what not: e.g. keep startup links, certificates and config dumps, drop: logs, pids. etc?.). Thanks!! Razvan N. Popescu Research Computing Director Office: (212) 851-9298 razvan.popescu at columbia.edu Columbia Business School At the Very Center of Business -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Feb 17 18:57:47 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 17 Feb 2020 18:57:47 +0000 Subject: [gpfsug-discuss] Dataless nodes as GPFS clients In-Reply-To: References: Message-ID: We do this. We provision only the GPFS key files ? /var/mmfs/ssl/stage/genkeyData* ? and the appropriate SSH key files needed, and use the following systemd override to the mmsdrserv.service. Where is the appropriate place to do that override will depend on your version of GFPS somewhat as the systemd setup for GPFS has changed in 5.x, but I?ve rigged this up for any of the 4.x and 5.x that exist so far if you need pointers. 
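[A minimal sketch of the same idea, with the Warewulf-specific pieces from the concrete example below replaced by placeholders; the config server hostname, node name, and the ib0 interface suffix are illustrative assumptions:

    # /etc/systemd/system/mmsdrserv.service.d/override.conf
    [Unit]
    # order mmsdrserv after the ib0 device unit so the GPFS network interface exists first
    After=sys-subsystem-net-devices-ib0.device

    [Service]
    # rebuild /var/mmfs from the cluster configuration server on every boot,
    # since the stateless image comes up with an essentially empty /var
    ExecStartPre=/usr/lpp/mmfs/bin/mmsdrrestore -p gpfs-config-server.example.com -R /usr/bin/scp
    # re-propagate the cluster auth key to this node
    ExecStartPre=/usr/lpp/mmfs/bin/mmauth genkey propagate -N thisnode-ib0

The concrete, Warewulf-managed version of this file as actually deployed follows.]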
We use CentOS, FYI, but I don?t think any of this should be different on Debian; our current version of GPFS on nodes where we do this is 5.0.4-1: [root at master ~]# wwsh file print mmsdrserv-override.conf #### mmsdrserv-override.conf ################################################## mmsdrserv-override.conf: ID = 1499 mmsdrserv-override.conf: NAME = mmsdrserv-override.conf mmsdrserv-override.conf: PATH = /etc/systemd/system/mmsdrserv.service.d/override.conf mmsdrserv-override.conf: ORIGIN = /root/clusters/amarel/mmsdrserv-override.conf mmsdrserv-override.conf: FORMAT = data mmsdrserv-override.conf: CHECKSUM = ee7c28f0eee075a014f7a1a5add65b1e mmsdrserv-override.conf: INTERPRETER = UNDEF mmsdrserv-override.conf: SIZE = 210 mmsdrserv-override.conf: MODE = 0644 mmsdrserv-override.conf: UID = 0 mmsdrserv-override.conf: GID = 0 [root at master ~]# wwsh file show mmsdrserv-override.conf [Unit] After=sys-subsystem-net-devices-ib0.device [Service] ExecStartPre=/usr/lpp/mmfs/bin/mmsdrrestore -p $SERVER -R /usr/bin/scp ExecStartPre=/usr/lpp/mmfs/bin/mmauth genkey propagate -N %{NODENAME}-ib0 ?where $SERVER above has been changed for this e-mail; the actual override file contains the hostname of our cluster manager, or other appropriate config server. %{NODENAME} is filled in by Warewulf, which is our cluster manager, and will contain any given node?s short hostname. I?ve since found that we can also set an object that I could use to make the first line include %{CLUSTERMGR} or other arbitrary variable and make this file more cluster-agnostic, but we just haven?t done that yet. Other than that, we build/install the appropriate gpfs.gplbin- RPM, which we build by doing ? on a node with an identical OS ? or you can manually modify the config and have the appropriate kernel source handy: "cd /usr/lpp/mmfs/src; make Autoconfig; make World; make rpm?. You?d do make deb instead. Also obviously installed is the rest of GPFS and you join the node to the cluster while it?s booted up one of the times. Warewulf starts a node off with a nearly empty /var, so anything we need to be in there has to be populated on boot. It?s required a little tweaking from time to time on OS upgrades or GPFS upgrades, but other than that, we?ve been running clusters like this without incident for years. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Feb 17, 2020, at 1:42 PM, Popescu, Razvan wrote: > > Hi, > > Here at CBS we run our compute cluster as dataless nodes loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writtable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server which does not include any node specific runtime files. The (Debian10) system is based on debian-live, with a few in-house modification, a major feature being that we nfs mount the bottom r/o root layer such that we can make live updates (within certain limits). > > I?m trying to add native (GPL) GPFS access to it. (so far, we?ve used NFS to gain access to the GPFS resident data) > > I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. 
I installed on the root server all required GPFS (client) packages and manually built the GPL chroot?ed in the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition. I successfully added the (client) node to the GPFS cluster. It seems to work fine. > > I?ve copied some the captured node data to the node specific overlay to try to run without any persistency: the critical one seems to be the one in /var/mmfs/gen. (copied all the /var/mmfs in fact). It runs fine without persistency. > > My questions are: > ? Am I insane and take the risk of compromising the cluster?s data integrity? (?by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!) > ? Would such a configuration run safely through a proper reboot? How about a forced power-off and restart? > ? Is there a properly identified minimum set of files that must be added to the node specific overlay to make this work? (for now, I?ve used my ?knowledge? and guesswork to decide what to retain and what not: e.g. keep startup links, certificates and config dumps, drop: logs, pids. etc?.). > > Thanks!! > > Razvan N. Popescu > Research Computing Director > Office: (212) 851-9298 > razvan.popescu at columbia.edu > > Columbia Business School > At the Very Center of Business > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.turner at ed.ac.uk Tue Feb 18 09:28:31 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 09:28:31 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space Message-ID: Dear All, This has happened more than once with both 4.2.3 and 5.0. The instances may not be related. In the first instance, usage was high (over 90%) and so users were encouraged to delete files. One user deleted a considerable number of files equal to around 10% of the total storage. Reported usage did not fall. There were not obviously any waiters. Has anyone seen anything similar? Regards Aaron Turner The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Tue Feb 18 09:36:57 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 18 Feb 2020 09:36:57 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.turner at ed.ac.uk Tue Feb 18 09:41:24 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 09:41:24 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: No, we weren?t using snapshots. This is from a location I have just moved from so I can?t do any active investigation now, but I am curious. In the end we had a power outage and the system was fine on reboot. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Luis Bolinches Sent: 18 February 2020 09:37 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reported free space Hi Do you have snapshots? 
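[Illustrative aside, not part of the original reply: if snapshots are suspected of holding on to the deleted blocks, something along these lines would show how much space they pin; "fs0" is a placeholder file system name, and mmlssnapshot -d can be slow on large file systems:

    # list snapshots and the storage each one occupies
    mmlssnapshot fs0 -d

    # compare against the overall free space report
    mmdf fs0

When deleted data is being retained by a snapshot rather than freed, it shows up in the extra usage reported by -d.]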
-- Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous ----- Original message ----- From: TURNER Aaron > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] Odd behaviour with regards to reported free space Date: Tue, Feb 18, 2020 11:28 Dear All, This has happened more than once with both 4.2.3 and 5.0. The instances may not be related. In the first instance, usage was high (over 90%) and so users were encouraged to delete files. One user deleted a considerable number of files equal to around 10% of the total storage. Reported usage did not fall. There were not obviously any waiters. Has anyone seen anything similar? Regards Aaron Turner The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Feb 18 10:50:10 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 18 Feb 2020 10:50:10 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > Dear All, > > This has happened more than once with both 4.2.3 and 5.0. The > instances may not be related. > > In the first instance, usage was high (over 90%) and so users were > encouraged to delete files. One user deleted a considerable number of > files equal to around 10% of the total storage. Reported usage did > not fall. There were not obviously any waiters. Has anyone seen > anything similar? > I have seen similar behaviour a number of times. I my experience it is because a process somewhere has an open file handle on one or more files/directories. So you can delete the file and it goes from a directory listing; it's no long visible when you do ls. However the file has not actually gone, and will continue to count towards total file system usage, user/group/fileset quota's etc. Once the errant process is found and killed magically the space becomes free. I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.turner at ed.ac.uk Tue Feb 18 11:05:41 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 11:05:41 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> References: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> Message-ID: Dear Jonathan, This is what I had assumed was the case. 
Since the system ended up with an enforced reboot before we had time for further investigation I wasn't able to confirm this. > I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. Yes, that's very likely to have been the case. Regards Aaron Turner -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 18 February 2020 10:50 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reported free space On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > Dear All, > > This has happened more than once with both 4.2.3 and 5.0. The > instances may not be related. > > In the first instance, usage was high (over 90%) and so users were > encouraged to delete files. One user deleted a considerable number of > files equal to around 10% of the total storage. Reported usage did not > fall. There were not obviously any waiters. Has anyone seen anything > similar? > I have seen similar behaviour a number of times. I my experience it is because a process somewhere has an open file handle on one or more files/directories. So you can delete the file and it goes from a directory listing; it's no long visible when you do ls. However the file has not actually gone, and will continue to count towards total file system usage, user/group/fileset quota's etc. Once the errant process is found and killed magically the space becomes free. I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From bevans at pixitmedia.com Tue Feb 18 13:30:14 2020 From: bevans at pixitmedia.com (Barry Evans) Date: Tue, 18 Feb 2020 13:30:14 +0000 Subject: [gpfsug-discuss] Spectrum Scale Jobs Message-ID: ArcaStream/Pixit Media are hiring! We?re on the hunt for Senior Systems Architects, Systems Engineers and DevOps Engineers to be part of our amazing growth in North America. Do you believe that coming up with innovative ways of solving complex workflow challenges is the truth path to storage happiness? Does the thought of knowing you played a small role in producing a blockbuster film, saving lives by reducing diagnosis times, or even discovering new planets excite you? Have you ever thought ?wouldn?t it be cool if?? while working with Spectrum Scale but never had the sponsorship or time to implement it? Do you want to make a lasting legacy of your awesome skills by building software defined solutions that will be used by hundreds of customers, doing thousands of amazing things? Do you have solid Spectrum Scale experience in either a deployment, development, architectural, support or sales capacity? Do you enjoy taking complex concepts and communicating them in a way that is easy for anyone to understand? If the answers to the above are ?yes?, we?d love to hear from you! Send us your CV/Resume to careers at arcastream.com to find out more information and let us know what your ideal position is! 
Regards, Barry Evans Chief Innovation Officer/Co-Founder Pixit Media/ArcaStream http://pixitmedia.com http://arcastream.com http://arcapix.com -- ? This email is confidential in that it is? intended for the exclusive attention of?the addressee(s) indicated. If you are?not the intended recipient, this email?should not be read or disclosed to?any other person. Please notify the?sender immediately and delete this? email from your computer system.?Any opinions expressed are not?necessarily those of the company?from which this email was sent and,?whilst to the best of our knowledge no?viruses or defects exist, no?responsibility can be accepted for any?loss or damage arising from its?receipt or subsequent use of this?email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Tue Feb 18 17:37:41 2020 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Tue, 18 Feb 2020 11:37:41 -0600 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> Message-ID: Deleting a file is a two stage process. The original user thread unlinks the file from the directory and reduces the link count. If the count is zero and the file is not open, then it gets queued for the background deletion thread. The background thread then deletes the blocks and frees the space. If there is a snapshot, the data blocks may be captured and not actually freed. After a crash, the recovery code looks for files that were being deleted and restarts the deletion if necessary. -Wayne gpfsug-discuss-bounces at spectrumscale.org wrote on 02/18/2020 06:05:41 AM: > From: TURNER Aaron > To: gpfsug main discussion list > Date: 02/18/2020 06:05 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Odd behaviour with regards > to reported free space > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Dear Jonathan, > > This is what I had assumed was the case. Since the system ended up > with an enforced reboot before we had time for further investigation > I wasn't able to confirm this. > > > I can be very confusing for end users, especially when what is > holding onto the file is some random zombie process on another node > that died last month. > > Yes, that's very likely to have been the case. > > Regards > > Aaron Turner > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org bounces at spectrumscale.org> On Behalf Of Jonathan Buzzard > Sent: 18 February 2020 10:50 > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reportedfree space > > On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > > Dear All, > > > > This has happened more than once with both 4.2.3 and 5.0. The > > instances may not be related. > > > > In the first instance, usage was high (over 90%) and so users were > > encouraged to delete files. One user deleted a considerable number of > > files equal to around 10% of the total storage. Reported usage did not > > fall. There were not obviously any waiters. Has anyone seen anything > > similar? > > > > I have seen similar behaviour a number of times. > > I my experience it is because a process somewhere has an open file > handle on one or more files/directories. So you can delete the file > and it goes from a directory listing; it's no long visible when you do ls. 
> > However the file has not actually gone, and will continue to count > towards total file system usage, user/group/fileset quota's etc. > > Once the errant process is found and killed magically the space becomes free. > > I can be very confusing for end users, especially when what is > holding onto the file is some random zombie process on another node > that died last month. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=GtPIT10cORUM6qwFnTVtIiDUFmESkxW3I0wu8GDxmgc&m=QkF9KAzl1dxqONkEkh7ZLNsDYktsFHJCkI2oGi6qyHk&s=_Z- > E_VtMDAiXmR8oSZym4G9OIzxRhcs5rJxMEjxK1RI&e= > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=GtPIT10cORUM6qwFnTVtIiDUFmESkxW3I0wu8GDxmgc&m=QkF9KAzl1dxqONkEkh7ZLNsDYktsFHJCkI2oGi6qyHk&s=_Z- > E_VtMDAiXmR8oSZym4G9OIzxRhcs5rJxMEjxK1RI&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Feb 19 15:24:42 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Feb 2020 15:24:42 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) Message-ID: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Feb 19 18:49:51 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 19 Feb 2020 10:49:51 -0800 Subject: [gpfsug-discuss] CANCELLED - Re: NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> Message-ID: <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> I?m sad to report we were unable to find a suitable venue for the spring meeting in the NYC area. Given the date is nearing, we will cancel this event. If you are willing to host a UG meeting later this year, please let us know. Best, Kristy > On Feb 13, 2020, at 11:37 AM, Kristy Kallback-Rose wrote: > > All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. > > Thanks, > Kristy > > >> On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose > wrote: >> >> Thanks for your responses to the poll. 
>> >> We?re still working on a venue, but working towards: >> >> March 30 - New User Day (Tuesday) >> April 1&2 - Regular User Group Meeting (Wednesday & Thursday) >> >> Once it?s confirmed we?ll post something again. >> >> Best, >> Kristy. >> >>> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose > wrote: >>> >>> Thank you to the 18 wonderful people who filled out the survey. >>> >>> However, there are well more than 18 people at any given UG meeting. >>> >>> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >>> >>> Happy New Year. >>> >>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>> >>> Thanks, >>> Kristy >>> >>>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose > wrote: >>>> >>>> Hello, >>>> >>>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>>> >>>> Best wishes to all in the new year. >>>> >>>> -Kristy >>>> >>>> >>>> Please give us 2 minutes of your time here: ?https://forms.gle/NFk5q4djJWvmDurW7 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Feb 19 19:31:36 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 19 Feb 2020 19:31:36 +0000 Subject: [gpfsug-discuss] CANCELLED - Re: NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> Message-ID: I believe we could do it at Rutgers in either Newark or New Brunswick. I?m not sure if that meets most people?s definitions for NYC-area, but I do consider Newark to be. Both are fairly easily accessible by public transportation (and about as close to midtown as some uptown location choices anyway). We had planned to attend the 4/1-2 meeting. Not sure what?s involved to know whether keeping the 4/1-2 date is a viable option if we were able to host. We?d have to make sure we didn?t run afoul of any vendor-ethics guidelines. We recently hosted Ray Paden for a GPFS day, though. We had some trouble with remote participation, but that could be dealt with and I actually don?t think these meetings have that as an option anyway. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Feb 19, 2020, at 1:49 PM, Kristy Kallback-Rose wrote: > > I?m sad to report we were unable to find a suitable venue for the spring meeting in the NYC area. Given the date is nearing, we will cancel this event. > > If you are willing to host a UG meeting later this year, please let us know. > > Best, > Kristy > >> On Feb 13, 2020, at 11:37 AM, Kristy Kallback-Rose wrote: >> >> All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. >> >> Thanks, >> Kristy >> >> >>> On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose wrote: >>> >>> Thanks for your responses to the poll. 
>>> >>> We?re still working on a venue, but working towards: >>> >>> March 30 - New User Day (Tuesday) >>> April 1&2 - Regular User Group Meeting (Wednesday & Thursday) >>> >>> Once it?s confirmed we?ll post something again. >>> >>> Best, >>> Kristy. >>> >>>> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose wrote: >>>> >>>> Thank you to the 18 wonderful people who filled out the survey. >>>> >>>> However, there are well more than 18 people at any given UG meeting. >>>> >>>> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >>>> >>>> Happy New Year. >>>> >>>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>>> >>>> Thanks, >>>> Kristy >>>> >>>>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose wrote: >>>>> >>>>> Hello, >>>>> >>>>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>>>> >>>>> Best wishes to all in the new year. >>>>> >>>>> -Kristy >>>>> >>>>> >>>>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>>> >>> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Feb 19 19:58:59 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 19 Feb 2020 19:58:59 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> References: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: I?m extremely curious as to this answer as well. At one point a while back I started looking into this via the KMIP side with things, but ran out of time to continue. http://docs.oasis-open.org/kmip/testcases/v1.4/kmip-testcases-v1.4.html http://docs.oasis-open.org/kmip/testcases/v1.4/cnprd01/test-cases/kmip-v1.4/ Ed From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Wednesday, February 19, 2020 10:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Feb 19 22:07:50 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 19 Feb 2020 22:07:50 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Wed Feb 19 23:34:37 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 19 Feb 2020 15:34:37 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS Message-ID: Hi, I understand gpfs 4.2.3 is end of support this coming September. 
The support page https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable indicates that gpfs version 5.0 will not run on rhel6 and is unsupported. 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients? Thanks, Renata From YARD at il.ibm.com Thu Feb 20 06:46:17 2020 From: YARD at il.ibm.com (Yaron Daniel) Date: Thu, 20 Feb 2020 08:46:17 +0200 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: References: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: Hi Also in case that u configure 3 SKLM servers (1 Primary - 2 Slaves, in case the Primary is not responding you will see in the logs this messages: Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: "Felipe Knop" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 20/02/2020 00:08 Subject: [EXTERNAL] Re: [gpfsug-discuss] Encryption - checking key server health (SKLM) Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, Scale does not yet have a tool to perform a health-check on a key server, or an independent mechanism to retrieve keys. One can use a command such as 'mmkeyserv key show' to retrieve the list of keys from a given SKLM server (and use that to determine whether the key server is responsive), but being able to retrieve a list of keys does not necessarily mean being able to retrieve the actual keys, as the latter goes through the KMIP port/protocol, and the former uses the REST port/API: # mmkeyserv key show --server 192.168.105.146 --server-pwd /tmp/configKeyServ_pid11403914_keyServPass --tenant sklm3Tenant KEY-ad4f3a9-01397ebf-601b-41fb-89bf-6c4ac333290b KEY-ad4f3a9-019465da-edc8-49d4-b183-80ae89635cbc KEY-ad4f3a9-0509893d-cf2a-40d3-8f79-67a444ff14d5 KEY-ad4f3a9-08d514af-ebb2-4d72-aa5c-8df46fe4c282 KEY-ad4f3a9-0d3487cb-a674-44ab-a7d0-1f68e86e2fc9 [...] Having a tool that can retrieve keys independently from mmfsd would be useful capability to have. Could you submit an RFE to request such function? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Encryption - checking key server health (SKLM) Date: Wed, Feb 19, 2020 11:35 AM I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? 
Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=ARpfta6x0GFP8yy67RAuT4SMBrRHROGRUwCOSPVDEF8&s=aMBH47I25734lVmyzTZBiPd6a1ELRuurxoFCTf6Ij_Y&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [Ten inline image attachments from the preceding message were scrubbed; names not available.] From jonathan.buzzard at strath.ac.uk Thu Feb 20 10:33:57 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 10:33:57 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: Message-ID: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> On 19/02/2020 23:34, Renata Maria Dart wrote: > Hi, I understand gpfs 4.2.3 is end of support this coming September. The support page > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable > > indicates that gpfs version 5.0 will not run on rhel6 and is unsupported. > > 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? > 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients? > Given RHEL6 expires in November anyway you would only be buying yourself a couple of months which seems pointless. You need to be moving away from both. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG From S.J.Thompson at bham.ac.uk Thu Feb 20 10:41:17 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 10:41:17 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> Message-ID: <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Well, if you were buying some form of extended Life Support for Scale, then you might also be expecting to buy extended life for RedHat. RHEL6 has extended life support until June 2024. Sure its an add on subscription cost, but some people might be prepared to do that over OS upgrades. Simon ?On 20/02/2020, 10:34, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 19/02/2020 23:34, Renata Maria Dart wrote: > Hi, I understand gpfs 4.2.3 is end of support this coming September. The support page > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable > > indicates that gpfs version 5.0 will not run on rhel6 and is unsupported. > > 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? > 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients? > Given RHEL6 expires in November anyway you would only be buying yourself a couple of months which seems pointless. You need to be moving away from both. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Feb 20 11:23:52 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 11:23:52 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> On 20/02/2020 10:41, Simon Thompson wrote: > Well, if you were buying some form of extended Life Support for > Scale, then you might also be expecting to buy extended life for > RedHat. RHEL6 has extended life support until June 2024. Sure its an > add on subscription cost, but some people might be prepared to do > that over OS upgrades. I would recommend anyone going down that to route to take a *very* close look at what you get for the extended support. Not all of the OS is supported, with large chunks being moved to unsupported even if you pay for the extended support. Consequently extended support is not suitable for HPC usage in my view, so start planning the upgrade now. It's not like you haven't had 10 years notice. If your GPFS is just a storage thing serving out on protocol nodes, upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. It's a relatively easy invisible to the users upgrade. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From knop at us.ibm.com Thu Feb 20 13:27:47 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 20 Feb 2020 13:27:47 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Feb 20 14:17:58 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 20 Feb 2020 14:17:58 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS Message-ID: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> To reiterate what?s been said on this thread, and to reaffirm the official IBM position: * Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and we decided that we should support a version of Scale essentially as long as RHEL6 is supported. * You can purchase Extended Support for both Scale 4.2 and RHEL6, but (as Jonathan said) you need to look closely at what you are getting from both sides. For Scale, do not expect any fixes after EOS (unless something like a truly critical security issue with no workaround arises). * There is no possibility of IBM supporting Scale 5.0 on RHEL6. I want to make this as clear as I possibly can so that people can focus on feasible alternatives, rather than lose precious time asking for a change to this plan and waiting on a response that will absolutely, definitely be No. I would like to add: In general, in the future the ?span? of the Scale/RHEL matrix is going to get tighter than it perhaps has been in the past. You should anticipate that broadly speaking, we?re not going to support Scale on out-of-support OS versions; and we?re not going to test out-of-support (or soon-to-be out-of-support) Scale on new OS versions. The impact of this will be mitigated by our introduction of EUS releases, starting with 5.0.5, which will allow you to stay on a Scale release across multiple OS releases; and the combination of Scale EUS and RHEL EUS will allow you to stay on a stable environment for a long time. EUS for Scale is no-charge, it is included as a standard part of your S&S. Regards, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_2106701756] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69557 bytes Desc: image001.png URL: From stockf at us.ibm.com Thu Feb 20 14:34:49 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 20 Feb 2020 14:34:49 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>, <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk><07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From skylar2 at uw.edu Thu Feb 20 15:19:09 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 15:19:09 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: <20200220151909.7rbljupfl27whdtu@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 11:23:52AM +0000, Jonathan Buzzard wrote: > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. > > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. I agree, we're having increasing difficulty running CentOS 6, not because of the lack of support from IBM/RedHat, but because the software our customers want to run has started depending on OS features that simply don't exist in CentOS 6. In particular, modern gcc and glibc, and containers are all features that many of our customers are expecting that we provide. The newer kernel available in CentOS 7 (and now 8) supports large numbers of CPUs and large amounts of memory far better than the ancient CentOS 6 kernel as well. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From renata at slac.stanford.edu Thu Feb 20 15:58:08 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Thu, 20 Feb 2020 07:58:08 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> References: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> Message-ID: Thanks very much for your response Carl, this is the information I was looking for. Renata On Thu, 20 Feb 2020, Carl Zetie - carlz at us.ibm.com wrote: >To reiterate what?s been said on this thread, and to reaffirm the official IBM position: > > > * Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and we decided that we should support a version of Scale essentially as long as RHEL6 is supported. > * You can purchase Extended Support for both Scale 4.2 and RHEL6, but (as Jonathan said) you need to look closely at what you are getting from both sides. For Scale, do not expect any fixes after EOS (unless something like a truly critical security issue with no workaround arises). > * There is no possibility of IBM supporting Scale 5.0 on RHEL6. 
I want to make this as clear as I possibly can so that people can focus on feasible alternatives, rather than lose precious time asking for a change to this plan and waiting on a response that will absolutely, definitely be No. > > >I would like to add: In general, in the future the ?span? of the Scale/RHEL matrix is going to get tighter than it perhaps has been in the past. You should anticipate that broadly speaking, we?re not going to support Scale on out-of-support OS versions; and we?re not going to test out-of-support (or soon-to-be out-of-support) Scale on new OS versions. > >The impact of this will be mitigated by our introduction of EUS releases, starting with 5.0.5, which will allow you to stay on a Scale release across multiple OS releases; and the combination of Scale EUS and RHEL EUS will allow you to stay on a stable environment for a long time. > >EUS for Scale is no-charge, it is included as a standard part of your S&S. > > >Regards, > > > >Carl Zetie >Program Director >Offering Management >Spectrum Scale & Spectrum Discover >---- >(919) 473 3318 ][ Research Triangle Park >carlz at us.ibm.com > >[signature_2106701756] > > > From hpc.ken.tw25qn at gmail.com Thu Feb 20 16:29:40 2020 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Thu, 20 Feb 2020 16:29:40 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: Fred, It may be that some HPC users "have to" reverify the results of their computations as being exactly the same as a previous software stack and that is not a minor task. Any change may require this verification process..... Ken Atkjnson On Thu, 20 Feb 2020, 14:35 Frederick Stock, wrote: > This is a bit off the point of this discussion but it seemed like an > appropriate context for me to post this question. IMHO the state of > software is such that it is expected to change rather frequently, for > example the OS on your laptop/tablet/smartphone and your web browser. It > is correct to say those devices are not running an HPC or enterprise > environment but I mention them because I expect none of us would think of > running those devices on software that is a version far from the latest > available. With that as background I am curious to understand why folks > would continue to run systems on software like RHEL 6.x which is now two > major releases(and many years) behind the current version of that product? > Is it simply the effort required to upgrade 100s/1000s of nodes and the > disruption that causes, or are there other factors that make keeping > current with OS releases problematic? I do understand it is not just a > matter of upgrading the OS but all the software, like Spectrum Scale, that > runs atop that OS in your environment. While they all do not remain in > lock step I would think that in some window of time, say 12-18 months > after an OS release, all software in your environment would support a > new/recent OS release that would technically permit the system to be > upgraded. > > I should add that I think you want to be on or near the latest release of > any software with the presumption that newer versions should be an > improvement over older versions, albeit with the usual caveats of new > defects. 
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS 5 and supported rhel OS > Date: Thu, Feb 20, 2020 6:24 AM > > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. > > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Feb 20 16:41:59 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 20 Feb 2020 16:41:59 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Message-ID: <50DD3E29-5CDC-4FCB-9080-F39DE4532761@us.ibm.com> Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... How deep does ?any change? go? Mod level? PTF? Efix? OS errata? Many of our enterprise customers also have validation requirements, although not as strict as typical HPC users e.g. they require some level of testing if they take a Mod but not a PTF. Mind you, with more HPC-like workloads showing up in the enterprise, that too might change? Thanks, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_510537050] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69557 bytes Desc: image001.png URL: From renata at slac.stanford.edu Thu Feb 20 16:57:47 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Thu, 20 Feb 2020 08:57:47 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>, <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk><07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: Hi Frederick, ours is a physics research lab with a mix of new eperiments and ongoing research. While some users embrace and desire the latest that tech has to offer and are actively writing code to take advantage of it, we also have users running older code on data from older experiments which depends on features of older OS releases and they are often not the ones who wrote the code. We have a mix of systems to accomodate both groups. Renata On Thu, 20 Feb 2020, Frederick Stock wrote: >This is a bit off the point of this discussion but it seemed like an appropriate context for me to post this question.? IMHO the state of software is such that >it is expected to change rather frequently, for example the OS on your laptop/tablet/smartphone and your web browser.? It is correct to say those devices are >not running an HPC or enterprise environment but I mention them because I expect none of us would think of running those devices on software that is a version >far from the latest available.? With that as background I am curious to understand why folks would continue to run systems on software like RHEL 6.x which is >now two major releases(and many years) behind the current version of that product?? Is it simply the effort required to upgrade 100s/1000s of nodes and the >disruption that causes, or are there other factors that make keeping current with OS releases problematic?? I do understand it is not just a matter of upgrading >the OS but all the software, like Spectrum Scale, that runs atop that OS in your environment.? While they all do not remain in lock step I would? think that in >some window of time, say 12-18 months after an OS release, all software in your environment would support a new/recent OS release that would technically permit >the system to be upgraded. >? >I should add that I think you want to be on or near the latest release of any software with the presumption that newer versions should be an improvement over >older versions, albeit with the usual caveats of new defects. > >Fred >__________________________________________________ >Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >stockf at us.ibm.com >? >? > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS 5 and supported rhel OS > Date: Thu, Feb 20, 2020 6:24 AM > ? On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. 
> > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. > > JAB. > > -- > Jonathan A. Buzzard ? ? ? ? ? ? ? ? ? ? ? ? Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > ? > >? > > > From skylar2 at uw.edu Thu Feb 20 16:59:53 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 16:59:53 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: > Fred, > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... > Ken Atkjnson We have this problem too, but at the same time the same people require us to run supported software and remove software versions with known vulnerabilities. The compromise we've worked out for the researchers is to have them track which software versions they used for a particular run/data release. The researchers who care more will have a validation suite that will (hopefully) call out problems as we do required upgrades. At some point, it's simply unrealistic to keep legacy systems around, though we do have a lab that needs a Solaris/SPARC system just to run a 15-year-old component of a pipeline for which they don't have source code... -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From malone12 at illinois.edu Thu Feb 20 17:00:46 2020 From: malone12 at illinois.edu (Maloney, J.D.) Date: Thu, 20 Feb 2020 17:00:46 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Message-ID: <2D960263-2CF3-4834-85CE-EB0F977169CB@illinois.edu> I assisted in a migration a couple years ago when we pushed teams to RHEL 7 and the science pipeline folks weren?t really concerned with the version of Scale we were using, but more what the new OS did to their code stack with the newer version of things like gcc and other libraries. They ended up re-running pipelines from prior data releases to compare the outputs of the pipelines to make sure they were within tolerance and matched prior results. Best, J.D. 
Maloney HPC Storage Engineer | Storage Enabling Technologies Group National Center for Supercomputing Applications (NCSA) From: on behalf of "Carl Zetie - carlz at us.ibm.com" Reply-To: gpfsug main discussion list Date: Thursday, February 20, 2020 at 10:42 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... How deep does ?any change? go? Mod level? PTF? Efix? OS errata? Many of our enterprise customers also have validation requirements, although not as strict as typical HPC users e.g. they require some level of testing if they take a Mod but not a PTF. Mind you, with more HPC-like workloads showing up in the enterprise, that too might change? Thanks, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_510537050] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From david_johnson at brown.edu Thu Feb 20 17:14:40 2020 From: david_johnson at brown.edu (David Johnson) Date: Thu, 20 Feb 2020 12:14:40 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> Instead of keeping whole legacy systems around, could they achieve the same with a container built from the legacy software? > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: >> Fred, >> It may be that some HPC users "have to" >> reverify the results of their computations as being exactly the same as a >> previous software stack and that is not a minor task. Any change may >> require this verification process..... >> Ken Atkjnson > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. The compromise we've worked out for the researchers is to > have them track which software versions they used for a particular run/data > release. The researchers who care more will have a validation suite that > will (hopefully) call out problems as we do required upgrades. > > At some point, it's simply unrealistic to keep legacy systems around, > though we do have a lab that needs a Solaris/SPARC system just to run a > 15-year-old component of a pipeline for which they don't have source code... 
> > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Thu Feb 20 17:20:09 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 17:20:09 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> Message-ID: <20200220172009.gtkek3nlohathrro@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 12:14:40PM -0500, David Johnson wrote: > Instead of keeping whole legacy systems around, could they achieve the same > with a container built from the legacy software? That is our hope, at least once we can get off CentOS 6 and run containers. :) Though containers aren't quite a panacea; there's still the issue of insecure software being baked into the container, but at least we can limit what the container can access more easily than running outside a container. > > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > > > On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: > >> Fred, > >> It may be that some HPC users "have to" > >> reverify the results of their computations as being exactly the same as a > >> previous software stack and that is not a minor task. Any change may > >> require this verification process..... > >> Ken Atkjnson > > > > We have this problem too, but at the same time the same people require us > > to run supported software and remove software versions with known > > vulnerabilities. The compromise we've worked out for the researchers is to > > have them track which software versions they used for a particular run/data > > release. The researchers who care more will have a validation suite that > > will (hopefully) call out problems as we do required upgrades. > > > > At some point, it's simply unrealistic to keep legacy systems around, > > though we do have a lab that needs a Solaris/SPARC system just to run a > > 15-year-old component of a pipeline for which they don't have source code... 
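A minimal sketch of the container approach David and Skylar are discussing, i.e. keeping the legacy userland inside an image while the host OS moves on. The image name, mount point and pipeline script below are hypothetical placeholders rather than a tested recipe, and as Skylar notes the old software still carries its old vulnerabilities; the container merely narrows what it can reach:

docker pull centos:6
# Bind-mount the shared filesystem into the container and run the legacy step in the old userland:
docker run --rm -v /gpfs/projects/pipeline:/data:rw centos:6 \
    /data/bin/legacy_step.sh /data/input /data/output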
> > > > -- > > -- Skylar Thompson (skylar2 at u.washington.edu) > > -- Genome Sciences Department, System Administrator > > -- Foege Building S046, (206)-685-7354 > > -- University of Washington School of Medicine > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From S.J.Thompson at bham.ac.uk Thu Feb 20 19:45:02 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 19:45:02 +0000 Subject: [gpfsug-discuss] Unkillable snapshots Message-ID: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Thu Feb 20 19:46:53 2020 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 20 Feb 2020 19:46:53 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Message-ID: Move the file system manager :) On Thu, 20 Feb 2020, 19:45 Simon Thompson, wrote: > Hi, > > > We have a snapshot which is stuck in the state "DeleteRequired". When > deleting, it goes through the motions but eventually gives up with: > > Unable to quiesce all nodes; some processes are busy or holding required > resources. > mmdelsnapshot: Command failed. Examine previous error messages to > determine cause. > > And in the mmfslog on the FS manager there are a bunch of retries and > "failure to quesce" on nodes. However in each retry its never the same set > of nodes. I suspect we have one HPC job somewhere killing us. > > > What's interesting is that we can delete other snapshots OK, it appears to > be one particular fileset. > > > My old goto "mmfsadm dump tscomm" isn't showing any particular node, and > waiters around just tend to point to the FS manager node. > > > So ... any suggestions? I'm assuming its some workload holding a lock open > or some such, but tracking it down is proving elusive! > > > Generally the FS is also "lumpy" ... 
at times it feels like a wifi > connection on a train using a terminal, I guess its all related though. > > > Thanks > > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Feb 20 20:13:14 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 20:13:14 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Message-ID: <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 20 20:29:44 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 20 Feb 2020 15:29:44 -0500 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: References: Message-ID: <13747.1582230584@turing-police> On Wed, 19 Feb 2020 22:07:50 +0000, "Felipe Knop" said: > Having a tool that can retrieve keys independently from mmfsd would be useful > capability to have. Could you submit an RFE to request such function? Note that care needs to be taken to do this in a secure manner. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ulmer at ulmer.org Thu Feb 20 20:43:11 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 20 Feb 2020 15:43:11 -0500 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <13747.1582230584@turing-police> References: <13747.1582230584@turing-police> Message-ID: It seems like this belongs in mmhealth if it were to be bundled. 
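Luke's suggestion above, moving the file system manager, can be done online and is sometimes enough to let a stuck quiesce complete. A minimal sketch (device and node names are placeholders), plus a rough way to look for the nodes holding up the quiesce; the grep is only a crude filter on the waiter text:

mmlsmgr                                # which node currently manages each file system
mmchmgr <fsdevice> <newmanagernode>    # move the file system manager role
mmlsmgr <fsdevice>                     # confirm the move
mmdsh -N all "/usr/lpp/mmfs/bin/mmdiag --waiters" 2>/dev/null | grep -i quiesce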
If you need to use a third party tool, maybe fetch a particular key that is only used for fetching, so it?s compromise would represent no risk. -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Feb 20, 2020, at 3:11 PM, Valdis Kl?tnieks wrote: > > ?On Wed, 19 Feb 2020 22:07:50 +0000, "Felipe Knop" said: > >> Having a tool that can retrieve keys independently from mmfsd would be useful >> capability to have. Could you submit an RFE to request such function? > > Note that care needs to be taken to do this in a secure manner. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From truston at mbari.org Thu Feb 20 20:43:03 2020 From: truston at mbari.org (Todd Ruston) Date: Thu, 20 Feb 2020 12:43:03 -0800 Subject: [gpfsug-discuss] Policy REGEX question Message-ID: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Greetings, I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') The Scale policy documentation for REGEX only mentions 2 parameters, not 3: REGEX(String,'Pattern') Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). -- Todd E. Ruston Information Systems Manager Monterey Bay Aquarium Research Institute (MBARI) 7700 Sandholdt Road, Moss Landing, CA, 95039 Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfalk at us.ibm.com Thu Feb 20 21:26:39 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Thu, 20 Feb 2020 16:26:39 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> Message-ID: Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. 
It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=rIyEAXKyzwEj_pyM9DRQ1mL3x5gHjoqSpnhqxP6Oj-8&s=ZRXJm9u1_WLClH0Xua2PeIr-cWHj8YasvQCwndgdyns&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Feb 20 21:39:10 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 21:39:10 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> <93bdde85530d41bebbe24b7530e70592@bham.ac.uk>, Message-ID: <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. 
Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nfalk at us.ibm.com Thu Feb 20 22:13:56 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Thu, 20 Feb 2020 17:13:56 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk><93bdde85530d41bebbe24b7530e70592@bham.ac.uk>, <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> Message-ID: Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 04:39 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hmm ... 
mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=eGuD3K3Va_jMinEQHJN-FU1-fi2V-VpqWjHiTVUK-L8&s=fX3QMwGX7-yxSM4VSqPqBUbkT41ntfZFRZnalg9PZBI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Thu Feb 20 22:17:41 2020 From: peserocka at gmail.com (Peter Serocka) Date: Thu, 20 Feb 2020 23:17:41 +0100 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Message-ID: Looking at the example '*/xy_survey_*/name/*.tif': that's not a "real" (POSIX) regular expression but a use of a much simpler "wildcard pattern" as commonly used in the UNIX shell when matching filenames. So I would assume that the 'f' parameter just mandates that REGEX() must apply "filename matching" rules here instead of POSIX regular expressions. makes sense? -- Peter > On Feb 20, 2020, at 21:43, Todd Ruston wrote: > > Greetings, > > I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: > > REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') > > The Scale policy documentation for REGEX only mentions 2 parameters, not 3: > > REGEX(String,'Pattern') > Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. 
> > (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) > > Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). > > -- > Todd E. Ruston > Information Systems Manager > Monterey Bay Aquarium Research Institute (MBARI) > 7700 Sandholdt Road, Moss Landing, CA, 95039 > Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Thu Feb 20 22:25:35 2020 From: peserocka at gmail.com (Peter Serocka) Date: Thu, 20 Feb 2020 23:25:35 +0100 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Message-ID: <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> Sorry, I believe you had nailed it already -- I didn't read carefully to the end. > On Feb 20, 2020, at 23:17, Peter Serocka wrote: > > Looking at the example '*/xy_survey_*/name/*.tif': > that's not a "real" (POSIX) regular expression but a use of > a much simpler "wildcard pattern" as commonly used in the UNIX shell > when matching filenames. > > So I would assume that the 'f' parameter just mandates that > REGEX() must apply "filename matching" rules here instead > of POSIX regular expressions. > > makes sense? > > -- Peter > > >> On Feb 20, 2020, at 21:43, Todd Ruston > wrote: >> >> Greetings, >> >> I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: >> >> REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') >> >> The Scale policy documentation for REGEX only mentions 2 parameters, not 3: >> >> REGEX(String,'Pattern') >> Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. >> >> (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) >> >> Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). >> >> -- >> Todd E. Ruston >> Information Systems Manager >> Monterey Bay Aquarium Research Institute (MBARI) >> 7700 Sandholdt Road, Moss Landing, CA, 95039 >> Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Thu Feb 20 22:28:43 2020 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 20 Feb 2020 17:28:43 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: Message-ID: Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn?t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by to high maxfilestocache or pagepool settings. To give you an simplified example (its more complex than that, but good enough to make the point) - assume you have 100 nodes, each has 16 GB pagepool and your storage system can write data out at 10 GB/sec, it will take 160 seconds to flush all data data (assuming you did normal buffered I/O. If i remember correct (talking out of memory here) the default timeout is 60 seconds, given that you can?t write that fast it will always timeout under this scenario. There is one case where this can also happen which is a client is connected badly (flaky network or slow connection) and even your storage system is fast enough the node is too slow that it can?t de-stage within that time while everybody else can and the storage is not the bottleneck. Other than that only solutions are to a) buy faster storage or b) reduce pagepool and maxfilestocache which will reduce overall performance of the system. Sven Sent from my iPad > On Feb 20, 2020, at 5:14 PM, Nathan Falk wrote: > > ?Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. > > > > Nate Falk > IBM Spectrum Scale Level 2 Support > Software Defined Infrastructure, IBM Systems > > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 04:39 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > Hi Nate, > So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? > But yes, essentially running this by hand to clean up. > What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: > rds-projects-b which is mounted as /rds/projects/b > But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: > /rds/projects/g which is a different file-set. > What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? > Does that sound plausible? > Thanks > Simon > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com > Sent: 20 February 2020 21:26:39 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Unkillable snapshots > > Hello Simon, > > Sadly, that "1036" is not a node ID, but just a counter. > > These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. 
> > Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. > > You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. > > It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. > > Thanks, > Nate Falk > IBM Spectrum Scale Level 2 Support > Software Defined Infrastructure, IBM Systems > > > > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 03:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hmm ... mmdiag --tokenmgr shows: > > > Server stats: requests 195417431 ServerSideRevokes 120140 > nTokens 2146923 nranges 4124507 > designated mnode appointed 55481 mnode thrashing detected 1036 > So how do I convert "1036" to a node? > Simon > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson > Sent: 20 February 2020 19:45:02 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Unkillable snapshots > > Hi, > We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: > > > Unable to quiesce all nodes; some processes are busy or holding required resources. > mmdelsnapshot: Command failed. Examine previous error messages to determine cause. > And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. > What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. > My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. > So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! > Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. > Thanks > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Feb 20 23:38:15 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 23:38:15 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: On 20/02/2020 16:59, Skylar Thompson wrote: [SNIP] > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. For us, it is a Scottish government mandate that all public funded bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days from a critical vulnerability till your patched. No if's no buts, just do it. So while where are not their yet (its a work in progress to make this as seamless as possible) frankly running unpatched systems for years on end because we are too busy/lazy to validate a new system is completely unacceptable. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valdis.kletnieks at vt.edu Fri Feb 21 02:00:59 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 20 Feb 2020 21:00:59 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: <36675.1582250459@turing-police> On Thu, 20 Feb 2020 23:38:15 +0000, Jonathan Buzzard said: > For us, it is a Scottish government mandate that all public funded > bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days > from a critical vulnerability till your patched. No if's no buts, just > do it. Is that 10 days from vuln dislosure, or from patch availability? The latter can be a headache, especially if 24-48 hours pass between when the patch actually hits the streets and you get the e-mail, or if you have other legal mandates that patches be tested before production deployment. The former is simply unworkable - you *might* be able to deploy mitigations or other work-arounds, but if it's something complicated that requires a lot of re-work of code, you may be waiting a lot more than 10 days for a patch.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From Paul.Sanchez at deshaw.com Fri Feb 21 02:05:12 2020 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 21 Feb 2020 02:05:12 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: Message-ID: <9ca16f7634354e4db8bed681a306b714@deshaw.com> Another possibility is to try increasing the timeouts. We used to have problems with this all of the time on clusters with thousands of nodes, but now we run with the following settings increased from their [defaults]? sqtBusyThreadTimeout [10] = 120 sqtCommandRetryDelay [60] = 120 sqtCommandTimeout [300] = 500 These are in the category of undocumented configurables, so you may wish to accompany this with a PMR. 
And you?ll need to know the secret handshake that follows this? mmchconfig: Attention: Unknown attribute specified: sqtBusyThreadTimeout. Press the ENTER key to continue. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sven Oehme Sent: Thursday, February 20, 2020 17:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots This message was sent by an external party. Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn?t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by to high maxfilestocache or pagepool settings. To give you an simplified example (its more complex than that, but good enough to make the point) - assume you have 100 nodes, each has 16 GB pagepool and your storage system can write data out at 10 GB/sec, it will take 160 seconds to flush all data data (assuming you did normal buffered I/O. If i remember correct (talking out of memory here) the default timeout is 60 seconds, given that you can?t write that fast it will always timeout under this scenario. There is one case where this can also happen which is a client is connected badly (flaky network or slow connection) and even your storage system is fast enough the node is too slow that it can?t de-stage within that time while everybody else can and the storage is not the bottleneck. Other than that only solutions are to a) buy faster storage or b) reduce pagepool and maxfilestocache which will reduce overall performance of the system. Sven Sent from my iPad On Feb 20, 2020, at 5:14 PM, Nathan Falk > wrote: ?Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 04:39 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of nfalk at us.ibm.com > Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. 
Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Simon Thompson > Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Fri Feb 21 11:04:32 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 21 Feb 2020 11:04:32 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <36675.1582250459@turing-police> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> Message-ID: <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> On 21/02/2020 02:00, Valdis Kl?tnieks wrote: > On Thu, 20 Feb 2020 23:38:15 +0000, Jonathan Buzzard said: >> For us, it is a Scottish government mandate that all public funded >> bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days >> from a critical vulnerability till your patched. No if's no buts, just >> do it. > > Is that 10 days from vuln dislosure, or from patch availability? > Patch availability. Basically it's a response to the issue a couple of years ago now where large parts of the NHS in Scotland had serious problems due to some Windows vulnerability for which a patch had been available for some months. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Fri Feb 21 13:07:01 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Fri, 21 Feb 2020 14:07:01 +0100 (CET) Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Message-ID: <270013029.95562.1582290421465@privateemail.com> An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Feb 21 14:14:49 2020 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 21 Feb 2020 15:14:49 +0100 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES Message-ID: Dear all, I was wondering if anybody recently encountered a similar issue (I found a related thread from 2018, but it was inconclusive). I just found that one of our production CES nodes have 28k CLOSE_WAIT tcp6 connections, I do not understand why... the second node in the same cluster does not have this issue. Both are: - GPFS 5.0.4.2 - RHEL 7.4 has anybody else encountered anything similar? In the last few days it seems it happened once on one node, and twice on the other, but never on both... Thanks for any feedback! cheers leo -- Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From truston at mbari.org Fri Feb 21 16:15:54 2020 From: truston at mbari.org (Todd Ruston) Date: Fri, 21 Feb 2020 08:15:54 -0800 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> Message-ID: <9E104C63-9C6D-4E46-BEFF-AEF7E1AF8EC9@mbari.org> Thanks Peter, and no worries; great minds think alike. ;-) - Todd > On Feb 20, 2020, at 2:25 PM, Peter Serocka wrote: > > Sorry, I believe you had nailed it already -- I didn't > read carefully to the end. 
> >> On Feb 20, 2020, at 23:17, Peter Serocka > wrote: >> >> Looking at the example '*/xy_survey_*/name/*.tif': >> that's not a "real" (POSIX) regular expression but a use of >> a much simpler "wildcard pattern" as commonly used in the UNIX shell >> when matching filenames. >> >> So I would assume that the 'f' parameter just mandates that >> REGEX() must apply "filename matching" rules here instead >> of POSIX regular expressions. >> >> makes sense? >> >> -- Peter >> >> >>> On Feb 20, 2020, at 21:43, Todd Ruston > wrote: >>> >>> Greetings, >>> >>> I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: >>> >>> REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') >>> >>> The Scale policy documentation for REGEX only mentions 2 parameters, not 3: >>> >>> REGEX(String,'Pattern') >>> Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. >>> >>> (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) >>> >>> Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). >>> >>> -- >>> Todd E. Ruston >>> Information Systems Manager >>> Monterey Bay Aquarium Research Institute (MBARI) >>> 7700 Sandholdt Road, Moss Landing, CA, 95039 >>> Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gterryc at vmsupport.com Fri Feb 21 17:18:11 2020 From: gterryc at vmsupport.com (George Terry) Date: Fri, 21 Feb 2020 11:18:11 -0600 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 Message-ID: Hello, I've a question about upgrade of GPFS 3.5. We have an infrastructure with GSPF 3.5.0.33 and we need upgrade to Spectrum Scale 5.0.3. Can we upgrade from 3.5 to 4.1, 4.2 and 5.0.3 or can we do something additional like unistall GPFS 3.5 and install Spectrum Scale 5.0.3? Thank you George -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Fri Feb 21 17:25:12 2020 From: TOMP at il.ibm.com (Tomer Perry) Date: Fri, 21 Feb 2020 19:25:12 +0200 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: <270013029.95562.1582290421465@privateemail.com> References: <270013029.95562.1582290421465@privateemail.com> Message-ID: Hi, I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. 
So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. After that, you can start looking into "how can I get multiple streams?" - for that there are two options: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm and https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 21/02/2020 15:25 Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=XKMIdSqQ76jf_FrIRFtAhMsgU-MkPFhxBJjte8AdeYs&s=vih7W_XcatoqN_MhS3gEK9RR6RxpNrfB2UvvQeXqyH8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Feb 21 18:50:49 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 21 Feb 2020 18:50:49 +0000 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Feb 21 21:15:28 2020 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 21 Feb 2020 21:15:28 +0000 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Fri Feb 21 23:32:13 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Sat, 22 Feb 2020 00:32:13 +0100 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: <270013029.95562.1582290421465@privateemail.com> Message-ID: Hi, Thanks for answering! 
Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. Best Regards Andi Christiansen Sendt fra min iPhone > Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : > > Hi, > > I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. > So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. > After that, you can start looking into "how can I get multiple streams?" - for that there are two options: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm > and > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm > > The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 21/02/2020 15:25 > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. > > We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. > > On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? > > We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Sat Feb 22 00:08:19 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 22 Feb 2020 00:08:19 +0000 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: Message-ID: Andi, You may want to reach out to Jake Carrol at the University of Queensland, When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. >From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. Regards Andrew Sent from my iPhone > On 22 Feb 2020, at 09:32, Andi Christiansen wrote: > > ?Hi, > > Thanks for answering! > > Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. > > I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. > > Best Regards > Andi Christiansen > > > > Sendt fra min iPhone > >> Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : >> >> Hi, >> >> I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. >> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. >> After that, you can start looking into "how can I get multiple streams?" - for that there are two options: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm >> and >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm >> >> The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. >> >> >> >> Regards, >> >> Tomer Perry >> Scalable I/O Development (Spectrum Scale) >> email: tomp at il.ibm.com >> 1 Azrieli Center, Tel Aviv 67021, Israel >> Global Tel: +1 720 3422758 >> Israel Tel: +972 3 9188625 >> Mobile: +972 52 2554625 >> >> >> >> >> From: Andi Christiansen >> To: "gpfsug-discuss at spectrumscale.org" >> Date: 21/02/2020 15:25 >> Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi all, >> >> i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. >> >> We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. 
But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. >> >> On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? >> >> We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). >> >> Best Regards >> Andi Christiansen _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Feb 22 05:55:54 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 22 Feb 2020 05:55:54 +0000 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: Message-ID: Hi While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. -- Cheers > On 22. Feb 2020, at 3.08, Andrew Beattie wrote: > > ?Andi, > > You may want to reach out to Jake Carrol at the University of Queensland, > > When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. > > From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. > > We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. > > > > > > Regards > > Andrew > > Sent from my iPhone > >>> On 22 Feb 2020, at 09:32, Andi Christiansen wrote: >>> >> ?Hi, >> >> Thanks for answering! >> >> Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. >> >> I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. >> >> Best Regards >> Andi Christiansen >> >> >> >> Sendt fra min iPhone >> >>> Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : >>> >>> Hi, >>> >>> I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. >>> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. >>> After that, you can start looking into "how can I get multiple streams?" 
- for that there are two options: >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm >>> and >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm >>> >>> The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. >>> >>> >>> >>> Regards, >>> >>> Tomer Perry >>> Scalable I/O Development (Spectrum Scale) >>> email: tomp at il.ibm.com >>> 1 Azrieli Center, Tel Aviv 67021, Israel >>> Global Tel: +1 720 3422758 >>> Israel Tel: +972 3 9188625 >>> Mobile: +972 52 2554625 >>> >>> >>> >>> >>> From: Andi Christiansen >>> To: "gpfsug-discuss at spectrumscale.org" >>> Date: 21/02/2020 15:25 >>> Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi all, >>> >>> i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. >>> >>> We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. >>> >>> On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? >>> >>> We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). >>> >>> Best Regards >>> Andi Christiansen _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Sat Feb 22 09:35:32 2020 From: TOMP at il.ibm.com (Tomer Perry) Date: Sat, 22 Feb 2020 11:35:32 +0200 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: Message-ID: Hi, Its implied in the tcp tuning suggestions ( as one needs bandwidth and latency in order to calculate the BDP). The overall theory is documented in multiple places (tcp window, congestion control etc.) - nice place to start is https://en.wikipedia.org/wiki/TCP_tuning . I tend to use this calculator in order to find out the right values https://www.switch.ch/network/tools/tcp_throughput/ The parallel IO and multiple mounts are on top of the above - not instead ( even though it could be seen that it makes things better - but multiple of the small numbers we're getting initially). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Luis Bolinches" To: "gpfsug main discussion list" Cc: Jake Carrol Date: 22/02/2020 07:56 Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. -- Cheers On 22. Feb 2020, at 3.08, Andrew Beattie wrote: Andi, You may want to reach out to Jake Carrol at the University of Queensland, When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. >From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. Regards Andrew Sent from my iPhone On 22 Feb 2020, at 09:32, Andi Christiansen wrote: Hi, Thanks for answering! Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. Best Regards Andi Christiansen Sendt fra min iPhone Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : Hi, I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. After that, you can start looking into "how can I get multiple streams?" - for that there are two options: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm and https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 21/02/2020 15:25 Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. 
But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=vPbqr3ME98a_M4VrB5IPihvzTzG8CQUAuI0eR-kqXcs&s=kIM8S1pVtYFsFxXT3gGQ0DmcwRGBWS9IqtoYTtcahM8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sun Feb 23 04:43:37 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sat, 22 Feb 2020 23:43:37 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> Message-ID: <208376.1582433017@turing-police> On Fri, 21 Feb 2020 11:04:32 +0000, Jonathan Buzzard said: > > Is that 10 days from vuln dislosure, or from patch availability? > > > > Patch availability. Basically it's a response to the issue a couple of That's not *quite* so bad. As long as you trust *all* your vendors to notify you when they release a patch for an issue you hadn't heard about. (And that no e-mail servers along the way don't file it under 'spam') -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Sun Feb 23 12:20:48 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 23 Feb 2020 12:20:48 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <208376.1582433017@turing-police> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> Message-ID: <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> On 23/02/2020 04:43, Valdis Kl?tnieks wrote: > On Fri, 21 Feb 2020 11:04:32 +0000, Jonathan Buzzard said: > >>> Is that 10 days from vuln dislosure, or from patch availability? >>> >> >> Patch availability. 
Basically it's a response to the issue a couple of > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > you when they release a patch for an issue you hadn't heard about. > Er, what do you think I am paid for? Specifically it is IMHO the job of any systems administrator to know when any critical patch becomes available for any software/hardware that they are using. To not be actively monitoring it is IMHO a dereliction of duty, worthy of a verbal and then written warning. I also feel that the old practice of leaving HPC systems unpatched for years on end is no longer acceptable. From a personal perspective I have in now over 20 years never had a system that I have been responsible for knowingly compromised. I would like it to stay that way because I have no desire to be explaining to higher ups why the HPC facility was hacked. The fact that the Scottish government have mandated I apply patches just makes my life easier because any push back from the users is killed dead instantly; I have too, go moan at your elective representative if you want it changed. In the meantime suck it up :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valdis.kletnieks at vt.edu Sun Feb 23 21:58:03 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sun, 23 Feb 2020 16:58:03 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> Message-ID: <272151.1582495083@turing-police> On Sun, 23 Feb 2020 12:20:48 +0000, Jonathan Buzzard said: > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > > you when they release a patch for an issue you hadn't heard about. > Er, what do you think I am paid for? Specifically it is IMHO the job of > any systems administrator to know when any critical patch becomes > available for any software/hardware that they are using. You missed the point. Unless you spend your time constantly e-mailing *all* of your vendors "Are there new patches I don't know about?", you're relying on them to notify you when there's a known issue, and when a patch comes out. Redhat is good about notification. IBM is. But how about things like your Infiniband stack? OFED? The firmware in all your devices? The BIOS/UEFI on the servers? If you're an Intel shop, how do you get notified about security issues in the Management Engine stuff (and there's been plenty of them). Do *all* of those vendors have security lists? Are you subscribed to *all* of them? Do *all* of them actually post to those lists? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From andi at christiansen.xxx Mon Feb 24 22:31:45 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Mon, 24 Feb 2020 23:31:45 +0100 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: Message-ID: Hi all, Thank you for all your suggestions! 
The latency is 30ms between the sites (1600km to be exact). So if I have entered correctly in the calculator 1Gb is actually what is expected on that distance. I had a meeting today with IBM where we were able to push that from the 1Gb to about 4Gb on one link with minimal tuning, more tuning will come the next few days! We are also looking to implement the feature afmParallelMounts which should give us the full bandwidth we have between the sites :-) Thanks! Best Regards Andi Christiansen Sendt fra min iPhone > Den 22. feb. 2020 kl. 10.35 skrev Tomer Perry : > > Hi, > > Its implied in the tcp tuning suggestions ( as one needs bandwidth and latency in order to calculate the BDP). > The overall theory is documented in multiple places (tcp window, congestion control etc.) - nice place to start is https://en.wikipedia.org/wiki/TCP_tuning. > I tend to use this calculator in order to find out the right values https://www.switch.ch/network/tools/tcp_throughput/ > > The parallel IO and multiple mounts are on top of the above - not instead ( even though it could be seen that it makes things better - but multiple of the small numbers we're getting initially). > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Luis Bolinches" > To: "gpfsug main discussion list" > Cc: Jake Carrol > Date: 22/02/2020 07:56 > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. > > -- > Cheers > > On 22. Feb 2020, at 3.08, Andrew Beattie wrote: > > Andi, > > You may want to reach out to Jake Carrol at the University of Queensland, > > When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. > > From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. > > We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. > > > > > > Regards > > Andrew > > Sent from my iPhone > > On 22 Feb 2020, at 09:32, Andi Christiansen wrote: > > Hi, > > Thanks for answering! > > Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. > > I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. > > Best Regards > Andi Christiansen > > > > Sendt fra min iPhone > > Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : > > Hi, > > I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. 
> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm- and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. > After that, you can start looking into "how can I get multiple streams?" - for that there are two options: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm > and > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm > > The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 21/02/2020 15:25 > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. > > We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. > > On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? > > We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
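As a back-of-the-envelope check on the numbers in this thread (30 ms round trip on 10 Gbit links), single-stream TCP throughput is roughly bounded by window size divided by round-trip time:

2 MiB window / 0.030 s ~= 67 MiB/s, which is about the 50-60 MB/s seen per stream
filling 10 Gbit/s at 30 ms needs 10 Gbit/s * 0.030 s = 300 Mbit ~= 37.5 MiB of window per stream

which is why the TCP buffer tuning Tomer points to, plus the extra streams from afmParallelMounts, is where the remaining bandwidth comes from.
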
URL: From skylar2 at uw.edu Mon Feb 24 23:58:15 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 24 Feb 2020 15:58:15 -0800 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <272151.1582495083@turing-police> References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> <272151.1582495083@turing-police> Message-ID: <20200224235815.mjecsge35rqseoq5@hithlum> On Sun, Feb 23, 2020 at 04:58:03PM -0500, Valdis Kl?tnieks wrote: > On Sun, 23 Feb 2020 12:20:48 +0000, Jonathan Buzzard said: > > > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > > > you when they release a patch for an issue you hadn't heard about. > > > Er, what do you think I am paid for? Specifically it is IMHO the job of > > any systems administrator to know when any critical patch becomes > > available for any software/hardware that they are using. > > You missed the point. > > Unless you spend your time constantly e-mailing *all* of your vendors > "Are there new patches I don't know about?", you're relying on them to > notify you when there's a known issue, and when a patch comes out. > > Redhat is good about notification. IBM is. > > But how about things like your Infiniband stack? OFED? The firmware in all > your devices? The BIOS/UEFI on the servers? If you're an Intel shop, how do you > get notified about security issues in the Management Engine stuff (and there's > been plenty of them). Do *all* of those vendors have security lists? Are you > subscribed to *all* of them? Do *all* of them actually post to those lists? We put our notification sources (Nessus, US-CERT, etc.) into our response plan. Of course it's still a problem if we don't get notified, but part of the plan is to make it clear where we're willing to accept risk, and to limit our own liability. No process is going to be perfect, but we at least know and accept where those imperfections are. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From stockf at us.ibm.com Tue Feb 25 14:01:20 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 25 Feb 2020 14:01:20 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Tue Feb 25 20:32:10 2020 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Tue, 25 Feb 2020 21:32:10 +0100 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES In-Reply-To: References: Message-ID: Hi Frederick, thanks for the answer! Unfortunately it seems not the case :( [root at xbl-ces-4 ~]# netstat -ntp | grep "\:9094 .*CLOSE_WAIT" | wc -l 0 In our case, Zimon does not directly interact with Grafana over the bridge, but we have a small python script that (through Telegraf) polls the collector and ingest data into InfluxDB, which acts as data source for Grafana. An example of the opened port is: tcp6?????? 1????? 0 129.129.95.84:40038 129.129.99.247:39707??? CLOSE_WAIT? 39131/gpfs.ganesha. We opened a PMR to check what's happening, let's see :) But possibly first thing to do is to disable IPv6 cheers leo Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 25.02.20 15:01, Frederick Stock wrote: > netstat -ntp | grep "\:9094 .*CLOSE_WAIT" | wc -l -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 12:58:40 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 13:58:40 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? Message-ID: <313052288.162314.1582721920742@privateemail.com> An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Feb 26 13:04:52 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 26 Feb 2020 13:04:52 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <313052288.162314.1582721920742@privateemail.com> Message-ID: Why don?t you look at packaging your small files into larger files which will be handled more effectively. There is no simple way to replicate / move billions of small files, But surely you can build your work flow to package the files up into a zip or tar format which will simplify not only the number of IO transactions but also make the whole process more palatable to the NFS protocol Sent from my iPhone > On 26 Feb 2020, at 22:58, Andi Christiansen wrote: > > ? > Hi all, > > Does anyone know of an alternative to AFM ? > > We have been working on tuning AFM for a few weeks now and see little to no improvement.. And now we are searching for an alternative.. So if anyone knows of a product that can implement with Spectrum Scale i am open to any suggestions :) > > We have a good mix of files but primarily billions of very small files which AFM does not handle well on long distances. > > > Best Regards > A. Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=BDsYqP0is2zoDGYU5Ej1lSJ4s9DJhMsW40equi5dqCs&s=22KcLJbUqsq3nfr3qWnxDqA3kuHnFxSDeiENVUITmdA&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Feb 26 13:27:32 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 26 Feb 2020 13:27:32 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <313052288.162314.1582721920742@privateemail.com> References: <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Feb 26 13:33:51 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 26 Feb 2020 13:33:51 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: , <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 13:38:18 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:38:18 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> Message-ID: <688463139.162864.1582724298905@privateemail.com> An HTML attachment was scrubbed... 
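As a minimal sketch of the packaging approach Andrew suggests above (the paths and the 1M size cut-off are invented for the example; GNU find and tar assumed):

cd /gpfs/fs1/project
find . -type f -size -1M -print0 | tar -cf /gpfs/fs1/outbound/project_smallfiles.tar --null -T -
# one large tar replicates far better over a high-latency link than millions of per-file creates and writes
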
URL: From andi at christiansen.xxx Wed Feb 26 13:38:59 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:38:59 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> Message-ID: <673673077.162875.1582724339498@privateemail.com> An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 13:39:22 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:39:22 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: , <313052288.162314.1582721920742@privateemail.com> Message-ID: <262580944.162883.1582724362722@privateemail.com> An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Feb 26 14:24:32 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 26 Feb 2020 14:24:32 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <262580944.162883.1582724362722@privateemail.com> References: <262580944.162883.1582724362722@privateemail.com>, , <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Feb 26 15:49:45 2020 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Feb 2020 08:49:45 -0700 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> <262580944.162883.1582724362722@privateemail.com> Message-ID: if you are looking for a commercial supported solution, our Dataflow product is purpose build for this kind of task. a presentation that covers some high level aspects of it was given by me last year at one of the spectrum scale meetings in the UK --> https://www.spectrumscaleug.org/wp-content/uploads/2019/05/SSUG19UK-Day-1-05-DDN-Optimizing-storage-stacks-for-AI.pdf. its at the end of the deck. if you want more infos, please let me know and i can get you in contact with the right person. Sven On Wed, Feb 26, 2020 at 7:24 AM Frederick Stock wrote: > > What sources are you using to help you with configuring AFM? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Andi Christiansen > To: Frederick Stock , gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] RE: [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 8:39 AM > > 5.0.4-2.1 (home and cache) > > On February 26, 2020 2:33 PM Frederick Stock wrote: > > > Andi, what version of Spectrum Scale do you have installed? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: andi at christiansen.xxx, gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 8:27 AM > > you may consider WatchFolder ... (cluster wider inotify --> kafka) .. and then you go from there > > > > ----- Original message ----- > From: Andi Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 1:59 PM > > Hi all, > > Does anyone know of an alternative to AFM ? > > We have been working on tuning AFM for a few weeks now and see little to no improvement.. 
And now we are searching for an alternative.. So if anyone knows of a product that can implement with Spectrum Scale i am open to any suggestions :) > > We have a good mix of files but primarily billions of very small files which AFM does not handle well on long distances. > > > Best Regards > A. Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.schlipalius at pawsey.org.au Thu Feb 27 00:23:56 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 27 Feb 2020 08:23:56 +0800 Subject: [gpfsug-discuss] AFM Alternative? Aspera? Message-ID: Maybe the following would assist? I do think tarring up files first is best, but you could always check out: http://www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf https://www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf Aspera sync integration (non html links added for your use ? how they don?t get scrubbed: www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf ) Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au On 26/2/20, 9:39 pm, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Re: AFM Alternative? From vpuvvada at in.ibm.com Fri Feb 28 05:22:56 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 28 Feb 2020 10:52:56 +0530 Subject: [gpfsug-discuss] AFM Alternative? Aspera? In-Reply-To: References: Message-ID: Transferring the small files with AFM + NFS over high latency networks is always a challenge. For example, for each small file replication AFM performs a lookup, create, write and set mtime operation. If the latency is 10ms, replication of each file takes minimum (10 * 4 = 40 ms) amount of time. AFM is not a network acceleration tool and also it does not use compression. If the file sizes are big, AFM parallel IO and parallel mounts feature can be used. Aspera can be used to transfer the small files over high latency network with better utilization of the network bandwidth. https://www.ibm.com/support/knowledgecenter/no/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm https://www.ibm.com/support/knowledgecenter/no/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm ~Venkat (vpuvvada at in.ibm.com) From: Chris Schlipalius To: Date: 02/27/2020 05:54 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM Alternative? Aspera? Sent by: gpfsug-discuss-bounces at spectrumscale.org Maybe the following would assist? 
I do think tarring up files first is best, but you could always check out: http://www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spectrumscaleug.org_wp-2Dcontent_uploads_2019_05_SSSD19DE-2DDay-2D2-2DB02-2DIntegration-2Dof-2DSpectrum-2DScale-2Dand-2DAspera-2DSync.pdf&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=tRCxd4SimJH_eycqekhzM0Qp3TB3NtaIYWBvyQnrIiM&e= Aspera sync integration (non html links added for your use ? how they don?t get scrubbed: www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf ) Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au < https://urldefense.proofpoint.com/v2/url?u=http-3A__www.pawsey.org.au_&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=Xkm8VFy3l6nyD40yhONihsKcqmwRhy4SZyd0lwHf1GA&e= > On 26/2/20, 9:39 pm, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Re: AFM Alternative? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=mYK1ZsVgtsM6HntRMLPS49tKvEhhgGAdWF2qniyn9Ko&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Fri Feb 28 08:55:06 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 28 Feb 2020 08:55:06 +0000 Subject: [gpfsug-discuss] SSUG Events 2020 update Message-ID: <780D9B15-E329-45B7-B62E-1F880512CE7E@spectrumscale.org> Hi All, I thought it might be giving a little bit of an update on where we are with events this year. As you may know, SCAsia was cancelled in its entirety due to Covid-19 in Singapore and so there was no SSUG meeting. In the US, we struggled to find a venue to host the spring meeting and now time is a little short to arrange something for the end of March planned date. The IBM Spectrum Scale Strategy Days in Germany in March are currently still planned to happen next week. For the UK meeting (May), we haven?t yet opened registration but are planning to do so next week. We currently believe that as an event with 120-130 attendees, this is probably very low risk, but we?ll keep the current government advice under review as we approach the date. I would suggest that if you are planning to travel internationally to the UK event that you delay booking flights/book refundable transport and ensure you have adequate insurance in place in the event we have to cancel the event. For ISC in June, we currently don?t have a date, nor any firm plans to run an event this year. Simon Thompson UK group chair -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valleru at cbio.mskcc.org Fri Feb 28 15:12:31 2020 From: valleru at cbio.mskcc.org (Valleru, Lohit/Information Systems) Date: Fri, 28 Feb 2020 10:12:31 -0500 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers Message-ID: Hello Everyone, I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter. One of our users run a deep learning application on GPUs, that does the following IO pattern: It needs to read random small sections about 4K in size from about 20,000 to 100,000 files of each 100M to 200M size. When performance tuning for the above application on a 16M filesystem and comparing it to various other file system block sizes - I realized that the performance degradation that I see might be related to the number of buffers. I observed that the performance varies widely depending on what maxblocksize parameter I use. For example, using a 16M maxblocksize for a 512K or a 1M block size filesystem differs widely from using a 512K or 1M maxblocksize for a 512K or a 1M block size filesystem. The reason I believe might be related to the number of buffers that I could keep on the client side, but I am not sure if that is the all that the maxblocksize is affecting. We have different file system block sizes in our environment ranging from 512K, 1M and 16M. We also use storage clusters and compute clusters design. Now in order to mount the 16M filesystem along with the other filesystems on compute clusters - we had to keep the maxblocksize to be 16M - no matter what the file system block size. I see that I get maximum performance for this application from a 512K block size filesystem and a 512K maxblocksize. However, I will not be able to mount this filesystem along with the other filesystems because I will need to change the maxblocksize to 16M in order to mount the other filesystems of 16M block size. I am thinking if there is anything else that can do the same job as maxblocksize parameter. I was thinking about the parameters like maxBufferDescs for a 16M maxblocksize, but I believe it would need a lot more pagepool to keep the same number of buffers as would be needed for a 512k maxblocksize. May I know if there is any other parameter that could help me the same as maxblocksize, and the side effects of the same? Thank you, Lohit From anobre at br.ibm.com Fri Feb 28 17:58:22 2020 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 28 Feb 2020 17:58:22 +0000 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 28 21:53:25 2020 From: valleru at cbio.mskcc.org (Valleru, Lohit/Information Systems) Date: Fri, 28 Feb 2020 16:53:25 -0500 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers In-Reply-To: References: Message-ID: <2B1F9901-0712-44EB-9D0A-8B40F7BE58EA@cbio.mskcc.org> Hello Anderson, This application requires minimum throughput of about 10-13MB/s initially and almost no IOPS during first phase where it opens all the files and reads the headers and about 30MB/s throughput during the second phase. The issue that I face is during the second phase where it tries to randomly read about 4K of block size from random files from 20000 to about 100000. In this phase - I see a big difference in maxblocksize parameter changing the performance of the reads, with almost no throughput and may be around 2-4K IOPS. 
This issue is a follow up to the previous issue that I had mentioned about an year ago - where I see differences in performance - ?though there is practically no IO to the storage? I mean - I see a difference in performance between different FS block-sizes even if all data is cached in pagepool. Sven had replied to that thread mentioning that it could be because of buffer locking issue. The info requested is as below: 4 Storage clusters: Storage cluster for compute: 5.0.3-2 GPFS version FS version: 19.01 (5.0.1.0) Subblock size: 16384 Blocksize : 16M Flash Storage Cluster for compute: 5.0.4-2 GPFS version FS version: 18.00 (5.0.0.0) Subblock size: 8192 Blocksize: 512K Storage cluster for admin tools: 5.0.4-2 GPFS version FS version: 16.00 (4.2.2.0) Subblock size: 131072 Blocksize: 4M Storage cluster for archival: 5.0.3-2 GPFS version FS version: 16.00 (4.2.2.0) Subblock size: 32K Blocksize: 1M The only two clusters that users do/will do compute on is the 16M filesystem and the 512K Filesystem. When you ask what is the throughput/IOPS and block size - it varies a lot and has not been recorded. The 16M FS is capable of doing about 27GB/s seq read for about 1.8 PB of storage. The 512K FS is capable of doing about 10-12GB/s seq read for about 100T of storage. Now as I mentioned previously - the issue that I am seeing has been related to different FS block sizes on the same storage. For example: On the Flash Storage cluster: Block size of 512K with maxblocksize of 16M gives worse performance than Block size of 512K with maxblocksize of 512K. It is the maxblocksize that is affecting the performance, on the same storage with same block size and everything else being the same. I am thinking the above is because of the number of buffers involved, but would like to learn if it happens to be anything else. I have debugged the same with IBM GPFS techs and it has been found that there is no issue with the storage itself or any of the other GPFS tuning parameters. Now since we do know that maxblocksize is making a big difference. I would like to keep it as low as possible but still be able to mount other remote GPFS filesystems with higher block sizes. Or since it is required to keep the maxblocksize the same across all storage - I would like to know if there is any other parameters that could do the same change as maxblocksize. Thank you, Lohit > On Feb 28, 2020, at 12:58 PM, Anderson Ferreira Nobre wrote: > > Hi Lohit, > > First, a few questions to understand better your problem: > - What is the minimum release level of both clusters? > - What is the version of filesystem layout for 16MB, 1MB and 512KB? > - What is the subblocksize of each filesystem? > - How many IOPS, block size and throughput are you doing on each filesystem? > > Abra?os / Regards / Saludos, > > Anderson Nobre > Power and Storage Consultant > IBM Systems Hardware Client Technical Team ? IBM Systems Lab Services > > > > Phone: 55-19-2132-4317 > E-mail: anobre at br.ibm.com > > > ----- Original message ----- > From: "Valleru, Lohit/Information Systems" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers > Date: Fri, Feb 28, 2020 12:30 > > Hello Everyone, > > I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter. 
> > One of our users run a deep learning application on GPUs, that does the following IO pattern: > > It needs to read random small sections about 4K in size from about 20,000 to 100,000 files of each 100M to 200M size. > > When performance tuning for the above application on a 16M filesystem and comparing it to various other file system block sizes - I realized that the performance degradation that I see might be related to the number of buffers. > > I observed that the performance varies widely depending on what maxblocksize parameter I use. > For example, using a 16M maxblocksize for a 512K or a 1M block size filesystem differs widely from using a 512K or 1M maxblocksize for a 512K or a 1M block size filesystem. > > The reason I believe might be related to the number of buffers that I could keep on the client side, but I am not sure if that is the all that the maxblocksize is affecting. > > We have different file system block sizes in our environment ranging from 512K, 1M and 16M. > > We also use storage clusters and compute clusters design. > > Now in order to mount the 16M filesystem along with the other filesystems on compute clusters - we had to keep the maxblocksize to be 16M - no matter what the file system block size. > > I see that I get maximum performance for this application from a 512K block size filesystem and a 512K maxblocksize. > However, I will not be able to mount this filesystem along with the other filesystems because I will need to change the maxblocksize to 16M in order to mount the other filesystems of 16M block size. > > I am thinking if there is anything else that can do the same job as maxblocksize parameter. > > I was thinking about the parameters like maxBufferDescs for a 16M maxblocksize, but I believe it would need a lot more pagepool to keep the same number of buffers as would be needed for a 512k maxblocksize. > > May I know if there is any other parameter that could help me the same as maxblocksize, and the side effects of the same? > > Thank you, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Feb 3 08:56:09 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 3 Feb 2020 08:56:09 +0000 Subject: [gpfsug-discuss] When is a file system log recovery triggered Message-ID: Hello, Does mmshutdown or mmumount trigger a file system log recovery, same as a node failure or daemon crash do? Last week we got this advisory: IBM Spectrum Scale (GPFS) 5.0.4 levels: possible metadata or data corruption during file system log recovery https://www.ibm.com/support/pages/node/1274428?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E You need a file system log recovery running to potentially trigger the issue. When does a file system log recovery run? For sure on any unexpected mmfsd/os crash for mounted filesystems, or on connection loss, but what if we do a clean 'mmshutdown' or 'mmumount' - I assume this will cause the client to nicely finish all outstanding transactions and return the empty logfile, hence non log recovery will take place is we do a normal os shutdown/reboot, too? 
Or am I wrong and Spectrum Scale treats all cases the same way? I asked because the advisory states that a node reboot will trigger a log recovery - until we upgraded to 5.0.4-2 we'll try to avoid log recoveries: > Log recovery happens after a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot). Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== From heinrich.billich at id.ethz.ch Mon Feb 3 10:02:06 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 3 Feb 2020 10:02:06 +0000 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: References: Message-ID: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Thank you. I wonder if there is any ESS version which deploys FW860.70 for ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4? Cheers, Heiner From: on behalf of Jan-Frode Myklebust Reply to: gpfsug main discussion list Date: Thursday, 30 January 2020 at 18:00 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes I *think* this was a known bug in the Power firmware included with 5.3.4, and that it was fixed in the FW860.70. Something hanging/crashing in IPMI. -jf tor. 30. jan. 2020 kl. 17:10 skrev Wahl, Edward >: Interesting. We just deployed an ESS here and are running into a very similar problem with the gui refresh it appears. Takes my ppc64le's about 45 seconds to run rinv when they are idle. I had just opened a support case on this last evening. We're on ESS 5.3.4 as well. I will wait to see what support says. Ed Wahl Ohio Supercomputer Center -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Ulrich Sibiller Sent: Thursday, January 30, 2020 9:44 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Can I change the times at which the GUI runs HW_INVENTORY and related tasks? > > we frequently get messages like > > gui_refresh_task_failed GUI WARNING 12 hours ago > The following GUI refresh task(s) failed: HW_INVENTORY > > The tasks fail due to timeouts. Running the task manually most times > succeeds. We do run two gui nodes per cluster and I noted that both > servers seem run the HW_INVENTORY at the exact same time which may > lead to locking or congestion issues, actually the logs show messages > like > > EFSSA0194I Waiting for concurrent operation to complete. > > The gui calls ?rinv? on the xCat servers. Rinv for a single > little-endian server takes a long time ? about 2-3 minutes , while it finishes in about 15s for big-endian server. > > Hence the long runtime of rinv on little-endian systems may be an > issue, too > > We run 5.0.4-1 efix9 on the gui and ESS 5.3.4.1 on the GNR systems > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, which a separate xCat/ems server for each type. The GUI nodes are ppc64le. > > We did see this issue with several gpfs version on the gui and with at least two ESS/xCat versions. > > Just to be sure I did purge the Posgresql tables. 
> > I did try > > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY > > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY ?debug > > And also tried to read the logs in /var/log/cnlog/mgtsrv/ - but they are difficult. I have seen the same on ppc64le. From time to time it recovers but then it starts again. The timeouts are okay, it is the hardware. I haven opened a call at IBM and they suggested upgrading to ESS 5.3.5 because of the new firmwares which I am currently doing. I can dig out more details if you want. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Feb 3 10:45:43 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 3 Feb 2020 11:45:43 +0100 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> References: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Message-ID: <98640bc8-ecb7-d050-ea38-da47cf1b9ea4@science-computing.de> On 2/3/20 11:02 AM, Billich Heinrich Rainer (ID SD) wrote: > Thank you. I wonder if there is any ESS version which deploys FW860.70 for ppc64le. The Readme for > 5.3.5 lists FW860.60 again, same as 5.3.4? I have done the upgrade to 5.3.5 last week and gssinstallcheck now reports 860.70: [...] Installed version: 5.3.5-20191205T142815Z_ppc64le_datamanagement [OK] Linux kernel installed: 3.10.0-957.35.2.el7.ppc64le [OK] Systemd installed: 219-67.el7_7.2.ppc64le [OK] Networkmgr installed: 1.18.0-5.el7_7.1.ppc64le [OK] OFED level: MLNX_OFED_LINUX-4.6-3.1.9.1 [OK] IPR SAS FW: 19512300 [OK] ipraid RAID level: 10 [OK] ipraid RAID Status: Optimized [OK] IPR SAS queue depth: 64 [OK] System Firmware: FW860.70 (SV860_205) [OK] System profile setting: scale [OK] System profile verification PASSED. [OK] Host adapter driver: 16.100.01.00 [OK] Kernel sysrq level is: kernel.sysrq = 1 [OK] GNR Level: 5.0.4.1 efix6 [...] Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Feb 3 19:41:31 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 3 Feb 2020 20:41:31 +0100 Subject: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes In-Reply-To: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> References: <7C0BA69E-1320-4CD0-BD0F-E9FCBB7A47CB@id.ethz.ch> Message-ID: I think both 5.3.4.2 and 5.3.5 includes FW860.70, but the readme doesn?t show this correctly. -jf man. 3. feb. 2020 kl. 11:02 skrev Billich Heinrich Rainer (ID SD) < heinrich.billich at id.ethz.ch>: > Thank you. I wonder if there is any ESS version which deploys FW860.70 for > ppc64le. The Readme for 5.3.5 lists FW860.60 again, same as 5.3.4? > > > > Cheers, > > > > Heiner > > *From: * on behalf of Jan-Frode > Myklebust > *Reply to: *gpfsug main discussion list > *Date: *Thursday, 30 January 2020 at 18:00 > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY > with two active GUI nodes > > > > > > I *think* this was a known bug in the Power firmware included with 5.3.4, > and that it was fixed in the FW860.70. Something hanging/crashing in IPMI. > > > > > > > > -jf > > > > tor. 30. jan. 2020 kl. 17:10 skrev Wahl, Edward : > > Interesting. We just deployed an ESS here and are running into a very > similar problem with the gui refresh it appears. Takes my ppc64le's about > 45 seconds to run rinv when they are idle. > I had just opened a support case on this last evening. We're on ESS > 5.3.4 as well. I will wait to see what support says. > > Ed Wahl > Ohio Supercomputer Center > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ulrich Sibiller > Sent: Thursday, January 30, 2020 9:44 AM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY > with two active GUI nodes > > On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote: > > Hello, > > > > Can I change the times at which the GUI runs HW_INVENTORY and related > tasks? > > > > we frequently get messages like > > > > gui_refresh_task_failed GUI WARNING 12 hours > ago > > The following GUI refresh task(s) failed: HW_INVENTORY > > > > The tasks fail due to timeouts. Running the task manually most times > > succeeds. We do run two gui nodes per cluster and I noted that both > > servers seem run the HW_INVENTORY at the exact same time which may > > lead to locking or congestion issues, actually the logs show messages > > like > > > > EFSSA0194I Waiting for concurrent operation to complete. > > > > The gui calls ?rinv? on the xCat servers. Rinv for a single > > little-endian server takes a long time ? about 2-3 minutes , while it > finishes in about 15s for big-endian server. > > > > Hence the long runtime of rinv on little-endian systems may be an > > issue, too > > > > We run 5.0.4-1 efix9 on the gui and ESS 5.3.4.1 on the GNR systems > > (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, which a > separate xCat/ems server for each type. The GUI nodes are ppc64le. 
> > > > We did see this issue with several gpfs version on the gui and with at > least two ESS/xCat versions. > > > > Just to be sure I did purge the Posgresql tables. > > > > I did try > > > > /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY > > > > /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY ?debug > > > > And also tried to read the logs in /var/log/cnlog/mgtsrv/ - but they are > difficult. > > > I have seen the same on ppc64le. From time to time it recovers but then it > starts again. The timeouts are okay, it is the hardware. I haven opened a > call at IBM and they suggested upgrading to ESS 5.3.5 because of the new > firmwares which I am currently doing. I can dig out more details if you > want. > > Uli > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart Registernummer/Commercial > Register No.: HRB 382196 _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Feb 6 05:02:29 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 6 Feb 2020 05:02:29 +0000 Subject: [gpfsug-discuss] When is a file system log recovery triggered In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sat Feb 8 11:33:21 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sat, 8 Feb 2020 11:33:21 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Message-ID: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is "rule of thumb" that there should be (-n)*32 regions , this would then be enough ( N=5000 --> 160000 regions per pool ?) (also Block size has influence on regions ?) 
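(Taking the quoted rule of thumb at face value - it is a support rule of thumb applied per storage pool, not a documented sizing formula - the arithmetic for the -n values that come up later in this thread works out as:

1200 nodes x 32 = 38400 regions per pool
5000 nodes x 32 = 160000 regions per pool

which is where the 160000 figure above comes from, and which can be set against the "regns" value in the saferdump output shown next.)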
#mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Sat Feb 8 11:59:54 2020 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Sun, 9 Feb 2020 00:59:54 +1300 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Message-ID: How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka wrote: > Hello! > > We are designing two fs where we cannot anticipate if there will be 3000, > or maybe 5000 or more nodes totally accessing these filesystems > > What we saw, was that execution time of mmdf can last 5-7min > > We openend a case and they said, that during such commands like mmdf or > also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is > the reason why it takes so long > > The technichian also said, that it is ?rule of thumb? that there should be > > (-n)*32 regions , this would then be enough ( N=5000 ? 160000 regions per > pool ?) > > (also Block size has influence on regions ?) 
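(For readers following along: the node, NSD and pool inventory being asked about here is what the standard listing commands report. Shown purely as an illustration - the device name "data" is taken from the output posted later in this thread:

# mmlscluster
# mmlsnsd
# mmlsfs data
# mmlspool data
)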
> > > > #mmfsadm saferdump stripe > > Gives the regions number > > storage pools: max 8 > > > > alloc map type 'scatter' > > > > 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 > thinProvision reserved inode -1, reserved nBlocks 0 > > > > *regns 170413* segs 1 size 4096 FBlks 0 MBlks 3145728 subblock > size 8192 > > > > > > > > > > > > We also saw when creating the filesystem with a speciicic (-n) very high > (5000) (where mmdf execution time was some minutes) and then changing (-n) > to a lower value this does not influence the behavior any more > > > > My question is: Is the rule (Number of Nodes)x5000 for number of regios in > a pool an good estimation , > > Is it better to overestimate the number of Nodes (lnger running commands) > or is it unrealistic to get into problems when not reaching the regions > number calculated ? > > > > Does anybody have experience with high number of nodes (>>3000) and how > to design the filesystems for such large clusters ? > > > > Thank you very much in advance ! > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Sun Feb 9 09:59:32 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Sun, 9 Feb 2020 09:59:32 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> Message-ID: <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? 
-V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? 
-L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka > wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 --> 160000 regions per pool ?) (also Block size has influence on regions ?) 
#mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Feb 10 11:09:56 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 10 Feb 2020 11:09:56 +0000 Subject: [gpfsug-discuss] Spectrum scale yum repos - any chance to the number of repos Message-ID: <1B9A9988-7347-41B4-A881-4300F8F9E5BF@id.ethz.ch> Hello, Does it work to merge ?all? Spectrum Scale rpms of one version in one yum repo, can I merge rpms from different versions in the same repo, even different architectures? Yum repos for RedHat, Suse, Debian or application repos like EPEL all manage to keep many rpms and all different versions in a few repos. Spreading the few Spectrum Scale rpms for rhel across about 11 repos for each architecture and version seems overly complicated ? and makes it difficult to use RedHat Satellite to distribute the software ;-( Does anyone have experiences or opinions with this ?single repo? approach ? Does something break if we use it? We run a few clusters where up to now each runs its own yum server. We want to consolidate with RedHat Satellite for os and scale provisioning/updates. RedHat Satellite having just one repo for _all_ versions would fit much better. And may just separate repos for base (including protocols), object and hdfs (which we don?t use). My wish: The number of repos should no grow with the number of versions provided and adding a new version should not require to setup new yum repos. I know you can workaround and script, but would be easier if I wouldn?t need to. 
Regards, Heiner From nfalk at us.ibm.com Mon Feb 10 14:57:13 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Mon, 10 Feb 2020 14:57:13 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Message-ID: Hello Walter, If you anticipate that the number of clients accessing this file system may grow as high as 5000, then that is probably the value you should use when creating the file system. The data structures (regions for example) are allocated at file system creation time (more precisely at storage pool creation time) and are not changed later. The mmcrfs doc explains this: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm -n NumNodes The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created but it does not change the existing data structures. Only the newly created data structure is affected by the new value. For example, new storage pool. When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (For more information, see GPFS architecture ). If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems Phone: 1-720-349-9538 | Mobile: 1-845-546-4930 E-mail: nfalk at us.ibm.com Find me on: From: Walter Sklenka To: gpfsug main discussion list Date: 02/09/2020 04:59 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? 
consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 ? 160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=bgNFbl7WeRbpQtvfu8K1GC1HVGofxoeEehWJXVM6H0c&s=BRQWKQ--3xw8g_2o9-RD-XsRdMon6iIy31iSstzRRAw&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Mon Feb 10 18:34:45 2020 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Mon, 10 Feb 2020 18:34:45 +0000 Subject: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions In-Reply-To: References: <4f3f6a0d7191448cb460ec90f4eebf5a@Mail.EDVDesign.cloudia> <560d571f2552444badb9614407fdc8c7@Mail.EDVDesign.cloudia> Message-ID: <92ca7c73eb314667be51d79f97f34c9c@Mail.EDVDesign.cloudia> Hello Nate! Thank you very much for the response Do you know if the rule of thumb for ?enough regions =N*32 per pool And isn?t there an other way to increate the number of regions? (mybe by reducing block-size ? It?s only because the commands excetuin time of a couple of minutes make me nervous , or is the reason more a poor metadata perf for the long running command? But if you say so we will change it to N=5000 Best regards Walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Nathan Falk Gesendet: Monday, February 10, 2020 3:57 PM An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Hello Walter, If you anticipate that the number of clients accessing this file system may grow as high as 5000, then that is probably the value you should use when creating the file system. The data structures (regions for example) are allocated at file system creation time (more precisely at storage pool creation time) and are not changed later. The mmcrfs doc explains this: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmcrfs.htm -n NumNodes The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created but it does not change the existing data structures. Only the newly created data structure is affected by the new value. For example, new storage pool. When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (For more information, see GPFS architecture ). 
If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems ________________________________ Phone:1-720-349-9538| Mobile:1-845-546-4930 E-mail:nfalk at us.ibm.com Find me on:[LinkedIn: https://www.linkedin.com/in/nathan-falk-078ba5125] [Twitter: https://twitter.com/natefalk922] [IBM] From: Walter Sklenka > To: gpfsug main discussion list > Date: 02/09/2020 04:59 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi! At the time of writing we set N to 1200 , but we are not sure if it would be better to set to overestimated 5000 ? We use 6 backend nodes The backend storage is a Flash9100 for metadata and 6x Lenovo DE6000H . We will finally use 2 filesystems : data and home Fs ?data? consist of 12 metadada-nsd and 72 dataonly nsds We have enough space to add nsds (finally the fs [root at nsd75-01 ~]# mmlspool data Storage pools in file system at '/gpfs/data': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 4 MB no yes 0 0 ( 0%) 12884901888 12800315392 ( 99%) saspool 65537 4 MB yes no 1082331758592 1082326446080 (100%) 0 0 ( 0%) [root at nsd75-01 ~]# mmlsfs data flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 4194304 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:32:05 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 1342177280 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system;saspool Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d de750101vol01;de750101vol02;de750101vol03;de750101vol04;de750101vol05;de750101vol06;de750102vol01;de750102vol02;de750102vol03;de750102vol04;de750102vol05;de750102vol06; -d de750201vol01;de750201vol02;de750201vol03;de750201vol04;de750201vol05;de750201vol06;de750202vol01;de750202vol02;de750202vol03;de750202vol04;de750202vol05;de750202vol06; -d de760101vol01;de760101vol02;de760101vol03;de760101vol04;de760101vol05;de760101vol06;de760102vol01;de760102vol02;de760102vol03;de760102vol04;de760102vol05;de760102vol06; -d de760201vol01;de760201vol02;de760201vol03;de760201vol04;de760201vol05;de760201vol06;de760202vol01;de760202vol02;de760202vol03;de760202vol04;de760202vol05;de760202vol06; -d de770101vol01;de770101vol02;de770101vol03;de770101vol04;de770101vol05;de770101vol06;de770102vol01;de770102vol02;de770102vol03;de770102vol04;de770102vol05;de770102vol06; -d de770201vol01;de770201vol02;de770201vol03;de770201vol04;de770201vol05;de770201vol06;de770202vol01;de770202vol02;de770202vol03;de770202vol04;de770202vol05;de770202vol06; -d globalmeta0;globalmeta1;globalmeta2;globalmeta3;globalmeta4;globalmeta5;globalmeta6;globalmeta7;globalmeta8;globalmeta9;globalmeta10;globalmeta11 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/data Default mount point --mount-priority 0 Mount priority ## For fs Home we use 24 dataAdnMetadata disks only on flash [root at nsd75-01 ~]# mmlspool home Storage pools in file system at '/gpfs/home': Name Id BlkSize Data Meta Total Data in (KB) Free Data in (KB) Total Meta in (KB) Free Meta in (KB) system 0 1024 KB yes yes 25769803776 25722931200 (100%) 25769803776 25722981376 (100%) [root at nsd75-01 ~]# [root at nsd75-01 ~]# mmlsfs home flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1200 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced fileset Default quotas enabled --perfileset-quota Yes Per-fileset quota enforcement --filesetdf Yes Fileset df enabled? -V 21.00 (5.0.3.0) File system version --create-time Fri Feb 7 15:31:28 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 25166080 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d home0;home10;home11;home12;home13;home14;home15;home16;home17;home18;home19;home1;home20;home21;home22;home23;home2;home3;home4;home5;home6;home7;home8;home9 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs/home Default mount point --mount-priority 0 Mount priority [root at nsd75-01 ~]# Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von:gpfsug-discuss-bounces at spectrumscale.org > Im Auftrag von Jos? Filipe Higino Gesendet: Saturday, February 8, 2020 1:00 PM An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] mmdf and maybe other commands long running // influence of n and B on number of regions How many back end nodes for that cluster? and how many filesystems for that same access... and how many pools for the same data access type (12 ndisks sounds very LOW to me, for that size of a cluster, probably no other filesystem can do more than that). On GPFS there are so many different ways to access the data, that is sometimes hard to start a conversation. And you did a very great job of introducing it. =) We (I am a customer too) do not have that many nodes, but from experience, I know some clusters (and also multicluster configs) depend mostly on how much metadata you can service in the network and how fast (latency wise) you can do it, to accommodate such amount of nodes. There is never design by the book that can safely tell something will work 100% times. But the beauty of it is that GPFS allows lots of aspects to be resized at your convenience to facilitate what you need most the system to do. Let us know more... On Sun, 9 Feb 2020 at 00:40, Walter Sklenka > wrote: Hello! We are designing two fs where we cannot anticipate if there will be 3000, or maybe 5000 or more nodes totally accessing these filesystems What we saw, was that execution time of mmdf can last 5-7min We openend a case and they said, that during such commands like mmdf or also mmfsck, mmdefragfs,mmresripefs all regions must be scanned at this is the reason why it takes so long The technichian also said, that it is ?rule of thumb? that there should be (-n)*32 regions , this would then be enough ( N=5000 -->160000 regions per pool ?) (also Block size has influence on regions ?) #mmfsadm saferdump stripe Gives the regions number storage pools: max 8 alloc map type 'scatter' 0: name 'system' Valid nDisks 12 nInUse 12 id 0 poolFlags 0 thinProvision reserved inode -1, reserved nBlocks 0 regns 170413 segs 1 size 4096 FBlks 0 MBlks 3145728 subblock size 8192 We also saw when creating the filesystem with a speciicic (-n) very high (5000) (where mmdf execution time was some minutes) and then changing (-n) to a lower value this does not influence the behavior any more My question is: Is the rule (Number of Nodes)x5000 for number of regios in a pool an good estimation , Is it better to overestimate the number of Nodes (lnger running commands) or is it unrealistic to get into problems when not reaching the regions number calculated ? Does anybody have experience with high number of nodes (>>3000) and how to design the filesystems for such large clusters ? Thank you very much in advance ! 
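(For reference, the rule of thumb quoted above can be sanity-checked on an existing file system roughly as follows; the commands are the ones already mentioned in this thread and the figures are purely illustrative.)

# -n as recorded for the file system (1200 in the mmlsfs output above):
mmlsfs data -n
# regions actually allocated per pool, from the same dump shown above
# (exact output format may vary between releases):
mmfsadm saferdump stripe | grep -E "name|regns"
# rule of thumb: regions per pool ~= 32 x (-n),
#   e.g. 32 x 1200 = 38400, and 32 x 5000 = 160000 --
#   the latter of the same order as the "regns 170413" in the dump above.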
Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Tue Feb 11 21:44:07 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 11 Feb 2020 16:44:07 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca> <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> Message-ID: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Hi Mark, Just a follow up to your suggestion few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each, when serialized. I now just would like to use one mmbackup instance to 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option? Thanks Jaime # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ??tsm?servers option (it thinks it's some kind of argument): # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] 
| NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] I defined the 2 servers stanzas as follows: # cat dsm.sys SERVERNAME TAPENODE3 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 10.20.205.51 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 SERVERNAME TAPENODE4 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 192.168.94.128 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 TCPBuffsize 512 On 2019-11-03 8:56 p.m., Jaime Pinto wrote: > > > On 11/3/2019 20:24:35, Marc A Kaplan wrote: >> Please show us the 2 or 3 mmbackup commands that you would like to run concurrently. > > Hey Marc, > They would be pretty similar, with the only different being the target TSM server, determined by sourcing a different dsmenv1(2 or 3) prior to the > start of each instance, each with its own dsm.sys (3 wrappers). > (source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg1? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg2? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg3? --scope inodespace -v -a 8 -L 2) > > I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for > multiple target servers. It may be just what I need. I'll try this next. > > Thank you very much, > Jaime > >> >> Peeking into the script, I find: >> >> if [[ $scope == "inode-space" ]] >> then >> deviceSuffix="${deviceName}.${filesetName}" >> else >> deviceSuffix="${deviceName}" >> >> >> I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, Or different filesystems... >> >> And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file: >> >> # --tsm-servers TSMserver[,TSMserver...] >> # List of TSM servers to use instead of the servers in the dsm.sys file. >> >> >> >> Inactive hide details for Jaime Pinto ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instrucJaime Pinto >> ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instructed by the -g option? 
I started a mmbac >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 11/01/2019 07:40 PM >> Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ?g GlobalWorkDirectory not being followed >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> How can I force secondary processes to use the folder instructed by the -g option? >> >> I started a mmbackup with ?g /gpfs/fs1/home/.mmbackupCfg1 and another with ?g /gpfs/fs1/home/.mmbackupCfg2 (and another with ?g >> /gpfs/fs1/home/.mmbackupCfg3 ...) >> >> However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not >> happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. >> >> See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): >> >> DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance >> -a 8 ?-P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope >> inodespace >> >> >> Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the >> different mmbackup instances tripping over each other. The serializing is very undesirable. >> >> Thanks >> Jaime >> >> >> ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From scale at us.ibm.com Wed Feb 12 12:48:42 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 12 Feb 2020 07:48:42 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca><5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Message-ID: Hi Jaime, When I copy & paste your command to try, this is what I got. 
/usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jaime Pinto To: gpfsug main discussion list , Marc A Kaplan Date: 02/11/2020 05:26 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Mark, Just a follow up to your suggestion few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each, when serialized. I now just would like to use one mmbackup instance to 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option? Thanks Jaime # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer [,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ??tsm?servers option (it thinks it's some kind of argument): # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2 mmbackup: Incorrect extra argument: ??tsm?servers Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] 
| NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer [,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile] I defined the 2 servers stanzas as follows: # cat dsm.sys SERVERNAME TAPENODE3 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 10.20.205.51 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 SERVERNAME TAPENODE4 SCHEDMODE PROMPTED ERRORLOGRETENTION 0 D TCPSERVERADDRESS 192.168.94.128 NODENAME home COMMMETHOD TCPIP TCPPort 1500 PASSWORDACCESS GENERATE TXNBYTELIMIT 1048576 TCPBuffsize 512 On 2019-11-03 8:56 p.m., Jaime Pinto wrote: > > > On 11/3/2019 20:24:35, Marc A Kaplan wrote: >> Please show us the 2 or 3 mmbackup commands that you would like to run concurrently. > > Hey Marc, > They would be pretty similar, with the only different being the target TSM server, determined by sourcing a different dsmenv1(2 or 3) prior to the > start of each instance, each with its own dsm.sys (3 wrappers). > (source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg1? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg2? --scope inodespace -v -a 8 -L 2) > (source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog? -g > /gpfs/fs1/home/.mmbackupCfg3? --scope inodespace -v -a 8 -L 2) > > I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for > multiple target servers. It may be just what I need. I'll try this next. > > Thank you very much, > Jaime > >> >> Peeking into the script, I find: >> >> if [[ $scope == "inode-space" ]] >> then >> deviceSuffix="${deviceName}.${filesetName}" >> else >> deviceSuffix="${deviceName}" >> >> >> I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, Or different filesystems... >> >> And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file: >> >> # --tsm-servers TSMserver[,TSMserver...] >> # List of TSM servers to use instead of the servers in the dsm.sys file. >> >> >> >> Inactive hide details for Jaime Pinto ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instrucJaime Pinto >> ---11/01/2019 07:40:47 PM---How can I force secondary processes to use the folder instructed by the -g option? 
I started a mmbac >> >> From: Jaime Pinto >> To: gpfsug main discussion list >> Date: 11/01/2019 07:40 PM >> Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ?g GlobalWorkDirectory not being followed >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> How can I force secondary processes to use the folder instructed by the -g option? >> >> I started a mmbackup with ?g /gpfs/fs1/home/.mmbackupCfg1 and another with ?g /gpfs/fs1/home/.mmbackupCfg2 (and another with ?g >> /gpfs/fs1/home/.mmbackupCfg3 ...) >> >> However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not >> happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. >> >> See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): >> >> DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance >> -a 8 ?-P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope >> inodespace >> >> >> Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the >> different mmbackup instances tripping over each other. The serializing is very undesirable. >> >> Thanks >> Jaime >> >> >> ************************************ TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=or2HFYOoCdTJ5x-rCnVcq8cFo3SsnpCzODVHNLp7jlA&s=vCTEqk_OPEgrWnqq9bJpzD-pn5QnNNNo3citEqiTsEY&e= ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=or2HFYOoCdTJ5x-rCnVcq8cFo3SsnpCzODVHNLp7jlA&s=76T6OenS_DXfRVD5Xh02vz8qnWOyhmv7yWeawZKYmWA&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Thu Feb 13 19:37:12 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 13 Feb 2020 11:37:12 -0800 Subject: [gpfsug-discuss] NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> Message-ID: <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. Thanks, Kristy > On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose wrote: > > Thanks for your responses to the poll. > > We?re still working on a venue, but working towards: > > March 30 - New User Day (Tuesday) > April 1&2 - Regular User Group Meeting (Wednesday & Thursday) > > Once it?s confirmed we?ll post something again. > > Best, > Kristy. > >> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose > wrote: >> >> Thank you to the 18 wonderful people who filled out the survey. >> >> However, there are well more than 18 people at any given UG meeting. >> >> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >> >> Happy New Year. >> >> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >> >> Thanks, >> Kristy >> >>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose > wrote: >>> >>> Hello, >>> >>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>> >>> Best wishes to all in the new year. >>> >>> -Kristy >>> >>> >>> Please give us 2 minutes of your time here: ?https://forms.gle/NFk5q4djJWvmDurW7 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Fri Feb 14 13:25:08 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 14 Feb 2020 14:25:08 +0100 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? Message-ID: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From S.J.Thompson at bham.ac.uk Fri Feb 14 14:56:30 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 14 Feb 2020 14:56:30 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. 
If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils And use MPI to parallelise the find and copy. Simon ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Fri Feb 14 16:24:40 2020 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 14 Feb 2020 16:24:40 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> References: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> Message-ID: Some (perhaps obvious) points to consider: - There are some corner cases (e.g. preserving hard-linked files or sparseness) which require special options. - Depending on your level of churn, it may be helpful to pre-stage the sync before your cutover so that there is less data movement required, and you're primarily comparing metadata. - Files on the source filesysytem might change (and become internally inconsistent) during your rsync, so you should generally sync from a snapshot on the source. - If users can still modify the source filesystem, then you might not get everything. For the final sync, you may need to make the source read-only, or unmount it on clients, kill user processes, or some combination to prevent all new writes from succeeding. (If you're going to use the clients for MPI sync, you obviously need the filesystem to remain mounted there so you may need to take other measures to keep users away.) - If you decide to do a final "offline" sync, you want it to be fast so users can get back to work sooner, so parallelism is usually a must. If you have lots of filesets, then that's a convenient way to split the work. - If you have any filesets with many more inodes than the others, keep in mind that those will likely take the longest to complete. - Test, test, test. You usually won't get this right on the first go or know how long a full sync takes without practice. Remember that you'll need to employ options to delete extraneous files on the target when you're syncing over the top of a previous attempt, since files intentionally deleted on the source aren't usually welcome if they reappear after a migration. - Verify. Whether you use rsync of dsync, repeating the process with dry-run/no-op flags which report differences can be helpful to increase your confidence in the process. If you don't have time to verify after the final offline sync, hopefully you were able to fit this in during testing. Some thoughts about whether it's appropriate to use NSD servers as sync hosts... 
- If they are the managers and they have the best (direct) connectivity to the metadata NSDs, then I would at least consider them before ruling this out, with caveats... - do they have enough available RAM and CPU? - where do they get their software? Do you trust the version of kernel/libc/rsync there to behave as you expect? - if the data NSDs aren't local to these NSD servers, do they have sufficient network connectivity to not cause other problems during the sync? - Test at low parallelism and work your way up. You can also compare performance of this method with any other, on a small scale, in your environment to see what you can expect from each. Good luck, Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: Friday, February 14, 2020 09:57 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? This message was sent by an external party. I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils And use MPI to parallelise the find and copy. Simon ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Fri Feb 14 16:13:30 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 14 Feb 2020 16:13:30 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. Ed -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco Sent: Friday, February 14, 2020 8:25 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. The question: is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? 
The environment: GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW https://urldefense.com/v3/__http://www.afs.enea.it/bracco__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTgu9tB2V$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTn2QwFQn$ From valdis.kletnieks at vt.edu Fri Feb 14 17:28:27 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Fri, 14 Feb 2020 12:28:27 -0500 Subject: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]] In-Reply-To: <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> References: <15ae3e3d-9274-13a1-06e0-9ddea4f200a7@scinet.utoronto.ca> <5bf94749-4add-c4f4-63df-21551c5111e1@scinet.utoronto.ca> <3e90cceb-36cf-6d42-dddc-c1ce2dfc46a4@scinet.utoronto.ca> Message-ID: <61512.1581701307@turing-police> On Tue, 11 Feb 2020 16:44:07 -0500, Jaime Pinto said: > # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog I got bit by this when cut-n-pasting from IBM documentation - the problem is that the web version has characters that *look* like the command-line hyphen character but are actually something different. It's the same problem as cut-n-pasting a command line where the command *should* have the standard ascii double-quote, but the webpage has "smart quotes" where there's different open and close quote characters. Just even less visually obvious... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From skylar2 at uw.edu Fri Feb 14 17:24:46 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Fri, 14 Feb 2020 17:24:46 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <20200214172446.gwzd332efrkpcuxp@utumno.gs.washington.edu> Our experience matches Ed. I have a vague memory that clients will balance traffic across all NSD servers based on the preferred list for each NSD, whereas NSD servers will just read from each NSD directly. On Fri, Feb 14, 2020 at 04:13:30PM +0000, Wahl, Edward wrote: > Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. > > Ed > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco > Sent: Friday, February 14, 2020 8:25 AM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From bhill at physics.ucsd.edu Fri Feb 14 18:10:04 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Fri, 14 Feb 2020 10:10:04 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 Message-ID: Hi All: I'm performing a rolling upgrade of one of our GPFS clusters. This particular cluster has 2 CNFS servers for some of our NFS clients. I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this). The thing is, it actually does start nfsd processes when running mmstartup on the node. Doing a "ps" shows 32 nfsd threads are running. Below is the CNFS-specific output from an attempt to start the node: CNFS[27243]: Restarting lockd to start grace CNFS[27588]: Enabling 172.16.69.76 CNFS[27694]: Restarting lockd to start grace CNFS[27699]: Starting NFS services CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[27910]: Monitor has started pid=27787 CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it CNFS[28705]: Starting NFS services CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it CNFS[28758]: Starting NFS services CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it CNFS[28816]: Starting NFS services CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured. CNFS[28924]: Unexporting all GPFS filesystems Any thoughts? My other CNFS node is handling everything for the time being, thankfully! Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Feb 14 21:09:14 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 14 Feb 2020 21:09:14 +0000 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: <404B2B75-C094-43CC-9146-C00410F31578@bham.ac.uk> Message-ID: <072a3754-5160-09da-0c14-54e08ecefef7@strath.ac.uk> On 14/02/2020 16:24, Sanchez, Paul wrote: > Some (perhaps obvious) points to consider: > > - There are some corner cases (e.g. preserving hard-linked files or > sparseness) which require special options. > > - Depending on your level of churn, it may be helpful to pre-stage > the sync before your cutover so that there is less data movement > required, and you're primarily comparing metadata. 
> > - Files on the source filesysytem might change (and become internally > inconsistent) during your rsync, so you should generally sync from a > snapshot on the source. In my experience this causes an rsync to exit with a none zero error code. See later as to why this is useful. Also it will likely have a different mtime that will cause it be resynced on a subsequent run, the final one will be with the file system in a "read only" state. Not necessarily mounted read only but without anything running that might change stuff. [SNIP] > > - If you decide to do a final "offline" sync, you want it to be fast > so users can get back to work sooner, so parallelism is usually a > must. If you have lots of filesets, then that's a convenient way to > split the work. This final "offline" sync is an absolute must, in my experience unless you are able to be rather woolly about preserving data. > > - If you have any filesets with many more inodes than the others, > keep in mind that those will likely take the longest to complete. > Indeed. We found last time that we did an rsync which was for a HPC system from the put of woe that is Lustre to GPFS there was huge mileage to be hand from telling users that they would get on the new system once their data was synced, it would be done on a "per user" basis with the priority given to the users with a combination of the smallest amount of data and the smallest number of files. Did unbelievable wonders for the users to clean up their files. One user went from over 17 million files to under 50 thousand! The amount of data needing syncing nearly halved. It shrank to ~60% of the pre-announcement size. > - Test, test, test. You usually won't get this right on the first go > or know how long a full sync takes without practice. Remember that > you'll need to employ options to delete extraneous files on the > target when you're syncing over the top of a previous attempt, since > files intentionally deleted on the source aren't usually welcome if > they reappear after a migration. > rsync has a --delete option for that. I am going to add that if you do any sort of ILM/HSM then an rsync is going to destroy you ability to identify old files that have not been accessed, as the rsync will up date the atime of everything (don't ask how I know). If you have a backup (of course you do) I would strongly recommend considering getting your first "pass" from a restore. Firstly it won't impact the source file system while it is still in use and second it allows you to check your backup actually works :-) Finally when rsyncing systems like this I use a Perl script with an sqlite DB. Basically a list of directories to sync, you can have both source and destination to make wonderful things happen if wanted, along with a flag field. The way I use that is -1 means not synced, -2 means the folder in question is currently been synced, and anything else is the exit code of rsync. If you write the Perl script correctly you can start it on any number of nodes, just dump the sqlite DB on a shared folder somewhere (either the source or destination file systems work well here). If you are doing it in parallel record the node which did the rsync as well it can be useful in finding any issues in my experience. Once everything is done you can quickly check the sqlite DB for none zero flag fields to find out what if anything has failed, which gives you the confidence that your sync has completed accurately. Also any flag fields less than zero show you it's not finished. 
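A minimal sketch of that kind of driver, using the sqlite3 command-line client instead of Perl/DBI (the DB path, table name and mount points below are made up for illustration only), might look roughly like this:

#!/bin/bash
# Rough sketch only -- not the Perl script described above.  Assumes a shared
# sqlite DB on one of the GPFS file systems, pre-loaded with one row per
# directory to copy:
#   CREATE TABLE work(dir TEXT PRIMARY KEY, flag INTEGER DEFAULT -1,
#                     node TEXT, secs INTEGER);
# flag convention as above: -1 = not synced, -2 = in progress,
# anything else = the rsync exit code.
DB=/gpfs/dest/.sync/work.db      # hypothetical shared location
SRC=/gpfs/src                    # hypothetical source (ideally a snapshot path)
DST=/gpfs/dest                   # hypothetical destination
ME=$(hostname -s)

while :; do
    # Claim one pending directory inside a single write transaction so two
    # nodes cannot grab the same row.  (A production version would retry on
    # "database is locked" rather than just falling through.)
    dir=$(sqlite3 "$DB" "BEGIN IMMEDIATE;
        SELECT dir FROM work WHERE flag = -1 LIMIT 1;
        UPDATE work SET flag = -2, node = '$ME'
          WHERE dir = (SELECT dir FROM work WHERE flag = -1 LIMIT 1);
        COMMIT;")
    [ -z "$dir" ] && break       # nothing left to do (or claim failed) -- stop

    start=$(date +%s)
    mkdir -p "$DST/$dir"
    # -a -H -S covers the usual corner cases (perms/times, hard links, sparse
    # files); --delete keeps repeat passes honest about removed files.
    rsync -aHS --delete --numeric-ids "$SRC/$dir/" "$DST/$dir/"
    rc=$?

    # Record the rsync exit code, the node and the elapsed time for this dir.
    sqlite3 "$DB" "UPDATE work SET flag = $rc, secs = $(( $(date +%s) - start ))
        WHERE dir = '$dir';"
done

# When every node has finished:
#   sqlite3 $DB "SELECT node, flag, dir FROM work WHERE flag <> 0;"
# lists anything that still needs attention.

The nice property, as noted above, is that the DB doubles as the completion report: any row whose flag is not zero tells you exactly which rsync to rerun, and recording the node and elapsed time makes it easy to spot problem directories.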
Finally you might want to record the time each individual rsync took, it's handy for working out that ordering I mentioned :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From chris.schlipalius at pawsey.org.au Fri Feb 14 22:47:00 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Sat, 15 Feb 2020 06:47:00 +0800 Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? In-Reply-To: References: Message-ID: <168C52FC-4942-4D66-8762-EAEFC4655021@pawsey.org.au> We have used DCP for this, with mmdsh as DCP is MPI and multi node with auto resume. You can also customise threads numbers etc. DDN in fact ran it for us first on our NSD servers for a multi petabyte migration project. It?s in git. For client side, we recommend and use bbcp, our users use this to sync data. It?s fast and reliable and supports resume also. If you do use rsync, as suggested, do dryruns and then a sync and then final copy, as is often run on Isilons to keep geographically separate Isilons in sync. Newest version of rsync also. Regards, Chris Schlipalius Team Lead Data and Storage The Pawsey Supercomputing Centre Australia > On 15 Feb 2020, at 1:28 am, gpfsug-discuss-request at spectrumscale.org wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. naive question about rsync: run it on a client or on NSD > server? (Giovanni Bracco) > 2. Re: naive question about rsync: run it on a client or on NSD > server? (Simon Thompson) > 3. Re: naive question about rsync: run it on a client or on NSD > server? (Sanchez, Paul) > 4. Re: naive question about rsync: run it on a client or on NSD > server? (Wahl, Edward) > 5. Re: mmbackup [--tsm-servers TSMServer[, TSMServer...]] > (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 14 Feb 2020 14:25:08 +0100 > From: Giovanni Bracco > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ------------------------------ > > Message: 2 > Date: Fri, 14 Feb 2020 14:56:30 +0000 > From: Simon Thompson > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: <404B2B75-C094-43CC-9146-C00410F31578 at bham.ac.uk> > Content-Type: text/plain; charset="utf-8" > > I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. > > If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils > > And use MPI to parallelise the find and copy. > > Simon > > ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? > > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Message: 3 > Date: Fri, 14 Feb 2020 16:24:40 +0000 > From: "Sanchez, Paul" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Some (perhaps obvious) points to consider: > > - There are some corner cases (e.g. preserving hard-linked files or sparseness) which require special options. > > - Depending on your level of churn, it may be helpful to pre-stage the sync before your cutover so that there is less data movement required, and you're primarily comparing metadata. > > - Files on the source filesysytem might change (and become internally inconsistent) during your rsync, so you should generally sync from a snapshot on the source. > > - If users can still modify the source filesystem, then you might not get everything. For the final sync, you may need to make the source read-only, or unmount it on clients, kill user processes, or some combination to prevent all new writes from succeeding. (If you're going to use the clients for MPI sync, you obviously need the filesystem to remain mounted there so you may need to take other measures to keep users away.) > > - If you decide to do a final "offline" sync, you want it to be fast so users can get back to work sooner, so parallelism is usually a must. If you have lots of filesets, then that's a convenient way to split the work. 
> > - If you have any filesets with many more inodes than the others, keep in mind that those will likely take the longest to complete. > > - Test, test, test. You usually won't get this right on the first go or know how long a full sync takes without practice. Remember that you'll need to employ options to delete extraneous files on the target when you're syncing over the top of a previous attempt, since files intentionally deleted on the source aren't usually welcome if they reappear after a migration. > > - Verify. Whether you use rsync of dsync, repeating the process with dry-run/no-op flags which report differences can be helpful to increase your confidence in the process. If you don't have time to verify after the final offline sync, hopefully you were able to fit this in during testing. > > > Some thoughts about whether it's appropriate to use NSD servers as sync hosts... > > - If they are the managers and they have the best (direct) connectivity to the metadata NSDs, then I would at least consider them before ruling this out, with caveats... > - do they have enough available RAM and CPU? > - where do they get their software? Do you trust the version of kernel/libc/rsync there to behave as you expect? > - if the data NSDs aren't local to these NSD servers, do they have sufficient network connectivity to not cause other problems during the sync? > > - Test at low parallelism and work your way up. You can also compare performance of this method with any other, on a small scale, in your environment to see what you can expect from each. > > Good luck, > Paul > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson > Sent: Friday, February 14, 2020 09:57 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > This message was sent by an external party. > > > I wouldn't run it on an NSD server. Ideally you want to avoid running other processes etc on there. > > If you are running on clients, you also might want to look at: https://github.com/hpc/mpifileutils > > And use MPI to parallelise the find and copy. > > Simon > > ?On 14/02/2020, 14:25, "gpfsug-discuss-bounces at spectrumscale.org on behalf of giovanni.bracco at enea.it" wrote: > > We must replicate about 100 TB data between two filesystems supported by > two different storages (DDN9900 and DDN7990) both connected to the same > NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use > the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or > is better to run it on a client? 
> > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and > storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------ > > Message: 4 > Date: Fri, 14 Feb 2020 16:13:30 +0000 > From: "Wahl, Edward" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] naive question about rsync: run it on a > client or on NSD server? > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > Disregarding all the other reasons not to run it on the NSDs, many years of rsync on GPFS has shown us it is ALWAYS faster from clients with reasonable networks and no other overhead. > > Ed > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Giovanni Bracco > Sent: Friday, February 14, 2020 8:25 AM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] naive question about rsync: run it on a client or on NSD server? > > We must replicate about 100 TB data between two filesystems supported by two different storages (DDN9900 and DDN7990) both connected to the same NSD servers (6 of them) and we plan to use rsync. > > Non special GPFS attributes, just the standard POSIX one, we plan to use the standard rsync. > > The question: > is there any advantage in running the rsync on one of the NSD server or is better to run it on a client? > > The environment: > GPFS 4.2.3.19, NSD CentOS7.4, clients mostly CentOS6.4 (connected by IB > QDR) and CentOS7.3 (connected by OPA), connection between NSD and storage with IB QDR) > > Giovanni > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW https://urldefense.com/v3/__http://www.afs.enea.it/bracco__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTgu9tB2V$ > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!g5RuD3fGuhmAJMIOdC_LgW0sNdejJCxdMTaLQfVtFcySDF1pkEvsTn2QwFQn$ > > > ------------------------------ > > Message: 5 > Date: Fri, 14 Feb 2020 12:28:27 -0500 > From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" > To: gpfsug main discussion list > Cc: Marc A Kaplan > Subject: Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, > TSMServer...]] > Message-ID: <61512.1581701307 at turing-police> > Content-Type: text/plain; charset="utf-8" > > On Tue, 11 Feb 2020 16:44:07 -0500, Jaime Pinto said: > >> # /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ??tsm?servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog > > I got bit by this when cut-n-pasting from IBM documentation - the problem is that > the web version has characters that *look* like the command-line hyphen character > but are actually something different. 
> > It's the same problem as cut-n-pasting a command line where the command > *should* have the standard ascii double-quote, but the webpage has "smart quotes" > where there's different open and close quote characters. Just even less visually > obvious... > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/pgp-signature > Size: 832 bytes > Desc: not available > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 97, Issue 12 > ********************************************** From mnaineni at in.ibm.com Sat Feb 15 10:03:20 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Sat, 15 Feb 2020 10:03:20 +0000 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Sun Feb 16 18:19:00 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Sun, 16 Feb 2020 10:19:00 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Hi Malahal: Just to clarify, are you saying that on your VM pidof is missing? Or that it is there and not working as it did prior to RHEL/CentOS 8? pidof is returning pid numbers on my system. I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet. Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni wrote: > I am not familiar with CNFS but looking at git source seems to indicate > that it uses 'pidof' to check if a program is running or not. "pidof nfsd" > works on RHEL7.x but it fails on my centos8.1 I just created. So either we > need to make sure pidof works on kernel threads or fix CNFS scripts. > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from > 4.2.3.11 to 5.0.4.2 > Date: Fri, Feb 14, 2020 11:40 PM > > Hi All: > > I'm performing a rolling upgrade of one of our GPFS clusters. This > particular cluster has 2 CNFS servers for some of our NFS clients. I wiped > one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem > mounts fine on the node when I disable CNFS on the node, but with it > enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd > has started, so it assumes the worst and shuts down the file system (I > currently have reboot on failure disabled to debug this). The thing is, it > actually does start nfsd processes when running mmstartup on the node. > Doing a "ps" shows 32 nfsd threads are running. 
> > Below is the CNFS-specific output from an attempt to start the node: > > CNFS[27243]: Restarting lockd to start grace > CNFS[27588]: Enabling 172.16.69.76 > CNFS[27694]: Restarting lockd to start grace > CNFS[27699]: Starting NFS services > CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[27910]: Monitor has started pid=27787 > CNFS[28702]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28705]: Starting NFS services > CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28755]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28758]: Starting NFS services > CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28813]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28816]: Starting NFS services > CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM > locks > CNFS[28867]: Monitor detected nfsd was not running, will attempt to start > it > CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS > server is not running or responding. Node failure initiated as configured. > CNFS[28924]: Unexporting all GPFS filesystems > > Any thoughts? My other CNFS node is handling everything for the time > being, thankfully! > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Mon Feb 17 02:56:24 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Sun, 16 Feb 2020 18:56:24 -0800 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Ah wait, I see what you might mean. pidof works but not specifically for processes like nfsd. That is odd. Thanks, Bryan On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill wrote: > Hi Malahal: > > Just to clarify, are you saying that on your VM pidof is missing? Or > that it is there and not working as it did prior to RHEL/CentOS 8? pidof > is returning pid numbers on my system. I've been looking at the > mmnfsmonitor script and trying to see where the check for nfsd might be > failing, but I've not been able to figure it out yet. > > > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > > > On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni > wrote: > >> I am not familiar with CNFS but looking at git source seems to indicate >> that it uses 'pidof' to check if a program is running or not. "pidof nfsd" >> works on RHEL7.x but it fails on my centos8.1 I just created. So either we >> need to make sure pidof works on kernel threads or fix CNFS scripts. >> >> Regards, Malahal. 
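(For checking by hand whether the nfsd kernel threads are up on a node where "pidof nfsd" comes back empty, the alternatives below do see kernel threads. This is only a rough sketch for manual verification, not the check CNFS itself performs.)

    # pgrep matches on the thread name, so it finds nfsd kernel threads where pidof does not
    pgrep -c nfsd

    # ps can select by command name as well
    ps -C nfsd --no-headers | wc -l

    # and, provided the nfsd control filesystem is mounted on /proc/fs/nfsd,
    # it reports the configured server thread count directly
    cat /proc/fs/nfsd/threads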
>> >> >> ----- Original message ----- >> From: Bryan Hill >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: gpfsug-discuss at spectrumscale.org >> Cc: >> Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from >> 4.2.3.11 to 5.0.4.2 >> Date: Fri, Feb 14, 2020 11:40 PM >> >> Hi All: >> >> I'm performing a rolling upgrade of one of our GPFS clusters. This >> particular cluster has 2 CNFS servers for some of our NFS clients. I wiped >> one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem >> mounts fine on the node when I disable CNFS on the node, but with it >> enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd >> has started, so it assumes the worst and shuts down the file system (I >> currently have reboot on failure disabled to debug this). The thing is, it >> actually does start nfsd processes when running mmstartup on the node. >> Doing a "ps" shows 32 nfsd threads are running. >> >> Below is the CNFS-specific output from an attempt to start the node: >> >> CNFS[27243]: Restarting lockd to start grace >> CNFS[27588]: Enabling 172.16.69.76 >> CNFS[27694]: Restarting lockd to start grace >> CNFS[27699]: Starting NFS services >> CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[27910]: Monitor has started pid=27787 >> CNFS[28702]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28705]: Starting NFS services >> CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28755]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28758]: Starting NFS services >> CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28813]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28816]: Starting NFS services >> CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM >> locks >> CNFS[28867]: Monitor detected nfsd was not running, will attempt to start >> it >> CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS >> server is not running or responding. Node failure initiated as configured. >> CNFS[28924]: Unexporting all GPFS filesystems >> >> Any thoughts? My other CNFS node is handling everything for the time >> being, thankfully! >> >> Thanks, >> Bryan >> >> --- >> Bryan Hill >> Lead System Administrator >> UCSD Physics Computing Facility >> >> 9500 Gilman Dr. # 0319 >> La Jolla, CA 92093 >> +1-858-534-5538 >> bhill at ucsd.edu >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Mon Feb 17 08:02:19 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Mon, 17 Feb 2020 08:02:19 +0000 Subject: [gpfsug-discuss] =?utf-8?q?CNFS_issue_after_upgrading_from_4=2E2?= =?utf-8?b?LjMuMTEgdG8JNS4wLjQuMg==?= In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From rp2927 at gsb.columbia.edu Mon Feb 17 18:42:51 2020 From: rp2927 at gsb.columbia.edu (Popescu, Razvan) Date: Mon, 17 Feb 2020 18:42:51 +0000 Subject: [gpfsug-discuss] Dataless nodes as GPFS clients Message-ID: Hi, Here at CBS we run our compute cluster as dataless nodes loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writtable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server which does not include any node specific runtime files. The (Debian10) system is based on debian-live, with a few in-house modification, a major feature being that we nfs mount the bottom r/o root layer such that we can make live updates (within certain limits). I?m trying to add native (GPL) GPFS access to it. (so far, we?ve used NFS to gain access to the GPFS resident data) I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. I installed on the root server all required GPFS (client) packages and manually built the GPL chroot?ed in the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition. I successfully added the (client) node to the GPFS cluster. It seems to work fine. I?ve copied some the captured node data to the node specific overlay to try to run without any persistency: the critical one seems to be the one in /var/mmfs/gen. (copied all the /var/mmfs in fact). It runs fine without persistency. My questions are: 1. Am I insane and take the risk of compromising the cluster?s data integrity? (?by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!) 2. Would such a configuration run safely through a proper reboot? How about a forced power-off and restart? 3. Is there a properly identified minimum set of files that must be added to the node specific overlay to make this work? (for now, I?ve used my ?knowledge? and guesswork to decide what to retain and what not: e.g. keep startup links, certificates and config dumps, drop: logs, pids. etc?.). Thanks!! Razvan N. Popescu Research Computing Director Office: (212) 851-9298 razvan.popescu at columbia.edu Columbia Business School At the Very Center of Business -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Feb 17 18:57:47 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 17 Feb 2020 18:57:47 +0000 Subject: [gpfsug-discuss] Dataless nodes as GPFS clients In-Reply-To: References: Message-ID: We do this. We provision only the GPFS key files ? /var/mmfs/ssl/stage/genkeyData* ? and the appropriate SSH key files needed, and use the following systemd override to the mmsdrserv.service. Where is the appropriate place to do that override will depend on your version of GFPS somewhat as the systemd setup for GPFS has changed in 5.x, but I?ve rigged this up for any of the 4.x and 5.x that exist so far if you need pointers. 
We use CentOS, FYI, but I don?t think any of this should be different on Debian; our current version of GPFS on nodes where we do this is 5.0.4-1: [root at master ~]# wwsh file print mmsdrserv-override.conf #### mmsdrserv-override.conf ################################################## mmsdrserv-override.conf: ID = 1499 mmsdrserv-override.conf: NAME = mmsdrserv-override.conf mmsdrserv-override.conf: PATH = /etc/systemd/system/mmsdrserv.service.d/override.conf mmsdrserv-override.conf: ORIGIN = /root/clusters/amarel/mmsdrserv-override.conf mmsdrserv-override.conf: FORMAT = data mmsdrserv-override.conf: CHECKSUM = ee7c28f0eee075a014f7a1a5add65b1e mmsdrserv-override.conf: INTERPRETER = UNDEF mmsdrserv-override.conf: SIZE = 210 mmsdrserv-override.conf: MODE = 0644 mmsdrserv-override.conf: UID = 0 mmsdrserv-override.conf: GID = 0 [root at master ~]# wwsh file show mmsdrserv-override.conf [Unit] After=sys-subsystem-net-devices-ib0.device [Service] ExecStartPre=/usr/lpp/mmfs/bin/mmsdrrestore -p $SERVER -R /usr/bin/scp ExecStartPre=/usr/lpp/mmfs/bin/mmauth genkey propagate -N %{NODENAME}-ib0 ?where $SERVER above has been changed for this e-mail; the actual override file contains the hostname of our cluster manager, or other appropriate config server. %{NODENAME} is filled in by Warewulf, which is our cluster manager, and will contain any given node?s short hostname. I?ve since found that we can also set an object that I could use to make the first line include %{CLUSTERMGR} or other arbitrary variable and make this file more cluster-agnostic, but we just haven?t done that yet. Other than that, we build/install the appropriate gpfs.gplbin- RPM, which we build by doing ? on a node with an identical OS ? or you can manually modify the config and have the appropriate kernel source handy: "cd /usr/lpp/mmfs/src; make Autoconfig; make World; make rpm?. You?d do make deb instead. Also obviously installed is the rest of GPFS and you join the node to the cluster while it?s booted up one of the times. Warewulf starts a node off with a nearly empty /var, so anything we need to be in there has to be populated on boot. It?s required a little tweaking from time to time on OS upgrades or GPFS upgrades, but other than that, we?ve been running clusters like this without incident for years. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Feb 17, 2020, at 1:42 PM, Popescu, Razvan wrote: > > Hi, > > Here at CBS we run our compute cluster as dataless nodes loading the base OS from a root server and using AUFS to overlay a few node config files (just krb5.keytab at this time) plus a tmpfs writtable layer on top of everything. The result is that a node restart resets the configuration to whatever is recorded on the root server which does not include any node specific runtime files. The (Debian10) system is based on debian-live, with a few in-house modification, a major feature being that we nfs mount the bottom r/o root layer such that we can make live updates (within certain limits). > > I?m trying to add native (GPL) GPFS access to it. (so far, we?ve used NFS to gain access to the GPFS resident data) > > I was successful in building an Ubuntu 18.04 LTS based prototype of a similar design. 
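(A side note on the portability-layer build Ryan describes above: on current Scale releases the same gpfs.gplbin package can be produced with the bundled mmbuildgpl wrapper instead of the manual make steps - a sketch, assuming the headers for the target kernel are installed on the build node.)

    # builds the portability layer and packages it as gpfs.gplbin-<kernel>-<scale version>
    # (an RPM on RHEL/CentOS, a .deb on Debian/Ubuntu)
    /usr/lpp/mmfs/bin/mmbuildgpl --build-package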
I installed on the root server all required GPFS (client) packages and manually built the GPL chroot?ed in the exported system tree. I booted a test node with a persistent top layer to catch the data created by the GPFS node addition. I successfully added the (client) node to the GPFS cluster. It seems to work fine. > > I?ve copied some the captured node data to the node specific overlay to try to run without any persistency: the critical one seems to be the one in /var/mmfs/gen. (copied all the /var/mmfs in fact). It runs fine without persistency. > > My questions are: > ? Am I insane and take the risk of compromising the cluster?s data integrity? (?by resetting the whole content of /var to whatever was generated after the mmaddnode command?!?!) > ? Would such a configuration run safely through a proper reboot? How about a forced power-off and restart? > ? Is there a properly identified minimum set of files that must be added to the node specific overlay to make this work? (for now, I?ve used my ?knowledge? and guesswork to decide what to retain and what not: e.g. keep startup links, certificates and config dumps, drop: logs, pids. etc?.). > > Thanks!! > > Razvan N. Popescu > Research Computing Director > Office: (212) 851-9298 > razvan.popescu at columbia.edu > > Columbia Business School > At the Very Center of Business > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.turner at ed.ac.uk Tue Feb 18 09:28:31 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 09:28:31 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space Message-ID: Dear All, This has happened more than once with both 4.2.3 and 5.0. The instances may not be related. In the first instance, usage was high (over 90%) and so users were encouraged to delete files. One user deleted a considerable number of files equal to around 10% of the total storage. Reported usage did not fall. There were not obviously any waiters. Has anyone seen anything similar? Regards Aaron Turner The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Tue Feb 18 09:36:57 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 18 Feb 2020 09:36:57 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.turner at ed.ac.uk Tue Feb 18 09:41:24 2020 From: aaron.turner at ed.ac.uk (TURNER Aaron) Date: Tue, 18 Feb 2020 09:41:24 +0000 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: Message-ID: No, we weren?t using snapshots. This is from a location I have just moved from so I can?t do any active investigation now, but I am curious. In the end we had a power outage and the system was fine on reboot. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Luis Bolinches Sent: 18 February 2020 09:37 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reported free space Hi Do you have snapshots? 
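(If snapshots are in play, the space they pin down can be checked directly - a minimal sketch, with "gpfs01" standing in for the real device name. Data blocks of deleted files that were captured by a snapshot are not returned to the free pool until the snapshot itself is deleted.)

    # -d adds the amount of storage each snapshot is currently holding
    mmlssnapshot gpfs01 -d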
--
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions

Luis Bolinches
Consultant IT Specialist
Mobile Phone: +358503112585
https://www.youracclaim.com/user/luis-bolinches

"If you always give you will always have" --  Anonymous

----- Original message -----
From: TURNER Aaron >
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: "gpfsug-discuss at spectrumscale.org" >
Cc:
Subject: [EXTERNAL] [gpfsug-discuss] Odd behaviour with regards to reported free space
Date: Tue, Feb 18, 2020 11:28

Dear All,

This has happened more than once with both 4.2.3 and 5.0. The instances may not be related.

In the first instance, usage was high (over 90%) and so users were encouraged to delete files. One user deleted a considerable number of files equal to around 10% of the total storage. Reported usage did not fall. There were not obviously any waiters. Has anyone seen anything similar?

Regards

Aaron Turner

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathan.buzzard at strath.ac.uk  Tue Feb 18 10:50:10 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Tue, 18 Feb 2020 10:50:10 +0000
Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space
In-Reply-To: 
References: 
Message-ID: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk>

On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote:
> Dear All,
> 
> This has happened more than once with both 4.2.3 and 5.0. The
> instances may not be related.
> 
> In the first instance, usage was high (over 90%) and so users were
> encouraged to delete files. One user deleted a considerable number of
> files equal to around 10% of the total storage. Reported usage did
> not fall. There were not obviously any waiters. Has anyone seen
> anything similar?
> 

I have seen similar behaviour a number of times.

In my experience it is because a process somewhere has an open file
handle on one or more files/directories. So you can delete the file
and it goes from a directory listing; it's no longer visible when you do ls.

However the file has not actually gone, and will continue to count
towards total file system usage, user/group/fileset quotas etc.

Once the errant process is found and killed, the space magically becomes free.

It can be very confusing for end users, especially when what is
holding onto the file is some random zombie process on another node
that died last month.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From aaron.turner at ed.ac.uk  Tue Feb 18 11:05:41 2020
From: aaron.turner at ed.ac.uk (TURNER Aaron)
Date: Tue, 18 Feb 2020 11:05:41 +0000
Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space
In-Reply-To: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk>
References: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk>
Message-ID: 

Dear Jonathan,

This is what I had assumed was the case.
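(On the open-file-handle point: a quick way to hunt for deleted-but-still-open files on a suspect node is sketched below. It assumes lsof is installed, needs root to see other users' descriptors, and has to be run on every node that might be holding the handle; the /gpfs path is a placeholder for the real mount point.)

    # unlinked files that are still open have link count 0; the SIZE/OFF column
    # gives a feel for how much space each one is still pinning
    lsof +L1 2>/dev/null | grep /gpfs

    # without lsof, the same information is visible in /proc: fd symlinks to
    # deleted files have " (deleted)" appended to their target
    find /proc/[0-9]*/fd -lname '*(deleted)' 2>/dev/null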
Since the system ended up with an enforced reboot before we had time for further investigation I wasn't able to confirm this. > I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. Yes, that's very likely to have been the case. Regards Aaron Turner -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 18 February 2020 10:50 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reported free space On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > Dear All, > > This has happened more than once with both 4.2.3 and 5.0. The > instances may not be related. > > In the first instance, usage was high (over 90%) and so users were > encouraged to delete files. One user deleted a considerable number of > files equal to around 10% of the total storage. Reported usage did not > fall. There were not obviously any waiters. Has anyone seen anything > similar? > I have seen similar behaviour a number of times. I my experience it is because a process somewhere has an open file handle on one or more files/directories. So you can delete the file and it goes from a directory listing; it's no long visible when you do ls. However the file has not actually gone, and will continue to count towards total file system usage, user/group/fileset quota's etc. Once the errant process is found and killed magically the space becomes free. I can be very confusing for end users, especially when what is holding onto the file is some random zombie process on another node that died last month. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From bevans at pixitmedia.com Tue Feb 18 13:30:14 2020 From: bevans at pixitmedia.com (Barry Evans) Date: Tue, 18 Feb 2020 13:30:14 +0000 Subject: [gpfsug-discuss] Spectrum Scale Jobs Message-ID: ArcaStream/Pixit Media are hiring! We?re on the hunt for Senior Systems Architects, Systems Engineers and DevOps Engineers to be part of our amazing growth in North America. Do you believe that coming up with innovative ways of solving complex workflow challenges is the truth path to storage happiness? Does the thought of knowing you played a small role in producing a blockbuster film, saving lives by reducing diagnosis times, or even discovering new planets excite you? Have you ever thought ?wouldn?t it be cool if?? while working with Spectrum Scale but never had the sponsorship or time to implement it? Do you want to make a lasting legacy of your awesome skills by building software defined solutions that will be used by hundreds of customers, doing thousands of amazing things? Do you have solid Spectrum Scale experience in either a deployment, development, architectural, support or sales capacity? Do you enjoy taking complex concepts and communicating them in a way that is easy for anyone to understand? If the answers to the above are ?yes?, we?d love to hear from you! Send us your CV/Resume to careers at arcastream.com to find out more information and let us know what your ideal position is! 
Regards, Barry Evans Chief Innovation Officer/Co-Founder Pixit Media/ArcaStream http://pixitmedia.com http://arcastream.com http://arcapix.com -- ? This email is confidential in that it is? intended for the exclusive attention of?the addressee(s) indicated. If you are?not the intended recipient, this email?should not be read or disclosed to?any other person. Please notify the?sender immediately and delete this? email from your computer system.?Any opinions expressed are not?necessarily those of the company?from which this email was sent and,?whilst to the best of our knowledge no?viruses or defects exist, no?responsibility can be accepted for any?loss or damage arising from its?receipt or subsequent use of this?email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Tue Feb 18 17:37:41 2020 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Tue, 18 Feb 2020 11:37:41 -0600 Subject: [gpfsug-discuss] Odd behaviour with regards to reported free space In-Reply-To: References: <85c563bd5e4c538031376d1fe86032765033cbf7.camel@strath.ac.uk> Message-ID: Deleting a file is a two stage process. The original user thread unlinks the file from the directory and reduces the link count. If the count is zero and the file is not open, then it gets queued for the background deletion thread. The background thread then deletes the blocks and frees the space. If there is a snapshot, the data blocks may be captured and not actually freed. After a crash, the recovery code looks for files that were being deleted and restarts the deletion if necessary. -Wayne gpfsug-discuss-bounces at spectrumscale.org wrote on 02/18/2020 06:05:41 AM: > From: TURNER Aaron > To: gpfsug main discussion list > Date: 02/18/2020 06:05 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Odd behaviour with regards > to reported free space > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Dear Jonathan, > > This is what I had assumed was the case. Since the system ended up > with an enforced reboot before we had time for further investigation > I wasn't able to confirm this. > > > I can be very confusing for end users, especially when what is > holding onto the file is some random zombie process on another node > that died last month. > > Yes, that's very likely to have been the case. > > Regards > > Aaron Turner > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org bounces at spectrumscale.org> On Behalf Of Jonathan Buzzard > Sent: 18 February 2020 10:50 > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Odd behaviour with regards to reportedfree space > > On Tue, 2020-02-18 at 09:28 +0000, TURNER Aaron wrote: > > Dear All, > > > > This has happened more than once with both 4.2.3 and 5.0. The > > instances may not be related. > > > > In the first instance, usage was high (over 90%) and so users were > > encouraged to delete files. One user deleted a considerable number of > > files equal to around 10% of the total storage. Reported usage did not > > fall. There were not obviously any waiters. Has anyone seen anything > > similar? > > > > I have seen similar behaviour a number of times. > > I my experience it is because a process somewhere has an open file > handle on one or more files/directories. So you can delete the file > and it goes from a directory listing; it's no long visible when you do ls. 
> > However the file has not actually gone, and will continue to count > towards total file system usage, user/group/fileset quota's etc. > > Once the errant process is found and killed magically the space becomes free. > > I can be very confusing for end users, especially when what is > holding onto the file is some random zombie process on another node > that died last month. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=GtPIT10cORUM6qwFnTVtIiDUFmESkxW3I0wu8GDxmgc&m=QkF9KAzl1dxqONkEkh7ZLNsDYktsFHJCkI2oGi6qyHk&s=_Z- > E_VtMDAiXmR8oSZym4G9OIzxRhcs5rJxMEjxK1RI&e= > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=GtPIT10cORUM6qwFnTVtIiDUFmESkxW3I0wu8GDxmgc&m=QkF9KAzl1dxqONkEkh7ZLNsDYktsFHJCkI2oGi6qyHk&s=_Z- > E_VtMDAiXmR8oSZym4G9OIzxRhcs5rJxMEjxK1RI&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Feb 19 15:24:42 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 19 Feb 2020 15:24:42 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) Message-ID: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Feb 19 18:49:51 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 19 Feb 2020 10:49:51 -0800 Subject: [gpfsug-discuss] CANCELLED - Re: NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> Message-ID: <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> I?m sad to report we were unable to find a suitable venue for the spring meeting in the NYC area. Given the date is nearing, we will cancel this event. If you are willing to host a UG meeting later this year, please let us know. Best, Kristy > On Feb 13, 2020, at 11:37 AM, Kristy Kallback-Rose wrote: > > All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. > > Thanks, > Kristy > > >> On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose > wrote: >> >> Thanks for your responses to the poll. 
>> >> We?re still working on a venue, but working towards: >> >> March 30 - New User Day (Tuesday) >> April 1&2 - Regular User Group Meeting (Wednesday & Thursday) >> >> Once it?s confirmed we?ll post something again. >> >> Best, >> Kristy. >> >>> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose > wrote: >>> >>> Thank you to the 18 wonderful people who filled out the survey. >>> >>> However, there are well more than 18 people at any given UG meeting. >>> >>> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >>> >>> Happy New Year. >>> >>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>> >>> Thanks, >>> Kristy >>> >>>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose > wrote: >>>> >>>> Hello, >>>> >>>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>>> >>>> Best wishes to all in the new year. >>>> >>>> -Kristy >>>> >>>> >>>> Please give us 2 minutes of your time here: ?https://forms.gle/NFk5q4djJWvmDurW7 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Feb 19 19:31:36 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 19 Feb 2020 19:31:36 +0000 Subject: [gpfsug-discuss] CANCELLED - Re: NEED VENUE [WAS Re: UPDATE Planning US meeting for Spring 2020] In-Reply-To: <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> References: <42F45E03-0AEC-422C-B3A9-4B5A21B1D8DF@lbl.gov> <2AF72F65-CA94-438F-9924-72E833104E10@lbl.gov> <7BE1C75B-E40D-49DF-A21F-00A29653E02C@lbl.gov> Message-ID: I believe we could do it at Rutgers in either Newark or New Brunswick. I?m not sure if that meets most people?s definitions for NYC-area, but I do consider Newark to be. Both are fairly easily accessible by public transportation (and about as close to midtown as some uptown location choices anyway). We had planned to attend the 4/1-2 meeting. Not sure what?s involved to know whether keeping the 4/1-2 date is a viable option if we were able to host. We?d have to make sure we didn?t run afoul of any vendor-ethics guidelines. We recently hosted Ray Paden for a GPFS day, though. We had some trouble with remote participation, but that could be dealt with and I actually don?t think these meetings have that as an option anyway. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Feb 19, 2020, at 1:49 PM, Kristy Kallback-Rose wrote: > > I?m sad to report we were unable to find a suitable venue for the spring meeting in the NYC area. Given the date is nearing, we will cancel this event. > > If you are willing to host a UG meeting later this year, please let us know. > > Best, > Kristy > >> On Feb 13, 2020, at 11:37 AM, Kristy Kallback-Rose wrote: >> >> All, we are struggling to get a venue for this event. Preference, based on the pol,l was NYC area. If you would be willing to host the event in that area, please get in touch. Dates we were looking at are below. >> >> Thanks, >> Kristy >> >> >>> On Jan 23, 2020, at 2:16 PM, Kristy Kallback-Rose wrote: >>> >>> Thanks for your responses to the poll. 
>>> >>> We?re still working on a venue, but working towards: >>> >>> March 30 - New User Day (Tuesday) >>> April 1&2 - Regular User Group Meeting (Wednesday & Thursday) >>> >>> Once it?s confirmed we?ll post something again. >>> >>> Best, >>> Kristy. >>> >>>> On Jan 6, 2020, at 3:41 PM, Kristy Kallback-Rose wrote: >>>> >>>> Thank you to the 18 wonderful people who filled out the survey. >>>> >>>> However, there are well more than 18 people at any given UG meeting. >>>> >>>> Please submit your responses today, I promise, it?s really short and even painless. 2020 (how did *that* happen?!) is here, we need to plan the next meeting >>>> >>>> Happy New Year. >>>> >>>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>>> >>>> Thanks, >>>> Kristy >>>> >>>>> On Dec 16, 2019, at 11:05 AM, Kristy Kallback-Rose wrote: >>>>> >>>>> Hello, >>>>> >>>>> It?s time already to plan for the next US event. We have a quick, seriously, should take order of 2 minutes, survey to capture your thoughts on location and date. It would help us greatly if you can please fill it out. >>>>> >>>>> Best wishes to all in the new year. >>>>> >>>>> -Kristy >>>>> >>>>> >>>>> Please give us 2 minutes of your time here: https://forms.gle/NFk5q4djJWvmDurW7 >>>> >>> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Feb 19 19:58:59 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 19 Feb 2020 19:58:59 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> References: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: I?m extremely curious as to this answer as well. At one point a while back I started looking into this via the KMIP side with things, but ran out of time to continue. http://docs.oasis-open.org/kmip/testcases/v1.4/kmip-testcases-v1.4.html http://docs.oasis-open.org/kmip/testcases/v1.4/cnprd01/test-cases/kmip-v1.4/ Ed From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Wednesday, February 19, 2020 10:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Feb 19 22:07:50 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 19 Feb 2020 22:07:50 +0000 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Wed Feb 19 23:34:37 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Wed, 19 Feb 2020 15:34:37 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS Message-ID: Hi, I understand gpfs 4.2.3 is end of support this coming September. 
The support page https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable indicates that gpfs version 5.0 will not run on rhel6 and is unsupported. 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients? Thanks, Renata From YARD at il.ibm.com Thu Feb 20 06:46:17 2020 From: YARD at il.ibm.com (Yaron Daniel) Date: Thu, 20 Feb 2020 08:46:17 +0200 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: References: <35F4F5EA-4D7F-4A94-B907-8B9732BF51D2@nuance.com> Message-ID: Hi Also in case that u configure 3 SKLM servers (1 Primary - 2 Slaves, in case the Primary is not responding you will see in the logs this messages: Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: "Felipe Knop" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 20/02/2020 00:08 Subject: [EXTERNAL] Re: [gpfsug-discuss] Encryption - checking key server health (SKLM) Sent by: gpfsug-discuss-bounces at spectrumscale.org Bob, Scale does not yet have a tool to perform a health-check on a key server, or an independent mechanism to retrieve keys. One can use a command such as 'mmkeyserv key show' to retrieve the list of keys from a given SKLM server (and use that to determine whether the key server is responsive), but being able to retrieve a list of keys does not necessarily mean being able to retrieve the actual keys, as the latter goes through the KMIP port/protocol, and the former uses the REST port/API: # mmkeyserv key show --server 192.168.105.146 --server-pwd /tmp/configKeyServ_pid11403914_keyServPass --tenant sklm3Tenant KEY-ad4f3a9-01397ebf-601b-41fb-89bf-6c4ac333290b KEY-ad4f3a9-019465da-edc8-49d4-b183-80ae89635cbc KEY-ad4f3a9-0509893d-cf2a-40d3-8f79-67a444ff14d5 KEY-ad4f3a9-08d514af-ebb2-4d72-aa5c-8df46fe4c282 KEY-ad4f3a9-0d3487cb-a674-44ab-a7d0-1f68e86e2fc9 [...] Having a tool that can retrieve keys independently from mmfsd would be useful capability to have. Could you submit an RFE to request such function? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Encryption - checking key server health (SKLM) Date: Wed, Feb 19, 2020 11:35 AM I?m looking for a way to check the status/health of the encryption key servers from the client side - detecting if the key server is unavailable or can?t serve a key. I ran into a situation recently where the server was answering HTTP requests on the port but wasn?t returning they key. I can?t seem to find a way to check if the server will actually return a key. Any ideas? 
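(Until such a tool exists, a client-side probe can at least tell "the REST side answers" apart from "the KMIP listener answers" - a rough sketch, with the host name, tenant and port below being placeholders/assumptions (5696 being the usual KMIP port on SKLM). Neither step proves an actual key retrieval the way mmfsd performs it, which is exactly the gap described here.)

    # REST side: can the server still enumerate keys for the tenant?
    # (arguments as in Felipe's mmkeyserv example above)
    mmkeyserv key show --server sklm1.example.com --tenant mytenant

    # KMIP side: does the TLS listener complete a handshake and present its certificate?
    # a dead or hung KMIP service fails here even when the web/REST ports still respond
    echo | openssl s_client -connect sklm1.example.com:5696 2>/dev/null | openssl x509 -noout -subject -dates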
Bob Oesterlin
Sr Principal Storage Engineer, Nuance

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=ARpfta6x0GFP8yy67RAuT4SMBrRHROGRUwCOSPVDEF8&s=aMBH47I25734lVmyzTZBiPd6a1ELRuurxoFCTf6Ij_Y&e=
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jonathan.buzzard at strath.ac.uk  Thu Feb 20 10:33:57 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Thu, 20 Feb 2020 10:33:57 +0000
Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS
In-Reply-To: 
References: 
Message-ID: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk>

On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September.  The support page
> 
> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable
> 
> indicates that gpfs version 5.0 will not run on rhel6 and is unsupported.
> 
> 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients?
> 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients?
> 

Given RHEL6 expires in November anyway you would only be buying yourself 
a couple of months which seems pointless. You need to be moving away 
from both.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG From S.J.Thompson at bham.ac.uk Thu Feb 20 10:41:17 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 10:41:17 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> Message-ID: <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Well, if you were buying some form of extended Life Support for Scale, then you might also be expecting to buy extended life for RedHat. RHEL6 has extended life support until June 2024. Sure its an add on subscription cost, but some people might be prepared to do that over OS upgrades. Simon ?On 20/02/2020, 10:34, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 19/02/2020 23:34, Renata Maria Dart wrote: > Hi, I understand gpfs 4.2.3 is end of support this coming September. The support page > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable > > indicates that gpfs version 5.0 will not run on rhel6 and is unsupported. > > 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? > 2. Is gpfs 5.0 unsupported for both rhel6 servers and clients? > Given RHEL6 expires in November anyway you would only be buying yourself a couple of months which seems pointless. You need to be moving away from both. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Thu Feb 20 11:23:52 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 11:23:52 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> On 20/02/2020 10:41, Simon Thompson wrote: > Well, if you were buying some form of extended Life Support for > Scale, then you might also be expecting to buy extended life for > RedHat. RHEL6 has extended life support until June 2024. Sure its an > add on subscription cost, but some people might be prepared to do > that over OS upgrades. I would recommend anyone going down that to route to take a *very* close look at what you get for the extended support. Not all of the OS is supported, with large chunks being moved to unsupported even if you pay for the extended support. Consequently extended support is not suitable for HPC usage in my view, so start planning the upgrade now. It's not like you haven't had 10 years notice. If your GPFS is just a storage thing serving out on protocol nodes, upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. It's a relatively easy invisible to the users upgrade. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From knop at us.ibm.com Thu Feb 20 13:27:47 2020 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 20 Feb 2020 13:27:47 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Feb 20 14:17:58 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 20 Feb 2020 14:17:58 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS Message-ID: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> To reiterate what?s been said on this thread, and to reaffirm the official IBM position: * Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and we decided that we should support a version of Scale essentially as long as RHEL6 is supported. * You can purchase Extended Support for both Scale 4.2 and RHEL6, but (as Jonathan said) you need to look closely at what you are getting from both sides. For Scale, do not expect any fixes after EOS (unless something like a truly critical security issue with no workaround arises). * There is no possibility of IBM supporting Scale 5.0 on RHEL6. I want to make this as clear as I possibly can so that people can focus on feasible alternatives, rather than lose precious time asking for a change to this plan and waiting on a response that will absolutely, definitely be No. I would like to add: In general, in the future the ?span? of the Scale/RHEL matrix is going to get tighter than it perhaps has been in the past. You should anticipate that broadly speaking, we?re not going to support Scale on out-of-support OS versions; and we?re not going to test out-of-support (or soon-to-be out-of-support) Scale on new OS versions. The impact of this will be mitigated by our introduction of EUS releases, starting with 5.0.5, which will allow you to stay on a Scale release across multiple OS releases; and the combination of Scale EUS and RHEL EUS will allow you to stay on a stable environment for a long time. EUS for Scale is no-charge, it is included as a standard part of your S&S. Regards, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_2106701756] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69557 bytes Desc: image001.png URL: From stockf at us.ibm.com Thu Feb 20 14:34:49 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 20 Feb 2020 14:34:49 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>, <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk><07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From skylar2 at uw.edu Thu Feb 20 15:19:09 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 15:19:09 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: <20200220151909.7rbljupfl27whdtu@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 11:23:52AM +0000, Jonathan Buzzard wrote: > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. > > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. I agree, we're having increasing difficulty running CentOS 6, not because of the lack of support from IBM/RedHat, but because the software our customers want to run has started depending on OS features that simply don't exist in CentOS 6. In particular, modern gcc and glibc, and containers are all features that many of our customers are expecting that we provide. The newer kernel available in CentOS 7 (and now 8) supports large numbers of CPUs and large amounts of memory far better than the ancient CentOS 6 kernel as well. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From renata at slac.stanford.edu Thu Feb 20 15:58:08 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Thu, 20 Feb 2020 07:58:08 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> References: <6E4C595D-C645-48F6-8577-938316764D61@us.ibm.com> Message-ID: Thanks very much for your response Carl, this is the information I was looking for. Renata On Thu, 20 Feb 2020, Carl Zetie - carlz at us.ibm.com wrote: >To reiterate what?s been said on this thread, and to reaffirm the official IBM position: > > > * Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and we decided that we should support a version of Scale essentially as long as RHEL6 is supported. > * You can purchase Extended Support for both Scale 4.2 and RHEL6, but (as Jonathan said) you need to look closely at what you are getting from both sides. For Scale, do not expect any fixes after EOS (unless something like a truly critical security issue with no workaround arises). > * There is no possibility of IBM supporting Scale 5.0 on RHEL6. 
I want to make this as clear as I possibly can so that people can focus on feasible alternatives, rather than lose precious time asking for a change to this plan and waiting on a response that will absolutely, definitely be No. > > >I would like to add: In general, in the future the ?span? of the Scale/RHEL matrix is going to get tighter than it perhaps has been in the past. You should anticipate that broadly speaking, we?re not going to support Scale on out-of-support OS versions; and we?re not going to test out-of-support (or soon-to-be out-of-support) Scale on new OS versions. > >The impact of this will be mitigated by our introduction of EUS releases, starting with 5.0.5, which will allow you to stay on a Scale release across multiple OS releases; and the combination of Scale EUS and RHEL EUS will allow you to stay on a stable environment for a long time. > >EUS for Scale is no-charge, it is included as a standard part of your S&S. > > >Regards, > > > >Carl Zetie >Program Director >Offering Management >Spectrum Scale & Spectrum Discover >---- >(919) 473 3318 ][ Research Triangle Park >carlz at us.ibm.com > >[signature_2106701756] > > > From hpc.ken.tw25qn at gmail.com Thu Feb 20 16:29:40 2020 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Thu, 20 Feb 2020 16:29:40 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: Fred, It may be that some HPC users "have to" reverify the results of their computations as being exactly the same as a previous software stack and that is not a minor task. Any change may require this verification process..... Ken Atkjnson On Thu, 20 Feb 2020, 14:35 Frederick Stock, wrote: > This is a bit off the point of this discussion but it seemed like an > appropriate context for me to post this question. IMHO the state of > software is such that it is expected to change rather frequently, for > example the OS on your laptop/tablet/smartphone and your web browser. It > is correct to say those devices are not running an HPC or enterprise > environment but I mention them because I expect none of us would think of > running those devices on software that is a version far from the latest > available. With that as background I am curious to understand why folks > would continue to run systems on software like RHEL 6.x which is now two > major releases(and many years) behind the current version of that product? > Is it simply the effort required to upgrade 100s/1000s of nodes and the > disruption that causes, or are there other factors that make keeping > current with OS releases problematic? I do understand it is not just a > matter of upgrading the OS but all the software, like Spectrum Scale, that > runs atop that OS in your environment. While they all do not remain in > lock step I would think that in some window of time, say 12-18 months > after an OS release, all software in your environment would support a > new/recent OS release that would technically permit the system to be > upgraded. > > I should add that I think you want to be on or near the latest release of > any software with the presumption that newer versions should be an > improvement over older versions, albeit with the usual caveats of new > defects. 
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS 5 and supported rhel OS > Date: Thu, Feb 20, 2020 6:24 AM > > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. > > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Feb 20 16:41:59 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 20 Feb 2020 16:41:59 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Message-ID: <50DD3E29-5CDC-4FCB-9080-F39DE4532761@us.ibm.com> Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... How deep does ?any change? go? Mod level? PTF? Efix? OS errata? Many of our enterprise customers also have validation requirements, although not as strict as typical HPC users e.g. they require some level of testing if they take a Mod but not a PTF. Mind you, with more HPC-like workloads showing up in the enterprise, that too might change? Thanks, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_510537050] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69557 bytes Desc: image001.png URL: From renata at slac.stanford.edu Thu Feb 20 16:57:47 2020 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Thu, 20 Feb 2020 08:57:47 -0800 (PST) Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk>, <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk><07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> Message-ID: Hi Frederick, ours is a physics research lab with a mix of new eperiments and ongoing research. While some users embrace and desire the latest that tech has to offer and are actively writing code to take advantage of it, we also have users running older code on data from older experiments which depends on features of older OS releases and they are often not the ones who wrote the code. We have a mix of systems to accomodate both groups. Renata On Thu, 20 Feb 2020, Frederick Stock wrote: >This is a bit off the point of this discussion but it seemed like an appropriate context for me to post this question.? IMHO the state of software is such that >it is expected to change rather frequently, for example the OS on your laptop/tablet/smartphone and your web browser.? It is correct to say those devices are >not running an HPC or enterprise environment but I mention them because I expect none of us would think of running those devices on software that is a version >far from the latest available.? With that as background I am curious to understand why folks would continue to run systems on software like RHEL 6.x which is >now two major releases(and many years) behind the current version of that product?? Is it simply the effort required to upgrade 100s/1000s of nodes and the >disruption that causes, or are there other factors that make keeping current with OS releases problematic?? I do understand it is not just a matter of upgrading >the OS but all the software, like Spectrum Scale, that runs atop that OS in your environment.? While they all do not remain in lock step I would? think that in >some window of time, say 12-18 months after an OS release, all software in your environment would support a new/recent OS release that would technically permit >the system to be upgraded. >? >I should add that I think you want to be on or near the latest release of any software with the presumption that newer versions should be an improvement over >older versions, albeit with the usual caveats of new defects. > >Fred >__________________________________________________ >Fred Stock | IBM Pittsburgh Lab | 720-430-8821 >stockf at us.ibm.com >? >? > ----- Original message ----- > From: Jonathan Buzzard > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS 5 and supported rhel OS > Date: Thu, Feb 20, 2020 6:24 AM > ? On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until June 2024. Sure its an > > add on subscription cost, but some people might be prepared to do > > that over OS upgrades. > > I would recommend anyone going down that to route to take a *very* close > look at what you get for the extended support. Not all of the OS is > supported, with large chunks being moved to unsupported even if you pay > for the extended support. 
> > Consequently extended support is not suitable for HPC usage in my view, > so start planning the upgrade now. It's not like you haven't had 10 > years notice. > > If your GPFS is just a storage thing serving out on protocol nodes, > upgrade one node at a time to RHEL7 and then repeat upgrading to GPFS 5. > It's a relatively easy invisible to the users upgrade. > > JAB. > > -- > Jonathan A. Buzzard ? ? ? ? ? ? ? ? ? ? ? ? Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > ? > >? > > > From skylar2 at uw.edu Thu Feb 20 16:59:53 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 16:59:53 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> Message-ID: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: > Fred, > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... > Ken Atkjnson We have this problem too, but at the same time the same people require us to run supported software and remove software versions with known vulnerabilities. The compromise we've worked out for the researchers is to have them track which software versions they used for a particular run/data release. The researchers who care more will have a validation suite that will (hopefully) call out problems as we do required upgrades. At some point, it's simply unrealistic to keep legacy systems around, though we do have a lab that needs a Solaris/SPARC system just to run a 15-year-old component of a pipeline for which they don't have source code... -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From malone12 at illinois.edu Thu Feb 20 17:00:46 2020 From: malone12 at illinois.edu (Maloney, J.D.) Date: Thu, 20 Feb 2020 17:00:46 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Message-ID: <2D960263-2CF3-4834-85CE-EB0F977169CB@illinois.edu> I assisted in a migration a couple years ago when we pushed teams to RHEL 7 and the science pipeline folks weren?t really concerned with the version of Scale we were using, but more what the new OS did to their code stack with the newer version of things like gcc and other libraries. They ended up re-running pipelines from prior data releases to compare the outputs of the pipelines to make sure they were within tolerance and matched prior results. Best, J.D. 
Maloney HPC Storage Engineer | Storage Enabling Technologies Group National Center for Supercomputing Applications (NCSA) From: on behalf of "Carl Zetie - carlz at us.ibm.com" Reply-To: gpfsug main discussion list Date: Thursday, February 20, 2020 at 10:42 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson) Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process..... How deep does ?any change? go? Mod level? PTF? Efix? OS errata? Many of our enterprise customers also have validation requirements, although not as strict as typical HPC users e.g. they require some level of testing if they take a Mod but not a PTF. Mind you, with more HPC-like workloads showing up in the enterprise, that too might change? Thanks, Carl Zetie Program Director Offering Management Spectrum Scale & Spectrum Discover ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_510537050] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From david_johnson at brown.edu Thu Feb 20 17:14:40 2020 From: david_johnson at brown.edu (David Johnson) Date: Thu, 20 Feb 2020 12:14:40 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> Instead of keeping whole legacy systems around, could they achieve the same with a container built from the legacy software? > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: >> Fred, >> It may be that some HPC users "have to" >> reverify the results of their computations as being exactly the same as a >> previous software stack and that is not a minor task. Any change may >> require this verification process..... >> Ken Atkjnson > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. The compromise we've worked out for the researchers is to > have them track which software versions they used for a particular run/data > release. The researchers who care more will have a validation suite that > will (hopefully) call out problems as we do required upgrades. > > At some point, it's simply unrealistic to keep legacy systems around, > though we do have a lab that needs a Solaris/SPARC system just to run a > 15-year-old component of a pipeline for which they don't have source code... 
> > -- > -- Skylar Thompson (skylar2 at u.washington.edu) > -- Genome Sciences Department, System Administrator > -- Foege Building S046, (206)-685-7354 > -- University of Washington School of Medicine > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From skylar2 at uw.edu Thu Feb 20 17:20:09 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 20 Feb 2020 17:20:09 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <345A32A9-7FA2-4B42-B863-54D73F076C99@brown.edu> Message-ID: <20200220172009.gtkek3nlohathrro@utumno.gs.washington.edu> On Thu, Feb 20, 2020 at 12:14:40PM -0500, David Johnson wrote: > Instead of keeping whole legacy systems around, could they achieve the same > with a container built from the legacy software? That is our hope, at least once we can get off CentOS 6 and run containers. :) Though containers aren't quite a panacea; there's still the issue of insecure software being baked into the container, but at least we can limit what the container can access more easily than running outside a container. > > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > > > On Thu, Feb 20, 2020 at 04:29:40PM +0000, Ken Atkinson wrote: > >> Fred, > >> It may be that some HPC users "have to" > >> reverify the results of their computations as being exactly the same as a > >> previous software stack and that is not a minor task. Any change may > >> require this verification process..... > >> Ken Atkjnson > > > > We have this problem too, but at the same time the same people require us > > to run supported software and remove software versions with known > > vulnerabilities. The compromise we've worked out for the researchers is to > > have them track which software versions they used for a particular run/data > > release. The researchers who care more will have a validation suite that > > will (hopefully) call out problems as we do required upgrades. > > > > At some point, it's simply unrealistic to keep legacy systems around, > > though we do have a lab that needs a Solaris/SPARC system just to run a > > 15-year-old component of a pipeline for which they don't have source code... 
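To make David's container suggestion concrete, a minimal sketch of wrapping one legacy pipeline step in a CentOS 6 userland; the image tag, bind mounts and script name below are only placeholders, not a tested recipe:

    # pull the archived CentOS 6 base image and run the old step inside it,
    # with the pipeline tree mounted read-only and a scratch area for output
    podman pull docker.io/library/centos:6
    podman run --rm \
        -v /gpfs/projects/legacy-pipeline:/pipeline:ro \
        -v /gpfs/scratch/run42:/scratch \
        docker.io/library/centos:6 \
        /pipeline/run_legacy_step.sh --in /scratch/input --out /scratch/output

The caveat above still applies: the insecure bits are now baked into the image, so the gain is mainly in how tightly the container's view of the system can be restricted.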
> > > > -- > > -- Skylar Thompson (skylar2 at u.washington.edu) > > -- Genome Sciences Department, System Administrator > > -- Foege Building S046, (206)-685-7354 > > -- University of Washington School of Medicine > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From S.J.Thompson at bham.ac.uk Thu Feb 20 19:45:02 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 19:45:02 +0000 Subject: [gpfsug-discuss] Unkillable snapshots Message-ID: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.raimbach at googlemail.com Thu Feb 20 19:46:53 2020 From: luke.raimbach at googlemail.com (Luke Raimbach) Date: Thu, 20 Feb 2020 19:46:53 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Message-ID: Move the file system manager :) On Thu, 20 Feb 2020, 19:45 Simon Thompson, wrote: > Hi, > > > We have a snapshot which is stuck in the state "DeleteRequired". When > deleting, it goes through the motions but eventually gives up with: > > Unable to quiesce all nodes; some processes are busy or holding required > resources. > mmdelsnapshot: Command failed. Examine previous error messages to > determine cause. > > And in the mmfslog on the FS manager there are a bunch of retries and > "failure to quesce" on nodes. However in each retry its never the same set > of nodes. I suspect we have one HPC job somewhere killing us. > > > What's interesting is that we can delete other snapshots OK, it appears to > be one particular fileset. > > > My old goto "mmfsadm dump tscomm" isn't showing any particular node, and > waiters around just tend to point to the FS manager node. > > > So ... any suggestions? I'm assuming its some workload holding a lock open > or some such, but tracking it down is proving elusive! > > > Generally the FS is also "lumpy" ... 
at times it feels like a wifi > connection on a train using a terminal, I guess its all related though. > > > Thanks > > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Feb 20 20:13:14 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 20:13:14 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> Message-ID: <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Feb 20 20:29:44 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 20 Feb 2020 15:29:44 -0500 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: References: Message-ID: <13747.1582230584@turing-police> On Wed, 19 Feb 2020 22:07:50 +0000, "Felipe Knop" said: > Having a tool that can retrieve keys independently from mmfsd would be useful > capability to have. Could you submit an RFE to request such function? Note that care needs to be taken to do this in a secure manner. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ulmer at ulmer.org Thu Feb 20 20:43:11 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 20 Feb 2020 15:43:11 -0500 Subject: [gpfsug-discuss] Encryption - checking key server health (SKLM) In-Reply-To: <13747.1582230584@turing-police> References: <13747.1582230584@turing-police> Message-ID: It seems like this belongs in mmhealth if it were to be bundled. 
If you need to use a third party tool, maybe fetch a particular key that is only used for fetching, so it?s compromise would represent no risk. -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Feb 20, 2020, at 3:11 PM, Valdis Kl?tnieks wrote: > > ?On Wed, 19 Feb 2020 22:07:50 +0000, "Felipe Knop" said: > >> Having a tool that can retrieve keys independently from mmfsd would be useful >> capability to have. Could you submit an RFE to request such function? > > Note that care needs to be taken to do this in a secure manner. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From truston at mbari.org Thu Feb 20 20:43:03 2020 From: truston at mbari.org (Todd Ruston) Date: Thu, 20 Feb 2020 12:43:03 -0800 Subject: [gpfsug-discuss] Policy REGEX question Message-ID: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Greetings, I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') The Scale policy documentation for REGEX only mentions 2 parameters, not 3: REGEX(String,'Pattern') Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). -- Todd E. Ruston Information Systems Manager Monterey Bay Aquarium Research Institute (MBARI) 7700 Sandholdt Road, Moss Landing, CA, 95039 Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfalk at us.ibm.com Thu Feb 20 21:26:39 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Thu, 20 Feb 2020 16:26:39 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> <93bdde85530d41bebbe24b7530e70592@bham.ac.uk> Message-ID: Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. 
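Put concretely, a rough sketch of that two-window collection; the device, snapshot and fileset path names below are placeholders to be replaced with the suspect ones:

    # window 1: capture open files under the suspect fileset and any long waiters, cluster-wide
    mmdsh -N all lsof /path/to/fileset > /tmp/lsof.out 2>&1
    mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --waiters > /tmp/waiters.out 2>&1

    # window 2: retry the fileset snapshot deletion
    mmdelsnapshot fsdevice snapshot-to-delete -j filesetname

    # afterwards, on the file system manager, pull out the quiesce failures to compare
    grep -i quiesce /var/adm/ras/mmfs.log.latest | tail -50

If the same node names keep turning up both in the log and in the lsof/waiters output, that is usually the workload to chase.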
It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=rIyEAXKyzwEj_pyM9DRQ1mL3x5gHjoqSpnhqxP6Oj-8&s=ZRXJm9u1_WLClH0Xua2PeIr-cWHj8YasvQCwndgdyns&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Feb 20 21:39:10 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 20 Feb 2020 21:39:10 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk> <93bdde85530d41bebbe24b7530e70592@bham.ac.uk>, Message-ID: <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. 
Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
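Luke's earlier suggestion of simply moving the file system manager is also cheap to try while digging; a quick sketch, with the device and target node names below made up:

    mmlsmgr                  # show which node currently manages each file system
    mmchmgr rds nsd02        # hand the manager role for device 'rds' to another node

If moving it makes no difference, the problem is more likely the workload than that particular node.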
URL: From nfalk at us.ibm.com Thu Feb 20 22:13:56 2020 From: nfalk at us.ibm.com (Nathan Falk) Date: Thu, 20 Feb 2020 17:13:56 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> References: <29379a8105ad4e44b074dec275d4e8f2@bham.ac.uk><93bdde85530d41bebbe24b7530e70592@bham.ac.uk>, <7cca70d64a8b4dffa3f40884a218ebfb@bham.ac.uk> Message-ID: Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 04:39 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson To: gpfsug main discussion list Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org Hmm ... 
mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p3ZFejMgr8nrtvkuBSxsXg&m=eGuD3K3Va_jMinEQHJN-FU1-fi2V-VpqWjHiTVUK-L8&s=fX3QMwGX7-yxSM4VSqPqBUbkT41ntfZFRZnalg9PZBI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Thu Feb 20 22:17:41 2020 From: peserocka at gmail.com (Peter Serocka) Date: Thu, 20 Feb 2020 23:17:41 +0100 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Message-ID: Looking at the example '*/xy_survey_*/name/*.tif': that's not a "real" (POSIX) regular expression but a use of a much simpler "wildcard pattern" as commonly used in the UNIX shell when matching filenames. So I would assume that the 'f' parameter just mandates that REGEX() must apply "filename matching" rules here instead of POSIX regular expressions. makes sense? -- Peter > On Feb 20, 2020, at 21:43, Todd Ruston wrote: > > Greetings, > > I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: > > REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') > > The Scale policy documentation for REGEX only mentions 2 parameters, not 3: > > REGEX(String,'Pattern') > Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. 
> > (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) > > Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). > > -- > Todd E. Ruston > Information Systems Manager > Monterey Bay Aquarium Research Institute (MBARI) > 7700 Sandholdt Road, Moss Landing, CA, 95039 > Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From peserocka at gmail.com Thu Feb 20 22:25:35 2020 From: peserocka at gmail.com (Peter Serocka) Date: Thu, 20 Feb 2020 23:25:35 +0100 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> Message-ID: <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> Sorry, I believe you had nailed it already -- I didn't read carefully to the end. > On Feb 20, 2020, at 23:17, Peter Serocka wrote: > > Looking at the example '*/xy_survey_*/name/*.tif': > that's not a "real" (POSIX) regular expression but a use of > a much simpler "wildcard pattern" as commonly used in the UNIX shell > when matching filenames. > > So I would assume that the 'f' parameter just mandates that > REGEX() must apply "filename matching" rules here instead > of POSIX regular expressions. > > makes sense? > > -- Peter > > >> On Feb 20, 2020, at 21:43, Todd Ruston > wrote: >> >> Greetings, >> >> I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: >> >> REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') >> >> The Scale policy documentation for REGEX only mentions 2 parameters, not 3: >> >> REGEX(String,'Pattern') >> Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. >> >> (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) >> >> Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). >> >> -- >> Todd E. Ruston >> Information Systems Manager >> Monterey Bay Aquarium Research Institute (MBARI) >> 7700 Sandholdt Road, Moss Landing, CA, 95039 >> Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
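To make the two readings concrete, a small dry-run sketch; the test directory is a placeholder, the paths are from the example above, and treating 'f' as a wildcard-match selector is the assumption being discussed here, not confirmed documentation. A throwaway policy file, say /tmp/regex-test.pol:

    RULE EXTERNAL LIST 'tifs' EXEC ''
    /* wildcard-style match, third argument 'f' as emitted by mmfind */
    RULE 'wild' LIST 'tifs' WHERE REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif', 'f')
    /* roughly equivalent POSIX extended regular expression, two-argument form */
    RULE 'ere' LIST 'tifs' WHERE REGEX(PATH_NAME, '/xy_survey_[^/]*/name/[^/]*\.tif$')

Then evaluate it without changing anything and compare the per-rule hit counts in the summary:

    mmapplypolicy /gpfs/fs0/testdir -P /tmp/regex-test.pol -I test -L 2

If both rules select the same files, the 'f' argument is doing exactly what is described above.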
URL: From oehmes at gmail.com Thu Feb 20 22:28:43 2020 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 20 Feb 2020 17:28:43 -0500 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: Message-ID: Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn?t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by to high maxfilestocache or pagepool settings. To give you an simplified example (its more complex than that, but good enough to make the point) - assume you have 100 nodes, each has 16 GB pagepool and your storage system can write data out at 10 GB/sec, it will take 160 seconds to flush all data data (assuming you did normal buffered I/O. If i remember correct (talking out of memory here) the default timeout is 60 seconds, given that you can?t write that fast it will always timeout under this scenario. There is one case where this can also happen which is a client is connected badly (flaky network or slow connection) and even your storage system is fast enough the node is too slow that it can?t de-stage within that time while everybody else can and the storage is not the bottleneck. Other than that only solutions are to a) buy faster storage or b) reduce pagepool and maxfilestocache which will reduce overall performance of the system. Sven Sent from my iPad > On Feb 20, 2020, at 5:14 PM, Nathan Falk wrote: > > ?Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. > > > > Nate Falk > IBM Spectrum Scale Level 2 Support > Software Defined Infrastructure, IBM Systems > > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 04:39 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > Hi Nate, > So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? > But yes, essentially running this by hand to clean up. > What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: > rds-projects-b which is mounted as /rds/projects/b > But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: > /rds/projects/g which is a different file-set. > What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? > Does that sound plausible? > Thanks > Simon > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of nfalk at us.ibm.com > Sent: 20 February 2020 21:26:39 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Unkillable snapshots > > Hello Simon, > > Sadly, that "1036" is not a node ID, but just a counter. > > These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. 
> > Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. > > You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. > > It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. > > Thanks, > Nate Falk > IBM Spectrum Scale Level 2 Support > Software Defined Infrastructure, IBM Systems > > > > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 03:14 PM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hmm ... mmdiag --tokenmgr shows: > > > Server stats: requests 195417431 ServerSideRevokes 120140 > nTokens 2146923 nranges 4124507 > designated mnode appointed 55481 mnode thrashing detected 1036 > So how do I convert "1036" to a node? > Simon > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson > Sent: 20 February 2020 19:45:02 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] Unkillable snapshots > > Hi, > We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: > > > Unable to quiesce all nodes; some processes are busy or holding required resources. > mmdelsnapshot: Command failed. Examine previous error messages to determine cause. > And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. > What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. > My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. > So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! > Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. > Thanks > Simon > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
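Sven's back-of-the-envelope arithmetic is easy to redo for a given cluster; a sketch using the same illustrative numbers (100 nodes, 16 GiB of dirty pagepool each, 10 GiB/s of backend write bandwidth):

    nodes=100; pagepool_gib=16; drain_gibps=10
    echo "worst-case flush time: $(( nodes * pagepool_gib / drain_gibps )) seconds"   # prints 160

If that estimate lands well above the quiesce timeout, snapshot commands can be expected to keep failing under heavy buffered writes, whichever fileset they target.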
URL: From jonathan.buzzard at strath.ac.uk Thu Feb 20 23:38:15 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 20 Feb 2020 23:38:15 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: On 20/02/2020 16:59, Skylar Thompson wrote: [SNIP] > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. For us, it is a Scottish government mandate that all public funded bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days from a critical vulnerability till your patched. No if's no buts, just do it. So while where are not their yet (its a work in progress to make this as seamless as possible) frankly running unpatched systems for years on end because we are too busy/lazy to validate a new system is completely unacceptable. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valdis.kletnieks at vt.edu Fri Feb 21 02:00:59 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 20 Feb 2020 21:00:59 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> Message-ID: <36675.1582250459@turing-police> On Thu, 20 Feb 2020 23:38:15 +0000, Jonathan Buzzard said: > For us, it is a Scottish government mandate that all public funded > bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days > from a critical vulnerability till your patched. No if's no buts, just > do it. Is that 10 days from vuln dislosure, or from patch availability? The latter can be a headache, especially if 24-48 hours pass between when the patch actually hits the streets and you get the e-mail, or if you have other legal mandates that patches be tested before production deployment. The former is simply unworkable - you *might* be able to deploy mitigations or other work-arounds, but if it's something complicated that requires a lot of re-work of code, you may be waiting a lot more than 10 days for a patch.... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From Paul.Sanchez at deshaw.com Fri Feb 21 02:05:12 2020 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 21 Feb 2020 02:05:12 +0000 Subject: [gpfsug-discuss] Unkillable snapshots In-Reply-To: References: Message-ID: <9ca16f7634354e4db8bed681a306b714@deshaw.com> Another possibility is to try increasing the timeouts. We used to have problems with this all of the time on clusters with thousands of nodes, but now we run with the following settings increased from their [defaults]? sqtBusyThreadTimeout [10] = 120 sqtCommandRetryDelay [60] = 120 sqtCommandTimeout [300] = 500 These are in the category of undocumented configurables, so you may wish to accompany this with a PMR. 
And you?ll need to know the secret handshake that follows this? mmchconfig: Attention: Unknown attribute specified: sqtBusyThreadTimeout. Press the ENTER key to continue. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Sven Oehme Sent: Thursday, February 20, 2020 17:29 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots This message was sent by an external party. Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn?t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by to high maxfilestocache or pagepool settings. To give you an simplified example (its more complex than that, but good enough to make the point) - assume you have 100 nodes, each has 16 GB pagepool and your storage system can write data out at 10 GB/sec, it will take 160 seconds to flush all data data (assuming you did normal buffered I/O. If i remember correct (talking out of memory here) the default timeout is 60 seconds, given that you can?t write that fast it will always timeout under this scenario. There is one case where this can also happen which is a client is connected badly (flaky network or slow connection) and even your storage system is fast enough the node is too slow that it can?t de-stage within that time while everybody else can and the storage is not the bottleneck. Other than that only solutions are to a) buy faster storage or b) reduce pagepool and maxfilestocache which will reduce overall performance of the system. Sven Sent from my iPad On Feb 20, 2020, at 5:14 PM, Nathan Falk > wrote: ?Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 04:39 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up ? But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of the "suspect" nodes. But if I strace it, its hanging on a process which is using a different fileset. For example, the file-set we can't delete is: rds-projects-b which is mounted as /rds/projects/b But on some suspect nodes, strace lsof /rds, that hangs at a process which has open files in: /rds/projects/g which is a different file-set. What I'm wondering if its these hanging processes in the "g" fileset which is killing us rather than something in the "b" fileset. Looking at the "g" processes, they look like a weather model and look to be dumping a lot of files in a shared directory, so I wonder if the mmfsd process is busy servicing that and so whilst its not got "b" locks, its just too slow to respond? Does that sound plausible? Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of nfalk at us.ibm.com > Sent: 20 February 2020 21:26:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unkillable snapshots Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. 
Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the GUI and is a command you are running, you could try some heavy-handed data collection. You suspect a particular fileset already, so maybe have a 'mmdsh -N all lsof /path/to/fileset' ready to go in one window, and the 'mmdelsnapshot' ready to go in another window? When the mmdelsnapshot times out, you can find the nodes it was waiting on in the file system manager mmfs.log.latest and see what matches up with the open files identified by lsof. It sounds like you already know this, but the type of internal node names in the log messages can be translated with 'mmfsadm dump tscomm' or also plain old 'mmdiag --network'. Thanks, Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM Systems From: Simon Thompson > To: gpfsug main discussion list > Date: 02/20/2020 03:14 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] Unkillable snapshots Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hmm ... mmdiag --tokenmgr shows: Server stats: requests 195417431 ServerSideRevokes 120140 nTokens 2146923 nranges 4124507 designated mnode appointed 55481 mnode thrashing detected 1036 So how do I convert "1036" to a node? Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Simon Thompson > Sent: 20 February 2020 19:45:02 To: gpfsug main discussion list Subject: [gpfsug-discuss] Unkillable snapshots Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to determine cause. And in the mmfslog on the FS manager there are a bunch of retries and "failure to quesce" on nodes. However in each retry its never the same set of nodes. I suspect we have one HPC job somewhere killing us. What's interesting is that we can delete other snapshots OK, it appears to be one particular fileset. My old goto "mmfsadm dump tscomm" isn't showing any particular node, and waiters around just tend to point to the FS manager node. So ... any suggestions? I'm assuming its some workload holding a lock open or some such, but tracking it down is proving elusive! Generally the FS is also "lumpy" ... at times it feels like a wifi connection on a train using a terminal, I guess its all related though. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
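For reference, a sketch of setting and checking the values Paul quotes; the attribute names and numbers are his, and since they are undocumented it is safer to do this under a PMR as he suggests:

    # each unknown attribute will raise the "Unknown attribute specified" prompt mentioned above
    mmchconfig sqtBusyThreadTimeout=120,sqtCommandRetryDelay=120,sqtCommandTimeout=500
    mmlsconfig | grep -i sqt     # confirm the values landed in the cluster configuration

Whether a daemon restart is needed for them to take effect is another question best settled in that PMR.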
URL: From jonathan.buzzard at strath.ac.uk Fri Feb 21 11:04:32 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 21 Feb 2020 11:04:32 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <36675.1582250459@turing-police> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> Message-ID: <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> On 21/02/2020 02:00, Valdis Kl?tnieks wrote: > On Thu, 20 Feb 2020 23:38:15 +0000, Jonathan Buzzard said: >> For us, it is a Scottish government mandate that all public funded >> bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days >> from a critical vulnerability till your patched. No if's no buts, just >> do it. > > Is that 10 days from vuln dislosure, or from patch availability? > Patch availability. Basically it's a response to the issue a couple of years ago now where large parts of the NHS in Scotland had serious problems due to some Windows vulnerability for which a patch had been available for some months. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From andi at christiansen.xxx Fri Feb 21 13:07:01 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Fri, 21 Feb 2020 14:07:01 +0100 (CET) Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Message-ID: <270013029.95562.1582290421465@privateemail.com> An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Feb 21 14:14:49 2020 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 21 Feb 2020 15:14:49 +0100 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES Message-ID: Dear all, I was wondering if anybody recently encountered a similar issue (I found a related thread from 2018, but it was inconclusive). I just found that one of our production CES nodes have 28k CLOSE_WAIT tcp6 connections, I do not understand why... the second node in the same cluster does not have this issue. Both are: - GPFS 5.0.4.2 - RHEL 7.4 has anybody else encountered anything similar? In the last few days it seems it happened once on one node, and twice on the other, but never on both... Thanks for any feedback! cheers leo -- Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From truston at mbari.org Fri Feb 21 16:15:54 2020 From: truston at mbari.org (Todd Ruston) Date: Fri, 21 Feb 2020 08:15:54 -0800 Subject: [gpfsug-discuss] Policy REGEX question In-Reply-To: <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> References: <3D2C4651-AD7D-40FD-A1B0-1B22D501B0F3@mbari.org> <8B31D830-F3BC-436E-89C0-811609B02289@gmail.com> Message-ID: <9E104C63-9C6D-4E46-BEFF-AEF7E1AF8EC9@mbari.org> Thanks Peter, and no worries; great minds think alike. ;-) - Todd > On Feb 20, 2020, at 2:25 PM, Peter Serocka wrote: > > Sorry, I believe you had nailed it already -- I didn't > read carefully to the end. 
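One way to see the two matching modes side by side is to drop the patterns into a scratch list policy and compare the hits. This is only a sketch: it assumes the 'f' flag selects the shell-style wildcard matching Peter describes below, and the commented-out POSIX form is my own rough translation, not something mmfind generated:

cat > /tmp/regex-test.pol <<'EOF'
RULE EXTERNAL LIST 'tifs' EXEC ''
/* wildcard form, as emitted by mmfind */
RULE 'wild' LIST 'tifs' WHERE REGEX(PATH_NAME,'*/xy_survey_*/name/*.tif','f')
/* hand-written POSIX ERE, swap in for a second run to compare */
/* RULE 'ere' LIST 'tifs' WHERE REGEX(PATH_NAME,'/xy_survey_[^/]*/name/[^/]*\.tif$') */
EOF
mmapplypolicy /path/to/tree -P /tmp/regex-test.pol -I defer -f /tmp/regex-test

Running it twice, once per rule, and diffing the resulting list files should confirm which dialect the third argument actually selects.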
> >> On Feb 20, 2020, at 23:17, Peter Serocka > wrote: >> >> Looking at the example '*/xy_survey_*/name/*.tif': >> that's not a "real" (POSIX) regular expression but a use of >> a much simpler "wildcard pattern" as commonly used in the UNIX shell >> when matching filenames. >> >> So I would assume that the 'f' parameter just mandates that >> REGEX() must apply "filename matching" rules here instead >> of POSIX regular expressions. >> >> makes sense? >> >> -- Peter >> >> >>> On Feb 20, 2020, at 21:43, Todd Ruston > wrote: >>> >>> Greetings, >>> >>> I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an undocumented parameter. For example, the following REGEX expression was created in the WHERE clause by mmfind when searching for a pathname pattern: >>> >>> REGEX(PATH_NAME, '*/xy_survey_*/name/*.tif','f') >>> >>> The Scale policy documentation for REGEX only mentions 2 parameters, not 3: >>> >>> REGEX(String,'Pattern') >>> Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular expression. >>> >>> (The above is from https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adv_stringfcts.htm ) >>> >>> Anyone know what that 3rd parameter is, what values are allowed there, and what they mean? My assumption is that it's some sort of selector for type of pattern matching engine, because that pattern (2nd parameter) isn't being handled as a standard regex (e.g. the *'s are treated as wildcards, not zero-or-more repeats). >>> >>> -- >>> Todd E. Ruston >>> Information Systems Manager >>> Monterey Bay Aquarium Research Institute (MBARI) >>> 7700 Sandholdt Road, Moss Landing, CA, 95039 >>> Phone 831-775-1997 Fax 831-775-1652 http://www.mbari.org >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gterryc at vmsupport.com Fri Feb 21 17:18:11 2020 From: gterryc at vmsupport.com (George Terry) Date: Fri, 21 Feb 2020 11:18:11 -0600 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 Message-ID: Hello, I've a question about upgrade of GPFS 3.5. We have an infrastructure with GSPF 3.5.0.33 and we need upgrade to Spectrum Scale 5.0.3. Can we upgrade from 3.5 to 4.1, 4.2 and 5.0.3 or can we do something additional like unistall GPFS 3.5 and install Spectrum Scale 5.0.3? Thank you George -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Fri Feb 21 17:25:12 2020 From: TOMP at il.ibm.com (Tomer Perry) Date: Fri, 21 Feb 2020 19:25:12 +0200 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: <270013029.95562.1582290421465@privateemail.com> References: <270013029.95562.1582290421465@privateemail.com> Message-ID: Hi, I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. 
So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. After that, you can start looking into "how can I get multiple streams?" - for that there are two options: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm and https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 21/02/2020 15:25 Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=XKMIdSqQ76jf_FrIRFtAhMsgU-MkPFhxBJjte8AdeYs&s=vih7W_XcatoqN_MhS3gEK9RR6RxpNrfB2UvvQeXqyH8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Fri Feb 21 18:50:49 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 21 Feb 2020 18:50:49 +0000 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From knop at us.ibm.com Fri Feb 21 21:15:28 2020 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 21 Feb 2020 21:15:28 +0000 Subject: [gpfsug-discuss] Upgrade GPFS 3.5 to Spectrum Scale 5.0.3 In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Fri Feb 21 23:32:13 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Sat, 22 Feb 2020 00:32:13 +0100 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: <270013029.95562.1582290421465@privateemail.com> Message-ID: Hi, Thanks for answering! 
Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. Best Regards Andi Christiansen Sendt fra min iPhone > Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : > > Hi, > > I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. > So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. > After that, you can start looking into "how can I get multiple streams?" - for that there are two options: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm > and > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm > > The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 21/02/2020 15:25 > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. > > We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. > > On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? > > We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Sat Feb 22 00:08:19 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sat, 22 Feb 2020 00:08:19 +0000 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: Message-ID: Andi, You may want to reach out to Jake Carrol at the University of Queensland, When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. >From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. Regards Andrew Sent from my iPhone > On 22 Feb 2020, at 09:32, Andi Christiansen wrote: > > ?Hi, > > Thanks for answering! > > Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. > > I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. > > Best Regards > Andi Christiansen > > > > Sendt fra min iPhone > >> Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : >> >> Hi, >> >> I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. >> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. >> After that, you can start looking into "how can I get multiple streams?" - for that there are two options: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm >> and >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm >> >> The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. >> >> >> >> Regards, >> >> Tomer Perry >> Scalable I/O Development (Spectrum Scale) >> email: tomp at il.ibm.com >> 1 Azrieli Center, Tel Aviv 67021, Israel >> Global Tel: +1 720 3422758 >> Israel Tel: +972 3 9188625 >> Mobile: +972 52 2554625 >> >> >> >> >> From: Andi Christiansen >> To: "gpfsug-discuss at spectrumscale.org" >> Date: 21/02/2020 15:25 >> Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi all, >> >> i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. >> >> We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. 
But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. >> >> On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? >> >> We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). >> >> Best Regards >> Andi Christiansen _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Sat Feb 22 05:55:54 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 22 Feb 2020 05:55:54 +0000 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: Message-ID: Hi While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. -- Cheers > On 22. Feb 2020, at 3.08, Andrew Beattie wrote: > > ?Andi, > > You may want to reach out to Jake Carrol at the University of Queensland, > > When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. > > From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. > > We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. > > > > > > Regards > > Andrew > > Sent from my iPhone > >>> On 22 Feb 2020, at 09:32, Andi Christiansen wrote: >>> >> ?Hi, >> >> Thanks for answering! >> >> Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. >> >> I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. >> >> Best Regards >> Andi Christiansen >> >> >> >> Sendt fra min iPhone >> >>> Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : >>> >>> Hi, >>> >>> I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. >>> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. >>> After that, you can start looking into "how can I get multiple streams?" 
- for that there are two options: >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm >>> and >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm >>> >>> The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. >>> >>> >>> >>> Regards, >>> >>> Tomer Perry >>> Scalable I/O Development (Spectrum Scale) >>> email: tomp at il.ibm.com >>> 1 Azrieli Center, Tel Aviv 67021, Israel >>> Global Tel: +1 720 3422758 >>> Israel Tel: +972 3 9188625 >>> Mobile: +972 52 2554625 >>> >>> >>> >>> >>> From: Andi Christiansen >>> To: "gpfsug-discuss at spectrumscale.org" >>> Date: 21/02/2020 15:25 >>> Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi all, >>> >>> i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. >>> >>> We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. >>> >>> On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? >>> >>> We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). >>> >>> Best Regards >>> Andi Christiansen _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Sat Feb 22 09:35:32 2020 From: TOMP at il.ibm.com (Tomer Perry) Date: Sat, 22 Feb 2020 11:35:32 +0200 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: Message-ID: Hi, Its implied in the tcp tuning suggestions ( as one needs bandwidth and latency in order to calculate the BDP). The overall theory is documented in multiple places (tcp window, congestion control etc.) - nice place to start is https://en.wikipedia.org/wiki/TCP_tuning . I tend to use this calculator in order to find out the right values https://www.switch.ch/network/tools/tcp_throughput/ The parallel IO and multiple mounts are on top of the above - not instead ( even though it could be seen that it makes things better - but multiple of the small numbers we're getting initially). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Luis Bolinches" To: "gpfsug main discussion list" Cc: Jake Carrol Date: 22/02/2020 07:56 Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. -- Cheers On 22. Feb 2020, at 3.08, Andrew Beattie wrote: Andi, You may want to reach out to Jake Carrol at the University of Queensland, When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. >From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. Regards Andrew Sent from my iPhone On 22 Feb 2020, at 09:32, Andi Christiansen wrote: Hi, Thanks for answering! Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. Best Regards Andi Christiansen Sendt fra min iPhone Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : Hi, I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm - and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. After that, you can start looking into "how can I get multiple streams?" - for that there are two options: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm and https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 21/02/2020 15:25 Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. 
But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=vPbqr3ME98a_M4VrB5IPihvzTzG8CQUAuI0eR-kqXcs&s=kIM8S1pVtYFsFxXT3gGQ0DmcwRGBWS9IqtoYTtcahM8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sun Feb 23 04:43:37 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sat, 22 Feb 2020 23:43:37 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> Message-ID: <208376.1582433017@turing-police> On Fri, 21 Feb 2020 11:04:32 +0000, Jonathan Buzzard said: > > Is that 10 days from vuln dislosure, or from patch availability? > > > > Patch availability. Basically it's a response to the issue a couple of That's not *quite* so bad. As long as you trust *all* your vendors to notify you when they release a patch for an issue you hadn't heard about. (And that no e-mail servers along the way don't file it under 'spam') -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Sun Feb 23 12:20:48 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 23 Feb 2020 12:20:48 +0000 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <208376.1582433017@turing-police> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> Message-ID: <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> On 23/02/2020 04:43, Valdis Kl?tnieks wrote: > On Fri, 21 Feb 2020 11:04:32 +0000, Jonathan Buzzard said: > >>> Is that 10 days from vuln dislosure, or from patch availability? >>> >> >> Patch availability. 
Basically it's a response to the issue a couple of > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > you when they release a patch for an issue you hadn't heard about. > Er, what do you think I am paid for? Specifically it is IMHO the job of any systems administrator to know when any critical patch becomes available for any software/hardware that they are using. To not be actively monitoring it is IMHO a dereliction of duty, worthy of a verbal and then written warning. I also feel that the old practice of leaving HPC systems unpatched for years on end is no longer acceptable. From a personal perspective I have in now over 20 years never had a system that I have been responsible for knowingly compromised. I would like it to stay that way because I have no desire to be explaining to higher ups why the HPC facility was hacked. The fact that the Scottish government have mandated I apply patches just makes my life easier because any push back from the users is killed dead instantly; I have too, go moan at your elective representative if you want it changed. In the meantime suck it up :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valdis.kletnieks at vt.edu Sun Feb 23 21:58:03 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sun, 23 Feb 2020 16:58:03 -0500 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> References: <9cbb0028-a162-8c33-b603-f37f1dee2c51@strath.ac.uk> <07A885E4-7CA1-4D2E-A68C-018CE2E1058C@bham.ac.uk> <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> Message-ID: <272151.1582495083@turing-police> On Sun, 23 Feb 2020 12:20:48 +0000, Jonathan Buzzard said: > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > > you when they release a patch for an issue you hadn't heard about. > Er, what do you think I am paid for? Specifically it is IMHO the job of > any systems administrator to know when any critical patch becomes > available for any software/hardware that they are using. You missed the point. Unless you spend your time constantly e-mailing *all* of your vendors "Are there new patches I don't know about?", you're relying on them to notify you when there's a known issue, and when a patch comes out. Redhat is good about notification. IBM is. But how about things like your Infiniband stack? OFED? The firmware in all your devices? The BIOS/UEFI on the servers? If you're an Intel shop, how do you get notified about security issues in the Management Engine stuff (and there's been plenty of them). Do *all* of those vendors have security lists? Are you subscribed to *all* of them? Do *all* of them actually post to those lists? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From andi at christiansen.xxx Mon Feb 24 22:31:45 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Mon, 24 Feb 2020 23:31:45 +0100 Subject: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? In-Reply-To: References: Message-ID: Hi all, Thank you for all your suggestions! 
The latency is 30ms between the sites (1600km to be exact). So if I have entered correctly in the calculator 1Gb is actually what is expected on that distance. I had a meeting today with IBM where we were able to push that from the 1Gb to about 4Gb on one link with minimal tuning, more tuning will come the next few days! We are also looking to implement the feature afmParallelMounts which should give us the full bandwidth we have between the sites :-) Thanks! Best Regards Andi Christiansen Sendt fra min iPhone > Den 22. feb. 2020 kl. 10.35 skrev Tomer Perry : > > Hi, > > Its implied in the tcp tuning suggestions ( as one needs bandwidth and latency in order to calculate the BDP). > The overall theory is documented in multiple places (tcp window, congestion control etc.) - nice place to start is https://en.wikipedia.org/wiki/TCP_tuning. > I tend to use this calculator in order to find out the right values https://www.switch.ch/network/tools/tcp_throughput/ > > The parallel IO and multiple mounts are on top of the above - not instead ( even though it could be seen that it makes things better - but multiple of the small numbers we're getting initially). > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Luis Bolinches" > To: "gpfsug main discussion list" > Cc: Jake Carrol > Date: 22/02/2020 07:56 > Subject: [EXTERNAL] Re: [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > While I agree with what es already mention here and it is really spot on, I think Andi missed to reveal what is the latency between sites. Latency is as key if not more than ur pipe link speed to throughput results. > > -- > Cheers > > On 22. Feb 2020, at 3.08, Andrew Beattie wrote: > > Andi, > > You may want to reach out to Jake Carrol at the University of Queensland, > > When UQ first started exploring with AFM, and global AFM transfers they did extensive testing around tuning for the NFS stack. > > From memory they got to a point where they could pretty much saturate a 10GBit link, but they had to do a lot of tuning to get there. > > We are now effectively repeating the process, with AFM but using 100GB links, which brings about its own sets of interesting challenges. > > > > > > Regards > > Andrew > > Sent from my iPhone > > On 22 Feb 2020, at 09:32, Andi Christiansen wrote: > > Hi, > > Thanks for answering! > > Yes possible, I?m not too much into NFS and AFM so I might have used the wrong term.. > > I looked at what you suggested (very interesting reading) and setup multiple cache gateways to our home nfs server with the new afmParallelMount feature. It was as I suspected, for each gateway that does a write it gets 50-60MB/s bandwidth so although this utilizes more when adding it up (4 x gateways = 4 x 50-60MB/s) I?m still confused to why one server with one link cannot utilize more than the 50-60MB/s on 10Gb links ? Even 200-240MB/s is much slower than a regular 10Gbit interface. > > Best Regards > Andi Christiansen > > > > Sendt fra min iPhone > > Den 21. feb. 2020 kl. 18.25 skrev Tomer Perry : > > Hi, > > I believe the right term is not multithreaded, but rather multistream. NFS will submit multiple requests in parallel, but without using large enough window you won't be able to get much of each stream. 
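For reference, the window arithmetic behind that expected 1Gb figure, using the 30 ms round trip measured here and an assumed 10 Gbit/s link (numbers are easy to swap):

awk -v gbit=10 -v rtt_ms=30 'BEGIN {
  # bandwidth-delay product = bytes that must be in flight to fill the pipe
  bdp_mb = gbit*1e9/8 * rtt_ms/1000 / 1e6
  printf "BDP: %.1f MB\n", bdp_mb
  win = 4*1024*1024      # example default-ish TCP window, varies by distro
  printf "single stream with %.0f MB window: ~%.1f Gbit/s\n", win/1e6, win*8/(rtt_ms/1000)/1e9
}'

So until the TCP buffer limits (net.core.rmem_max/wmem_max and net.ipv4.tcp_rmem/tcp_wmem, per the Knowledge Center tuning page referenced in this thread) are raised toward the BDP on both ends, a single NFS stream will sit near 1 Gbit/s however fast the link is.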
> So, the first place to look is here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_tuningbothnfsclientnfsserver.htm- and while its talking about "Kernel NFS" the same apply to any TCP socket based communication ( including Ganesha). I tend to test the performance using iperf/nsdperf ( just make sure to use single stream) in order to see what is the expected maximum performance. > After that, you can start looking into "how can I get multiple streams?" - for that there are two options: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm > and > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm > > The former enhance large file transfer, while the latter ( new in 5.0.4) will help with multiple small files as well. > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 21/02/2020 15:25 > Subject: [EXTERNAL] [gpfsug-discuss] Spectrum Scale Ganesha NFS multi threaded AFM? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi all, > > i have searched the internet for a good time now with no answer to this.. So i hope someone can tell me if this is possible or not. > > We use NFS from our Cluster1 to a AFM enabled fileset on Cluster2. That is working as intended. But when AFM transfers files from one site to another it caps out at about 5-700Mbit/s which isnt impressive.. The sites are connected on 10Gbit links but the distance/round-trip is too far/high to use the NSD protocol with AFM. > > On the cluster where the fileset is exported we can only see 1 session against the client cluster, is there a way to either tune Ganesha or AFM to use more threads/sessions? > > We have about 7.7Gbit bandwidth between the sites from the 10Gbit links and with multiple NFS sessions we can reach the maximum bandwidth(each using about 50-60MBits per session). > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skylar2 at uw.edu Mon Feb 24 23:58:15 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Mon, 24 Feb 2020 15:58:15 -0800 Subject: [gpfsug-discuss] GPFS 5 and supported rhel OS In-Reply-To: <272151.1582495083@turing-police> References: <5419064c-72ca-eb55-3e29-7ccea4f42928@strath.ac.uk> <20200220165953.mz7yanfi2py6mvd7@utumno.gs.washington.edu> <36675.1582250459@turing-police> <34fc11d1-a241-4ac5-36f9-6d3eeeee58cf@strath.ac.uk> <208376.1582433017@turing-police> <6a970c16-ac34-8c1d-edaa-96a3befaa304@strath.ac.uk> <272151.1582495083@turing-police> Message-ID: <20200224235815.mjecsge35rqseoq5@hithlum> On Sun, Feb 23, 2020 at 04:58:03PM -0500, Valdis Kl?tnieks wrote: > On Sun, 23 Feb 2020 12:20:48 +0000, Jonathan Buzzard said: > > > > That's not *quite* so bad. As long as you trust *all* your vendors to notify > > > you when they release a patch for an issue you hadn't heard about. > > > Er, what do you think I am paid for? Specifically it is IMHO the job of > > any systems administrator to know when any critical patch becomes > > available for any software/hardware that they are using. > > You missed the point. > > Unless you spend your time constantly e-mailing *all* of your vendors > "Are there new patches I don't know about?", you're relying on them to > notify you when there's a known issue, and when a patch comes out. > > Redhat is good about notification. IBM is. > > But how about things like your Infiniband stack? OFED? The firmware in all > your devices? The BIOS/UEFI on the servers? If you're an Intel shop, how do you > get notified about security issues in the Management Engine stuff (and there's > been plenty of them). Do *all* of those vendors have security lists? Are you > subscribed to *all* of them? Do *all* of them actually post to those lists? We put our notification sources (Nessus, US-CERT, etc.) into our response plan. Of course it's still a problem if we don't get notified, but part of the plan is to make it clear where we're willing to accept risk, and to limit our own liability. No process is going to be perfect, but we at least know and accept where those imperfections are. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From stockf at us.ibm.com Tue Feb 25 14:01:20 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 25 Feb 2020 14:01:20 +0000 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Tue Feb 25 20:32:10 2020 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Tue, 25 Feb 2020 21:32:10 +0100 Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT IPV6 connections on CES In-Reply-To: References: Message-ID: Hi Frederick, thanks for the answer! Unfortunately it seems not the case :( [root at xbl-ces-4 ~]# netstat -ntp | grep "\:9094 .*CLOSE_WAIT" | wc -l 0 In our case, Zimon does not directly interact with Grafana over the bridge, but we have a small python script that (through Telegraf) polls the collector and ingest data into InfluxDB, which acts as data source for Grafana. An example of the opened port is: tcp6?????? 1????? 0 129.129.95.84:40038 129.129.99.247:39707??? CLOSE_WAIT? 39131/gpfs.ganesha. We opened a PMR to check what's happening, let's see :) But possibly first thing to do is to disable IPv6 cheers leo Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/036 Forschungstrasse 111 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 25.02.20 15:01, Frederick Stock wrote: > netstat -ntp | grep "\:9094 .*CLOSE_WAIT" | wc -l -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 12:58:40 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 13:58:40 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? Message-ID: <313052288.162314.1582721920742@privateemail.com> An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Feb 26 13:04:52 2020 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 26 Feb 2020 13:04:52 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <313052288.162314.1582721920742@privateemail.com> Message-ID: Why don?t you look at packaging your small files into larger files which will be handled more effectively. There is no simple way to replicate / move billions of small files, But surely you can build your work flow to package the files up into a zip or tar format which will simplify not only the number of IO transactions but also make the whole process more palatable to the NFS protocol Sent from my iPhone > On 26 Feb 2020, at 22:58, Andi Christiansen wrote: > > ? > Hi all, > > Does anyone know of an alternative to AFM ? > > We have been working on tuning AFM for a few weeks now and see little to no improvement.. And now we are searching for an alternative.. So if anyone knows of a product that can implement with Spectrum Scale i am open to any suggestions :) > > We have a good mix of files but primarily billions of very small files which AFM does not handle well on long distances. > > > Best Regards > A. Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=BDsYqP0is2zoDGYU5Ej1lSJ4s9DJhMsW40equi5dqCs&s=22KcLJbUqsq3nfr3qWnxDqA3kuHnFxSDeiENVUITmdA&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Feb 26 13:27:32 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 26 Feb 2020 13:27:32 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <313052288.162314.1582721920742@privateemail.com> References: <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Feb 26 13:33:51 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 26 Feb 2020 13:33:51 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: , <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 13:38:18 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:38:18 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> Message-ID: <688463139.162864.1582724298905@privateemail.com> An HTML attachment was scrubbed... 
URL: From andi at christiansen.xxx Wed Feb 26 13:38:59 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:38:59 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> Message-ID: <673673077.162875.1582724339498@privateemail.com> An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Feb 26 13:39:22 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 26 Feb 2020 14:39:22 +0100 (CET) Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: , <313052288.162314.1582721920742@privateemail.com> Message-ID: <262580944.162883.1582724362722@privateemail.com> An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Feb 26 14:24:32 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 26 Feb 2020 14:24:32 +0000 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: <262580944.162883.1582724362722@privateemail.com> References: <262580944.162883.1582724362722@privateemail.com>, , <313052288.162314.1582721920742@privateemail.com> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Feb 26 15:49:45 2020 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 26 Feb 2020 08:49:45 -0700 Subject: [gpfsug-discuss] AFM Alternative? In-Reply-To: References: <313052288.162314.1582721920742@privateemail.com> <262580944.162883.1582724362722@privateemail.com> Message-ID: if you are looking for a commercial supported solution, our Dataflow product is purpose build for this kind of task. a presentation that covers some high level aspects of it was given by me last year at one of the spectrum scale meetings in the UK --> https://www.spectrumscaleug.org/wp-content/uploads/2019/05/SSUG19UK-Day-1-05-DDN-Optimizing-storage-stacks-for-AI.pdf. its at the end of the deck. if you want more infos, please let me know and i can get you in contact with the right person. Sven On Wed, Feb 26, 2020 at 7:24 AM Frederick Stock wrote: > > What sources are you using to help you with configuring AFM? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Andi Christiansen > To: Frederick Stock , gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] RE: [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 8:39 AM > > 5.0.4-2.1 (home and cache) > > On February 26, 2020 2:33 PM Frederick Stock wrote: > > > Andi, what version of Spectrum Scale do you have installed? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: "Olaf Weiser" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: andi at christiansen.xxx, gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 8:27 AM > > you may consider WatchFolder ... (cluster wider inotify --> kafka) .. and then you go from there > > > > ----- Original message ----- > From: Andi Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] AFM Alternative? > Date: Wed, Feb 26, 2020 1:59 PM > > Hi all, > > Does anyone know of an alternative to AFM ? > > We have been working on tuning AFM for a few weeks now and see little to no improvement.. 
And now we are searching for an alternative.. So if anyone knows of a product that can implement with Spectrum Scale i am open to any suggestions :) > > We have a good mix of files but primarily billions of very small files which AFM does not handle well on long distances. > > > Best Regards > A. Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.schlipalius at pawsey.org.au Thu Feb 27 00:23:56 2020 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Thu, 27 Feb 2020 08:23:56 +0800 Subject: [gpfsug-discuss] AFM Alternative? Aspera? Message-ID: Maybe the following would assist? I do think tarring up files first is best, but you could always check out: http://www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf https://www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf Aspera sync integration (non html links added for your use ? how they don?t get scrubbed: www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf ) Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au On 26/2/20, 9:39 pm, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Re: AFM Alternative? From vpuvvada at in.ibm.com Fri Feb 28 05:22:56 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 28 Feb 2020 10:52:56 +0530 Subject: [gpfsug-discuss] AFM Alternative? Aspera? In-Reply-To: References: Message-ID: Transferring the small files with AFM + NFS over high latency networks is always a challenge. For example, for each small file replication AFM performs a lookup, create, write and set mtime operation. If the latency is 10ms, replication of each file takes minimum (10 * 4 = 40 ms) amount of time. AFM is not a network acceleration tool and also it does not use compression. If the file sizes are big, AFM parallel IO and parallel mounts feature can be used. Aspera can be used to transfer the small files over high latency network with better utilization of the network bandwidth. https://www.ibm.com/support/knowledgecenter/no/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_afmparalleldatatransferwithremotemounts.htm https://www.ibm.com/support/knowledgecenter/no/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1ins_paralleldatatransfersafm.htm ~Venkat (vpuvvada at in.ibm.com) From: Chris Schlipalius To: Date: 02/27/2020 05:54 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] AFM Alternative? Aspera? Sent by: gpfsug-discuss-bounces at spectrumscale.org Maybe the following would assist? 
I do think tarring up files first is best, but you could always check out: http://www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spectrumscaleug.org_wp-2Dcontent_uploads_2019_05_SSSD19DE-2DDay-2D2-2DB02-2DIntegration-2Dof-2DSpectrum-2DScale-2Dand-2DAspera-2DSync.pdf&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=tRCxd4SimJH_eycqekhzM0Qp3TB3NtaIYWBvyQnrIiM&e= Aspera sync integration (non html links added for your use ? how they don?t get scrubbed: www.spectrumscaleug.org/wp-content/uploads/2019/05/SSSD19DE-Day-2-B02-Integration-of-Spectrum-Scale-and-Aspera-Sync.pdf www.redbooks.ibm.com/redpapers/pdfs/redp5527.pdf ) Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au < https://urldefense.proofpoint.com/v2/url?u=http-3A__www.pawsey.org.au_&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=Xkm8VFy3l6nyD40yhONihsKcqmwRhy4SZyd0lwHf1GA&e= > On 26/2/20, 9:39 pm, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Re: AFM Alternative? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=1pVcjKeZ7gCaDtLoJFbfKCETe1XOmol6d2ryoccqC1A&s=mYK1ZsVgtsM6HntRMLPS49tKvEhhgGAdWF2qniyn9Ko&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Fri Feb 28 08:55:06 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 28 Feb 2020 08:55:06 +0000 Subject: [gpfsug-discuss] SSUG Events 2020 update Message-ID: <780D9B15-E329-45B7-B62E-1F880512CE7E@spectrumscale.org> Hi All, I thought it might be giving a little bit of an update on where we are with events this year. As you may know, SCAsia was cancelled in its entirety due to Covid-19 in Singapore and so there was no SSUG meeting. In the US, we struggled to find a venue to host the spring meeting and now time is a little short to arrange something for the end of March planned date. The IBM Spectrum Scale Strategy Days in Germany in March are currently still planned to happen next week. For the UK meeting (May), we haven?t yet opened registration but are planning to do so next week. We currently believe that as an event with 120-130 attendees, this is probably very low risk, but we?ll keep the current government advice under review as we approach the date. I would suggest that if you are planning to travel internationally to the UK event that you delay booking flights/book refundable transport and ensure you have adequate insurance in place in the event we have to cancel the event. For ISC in June, we currently don?t have a date, nor any firm plans to run an event this year. Simon Thompson UK group chair -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valleru at cbio.mskcc.org Fri Feb 28 15:12:31 2020 From: valleru at cbio.mskcc.org (Valleru, Lohit/Information Systems) Date: Fri, 28 Feb 2020 10:12:31 -0500 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers Message-ID: Hello Everyone, I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter. One of our users runs a deep learning application on GPUs with the following IO pattern: it needs to read random small sections, about 4K in size, from about 20,000 to 100,000 files, each 100M to 200M in size. When performance tuning for the above application on a 16M filesystem and comparing it to various other file system block sizes, I realized that the performance degradation that I see might be related to the number of buffers. I observed that the performance varies widely depending on what maxblocksize parameter I use. For example, using a 16M maxblocksize for a 512K or a 1M block size filesystem differs widely from using a 512K or 1M maxblocksize for a 512K or a 1M block size filesystem. The reason, I believe, might be the number of buffers that I can keep on the client side, but I am not sure if that is all that maxblocksize affects. We have different file system block sizes in our environment: 512K, 1M and 16M. We also use a separate storage cluster and compute cluster design. Now, in order to mount the 16M filesystem along with the other filesystems on the compute clusters, we had to set maxblocksize to 16M - no matter what the file system block size is. I see that I get maximum performance for this application from a 512K block size filesystem and a 512K maxblocksize. However, I will not be able to mount this filesystem along with the other filesystems, because I would need to change maxblocksize to 16M in order to mount the other filesystems with a 16M block size. I am wondering if there is anything else that can do the same job as the maxblocksize parameter. I was thinking about parameters like maxBufferDescs for a 16M maxblocksize, but I believe that would need a lot more pagepool to keep the same number of buffers as would be needed for a 512K maxblocksize. May I know if there is any other parameter that could help in the same way as maxblocksize, and the side effects of the same? Thank you, Lohit From anobre at br.ibm.com Fri Feb 28 17:58:22 2020 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 28 Feb 2020 17:58:22 +0000 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Fri Feb 28 21:53:25 2020 From: valleru at cbio.mskcc.org (Valleru, Lohit/Information Systems) Date: Fri, 28 Feb 2020 16:53:25 -0500 Subject: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers In-Reply-To: References: Message-ID: <2B1F9901-0712-44EB-9D0A-8B40F7BE58EA@cbio.mskcc.org> Hello Anderson, This application requires a minimum throughput of about 10-13MB/s and almost no IOPS during the first phase, where it opens all the files and reads the headers, and about 30MB/s throughput during the second phase. The issue that I face is during the second phase, where it randomly reads about 4K at a time from anywhere between 20,000 and 100,000 random files. In this phase I see the maxblocksize parameter making a big difference to read performance, with almost no throughput and maybe around 2-4K IOPS.
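To compare filesystems and maxblocksize settings outside the real application, the access pattern can be approximated with a small script along these lines. This is only a rough sketch: the directory path, file handling and read count below are placeholder assumptions, not part of the actual workload.

# Rough stand-in for the second phase: random 4 KiB reads spread across a
# large set of pre-existing 100-200 MB files. The path and counts are
# placeholder assumptions - point it at the filesystem under test.
import os
import random
import time

DATASET_DIR = "/gpfs/somefs/dl-dataset"   # hypothetical directory of input files
NUM_READS = 100_000                       # number of random 4 KiB reads to issue
READ_SIZE = 4096

def random_read_benchmark(directory, num_reads, read_size=READ_SIZE):
    paths = [os.path.join(directory, f) for f in os.listdir(directory)]
    paths = [p for p in paths if os.path.isfile(p)]
    handles = [open(p, "rb", buffering=0) for p in paths]   # unbuffered at the Python level
    sizes = [os.fstat(h.fileno()).st_size for h in handles]
    start = time.time()
    for _ in range(num_reads):
        i = random.randrange(len(handles))
        offset = random.randrange(max(sizes[i] - read_size, 1))
        handles[i].seek(offset)
        handles[i].read(read_size)
    elapsed = time.time() - start
    print(f"{num_reads} reads of {read_size} B in {elapsed:.1f} s "
          f"({num_reads / elapsed:.0f} IOPS, "
          f"{num_reads * read_size / elapsed / 2**20:.1f} MiB/s)")
    for h in handles:
        h.close()

if __name__ == "__main__":
    random_read_benchmark(DATASET_DIR, NUM_READS)

Running it twice - once cold and once with the pagepool warm - helps separate the buffer-management effect from the storage itself.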
This issue is a follow-up to an issue that I had mentioned about a year ago, where I see differences in performance "though there is practically no IO to the storage" - I mean, I see a difference in performance between different FS block sizes even if all data is cached in the pagepool. Sven had replied to that thread mentioning that it could be because of a buffer locking issue.

The info requested is as below:

4 storage clusters:

Storage cluster for compute:
GPFS version: 5.0.3-2
FS version: 19.01 (5.0.1.0)
Subblock size: 16384
Blocksize: 16M

Flash storage cluster for compute:
GPFS version: 5.0.4-2
FS version: 18.00 (5.0.0.0)
Subblock size: 8192
Blocksize: 512K

Storage cluster for admin tools:
GPFS version: 5.0.4-2
FS version: 16.00 (4.2.2.0)
Subblock size: 131072
Blocksize: 4M

Storage cluster for archival:
GPFS version: 5.0.3-2
FS version: 16.00 (4.2.2.0)
Subblock size: 32K
Blocksize: 1M

The only two filesystems that users do/will do compute on are the 16M filesystem and the 512K filesystem. As for the throughput/IOPS and block size in use - they vary a lot and have not been recorded. The 16M FS is capable of about 27GB/s sequential read for about 1.8 PB of storage. The 512K FS is capable of about 10-12GB/s sequential read for about 100T of storage. Now, as I mentioned previously, the issue that I am seeing is related to different FS block sizes on the same storage. For example, on the flash storage cluster: a block size of 512K with a maxblocksize of 16M gives worse performance than a block size of 512K with a maxblocksize of 512K. It is the maxblocksize that is affecting the performance, on the same storage with the same block size and everything else being the same. I am thinking the above is because of the number of buffers involved, but would like to learn if it happens to be anything else. I have debugged this with IBM GPFS techs and it was found that there is no issue with the storage itself or with any of the other GPFS tuning parameters. Now, since we do know that maxblocksize is making a big difference, I would like to keep it as low as possible but still be able to mount the other remote GPFS filesystems with higher block sizes. Or, since it is required to keep maxblocksize the same across all storage, I would like to know if there are any other parameters that could have the same effect as maxblocksize. Thank you, Lohit > On Feb 28, 2020, at 12:58 PM, Anderson Ferreira Nobre wrote: > > Hi Lohit, > > First, a few questions to understand your problem better: > - What is the minimum release level of both clusters? > - What is the version of the filesystem layout for 16MB, 1MB and 512KB? > - What is the subblock size of each filesystem? > - How many IOPS, what block size and what throughput are you doing on each filesystem? > > Abraços / Regards / Saludos, > > Anderson Nobre > Power and Storage Consultant > IBM Systems Hardware Client Technical Team - IBM Systems Lab Services > > > > Phone: 55-19-2132-4317 > E-mail: anobre at br.ibm.com > > > ----- Original message ----- > From: "Valleru, Lohit/Information Systems" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers > Date: Fri, Feb 28, 2020 12:30 > > Hello Everyone, > > I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter.
> > One of our users run a deep learning application on GPUs, that does the following IO pattern: > > It needs to read random small sections about 4K in size from about 20,000 to 100,000 files of each 100M to 200M size. > > When performance tuning for the above application on a 16M filesystem and comparing it to various other file system block sizes - I realized that the performance degradation that I see might be related to the number of buffers. > > I observed that the performance varies widely depending on what maxblocksize parameter I use. > For example, using a 16M maxblocksize for a 512K or a 1M block size filesystem differs widely from using a 512K or 1M maxblocksize for a 512K or a 1M block size filesystem. > > The reason I believe might be related to the number of buffers that I could keep on the client side, but I am not sure if that is the all that the maxblocksize is affecting. > > We have different file system block sizes in our environment ranging from 512K, 1M and 16M. > > We also use storage clusters and compute clusters design. > > Now in order to mount the 16M filesystem along with the other filesystems on compute clusters - we had to keep the maxblocksize to be 16M - no matter what the file system block size. > > I see that I get maximum performance for this application from a 512K block size filesystem and a 512K maxblocksize. > However, I will not be able to mount this filesystem along with the other filesystems because I will need to change the maxblocksize to 16M in order to mount the other filesystems of 16M block size. > > I am thinking if there is anything else that can do the same job as maxblocksize parameter. > > I was thinking about the parameters like maxBufferDescs for a 16M maxblocksize, but I believe it would need a lot more pagepool to keep the same number of buffers as would be needed for a 512k maxblocksize. > > May I know if there is any other parameter that could help me the same as maxblocksize, and the side effects of the same? > > Thank you, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
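As a rough, order-of-magnitude illustration of the buffer arithmetic discussed in this thread, the sketch below simply divides a pagepool among full-block buffers of maxblocksize size. This is a deliberate simplification and not how the real mmfsd buffer manager works (it also caches inodes, indirect blocks and partial blocks), and the 16 GiB pagepool is an assumed figure, but it shows why a 16M maxblocksize leaves far fewer cacheable buffers than a 512K one for the same pagepool.

# Simplified view of how maxblocksize limits the number of full-block buffers
# a fixed pagepool could hold. Not the real mmfsd buffer manager - just an
# order-of-magnitude illustration. The pagepool size is an assumption.

GIB = 1024 ** 3

def approx_full_block_buffers(pagepool_bytes, maxblocksize_bytes):
    """Upper bound on full-size buffers if the whole pagepool were carved into them."""
    return pagepool_bytes // maxblocksize_bytes

if __name__ == "__main__":
    pagepool = 16 * GIB   # assumed client pagepool
    for label, mbs in (("512K", 512 * 1024), ("1M", 1024 ** 2), ("16M", 16 * 1024 ** 2)):
        print(f"maxblocksize {label:>4}: ~{approx_full_block_buffers(pagepool, mbs):,} full-block buffers")

Read the other way around, keeping the same buffer count at a 16M maxblocksize would need roughly 32x the pagepool, which is consistent with the concern above that raising maxBufferDescs would also require a much larger pagepool.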