From alvise.dorigo at psi.ch Tue Dec 7 13:44:24 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Tue, 7 Dec 2021 13:44:24 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files Message-ID: Dear users/developers/support, I'd like to ask if there is a fast way to manipulate the permission mask of many files (millions). I tried on 900k files and a recursive chmod (chmod 0### -R path) takes about 1000s, with about 50% usage of mmfsd daemon. I tried with the perl's internal function chmod that can operate on an array of files, and it takes about 1/3 of the previous method. Which is already a good result. I've seen the possibility to run a policy to execute commands, but I would avoid to execute external commands through mmxargs, 1M of times; would you ? Does anybody have any suggestion to do this operation with minimum disruption on the system ? Thank you, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Dec 7 14:01:41 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 7 Dec 2021 14:01:41 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Tue Dec 7 14:10:20 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Tue, 7 Dec 2021 14:10:20 +0000 Subject: [gpfsug-discuss] R: Question on changing mode on many files In-Reply-To: References: Message-ID: I have 5.0.4 for the moment (planned to be updated next year) and what I see is: [root at sf-dss-1 tmp]# locate mmfind /usr/lpp/mmfs/samples/ilm/mmfind /usr/lpp/mmfs/samples/ilm/mmfind.README /usr/lpp/mmfs/samples/ilm/mmfindUtil_processOutputFile.c /usr/lpp/mmfs/samples/ilm/mmfindUtil_processOutputFile.sampleMakefile Is that what you are talking about ? Thanks, Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Frederick Stock Inviato: marted? 7 dicembre 2021 15:02 A: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files If you are running on a more recent version of Scale you might want to look at the mmfind command. It provides a find-like wrapper around the execution of policy rules. Fred _______________________________________________________ Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "'gpfsug main discussion list'" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] Question on changing mode on many files Date: Tue, Dec 7, 2021 8:53 AM Dear users/developers/support, I?d like to ask if there is a fast way to manipulate the permission mask of many files (millions). I tried on 900k files and a recursive chmod (chmod 0### -R path) takes about 1000s, with about 50% usage of mmfsd daemon. I tried with the perl?s internal function chmod that can operate on an array of files, and it takes about 1/3 of the previous method. Which is already a good result. I?ve seen the possibility to run a policy to execute commands, but I would avoid to execute external commands through mmxargs, 1M of times; would you ? Does anybody have any suggestion to do this operation with minimum disruption on the system ? 
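For anyone who wants to try the batched approach described above without writing a full script, a rough sketch follows; the path, mode and batch size are placeholders invented for illustration, not taken from the post:

    # one perl invocation per batch of file names, instead of one
    # /bin/chmod exec per file
    find /gpfs/fs1/target -type f -print0 | \
        xargs -0 -n 10000 perl -e 'chmod 0750, @ARGV'

The same idea works if the file list comes from the policy engine rather than from find.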
Thank you, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Dec 7 14:19:42 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 7 Dec 2021 14:19:42 +0000 Subject: [gpfsug-discuss] R: Question on changing mode on many files In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Dec 7 14:28:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 7 Dec 2021 14:28:54 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: Message-ID: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> On 07/12/2021 14:01, Frederick Stock wrote: > If you are running on a more recent version of Scale you might want to > look at the mmfind command.? It provides a find-like wrapper around the > execution of policy rules. > I am not sure that will be any faster than a "chmod -R" as it will exec millions of instances of chmod. What you gain on the swings you are going to loose on the roundabouts. TL;DR is you want to change permissions on millions of files expect it to take a considerable period of time. Even a modern NVMe SSD probably does around 50k IOPS per second, so best case scenario is one million files taking 40 seconds, at one read and one write per file and that is frankly unlikely. Also get ready to back them up again. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From s.j.thompson at bham.ac.uk Tue Dec 7 14:55:15 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Tue, 7 Dec 2021 14:55:15 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: Or add: UPDATECTIME yes SKIPACLUPDATECHECK yes To you dsm.opt file to skip checking for those updates and don?t back them up again. Actually I thought TSM only updated the metadata if the mode/owner changed, not re-backed the file? Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard Date: Tuesday, 7 December 2021 at 14:29 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Question on changing mode on many files On 07/12/2021 14:01, Frederick Stock wrote: > If you are running on a more recent version of Scale you might want to > look at the mmfind command. It provides a find-like wrapper around the > execution of policy rules. > I am not sure that will be any faster than a "chmod -R" as it will exec millions of instances of chmod. What you gain on the swings you are going to loose on the roundabouts. TL;DR is you want to change permissions on millions of files expect it to take a considerable period of time. Even a modern NVMe SSD probably does around 50k IOPS per second, so best case scenario is one million files taking 40 seconds, at one read and one write per file and that is frankly unlikely. Also get ready to back them up again. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Dec 7 15:42:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 7 Dec 2021 15:42:58 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: On 07/12/2021 14:55, Simon Thompson wrote: > > Or add: > ? UPDATECTIME?????????????? yes > ? SKIPACLUPDATECHECK??????? yes > > To you dsm.opt file to skip checking for those updates and don?t back > them up again. Yeah, but then a restore gives you potentially an unusable file system as the ownership of the files and ACL's are all wrong. Better to bite the bullet and back them up again IMHO. > > Actually I thought TSM only updated the metadata if the mode/owner > changed, not re-backed the file? That was my understanding but I have seen TSM rebacked up large amounts of data where the owner of the file changed in the past, so your mileage may vary. Also ACL's are stored in extended attributes which are stored with the files and changes will definitely cause the file to be backed up again. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Walter.Sklenka at EDV-Design.at Thu Dec 9 09:26:40 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 09:26:40 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Message-ID: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ... connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,... to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Visio-eodc-2-fabs.pdf Type: application/pdf Size: 35768 bytes Desc: Visio-eodc-2-fabs.pdf URL: From janfrode at tanso.net Thu Dec 9 10:25:17 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 9 Dec 2021 11:25:17 +0100 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: I believe this should be a fully working solution. I see no problem enabling RDMA between a subset of nodes -- just disable verbsRdma on the nodes you want to use plain IP. -jf On Thu, Dec 9, 2021 at 11:04 AM Walter Sklenka wrote: > Dear spectrum scale users! > > May I ask you a design question? > > We have an IB environment which is very mixed at the moment ( connecX3 ? > connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also > HDR100 and HDR switches. We still have some big troubles in this fabric > when using RDMA , a case at Mellanox and IBM is open . > > The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where > we want to migrate the data to ess5000 , ( mmdelvdisk +qos) > > Due to the current problems with RDMA we though eventually we could try a > workaround : > > If you are interested there is Maybe you can find the attachment ? > > We build 2 separate fabrics , the ess-IO servers attached to both blue and > green and all other cluster members and all remote clusters only to fabric > blue > > The daemon interfaces (IPoIP) are on fabric blue > > > > It is the aim to setup rdma only on the ess-ioServers in the fabric green > , in the blue we must use IPoIB (tcp) > > Do you think datamigration would work between ess01,ess02,? to ess07,ess08 > via RDMA ? > > Or is it principally not possible to make a rdma network only for a > subset of a cluster (though this subset would be reachable via other > fabric) ? > > > > Thank you very much for any input ! > > Best regards walter > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Thu Dec 9 10:41:29 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 10:41:29 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: Hi Jan! That great to hear So we will try this Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: Jan-Frode Myklebust Gesendet: Thursday, December 9, 2021 11:25 AM An: Walter Sklenka Cc: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration I believe this should be a fully working solution. I see no problem enabling RDMA between a subset of nodes -- just disable verbsRdma on the nodes you want to use plain IP. 
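A rough sketch of that split, assuming the ESS I/O servers keep RDMA on the green fabric and everything else stays on IPoIB/TCP; the node names, node file and verbs port below are placeholders:

    # RDMA only on the ESS I/O servers
    mmchconfig verbsRdma=enable,verbsPorts="mlx5_0/1" -N ess01,ess02,ess07,ess08
    # plain TCP (IPoIB on the blue fabric) for everything else
    mmchconfig verbsRdma=disable -N other_nodes.lst
    # confirm the per-node values
    mmlsconfig verbsRdma

verbsRdma changes typically need a GPFS restart on the affected nodes before they take effect.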
-jf On Thu, Dec 9, 2021 at 11:04 AM Walter Sklenka > wrote: Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Dec 9 12:04:28 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 9 Dec 2021 12:04:28 +0000 Subject: [gpfsug-discuss] =?utf-8?q?alternate_path_between_ESS_Servers_for?= =?utf-8?q?=09Datamigration?= In-Reply-To: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Dec 9 12:36:08 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 9 Dec 2021 12:36:08 +0000 Subject: [gpfsug-discuss] Adding a quorum node Message-ID: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> I am looking to replace the quorum node in our cluster. The RAID card in the server we are currently using is a casualty of the RHEL8 SAS card purge :-( I have a "new" dual core server that is fully supported by RHEL8. After some toing and throwing with IBM they agreed a Pentium G6400 is 70PVU a core and two cores :-) That said it is currently running RHEL7 because that's what the DSS-G nodes are running. The upgrade to RHEL8 is planned for next year. Anyway I have added it into the GPFS cluster all well and good and GPFS is mounted just fine. However when I ran the command to make it a quorum node I got the following error (sanitized to remove actual DNS names and IP addresses initialize (113, '', ('', 1191)) failed (err 79) server initialization failed (err 79) mmchnode: Unexpected error from chnodes -n 1=:1191,2:1191,3=:1191,113=:1191 -f 1 -P 1191 . Return code: 149 mmchnode: Unable to change the CCR quorum node configuration. mmchnode: Command failed. Examine previous error messages to determine cause. 
fqdn-new is the new node and fqdn1/2/3 are the existing quorum nodes. I want to remove fqdn3 in due course. Anyone any idea what is going on? I thought you could change the quorum nodes on the fly? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From douglasof at us.ibm.com Thu Dec 9 16:04:28 2021 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Thu, 9 Dec 2021 16:04:28 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: Message-ID: Walter: Though not directly about your design, our work with NVIDIA on GPUdirect Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both MOFED and Firmware version compatibility can be. I would suggest anyone debugging RDMA issues should look at those closely. Doug by carrier pigeon On Dec 9, 2021, 5:04:36 AM, gpfsug-discuss-request at spectrumscale.org wrote: From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Date: Dec 9, 2021, 5:04:36 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 119, Issue 5 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..." Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..."Today's Topics: 1. alternate path between ESS Servers for Datamigration (Walter Sklenka) Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! 
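As a concrete starting point for the MOFED/firmware check suggested above, the usual stock tools are enough, for example:

    ofed_info -s      # installed MLNX_OFED release
    ibstat            # per-port firmware level, link state and rate
    ibv_devinfo -v    # device details including fw_ver

These are plain Mellanox/OFED commands rather than anything Scale-specific; compare the output against the MOFED/firmware combinations listed as supported for your HCAs.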
Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.eberhard at de.ibm.com Thu Dec 9 16:43:26 2021 From: ralf.eberhard at de.ibm.com (Ralf Eberhard) Date: Thu, 9 Dec 2021 16:43:26 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 119, Issue 7 - Adding a quorum node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Dec 9 19:09:44 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Dec 2021 19:09:44 +0000 Subject: [gpfsug-discuss] Adding a quorum node In-Reply-To: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> References: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> Message-ID: I frequently change quorum on the fly on both our 4.x and 5.0 clusters during upgrades/maintenance. You have sanity in the CCR to start with? (mmccr query, lsnodes, etc,etc) Anything useful in the logs or if you drop debug on it? ('export DEBUG=1'and then re-run command) Ed Wahl OSC -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Thursday, December 9, 2021 7:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Adding a quorum node I am looking to replace the quorum node in our cluster. The RAID card in the server we are currently using is a casualty of the RHEL8 SAS card purge :-( I have a "new" dual core server that is fully supported by RHEL8. After some toing and throwing with IBM they agreed a Pentium G6400 is 70PVU a core and two cores :-) That said it is currently running RHEL7 because that's what the DSS-G nodes are running. The upgrade to RHEL8 is planned for next year. Anyway I have added it into the GPFS cluster all well and good and GPFS is mounted just fine. However when I ran the command to make it a quorum node I got the following error (sanitized to remove actual DNS names and IP addresses initialize (113, '', ('', 1191)) failed (err 79) server initialization failed (err 79) mmchnode: Unexpected error from chnodes -n 1=:1191,2:1191,3=:1191,113=:1191 -f 1 -P 1191 . Return code: 149 mmchnode: Unable to change the CCR quorum node configuration. mmchnode: Command failed. Examine previous error messages to determine cause. fqdn-new is the new node and fqdn1/2/3 are the existing quorum nodes. I want to remove fqdn3 in due course. Anyone any idea what is going on? I thought you could change the quorum nodes on the fly? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!hO7wULtfr6n28eBJ0BB8sYyRMFo6Xl5_XDpsNZz3GiD_3nXlPf6nKHNR-X99$ From Walter.Sklenka at EDV-Design.at Thu Dec 9 19:38:45 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 19:38:45 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: Hi Olaf!! Many thanks OK well we will do mmvdisk vs delete So #mmvdisk vs delete ? -N ess01,ess02?.. 
would be correct , or? Best regards walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Olaf Weiser Sent: Donnerstag, 9. Dezember 2021 13:04 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Hallo Walter, ;-) yes !AND! no .. for sure , you can specifiy a subset of nodes to use RDMA and other nodes just communicating TCPIP But that's only half of the truth . The other half is.. who and how , you are going to migrate/copy the data in case you 'll use mmrestripe .... you will have to make sure , that only nodes, connected(green) and configured for RDMA doing the work otherwise.. if will also work to migrate the data, but then data is send throught the Ethernet as well , (as long all those nodes are in the same cluster) laff ----- Urspr?ngliche Nachricht ----- Von: "Walter Sklenka" > Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "'gpfsug-discuss at spectrumscale.org'" > CC: Betreff: [EXTERNAL] [gpfsug-discuss] alternate path between ESS Servers for Datamigration Datum: Do, 9. Dez 2021 11:04 Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Thu Dec 9 19:43:31 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 19:43:31 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: Message-ID: <4f6b41f6a3b44c7a80cb588add2056dd@Mail.EDVDesign.cloudia> Hello Douglas! Many thanks for your advice ! 
Well we are in a horrible situation regarding firmware and MOFED of old equipment Mellanox advised us to use a special version of subnetmanager 5.0-2.1.8.0 from MOFED I hope this helps Let?s see how we can proceed Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Douglas O'flaherty Sent: Donnerstag, 9. Dezember 2021 17:04 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Walter: Though not directly about your design, our work with NVIDIA on GPUdirect Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both MOFED and Firmware version compatibility can be. I would suggest anyone debugging RDMA issues should look at those closely. Doug by carrier pigeon ________________________________ On Dec 9, 2021, 5:04:36 AM, gpfsug-discuss-request at spectrumscale.org wrote: From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Date: Dec 9, 2021, 5:04:36 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 119, Issue 5 ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..." Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..."Today's Topics: 1. alternate path between ESS Servers for Datamigration (Walter Sklenka) Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! 
Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Dec 9 20:19:41 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 9 Dec 2021 20:19:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 119, Issue 7 - Adding a quorum node In-Reply-To: References: Message-ID: On 09/12/2021 16:43, Ralf Eberhard wrote: > Jonathan, > > my suspicion is that?the GPFS daemon on fqdn-new is not reachable via > port 1191. > You can double check that by?sending a lightweight CCR RPC to this > daemon from another quorum node by attempting: > > mmccr echo -n fqdn-new;echo $? > > If this echo returns with a non-zero exit code the network settings must > be verified. And even?the other direction must > work: Node fqdn-new must?reach another quorum node, like (attempting on > fqdn-new): > > mmccr echo -n ;echo $? > Duh, that's my Homer Simpson moment for today. I forgotten to move the relevant network interfaces on the new server to the trusted zone in the firewall. So of course my normal testing with ping and ssh was working just fine. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Fri Dec 10 00:27:23 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 10 Dec 2021 00:27:23 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: Message-ID: On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Fri Dec 10 01:09:57 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 10 Dec 2021 01:09:57 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16390972812300.png Type: image/png Size: 98384 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16390972812301.png Type: image/png Size: 101267 bytes Desc: not available URL: From douglasof at us.ibm.com Fri Dec 10 04:24:21 2021 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Fri, 10 Dec 2021 00:24:21 -0400 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG ----- Original message ----- From: "Jonathan Buzzard" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Fri Dec 10 10:17:20 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Fri, 10 Dec 2021 10:17:20 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Hello Douglas! May I ask a basic question regarding GPUdirect Storage or all local attached storage like NVME disks. Do you think it outerperforms "classical" shared storagesystems which are attached via FC connected to NSD servers HDR attached? With FC you have also bounce copies and more delay , isn?t it? There are solutions around which work with local NVME disks building some protection level with Raid (or duplication) . I am curious if it would be a better approach than shared storage which has it?s limitation (cost intensive scale out, extra infrstructure, max 64Gb at this time ... ) Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Douglas O'flaherty Sent: Freitag, 10. Dezember 2021 05:24 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. 
Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard > on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ----- Original message ----- From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Fri Dec 10 10:28:38 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 10 Dec 2021 10:28:38 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> References: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Message-ID: Hallo Walter, we had many experiences now to change our Storage-Systems in our Backup-Environment to RDMA-IB with HDR and EDR Connections. What we see now (came from a 16Gbit FC Infrastructure) we enhance our throuhput from 7 GB/s to 30 GB/s. The main reason are the elimination of the driver-layers in the client-systems and make a Buffer to Buffer communication because of RDMA. The latency reduction are significant. Regards Renar. We use now ESS3k and ESS5k systems with 6.1.1.2-Code level. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Walter Sklenka Gesendet: Freitag, 10. Dezember 2021 11:17 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Hello Douglas! May I ask a basic question regarding GPUdirect Storage or all local attached storage like NVME disks. Do you think it outerperforms ?classical? shared storagesystems which are attached via FC connected to NSD servers HDR attached? With FC you have also bounce copies and more delay , isn?t it? There are solutions around which work with local NVME disks building some protection level with Raid (or duplication) . I am curious if it would be a better approach than shared storage which has it?s limitation (cost intensive scale out, extra infrstructure, max 64Gb at this time ? ) Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Douglas O'flaherty Sent: Freitag, 10. Dezember 2021 05:24 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. 
With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard > on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ----- Original message ----- From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? 
I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Dec 10 10:37:31 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Dec 2021 10:37:31 +0000 Subject: [gpfsug-discuss] Test email format / mail format Message-ID: An HTML attachment was scrubbed... URL: From Ondrej.Kosik at ibm.com Fri Dec 10 10:39:56 2021 From: Ondrej.Kosik at ibm.com (Ondrej Kosik) Date: Fri, 10 Dec 2021 10:39:56 +0000 Subject: [gpfsug-discuss] Test email format / mail format In-Reply-To: References: Message-ID: Hello all, Thank you for the test email, my reply is coming from Outlook-based infrastructure. ________________________________ From: Olaf Weiser Sent: Friday, December 10, 2021 10:37 AM To: gpfsug-discuss at spectrumscale.org Cc: Ondrej Kosik Subject: Test email format / mail format This email is just a test, because we've seen mail format issues from IBM sent emails you can ignore this email , just for internal problem determination -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Dec 10 11:10:07 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Dec 2021 11:10:07 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: , <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16391192376761.png Type: image/png Size: 127072 bytes Desc: not available URL: From anacreo at gmail.com Sun Dec 12 02:19:02 2021 From: anacreo at gmail.com (Alec) Date: Sat, 11 Dec 2021 18:19:02 -0800 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: I feel the need to respond here... I see many responses on this User Group forum that are dismissive of the fringe / extreme use cases and of the "what do you need that for '' mindset. The thing is that Spectrum Scale is for the extreme, just take the word "Parallel" in the old moniker that was already an extreme use case. If you have a standard workload, then sure most of the complex features of the file system are toys, but many of us DO have extreme workloads where shaking out every ounce of performance is a worthwhile and financially sound endeavor. It is also because of the efforts of those of us living on the cusp of technology that these technologies become mainstream and no-longer extreme. I have an AIX LPAR that traverses more than 300TB+ of data a day on a Spectrum Scale file system, it is fully virtualized, and handles a million files. If that performance level drops, regulatory reports will be late, business decisions won't be current. However, the systems of today and the future have to traverse this much data and if they are slow then they can't keep up with real-time data feeds. 
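To put rough numbers on that (order-of-magnitude figures assumed for illustration, not measurements from this thread): on spinning disk an I/O costs several milliseconds, so a roughly 50 microsecond TCP/IP network hop is lost in the noise; on NVMe flash a read costs on the order of 100 microseconds, so the network leg plus the bounce copy through system memory can easily be a third of the total time, and RDMA cuts that leg to a few microseconds. That is the regime where the choice starts to show up in application numbers.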
So the difference between an RDMA disk IO vs a non RDMA disk IO could possibly mean what level of analytics are done to perform real time fraud prevention, or at what cost: today many systems achieve this by keeping everything in memory in HUGE farms. Being able to perform data operations at 30GB/s means you can traverse ALL of the census bureau data for all time from the US Govt in about 2 seconds... that's a pretty substantial capability that moves the bar forward in what we can do from a technology perspective. I just did a technology garage with IBM where we were able to achieve 1.5TB/minute of writes on an encrypted ESS off of a single VMWare Host and 4 VM's over IP... That's over 2PB of data writes a day on a single VM server. Being able to demonstrate that there are production virtualized environments capable of this type of capacity helps to show the point where engineering a proper storage architecture outweighs the benefits of just throwing more GPU compute farms at the problem with ever dithering disk I/O. It also helps to demonstrate how a virtual storage optimized farm could be leveraged to host many in-memory or data analytic heavy workloads in a shared configuration. Douglas's response is the right one: how much IO does the application / environment need? It's nice to see Spectrum Scale have the flexibility to deliver. I'm pretty confident that if I can't deliver the required I/O performance on Spectrum Scale, nobody else can on any other storage platform within reasonable limits. Alec Effrat On Thu, Dec 9, 2021 at 8:24 PM Douglas O'flaherty wrote: > Jonathan: > > You posed a reasonable question, which was "when is RDMA worth the > hassle?" I agree with part of your premises, which is that it only matters > when the bottleneck isn't somewhere else. With a parallel file system, like > Scale/GPFS, the absolute performance bottleneck is not the throughput of a > single drive. In a majority of Scale/GPFS clusters the network data path is > the performance limitation. If they deploy HDR or 100/200/400Gbps > Ethernet... At that point, the buffer copy time inside the server matters. > > When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) > is easily demonstrated because it eliminates the bounce copy through the > system memory. In our NVIDIA DGX A100 server testing we were able > to get around 2x the per system throughput by using RDMA direct to GPU (GPU > Direct Storage). (Tested on 2 DGX systems with 4x HDR links per storage > node.) > > However, your question remains. 
> > Thanks, > > doug > > Douglas O'Flaherty > douglasof at us.ibm.com > > > > > > > ----- Message from Jonathan Buzzard on > Fri, 10 Dec 2021 00:27:23 +0000 ----- > > *To:* > gpfsug-discuss at spectrumscale.org > > *Subject:* > Re: [gpfsug-discuss] > On 09/12/2021 16:04, Douglas O'flaherty wrote: > > > > Though not directly about your design, our work with NVIDIA on GPUdirect > > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > > MOFED and Firmware version compatibility can be. > > > > I would suggest anyone debugging RDMA issues should look at those > closely. > > > May I ask what are the alleged benefits of using RDMA in GPFS? > > I can see there would be lower latency over a plain IP Ethernet or IPoIB > solution but surely disk latency is going to swamp that? > > I guess SSD drives might change that calculation but I have never seen > proper benchmarks comparing the two, or even better yet all four > connection options. > > Just seems a lot of complexity and fragility for very little gain to me. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > > > ----- Original message ----- > From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS > Servers for Datamigration > Date: Fri, Dec 10, 2021 10:27 > > On 09/12/2021 16:04, Douglas O'flaherty wrote: > > > > Though not directly about your design, our work with NVIDIA on GPUdirect > > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > > MOFED and Firmware version compatibility can be. > > > > I would suggest anyone debugging RDMA issues should look at those > closely. > > > May I ask what are the alleged benefits of using RDMA in GPFS? > > I can see there would be lower latency over a plain IP Ethernet or IPoIB > solution but surely disk latency is going to swamp that? > > I guess SSD drives might change that calculation but I have never seen > proper benchmarks comparing the two, or even better yet all four > connection options. > > Just seems a lot of complexity and fragility for very little gain to me. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 02:38:26 2021 From: anacreo at gmail.com (Alec) Date: Sat, 11 Dec 2021 18:38:26 -0800 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: You can manipulate the permissions via GPFS policy engine, essentially you'd write a script that the policy engine calls and tell GPFS to farm out the change in at whatever scale you need... run in a single node, how many files per thread, how many threads per node, etc... This can GREATLY accelerate file change permissions over a large quantity of files. 
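If you do want to drive the policy engine directly rather than through mmfind, a minimal sketch of that approach might look something like this -- the rule names, script path, node names and the o-w change are just placeholders, and I'm assuming plain path names (the /usr/lpp/mmfs/samples/ilm/mmxargs sample shows how to handle escaped path names properly):

   # chmod.pol -- hand every matched file to an external script
   RULE EXTERNAL LIST 'fixperms' EXEC '/usr/local/bin/fixperms.sh'
   RULE 'allfiles' LIST 'fixperms'

   #!/bin/ksh
   # /usr/local/bin/fixperms.sh -- invoked by mmapplypolicy with
   # $1 = operation (TEST or LIST) and $2 = a path (the file list for LIST)
   case "$1" in
     TEST) exit 0 ;;                      # "can this node participate?" probe
     LIST) while IFS= read -r rec; do
             # each record is roughly: inode gen snapid -- /full/path/name
             chmod o-w "${rec#* -- }"
           done < "$2" ;;
   esac
   exit 0

   # farm it out over two nodes, 24 threads each, up to 1000 files per script call
   mmapplypolicy /path/to/fix -P chmod.pol -N node1,node2 -m 24 -B 1000 -I yes

This is essentially the machinery mmfind wraps, which is why its policy flags map straight onto mmapplypolicy options.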
However, as stated earlier the mmfind command will do all of this for you and it's worth the effort to get it compiled for your system. I don't have Spectrum Scale in front of me but for the best performance you'll want to setup the mmfind policy engine parameters to parallelize your workload... If mmfind has no action it will silently use GPFS policy engine to produce the requested output, however if mmfind has an action it will expose the policy engine calls. it goes something like this: mmfind -B 1 -N directattachnode1,directattachnode2 -m 24 /path/to/find -perm +o=w ! \( -type d -perm +o=t \) -xargs chmod o-w This will run 48 threads on 2 nodes and bump other write permissions off of any file it finds (excluding temp dirs) until it completes, it should go blistering fast... as this is only a meta operation the -B 1 might not be necessary, you'd probably be better off with a -B 100, but as I deal with a lot of 100GB+ files I don't want a single thread to be stuck with 3 100GB+ files and another thread to have none, so I usually set the max depth to be 1 and take the higher execution count. This has an advantage in that GPFS will break up the inodes in the most efficient way for the chmod to happen in parallel. I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. If you think of the possibilities here you could easily change that chmod to a gzip and add a -mtime +1200 and you have a find command that will gzip compress files over 4 years old in parallel across multiple nodes... mmfind is VERY powerful and flexible, highly worth getting into usage. Alec On Tue, Dec 7, 2021 at 7:43 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 07/12/2021 14:55, Simon Thompson wrote: > > > > Or add: > > UPDATECTIME yes > > SKIPACLUPDATECHECK yes > > > > To you dsm.opt file to skip checking for those updates and don?t back > > them up again. > > Yeah, but then a restore gives you potentially an unusable file system > as the ownership of the files and ACL's are all wrong. Better to bite > the bullet and back them up again IMHO. > > > > > Actually I thought TSM only updated the metadata if the mode/owner > > changed, not re-backed the file? > > That was my understanding but I have seen TSM rebacked up large amounts > of data where the owner of the file changed in the past, so your mileage > may vary. > > Also ACL's are stored in extended attributes which are stored with the > files and changes will definitely cause the file to be backed up again. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sun Dec 12 11:19:07 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 12 Dec 2021 11:19:07 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: On 12/12/2021 02:19, Alec wrote: > I feel the need to respond here... 
I see many responses on this > User Group forum that are dismissive of the fringe / extreme use > cases and of the "what do you need that for '' mindset. The thing is > that Spectrum Scale is for the extreme, just take the word "Parallel" > in the old moniker that was already an extreme use case.

I wasn't being dismissive, I was asking what the benefits of using RDMA were. There is very little information about it out there and not a lot of comparative benchmarking on it either. Without the benefits being clearly laid out I am unlikely to consider it and might be missing a trick.

IBM's literature on the topic is underwhelming to say the least.

[SNIP]

> I have an AIX LPAR that traverses more than 300TB+ of data a day on a > Spectrum Scale file system, it is fully virtualized, and handles a > million files. If that performance level drops, regulatory reports > will be late, business decisions won't be current. However, the > systems of today and the future have to traverse this much data and > if they are slow then they can't keep up with real-time data feeds.

I have this nagging suspicion that modern all-flash storage systems could deliver that sort of performance without the overhead of a parallel file system.

[SNIP]

> > Douglas's response is the right one, how much IO does the > application / environment need, it's nice to see Spectrum Scale have > the flexibility to deliver. I'm pretty confident that if I can't > deliver the required I/O performance on Spectrum Scale, nobody else > can on any other storage platform within reasonable limits. >

I would note here that in our *shared HPC* environment I made a very deliberate design decision to attach the compute nodes with 10Gbps Ethernet for storage. Though I would probably pick 25Gbps if we were procuring the system today.

There were many reasons behind that, the main one being that historical file system performance showed that greater than 99% of the time the file system never got above 20% of its benchmarked speed. Using 10Gbps Ethernet was not going to be a problem.

Secondly, limiting the connection to 10Gbps stops one person hogging the file system to the detriment of other users. We have seen individual nodes peg their 10Gbps link from time to time, even several nodes at once (jobs from the same user), and had they had access to a 100Gbps storage link that would have been curtains for everyone else's file system usage.

At this juncture I would note that the GPFS admin traffic is handled on a separate IP address space on a separate VLAN, which we prioritize with QOS on the switches. So even when a node floods its 10Gbps link for extended periods of time it doesn't get ejected from the cluster. A separate physical network for admin traffic is not necessary in my experience.

That said you can do RDMA with Ethernet... Unfortunately the teaching cluster and protocol nodes are on Intel X520's which I don't think do RDMA. Everything else is X710's or Mellanox Connect-X4, which definitely do RDMA. I could upgrade the protocol nodes but the teaching cluster would be a problem.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG From s.j.thompson at bham.ac.uk Sun Dec 12 17:01:21 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Sun, 12 Dec 2021 17:01:21 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 22:03:39 2021 From: anacreo at gmail.com (Alec) Date: Sun, 12 Dec 2021 14:03:39 -0800 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson wrote: > > I'm not sure if this happens on Spectrum Scale but on most FS's if you > do a chmod 770 file you'll lose any ACLs assigned to the > > file, so safest to bump the permissions with a subtractive or additive > o-w or g+w type operation. > > > > This depends entirely on the fileset setting, see: > > > https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command > > > > ?*allow-permission-change*? > > > > We typically have file-sets set to chmodAndUpdateAcl, though not > exclusively, I think it was some quirky software that tested the > permissions after doing something and didn?t like the updatewithAcl thing ? > > > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 23:00:21 2021 From: anacreo at gmail.com (Alec) Date: Sun, 12 Dec 2021 15:00:21 -0800 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: So I never said this node wasn't in a HPC Cluster, it has partners... For our use case however some nodes have very expensive per core software licensing, and we have to weigh the human costs of empowering traditional monolithic code to do the job, or bringing in more users to re-write and maintain distributed code (someone is going to spend the money to get this work done!). So to get the most out of those licensed cores we have designed our virtual compute machine(s) with 128Gbps+ of SAN fabric. 
Just to achieve our average business day reads it would take 3 of your cluster nodes maxed out 24 hours, or 9 of them in a business day to achieve the same read speeds... and another 4 nodes to handle the writes. I guess HPC is in the eye of the business... In my experience cables and ports are cheaper than servers. The classic shared HPC design you have is being up-ended by the fact that there is so much compute power (cpu and memory) now in the nodes, you can't simply build a system with two storage connections (Noah's ark) and call it a day. If you look at the spec 25Gbps Ethernet is only delivering ~3GB/s (which is just above USB 3.2, and below USB 4). Spectrum Scale does very well for us when met with a fully saturated workload, we maintain one node for SLA and one node for AdHoc workload, and like clockwork the SLA box always steals exactly half the bandwidth when a job fires, so that 1 SLA job can take half the bandwidth and complete compared to the 40 AdHoc jobs on the other node. In newer releases IBM has introduced fileset throttling.... this is very exciting as we can really just design the biggest fattest pipes from VM to Storage and then software define the storage AND the bandwidth from the standard nobody cares about workloads all the way up to the most critical workloads... I don't buy the smaller bandwidth is better, as I see that as just one band-aid that has more elegant solutions, such as simply doing more resource constraints (you can't push the bandwidth if you can't get the CPU...), or using a workload orchestrator such as LSF with limits set, but I also won't say it never makes sense, as well I only know my problems and my solutions. For years the network team wouldn't let users have more than 10mb then 100mb networking as they were always worried about their backend being overwhelmed... I literally had faster home internet service than my work desktop connection at one point in my life.. it was all a falesy, the workload should drive the technology, the technology shouldn't hinder the workload. You can do a simple exercise, try scaling up... imagine your cluster is asked to start computing 100x more work... and that work must be completed on time. Do you simply say let me buy 100x more of everything? Or do you start to look at where can I gain efficiency and what actual bottlenecks do I need to lift... for some of us it's CPU, for some it's Memory, for some it's disk, depending on the work... I'd say the extremely rare case is where you need 100x more of EVERYTHING, but you have to get past the performance of the basic building blocks baked into the cake before you do need to dig deeper into the bottlenecks and it makes practical and financial sense. If your main bottleneck was storage, you'd be asking far different questions about RDMA. Alec On Sun, Dec 12, 2021 at 3:19 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 12/12/2021 02:19, Alec wrote: > > > I feel the need to respond here... I see many responses on this > > User Group forum that are dismissive of the fringe / extreme use > > cases and of the "what do you need that for '' mindset. The thing is > > that Spectrum Scale is for the extreme, just take the word "Parallel" > > in the old moniker that was already an extreme use case. > > I wasn't been dismissive, I was asking what the benefits of using RDMA > where. There is very little information about it out there and not a lot > of comparative benchmarking on it either. 
Without the benefits being > clearly laid out I am unlikely to consider it and might be missing a trick. > > IBM's literature on the topic is underwhelming to say the least. > > [SNIP] > > > > I have an AIX LPAR that traverses more than 300TB+ of data a day on a > > Spectrum Scale file system, it is fully virtualized, and handles a > > million files. If that performance level drops, regulatory reports > > will be late, business decisions won't be current. However, the > > systems of today and the future have to traverse this much data and > > if they are slow then they can't keep up with real-time data feeds. > > I have this nagging suspicion that modern all flash storage systems > could deliver that sort of performance without the overhead of a > parallel file system. > > [SNIP] > > > > > Douglas's response is the right one, how much IO does the > > application / environment need, it's nice to see Spectrum Scale have > > the flexibility to deliver. I'm pretty confident that if I can't > > deliver the required I/O performance on Spectrum Scale, nobody else > > can on any other storage platform within reasonable limits. > > > > I would note here that in our *shared HPC* environment I made a very > deliberate design decision to attach the compute nodes with 10Gbps > Ethernet for storage. Though I would probably pick 25Gbps if we where > procuring the system today. > > There where many reasons behind that, but the main ones being that > historical file system performance showed that greater than 99% of the > time the file system never got above 20% of it's benchmarked speed. > Using 10Gbps Ethernet was not going to be a problem. > > Secondly by limiting the connection to 10Gbps it stops one person > hogging the file system to the detriment of other users. We have seen > individual nodes peg their 10Gbps link from time to time, even several > nodes at once (jobs from the same user) and had they had access to a > 100Gbps storage link that would have been curtains for everyone else's > file system usage. > > At this juncture I would note that the GPFS admin traffic is handled by > on separate IP address space on a separate VLAN which we prioritize with > QOS on the switches. So even when a node floods it's 10Gbps link for > extended periods of time it doesn't get ejected from the cluster. The > need for a separate physical network for admin traffic is not necessary > in my experience. > > That said you can do RDMA with Ethernet... Unfortunately the teaching > cluster and protocol nodes are on Intel X520's which I don't think do > RDMA. Everything is X710's or Mellanox Connect-X4 which definitely do do > RDMA. I could upgrade the protocol nodes but the teaching cluster would > be a problem. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Dec 13 00:03:42 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 13 Dec 2021 00:03:42 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... 
URL: From alvise.dorigo at psi.ch Mon Dec 13 10:49:37 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 13 Dec 2021 10:49:37 +0000 Subject: [gpfsug-discuss] R: Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: <96a77c75de9b41f089e853120eef870d@psi.ch> I am definitely going to try this solution with mmfind. Thank you also for the command line and several hints? I?ll be back with the outcome soon. Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Alec Inviato: domenica 12 dicembre 2021 23:04 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Dec 13 11:30:17 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 13 Dec 2021 11:30:17 +0000 Subject: [gpfsug-discuss] R: R: Question on changing mode on many files In-Reply-To: <96a77c75de9b41f089e853120eef870d@psi.ch> References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> <96a77c75de9b41f089e853120eef870d@psi.ch> Message-ID: Hi Alec , mmfind doesn?t have a man page (does it have an online one ? I cannot find it). And according to mmfind -h it doesn?t exposes the ?-N? neither the ?-B? flags. RPM is gpfs.base-5.1.1-2.x86_64. Do I have chance to download a newest version of that script from somewhere ? Thanks, Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Dorigo Alvise (PSI) Inviato: luned? 13 dicembre 2021 11:50 A: gpfsug main discussion list Oggetto: [gpfsug-discuss] R: Question on changing mode on many files I am definitely going to try this solution with mmfind. Thank you also for the command line and several hints? I?ll be back with the outcome soon. Alvise Da: gpfsug-discuss-bounces at spectrumscale.org > Per conto di Alec Inviato: domenica 12 dicembre 2021 23:04 A: gpfsug main discussion list > Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files How am I just learning about this right now, thank you! 
Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Mon Dec 13 18:33:23 2021 From: anacreo at gmail.com (Alec) Date: Mon, 13 Dec 2021 10:33:23 -0800 Subject: [gpfsug-discuss] R: R: Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> <96a77c75de9b41f089e853120eef870d@psi.ch> Message-ID: I checked on my office network.... mmfind --help mmfind -polFlags '-N node1,node2 -B 100 -m 24' /path/to/find -perm +o=w ! \( -type d -perm +o=t \) -xargs chmod o-w I think that the -m 24 is the default (24 threads per node), but it's nice to include on the command line so you remember you can increment/decrement it as your needs require or your nodes can handle. It's IMPORTANT to review in the mmfind --help output that some things are 'mmfind' args and go BEFORE the path... some are CRITERIA args and have no impact on the files... BUT SOME ARE ACTION args, and they will affect files. So -exec -xargs are obvious actions, however, -gpfsCompress doesn't find compressed files, it will actually compress the objects... in our AIX environment our compressed reads feel like they're essentially broken, we only get about 5MB/s, however on Linux compress reads seem to work fairly well. So make sure to read the man page carefully before using some non-obvious GPFS enhancements. Also the nice thing is mmfind -xargs takes care of all the strange file names, so you don't have to do anything complicated, but you also can't pipe the output as it will run the xarg in the policy engine. As a footnote this is my all time favorite find for troubleshooting... find $(pwd) -mtime -1 | sed -e 's/.*/"&"/g' | xargs ls -latr List all the files modified in the last day in reverse chronology... Doesn't work :-( Alec On Mon, Dec 13, 2021 at 3:30 AM Dorigo Alvise (PSI) wrote: > Hi Alec , > > mmfind doesn?t have a man page (does it have an online one ? I cannot find > it). And according to mmfind -h it doesn?t exposes the ?-N? neither the > ?-B? flags. RPM is gpfs.base-5.1.1-2.x86_64. > > > > Do I have chance to download a newest version of that script from > somewhere ? 
> > > > Thanks, > > > > Alvise > > > > *Da:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Per conto di *Dorigo Alvise > (PSI) > *Inviato:* luned? 13 dicembre 2021 11:50 > *A:* gpfsug main discussion list > *Oggetto:* [gpfsug-discuss] R: Question on changing mode on many files > > > > I am definitely going to try this solution with mmfind. > > Thank you also for the command line and several hints? I?ll be back with > the outcome soon. > > > > Alvise > > > > *Da:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Per conto di *Alec > *Inviato:* domenica 12 dicembre 2021 23:04 > *A:* gpfsug main discussion list > *Oggetto:* Re: [gpfsug-discuss] Question on changing mode on many files > > > > How am I just learning about this right now, thank you! Makes so much > more sense now the odd behaviors I've seen over the years on GPFS vs POSIX > chmod/ACL. Will definitely go review those settings on my filesets now, > wonder if the default has evolved from 3.x -> 4.x -> 5.x. > > > > IBM needs to find a way to pre-compile mmfind and make it supported, it > really is essential and so beneficial, and so hard to get done in a > production regulated environment. Though a bigger warning that the > compress option is an action not a criteria! > > > > Alec > > > > On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > > > I'm not sure if this happens on Spectrum Scale but on most FS's if you > do a chmod 770 file you'll lose any ACLs assigned to the > > file, so safest to bump the permissions with a subtractive or additive > o-w or g+w type operation. > > > > This depends entirely on the fileset setting, see: > > > https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command > > > > ?*allow-permission-change*? > > > > We typically have file-sets set to chmodAndUpdateAcl, though not > exclusively, I think it was some quirky software that tested the > permissions after doing something and didn?t like the updatewithAcl thing ? > > > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Dec 13 23:55:23 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 13 Dec 2021 23:55:23 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: <19884986-aff8-20aa-f1d1-590f6b81ddd2@strath.ac.uk> On 13/12/2021 00:03, Andrew Beattie wrote: > What is the main outcome or business requirement of the teaching cluster > ( i notice your specific in the use of defining it as a teaching cluster) > It is entirely possible that the use case for this cluster does not > warrant the use of high speed low latency networking, and it simply > needs the benefits of a parallel filesystem. While we call it the "teaching cluster" it would be more appropriate to call them "teaching nodes" that shares resources (storage and login nodes) with the main research cluster. It's mainly used by undergraduates doing final year projects and M.Sc. students. 
It's getting a bit long in the tooth now but not many undergraduates have access to a 16 core machine with 64GB of RAM. Even if they did being able to let something go flat out for 48 hours means there personal laptop is available for other things :-) I was just musing that the cards in the teaching nodes being Intel 82599ES would be a stumbling block for RDMA over Ethernet, but on checking the Intel X710 doesn't do RDMA either so it would all be a bust anyway. I was clearly on the crack pipe when I thought they did. So aside from the DSS-G and GPU nodes with Connect-X4 cards nothing does RDMA. [SNIP] > For some of my research clients this is the ability to run 20-30% more > compute jobs on the same HPC resources in the same 24H period, which > means that they can reduce the amount of time they need on the HPC > cluster to get the data results that they are looking for. Except as I said in our cluster the storage servers have never been maxed out except when running benchmarks. Individual compute nodes have been maxed out (mainly Gaussian writing 800GB temporary files) but as I explained that's a good thing from my perspective because I don't want one or two users to be able to pound the storage into oblivion and cause problems for everyone else. We have enough problems with users tanking the login nodes by running computations on them. That should go away with our upgrade to RHEL8 and the wonders of per user cgroups; me I love systemd. In the end nobody has complained that the storage speed is a problem yet, and putting the metadata on SSD would be my first port of call if they did and funds where available to make things go faster. To be honest I think the users are just happy that GPFS doesn't eat itself and be out of action for a few weeks every couple of years like Lustre did on the previous system. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Fri Dec 17 15:08:15 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 17 Dec 2021 15:08:15 +0000 Subject: [gpfsug-discuss] email format check again for IBM domain send email Message-ID: An HTML attachment was scrubbed... URL: From juergen.hannappel at desy.de Fri Dec 17 15:57:45 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 17 Dec 2021 16:57:45 +0100 (CET) Subject: [gpfsug-discuss] ESS 6.1.2.1 changes Message-ID: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Hi, I just noticed that tday a new ESS release (6.1.2.1) appeared on fix central. What I can't find is a list of changes to 6.1.2.0, and anyway finding the change list is always a PITA. Does anyone know what changed? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 From luis.bolinches at fi.ibm.com Fri Dec 17 18:50:09 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Fri, 17 Dec 2021 18:50:09 +0000 Subject: [gpfsug-discuss] ESS 6.1.2.1 changes In-Reply-To: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> References: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Mon Dec 20 11:26:29 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 20 Dec 2021 12:26:29 +0100 Subject: [gpfsug-discuss] ESS 6.1.2.1 changes In-Reply-To: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> References: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Message-ID: Just ran an upgrade on an EMS, and the only changes I see are these updated packages on the ems: +gpfs.docs-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:43 AM CET +gpfs.ess.firmware-6.0.0-15.ppc64le Mon 20 Dec 2021 11:56:42 AM CET +gpfs.msg.en_US-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:12 AM CET +gpfs.gss.pmsensors-5.1.2-0.el8.ppc64le Mon 20 Dec 2021 11:56:12 AM CET +gpfs.gpl-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:11 AM CET +gpfs.gnr.base-1.0.0-0.ppc64le Mon 20 Dec 2021 11:56:11 AM CET +gpfs.gnr.support-ess5000-1.0.0-3.noarch Mon 20 Dec 2021 11:56:10 AM CET +gpfs.gnr.support-ess3200-6.1.2-0.noarch Mon 20 Dec 2021 11:56:10 AM CET +gpfs.crypto-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:10 AM CET +gpfs.compression-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:10 AM CET +gpfs.license.dmd-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:09 AM CET +gpfs.gnr.support-ess3000-1.0.0-3.noarch Mon 20 Dec 2021 11:56:09 AM CET +gpfs.gui-5.1.2-0.4.noarch Mon 20 Dec 2021 11:56:05 AM CET +gpfs.gskit-8.0.55-19.ppc64le Mon 20 Dec 2021 11:56:02 AM CET +gpfs.java-5.1.2-0.4.ppc64le Mon 20 Dec 2021 11:56:01 AM CET +gpfs.gss.pmcollector-5.1.2-0.el8.ppc64le Mon 20 Dec 2021 11:55:59 AM CET +gpfs.gnr.support-essbase-6.1.2-0.noarch Mon 20 Dec 2021 11:55:59 AM CET +gpfs.adv-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:59 AM CET +gpfs.gnr-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:58 AM CET +gpfs.base-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:54 AM CET +sdparm-1.10-10.el8.ppc64le Mon 20 Dec 2021 11:55:21 AM CET +gpfs.ess.tools-6.1.2.1-release.noarch Mon 20 Dec 2021 11:50:47 AM CET I will guess it has something to do with log4j, but a changelog would be nice :-) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=142683 On Fri, Dec 17, 2021 at 5:07 PM Hannappel, Juergen < juergen.hannappel at desy.de> wrote: > Hi, > I just noticed that tday a new ESS release (6.1.2.1) appeared on fix > central. > What I can't find is a list of changes to 6.1.2.0, and anyway finding the > change list is always a PITA. > > Does anyone know what changed? > > -- > Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL:
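A quick way to reconstruct a package list like the one above on an RPM-based EMS node, for anyone wanting to compare before and after an upgrade (the scratch file names here are purely illustrative):

   rpm -qa | sort > /root/rpms.before       # snapshot the installed package set
   # ... run the ESS upgrade ...
   rpm -qa | sort > /root/rpms.after        # snapshot it again afterwards
   diff /root/rpms.before /root/rpms.after  # show what was added, removed or updated
   rpm -qa --last | head -30                # or just list the most recently installed packages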
From jonathan.buzzard at strath.ac.uk Tue Dec 7 15:42:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 7 Dec 2021 15:42:58 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: On 07/12/2021 14:55, Simon Thompson wrote: > > Or add: > ? UPDATECTIME??????????????
yes > ? SKIPACLUPDATECHECK??????? yes > > To you dsm.opt file to skip checking for those updates and don?t back > them up again. Yeah, but then a restore gives you potentially an unusable file system as the ownership of the files and ACL's are all wrong. Better to bite the bullet and back them up again IMHO. > > Actually I thought TSM only updated the metadata if the mode/owner > changed, not re-backed the file? That was my understanding but I have seen TSM rebacked up large amounts of data where the owner of the file changed in the past, so your mileage may vary. Also ACL's are stored in extended attributes which are stored with the files and changes will definitely cause the file to be backed up again. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Walter.Sklenka at EDV-Design.at Thu Dec 9 09:26:40 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 09:26:40 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Message-ID: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ... connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,... to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Visio-eodc-2-fabs.pdf Type: application/pdf Size: 35768 bytes Desc: Visio-eodc-2-fabs.pdf URL: From janfrode at tanso.net Thu Dec 9 10:25:17 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 9 Dec 2021 11:25:17 +0100 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: I believe this should be a fully working solution. I see no problem enabling RDMA between a subset of nodes -- just disable verbsRdma on the nodes you want to use plain IP. -jf On Thu, Dec 9, 2021 at 11:04 AM Walter Sklenka wrote: > Dear spectrum scale users! 
> > May I ask you a design question? > > We have an IB environment which is very mixed at the moment ( connecX3 ? > connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also > HDR100 and HDR switches. We still have some big troubles in this fabric > when using RDMA , a case at Mellanox and IBM is open . > > The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where > we want to migrate the data to ess5000 , ( mmdelvdisk +qos) > > Due to the current problems with RDMA we though eventually we could try a > workaround : > > If you are interested there is Maybe you can find the attachment ? > > We build 2 separate fabrics , the ess-IO servers attached to both blue and > green and all other cluster members and all remote clusters only to fabric > blue > > The daemon interfaces (IPoIP) are on fabric blue > > > > It is the aim to setup rdma only on the ess-ioServers in the fabric green > , in the blue we must use IPoIB (tcp) > > Do you think datamigration would work between ess01,ess02,? to ess07,ess08 > via RDMA ? > > Or is it principally not possible to make a rdma network only for a > subset of a cluster (though this subset would be reachable via other > fabric) ? > > > > Thank you very much for any input ! > > Best regards walter > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Thu Dec 9 10:41:29 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 10:41:29 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: Hi Jan! That great to hear So we will try this Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: Jan-Frode Myklebust Gesendet: Thursday, December 9, 2021 11:25 AM An: Walter Sklenka Cc: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration I believe this should be a fully working solution. I see no problem enabling RDMA between a subset of nodes -- just disable verbsRdma on the nodes you want to use plain IP. -jf On Thu, Dec 9, 2021 at 11:04 AM Walter Sklenka > wrote: Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? 
We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Dec 9 12:04:28 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 9 Dec 2021 12:04:28 +0000 Subject: [gpfsug-discuss] =?utf-8?q?alternate_path_between_ESS_Servers_for?= =?utf-8?q?=09Datamigration?= In-Reply-To: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Dec 9 12:36:08 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 9 Dec 2021 12:36:08 +0000 Subject: [gpfsug-discuss] Adding a quorum node Message-ID: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> I am looking to replace the quorum node in our cluster. The RAID card in the server we are currently using is a casualty of the RHEL8 SAS card purge :-( I have a "new" dual core server that is fully supported by RHEL8. After some toing and throwing with IBM they agreed a Pentium G6400 is 70PVU a core and two cores :-) That said it is currently running RHEL7 because that's what the DSS-G nodes are running. The upgrade to RHEL8 is planned for next year. Anyway I have added it into the GPFS cluster all well and good and GPFS is mounted just fine. However when I ran the command to make it a quorum node I got the following error (sanitized to remove actual DNS names and IP addresses initialize (113, '', ('', 1191)) failed (err 79) server initialization failed (err 79) mmchnode: Unexpected error from chnodes -n 1=:1191,2:1191,3=:1191,113=:1191 -f 1 -P 1191 . Return code: 149 mmchnode: Unable to change the CCR quorum node configuration. mmchnode: Command failed. Examine previous error messages to determine cause. fqdn-new is the new node and fqdn1/2/3 are the existing quorum nodes. I want to remove fqdn3 in due course. Anyone any idea what is going on? I thought you could change the quorum nodes on the fly? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From douglasof at us.ibm.com Thu Dec 9 16:04:28 2021 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Thu, 9 Dec 2021 16:04:28 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: Message-ID: Walter: Though not directly about your design, our work with NVIDIA on GPUdirect Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both MOFED and Firmware version compatibility can be. I would suggest anyone debugging RDMA issues should look at those closely. Doug by carrier pigeon On Dec 9, 2021, 5:04:36 AM, gpfsug-discuss-request at spectrumscale.org wrote: From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Date: Dec 9, 2021, 5:04:36 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 119, Issue 5 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..." Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..."Today's Topics: 1. alternate path between ESS Servers for Datamigration (Walter Sklenka) Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.eberhard at de.ibm.com Thu Dec 9 16:43:26 2021 From: ralf.eberhard at de.ibm.com (Ralf Eberhard) Date: Thu, 9 Dec 2021 16:43:26 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 119, Issue 7 - Adding a quorum node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Dec 9 19:09:44 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Dec 2021 19:09:44 +0000 Subject: [gpfsug-discuss] Adding a quorum node In-Reply-To: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> References: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> Message-ID: I frequently change quorum on the fly on both our 4.x and 5.0 clusters during upgrades/maintenance. You have sanity in the CCR to start with? (mmccr query, lsnodes, etc,etc) Anything useful in the logs or if you drop debug on it? ('export DEBUG=1'and then re-run command) Ed Wahl OSC -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Thursday, December 9, 2021 7:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Adding a quorum node I am looking to replace the quorum node in our cluster. The RAID card in the server we are currently using is a casualty of the RHEL8 SAS card purge :-( I have a "new" dual core server that is fully supported by RHEL8. After some toing and throwing with IBM they agreed a Pentium G6400 is 70PVU a core and two cores :-) That said it is currently running RHEL7 because that's what the DSS-G nodes are running. The upgrade to RHEL8 is planned for next year. Anyway I have added it into the GPFS cluster all well and good and GPFS is mounted just fine. However when I ran the command to make it a quorum node I got the following error (sanitized to remove actual DNS names and IP addresses initialize (113, '', ('', 1191)) failed (err 79) server initialization failed (err 79) mmchnode: Unexpected error from chnodes -n 1=:1191,2:1191,3=:1191,113=:1191 -f 1 -P 1191 . Return code: 149 mmchnode: Unable to change the CCR quorum node configuration. mmchnode: Command failed. Examine previous error messages to determine cause. fqdn-new is the new node and fqdn1/2/3 are the existing quorum nodes. I want to remove fqdn3 in due course. Anyone any idea what is going on? I thought you could change the quorum nodes on the fly? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!hO7wULtfr6n28eBJ0BB8sYyRMFo6Xl5_XDpsNZz3GiD_3nXlPf6nKHNR-X99$ From Walter.Sklenka at EDV-Design.at Thu Dec 9 19:38:45 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 19:38:45 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: Hi Olaf!! Many thanks OK well we will do mmvdisk vs delete So #mmvdisk vs delete ? -N ess01,ess02?.. would be correct , or? Best regards walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Olaf Weiser Sent: Donnerstag, 9. 
Dezember 2021 13:04 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Hallo Walter, ;-) yes !AND! no .. for sure , you can specifiy a subset of nodes to use RDMA and other nodes just communicating TCPIP But that's only half of the truth . The other half is.. who and how , you are going to migrate/copy the data in case you 'll use mmrestripe .... you will have to make sure , that only nodes, connected(green) and configured for RDMA doing the work otherwise.. if will also work to migrate the data, but then data is send throught the Ethernet as well , (as long all those nodes are in the same cluster) laff ----- Urspr?ngliche Nachricht ----- Von: "Walter Sklenka" > Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "'gpfsug-discuss at spectrumscale.org'" > CC: Betreff: [EXTERNAL] [gpfsug-discuss] alternate path between ESS Servers for Datamigration Datum: Do, 9. Dez 2021 11:04 Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Thu Dec 9 19:43:31 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 19:43:31 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: Message-ID: <4f6b41f6a3b44c7a80cb588add2056dd@Mail.EDVDesign.cloudia> Hello Douglas! Many thanks for your advice ! Well we are in a horrible situation regarding firmware and MOFED of old equipment Mellanox advised us to use a special version of subnetmanager 5.0-2.1.8.0 from MOFED I hope this helps Let?s see how we can proceed Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Douglas O'flaherty Sent: Donnerstag, 9. 
Dezember 2021 17:04 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Walter: Though not directly about your design, our work with NVIDIA on GPUdirect Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both MOFED and Firmware version compatibility can be. I would suggest anyone debugging RDMA issues should look at those closely. Doug by carrier pigeon ________________________________ On Dec 9, 2021, 5:04:36 AM, gpfsug-discuss-request at spectrumscale.org wrote: From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Date: Dec 9, 2021, 5:04:36 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 119, Issue 5 ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..." Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..."Today's Topics: 1. alternate path between ESS Servers for Datamigration (Walter Sklenka) Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Dec 9 20:19:41 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 9 Dec 2021 20:19:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 119, Issue 7 - Adding a quorum node In-Reply-To: References: Message-ID: On 09/12/2021 16:43, Ralf Eberhard wrote: > Jonathan, > > my suspicion is that?the GPFS daemon on fqdn-new is not reachable via > port 1191. > You can double check that by?sending a lightweight CCR RPC to this > daemon from another quorum node by attempting: > > mmccr echo -n fqdn-new;echo $? > > If this echo returns with a non-zero exit code the network settings must > be verified. And even?the other direction must > work: Node fqdn-new must?reach another quorum node, like (attempting on > fqdn-new): > > mmccr echo -n ;echo $? > Duh, that's my Homer Simpson moment for today. I forgotten to move the relevant network interfaces on the new server to the trusted zone in the firewall. So of course my normal testing with ping and ssh was working just fine. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Fri Dec 10 00:27:23 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 10 Dec 2021 00:27:23 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: Message-ID: On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Fri Dec 10 01:09:57 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 10 Dec 2021 01:09:57 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16390972812300.png Type: image/png Size: 98384 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16390972812301.png Type: image/png Size: 101267 bytes Desc: not available URL: From douglasof at us.ibm.com Fri Dec 10 04:24:21 2021 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Fri, 10 Dec 2021 00:24:21 -0400 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. 
With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing we were able to get around 2x the per-system throughput by using RDMA direct to GPU (GPU Direct Storage). (Tested on 2 DGX systems with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not cached - think mini-Epics for scaling ML training - or demand the lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. If you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options.
Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Fri Dec 10 10:17:20 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Fri, 10 Dec 2021 10:17:20 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Hello Douglas! May I ask a basic question regarding GPUdirect Storage or all local attached storage like NVME disks. Do you think it outerperforms "classical" shared storagesystems which are attached via FC connected to NSD servers HDR attached? With FC you have also bounce copies and more delay , isn?t it? There are solutions around which work with local NVME disks building some protection level with Raid (or duplication) . I am curious if it would be a better approach than shared storage which has it?s limitation (cost intensive scale out, extra infrstructure, max 64Gb at this time ... ) Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Douglas O'flaherty Sent: Freitag, 10. Dezember 2021 05:24 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. 
Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard > on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ----- Original message ----- From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Fri Dec 10 10:28:38 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 10 Dec 2021 10:28:38 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> References: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Message-ID: Hallo Walter, we had many experiences now to change our Storage-Systems in our Backup-Environment to RDMA-IB with HDR and EDR Connections. What we see now (came from a 16Gbit FC Infrastructure) we enhance our throuhput from 7 GB/s to 30 GB/s. The main reason are the elimination of the driver-layers in the client-systems and make a Buffer to Buffer communication because of RDMA. The latency reduction are significant. Regards Renar. We use now ESS3k and ESS5k systems with 6.1.1.2-Code level. 
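For anyone who wants to check the same thing on their own cluster, the knobs involved are the standard verbs settings. A minimal sketch follows; the node names and the mlx5_0/mlx5_1 port names are placeholders, not taken from Renar's setup:

# show the current RDMA-related settings
mmlsconfig verbsRdma
mmlsconfig verbsPorts

# enable verbs RDMA on a subset of nodes only, e.g. the ESS I/O servers
# (takes effect once mmfsd is restarted on those nodes)
mmchconfig verbsRdma=enable,verbsPorts="mlx5_0/1 mlx5_1/1" -N ess01,ess02

# after the restart, /var/adm/ras/mmfs.log.latest on those nodes should
# report "VERBS RDMA started"

As Olaf noted earlier in the thread, RDMA can be enabled for just a subset of the cluster like this, as long as all nodes can still reach each other over the daemon (TCP/IP) network.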
Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Walter Sklenka Gesendet: Freitag, 10. Dezember 2021 11:17 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Hello Douglas! May I ask a basic question regarding GPUdirect Storage or all local attached storage like NVME disks. Do you think it outerperforms ?classical? shared storagesystems which are attached via FC connected to NSD servers HDR attached? With FC you have also bounce copies and more delay , isn?t it? There are solutions around which work with local NVME disks building some protection level with Raid (or duplication) . I am curious if it would be a better approach than shared storage which has it?s limitation (cost intensive scale out, extra infrstructure, max 64Gb at this time ? ) Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Douglas O'flaherty Sent: Freitag, 10. Dezember 2021 05:24 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? 
These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard > on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ----- Original message ----- From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Dec 10 10:37:31 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Dec 2021 10:37:31 +0000 Subject: [gpfsug-discuss] Test email format / mail format Message-ID: An HTML attachment was scrubbed... 
URL: From Ondrej.Kosik at ibm.com Fri Dec 10 10:39:56 2021 From: Ondrej.Kosik at ibm.com (Ondrej Kosik) Date: Fri, 10 Dec 2021 10:39:56 +0000 Subject: [gpfsug-discuss] Test email format / mail format In-Reply-To: References: Message-ID: Hello all, Thank you for the test email, my reply is coming from Outlook-based infrastructure. ________________________________ From: Olaf Weiser Sent: Friday, December 10, 2021 10:37 AM To: gpfsug-discuss at spectrumscale.org Cc: Ondrej Kosik Subject: Test email format / mail format This email is just a test, because we've seen mail format issues from IBM sent emails you can ignore this email , just for internal problem determination -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Dec 10 11:10:07 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Dec 2021 11:10:07 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: , <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16391192376761.png Type: image/png Size: 127072 bytes Desc: not available URL: From anacreo at gmail.com Sun Dec 12 02:19:02 2021 From: anacreo at gmail.com (Alec) Date: Sat, 11 Dec 2021 18:19:02 -0800 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: I feel the need to respond here... I see many responses on this User Group forum that are dismissive of the fringe / extreme use cases and of the "what do you need that for '' mindset. The thing is that Spectrum Scale is for the extreme, just take the word "Parallel" in the old moniker that was already an extreme use case. If you have a standard workload, then sure most of the complex features of the file system are toys, but many of us DO have extreme workloads where shaking out every ounce of performance is a worthwhile and financially sound endeavor. It is also because of the efforts of those of us living on the cusp of technology that these technologies become mainstream and no-longer extreme. I have an AIX LPAR that traverses more than 300TB+ of data a day on a Spectrum Scale file system, it is fully virtualized, and handles a million files. If that performance level drops, regulatory reports will be late, business decisions won't be current. However, the systems of today and the future have to traverse this much data and if they are slow then they can't keep up with real-time data feeds. So the difference between an RDMA disk IO vs a non RDMA disk IO could possibly mean what level of analytics are done to perform real time fraud prevention. Or at what cost, today many systems achieve this by keeping everything in memory in HUGE farms.. Being able to perform data operations at 30GB/s means you can traverse ALL of the census bureau data for all time from the US Govt in about 2 seconds... that's a pretty substantial capability that moves the bar forward in what we can do from a technology perspective. I just did a technology garage with IBM where we were able to achieve 1.5TB/writes on an encrypted ESS off of a single VMWare Host and 4 VM's over IP... That's over 2PB of data writes a day on a single VM server. 
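(For scale, a back-of-the-envelope conversion of that figure: 2 PB per day divided by 86,400 seconds is roughly 23 GB/s of sustained writes.)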
Being able to demonstrate that there are production virtualized environments capable of this type of capacity, helps to show where the point of engineering a proper storage architecture outweighs the benefits of just throwing more GPU compute farms at the problem with ever dithering disk I/O. It also helps to demonstrate how a virtual storage optimized farm could be leveraged to host many in-memory or data analytic heavy workloads in a shared configuration. Douglas's response is the right one, how much IO does the application / environment need, it's nice to see Spectrum Scale have the flexibility to deliver. I'm pretty confident that if I can't deliver the required I/O performance on Spectrum Scale, nobody else can on any other storage platform within reasonable limits. Alec Effrat On Thu, Dec 9, 2021 at 8:24 PM Douglas O'flaherty wrote: > Jonathan: > > You posed a reasonable question, which was "when is RDMA worth the > hassle?" I agree with part of your premises, which is that it only matters > when the bottleneck isn't somewhere else. With a parallel file system, like > Scale/GPFS, the absolute performance bottleneck is not the throughput of a > single drive. In a majority of Scale/GPFS clusters the network data path is > the performance limitation. If they deploy HDR or 100/200/400Gbps > Ethernet... At that point, the buffer copy time inside the server matters. > > When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) > is easily demonstrated because it eliminates the bounce copy through the > system memory. In our NVIDIA DGX A100 server testing testing we were able > to get around 2x the per system throughput by using RDMA direct to GPU (GUP > Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage > node.) > > However, your question remains. Synthetic benchmarks are good indicators > of technical benefit, but do your users and applications need that extra > performance? > > These are probably only a handful of codes in organizations that need > this. However, they are high-value use cases. We have client applications > that either read a lot of data semi-randomly and not-cached - think > mini-Epics for scaling ML training. Or, demand lowest response time, like > production inference on voice recognition and NLP. > > If anyone has use cases for GPU accelerated codes with truly demanding > data needs, please reach out directly. We are looking for more use cases to > characterize the benefit for a new paper. f you can provide some code > examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a > benefit. > > Thanks, > > doug > > Douglas O'Flaherty > douglasof at us.ibm.com > > > > > > > ----- Message from Jonathan Buzzard on > Fri, 10 Dec 2021 00:27:23 +0000 ----- > > *To:* > gpfsug-discuss at spectrumscale.org > > *Subject:* > Re: [gpfsug-discuss] > On 09/12/2021 16:04, Douglas O'flaherty wrote: > > > > Though not directly about your design, our work with NVIDIA on GPUdirect > > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > > MOFED and Firmware version compatibility can be. > > > > I would suggest anyone debugging RDMA issues should look at those > closely. > > > May I ask what are the alleged benefits of using RDMA in GPFS? > > I can see there would be lower latency over a plain IP Ethernet or IPoIB > solution but surely disk latency is going to swamp that? 
> > I guess SSD drives might change that calculation but I have never seen > proper benchmarks comparing the two, or even better yet all four > connection options. > > Just seems a lot of complexity and fragility for very little gain to me. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > > > ----- Original message ----- > From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS > Servers for Datamigration > Date: Fri, Dec 10, 2021 10:27 > > On 09/12/2021 16:04, Douglas O'flaherty wrote: > > > > Though not directly about your design, our work with NVIDIA on GPUdirect > > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > > MOFED and Firmware version compatibility can be. > > > > I would suggest anyone debugging RDMA issues should look at those > closely. > > > May I ask what are the alleged benefits of using RDMA in GPFS? > > I can see there would be lower latency over a plain IP Ethernet or IPoIB > solution but surely disk latency is going to swamp that? > > I guess SSD drives might change that calculation but I have never seen > proper benchmarks comparing the two, or even better yet all four > connection options. > > Just seems a lot of complexity and fragility for very little gain to me. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 02:38:26 2021 From: anacreo at gmail.com (Alec) Date: Sat, 11 Dec 2021 18:38:26 -0800 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: You can manipulate the permissions via GPFS policy engine, essentially you'd write a script that the policy engine calls and tell GPFS to farm out the change in at whatever scale you need... run in a single node, how many files per thread, how many threads per node, etc... This can GREATLY accelerate file change permissions over a large quantity of files. However, as stated earlier the mmfind command will do all of this for you and it's worth the effort to get it compiled for your system. I don't have Spectrum Scale in front of me but for the best performance you'll want to setup the mmfind policy engine parameters to parallelize your workload... If mmfind has no action it will silently use GPFS policy engine to produce the requested output, however if mmfind has an action it will expose the policy engine calls. it goes something like this: mmfind -B 1 -N directattachnode1,directattachnode2 -m 24 /path/to/find -perm +o=w ! \( -type d -perm +o=t \) -xargs chmod o-w This will run 48 threads on 2 nodes and bump other write permissions off of any file it finds (excluding temp dirs) until it completes, it should go blistering fast... 
as this is only a meta operation the -B 1 might not be necessary, you'd probably be better off with a -B 100, but as I deal with a lot of 100GB+ files I don't want a single thread to be stuck with 3 100GB+ files and another thread to have none, so I usually set the max depth to be 1 and take the higher execution count. This has an advantage in that GPFS will break up the inodes in the most efficient way for the chmod to happen in parallel. I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. If you think of the possibilities here you could easily change that chmod to a gzip and add a -mtime +1200 and you have a find command that will gzip compress files over 4 years old in parallel across multiple nodes... mmfind is VERY powerful and flexible, highly worth getting into usage. Alec On Tue, Dec 7, 2021 at 7:43 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 07/12/2021 14:55, Simon Thompson wrote: > > > > Or add: > > UPDATECTIME yes > > SKIPACLUPDATECHECK yes > > > > To you dsm.opt file to skip checking for those updates and don?t back > > them up again. > > Yeah, but then a restore gives you potentially an unusable file system > as the ownership of the files and ACL's are all wrong. Better to bite > the bullet and back them up again IMHO. > > > > > Actually I thought TSM only updated the metadata if the mode/owner > > changed, not re-backed the file? > > That was my understanding but I have seen TSM rebacked up large amounts > of data where the owner of the file changed in the past, so your mileage > may vary. > > Also ACL's are stored in extended attributes which are stored with the > files and changes will definitely cause the file to be backed up again. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sun Dec 12 11:19:07 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 12 Dec 2021 11:19:07 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: On 12/12/2021 02:19, Alec wrote: > I feel the need to respond here... I see many responses on this > User Group forum that are dismissive of the fringe / extreme use > cases and of the "what do you need that for '' mindset. The thing is > that Spectrum Scale is for the extreme, just take the word "Parallel" > in the old moniker that was already an extreme use case. I wasn't been dismissive, I was asking what the benefits of using RDMA where. There is very little information about it out there and not a lot of comparative benchmarking on it either. Without the benefits being clearly laid out I am unlikely to consider it and might be missing a trick. IBM's literature on the topic is underwhelming to say the least. [SNIP] > I have an AIX LPAR that traverses more than 300TB+ of data a day on a > Spectrum Scale file system, it is fully virtualized, and handles a > million files. If that performance level drops, regulatory reports > will be late, business decisions won't be current. 
However, the > systems of today and the future have to traverse this much data and > if they are slow then they can't keep up with real-time data feeds. I have this nagging suspicion that modern all-flash storage systems could deliver that sort of performance without the overhead of a parallel file system. [SNIP] > > Douglas's response is the right one, how much IO does the > application / environment need, it's nice to see Spectrum Scale have > the flexibility to deliver. I'm pretty confident that if I can't > deliver the required I/O performance on Spectrum Scale, nobody else > can on any other storage platform within reasonable limits. > I would note here that in our *shared HPC* environment I made a very deliberate design decision to attach the compute nodes with 10Gbps Ethernet for storage. Though I would probably pick 25Gbps if we were procuring the system today. There were many reasons behind that, but the main ones were that historical file system performance showed that greater than 99% of the time the file system never got above 20% of its benchmarked speed, so using 10Gbps Ethernet was not going to be a problem. Secondly, by limiting the connection to 10Gbps it stops one person hogging the file system to the detriment of other users. We have seen individual nodes peg their 10Gbps link from time to time, even several nodes at once (jobs from the same user), and had they had access to a 100Gbps storage link that would have been curtains for everyone else's file system usage. At this juncture I would note that the GPFS admin traffic is handled by a separate IP address space on a separate VLAN which we prioritize with QOS on the switches. So even when a node floods its 10Gbps link for extended periods of time it doesn't get ejected from the cluster. A separate physical network for admin traffic is not necessary in my experience. That said you can do RDMA with Ethernet... Unfortunately the teaching cluster and protocol nodes are on Intel X520's which I don't think do RDMA. Everything else is X710's or Mellanox Connect-X4 which definitely do do RDMA. I could upgrade the protocol nodes but the teaching cluster would be a problem. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From s.j.thompson at bham.ac.uk Sun Dec 12 17:01:21 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Sun, 12 Dec 2021 17:01:21 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command 'allow-permission-change' We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn't like the updatewithAcl thing... Simon -------------- next part -------------- An HTML attachment was scrubbed...
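For reference, checking or changing the behaviour Simon describes is a one-liner per fileset. A hedged sketch, with fs0 and myfileset as placeholders:

# make chmod update both the POSIX mode bits and the ACL on this fileset
mmchfileset fs0 myfileset --allow-permission-change chmodAndUpdateAcl

The other documented values for --allow-permission-change are chmodOnly, setAclOnly and chmodAndSetAcl (chmodAndUpdateAcl is, I believe, the default).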
URL: From anacreo at gmail.com Sun Dec 12 22:03:39 2021 From: anacreo at gmail.com (Alec) Date: Sun, 12 Dec 2021 14:03:39 -0800 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson wrote: > > I'm not sure if this happens on Spectrum Scale but on most FS's if you > do a chmod 770 file you'll lose any ACLs assigned to the > > file, so safest to bump the permissions with a subtractive or additive > o-w or g+w type operation. > > > > This depends entirely on the fileset setting, see: > > > https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command > > > > ?*allow-permission-change*? > > > > We typically have file-sets set to chmodAndUpdateAcl, though not > exclusively, I think it was some quirky software that tested the > permissions after doing something and didn?t like the updatewithAcl thing ? > > > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 23:00:21 2021 From: anacreo at gmail.com (Alec) Date: Sun, 12 Dec 2021 15:00:21 -0800 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: So I never said this node wasn't in a HPC Cluster, it has partners... For our use case however some nodes have very expensive per core software licensing, and we have to weigh the human costs of empowering traditional monolithic code to do the job, or bringing in more users to re-write and maintain distributed code (someone is going to spend the money to get this work done!). So to get the most out of those licensed cores we have designed our virtual compute machine(s) with 128Gbps+ of SAN fabric. Just to achieve our average business day reads it would take 3 of your cluster nodes maxed out 24 hours, or 9 of them in a business day to achieve the same read speeds... and another 4 nodes to handle the writes. I guess HPC is in the eye of the business... In my experience cables and ports are cheaper than servers. The classic shared HPC design you have is being up-ended by the fact that there is so much compute power (cpu and memory) now in the nodes, you can't simply build a system with two storage connections (Noah's ark) and call it a day. If you look at the spec 25Gbps Ethernet is only delivering ~3GB/s (which is just above USB 3.2, and below USB 4). Spectrum Scale does very well for us when met with a fully saturated workload, we maintain one node for SLA and one node for AdHoc workload, and like clockwork the SLA box always steals exactly half the bandwidth when a job fires, so that 1 SLA job can take half the bandwidth and complete compared to the 40 AdHoc jobs on the other node. 
In newer releases IBM has introduced fileset throttling.... this is very exciting as we can really just design the biggest fattest pipes from VM to Storage and then software define the storage AND the bandwidth from the standard nobody cares about workloads all the way up to the most critical workloads... I don't buy the smaller bandwidth is better, as I see that as just one band-aid that has more elegant solutions, such as simply doing more resource constraints (you can't push the bandwidth if you can't get the CPU...), or using a workload orchestrator such as LSF with limits set, but I also won't say it never makes sense, as well I only know my problems and my solutions. For years the network team wouldn't let users have more than 10mb then 100mb networking as they were always worried about their backend being overwhelmed... I literally had faster home internet service than my work desktop connection at one point in my life.. it was all a falesy, the workload should drive the technology, the technology shouldn't hinder the workload. You can do a simple exercise, try scaling up... imagine your cluster is asked to start computing 100x more work... and that work must be completed on time. Do you simply say let me buy 100x more of everything? Or do you start to look at where can I gain efficiency and what actual bottlenecks do I need to lift... for some of us it's CPU, for some it's Memory, for some it's disk, depending on the work... I'd say the extremely rare case is where you need 100x more of EVERYTHING, but you have to get past the performance of the basic building blocks baked into the cake before you do need to dig deeper into the bottlenecks and it makes practical and financial sense. If your main bottleneck was storage, you'd be asking far different questions about RDMA. Alec On Sun, Dec 12, 2021 at 3:19 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 12/12/2021 02:19, Alec wrote: > > > I feel the need to respond here... I see many responses on this > > User Group forum that are dismissive of the fringe / extreme use > > cases and of the "what do you need that for '' mindset. The thing is > > that Spectrum Scale is for the extreme, just take the word "Parallel" > > in the old moniker that was already an extreme use case. > > I wasn't been dismissive, I was asking what the benefits of using RDMA > where. There is very little information about it out there and not a lot > of comparative benchmarking on it either. Without the benefits being > clearly laid out I am unlikely to consider it and might be missing a trick. > > IBM's literature on the topic is underwhelming to say the least. > > [SNIP] > > > > I have an AIX LPAR that traverses more than 300TB+ of data a day on a > > Spectrum Scale file system, it is fully virtualized, and handles a > > million files. If that performance level drops, regulatory reports > > will be late, business decisions won't be current. However, the > > systems of today and the future have to traverse this much data and > > if they are slow then they can't keep up with real-time data feeds. > > I have this nagging suspicion that modern all flash storage systems > could deliver that sort of performance without the overhead of a > parallel file system. > > [SNIP] > > > > > Douglas's response is the right one, how much IO does the > > application / environment need, it's nice to see Spectrum Scale have > > the flexibility to deliver. 
I'm pretty confident that if I can't > > deliver the required I/O performance on Spectrum Scale, nobody else > > can on any other storage platform within reasonable limits. > > > > I would note here that in our *shared HPC* environment I made a very > deliberate design decision to attach the compute nodes with 10Gbps > Ethernet for storage. Though I would probably pick 25Gbps if we where > procuring the system today. > > There where many reasons behind that, but the main ones being that > historical file system performance showed that greater than 99% of the > time the file system never got above 20% of it's benchmarked speed. > Using 10Gbps Ethernet was not going to be a problem. > > Secondly by limiting the connection to 10Gbps it stops one person > hogging the file system to the detriment of other users. We have seen > individual nodes peg their 10Gbps link from time to time, even several > nodes at once (jobs from the same user) and had they had access to a > 100Gbps storage link that would have been curtains for everyone else's > file system usage. > > At this juncture I would note that the GPFS admin traffic is handled by > on separate IP address space on a separate VLAN which we prioritize with > QOS on the switches. So even when a node floods it's 10Gbps link for > extended periods of time it doesn't get ejected from the cluster. The > need for a separate physical network for admin traffic is not necessary > in my experience. > > That said you can do RDMA with Ethernet... Unfortunately the teaching > cluster and protocol nodes are on Intel X520's which I don't think do > RDMA. Everything is X710's or Mellanox Connect-X4 which definitely do do > RDMA. I could upgrade the protocol nodes but the teaching cluster would > be a problem. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Dec 13 00:03:42 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 13 Dec 2021 00:03:42 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Dec 13 10:49:37 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 13 Dec 2021 10:49:37 +0000 Subject: [gpfsug-discuss] R: Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: <96a77c75de9b41f089e853120eef870d@psi.ch> I am definitely going to try this solution with mmfind. Thank you also for the command line and several hints? I?ll be back with the outcome soon. Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Alec Inviato: domenica 12 dicembre 2021 23:04 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. 
IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Dec 13 11:30:17 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 13 Dec 2021 11:30:17 +0000 Subject: [gpfsug-discuss] R: R: Question on changing mode on many files In-Reply-To: <96a77c75de9b41f089e853120eef870d@psi.ch> References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> <96a77c75de9b41f089e853120eef870d@psi.ch> Message-ID: Hi Alec , mmfind doesn?t have a man page (does it have an online one ? I cannot find it). And according to mmfind -h it doesn?t exposes the ?-N? neither the ?-B? flags. RPM is gpfs.base-5.1.1-2.x86_64. Do I have chance to download a newest version of that script from somewhere ? Thanks, Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Dorigo Alvise (PSI) Inviato: luned? 13 dicembre 2021 11:50 A: gpfsug main discussion list Oggetto: [gpfsug-discuss] R: Question on changing mode on many files I am definitely going to try this solution with mmfind. Thank you also for the command line and several hints? I?ll be back with the outcome soon. Alvise Da: gpfsug-discuss-bounces at spectrumscale.org > Per conto di Alec Inviato: domenica 12 dicembre 2021 23:04 A: gpfsug main discussion list > Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? 
We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Mon Dec 13 18:33:23 2021 From: anacreo at gmail.com (Alec) Date: Mon, 13 Dec 2021 10:33:23 -0800 Subject: [gpfsug-discuss] R: R: Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> <96a77c75de9b41f089e853120eef870d@psi.ch> Message-ID: I checked on my office network.... mmfind --help mmfind -polFlags '-N node1,node2 -B 100 -m 24' /path/to/find -perm +o=w ! \( -type d -perm +o=t \) -xargs chmod o-w I think that the -m 24 is the default (24 threads per node), but it's nice to include on the command line so you remember you can increment/decrement it as your needs require or your nodes can handle. It's IMPORTANT to review in the mmfind --help output that some things are 'mmfind' args and go BEFORE the path... some are CRITERIA args and have no impact on the files... BUT SOME ARE ACTION args, and they will affect files. So -exec -xargs are obvious actions, however, -gpfsCompress doesn't find compressed files, it will actually compress the objects... in our AIX environment our compressed reads feel like they're essentially broken, we only get about 5MB/s, however on Linux compress reads seem to work fairly well. So make sure to read the man page carefully before using some non-obvious GPFS enhancements. Also the nice thing is mmfind -xargs takes care of all the strange file names, so you don't have to do anything complicated, but you also can't pipe the output as it will run the xarg in the policy engine. As a footnote this is my all time favorite find for troubleshooting... find $(pwd) -mtime -1 | sed -e 's/.*/"&"/g' | xargs ls -latr List all the files modified in the last day in reverse chronology... Doesn't work :-( Alec On Mon, Dec 13, 2021 at 3:30 AM Dorigo Alvise (PSI) wrote: > Hi Alec , > > mmfind doesn?t have a man page (does it have an online one ? I cannot find > it). And according to mmfind -h it doesn?t exposes the ?-N? neither the > ?-B? flags. RPM is gpfs.base-5.1.1-2.x86_64. > > > > Do I have chance to download a newest version of that script from > somewhere ? > > > > Thanks, > > > > Alvise > > > > *Da:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Per conto di *Dorigo Alvise > (PSI) > *Inviato:* luned? 13 dicembre 2021 11:50 > *A:* gpfsug main discussion list > *Oggetto:* [gpfsug-discuss] R: Question on changing mode on many files > > > > I am definitely going to try this solution with mmfind. > > Thank you also for the command line and several hints? I?ll be back with > the outcome soon. > > > > Alvise > > > > *Da:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Per conto di *Alec > *Inviato:* domenica 12 dicembre 2021 23:04 > *A:* gpfsug main discussion list > *Oggetto:* Re: [gpfsug-discuss] Question on changing mode on many files > > > > How am I just learning about this right now, thank you! Makes so much > more sense now the odd behaviors I've seen over the years on GPFS vs POSIX > chmod/ACL. 
Will definitely go review those settings on my filesets now, > wonder if the default has evolved from 3.x -> 4.x -> 5.x. > > > > IBM needs to find a way to pre-compile mmfind and make it supported, it > really is essential and so beneficial, and so hard to get done in a > production regulated environment. Though a bigger warning that the > compress option is an action not a criteria! > > > > Alec > > > > On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > > > I'm not sure if this happens on Spectrum Scale but on most FS's if you > do a chmod 770 file you'll lose any ACLs assigned to the > > file, so safest to bump the permissions with a subtractive or additive > o-w or g+w type operation. > > > > This depends entirely on the fileset setting, see: > > > https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command > > > > ?*allow-permission-change*? > > > > We typically have file-sets set to chmodAndUpdateAcl, though not > exclusively, I think it was some quirky software that tested the > permissions after doing something and didn?t like the updatewithAcl thing ? > > > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Dec 13 23:55:23 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 13 Dec 2021 23:55:23 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: <19884986-aff8-20aa-f1d1-590f6b81ddd2@strath.ac.uk> On 13/12/2021 00:03, Andrew Beattie wrote: > What is the main outcome or business requirement of the teaching cluster > ( i notice your specific in the use of defining it as a teaching cluster) > It is entirely possible that the use case for this cluster does not > warrant the use of high speed low latency networking, and it simply > needs the benefits of a parallel filesystem. While we call it the "teaching cluster" it would be more appropriate to call them "teaching nodes" that shares resources (storage and login nodes) with the main research cluster. It's mainly used by undergraduates doing final year projects and M.Sc. students. It's getting a bit long in the tooth now but not many undergraduates have access to a 16 core machine with 64GB of RAM. Even if they did being able to let something go flat out for 48 hours means there personal laptop is available for other things :-) I was just musing that the cards in the teaching nodes being Intel 82599ES would be a stumbling block for RDMA over Ethernet, but on checking the Intel X710 doesn't do RDMA either so it would all be a bust anyway. I was clearly on the crack pipe when I thought they did. So aside from the DSS-G and GPU nodes with Connect-X4 cards nothing does RDMA. [SNIP] > For some of my research clients this is the ability to run 20-30% more > compute jobs on the same HPC resources in the same 24H period, which > means that they can reduce the amount of time they need on the HPC > cluster to get the data results that they are looking for. Except as I said in our cluster the storage servers have never been maxed out except when running benchmarks. 
Individual compute nodes have been maxed out (mainly Gaussian writing 800GB temporary files) but as I explained that's a good thing from my perspective because I don't want one or two users to be able to pound the storage into oblivion and cause problems for everyone else. We have enough problems with users tanking the login nodes by running computations on them. That should go away with our upgrade to RHEL8 and the wonders of per user cgroups; me I love systemd. In the end nobody has complained that the storage speed is a problem yet, and putting the metadata on SSD would be my first port of call if they did and funds where available to make things go faster. To be honest I think the users are just happy that GPFS doesn't eat itself and be out of action for a few weeks every couple of years like Lustre did on the previous system. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Fri Dec 17 15:08:15 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 17 Dec 2021 15:08:15 +0000 Subject: [gpfsug-discuss] email format check again for IBM domain send email Message-ID: An HTML attachment was scrubbed... URL: From juergen.hannappel at desy.de Fri Dec 17 15:57:45 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 17 Dec 2021 16:57:45 +0100 (CET) Subject: [gpfsug-discuss] ESS 6.1.2.1 changes Message-ID: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Hi, I just noticed that tday a new ESS release (6.1.2.1) appeared on fix central. What I can't find is a list of changes to 6.1.2.0, and anyway finding the change list is always a PITA. Does anyone know what changed? -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 From luis.bolinches at fi.ibm.com Fri Dec 17 18:50:09 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Fri, 17 Dec 2021 18:50:09 +0000 Subject: [gpfsug-discuss] ESS 6.1.2.1 changes In-Reply-To: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> References: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... 
URL: From janfrode at tanso.net Mon Dec 20 11:26:29 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 20 Dec 2021 12:26:29 +0100 Subject: [gpfsug-discuss] ESS 6.1.2.1 changes In-Reply-To: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> References: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Message-ID: Just ran an upgrade on an EMS, and the only changes I see are these updated packages on the ems: +gpfs.docs-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:43 AM CET +gpfs.ess.firmware-6.0.0-15.ppc64le Mon 20 Dec 2021 11:56:42 AM CET +gpfs.msg.en_US-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:12 AM CET +gpfs.gss.pmsensors-5.1.2-0.el8.ppc64le Mon 20 Dec 2021 11:56:12 AM CET +gpfs.gpl-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:11 AM CET +gpfs.gnr.base-1.0.0-0.ppc64le Mon 20 Dec 2021 11:56:11 AM CET +gpfs.gnr.support-ess5000-1.0.0-3.noarch Mon 20 Dec 2021 11:56:10 AM CET +gpfs.gnr.support-ess3200-6.1.2-0.noarch Mon 20 Dec 2021 11:56:10 AM CET +gpfs.crypto-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:10 AM CET +gpfs.compression-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:10 AM CET +gpfs.license.dmd-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:09 AM CET +gpfs.gnr.support-ess3000-1.0.0-3.noarch Mon 20 Dec 2021 11:56:09 AM CET +gpfs.gui-5.1.2-0.4.noarch Mon 20 Dec 2021 11:56:05 AM CET +gpfs.gskit-8.0.55-19.ppc64le Mon 20 Dec 2021 11:56:02 AM CET +gpfs.java-5.1.2-0.4.ppc64le Mon 20 Dec 2021 11:56:01 AM CET +gpfs.gss.pmcollector-5.1.2-0.el8.ppc64le Mon 20 Dec 2021 11:55:59 AM CET +gpfs.gnr.support-essbase-6.1.2-0.noarch Mon 20 Dec 2021 11:55:59 AM CET +gpfs.adv-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:59 AM CET +gpfs.gnr-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:58 AM CET +gpfs.base-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:54 AM CET +sdparm-1.10-10.el8.ppc64le Mon 20 Dec 2021 11:55:21 AM CET +gpfs.ess.tools-6.1.2.1-release.noarch Mon 20 Dec 2021 11:50:47 AM CET I will guess it has something to do with log4j, but a changelog would be nice :-) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=142683 On Fri, Dec 17, 2021 at 5:07 PM Hannappel, Juergen < juergen.hannappel at desy.de> wrote: > Hi, > I just noticed that tday a new ESS release (6.1.2.1) appeared on fix > central. > What I can't find is a list of changes to 6.1.2.0, and anyway finding the > change list is always a PITA. > > Does anyone know what changed? > > -- > Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Tue Dec 7 13:44:24 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Tue, 7 Dec 2021 13:44:24 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files Message-ID: Dear users/developers/support, I'd like to ask if there is a fast way to manipulate the permission mask of many files (millions). I tried on 900k files and a recursive chmod (chmod 0### -R path) takes about 1000s, with about 50% usage of mmfsd daemon. I tried with the perl's internal function chmod that can operate on an array of files, and it takes about 1/3 of the previous method. Which is already a good result. I've seen the possibility to run a policy to execute commands, but I would avoid to execute external commands through mmxargs, 1M of times; would you ? 
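(One note on the policy route: mmapplypolicy does not have to exec a command once per file. An EXTERNAL LIST rule hands its EXEC script a file list covering a whole batch of matches, so a single chmod invocation can cover thousands of files. A rough, untested sketch -- the paths, mode, script name and node list are made up, DIRECTORIES_PLUS is there so directories get picked up as well, and the exact list-file format and escaping should be checked against the ILM documentation before trusting it with odd file names:

  /* fixperms.pol */
  RULE EXTERNAL LIST 'fixperms' EXEC '/root/fixperms.sh'
  RULE 'pick' LIST 'fixperms' DIRECTORIES_PLUS
       WHERE PATH_NAME LIKE '/gpfs/fs1/data/%'

  /root/fixperms.sh, which mmapplypolicy calls with an operation and a batch file list:

  #!/bin/bash
  # $1 = TEST or LIST, $2 = file of "inode gen snapid -- pathname" records
  case "$1" in
    TEST) exit 0 ;;
    LIST) sed 's/^.* -- //' "$2" | tr '\n' '\0' | xargs -0 chmod 0750 ;;
  esac

  # run it across several nodes, a few threads each
  mmapplypolicy /gpfs/fs1/data -P fixperms.pol -N node1,node2 -m 8

mmfind is essentially a friendlier wrapper around this same machinery.)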
Does anybody have any suggestion to do this operation with minimum disruption on the system ? Thank you, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Dec 7 15:42:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 7 Dec 2021 15:42:58 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: On 07/12/2021 14:55, Simon Thompson wrote: > > Or add: > ? UPDATECTIME??????????????
yes > ? SKIPACLUPDATECHECK??????? yes > > To you dsm.opt file to skip checking for those updates and don?t back > them up again. Yeah, but then a restore gives you potentially an unusable file system as the ownership of the files and ACL's are all wrong. Better to bite the bullet and back them up again IMHO. > > Actually I thought TSM only updated the metadata if the mode/owner > changed, not re-backed the file? That was my understanding but I have seen TSM rebacked up large amounts of data where the owner of the file changed in the past, so your mileage may vary. Also ACL's are stored in extended attributes which are stored with the files and changes will definitely cause the file to be backed up again. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Walter.Sklenka at EDV-Design.at Thu Dec 9 09:26:40 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 09:26:40 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Message-ID: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ... connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,... to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Visio-eodc-2-fabs.pdf Type: application/pdf Size: 35768 bytes Desc: Visio-eodc-2-fabs.pdf URL: From janfrode at tanso.net Thu Dec 9 10:25:17 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 9 Dec 2021 11:25:17 +0100 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: I believe this should be a fully working solution. I see no problem enabling RDMA between a subset of nodes -- just disable verbsRdma on the nodes you want to use plain IP. -jf On Thu, Dec 9, 2021 at 11:04 AM Walter Sklenka wrote: > Dear spectrum scale users! 
> > May I ask you a design question? > > We have an IB environment which is very mixed at the moment ( connecX3 ? > connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also > HDR100 and HDR switches. We still have some big troubles in this fabric > when using RDMA , a case at Mellanox and IBM is open . > > The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where > we want to migrate the data to ess5000 , ( mmdelvdisk +qos) > > Due to the current problems with RDMA we though eventually we could try a > workaround : > > If you are interested there is Maybe you can find the attachment ? > > We build 2 separate fabrics , the ess-IO servers attached to both blue and > green and all other cluster members and all remote clusters only to fabric > blue > > The daemon interfaces (IPoIP) are on fabric blue > > > > It is the aim to setup rdma only on the ess-ioServers in the fabric green > , in the blue we must use IPoIB (tcp) > > Do you think datamigration would work between ess01,ess02,? to ess07,ess08 > via RDMA ? > > Or is it principally not possible to make a rdma network only for a > subset of a cluster (though this subset would be reachable via other > fabric) ? > > > > Thank you very much for any input ! > > Best regards walter > > > > > > > > Mit freundlichen Gr??en > *Walter Sklenka* > *Technical Consultant* > > > > EDV-Design Informationstechnologie GmbH > Giefinggasse 6/1/2, A-1210 Wien > Tel: +43 1 29 22 165-31 > Fax: +43 1 29 22 165-90 > E-Mail: sklenka at edv-design.at > Internet: www.edv-design.at > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Thu Dec 9 10:41:29 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 10:41:29 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: Hi Jan! That great to hear So we will try this Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at Von: Jan-Frode Myklebust Gesendet: Thursday, December 9, 2021 11:25 AM An: Walter Sklenka Cc: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration I believe this should be a fully working solution. I see no problem enabling RDMA between a subset of nodes -- just disable verbsRdma on the nodes you want to use plain IP. -jf On Thu, Dec 9, 2021 at 11:04 AM Walter Sklenka > wrote: Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? 
We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Dec 9 12:04:28 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 9 Dec 2021 12:04:28 +0000 Subject: [gpfsug-discuss] =?utf-8?q?alternate_path_between_ESS_Servers_for?= =?utf-8?q?=09Datamigration?= In-Reply-To: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Dec 9 12:36:08 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 9 Dec 2021 12:36:08 +0000 Subject: [gpfsug-discuss] Adding a quorum node Message-ID: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> I am looking to replace the quorum node in our cluster. The RAID card in the server we are currently using is a casualty of the RHEL8 SAS card purge :-( I have a "new" dual core server that is fully supported by RHEL8. After some toing and throwing with IBM they agreed a Pentium G6400 is 70PVU a core and two cores :-) That said it is currently running RHEL7 because that's what the DSS-G nodes are running. The upgrade to RHEL8 is planned for next year. Anyway I have added it into the GPFS cluster all well and good and GPFS is mounted just fine. However when I ran the command to make it a quorum node I got the following error (sanitized to remove actual DNS names and IP addresses initialize (113, '', ('', 1191)) failed (err 79) server initialization failed (err 79) mmchnode: Unexpected error from chnodes -n 1=:1191,2:1191,3=:1191,113=:1191 -f 1 -P 1191 . Return code: 149 mmchnode: Unable to change the CCR quorum node configuration. mmchnode: Command failed. Examine previous error messages to determine cause. fqdn-new is the new node and fqdn1/2/3 are the existing quorum nodes. I want to remove fqdn3 in due course. Anyone any idea what is going on? I thought you could change the quorum nodes on the fly? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From douglasof at us.ibm.com Thu Dec 9 16:04:28 2021 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Thu, 9 Dec 2021 16:04:28 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: Message-ID: Walter: Though not directly about your design, our work with NVIDIA on GPUdirect Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both MOFED and Firmware version compatibility can be. I would suggest anyone debugging RDMA issues should look at those closely. Doug by carrier pigeon On Dec 9, 2021, 5:04:36 AM, gpfsug-discuss-request at spectrumscale.org wrote: From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Date: Dec 9, 2021, 5:04:36 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 119, Issue 5 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..." Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..."Today's Topics: 1. alternate path between ESS Servers for Datamigration (Walter Sklenka) Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.eberhard at de.ibm.com Thu Dec 9 16:43:26 2021 From: ralf.eberhard at de.ibm.com (Ralf Eberhard) Date: Thu, 9 Dec 2021 16:43:26 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 119, Issue 7 - Adding a quorum node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ewahl at osc.edu Thu Dec 9 19:09:44 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Thu, 9 Dec 2021 19:09:44 +0000 Subject: [gpfsug-discuss] Adding a quorum node In-Reply-To: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> References: <73c81130-c120-d5f3-395f-4695e56905e1@strath.ac.uk> Message-ID: I frequently change quorum on the fly on both our 4.x and 5.0 clusters during upgrades/maintenance. You have sanity in the CCR to start with? (mmccr query, lsnodes, etc,etc) Anything useful in the logs or if you drop debug on it? ('export DEBUG=1'and then re-run command) Ed Wahl OSC -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Thursday, December 9, 2021 7:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Adding a quorum node I am looking to replace the quorum node in our cluster. The RAID card in the server we are currently using is a casualty of the RHEL8 SAS card purge :-( I have a "new" dual core server that is fully supported by RHEL8. After some toing and throwing with IBM they agreed a Pentium G6400 is 70PVU a core and two cores :-) That said it is currently running RHEL7 because that's what the DSS-G nodes are running. The upgrade to RHEL8 is planned for next year. Anyway I have added it into the GPFS cluster all well and good and GPFS is mounted just fine. However when I ran the command to make it a quorum node I got the following error (sanitized to remove actual DNS names and IP addresses initialize (113, '', ('', 1191)) failed (err 79) server initialization failed (err 79) mmchnode: Unexpected error from chnodes -n 1=:1191,2:1191,3=:1191,113=:1191 -f 1 -P 1191 . Return code: 149 mmchnode: Unable to change the CCR quorum node configuration. mmchnode: Command failed. Examine previous error messages to determine cause. fqdn-new is the new node and fqdn1/2/3 are the existing quorum nodes. I want to remove fqdn3 in due course. Anyone any idea what is going on? I thought you could change the quorum nodes on the fly? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!hO7wULtfr6n28eBJ0BB8sYyRMFo6Xl5_XDpsNZz3GiD_3nXlPf6nKHNR-X99$ From Walter.Sklenka at EDV-Design.at Thu Dec 9 19:38:45 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 19:38:45 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: <203c51ce5d6c4cb9992ebc26f1b503cf@Mail.EDVDesign.cloudia> Message-ID: Hi Olaf!! Many thanks OK well we will do mmvdisk vs delete So #mmvdisk vs delete ? -N ess01,ess02?.. would be correct , or? Best regards walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Olaf Weiser Sent: Donnerstag, 9. 
Dezember 2021 13:04 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Hallo Walter, ;-) yes !AND! no .. for sure , you can specifiy a subset of nodes to use RDMA and other nodes just communicating TCPIP But that's only half of the truth . The other half is.. who and how , you are going to migrate/copy the data in case you 'll use mmrestripe .... you will have to make sure , that only nodes, connected(green) and configured for RDMA doing the work otherwise.. if will also work to migrate the data, but then data is send throught the Ethernet as well , (as long all those nodes are in the same cluster) laff ----- Urspr?ngliche Nachricht ----- Von: "Walter Sklenka" > Gesendet von: gpfsug-discuss-bounces at spectrumscale.org An: "'gpfsug-discuss at spectrumscale.org'" > CC: Betreff: [EXTERNAL] [gpfsug-discuss] alternate path between ESS Servers for Datamigration Datum: Do, 9. Dez 2021 11:04 Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Thu Dec 9 19:43:31 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Thu, 9 Dec 2021 19:43:31 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: Message-ID: <4f6b41f6a3b44c7a80cb588add2056dd@Mail.EDVDesign.cloudia> Hello Douglas! Many thanks for your advice ! Well we are in a horrible situation regarding firmware and MOFED of old equipment Mellanox advised us to use a special version of subnetmanager 5.0-2.1.8.0 from MOFED I hope this helps Let?s see how we can proceed Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Douglas O'flaherty Sent: Donnerstag, 9. 
Dezember 2021 17:04 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Walter: Though not directly about your design, our work with NVIDIA on GPUdirect Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both MOFED and Firmware version compatibility can be. I would suggest anyone debugging RDMA issues should look at those closely. Doug by carrier pigeon ________________________________ On Dec 9, 2021, 5:04:36 AM, gpfsug-discuss-request at spectrumscale.org wrote: From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Date: Dec 9, 2021, 5:04:36 AM Subject: [EXTERNAL] gpfsug-discuss Digest, Vol 119, Issue 5 ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..." Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.orgTo subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.orgYou can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.orgWhen replying, please edit your Subject line so it is more specificthan "Re: Contents of gpfsug-discuss digest..."Today's Topics: 1. alternate path between ESS Servers for Datamigration (Walter Sklenka) Dear spectrum scale users! May I ask you a design question? We have an IB environment which is very mixed at the moment ( connecX3 ? connect-X6 with FDR , even FDR10 and with arrive of ESS5000SC7 now also HDR100 and HDR switches. We still have some big troubles in this fabric when using RDMA , a case at Mellanox and IBM is open . The environment has 3 old Building blocks 2xESSGL6 and 1x GL4 , from where we want to migrate the data to ess5000 , ( mmdelvdisk +qos) Due to the current problems with RDMA we though eventually we could try a workaround : If you are interested there is Maybe you can find the attachment ? We build 2 separate fabrics , the ess-IO servers attached to both blue and green and all other cluster members and all remote clusters only to fabric blue The daemon interfaces (IPoIP) are on fabric blue It is the aim to setup rdma only on the ess-ioServers in the fabric green , in the blue we must use IPoIB (tcp) Do you think datamigration would work between ess01,ess02,? to ess07,ess08 via RDMA ? Or is it principally not possible to make a rdma network only for a subset of a cluster (though this subset would be reachable via other fabric) ? Thank you very much for any input ! Best regards walter Mit freundlichen Gr??en Walter Sklenka Technical Consultant EDV-Design Informationstechnologie GmbH Giefinggasse 6/1/2, A-1210 Wien Tel: +43 1 29 22 165-31 Fax: +43 1 29 22 165-90 E-Mail: sklenka at edv-design.at Internet: www.edv-design.at -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Thu Dec 9 20:19:41 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 9 Dec 2021 20:19:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 119, Issue 7 - Adding a quorum node In-Reply-To: References: Message-ID: On 09/12/2021 16:43, Ralf Eberhard wrote: > Jonathan, > > my suspicion is that?the GPFS daemon on fqdn-new is not reachable via > port 1191. > You can double check that by?sending a lightweight CCR RPC to this > daemon from another quorum node by attempting: > > mmccr echo -n fqdn-new;echo $? > > If this echo returns with a non-zero exit code the network settings must > be verified. And even?the other direction must > work: Node fqdn-new must?reach another quorum node, like (attempting on > fqdn-new): > > mmccr echo -n ;echo $? > Duh, that's my Homer Simpson moment for today. I forgotten to move the relevant network interfaces on the new server to the trusted zone in the firewall. So of course my normal testing with ping and ssh was working just fine. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Fri Dec 10 00:27:23 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 10 Dec 2021 00:27:23 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: Message-ID: On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Fri Dec 10 01:09:57 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Fri, 10 Dec 2021 01:09:57 +0000 Subject: [gpfsug-discuss] alternate path between ESS Servers for Datamigration In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16390972812300.png Type: image/png Size: 98384 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16390972812301.png Type: image/png Size: 101267 bytes Desc: not available URL: From douglasof at us.ibm.com Fri Dec 10 04:24:21 2021 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Fri, 10 Dec 2021 00:24:21 -0400 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. 
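(Coming back for a moment to the subset question that started this thread: the configuration side of that really is just per-node settings, roughly along these lines -- untested, and the node names, node class and port names below are placeholders:

  # verbs RDMA only on the ESS I/O servers, everything else stays on TCP/IP
  mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1" -N ess07,ess08
  mmchconfig verbsRdma=enable -N ess07,ess08
  mmchconfig verbsRdma=disable -N computeNodes

  # confirm what each node ended up with
  mmlsconfig verbsRdma
  mmlsconfig verbsPorts

As far as I recall the verbs settings are only picked up when the daemon is recycled on the nodes that changed, so plan for an mmshutdown/mmstartup there.)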
With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GPU Direct Storage). (Tested on 2 DGX systems with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. If you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options.
Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Walter.Sklenka at EDV-Design.at Fri Dec 10 10:17:20 2021 From: Walter.Sklenka at EDV-Design.at (Walter Sklenka) Date: Fri, 10 Dec 2021 10:17:20 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Hello Douglas! May I ask a basic question regarding GPUdirect Storage or all local attached storage like NVME disks. Do you think it outerperforms "classical" shared storagesystems which are attached via FC connected to NSD servers HDR attached? With FC you have also bounce copies and more delay , isn?t it? There are solutions around which work with local NVME disks building some protection level with Raid (or duplication) . I am curious if it would be a better approach than shared storage which has it?s limitation (cost intensive scale out, extra infrstructure, max 64Gb at this time ... ) Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Douglas O'flaherty Sent: Freitag, 10. Dezember 2021 05:24 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. 
Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard > on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ----- Original message ----- From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Fri Dec 10 10:28:38 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 10 Dec 2021 10:28:38 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> References: <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Message-ID: Hallo Walter, we had many experiences now to change our Storage-Systems in our Backup-Environment to RDMA-IB with HDR and EDR Connections. What we see now (came from a 16Gbit FC Infrastructure) we enhance our throuhput from 7 GB/s to 30 GB/s. The main reason are the elimination of the driver-layers in the client-systems and make a Buffer to Buffer communication because of RDMA. The latency reduction are significant. Regards Renar. We use now ESS3k and ESS5k systems with 6.1.1.2-Code level. 
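(For anyone wanting to produce that kind of before/after number on their own gear, the gpfsperf tool shipped in the samples tree is a quick way to push sequential I/O through the real GPFS client path. A sketch from memory -- the file name, sizes and thread count are made up, and the exact flags should be checked against the README in the same directory:

  # build it once; the source ships with GPFS
  cd /usr/lpp/mmfs/samples/perf && make

  # big sequential write, then read it back: 16 MiB records, 8 threads
  ./gpfsperf create seq /gpfs/fs1/benchfile -n 100g -r 16m -th 8
  ./gpfsperf read seq /gpfs/fs1/benchfile -n 100g -r 16m -th 8

Running the same pair once over plain TCP/IP and once with verbsRdma enabled on the same client gives directly comparable numbers.)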
Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Thomas Sehn, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Walter Sklenka Gesendet: Freitag, 10. Dezember 2021 11:17 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Hello Douglas! May I ask a basic question regarding GPUdirect Storage or all local attached storage like NVME disks. Do you think it outerperforms ?classical? shared storagesystems which are attached via FC connected to NSD servers HDR attached? With FC you have also bounce copies and more delay , isn?t it? There are solutions around which work with local NVME disks building some protection level with Raid (or duplication) . I am curious if it would be a better approach than shared storage which has it?s limitation (cost intensive scale out, extra infrstructure, max 64Gb at this time ? ) Best regards Walter From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Douglas O'flaherty Sent: Freitag, 10. Dezember 2021 05:24 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA Jonathan: You posed a reasonable question, which was "when is RDMA worth the hassle?" I agree with part of your premises, which is that it only matters when the bottleneck isn't somewhere else. With a parallel file system, like Scale/GPFS, the absolute performance bottleneck is not the throughput of a single drive. In a majority of Scale/GPFS clusters the network data path is the performance limitation. If they deploy HDR or 100/200/400Gbps Ethernet... At that point, the buffer copy time inside the server matters. When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is easily demonstrated because it eliminates the bounce copy through the system memory. In our NVIDIA DGX A100 server testing testing we were able to get around 2x the per system throughput by using RDMA direct to GPU (GUP Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage node.) However, your question remains. Synthetic benchmarks are good indicators of technical benefit, but do your users and applications need that extra performance? 
These are probably only a handful of codes in organizations that need this. However, they are high-value use cases. We have client applications that either read a lot of data semi-randomly and not-cached - think mini-Epics for scaling ML training. Or, demand lowest response time, like production inference on voice recognition and NLP. If anyone has use cases for GPU accelerated codes with truly demanding data needs, please reach out directly. We are looking for more use cases to characterize the benefit for a new paper. f you can provide some code examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a benefit. Thanks, doug Douglas O'Flaherty douglasof at us.ibm.com ----- Message from Jonathan Buzzard > on Fri, 10 Dec 2021 00:27:23 +0000 ----- To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ----- Original message ----- From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS Servers for Datamigration Date: Fri, Dec 10, 2021 10:27 On 09/12/2021 16:04, Douglas O'flaherty wrote: > > Though not directly about your design, our work with NVIDIA on GPUdirect > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > MOFED and Firmware version compatibility can be. > > I would suggest anyone debugging RDMA issues should look at those closely. > May I ask what are the alleged benefits of using RDMA in GPFS? I can see there would be lower latency over a plain IP Ethernet or IPoIB solution but surely disk latency is going to swamp that? I guess SSD drives might change that calculation but I have never seen proper benchmarks comparing the two, or even better yet all four connection options. Just seems a lot of complexity and fragility for very little gain to me. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Dec 10 10:37:31 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Dec 2021 10:37:31 +0000 Subject: [gpfsug-discuss] Test email format / mail format Message-ID: An HTML attachment was scrubbed... 
URL: From Ondrej.Kosik at ibm.com Fri Dec 10 10:39:56 2021 From: Ondrej.Kosik at ibm.com (Ondrej Kosik) Date: Fri, 10 Dec 2021 10:39:56 +0000 Subject: [gpfsug-discuss] Test email format / mail format In-Reply-To: References: Message-ID: Hello all, Thank you for the test email, my reply is coming from Outlook-based infrastructure. ________________________________ From: Olaf Weiser Sent: Friday, December 10, 2021 10:37 AM To: gpfsug-discuss at spectrumscale.org Cc: Ondrej Kosik Subject: Test email format / mail format This email is just a test, because we've seen mail format issues from IBM sent emails you can ignore this email , just for internal problem determination -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Dec 10 11:10:07 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Dec 2021 11:10:07 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: , <7bec39e7fe0d4aac842b59a29239522f@Mail.EDVDesign.cloudia> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16391192376761.png Type: image/png Size: 127072 bytes Desc: not available URL: From anacreo at gmail.com Sun Dec 12 02:19:02 2021 From: anacreo at gmail.com (Alec) Date: Sat, 11 Dec 2021 18:19:02 -0800 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: I feel the need to respond here... I see many responses on this User Group forum that are dismissive of the fringe / extreme use cases and of the "what do you need that for '' mindset. The thing is that Spectrum Scale is for the extreme, just take the word "Parallel" in the old moniker that was already an extreme use case. If you have a standard workload, then sure most of the complex features of the file system are toys, but many of us DO have extreme workloads where shaking out every ounce of performance is a worthwhile and financially sound endeavor. It is also because of the efforts of those of us living on the cusp of technology that these technologies become mainstream and no-longer extreme. I have an AIX LPAR that traverses more than 300TB+ of data a day on a Spectrum Scale file system, it is fully virtualized, and handles a million files. If that performance level drops, regulatory reports will be late, business decisions won't be current. However, the systems of today and the future have to traverse this much data and if they are slow then they can't keep up with real-time data feeds. So the difference between an RDMA disk IO vs a non RDMA disk IO could possibly mean what level of analytics are done to perform real time fraud prevention. Or at what cost, today many systems achieve this by keeping everything in memory in HUGE farms.. Being able to perform data operations at 30GB/s means you can traverse ALL of the census bureau data for all time from the US Govt in about 2 seconds... that's a pretty substantial capability that moves the bar forward in what we can do from a technology perspective. I just did a technology garage with IBM where we were able to achieve 1.5TB/writes on an encrypted ESS off of a single VMWare Host and 4 VM's over IP... That's over 2PB of data writes a day on a single VM server. 
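Throughput claims like that are easy enough to sanity-check with the gpfsperf sample program that ships with Scale (it lives under /usr/lpp/mmfs/samples/perf and, like mmfind, needs a quick make first). A rough sketch only -- the file system path, sizes and thread count below are invented, not the numbers from the test described above:

cd /usr/lpp/mmfs/samples/perf && make       # builds the gpfsperf binary

# sequential write of 200 GiB in 16 MiB records using 16 threads
./gpfsperf create seq /gpfs/fs0/bench/testfile -n 200g -r 16m -th 16 -fsync

# and the corresponding sequential read back
./gpfsperf read seq /gpfs/fs0/bench/testfile -n 200g -r 16m -th 16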
Being able to demonstrate that there are production virtualized environments capable of this type of capacity, helps to show where the point of engineering a proper storage architecture outweighs the benefits of just throwing more GPU compute farms at the problem with ever dithering disk I/O. It also helps to demonstrate how a virtual storage optimized farm could be leveraged to host many in-memory or data analytic heavy workloads in a shared configuration. Douglas's response is the right one, how much IO does the application / environment need, it's nice to see Spectrum Scale have the flexibility to deliver. I'm pretty confident that if I can't deliver the required I/O performance on Spectrum Scale, nobody else can on any other storage platform within reasonable limits. Alec Effrat On Thu, Dec 9, 2021 at 8:24 PM Douglas O'flaherty wrote: > Jonathan: > > You posed a reasonable question, which was "when is RDMA worth the > hassle?" I agree with part of your premises, which is that it only matters > when the bottleneck isn't somewhere else. With a parallel file system, like > Scale/GPFS, the absolute performance bottleneck is not the throughput of a > single drive. In a majority of Scale/GPFS clusters the network data path is > the performance limitation. If they deploy HDR or 100/200/400Gbps > Ethernet... At that point, the buffer copy time inside the server matters. > > When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) > is easily demonstrated because it eliminates the bounce copy through the > system memory. In our NVIDIA DGX A100 server testing testing we were able > to get around 2x the per system throughput by using RDMA direct to GPU (GUP > Direct Storage). (Tested on 2 DGX system with 4x HDR links per storage > node.) > > However, your question remains. Synthetic benchmarks are good indicators > of technical benefit, but do your users and applications need that extra > performance? > > These are probably only a handful of codes in organizations that need > this. However, they are high-value use cases. We have client applications > that either read a lot of data semi-randomly and not-cached - think > mini-Epics for scaling ML training. Or, demand lowest response time, like > production inference on voice recognition and NLP. > > If anyone has use cases for GPU accelerated codes with truly demanding > data needs, please reach out directly. We are looking for more use cases to > characterize the benefit for a new paper. f you can provide some code > examples, we can help test if RDMA direct to GPU (GPUdirect Storage) is a > benefit. > > Thanks, > > doug > > Douglas O'Flaherty > douglasof at us.ibm.com > > > > > > > ----- Message from Jonathan Buzzard on > Fri, 10 Dec 2021 00:27:23 +0000 ----- > > *To:* > gpfsug-discuss at spectrumscale.org > > *Subject:* > Re: [gpfsug-discuss] > On 09/12/2021 16:04, Douglas O'flaherty wrote: > > > > Though not directly about your design, our work with NVIDIA on GPUdirect > > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > > MOFED and Firmware version compatibility can be. > > > > I would suggest anyone debugging RDMA issues should look at those > closely. > > > May I ask what are the alleged benefits of using RDMA in GPFS? > > I can see there would be lower latency over a plain IP Ethernet or IPoIB > solution but surely disk latency is going to swamp that? 
> > I guess SSD drives might change that calculation but I have never seen > proper benchmarks comparing the two, or even better yet all four > connection options. > > Just seems a lot of complexity and fragility for very little gain to me. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > > > ----- Original message ----- > From: "Jonathan Buzzard" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] alternate path between ESS > Servers for Datamigration > Date: Fri, Dec 10, 2021 10:27 > > On 09/12/2021 16:04, Douglas O'flaherty wrote: > > > > Though not directly about your design, our work with NVIDIA on GPUdirect > > Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both > > MOFED and Firmware version compatibility can be. > > > > I would suggest anyone debugging RDMA issues should look at those > closely. > > > May I ask what are the alleged benefits of using RDMA in GPFS? > > I can see there would be lower latency over a plain IP Ethernet or IPoIB > solution but surely disk latency is going to swamp that? > > I guess SSD drives might change that calculation but I have never seen > proper benchmarks comparing the two, or even better yet all four > connection options. > > Just seems a lot of complexity and fragility for very little gain to me. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 02:38:26 2021 From: anacreo at gmail.com (Alec) Date: Sat, 11 Dec 2021 18:38:26 -0800 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: You can manipulate the permissions via GPFS policy engine, essentially you'd write a script that the policy engine calls and tell GPFS to farm out the change in at whatever scale you need... run in a single node, how many files per thread, how many threads per node, etc... This can GREATLY accelerate file change permissions over a large quantity of files. However, as stated earlier the mmfind command will do all of this for you and it's worth the effort to get it compiled for your system. I don't have Spectrum Scale in front of me but for the best performance you'll want to setup the mmfind policy engine parameters to parallelize your workload... If mmfind has no action it will silently use GPFS policy engine to produce the requested output, however if mmfind has an action it will expose the policy engine calls. it goes something like this: mmfind -B 1 -N directattachnode1,directattachnode2 -m 24 /path/to/find -perm +o=w ! \( -type d -perm +o=t \) -xargs chmod o-w This will run 48 threads on 2 nodes and bump other write permissions off of any file it finds (excluding temp dirs) until it completes, it should go blistering fast... 
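For anyone who cannot get mmfind built, the underlying pattern described above (a script that the policy engine calls, fanned out across nodes and threads) looks roughly like the sketch below; mmapplypolicy's -N/-m/-B options correspond to the mmfind flags used here. The script path, node names and numbers are placeholders, and the exact record format of the file lists (and its escaping of odd file names) should be checked against the mmapplypolicy documentation before trusting it on real data.

# policy: hand every selected file to an external script
cat > /tmp/fixperm.pol <<'EOF'
RULE EXTERNAL LIST 'fixperm' EXEC '/usr/local/bin/fixperm.sh'
RULE 'allfiles' LIST 'fixperm'
EOF

# the script mmapplypolicy invokes, first with TEST and then with LIST batches
cat > /usr/local/bin/fixperm.sh <<'EOF'
#!/bin/sh
op="$1"; filelist="$2"
[ "$op" = "TEST" ] && exit 0        # the policy engine probing that we exist
if [ "$op" = "LIST" ]; then
    # each record ends with " -- <pathname>"; strip the leading fields
    sed 's/^.* -- //' "$filelist" | while IFS= read -r f; do
        chmod o-w "$f"
    done
fi
exit 0
EOF
chmod +x /usr/local/bin/fixperm.sh

# fan the work out: two nodes, 24 threads each, 1000 files per script call
mmapplypolicy /path/to/tree -P /tmp/fixperm.pol -N node1,node2 -m 24 -B 1000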
as this is only a meta operation the -B 1 might not be necessary, you'd probably be better off with a -B 100, but as I deal with a lot of 100GB+ files I don't want a single thread to be stuck with 3 100GB+ files and another thread to have none, so I usually set the max depth to be 1 and take the higher execution count. This has an advantage in that GPFS will break up the inodes in the most efficient way for the chmod to happen in parallel. I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. If you think of the possibilities here you could easily change that chmod to a gzip and add a -mtime +1200 and you have a find command that will gzip compress files over 4 years old in parallel across multiple nodes... mmfind is VERY powerful and flexible, highly worth getting into usage. Alec On Tue, Dec 7, 2021 at 7:43 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 07/12/2021 14:55, Simon Thompson wrote: > > > > Or add: > > UPDATECTIME yes > > SKIPACLUPDATECHECK yes > > > > To you dsm.opt file to skip checking for those updates and don?t back > > them up again. > > Yeah, but then a restore gives you potentially an unusable file system > as the ownership of the files and ACL's are all wrong. Better to bite > the bullet and back them up again IMHO. > > > > > Actually I thought TSM only updated the metadata if the mode/owner > > changed, not re-backed the file? > > That was my understanding but I have seen TSM rebacked up large amounts > of data where the owner of the file changed in the past, so your mileage > may vary. > > Also ACL's are stored in extended attributes which are stored with the > files and changes will definitely cause the file to be backed up again. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sun Dec 12 11:19:07 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sun, 12 Dec 2021 11:19:07 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: On 12/12/2021 02:19, Alec wrote: > I feel the need to respond here... I see many responses on this > User Group forum that are dismissive of the fringe / extreme use > cases and of the "what do you need that for '' mindset. The thing is > that Spectrum Scale is for the extreme, just take the word "Parallel" > in the old moniker that was already an extreme use case. I wasn't been dismissive, I was asking what the benefits of using RDMA where. There is very little information about it out there and not a lot of comparative benchmarking on it either. Without the benefits being clearly laid out I am unlikely to consider it and might be missing a trick. IBM's literature on the topic is underwhelming to say the least. [SNIP] > I have an AIX LPAR that traverses more than 300TB+ of data a day on a > Spectrum Scale file system, it is fully virtualized, and handles a > million files. If that performance level drops, regulatory reports > will be late, business decisions won't be current. 
However, the > systems of today and the future have to traverse this much data and > if they are slow then they can't keep up with real-time data feeds. I have this nagging suspicion that modern all-flash storage systems could deliver that sort of performance without the overhead of a parallel file system. [SNIP] > > Douglas's response is the right one, how much IO does the > application / environment need, it's nice to see Spectrum Scale have > the flexibility to deliver. I'm pretty confident that if I can't > deliver the required I/O performance on Spectrum Scale, nobody else > can on any other storage platform within reasonable limits. > I would note here that in our *shared HPC* environment I made a very deliberate design decision to attach the compute nodes with 10Gbps Ethernet for storage. Though I would probably pick 25Gbps if we were procuring the system today. There were many reasons behind that, but the main one being that historical file system performance showed that greater than 99% of the time the file system never got above 20% of its benchmarked speed. Using 10Gbps Ethernet was not going to be a problem. Secondly, by limiting the connection to 10Gbps it stops one person hogging the file system to the detriment of other users. We have seen individual nodes peg their 10Gbps link from time to time, even several nodes at once (jobs from the same user) and had they had access to a 100Gbps storage link that would have been curtains for everyone else's file system usage. At this juncture I would note that the GPFS admin traffic is handled on a separate IP address space on a separate VLAN which we prioritize with QoS on the switches. So even when a node floods its 10Gbps link for extended periods of time it doesn't get ejected from the cluster. A separate physical network for admin traffic is not necessary in my experience. That said you can do RDMA with Ethernet... Unfortunately the teaching cluster and protocol nodes are on Intel X520's which I don't think do RDMA. Everything else is X710's or Mellanox Connect-X4 which definitely do RDMA. I could upgrade the protocol nodes but the teaching cluster would be a problem. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From s.j.thompson at bham.ac.uk Sun Dec 12 17:01:21 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Sun, 12 Dec 2021 17:01:21 +0000 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command 'allow-permission-change' We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn't like the updatewithAcl thing ... Simon -------------- next part -------------- An HTML attachment was scrubbed...
URL: From anacreo at gmail.com Sun Dec 12 22:03:39 2021 From: anacreo at gmail.com (Alec) Date: Sun, 12 Dec 2021 14:03:39 -0800 Subject: [gpfsug-discuss] Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson wrote: > > I'm not sure if this happens on Spectrum Scale but on most FS's if you > do a chmod 770 file you'll lose any ACLs assigned to the > > file, so safest to bump the permissions with a subtractive or additive > o-w or g+w type operation. > > > > This depends entirely on the fileset setting, see: > > > https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command > > > > ?*allow-permission-change*? > > > > We typically have file-sets set to chmodAndUpdateAcl, though not > exclusively, I think it was some quirky software that tested the > permissions after doing something and didn?t like the updatewithAcl thing ? > > > > Simon > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Sun Dec 12 23:00:21 2021 From: anacreo at gmail.com (Alec) Date: Sun, 12 Dec 2021 15:00:21 -0800 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: So I never said this node wasn't in a HPC Cluster, it has partners... For our use case however some nodes have very expensive per core software licensing, and we have to weigh the human costs of empowering traditional monolithic code to do the job, or bringing in more users to re-write and maintain distributed code (someone is going to spend the money to get this work done!). So to get the most out of those licensed cores we have designed our virtual compute machine(s) with 128Gbps+ of SAN fabric. Just to achieve our average business day reads it would take 3 of your cluster nodes maxed out 24 hours, or 9 of them in a business day to achieve the same read speeds... and another 4 nodes to handle the writes. I guess HPC is in the eye of the business... In my experience cables and ports are cheaper than servers. The classic shared HPC design you have is being up-ended by the fact that there is so much compute power (cpu and memory) now in the nodes, you can't simply build a system with two storage connections (Noah's ark) and call it a day. If you look at the spec 25Gbps Ethernet is only delivering ~3GB/s (which is just above USB 3.2, and below USB 4). Spectrum Scale does very well for us when met with a fully saturated workload, we maintain one node for SLA and one node for AdHoc workload, and like clockwork the SLA box always steals exactly half the bandwidth when a job fires, so that 1 SLA job can take half the bandwidth and complete compared to the 40 AdHoc jobs on the other node. 
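On the software side, the usual way in Scale to keep one class of I/O from starving another is the QoS machinery (mmchqos/mmlsqos), which the fileset-level throttling mentioned in the next paragraph builds on. A rough sketch of the long-standing form of it -- the device name and IOPS figure are invented, and the newer per-fileset/user-class variants have their own syntax that is worth checking in the docs:

# cap maintenance traffic (restripes, policy scans, backups) on the system
# pool while leaving normal workload unthrottled
mmchqos gpfs0 --enable pool=system,maintenance=5000IOPS,other=unlimited

# watch what each QoS class is actually consuming
mmlsqos gpfs0 --seconds 60

# switch it off again
mmchqos gpfs0 --disable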
In newer releases IBM has introduced fileset throttling... this is very exciting as we can really just design the biggest, fattest pipes from VM to storage and then software-define the storage AND the bandwidth, from the standard nobody-cares-about workloads all the way up to the most critical workloads... I don't buy that smaller bandwidth is better; I see that as just one band-aid that has more elegant solutions, such as simply applying more resource constraints (you can't push the bandwidth if you can't get the CPU...), or using a workload orchestrator such as LSF with limits set. But I also won't say it never makes sense, as I only know my problems and my solutions. For years the network team wouldn't let users have more than 10Mb then 100Mb networking as they were always worried about their backend being overwhelmed... I literally had faster home internet service than my work desktop connection at one point in my life... it was all a fallacy; the workload should drive the technology, the technology shouldn't hinder the workload. You can do a simple exercise, try scaling up... imagine your cluster is asked to start computing 100x more work... and that work must be completed on time. Do you simply say let me buy 100x more of everything? Or do you start to look at where you can gain efficiency and what actual bottlenecks you need to lift... for some of us it's CPU, for some it's memory, for some it's disk, depending on the work... I'd say the extremely rare case is where you need 100x more of EVERYTHING, but you have to get past the performance of the basic building blocks baked into the cake before you need to dig deeper into the bottlenecks and it makes practical and financial sense. If your main bottleneck was storage, you'd be asking far different questions about RDMA. Alec On Sun, Dec 12, 2021 at 3:19 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 12/12/2021 02:19, Alec wrote: > > > I feel the need to respond here... I see many responses on this > > User Group forum that are dismissive of the fringe / extreme use > > cases and of the "what do you need that for '' mindset. The thing is > > that Spectrum Scale is for the extreme, just take the word "Parallel" > > in the old moniker that was already an extreme use case. > > I wasn't been dismissive, I was asking what the benefits of using RDMA > where. There is very little information about it out there and not a lot > of comparative benchmarking on it either. Without the benefits being > clearly laid out I am unlikely to consider it and might be missing a trick. > > IBM's literature on the topic is underwhelming to say the least. > > [SNIP] > > > > I have an AIX LPAR that traverses more than 300TB+ of data a day on a > > Spectrum Scale file system, it is fully virtualized, and handles a > > million files. If that performance level drops, regulatory reports > > will be late, business decisions won't be current. However, the > > systems of today and the future have to traverse this much data and > > if they are slow then they can't keep up with real-time data feeds. > > I have this nagging suspicion that modern all flash storage systems > could deliver that sort of performance without the overhead of a > parallel file system. > > [SNIP] > > > > > Douglas's response is the right one, how much IO does the > > application / environment need, it's nice to see Spectrum Scale have > > the flexibility to deliver.
I'm pretty confident that if I can't > > deliver the required I/O performance on Spectrum Scale, nobody else > > can on any other storage platform within reasonable limits. > > > > I would note here that in our *shared HPC* environment I made a very > deliberate design decision to attach the compute nodes with 10Gbps > Ethernet for storage. Though I would probably pick 25Gbps if we where > procuring the system today. > > There where many reasons behind that, but the main ones being that > historical file system performance showed that greater than 99% of the > time the file system never got above 20% of it's benchmarked speed. > Using 10Gbps Ethernet was not going to be a problem. > > Secondly by limiting the connection to 10Gbps it stops one person > hogging the file system to the detriment of other users. We have seen > individual nodes peg their 10Gbps link from time to time, even several > nodes at once (jobs from the same user) and had they had access to a > 100Gbps storage link that would have been curtains for everyone else's > file system usage. > > At this juncture I would note that the GPFS admin traffic is handled by > on separate IP address space on a separate VLAN which we prioritize with > QOS on the switches. So even when a node floods it's 10Gbps link for > extended periods of time it doesn't get ejected from the cluster. The > need for a separate physical network for admin traffic is not necessary > in my experience. > > That said you can do RDMA with Ethernet... Unfortunately the teaching > cluster and protocol nodes are on Intel X520's which I don't think do > RDMA. Everything is X710's or Mellanox Connect-X4 which definitely do do > RDMA. I could upgrade the protocol nodes but the teaching cluster would > be a problem. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Mon Dec 13 00:03:42 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 13 Dec 2021 00:03:42 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Dec 13 10:49:37 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 13 Dec 2021 10:49:37 +0000 Subject: [gpfsug-discuss] R: Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> Message-ID: <96a77c75de9b41f089e853120eef870d@psi.ch> I am definitely going to try this solution with mmfind. Thank you also for the command line and several hints? I?ll be back with the outcome soon. Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Alec Inviato: domenica 12 dicembre 2021 23:04 A: gpfsug main discussion list Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. 
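Since this behaviour is set per fileset, it is worth checking and, where wanted, setting it explicitly. A short sketch -- the file system and fileset names are placeholders, and how the current flag is displayed varies by release, so verify against your own output:

# inspect the fileset; recent releases list the permission-change flag here
mmlsfileset gpfs0 projects -L

# let chmod work and have the existing NFSv4 ACL updated to match,
# rather than thrown away
mmchfileset gpfs0 projects --allow-permission-change chmodAndUpdateAcl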
IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Mon Dec 13 11:30:17 2021 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 13 Dec 2021 11:30:17 +0000 Subject: [gpfsug-discuss] R: R: Question on changing mode on many files In-Reply-To: <96a77c75de9b41f089e853120eef870d@psi.ch> References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> <96a77c75de9b41f089e853120eef870d@psi.ch> Message-ID: Hi Alec , mmfind doesn?t have a man page (does it have an online one ? I cannot find it). And according to mmfind -h it doesn?t exposes the ?-N? neither the ?-B? flags. RPM is gpfs.base-5.1.1-2.x86_64. Do I have chance to download a newest version of that script from somewhere ? Thanks, Alvise Da: gpfsug-discuss-bounces at spectrumscale.org Per conto di Dorigo Alvise (PSI) Inviato: luned? 13 dicembre 2021 11:50 A: gpfsug main discussion list Oggetto: [gpfsug-discuss] R: Question on changing mode on many files I am definitely going to try this solution with mmfind. Thank you also for the command line and several hints? I?ll be back with the outcome soon. Alvise Da: gpfsug-discuss-bounces at spectrumscale.org > Per conto di Alec Inviato: domenica 12 dicembre 2021 23:04 A: gpfsug main discussion list > Oggetto: Re: [gpfsug-discuss] Question on changing mode on many files How am I just learning about this right now, thank you! Makes so much more sense now the odd behaviors I've seen over the years on GPFS vs POSIX chmod/ACL. Will definitely go review those settings on my filesets now, wonder if the default has evolved from 3.x -> 4.x -> 5.x. IBM needs to find a way to pre-compile mmfind and make it supported, it really is essential and so beneficial, and so hard to get done in a production regulated environment. Though a bigger warning that the compress option is an action not a criteria! Alec On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > I'm not sure if this happens on Spectrum Scale but on most FS's if you do a chmod 770 file you'll lose any ACLs assigned to the > file, so safest to bump the permissions with a subtractive or additive o-w or g+w type operation. This depends entirely on the fileset setting, see: https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command ?allow-permission-change? 
We typically have file-sets set to chmodAndUpdateAcl, though not exclusively, I think it was some quirky software that tested the permissions after doing something and didn?t like the updatewithAcl thing ? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Mon Dec 13 18:33:23 2021 From: anacreo at gmail.com (Alec) Date: Mon, 13 Dec 2021 10:33:23 -0800 Subject: [gpfsug-discuss] R: R: Question on changing mode on many files In-Reply-To: References: <15a0cd66-7a61-15ff-15eb-2613979b48b6@strath.ac.uk> <96a77c75de9b41f089e853120eef870d@psi.ch> Message-ID: I checked on my office network.... mmfind --help mmfind -polFlags '-N node1,node2 -B 100 -m 24' /path/to/find -perm +o=w ! \( -type d -perm +o=t \) -xargs chmod o-w I think that the -m 24 is the default (24 threads per node), but it's nice to include on the command line so you remember you can increment/decrement it as your needs require or your nodes can handle. It's IMPORTANT to review in the mmfind --help output that some things are 'mmfind' args and go BEFORE the path... some are CRITERIA args and have no impact on the files... BUT SOME ARE ACTION args, and they will affect files. So -exec -xargs are obvious actions, however, -gpfsCompress doesn't find compressed files, it will actually compress the objects... in our AIX environment our compressed reads feel like they're essentially broken, we only get about 5MB/s, however on Linux compress reads seem to work fairly well. So make sure to read the man page carefully before using some non-obvious GPFS enhancements. Also the nice thing is mmfind -xargs takes care of all the strange file names, so you don't have to do anything complicated, but you also can't pipe the output as it will run the xarg in the policy engine. As a footnote this is my all time favorite find for troubleshooting... find $(pwd) -mtime -1 | sed -e 's/.*/"&"/g' | xargs ls -latr List all the files modified in the last day in reverse chronology... Doesn't work :-( Alec On Mon, Dec 13, 2021 at 3:30 AM Dorigo Alvise (PSI) wrote: > Hi Alec , > > mmfind doesn?t have a man page (does it have an online one ? I cannot find > it). And according to mmfind -h it doesn?t exposes the ?-N? neither the > ?-B? flags. RPM is gpfs.base-5.1.1-2.x86_64. > > > > Do I have chance to download a newest version of that script from > somewhere ? > > > > Thanks, > > > > Alvise > > > > *Da:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Per conto di *Dorigo Alvise > (PSI) > *Inviato:* luned? 13 dicembre 2021 11:50 > *A:* gpfsug main discussion list > *Oggetto:* [gpfsug-discuss] R: Question on changing mode on many files > > > > I am definitely going to try this solution with mmfind. > > Thank you also for the command line and several hints? I?ll be back with > the outcome soon. > > > > Alvise > > > > *Da:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *Per conto di *Alec > *Inviato:* domenica 12 dicembre 2021 23:04 > *A:* gpfsug main discussion list > *Oggetto:* Re: [gpfsug-discuss] Question on changing mode on many files > > > > How am I just learning about this right now, thank you! Makes so much > more sense now the odd behaviors I've seen over the years on GPFS vs POSIX > chmod/ACL. 
Will definitely go review those settings on my filesets now, > wonder if the default has evolved from 3.x -> 4.x -> 5.x. > > > > IBM needs to find a way to pre-compile mmfind and make it supported, it > really is essential and so beneficial, and so hard to get done in a > production regulated environment. Though a bigger warning that the > compress option is an action not a criteria! > > > > Alec > > > > On Sun, Dec 12, 2021 at 9:01 AM Simon Thompson > wrote: > > > I'm not sure if this happens on Spectrum Scale but on most FS's if you > do a chmod 770 file you'll lose any ACLs assigned to the > > file, so safest to bump the permissions with a subtractive or additive > o-w or g+w type operation. > > > > This depends entirely on the fileset setting, see: > > > https://www.ibm.com/docs/en/spectrum-scale/5.1.2?topic=reference-mmchfileset-command > > > > ?*allow-permission-change*? > > > > We typically have file-sets set to chmodAndUpdateAcl, though not > exclusively, I think it was some quirky software that tested the > permissions after doing something and didn?t like the updatewithAcl thing ? > > > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Dec 13 23:55:23 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 13 Dec 2021 23:55:23 +0000 Subject: [gpfsug-discuss] WAS: alternative path; Now: RDMA In-Reply-To: References: Message-ID: <19884986-aff8-20aa-f1d1-590f6b81ddd2@strath.ac.uk> On 13/12/2021 00:03, Andrew Beattie wrote: > What is the main outcome or business requirement of the teaching cluster > ( i notice your specific in the use of defining it as a teaching cluster) > It is entirely possible that the use case for this cluster does not > warrant the use of high speed low latency networking, and it simply > needs the benefits of a parallel filesystem. While we call it the "teaching cluster" it would be more appropriate to call them "teaching nodes" that shares resources (storage and login nodes) with the main research cluster. It's mainly used by undergraduates doing final year projects and M.Sc. students. It's getting a bit long in the tooth now but not many undergraduates have access to a 16 core machine with 64GB of RAM. Even if they did being able to let something go flat out for 48 hours means there personal laptop is available for other things :-) I was just musing that the cards in the teaching nodes being Intel 82599ES would be a stumbling block for RDMA over Ethernet, but on checking the Intel X710 doesn't do RDMA either so it would all be a bust anyway. I was clearly on the crack pipe when I thought they did. So aside from the DSS-G and GPU nodes with Connect-X4 cards nothing does RDMA. [SNIP] > For some of my research clients this is the ability to run 20-30% more > compute jobs on the same HPC resources in the same 24H period, which > means that they can reduce the amount of time they need on the HPC > cluster to get the data results that they are looking for. Except as I said in our cluster the storage servers have never been maxed out except when running benchmarks. 
Individual compute nodes have been maxed out (mainly Gaussian writing 800GB temporary files), but as I explained that's a good thing from my perspective because I don't want one or two users to be able to pound the storage into oblivion and cause problems for everyone else. We have enough problems with users tanking the login nodes by running computations on them. That should go away with our upgrade to RHEL8 and the wonders of per-user cgroups; me I love systemd. In the end nobody has complained that the storage speed is a problem yet, and putting the metadata on SSD would be my first port of call if they did and funds were available to make things go faster. To be honest I think the users are just happy that GPFS doesn't eat itself and end up out of action for a few weeks every couple of years like Lustre did on the previous system. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Fri Dec 17 15:08:15 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 17 Dec 2021 15:08:15 +0000 Subject: [gpfsug-discuss] email format check again for IBM domain send email Message-ID: An HTML attachment was scrubbed... URL: From juergen.hannappel at desy.de Fri Dec 17 15:57:45 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Fri, 17 Dec 2021 16:57:45 +0100 (CET) Subject: [gpfsug-discuss] ESS 6.1.2.1 changes Message-ID: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Hi, I just noticed that today a new ESS release (6.1.2.1) appeared on Fix Central. What I can't find is a list of changes compared to 6.1.2.0, and anyway finding the change list is always a PITA. Does anyone know what changed? -- Dr. Jürgen Hannappel DESY/IT Tel. : +49 40 8998-4616 From luis.bolinches at fi.ibm.com Fri Dec 17 18:50:09 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Fri, 17 Dec 2021 18:50:09 +0000 Subject: [gpfsug-discuss] ESS 6.1.2.1 changes In-Reply-To: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> References: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed...
URL: From janfrode at tanso.net Mon Dec 20 11:26:29 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 20 Dec 2021 12:26:29 +0100 Subject: [gpfsug-discuss] ESS 6.1.2.1 changes In-Reply-To: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> References: <1740905192.10339973.1639756665210.JavaMail.zimbra@desy.de> Message-ID: Just ran an upgrade on an EMS, and the only changes I see are these updated packages on the ems: +gpfs.docs-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:43 AM CET +gpfs.ess.firmware-6.0.0-15.ppc64le Mon 20 Dec 2021 11:56:42 AM CET +gpfs.msg.en_US-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:12 AM CET +gpfs.gss.pmsensors-5.1.2-0.el8.ppc64le Mon 20 Dec 2021 11:56:12 AM CET +gpfs.gpl-5.1.2-0.9.noarch Mon 20 Dec 2021 11:56:11 AM CET +gpfs.gnr.base-1.0.0-0.ppc64le Mon 20 Dec 2021 11:56:11 AM CET +gpfs.gnr.support-ess5000-1.0.0-3.noarch Mon 20 Dec 2021 11:56:10 AM CET +gpfs.gnr.support-ess3200-6.1.2-0.noarch Mon 20 Dec 2021 11:56:10 AM CET +gpfs.crypto-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:10 AM CET +gpfs.compression-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:10 AM CET +gpfs.license.dmd-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:56:09 AM CET +gpfs.gnr.support-ess3000-1.0.0-3.noarch Mon 20 Dec 2021 11:56:09 AM CET +gpfs.gui-5.1.2-0.4.noarch Mon 20 Dec 2021 11:56:05 AM CET +gpfs.gskit-8.0.55-19.ppc64le Mon 20 Dec 2021 11:56:02 AM CET +gpfs.java-5.1.2-0.4.ppc64le Mon 20 Dec 2021 11:56:01 AM CET +gpfs.gss.pmcollector-5.1.2-0.el8.ppc64le Mon 20 Dec 2021 11:55:59 AM CET +gpfs.gnr.support-essbase-6.1.2-0.noarch Mon 20 Dec 2021 11:55:59 AM CET +gpfs.adv-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:59 AM CET +gpfs.gnr-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:58 AM CET +gpfs.base-5.1.2-0.9.ppc64le Mon 20 Dec 2021 11:55:54 AM CET +sdparm-1.10-10.el8.ppc64le Mon 20 Dec 2021 11:55:21 AM CET +gpfs.ess.tools-6.1.2.1-release.noarch Mon 20 Dec 2021 11:50:47 AM CET I will guess it has something to do with log4j, but a changelog would be nice :-) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=142683 On Fri, Dec 17, 2021 at 5:07 PM Hannappel, Juergen < juergen.hannappel at desy.de> wrote: > Hi, > I just noticed that tday a new ESS release (6.1.2.1) appeared on fix > central. > What I can't find is a list of changes to 6.1.2.0, and anyway finding the > change list is always a PITA. > > Does anyone know what changed? > > -- > Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL:
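Until an official changelog appears, one workable habit is to capture the package list before and after each ESS upgrade and diff it yourself, which produces exactly the kind of list above without the manual copying -- a trivial sketch (file names are placeholders):

# on the EMS (and/or I/O nodes), before the upgrade
rpm -qa | sort > /root/rpms-pre-6.1.2.1.txt

# ... run the upgrade ...

# afterwards: exactly which packages changed
rpm -qa | sort > /root/rpms-post-6.1.2.1.txt
diff /root/rpms-pre-6.1.2.1.txt /root/rpms-post-6.1.2.1.txt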