From geoffrey_avila at brown.edu Mon Jun 1 02:44:12 2020 From: geoffrey_avila at brown.edu (Avila, Geoffrey) Date: Sun, 31 May 2020 21:44:12 -0400 Subject: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) In-Reply-To: References: <6E77A1C3-3D92-4FFC-B732-9FA56FE6C7ED@theatsgroup.com> <53c17f6a-f3fe-9217-4000-197e1b06a105@strath.ac.uk> Message-ID: The local-block-device method of I/O is what is usually termed "SAN mode"; right? On Sun, May 31, 2020 at 12:47 PM Jan-Frode Myklebust wrote: > > No, this is a common misconception. You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > > Remote cluster or not, disk access will be over local block device > (without involving NSD servers in any way), or NSD server if local access > isn?t available. NSD-servers are not ?arbitrators? over access to a disk, > they?re just stupid proxies of IO commands. > > > -jf > > s?n. 31. mai 2020 kl. 11:31 skrev Jonathan Buzzard < > jonathan.buzzard at strath.ac.uk>: > >> On 29/05/2020 20:55, Stephen Ulmer wrote: >> > I have a question about multi-cluster, but it is related to this thread >> > (it would be solving the same problem). >> > >> > Let?s say we have two clusters A and B, both clusters are normally >> > shared-everything with no NSD servers defined. >> >> Er, even in a shared-everything all nodes fibre channel attached you >> still have to define NSD servers. That is a given NSD has a server (or >> ideally a list of servers) that arbitrate the disk. Unless it has >> changed since 3.x days. Never run a 4.x or later with all the disks SAN >> attached on all the nodes. >> >> > We want cluster B to be >> > able to use a file system in cluster A. If I zone the SAN such that >> > cluster B can see all of cluster A?s disks, can I then define a >> > multi-cluster relationship between them and mount a file system from A >> on B? >> > >> > To state it another way, must B's I/O for the foreign file system pass >> > though NSD servers in A, or can B?s nodes discover that they have >> > FibreChannel paths to those disks and use them? >> > >> >> My understanding is that remote cluster mounts have to pass through the >> NSD servers. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jun 1 03:54:11 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sun, 31 May 2020 22:54:11 -0400 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 100, Issue 32 In-Reply-To: <3597cc2c-47af-461f-af03-4f88069e1ca6@strath.ac.uk> References: <6E77A1C3-3D92-4FFC-B732-9FA56FE6C7ED@theatsgroup.com> <3597cc2c-47af-461f-af03-4f88069e1ca6@strath.ac.uk> Message-ID: <83255.1590980051@turing-police> On Fri, 29 May 2020 22:30:08 +0100, Jonathan Buzzard said: > Ethernet goes *very* fast these days you know :-) In fact *much* faster > than fibre channel. 
Yes, but the justification, purchase, and installation of 40G or 100G Ethernet interfaces in the machines involved, plus the routers/switches along the way, can go very slowly indeed. So finding a way to replace 10G Ether with 16G FC can be a win..... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Mon Jun 1 09:45:25 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Jun 2020 09:45:25 +0100 Subject: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) In-Reply-To: References: <6E77A1C3-3D92-4FFC-B732-9FA56FE6C7ED@theatsgroup.com> <53c17f6a-f3fe-9217-4000-197e1b06a105@strath.ac.uk> Message-ID: On 31/05/2020 17:47, Jan-Frode Myklebust wrote: > > No, this is a common misconception.? You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > I see that has changed then. In the past mmcrnsd would simply fail without a server list passed to it. If you have been a long term GPFS user (I started with 2.2 on a site that had been running since 1.x days) then we are not always aware of things that have changed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From chrisjscott at gmail.com Mon Jun 1 14:14:02 2020 From: chrisjscott at gmail.com (Chris Scott) Date: Mon, 1 Jun 2020 14:14:02 +0100 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> References: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: Sounds like it would work fine. I recently exported a 3.5 version filesystem from a GPFS 3.5 cluster to a 'Scale cluster at 5.0.2.3 software and 5.0.2.0 cluster version. I concurrently mapped the NSDs to new NSD servers in the 'Scale cluster, mmexported the filesystem and changed the NSD servers configuration of the NSDs using the mmimportfs ChangeSpecFile. The original (creation) filesystem version of this filesystem is 3.2.1.5. To my pleasant surprise the filesystem mounted and worked fine while still at 3.5 filesystem version. Plan B would have been to "mmchfs -V full" and then mmmount, but I was able to update the filesystem to 5.0.2.0 version while already mounted. This was further pleasantly successful as the filesystem in question is DMAPI-enabled, with the majority of the data on tape using Spectrum Protect for Space Management than the volume resident/pre-migrated on disk. The complexity is further compounded by this filesystem being associated to a different Spectrum Protect server than an existing DMAPI-enabled filesystem in the 'Scale cluster. Preparation of configs and subsequent commands to enable and use Spectrum Protect for Space Management multiserver for migration and backup all worked smoothly as per the docs. I was thus able to get rid of the GPFS 3.5 cluster on legacy hardware, OS, GPFS and homebrew CTDB SMB and NFS and retain the filesystem with its majority of tape-stored data on current hardware, OS and 'Scale/'Protect with CES SMB and NFS. 
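For anyone repeating this kind of move, the core of it boils down to a handful of commands. The device and file names below are placeholders and this is only a sketch of the steps described above, not a verbatim record of what was run:

  # on the old cluster: capture the filesystem and NSD definitions
  mmexportfs gpfs_legacy -o gpfs_legacy.exp

  # on the new cluster: import them, rewriting the NSD server lists
  # (newservers.spec is assumed to hold the updated disk/server stanzas)
  mmimportfs gpfs_legacy -i gpfs_legacy.exp -S newservers.spec

  # mount, then raise the filesystem format version to the new level
  mmmount gpfs_legacy -a
  mmchfs gpfs_legacy -V full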
The future objective remains to move all the data from this historical filesystem to a newer one to get the benefits of larger block and inode sizes, etc, although since the data is mostly dormant and kept for compliance/best-practice purposes, the main goal will be to head off original file system version 3.2 era going end of support. Cheers Chris On Thu, 28 May 2020 at 23:31, Prasad Surampudi < prasad.surampudi at theatsgroup.com> wrote: > We have two scale clusters, cluster-A running version Scale 4.2.3 and > RHEL6/7 and Cluster-B running Spectrum Scale 5.0.4 and RHEL 8.1. All the > nodes in both Cluster-A and Cluster-B are direct attached and no NSD > servers. We have our current filesystem gpfs_4 in Cluster-A and new > filesystem gpfs_5 in Cluster-B. We want to copy all our data from gpfs_4 > filesystem into gpfs_5 which has variable block size. So, can we map NSDs > of gpfs_4 to Cluster-B nodes and do a mmexportfs of gpfs_4 from Cluster-A > and mmimportfs into Cluster-B so that we have both filesystems available on > same node in Cluster-B for copying data across fiber channel? If > mmexportfs/mmimportfs works, can we delete nodes from Cluster-A and add > them to Cluster-B without upgrading RHEL or GPFS versions for now and plan > upgrading them at a later time? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Mon Jun 1 16:32:09 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Mon, 1 Jun 2020 08:32:09 -0700 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Hi: Just a note on this: the pidof fix was accepted upstream but has not made its way into rhel 8.2 yet Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu On Mon, Feb 17, 2020 at 12:02 AM Malahal R Naineni wrote: > > I filed a defect here, let us see what Redhat says. Yes, it doesn't work for any kernel threads. It doesn't work for user level threads/processes. > > https://bugzilla.redhat.com/show_bug.cgi?id=1803640 > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 > Date: Mon, Feb 17, 2020 8:26 AM > > Ah wait, I see what you might mean. pidof works but not specifically for processes like nfsd. That is odd. > > Thanks, > Bryan > > > > On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill wrote: > > Hi Malahal: > > Just to clarify, are you saying that on your VM pidof is missing? Or that it is there and not working as it did prior to RHEL/CentOS 8? pidof is returning pid numbers on my system. I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet. > > > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. 
# 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > > On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni wrote: > > I am not familiar with CNFS but looking at git source seems to indicate that it uses 'pidof' to check if a program is running or not. "pidof nfsd" works on RHEL7.x but it fails on my centos8.1 I just created. So either we need to make sure pidof works on kernel threads or fix CNFS scripts. > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 > Date: Fri, Feb 14, 2020 11:40 PM > > Hi All: > > I'm performing a rolling upgrade of one of our GPFS clusters. This particular cluster has 2 CNFS servers for some of our NFS clients. I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this). The thing is, it actually does start nfsd processes when running mmstartup on the node. Doing a "ps" shows 32 nfsd threads are running. > > Below is the CNFS-specific output from an attempt to start the node: > > CNFS[27243]: Restarting lockd to start grace > CNFS[27588]: Enabling 172.16.69.76 > CNFS[27694]: Restarting lockd to start grace > CNFS[27699]: Starting NFS services > CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[27910]: Monitor has started pid=27787 > CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28705]: Starting NFS services > CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28758]: Starting NFS services > CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28816]: Starting NFS services > CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28874]: Monitoring detected NFSD is inactive. mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured. > CNFS[28924]: Unexporting all GPFS filesystems > > Any thoughts? My other CNFS node is handling everything for the time being, thankfully! > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. 
# 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From prasad.surampudi at theatsgroup.com Mon Jun 1 17:33:05 2020 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Mon, 1 Jun 2020 16:33:05 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 101, Issue 1 In-Reply-To: References: Message-ID: <6B872411-80A0-475B-A8A3-E2BD828BB2F6@theatsgroup.com> So, if cluster_A is running Spec Scale 4.3.2 and Cluster_B is running 5.0.4, then would I be able to mount the filesystem from Cluster_A in Cluster_B as a remote filesystem? And if cluster_B nodes have direct SAN access to the remote cluster_A filesystem, would they be sending all filesystem I/O directly to the disk via Fiber Channel? I am assuming that this should work based on IBM link below. Can anyone from IBM support please confirm this? https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_admmcch.htm ?On 6/1/20, 4:45 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Jan-Frode Myklebust) 2. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Avila, Geoffrey) 3. Re: gpfsug-discuss Digest, Vol 100, Issue 32 (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) 4. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Sun, 31 May 2020 18:47:40 +0200 From: Jan-Frode Myklebust To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset="utf-8" No, this is a common misconception. You don?t need any NSD servers. NSD servers are only needed if you have nodes without direct block access. Remote cluster or not, disk access will be over local block device (without involving NSD servers in any way), or NSD server if local access isn?t available. NSD-servers are not ?arbitrators? over access to a disk, they?re just stupid proxies of IO commands. -jf s?n. 31. mai 2020 kl. 
11:31 skrev Jonathan Buzzard < jonathan.buzzard at strath.ac.uk>: > On 29/05/2020 20:55, Stephen Ulmer wrote: > > I have a question about multi-cluster, but it is related to this thread > > (it would be solving the same problem). > > > > Let?s say we have two clusters A and B, both clusters are normally > > shared-everything with no NSD servers defined. > > Er, even in a shared-everything all nodes fibre channel attached you > still have to define NSD servers. That is a given NSD has a server (or > ideally a list of servers) that arbitrate the disk. Unless it has > changed since 3.x days. Never run a 4.x or later with all the disks SAN > attached on all the nodes. > > > We want cluster B to be > > able to use a file system in cluster A. If I zone the SAN such that > > cluster B can see all of cluster A?s disks, can I then define a > > multi-cluster relationship between them and mount a file system from A > on B? > > > > To state it another way, must B's I/O for the foreign file system pass > > though NSD servers in A, or can B?s nodes discover that they have > > FibreChannel paths to those disks and use them? > > > > My understanding is that remote cluster mounts have to pass through the > NSD servers. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sun, 31 May 2020 21:44:12 -0400 From: "Avila, Geoffrey" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset="utf-8" The local-block-device method of I/O is what is usually termed "SAN mode"; right? On Sun, May 31, 2020 at 12:47 PM Jan-Frode Myklebust wrote: > > No, this is a common misconception. You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > > Remote cluster or not, disk access will be over local block device > (without involving NSD servers in any way), or NSD server if local access > isn?t available. NSD-servers are not ?arbitrators? over access to a disk, > they?re just stupid proxies of IO commands. > > > -jf > > s?n. 31. mai 2020 kl. 11:31 skrev Jonathan Buzzard < > jonathan.buzzard at strath.ac.uk>: > >> On 29/05/2020 20:55, Stephen Ulmer wrote: >> > I have a question about multi-cluster, but it is related to this thread >> > (it would be solving the same problem). >> > >> > Let?s say we have two clusters A and B, both clusters are normally >> > shared-everything with no NSD servers defined. >> >> Er, even in a shared-everything all nodes fibre channel attached you >> still have to define NSD servers. That is a given NSD has a server (or >> ideally a list of servers) that arbitrate the disk. Unless it has >> changed since 3.x days. Never run a 4.x or later with all the disks SAN >> attached on all the nodes. >> >> > We want cluster B to be >> > able to use a file system in cluster A. If I zone the SAN such that >> > cluster B can see all of cluster A?s disks, can I then define a >> > multi-cluster relationship between them and mount a file system from A >> on B? 
>> > >> > To state it another way, must B's I/O for the foreign file system pass >> > though NSD servers in A, or can B?s nodes discover that they have >> > FibreChannel paths to those disks and use them? >> > >> >> My understanding is that remote cluster mounts have to pass through the >> NSD servers. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Sun, 31 May 2020 22:54:11 -0400 From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 100, Issue 32 Message-ID: <83255.1590980051 at turing-police> Content-Type: text/plain; charset="us-ascii" On Fri, 29 May 2020 22:30:08 +0100, Jonathan Buzzard said: > Ethernet goes *very* fast these days you know :-) In fact *much* faster > than fibre channel. Yes, but the justification, purchase, and installation of 40G or 100G Ethernet interfaces in the machines involved, plus the routers/switches along the way, can go very slowly indeed. So finding a way to replace 10G Ether with 16G FC can be a win..... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: ------------------------------ Message: 4 Date: Mon, 1 Jun 2020 09:45:25 +0100 From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset=utf-8; format=flowed On 31/05/2020 17:47, Jan-Frode Myklebust wrote: > > No, this is a common misconception.? You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > I see that has changed then. In the past mmcrnsd would simply fail without a server list passed to it. If you have been a long term GPFS user (I started with 2.2 on a site that had been running since 1.x days) then we are not always aware of things that have changed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 1 ********************************************** From stockf at us.ibm.com Mon Jun 1 17:53:33 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Jun 2020 16:53:33 +0000 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: References: , <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Tue Jun 2 09:14:27 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 2 Jun 2020 08:14:27 +0000 Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number Message-ID: <7EA77DD1-8FEB-4151-838F-C8E983422BFE@id.ethz.ch> Hello, I?m quite upset of the form and usefulness of some IBM announcements like this one: IJ24518: NVME SCSI EMULATION ISSUE How do I translate an APAR number to the spectrum scale or ess release which fix it? And which versions are affected? Need I to download all Readmes and grep for the APAR number? Or do I just don?t know where to get this information? How do you deal with such announcements? I?m tempted to just open a PMR and ask ?. This probably relates to previous posts and RFE for a proper changelog. Excuse if it?s a duplicate or if I did miss the answer in a previous post. Still the quality of this announcements is not what I expect. Just for completeness, maybe someone from IBM takes notice: All I get is an APAR number and the fact that it?s CRITICAL, so I can?t just ignore, but I don?t get * Which ESS versions are affected ? all previous or only since a certain version? * What is the first ESS version fixed? * When am I vulnerable? always, or only certain hardware or configurations or ?.? * What is the impact ? crash due to temporary corruption or permanent data corruption, or metadata or filesystem structure or ..? * How do I know if I?m already affected, what is the fingerprint? * Does a workaround exist? * If this is critical and about a possible data corruption, why isn?t it already indicated in the title/subject but hidden? * why is the error description so cryptic and needs some guessing about the meaning? It?s no sentences, just quick notes. So there is no explicit statement at all. https://www.ibm.com/support/pages/node/6203365?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Kind regards, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Tue Jun 2 14:31:05 2020 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 2 Jun 2020 14:31:05 +0100 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: References: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: Hi Fred The imported filesystem has ~1.5M files that are migrated to Spectrum Protect. Spot checking transparent and selective recalls of a handful of files has been successful after associating them with their correct Spectrum Protect server. They're all also backed up to primary and copy pools in the Spectrum Protect server so having to do a restore instead of recall if it wasn't working was an acceptable risk in favour of trying to persist the GPFS 3.5 cluster on dying hardware and insecure OS, etc. Cheers Chris On Mon, 1 Jun 2020 at 17:53, Frederick Stock wrote: > Chris, it was not clear to me if the file system you imported had files > migrated to Spectrum Protect, that is stub files in GPFS. If the file > system does contain files migrated to Spectrum Protect with just a stub > file in the file system, have you tried to recall any of them to see if > that still works? 
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Chris Scott > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Importing a Spectrum Scale a > filesystem from 4.2.3 cluster to 5.0.4.3 cluster > Date: Mon, Jun 1, 2020 9:14 AM > > Sounds like it would work fine. > > I recently exported a 3.5 version filesystem from a GPFS 3.5 cluster to a > 'Scale cluster at 5.0.2.3 software and 5.0.2.0 cluster version. I > concurrently mapped the NSDs to new NSD servers in the 'Scale cluster, > mmexported the filesystem and changed the NSD servers configuration of the > NSDs using the mmimportfs ChangeSpecFile. The original (creation) > filesystem version of this filesystem is 3.2.1.5. > > To my pleasant surprise the filesystem mounted and worked fine while still > at 3.5 filesystem version. Plan B would have been to "mmchfs > -V full" and then mmmount, but I was able to update the filesystem to > 5.0.2.0 version while already mounted. > > This was further pleasantly successful as the filesystem in question is > DMAPI-enabled, with the majority of the data on tape using Spectrum Protect > for Space Management than the volume resident/pre-migrated on disk. > > The complexity is further compounded by this filesystem being associated > to a different Spectrum Protect server than an existing DMAPI-enabled > filesystem in the 'Scale cluster. Preparation of configs and subsequent > commands to enable and use Spectrum Protect for Space Management > multiserver for migration and backup all worked smoothly as per the docs. > > I was thus able to get rid of the GPFS 3.5 cluster on legacy hardware, OS, > GPFS and homebrew CTDB SMB and NFS and retain the filesystem with its > majority of tape-stored data on current hardware, OS and 'Scale/'Protect > with CES SMB and NFS. > > The future objective remains to move all the data from this historical > filesystem to a newer one to get the benefits of larger block and inode > sizes, etc, although since the data is mostly dormant and kept for > compliance/best-practice purposes, the main goal will be to head off > original file system version 3.2 era going end of support. > > Cheers > Chris > > On Thu, 28 May 2020 at 23:31, Prasad Surampudi < > prasad.surampudi at theatsgroup.com> wrote: > > We have two scale clusters, cluster-A running version Scale 4.2.3 and > RHEL6/7 and Cluster-B running Spectrum Scale 5.0.4 and RHEL 8.1. All the > nodes in both Cluster-A and Cluster-B are direct attached and no NSD > servers. We have our current filesystem gpfs_4 in Cluster-A and new > filesystem gpfs_5 in Cluster-B. We want to copy all our data from gpfs_4 > filesystem into gpfs_5 which has variable block size. So, can we map NSDs > of gpfs_4 to Cluster-B nodes and do a mmexportfs of gpfs_4 from Cluster-A > and mmimportfs into Cluster-B so that we have both filesystems available on > same node in Cluster-B for copying data across fiber channel? If > mmexportfs/mmimportfs works, can we delete nodes from Cluster-A and add > them to Cluster-B without upgrading RHEL or GPFS versions for now and plan > upgrading them at a later time? 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yassin at us.ibm.com Wed Jun 3 00:43:52 2020 From: yassin at us.ibm.com (Mustafa Mah) Date: Tue, 2 Jun 2020 19:43:52 -0400 Subject: [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number Message-ID: Heiner, Here is an alert for this APAR IJ24518 which has more details. https://www.ibm.com/support/pages/node/6210439 Regards, Mustafa > ---------- Forwarded message --------- > From: Billich Heinrich Rainer (ID SD) > Date: Tue, Jun 2, 2020 at 4:29 AM > Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what > do do with this announcement, all I get is an APAR number > To: gpfsug main discussion list > > Hello, > > I?m quite upset of the form and usefulness of some IBM announcements > like this one: > > IJ24518: NVME SCSI EMULATION ISSUE > > How do I translate an APAR number to the spectrum scale or ess > release which fix it? And which versions are affected? Need I to > download all Readmes and grep for the APAR number? Or do I just > don?t know where to get this information? How do you deal with such > announcements? I?m tempted to just open a PMR and ask ?. > > This probably relates to previous posts and RFE for a proper > changelog. Excuse if it?s a duplicate or if I did miss the answer in > a previous post. Still the quality of ?this announcements is not > what I expect. > > Just for completeness, maybe someone from IBM takes notice: > > All I get is an APAR number and the fact that it?s CRITICAL, so I > can?t just ignore, but I don?t get > > Which ESS versions are affected ? all previous or only since a > certain version? > What is the first ESS version fixed? > When am I vulnerable? always, or only certain hardware or > configurations or ?.? > What is the impact ? crash due to temporary corruption or permanent > data corruption, or metadata or filesystem structure or ..? > How do I know if I?m already affected, what is the fingerprint? > Does a workaround exist? > If this is critical and about a possible data corruption, why isn?t > it already indicated in the title/subject but hidden? > why is the error description so cryptic and needs some guessing > about the meaning? It?s no sentences, just quick notes. So there is > no explicit statement at all. > > > https://www.ibm.com/support/pages/node/6203365? > myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > Kind regards, > > Heiner > > -- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Wed Jun 3 16:16:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 3 Jun 2020 16:16:05 +0100 Subject: [gpfsug-discuss] Immutible attribute Message-ID: Hum, on a "normal" Linux file system only the root user can change the immutible attribute on a file. Running on 4.2.3 I have just removed the immutible attribute as an ordinary user if I am the owner of the file. I would suggest that this is a bug as the manual page for mmchattr does not mention this. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Wed Jun 3 16:25:43 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 3 Jun 2020 15:25:43 +0000 Subject: [gpfsug-discuss] Immutible attribute In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 3 16:45:02 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 3 Jun 2020 16:45:02 +0100 Subject: [gpfsug-discuss] Immutible attribute In-Reply-To: References: Message-ID: <518af0ad-fa75-70a1-20c4-6a77e55817bb@strath.ac.uk> On 03/06/2020 16:25, Frederick Stock wrote: > Could you please provide the exact Scale version, or was it really 4.2.3.0? > 4.2.3-7 with setuid taken off a bunch of the utilities per relevant CVE while I work on the upgrade to 5.0.5 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Wed Jun 3 17:50:37 2020 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 3 Jun 2020 18:50:37 +0200 Subject: [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number In-Reply-To: References: Message-ID: Also, 5.0.4 PTF4 efix1 contains a fix for that. Mit freundlichen Gr??en / Kind regards Achim Rehor gpfsug-discuss-bounces at spectrumscale.org wrote on 03/06/2020 01:43:52: > From: "Mustafa Mah" > To: gpfsug-discuss at spectrumscale.org > Date: 03/06/2020 01:44 > Subject: [EXTERNAL] [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI > EMULATION ISSUE - what do do with this announcement, all I get is an > APAR number > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Heiner, > > Here is an alert for this APAR IJ24518 which has more details. > > https://www.ibm.com/support/pages/node/6210439 > > Regards, > Mustafa > > > ---------- Forwarded message --------- > > From: Billich Heinrich Rainer (ID SD) > > Date: Tue, Jun 2, 2020 at 4:29 AM > > Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what > > do do with this announcement, all I get is an APAR number > > To: gpfsug main discussion list > > > > > Hello, > > > > I?m quite upset of the form and usefulness of some IBM announcements > > like this one: > > > > IJ24518: NVME SCSI EMULATION ISSUE > > > > How do I translate an APAR number to the spectrum scale or ess > > release which fix it? And which versions are affected? Need I to > > download all Readmes and grep for the APAR number? Or do I just > > don?t know where to get this information? How do you deal with such > > announcements? I?m tempted to just open a PMR and ask ?. > > > > This probably relates to previous posts and RFE for a proper > > changelog. Excuse if it?s a duplicate or if I did miss the answer in > > a previous post. 
Still the quality of this announcements is not > > what I expect. > > > > Just for completeness, maybe someone from IBM takes notice: > > > > All I get is an APAR number and the fact that it?s CRITICAL, so I > > can?t just ignore, but I don?t get > > > > Which ESS versions are affected ? all previous or only since a > > certain version? > > What is the first ESS version fixed? > > When am I vulnerable? always, or only certain hardware or > > configurations or ?.? > > What is the impact ? crash due to temporary corruption or permanent > > data corruption, or metadata or filesystem structure or ..? > > How do I know if I?m already affected, what is the fingerprint? > > Does a workaround exist? > > If this is critical and about a possible data corruption, why isn?t > > it already indicated in the title/subject but hidden? > > why is the error description so cryptic and needs some guessing > > about the meaning? It?s no sentences, just quick notes. So there is > > no explicit statement at all. > > > > > > https://www.ibm.com/support/pages/node/6203365? > > myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > Kind regards, > > > > Heiner > > > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=W0o9gusq8r9RXIck94yh8Db326oZ63-ctZOFhRGuJ9A&s=drq- > La060No88jLIMNwJCD6U67UYmALEzbQ58qyI65c&e= From Robert.Oesterlin at nuance.com Wed Jun 3 17:16:41 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 3 Jun 2020 16:16:41 +0000 Subject: [gpfsug-discuss] File heat - remote file systems? Message-ID: Is it possible to collect file heat data on a remote mounted file system? if I enable file heat in the remote cluster, will that get picked up? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Wed Jun 3 20:11:17 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 03 Jun 2020 20:11:17 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. 
More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Wed Jun 3 22:45:05 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 3 Jun 2020 21:45:05 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jun 3 22:56:04 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 3 Jun 2020 21:56:04 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Wed Jun 3 23:23:40 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 3 Jun 2020 22:23:40 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Frederick, Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... NSD servers per disks are called in round-robin fashion as well, for example: gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib Any other potential culprits to investigate? 
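As a side check on the above, which node is actually doing the I/O for each disk right now (as opposed to the configured server order) can be confirmed from a client with something like the following; the device name matches the listing above and the output is omitted here:

  # configured NSD server order per disk
  mmlsnsd -f gpfs1
  # node currently performing the I/O for each disk
  mmlsdisk gpfs1 -m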
I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): [nsd03-ib ~]# mmdiag --waiters === mmdiag: waiters === Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion nsd04-ib: Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion tsm01-ib: Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion nsd01-ib: Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 4:56 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 6 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair)) 2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi) 3. Re: Client Latency and High NSD Server Load Average (Frederick Stock) ---------------------------------------------------------------------- Message: 1 Date: Wed, 03 Jun 2020 20:11:17 +0100 From: "Simon Thompson (Spectrum Scale User Group Chair)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Content-Type: text/plain; charset="utf-8" Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) 
Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 3 Jun 2020 21:45:05 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Wed, 3 Jun 2020 21:56:04 +0000 From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 6 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jun 3 23:21:54 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 3 Jun 2020 15:21:54 -0700 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <0D7E06C2-3BFC-434C-8A81-CC57D9F375B4@lbl.gov> Are you running ESS? > On Jun 3, 2020, at 2:56 PM, Frederick Stock wrote: > > Does the output of mmdf show that data is evenly distributed across your NSDs? If not that could be contributing to your problem. Also, are your NSDs evenly distributed across your NSD servers, and the NSD configured so the first NSD server for each is not the same one? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: "Saula, Oluwasijibomi" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average > Date: Wed, Jun 3, 2020 5:45 PM > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? 
> > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 4 01:16:13 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 4 Jun 2020 02:16:13 +0200 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Hello, Oluwasijibomi , I suppose you are not running ESS (might be wrong on this). I'd check the IO history on the NSD servers (high IO times?) and in addition the IO traffic at the block device level , e.g. with iostat or the like (still high IO times there? Are the IO sizes ok or too low on the NSD servers with high write latencies? ). What's the picture on your storage back-end? All caches active? Is the storage backend fully loaded or rather idle? How is storage connected? SAS? FC? IB? What is the actual IO pattern when you see these high latencies? Do you run additional apps on some or all of youre NSD servers? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Date: 03/06/2020 23:45 Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=ql8z1YSfrzUgT8kXQBMEUuA8uyuprz6-fpvC660vG5A&s=JSYPIzNMZFNp17VaqcNWNuwwUE_nQMKu47mOOUonLp0&e= From ewahl at osc.edu Thu Jun 4 00:56:07 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 3 Jun 2020 23:56:07 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: I saw something EXACTLY like this way back in the 3.x days when I had a backend storage unit that had a flaky main memory issue and some enclosures were constantly flapping between controllers for ownership. Some NSDs were affected, some were not. I can imagine this could still happen in 4.x and 5.0.x with the right hardware problem. Were things working before or is this a new installation? What is the backend storage? If you are using device-mapper-multipath, look for events in the messages/syslog. Incorrect path weighting? Using ALUA when it isn't supported? (that can be comically bad! helped a friend diagnose that one at a customer once) Perhaps using the wrong rr_weight or rr_min_io so you have some wacky long io queueing issues where your path_selector cannot keep up with the IO queue? Most of this is easily fixed by using most vendor's suggested settings anymore, IF the hardware is healthy... Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Saula, Oluwasijibomi Sent: Wednesday, June 3, 2020 5:45 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 4 03:19:49 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 3 Jun 2020 22:19:49 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear offline, then nsd04-ib would be next in line to get the load of all 3. The two servers with the problems are in line after the one that is off. This is based on the candy striping of the NSD server order (which I think most of us do). NSD fail-over is ?straight-forward? so to speak - the last I checked, it is really fail-over in the listed order not load balancing among the servers (which is why you stripe them). I do *not* know if individual clients make the decision that the I/O for a disk should go through the ?next? 
NSD server, or if it is done cluster-wide (in the case of intermittently super-slow I/O). Hopefully someone with source code access will answer that, because now I?m curious... Check what path the clients are using to the NSDs, i.e. which server. See if you are surprised. :) -- Stephen > On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi wrote: > > ? > Frederick, > > Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... > > NSD servers per disks are called in round-robin fashion as well, for example: > > gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib > gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib > gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib > gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib > > Any other potential culprits to investigate? > > I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): > [nsd03-ib ~]# mmdiag --waiters > === mmdiag: waiters === > Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion > Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion > Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion > > nsd04-ib: > Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion > Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion > Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion > > tsm01-ib: > Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion > Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion > Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion > Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion > > nsd01-ib: > Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion > Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Wednesday, June 3, 2020 4:56 PM > To: gpfsug-discuss at spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 101, Issue 6 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Introducing SSUG::Digital > (Simon Thompson (Spectrum Scale User Group Chair)) > 2. Client Latency and High NSD Server Load Average > (Saula, Oluwasijibomi) > 3. 
Re: Client Latency and High NSD Server Load Average > (Frederick Stock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 03 Jun 2020 20:11:17 +0100 > From: "Simon Thompson (Spectrum Scale User Group Chair)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Introducing SSUG::Digital > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi All., > > > > I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. > > > > I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: > San Francisco, USA at 08:00 PDT > New York, USA at 11:00 EDT > London, United Kingdom at 16:00 BST > Frankfurt, Germany at 17:00 CEST > Pune, India at 20:30 IST > We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! > > > > (I know the times don?t work for everyone in the global community!) > > > > Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. > > More details at: > > https://www.spectrumscaleug.org/introducing-ssugdigital/ > > > > (We?ll announce the speakers and topic of the first session in the next few days ?) > > > > Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. > > > > We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. > > > > Simon Thompson > > SSUG Group Chair > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Wed, 3 Jun 2020 21:45:05 +0000 > From: "Saula, Oluwasijibomi" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > Thanks, > > > Oluwasijibomi (Siji) Saula > > HPC Systems Administrator / Information Technology > > > > Research 2 Building 220B / Fargo ND 58108-6050 > > p: 701.231.7749 / www.ndsu.edu > > > > [cid:image001.gif at 01D57DE0.91C300C0] > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 3 > Date: Wed, 3 Jun 2020 21:56:04 +0000 > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 101, Issue 6 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Jun 4 12:08:19 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 4 Jun 2020 11:08:19 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu Jun 4 16:19:18 2020 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 Jun 2020 11:19:18 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: Hi, >> I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): Please issue "mmlsdisk -m" in NSD client to ascertain the active NSD server serving a NSD. Since nsd02-ib is offlined, it is possible that some servers would be serving higher NSDs than the rest. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_PoorPerformanceDuetoDiskFailure.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_HealthStateOfNSDserver.htm >> From the waiters you provided I would guess there is something amiss with some of your storage systems. Please ensure there are no "disk rebuild" pertaining to certain NSDs/storage volumes in progress (in the storage subsystem) as this can sometimes impact block-level performance and thus impact latency, especially for write operations. Please ensure that the hardware components constituting the Spectrum Scale stack are healthy and performing optimally. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_pspduetosyslevelcompissue.htm Please refer to the Spectrum Scale documentation (link below) for potential causes (e.g. Scale maintenance operation such as mmapplypolicy/mmestripefs in progress, slow disks) that can be contributing to this issue: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_performanceissues.htm Thanks and Regards, -Kums Kumaran Rajaram Spectrum Scale Development, IBM Systems kums at us.ibm.com From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 06/04/2020 07:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Sent by: gpfsug-discuss-bounces at spectrumscale.org >From the waiters you provided I would guess there is something amiss with some of your storage systems. Since those waiters are on NSD servers they are waiting for IO requests to the kernel to complete. Generally IOs are expected to complete in milliseconds, not seconds. You could look at the output of "mmfsadm dump nsd" to see how the GPFS IO queues are working but that would be secondary to checking your storage systems. 
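As a rough illustration (not an official tool), the waiters quoted in this thread can also be watched across all of the NSD servers with a few lines of Python -- this sketch assumes passwordless ssh to the servers, the standard /usr/lpp/mmfs/bin path, and the "Waiting N.NNNN sec ... NSDThread: for I/O completion" output format shown earlier; the host names are just examples:

#!/usr/bin/env python
# Rough sketch only: report the longest "NSDThread: for I/O completion" waiter
# on each NSD server. Assumes passwordless ssh and the mmdiag --waiters output
# format quoted earlier in this thread.
import subprocess

NSD_SERVERS = ["nsd01-ib", "nsd03-ib", "nsd04-ib", "tsm01-ib"]   # example hosts
MMDIAG = "/usr/lpp/mmfs/bin/mmdiag"

def longest_nsd_waiter(node):
    p = subprocess.Popen(["ssh", node, MMDIAG, "--waiters"],
                         stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = p.communicate()
    waits = [float(line.split()[1])            # second field is the seconds value
             for line in out.splitlines()
             if line.startswith("Waiting") and "NSDThread" in line]
    return max(waits) if waits else 0.0

for node in NSD_SERVERS:
    print("%-10s longest NSD I/O waiter: %6.2f sec" % (node, longest_nsd_waiter(node)))

Watching those numbers over time makes it easy to see whether one server, or one storage system behind it, is consistently the slow one.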
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Saula, Oluwasijibomi" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Date: Wed, Jun 3, 2020 6:24 PM Frederick, Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... NSD servers per disks are called in round-robin fashion as well, for example: gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib Any other potential culprits to investigate? I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): [nsd03-ib ~]# mmdiag --waiters === mmdiag: waiters === Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion nsd04-ib: Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion tsm01-ib: Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion nsd01-ib: Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 4:56 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 6 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair)) 2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi) 3. 
Re: Client Latency and High NSD Server Load Average (Frederick Stock) ---------------------------------------------------------------------- Message: 1 Date: Wed, 03 Jun 2020 20:11:17 +0100 From: "Simon Thompson (Spectrum Scale User Group Chair)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Content-Type: text/plain; charset="utf-8" Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/e839fc73/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 3 Jun 2020 21:45:05 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/2ac14173/attachment-0001.html > ------------------------------ Message: 3 Date: Wed, 3 Jun 2020 21:56:04 +0000 From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/c252f3b9/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 6 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=McIf98wfiVqHU8ZygezLrQ&m=LdN47e1J6DuQfVtCUGylXISVvrHRgD19C_zEOo8SaJ0&s=ec3M7xE47VugZito3VvpZGvrFrl0faoZl6Oq0-iB-3Y&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Thu Jun 4 16:33:18 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Thu, 4 Jun 2020 15:33:18 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Stephen, Looked into client requests, and it doesn't seem to lean heavily on any one NSD server. Of course, this is an eyeball assessment after reviewing IO request percentages to the different NSD servers from just a few nodes. By the way, I later discovered our TSM/NSD server couldn't handle restoring a read-only file and ended-up writing my output file into GBs asking for my response...that seemed to have contributed to some unnecessary high write IO. However, I still can't understand why write IO operations are 5x more latent than ready operations to the same class of disks. Maybe it's time for a GPFS support ticket... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 9:19 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 9 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Client Latency and High NSD Server Load Average (Stephen Ulmer) ---------------------------------------------------------------------- Message: 1 Date: Wed, 3 Jun 2020 22:19:49 -0400 From: Stephen Ulmer To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="utf-8" Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear offline, then nsd04-ib would be next in line to get the load of all 3. The two servers with the problems are in line after the one that is off. This is based on the candy striping of the NSD server order (which I think most of us do). NSD fail-over is ?straight-forward? so to speak - the last I checked, it is really fail-over in the listed order not load balancing among the servers (which is why you stripe them). I do *not* know if individual clients make the decision that the I/O for a disk should go through the ?next? NSD server, or if it is done cluster-wide (in the case of intermittently super-slow I/O). Hopefully someone with source code access will answer that, because now I?m curious... Check what path the clients are using to the NSDs, i.e. which server. See if you are surprised. :) -- Stephen > On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi wrote: > > ? > Frederick, > > Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... > > NSD servers per disks are called in round-robin fashion as well, for example: > > gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib > gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib > gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib > gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib > > Any other potential culprits to investigate? 
> > I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): > [nsd03-ib ~]# mmdiag --waiters > === mmdiag: waiters === > Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion > Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion > Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion > > nsd04-ib: > Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion > Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion > Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion > > tsm01-ib: > Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion > Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion > Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion > Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion > > nsd01-ib: > Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion > Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Wednesday, June 3, 2020 4:56 PM > To: gpfsug-discuss at spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 101, Issue 6 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Introducing SSUG::Digital > (Simon Thompson (Spectrum Scale User Group Chair)) > 2. Client Latency and High NSD Server Load Average > (Saula, Oluwasijibomi) > 3. Re: Client Latency and High NSD Server Load Average > (Frederick Stock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 03 Jun 2020 20:11:17 +0100 > From: "Simon Thompson (Spectrum Scale User Group Chair)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Introducing SSUG::Digital > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi All., > > > > I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. > > > > I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. 
That?s: > San Francisco, USA at 08:00 PDT > New York, USA at 11:00 EDT > London, United Kingdom at 16:00 BST > Frankfurt, Germany at 17:00 CEST > Pune, India at 20:30 IST > We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! > > > > (I know the times don?t work for everyone in the global community!) > > > > Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. > > More details at: > > https://www.spectrumscaleug.org/introducing-ssugdigital/ > > > > (We?ll announce the speakers and topic of the first session in the next few days ?) > > > > Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. > > > > We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. > > > > Simon Thompson > > SSUG Group Chair > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Wed, 3 Jun 2020 21:45:05 +0000 > From: "Saula, Oluwasijibomi" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > Thanks, > > > Oluwasijibomi (Siji) Saula > > HPC Systems Administrator / Information Technology > > > > Research 2 Building 220B / Fargo ND 58108-6050 > > p: 701.231.7749 / www.ndsu.edu > > > > [cid:image001.gif at 01D57DE0.91C300C0] > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 3 > Date: Wed, 3 Jun 2020 21:56:04 +0000 > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 101, Issue 6 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 9 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valdis.kletnieks at vt.edu Fri Jun 5 02:17:08 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 04 Jun 2020 21:17:08 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <309214.1591319828@turing-police> On Thu, 04 Jun 2020 15:33:18 -0000, "Saula, Oluwasijibomi" said: > However, I still can't understand why write IO operations are 5x more latent > than ready operations to the same class of disks. Two things that may be biting you: First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical reads (data and parity block). To do a write, you have to read the old parity block, compute the new value, and write the data block and new parity block. This is often called the "RAID write penalty". Second, if a read size is smaller than the physical block size, the storage array can read a block, and return only the fragment needed. But on a write, it has to read the whole block, splice in the new data, and write back the block - a RMW (read modify write) cycle. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From giovanni.bracco at enea.it Fri Jun 5 12:21:55 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 5 Jun 2020 13:21:55 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Message-ID: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> In our lab we have received two storage-servers, Super micro SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID controller (2 GB cache) and before putting them in production for other purposes we have setup a small GPFS test cluster to verify if they can be used as storage (our gpfs production cluster has the licenses based on the NSD sockets, so it would be interesting to expand the storage size just by adding storage-servers in a infiniband based SAN, without changing the number of NSD servers) The test cluster consists of: 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. 2) a Mellanox FDR switch used as a SAN switch 3) a Truescale QDR switch as GPFS cluster switch 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. All the nodes run CentOS 7.7. On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been configured and it is exported via infiniband as an iSCSI target so that both appear as devices accessed by the srp_daemon on the NSD servers, where multipath (not really necessary in this case) has been configured for these two LIO-ORG devices. GPFS version 5.0.4-0 has been installed and the RDMA has been properly configured Two NSD disk have been created and a GPFS file system has been configured. Very simple tests have been performed using lmdd serial write/read. 
1) storage-server local performance: before configuring the RAID6 volume as NSD disk, a local xfs file system was created and lmdd write/read performance for 100 GB file was verified to be about 1 GB/s 2) once the GPFS cluster has been created write/read test have been performed directly from one of the NSD server at a time: write performance 2 GB/s, read performance 1 GB/s for 100 GB file By checking with iostat, it was observed that the I/O in this case involved only the NSD server where the test was performed, so when writing, the double of base performances was obtained, while in reading the same performance as on a local file system, this seems correct. Values are stable when the test is repeated. 3) when the same test is performed from the GPFS clients the lmdd result for a 100 GB file are: write - 900 MB/s and stable, not too bad but half of what is seen from the NSD servers. read - 30 MB/s to 300 MB/s: very low and unstable values No tuning of any kind in all the configuration of the involved system, only default values. Any suggestion to explain the very bad read performance from a GPFS client? Giovanni here are the configuration of the virtual drive on the storage-server and the file system configuration in GPFS Virtual drive ============== Virtual Drive: 2 (Target Id: 2) Name : RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 Size : 81.856 TB Sector Size : 512 Is VD emulated : Yes Parity Size : 18.190 TB State : Optimal Strip Size : 256 KB Number Of Drives : 11 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disabled GPFS file system from mmlsfs ============================ mmlsfs vsd_gexp2 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j cluster Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 22.00 (5.0.4.0) File system version --create-time Fri Apr 3 19:26:27 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 134217728 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d nsdfs4lun2;nsdfs5lun2 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gexp2 Default mount point --mount-priority 0 Mount priority -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco ================================================== Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate e la casella di posta elettronica da cui e' stata inviata e' da qualificarsi quale strumento aziendale. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione. Grazie. This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only. Dissemination, copying, printing or use by anybody else is unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE 2016/679). If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail. Thanks. ================================================== From janfrode at tanso.net Fri Jun 5 13:58:39 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 5 Jun 2020 14:58:39 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: Could maybe be interesting to drop the NSD servers, and let all nodes access the storage via srp ? Maybe turn off readahead, since it can cause performance degradation when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead always reads too much. This might be the cause of the slow read seen ? maybe you?ll also overflow it if reading from both NSD-servers at the same time? Plus.. it?s always nice to give a bit more pagepool to hhe clients than the default.. I would prefer to start with 4 GB. -jf fre. 5. jun. 2020 kl. 14:22 skrev Giovanni Bracco : > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. > > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. 
> > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained, while in reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name : > RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size : 81.856 TB > Sector Size : 512 > Is VD emulated : Yes > Parity Size : 18.190 TB > State : Optimal > Strip Size : 256 KB > Number Of Drives : 11 > Span Depth : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy : Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment > (subblock) size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j cluster Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 1048576 Block size > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 22.00 (5.0.4.0) File system version > --create-time Fri Apr 3 19:26:27 2020 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 134217728 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? 
> --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per > full block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d nsdfs4lun2;nsdfs5lun2 Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gexp2 Default mount point > --mount-priority 0 Mount priority > > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle > persone indicate e la casella di posta elettronica da cui e' stata inviata > e' da qualificarsi quale strumento aziendale. > La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza > di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. > 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete cortesemente > pregati di darne immediata comunicazione al mittente e di provvedere alla > sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain privileged > information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is unauthorised > (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE > 2016/679). > If you are not the intended recipient, please delete this message and any > attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Fri Jun 5 14:53:23 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 5 Jun 2020 15:53:23 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> answer in the text On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > Could maybe be interesting to drop the NSD servers, and let all nodes > access the storage via srp ? no we can not: the production clusters fabric is a mix of a QDR based cluster and a OPA based cluster and NSD nodes provide the service to both. > > Maybe turn off readahead, since it can cause performance degradation > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead > always reads too much. This might be the cause of the slow read seen ? > maybe you?ll also overflow it if reading from both NSD-servers at the > same time? I have switched the readahead off and this produced a small (~10%) increase of performances when reading from a NSD server, but no change in the bad behaviour for the GPFS clients > > > Plus.. it?s always nice to give a bit more pagepool to hhe clients than > the default.. I would prefer to start with 4 GB. we'll do also that and we'll let you know! Giovanni > > > > ? -jf > > fre. 5. jun. 2020 kl. 
14:22 skrev Giovanni Bracco > >: > > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale > each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. > > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. > > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been > configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 > volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained,? while in > reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd > result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad? read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name? ? ? ? ? ? ? ? : > RAID Level? ? ? ? ? : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size? ? ? ? ? ? ? ? : 81.856 TB > Sector Size? ? ? ? ?: 512 > Is VD emulated? ? ? : Yes > Parity Size? ? ? ? ?: 18.190 TB > State? ? ? ? ? ? ? ?: Optimal > Strip Size? ? ? ? ? : 256 KB > Number Of Drives? ? : 11 > Span Depth? ? ? ? ? : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy? ?: Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag? ? ? ? ? ? ? ? value? ? 
? ? ? ? ? ? ? ? description > ------------------- ------------------------ > ----------------------------------- > ? -f? ? ? ? ? ? ? ? ?8192? ? ? ? ? ? ? ? ? ? ?Minimum fragment > (subblock) size in bytes > ? -i? ? ? ? ? ? ? ? ?4096? ? ? ? ? ? ? ? ? ? ?Inode size in bytes > ? -I? ? ? ? ? ? ? ? ?32768? ? ? ? ? ? ? ? ? ? Indirect block size > in bytes > ? -m? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? Default number of > metadata > replicas > ? -M? ? ? ? ? ? ? ? ?2? ? ? ? ? ? ? ? ? ? ? ? Maximum number of > metadata > replicas > ? -r? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? Default number of data > replicas > ? -R? ? ? ? ? ? ? ? ?2? ? ? ? ? ? ? ? ? ? ? ? Maximum number of data > replicas > ? -j? ? ? ? ? ? ? ? ?cluster? ? ? ? ? ? ? ? ? Block allocation type > ? -D? ? ? ? ? ? ? ? ?nfs4? ? ? ? ? ? ? ? ? ? ?File locking > semantics in > effect > ? -k? ? ? ? ? ? ? ? ?all? ? ? ? ? ? ? ? ? ? ? ACL semantics in effect > ? -n? ? ? ? ? ? ? ? ?512? ? ? ? ? ? ? ? ? ? ? Estimated number of > nodes > that will mount file system > ? -B? ? ? ? ? ? ? ? ?1048576? ? ? ? ? ? ? ? ? Block size > ? -Q? ? ? ? ? ? ? ? ?user;group;fileset? ? ? ?Quotas accounting enabled > ? ? ? ? ? ? ? ? ? ? ?user;group;fileset? ? ? ?Quotas enforced > ? ? ? ? ? ? ? ? ? ? ?none? ? ? ? ? ? ? ? ? ? ?Default quotas enabled > ? --perfileset-quota No? ? ? ? ? ? ? ? ? ? ? ?Per-fileset quota > enforcement > ? --filesetdf? ? ? ? No? ? ? ? ? ? ? ? ? ? ? ?Fileset df enabled? > ? -V? ? ? ? ? ? ? ? ?22.00 (5.0.4.0)? ? ? ? ? File system version > ? --create-time? ? ? Fri Apr? 3 19:26:27 2020 File system creation time > ? -z? ? ? ? ? ? ? ? ?No? ? ? ? ? ? ? ? ? ? ? ?Is DMAPI enabled? > ? -L? ? ? ? ? ? ? ? ?33554432? ? ? ? ? ? ? ? ?Logfile size > ? -E? ? ? ? ? ? ? ? ?Yes? ? ? ? ? ? ? ? ? ? ? Exact mtime mount option > ? -S? ? ? ? ? ? ? ? ?relatime? ? ? ? ? ? ? ? ?Suppress atime mount > option > ? -K? ? ? ? ? ? ? ? ?whenpossible? ? ? ? ? ? ?Strict replica > allocation > option > ? --fastea? ? ? ? ? ?Yes? ? ? ? ? ? ? ? ? ? ? Fast external attributes > enabled? > ? --encryption? ? ? ?No? ? ? ? ? ? ? ? ? ? ? ?Encryption enabled? > ? --inode-limit? ? ? 134217728? ? ? ? ? ? ? ? Maximum number of inodes > ? --log-replicas? ? ?0? ? ? ? ? ? ? ? ? ? ? ? Number of log replicas > ? --is4KAligned? ? ? Yes? ? ? ? ? ? ? ? ? ? ? is4KAligned? > ? --rapid-repair? ? ?Yes? ? ? ? ? ? ? ? ? ? ? rapidRepair enabled? > ? --write-cache-threshold 0? ? ? ? ? ? ? ? ? ?HAWC Threshold (max > 65536) > ? --subblocks-per-full-block 128? ? ? ? ? ? ? Number of subblocks per > full block > ? -P? ? ? ? ? ? ? ? ?system? ? ? ? ? ? ? ? ? ?Disk storage pools in > file > system > ? --file-audit-log? ?No? ? ? ? ? ? ? ? ? ? ? ?File Audit Logging > enabled? > ? --maintenance-mode No? ? ? ? ? ? ? ? ? ? ? ?Maintenance Mode enabled? > ? -d? ? ? ? ? ? ? ? ?nsdfs4lun2;nsdfs5lun2? ? Disks in file system > ? -A? ? ? ? ? ? ? ? ?yes? ? ? ? ? ? ? ? ? ? ? Automatic mount option > ? -o? ? ? ? ? ? ? ? ?none? ? ? ? ? ? ? ? ? ? ?Additional mount options > ? -T? ? ? ? ? ? ? ? ?/gexp2? ? ? ? ? ? ? ? ? ?Default mount point > ? --mount-priority? ?0? ? ? ? ? ? ? ? ? ? ? ? Mount priority > > > -- > Giovanni Bracco > phone? +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente > alle persone indicate e la casella di posta elettronica da cui e' > stata inviata e' da qualificarsi quale strumento aziendale. 
> La diffusione, copia o qualsiasi altra azione derivante dalla > conoscenza di queste informazioni sono rigorosamente vietate (art. > 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete > cortesemente pregati di darne immediata comunicazione al mittente e > di provvedere alla sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain > privileged information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is > unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent > amendments and GDPR UE 2016/679). > If you are not the intended recipient, please delete this message > and any attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From oluwasijibomi.saula at ndsu.edu Fri Jun 5 15:24:27 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 5 Jun 2020 14:24:27 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Vladis/Kums/Fred/Kevin/Stephen, Thanks so much for your insights, thoughts, and pointers! - Certainly increased my knowledge and understanding of potential culprits to watch for... So we finally discovered the root issue to this problem: An unattended TSM restore exercise profusely writing to a single file, over and over again into the GBs!!..I'm opening up a ticket with TSM support to learn how to mitigate this in the future. But with the RAID 6 writing costs Vladis explained, it now makes sense why the write IO was badly affected... Excerpt from output file: --- User Action is Required --- File '/gpfs1/X/Y/Z/fileABC' is write protected Select an appropriate action 1. Force an overwrite for this object 2. Force an overwrite on all objects that are write protected 3. Skip this object 4. Skip all objects that are write protected A. Abort this operation Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] ... 
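As a back-of-the-envelope illustration of that RAID 6 write cost (with assumed numbers -- 9 data + 2 parity disks and 256 KiB strips, purely for illustration, not necessarily this array's geometry):

#!/usr/bin/env python
# Purely illustrative arithmetic for the RAID-6 read-modify-write penalty;
# the geometry below is an assumption, not a measured configuration.
data_disks = 9                         # e.g. an 11-disk RAID 6: 9 data + 2 parity
strip_kib  = 256                       # strip (chunk) size per disk
stripe_kib = data_disks * strip_kib    # 2304 KiB full stripe
write_kib  = 1024                      # one 1 MiB GPFS block

strips = -(-write_kib // strip_kib)    # ceil: data strips touched (4 here)
if write_kib < stripe_kib:
    # partial-stripe update: read old data strips plus P and Q, write them all back
    disk_ios = 2 * (strips + 2)
else:
    disk_ios = data_disks + 2          # aligned full-stripe write needs no reads
print("~%d disk I/Os for one %d KiB write" % (disk_ios, write_kib))
print("~%d disk I/Os for one %d KiB read" % (strips, write_kib))

With those assumptions a 1 MiB write costs roughly three times the disk operations of a 1 MiB read, which is why a stream of small rewrites hurts so much more than the equivalent read traffic.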
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Friday, June 5, 2020 6:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 12 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Client Latency and High NSD Server Load Average (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) ---------------------------------------------------------------------- Message: 1 Date: Thu, 04 Jun 2020 21:17:08 -0400 From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: <309214.1591319828 at turing-police> Content-Type: text/plain; charset="us-ascii" On Thu, 04 Jun 2020 15:33:18 -0000, "Saula, Oluwasijibomi" said: > However, I still can't understand why write IO operations are 5x more latent > than ready operations to the same class of disks. Two things that may be biting you: First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical reads (data and parity block). To do a write, you have to read the old parity block, compute the new value, and write the data block and new parity block. This is often called the "RAID write penalty". Second, if a read size is smaller than the physical block size, the storage array can read a block, and return only the fragment needed. But on a write, it has to read the whole block, splice in the new data, and write back the block - a RMW (read modify write) cycle. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 12 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Fri Jun 5 18:02:49 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 5 Jun 2020 19:02:49 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco : > answer in the text > > On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > > > Could maybe be interesting to drop the NSD servers, and let all nodes > > access the storage via srp ? 
> > no we can not: the production clusters fabric is a mix of a QDR based > cluster and a OPA based cluster and NSD nodes provide the service to both. > You could potentially still do SRP from QDR nodes, and via NSD for your omnipath nodes. Going via NSD seems like a bit pointless indirection. > > > > Maybe turn off readahead, since it can cause performance degradation > > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead > > always reads too much. This might be the cause of the slow read seen ? > > maybe you?ll also overflow it if reading from both NSD-servers at the > > same time? > > I have switched the readahead off and this produced a small (~10%) > increase of performances when reading from a NSD server, but no change > in the bad behaviour for the GPFS clients > > > > > > Plus.. it?s always nice to give a bit more pagepool to hhe clients than > > the default.. I would prefer to start with 4 GB. > > we'll do also that and we'll let you know! Could you show your mmlsconfig? Likely you should set maxMBpS to indicate what kind of throughput a client can do (affects GPFS readahead/writebehind). Would typically also increase workerThreads on your NSD servers. 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. When you write one GPFS block, less than a half RAID stripe is written, which means you need to read back some data to calculate new parities. I would prefer 4 MB block size, and maybe also change to 8+p+q so that one GPFS is a multiple of a full 2 MB stripe. -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Jun 6 06:38:31 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sat, 06 Jun 2020 01:38:31 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <403018.1591421911@turing-police> On Fri, 05 Jun 2020 14:24:27 -0000, "Saula, Oluwasijibomi" said: > But with the RAID 6 writing costs Vladis explained, it now makes sense why the write IO was badly affected... > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] And a read-modify-write on each one.. Ouch. Stuff like that is why making sure program output goes to /var or other local file system is usually a good thing. I seem to remember us getting bit by a similar misbehavior in TSM, but I don't know the details because I was busier with GPFS and LTFS/EE than TSM. Though I have to wonder how TSM could be a decades-old product and still have misbehaviors in basic things like failed reads on input prompts... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Sat Jun 6 07:57:06 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 6 Jun 2020 06:57:06 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: <403018.1591421911@turing-police> References: <403018.1591421911@turing-police>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: attf6izb.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From valleru at cbio.mskcc.org Mon Jun 8 18:44:07 2020 From: valleru at cbio.mskcc.org (Lohit Valleru) Date: Mon, 8 Jun 2020 12:44:07 -0500 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files Message-ID: Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Tue Jun 9 01:56:45 2020 From: jjdoherty at yahoo.com (Jim Doherty) Date: Tue, 9 Jun 2020 00:56:45 +0000 (UTC) Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <41260341.922388.1591664205443@mail.yahoo.com> You will need to do this with chown from the? c library functions? (could do this from perl or python).?? If you try to change this from a shell script? you will hit the Linux command? which will have a lot more overhead.???? I had a customer attempt this using the shell and it ended up taking forever due to a brain damaged NIS service :-). ?? Jim? On Monday, June 8, 2020, 2:01:39 PM EDT, Lohit Valleru wrote: #yiv6988452566 body{font-family:Helvetica, Arial;font-size:13px;}Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner?We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards,Lohit_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Jun 9 03:52:16 2020 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 9 Jun 2020 02:52:16 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Change_uidNumber_and_gidNumber_for_bil?= =?utf-8?q?lions=09of=09files?= In-Reply-To: <41260341.922388.1591664205443@mail.yahoo.com> References: <41260341.922388.1591664205443@mail.yahoo.com>, Message-ID: An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jun 9 07:53:00 2020 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 9 Jun 2020 07:53:00 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <41260341.922388.1591664205443@mail.yahoo.com> Message-ID: <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> Hi Lohit (hey Jim & Christof), ? 
Whilst you _could_ trawl your entire filesystem, flip uids and work out how to successfully replace ACL ids without actually pushing ACLs (which could break defined inheritance options somewhere in your file tree if you had not first audited your filesystem) the systems head in me says: "We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve." Here's the problem: to what SSSD or Centrify would resolve I've done this a few times in the past in a previous life.? In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. E.G. if user foo is id 1234, ensure user foo is 1234 in AD when you move your LDAP world over. Windows ldifde utility can import an ldif from openldap to take the config across. Automation or inline munging can be achieved with powershell or python. I presume there is a large technical blocker which is why you are looking at remapping the filesystem? Best, Jez On 09/06/2020 03:52, Christof Schmitt wrote: > If there are ACLs, then you also need to update all ACLs? > (gpfs_getacl(), update uids and gids in all entries, gpfs_putacl()), > in addition to the chown() call. > ? > Regards, > ? > Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ > christof.schmitt at us.ibm.com? ||? +1-520-799-2469??? (T/L: 321-2469) > ? > ? > > ----- Original message ----- > From: Jim Doherty > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Change uidNumber and > gidNumber for billions of files > Date: Mon, Jun 8, 2020 5:57 PM > ? > ? > You will need to do this with chown from the? c library functions? > (could do this from perl or python).?? If you try to change this > from a shell script? you will hit the Linux command? which will > have a lot more overhead.???? I had a customer attempt this using > the shell and it ended up taking forever due to a brain damaged > NIS service :-). ?? > ? > Jim? > ? > On Monday, June 8, 2020, 2:01:39 PM EDT, Lohit Valleru > wrote: > ? > ? > Hello Everyone, > ? > We are planning to migrate from LDAP to AD, and one of the best > solution was to change the uidNumber and gidNumber to what SSSD or > Centrify would resolve. > ? > May I know, if anyone has come across a tool/tools that can change > the uidNumbers and gidNumbers of billions of files efficiently and > in a reliable manner? > We could spend some time to write a custom script, but wanted to > know if a tool already exists. > ? > Please do let me know, if any one else has come across a similar > situation, and the steps/tools used to resolve the same. > ? > Regards, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* VP Research and Development | Pixit Media e: jtucker at pixitmedia.com Visit www.pixitmedia.com -- ? This email is confidential in that it is? intended for the exclusive attention of?the addressee(s) indicated. 
If you are?not the intended recipient, this email?should not be read or disclosed to?any other person. Please notify the?sender immediately and delete this? email from your computer system.?Any opinions expressed are not?necessarily those of the company?from which this email was sent and,?whilst to the best of our knowledge no?viruses or defects exist, no?responsibility can be accepted for any?loss or damage arising from its?receipt or subsequent use of this?email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jun 9 09:51:03 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Jun 2020 08:51:03 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> References: <41260341.922388.1591664205443@mail.yahoo.com> <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> Message-ID: > I presume there is a large technical blocker which is why you are looking at remapping the filesystem? Like anytime there is a corporate AD with mandated attributes? ? Though isn?t there an AD thing now for doing schema view type things now which allow you to inherit certain attributes and overwrite others? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Jun 9 10:03:44 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 09 Jun 2020 10:03:44 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: First talk: https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talk-what-is-new-in-spectrum-scale-5-0-5/ What is new in Spectrum Scale 5.0.5? 18th June 2020. No registration required, just click the Webex link in the page above. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 3 June 2020 at 20:11 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Jun 9 12:20:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 12:20:45 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> On 08/06/2020 18:44, Lohit Valleru wrote: > Hello Everyone, > > We are planning to migrate from LDAP to AD, and one of the best solution > was to change the uidNumber and gidNumber to what SSSD or Centrify would > resolve. > > May I know, if anyone has come across a tool/tools that can change the > uidNumbers and gidNumbers of billions of files efficiently and in a > reliable manner? Not to my knowledge. > We could spend some time to write a custom script, but wanted to know if > a tool already exists. > If you can be sure that all files under a specific directory belong to a specific user and you have no ACL's then a whole bunch of "chown -R" would be reasonable. That is you have a lot of user home directories for example. What I do in these scenarios is use a small sqlite database, say in this scenario which has the directory that I want to chown on, the target UID and GID and a status field. Initially I set the status field to -1 which indicates they have not been processed. The script sets the status field to -2 when it starts processing an entry and on completion sets the status field to the exit code of the command you are running. This way when the script is finished you can see any directory hierarchies that had a problem and if it dies early you can see where it got up to (that -2). You can also do things like set all none zero status codes back to -1 and run again with a simple SQL update on the database from the sqlite CLI. If you don't need to modify ACL's but have mixed ownership under directory hierarchies then a script is reasonable but not a shell script. The overhead of execing chown billions of times on individual files will be astronomical. You need something like Perl or Python and make use of the builtin chown facilities of the language to avoid all those exec's. That said I suspect you will see a significant speed up from using C. If you have ACL's to contend with then I would definitely spend some time and write some C code using the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmpgetacl and mmputacl in any script is horrendous and you will have billions of exec's of each command. As I understand it GPFS stores each ACL once and each file then points to the ACL. Theoretically it would be possible to just modify the stored ACL's for a very speedy update of all the ACL's on the files/directories. However I would imagine you need to engage IBM and bend over while they empty your wallet for that option :-) The biggest issue to take care of IMHO is do any of the input UID/GID numbers exist in the output set??? If so life just got a lot harder as you don't get a second chance to run the script/program if there is a problem. In this case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch and update your LDAP in to match. Finally be aware that if you are using TSM for backup you will probably need to back every file up again after the change of ownership as far as I am aware. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From ulmer at ulmer.org Tue Jun 9 14:07:32 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 9 Jun 2020 09:07:32 -0400 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> Message-ID: <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> Jonathan brings up a good point that you?ll only get one shot at this ? if you?re using the file system as your record of who owns what. You might want to use the policy engine to record the existing file names and ownership (and then provide updates using the same policy engine for the things that changed after the last time you ran it). At that point, you?ve got the list of who should own what from before you started. You could even do some things to see how complex your problem is, like "how many directories have files owned by more than one UID?? With respect to that, it is surprising how easy the sqlite C API is to use (though I would still recommend Perl or Python), and equally surprising how *bad* the JOIN performance is. If you go with sqlite, denormalize *everything* as it?s collected. If that is too dirty for you, then just use MariaDB or something else. -- Stephen > On Jun 9, 2020, at 7:20 AM, Jonathan Buzzard wrote: > > On 08/06/2020 18:44, Lohit Valleru wrote: >> Hello Everyone, >> We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. >> May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? > > Not to my knowledge. > >> We could spend some time to write a custom script, but wanted to know if a tool already exists. > > If you can be sure that all files under a specific directory belong to a specific user and you have no ACL's then a whole bunch of "chown -R" would be reasonable. That is you have a lot of user home directories for example. > > What I do in these scenarios is use a small sqlite database, say in this scenario which has the directory that I want to chown on, the target UID and GID and a status field. Initially I set the status field to -1 which indicates they have not been processed. The script sets the status field to -2 when it starts processing an entry and on completion sets the status field to the exit code of the command you are running. This way when the script is finished you can see any directory hierarchies that had a problem and if it dies early you can see where it got up to (that -2). > > You can also do things like set all none zero status codes back to -1 and run again with a simple SQL update on the database from the sqlite CLI. > > If you don't need to modify ACL's but have mixed ownership under directory hierarchies then a script is reasonable but not a shell script. The overhead of execing chown billions of times on individual files will be astronomical. You need something like Perl or Python and make use of the builtin chown facilities of the language to avoid all those exec's. That said I suspect you will see a significant speed up from using C. > > If you have ACL's to contend with then I would definitely spend some time and write some C code using the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmpgetacl and mmputacl in any script is horrendous and you will have billions of exec's of each command. 
> > As I understand it GPFS stores each ACL once and each file then points to the ACL. Theoretically it would be possible to just modify the stored ACL's for a very speedy update of all the ACL's on the files/directories. However I would imagine you need to engage IBM and bend over while they empty your wallet for that option :-) > > The biggest issue to take care of IMHO is do any of the input UID/GID numbers exist in the output set??? If so life just got a lot harder as you don't get a second chance to run the script/program if there is a problem. > > In this case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch and update your LDAP in to match. > > Finally be aware that if you are using TSM for backup you will probably need to back every file up again after the change of ownership as far as I am aware. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jun 9 14:57:08 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 14:57:08 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> Message-ID: <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> On 09/06/2020 14:07, Stephen Ulmer wrote: > Jonathan brings up a good point that you?ll only get one shot at this ? > if you?re using the file system as your record of who owns what. Not strictly true if my existing UID's are in the range 10000-19999 and my target UID's are in the range 50000-99999 for example then I get an infinite number of shots at it. It is only if the target and source ranges have any overlap that there is a problem and that should be easy to work out in advance. If it where me and there was overlap between input and output states I would go via an intermediate state where there is no overlap. Linux has had 32bit UID's since a very long time now (we are talking kernel versions <1.0 from memory) so none overlapping mappings are perfectly possible to arrange. > With respect to that, it is surprising how easy the sqlite C API is to > use (though I would still recommend Perl or Python), and equally > surprising how *bad* the JOIN performance is. If you go with sqlite, > denormalize *everything* as it?s collected. If that is too dirty for > you, then just use MariaDB or something else. I actually thinking on it more thought a generic C random UID/GID to UID/GID mapping program is a really simple piece of code and should be nearly as fast as chown -R. It will be very slightly slower as you have to look the mapping up for each file. Read the mappings in from a CSV file into memory and just use nftw/lchown calls to walk the file system and change the UID/GID as necessary. If you are willing to sacrifice some error checking on the input mapping file (not unreasonable to assume it is good) and have some hard coded site settings (avoiding processing command line arguments) then 200 lines of C tops should do it. 
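The skeleton really is not much more than the following untested sketch. The error checking and the site settings are stripped out, the CSV format ("old,new" pairs, one per line) and the NMAP limit are just assumptions for illustration, and all of the function names are made up:

#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define NMAP 4096                       /* site setting: max mapping entries */

struct idmap { unsigned int old_id, new_id; };
static struct idmap uid_map[NMAP], gid_map[NMAP];
static size_t n_uid, n_gid;

/* look an id up in a mapping table, -1 means "leave it alone" */
static unsigned int remap(unsigned int id, const struct idmap *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].old_id == id)
            return map[i].new_id;
    return (unsigned int)-1;
}

/* read "old,new" pairs, one per line, from the CSV file */
static void load_map(const char *file, struct idmap *map, size_t *n)
{
    FILE *f = fopen(file, "r");
    if (f == NULL) { perror(file); exit(1); }
    while (*n < NMAP &&
           fscanf(f, "%u,%u", &map[*n].old_id, &map[*n].new_id) == 2)
        (*n)++;
    fclose(f);
}

static int fixup(const char *path, const struct stat *sb,
                 int type, struct FTW *ftwbuf)
{
    uid_t nu = (uid_t)remap(sb->st_uid, uid_map, n_uid);
    gid_t ng = (gid_t)remap(sb->st_gid, gid_map, n_gid);

    if (nu == (uid_t)-1 && ng == (gid_t)-1)
        return 0;                       /* nothing to change for this file */

    /* lchown so a symbolic link itself is remapped, not its target */
    if (lchown(path, nu, ng) != 0)
        perror(path);
    return 0;                           /* keep walking whatever happens */
}

int main(int argc, char *argv[])
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s uidmap.csv gidmap.csv directory\n", argv[0]);
        return 1;
    }
    load_map(argv[1], uid_map, &n_uid);
    load_map(argv[2], gid_map, &n_gid);
    /* FTW_PHYS: report symlinks themselves, never follow them;
       FTW_MOUNT: stay inside the one file system */
    return nftw(argv[3], fixup, 64, FTW_PHYS | FTW_MOUNT);
}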
Depending on how big your input UID/GID ranges are you could even use array indexing for the mapping. For example on our system the UID's start at just over 5000 and end just below 6000 with quite a lot of holes. Just allocate an array of 6000 int's which is only ~24KB and off you go with something like new_uid = uid_mapping[uid]; Nice super speedy lookup of mappings. If you need to manipulate ACL's then C is the only way to go anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Tue Jun 9 23:40:33 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 23:40:33 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> Message-ID: <21a0686f-f080-e81c-0e3e-6974116ba141@strath.ac.uk> On 09/06/2020 14:57, Jonathan Buzzard wrote: [SNIP] > > I actually thinking on it more thought a generic C random UID/GID to > UID/GID mapping program is a really simple piece of code and should be > nearly as fast as chown -R. It will be very slightly slower as you have > to look the mapping up for each file. Read the mappings in from a CSV > file into memory and just use nftw/lchown calls to walk the file system > and change the UID/GID as necessary. > Because I was curious I thought I would have a go this evening coding something up in C. It's standing at 213 lines of code put there is some extra fluff and some error checking and a large copyright comment. Updating ACL's would increase the size too. It would however be relatively simple I think. The public GPFS API documentation on ACL's is incomplete so some guess work and testing would be required. It's stupidly fast on my laptop changing the ownership of the latest version of gcc untarred. However there is only one user in the map file and it's an SSD. Obviously if you have billions of files it is going to take longer :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.knister at gmail.com Wed Jun 10 02:15:55 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Tue, 9 Jun 2020 21:15:55 -0400 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: Lohit, I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here. I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :) >From what I recall, I used uthash and the gpfs API's to store in-memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs API's and could lookup the given inode in the in-memory hash to view its ownership details. Both the inode traversal and directory walk were parallelized/threaded. They way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. 
All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My work around was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right) and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flags to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speed up. I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions. One of the gotchas to be aware of, is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough. The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. That code is here ( https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary. If you're using NFS4 ACLs... I don't have an easy answer for you :) We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway. -Aaron On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru wrote: > Hello Everyone, > > We are planning to migrate from LDAP to AD, and one of the best solution > was to change the uidNumber and gidNumber to what SSSD or Centrify would > resolve. > > May I know, if anyone has come across a tool/tools that can change the > uidNumbers and gidNumbers of billions of files efficiently and in a > reliable manner? > We could spend some time to write a custom script, but wanted to know if a > tool already exists. > > Please do let me know, if any one else has come across a similar > situation, and the steps/tools used to resolve the same. 
> > Regards, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Wed Jun 10 08:25:08 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 10 Jun 2020 08:25:08 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital In-Reply-To: References: Message-ID: So someone pointed out we?re using webex events for this, in theory there is a ?join in browser? option, if you don?t have the webex client already installed. However that also doesn?t appear to work in Chrome/Ubuntu20.04 ? so you might want to check your browser/plugin works *before* next week. You can use https://www.webex.com/test-meeting.html to do a test of Webex. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 June 2020 at 10:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Introducing SSUG::Digital First talk: https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talk-what-is-new-in-spectrum-scale-5-0-5/ What is new in Spectrum Scale 5.0.5? 18th June 2020. No registration required, just click the Webex link in the page above. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 3 June 2020 at 20:11 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jun 10 08:33:03 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Jun 2020 07:33:03 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: Quota ? I thought there was a work around for this. I think it went along the lines of. Set the soft quota to what you want. Set the hard quota 150% more. Set the grace period to 1 second. 
I think the issue is that when you are over soft quota, each operation has to queisce each time until you hit hard/grace period. Whereas once you hit grace, it no longer does this. I was just looking for the slide deck about this, but can?t find it at the moment! Tomer spoke about it at one point. Simon From: on behalf of "aaron.knister at gmail.com" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 10 June 2020 at 02:16 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files Lohit, I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here. I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :) From what I recall, I used uthash and the gpfs API's to store in-memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs API's and could lookup the given inode in the in-memory hash to view its ownership details. Both the inode traversal and directory walk were parallelized/threaded. They way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My work around was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right) and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flags to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speed up. I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions. One of the gotchas to be aware of, is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough. The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. 
That code is here ( https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary. If you're using NFS4 ACLs... I don't have an easy answer for you :) We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway. -Aaron On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru > wrote: Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 10 12:33:09 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 12:33:09 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> On 10/06/2020 02:15, Aaron Knister wrote: > Lohit, > > I did this while working @ NASA. I had two tools I used, one > affectionately known as "luke file walker" (to modify traditional unix > permissions) and the other known as the "milleniumfacl" (to modify posix > ACLs). Stupid jokes aside, there were some real technical challenges here. > > I don't know if anyone from the NCCS team at NASA is on the list, but if > they are perhaps they'll jump in if they're willing to share the code :) > > From what I recall, I used uthash and the gpfs API's to store in-memory > a hash of inodes and their uid/gid information. I then walked the > filesystem using the gpfs API's and could lookup the given inode in the > in-memory hash to view its ownership details. Both the inode traversal > and directory walk were parallelized/threaded. They way I actually > executed the chown was particularly security-minded. There is a race > condition that exists if you chown /path/to/file. All it takes is either > a malicious user or someone monkeying around with the filesystem while > it's live to accidentally chown the wrong file if a symbolic link ends > up in the file path. Well I would expect this needs to be done with no user access to the system. Or at the very least no user access for the bits you are currently modifying. Otherwise you are going to end up in a complete mess. > My work around was to use openat() and fchmod (I > think that was it, I played with this quite a bit to get it right) and > for every path to be chown'd I would walk the hierarchy, opening each > component with the O_NOFOLLOW flags to be sure I didn't accidentally > stumble across a symlink in the way. 
Or you could just use lchown so you change the ownership of the symbolic link rather than the file it is pointing to. You need to change the ownership of the symbolic link not the file it is linking to, that will be picked up elsewhere in the scan. If you don't change the ownership of the symbolic link you are going to be left with a bunch of links owned by none existent users. No race condition exists if you are doing it properly in the first place :-) I concluded that the standard nftw system call was more suited to this than the GPFS inode scan. I could see no way to turn an inode into a path to the file which lchownn, gpfs_getacl and gpfs_putacl all use. I think the problem with the GPFS inode scan is that is is for a backup application. Consequently there are some features it is lacking for more general purpose programs looking for a quick way to traverse the file system. An other example is that the gpfs_iattr_t structure returned from gpfs_stat_inode does not contain any information as to whether the file is a symbolic link like a standard stat call does. > I also implemented caching of open > path component file descriptors since odds are I would be > chowning/chgrp'ing files in the same directory. That bought me some > speed up. > More reasons to use nftw for now, no need to open any files :-) > I opened up RFE's at one point, I believe, for gpfs API calls to do this > type of operation. I would ideally have liked a mechanism to do this > based on inode number rather than path which would help avoid issues of > race conditions. > Well lchown to the rescue, but that does require a path to the file. The biggest problem is the inability to get a path given an inode using the GPFS inode scan which is why I steered away from it. In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the UID/GID of the file, but they are returned in an opaque format, so it's not possible :-( > One of the gotchas to be aware of, is quotas. My wrapper script would > clone quotas from the old uid to the new uid. That's easy enough. > However, keep in mind, if the uid is over their quota your chown > operation will absolutely kill your cluster. Once a user is over their > quota the filesystem seems to want to quiesce all of its accounting > information on every filesystem operation for that user. I would check > for adequate quota headroom for the user in question and abort if there > wasn't enough. Had not thought of that one. Surely the simple solution would be to set the quota's on the mapped UID/GID's after the change has been made. Then the filesystem operation would not be for the user over quota but for the new user? The other alternative is to dump the quotas to file and remove them. Change the UID's and GID's then restore the quotas on the new UID/GID's. As I said earlier surely the end users have no access to the file system while the modifications are being made. If they do all hell is going to break loose IMHO. > > The ACL changes were much more tricky. There's no way, of which I'm > aware, to atomically update ACL entries. You run the risk that you could > clobber a user's ACL update if it occurs in the milliseconds between you > reading the ACL and updating it as part of the UID/GID update. > Thankfully we were using Posix ACLs which were easier for me to deal > with programmatically. I still had the security concern over symbolic > links appearing in paths to have their ACLs updated either maliciously > or organically. 
I was able to deal with that by modifying libacl to > implement ACL calls that used variants of xattr calls that took file > descriptors as arguments and allowed me to throw nofollow flags. That > code is here ( > https://github.com/aaronknister/acl/commits/nofollow > ). > I couldn't take advantage of the GPFS API's here to meet my > requirements, so I just walked the filesystem tree in parallel if I > recall correctly, retrieved every ACL and updated if necessary. > > If you're using NFS4 ACLs... I don't have an easy answer for you :) You call gpfs_getacl, walk the array of ACL's returned changing any UID/GID's as required and then call gpfs_putacl. You can modify both Posix and NFSv4 ACL's with this call. Given they only take a path to the file another reason to use nftw rather than GPFS inode scan. As I understand even if your file system is set to an ACL type of "all", any individual file/directory can only have either Posix *or* NSFv4 ACLS (ignoring the fact you can set your filesystem ACL's type to the undocumented Samba), so can all be handled automatically. Note if you are using nftw to walk the file system then you get a standard system stat structure for every file/directory and you could just skip symbolic links. I don't think you can set an ACL on a symbolic link anyway. You certainly can't set standard permissions on them. It would be sensible to wrap the main loop in gpfs_lib_init/gpfs_lib_term in this scenario. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Wed Jun 10 12:58:07 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 12:58:07 +0100 Subject: [gpfsug-discuss] Infiniband/Ethernet gateway Message-ID: <7e656f24-e00b-a95f-2f6e-a8223310e708@strath.ac.uk> We have a mixture of 10Gb Ethernet and Infiniband connected (using IPoIB) nodes on our compute cluster using a DSS-G for storage. Each SR650 has a bonded pair of 40Gb Ethernet connections and a 40Gb Infiniband connection. Performance and stability are *way* better than the old Lustre system. Now for this to work the Ethernet connected nodes have to be able to talk to the Infiniband connected ones so I have a server acting as a gateway. This has been running fine for a couple of years now. However it occurs to me now that instead of having a dedicated server performing these duties it would make more sense to use the SR650's of the DSS-G. It would be one less thing for me to look after :-) Can anyone think of a reason not to do this? It also occurs to me that one could do some sort of VRRP style failover to remove the single point of failure that is currently the gateway machine. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valleru at cbio.mskcc.org Wed Jun 10 16:31:20 2020 From: valleru at cbio.mskcc.org (Lohit Valleru) Date: Wed, 10 Jun 2020 10:31:20 -0500 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: Thank you everyone for the Inputs. The answers to some of the questions are as follows: > From Jez: I've done this a few times in the past in a previous life.? In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. 
- Yes we had considered/attempted this, and it does work pretty good. It is actually much faster than using SSSD auto id mapping. But the main issue with this approach was to automate entry of uidNumbers and gidNumbers for all the enterprise users/groups across the agency. Both the approaches have there pros and cons. For now, we wanted to see the amount of effort that would be needed to change the uidNumbers and gidNumbers on the filesystem side, in case the other option of entering existing uidNumber/gidNumber data on AD does not work out. > Does the filesystem have ACLs? And which ACLs? ?Since we have CES servers that export the filesystems on SMB protocol -> The filesystems use NFS4 ACL mode. As far as we know - We know of only one fileset that is extensively using NFS4 ACLs. > Can we take a downtime to do this change? For the current GPFS storage clusters which are are production - we are thinking of taking a downtime to do the same per cluster. For new clusters/storage clusters, we are thinking of changing to AD before any new data is written to the storage.? > Do the uidNumbers/gidNumbers conflict? No. The current uidNumber and gidNumber are in 1000 - 8000 range, while the new uidNumbers,gidNumbers are above 1000000.? I was thinking of taking a backup of the current state of the filesystem, with respect to posix permissions/owner/group and the respective quotas. Disable quotas with a downtime before making changes. I might mostly start small with a single lab, and only change files without ACLs. May I know if anyone has a method/tool to find out which files/dirs have NFS4 ACLs set? As far as we know - it is just one fileset/lab, but it would be good to confirm if we have them set across any other files/dirs in the filesystem. The usual methods do not seem to work. ? Jonathan/Aaron, Thank you for the inputs regarding the scripts/APIs/symlinks and ACLs. I will try to see what I can do given the current state. I too wish GPFS API could be better at managing this kind of scenarios ?but I understand that this kind of huge changes might be pretty rare. Thank you, Lohit On June 10, 2020 at 6:33:45 AM, Jonathan Buzzard (jonathan.buzzard at strath.ac.uk) wrote: On 10/06/2020 02:15, Aaron Knister wrote: > Lohit, > > I did this while working @ NASA. I had two tools I used, one > affectionately known as "luke file walker" (to modify traditional unix > permissions) and the other known as the "milleniumfacl" (to modify posix > ACLs). Stupid jokes aside, there were some real technical challenges here. > > I don't know if anyone from the NCCS team at NASA is on the list, but if > they are perhaps they'll jump in if they're willing to share the code :) > > From what I recall, I used uthash and the gpfs API's to store in-memory > a hash of inodes and their uid/gid information. I then walked the > filesystem using the gpfs API's and could lookup the given inode in the > in-memory hash to view its ownership details. Both the inode traversal > and directory walk were parallelized/threaded. They way I actually > executed the chown was particularly security-minded. There is a race > condition that exists if you chown /path/to/file. All it takes is either > a malicious user or someone monkeying around with the filesystem while > it's live to accidentally chown the wrong file if a symbolic link ends > up in the file path. Well I would expect this needs to be done with no user access to the system. Or at the very least no user access for the bits you are currently modifying. 
Otherwise you are going to end up in a complete mess. > My work around was to use openat() and fchmod (I > think that was it, I played with this quite a bit to get it right) and > for every path to be chown'd I would walk the hierarchy, opening each > component with the O_NOFOLLOW flags to be sure I didn't accidentally > stumble across a symlink in the way. Or you could just use lchown so you change the ownership of the symbolic link rather than the file it is pointing to. You need to change the ownership of the symbolic link not the file it is linking to, that will be picked up elsewhere in the scan. If you don't change the ownership of the symbolic link you are going to be left with a bunch of links owned by none existent users. No race condition exists if you are doing it properly in the first place :-) I concluded that the standard nftw system call was more suited to this than the GPFS inode scan. I could see no way to turn an inode into a path to the file which lchownn, gpfs_getacl and gpfs_putacl all use. I think the problem with the GPFS inode scan is that is is for a backup application. Consequently there are some features it is lacking for more general purpose programs looking for a quick way to traverse the file system. An other example is that the gpfs_iattr_t structure returned from gpfs_stat_inode does not contain any information as to whether the file is a symbolic link like a standard stat call does. > I also implemented caching of open > path component file descriptors since odds are I would be > chowning/chgrp'ing files in the same directory. That bought me some > speed up. > More reasons to use nftw for now, no need to open any files :-) > I opened up RFE's at one point, I believe, for gpfs API calls to do this > type of operation. I would ideally have liked a mechanism to do this > based on inode number rather than path which would help avoid issues of > race conditions. > Well lchown to the rescue, but that does require a path to the file. The biggest problem is the inability to get a path given an inode using the GPFS inode scan which is why I steered away from it. In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the UID/GID of the file, but they are returned in an opaque format, so it's not possible :-( > One of the gotchas to be aware of, is quotas. My wrapper script would > clone quotas from the old uid to the new uid. That's easy enough. > However, keep in mind, if the uid is over their quota your chown > operation will absolutely kill your cluster. Once a user is over their > quota the filesystem seems to want to quiesce all of its accounting > information on every filesystem operation for that user. I would check > for adequate quota headroom for the user in question and abort if there > wasn't enough. Had not thought of that one. Surely the simple solution would be to set the quota's on the mapped UID/GID's after the change has been made. Then the filesystem operation would not be for the user over quota but for the new user? The other alternative is to dump the quotas to file and remove them. Change the UID's and GID's then restore the quotas on the new UID/GID's. As I said earlier surely the end users have no access to the file system while the modifications are being made. If they do all hell is going to break loose IMHO. > > The ACL changes were much more tricky. There's no way, of which I'm > aware, to atomically update ACL entries. 
You run the risk that you could > clobber a user's ACL update if it occurs in the milliseconds between you > reading the ACL and updating it as part of the UID/GID update. > Thankfully we were using Posix ACLs which were easier for me to deal > with programmatically. I still had the security concern over symbolic > links appearing in paths to have their ACLs updated either maliciously > or organically. I was able to deal with that by modifying libacl to > implement ACL calls that used variants of xattr calls that took file > descriptors as arguments and allowed me to throw nofollow flags. That > code is here ( > https://github.com/aaronknister/acl/commits/nofollow > ). > I couldn't take advantage of the GPFS API's here to meet my > requirements, so I just walked the filesystem tree in parallel if I > recall correctly, retrieved every ACL and updated if necessary. > > If you're using NFS4 ACLs... I don't have an easy answer for you :) You call gpfs_getacl, walk the array of ACL's returned changing any UID/GID's as required and then call gpfs_putacl. You can modify both Posix and NFSv4 ACL's with this call. Given they only take a path to the file another reason to use nftw rather than GPFS inode scan. As I understand even if your file system is set to an ACL type of "all", any individual file/directory can only have either Posix *or* NSFv4 ACLS (ignoring the fact you can set your filesystem ACL's type to the undocumented Samba), so can all be handled automatically. Note if you are using nftw to walk the file system then you get a standard system stat structure for every file/directory and you could just skip symbolic links. I don't think you can set an ACL on a symbolic link anyway. You certainly can't set standard permissions on them. It would be sensible to wrap the main loop in gpfs_lib_init/gpfs_lib_term in this scenario. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 10 23:29:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 23:29:53 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: On 10/06/2020 16:31, Lohit Valleru wrote: [SNIP] > I might mostly start small with a single lab, and only change files > without ACLs. May I know if anyone has a method/tool to find out which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > fileset/lab, but it would be good to confirm if we have them set > across any other files/dirs in the filesystem. The usual methods do > not seem to work. Use mmgetacl a file at a time and try and do something with the output? Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, and have been that way for over a decade. Last time I raised this with IBM I was told that was by design... If they are CES then look at it client side from a Windows node? The alternative is to write something in C that calls gpfs_getacl. However it was an evening to get a basic UID remap code working in C. It would not take much more effort to make it handle ACL's. 
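In outline it is just the following, a sketch from memory, so check the structure and field names against gpfs.h before believing it. Only the POSIX v1 entries are shown (the NFSv4 case is the same loop over the ace_v4 array), the buffer size is arbitrary, and remap_id() stands in for whatever lookup the chown pass already uses:

#include <gpfs.h>

/* hypothetical: same old->new table lookup as the lchown pass */
extern unsigned int remap_id(unsigned int old_id);

static char buf[64 * 1024];                 /* big enough for any sane ACL */

static int remap_acl(const char *path)
{
    gpfs_acl_t *acl = (gpfs_acl_t *)buf;
    int changed = 0;

    acl->acl_len = sizeof(buf);             /* tell GPFS how big the buffer is */
    if (gpfs_getacl(path, GPFS_GETACL_STRUCT, acl) != 0)
        return -1;

    if (acl->acl_version == GPFS_ACL_VERSION_POSIX) {
        for (unsigned int i = 0; i < acl->acl_nace; i++) {
            gpfs_ace_v1_t *ace = &acl->ace_v1[i];
            /* only the named user/group entries carry a UID/GID */
            if (ace->ace_type == GPFS_ACL_USER ||
                ace->ace_type == GPFS_ACL_GROUP) {
                unsigned int new_id = remap_id(ace->ace_who);
                if (new_id != (unsigned int)-1) {
                    ace->ace_who = new_id;
                    changed = 1;
                }
            }
        }
    }
    /* else GPFS_ACL_VERSION_NFS4: same idea walking acl->ace_v4[i].aceWho */

    if (!changed)
        return 0;
    return gpfs_putacl(path, GPFS_PUTACL_STRUCT, acl);
}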
As such I would work on the premise that there are ACL's and handle it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ckerner at illinois.edu Wed Jun 10 23:40:41 2020 From: ckerner at illinois.edu (Kerner, Chad A) Date: Wed, 10 Jun 2020 22:40:41 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> You can do a policy scan though and get a list of files that have ACLs applied to them. Then you would not have to check every file with a shell utility or C, just process that list. Likewise, you can get the uid/gid as well and process that list with the new mapping(split it into multiple lists, processing multiple threads on multiple machines). While it is by no means the prettiest or possibly best way to handle the POSIX ACLs, I had whipped up a python api for it: https://github.com/ckerner/ssacl . It only does POSIX though. We use it in conjunction with acls (https://github.com/ckerner/acls), an ls replacement that shows effective user/group permissions based off of the acl's because most often the user would just look at the POSIX perms and say something is broken, without checking the acl. -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 6/10/20, 5:30 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 10/06/2020 16:31, Lohit Valleru wrote: [SNIP] > I might mostly start small with a single lab, and only change files > without ACLs. May I know if anyone has a method/tool to find out which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > fileset/lab, but it would be good to confirm if we have them set > across any other files/dirs in the filesystem. The usual methods do > not seem to work. Use mmgetacl a file at a time and try and do something with the output? Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, and have been that way for over a decade. Last time I raised this with IBM I was told that was by design... If they are CES then look at it client side from a Windows node? The alternative is to write something in C that calls gpfs_getacl. However it was an evening to get a basic UID remap code working in C. It would not take much more effort to make it handle ACL's. As such I would work on the premise that there are ACL's and handle it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From giovanni.bracco at enea.it Thu Jun 11 08:53:01 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 11 Jun 2020 09:53:01 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Comments and updates in the text: On 05/06/20 19:02, Jan-Frode Myklebust wrote: > fre. 5. jun. 2020 kl. 
15:53 skrev Giovanni Bracco:

>> answer in the text
>>
>> On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>>>
>>> Could maybe be interesting to drop the NSD servers, and let all nodes access the storage via srp ?
>>
>> no we can not: the production clusters fabric is a mix of a QDR based cluster and an OPA based cluster and the NSD nodes provide the service to both.
>
> You could potentially still do SRP from QDR nodes, and via NSD for your omnipath nodes. Going via NSD seems like a bit pointless indirection.

not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share the same data lake in Spectrum Scale/GPFS, so the NSD servers support the flexibility of the setup.

NSD servers make use of an IB SAN fabric (Mellanox FDR switch) where at the moment 3 different generations of DDN storages are connected: 9900/QDR, 7700/FDR and 7990/EDR. The idea was to be able to add some less expensive storage, to be used when performance is not the first priority.

> Maybe turn off readahead, since it can cause performance degradation when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead always reads too much. This might be the cause of the slow read seen? Maybe you'll also overflow it if reading from both NSD-servers at the same time?

I have switched the readahead off and this produced a small (~10%) increase of performance when reading from an NSD server, but no change in the bad behaviour for the GPFS clients.

> Plus.. it's always nice to give a bit more pagepool to the clients than the default.. I would prefer to start with 4 GB.

we'll do also that and we'll let you know!

> Could you show your mmlsconfig? Likely you should set maxMBpS to indicate what kind of throughput a client can do (affects GPFS readahead/writebehind). Would typically also increase workerThreads on your NSD servers.

At this moment this is the output of mmlsconfig

# mmlsconfig
Configuration data for cluster GPFSEXP.portici.enea.it:
-------------------------------------------------------
clusterName GPFSEXP.portici.enea.it
clusterId 13274694257874519577
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.0.4.0
ccrEnabled yes
cipherList AUTHONLY
verbsRdma enable
verbsPorts qib0/1
[cresco-gpfq7,cresco-gpfq8]
verbsPorts qib0/2
[common]
pagepool 4G
adminMode central

File systems in cluster GPFSEXP.portici.enea.it:
------------------------------------------------
/dev/vsd_gexp2
/dev/vsd_gexp3

> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. When you write one GPFS block, less than a half RAID stripe is written, which means you need to read back some data to calculate new parities. I would prefer 4 MB block size, and maybe also change to 8+p+q so that one GPFS block is a multiple of a full 2 MB stripe.
>
> -jf

we have now added another file system based on 2 NSDs on RAID6 8+p+q, keeping the 1MB block size just not to change too many things at the same time, but no substantial change in the very low readout performance, which is still of the order of 50 MB/s while write performance is 1000MB/s.

Any other suggestion is welcomed!
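For reference, the alignment arithmetic behind the block size suggestion quoted above works out as follows. This is only a sketch -- the device and stanza file names are placeholders, and since the block size of an existing file system cannot be changed it would mean creating a new one:

# RAID6 8+p+q with a 256 KB strip -> full data stripe = 8 x 256 KB = 2 MB
# -B 1M : each GPFS block covers only half a stripe -> parity read-modify-write
# -B 2M (or 4M) : every GPFS block maps onto whole stripes
# e.g. (device and stanza names are placeholders only):
mmcrfs vsd_gexp4 -F raid6_8p2q.stanza -B 2M -j cluster
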
Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From luis.bolinches at fi.ibm.com Thu Jun 11 09:01:46 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 11 Jun 2020 08:01:46 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it>, <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it><4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jun 11 09:45:50 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Jun 2020 09:45:50 +0100 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: On 11/06/2020 08:53, Giovanni Bracco wrote: [SNIP] > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > > NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at > the moment 3 different generations of DDN storages are connected, > 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less > expensive storage, to be used when performance is not the first priority. > Ring up Lenovo and get a pricing on some DSS-G storage :-) They can be configured with OPA and Infiniband (though I am not sure if both at the same time) and are only slightly more expensive than the traditional DIY Lego brick approach. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From janfrode at tanso.net Thu Jun 11 11:13:36 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 11 Jun 2020 12:13:36 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco wrote: > > > > > You could potentially still do SRP from QDR nodes, and via NSD for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > Maybe there's something I don't understand, but couldn't you use the NSD-servers to serve to your OPA nodes, and then SRP directly for your 300 QDR-nodes?? 
> At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it: > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it: > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > So, trivial close to default config.. assume the same for the client cluster. I would correct MaxMBpS -- put it at something reasonable, enable verbsRdmaSend=yes and ignorePrefetchLUNCount=yes. > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. > > When you write one GPFS block, less than a half RAID stripe is written, > > which means you need to read back some data to calculate new parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > -jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are 1000MB/s > > Any other suggestion is welcomed! > > Maybe rule out the storage, and check if you get proper throughput from nsdperf? Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your full settings -- so that we see that the benchmark is sane :-) -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Jun 11 15:06:45 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 11 Jun 2020 16:06:45 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: <050593b2-8256-1f84-1a3a-978583103211@enea.it> 256K Giovanni On 11/06/20 10:01, Luis Bolinches wrote: > On that RAID 6 what is the logical RAID block size? 128K, 256K, other? > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > / Salutacions > Luis Bolinches > Consultant IT Specialist > IBM Spectrum Scale development > ESS & client adoption teams > Mobile Phone: +358503112585 > *https://www.youracclaim.com/user/luis-bolinches* > Ab IBM Finland Oy > Laajalahdentie 23 > 00330 Helsinki > Uusimaa - Finland > > *"If you always give you will always have" -- ?Anonymous* > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: Jan-Frode Myklebust , gpfsug main discussion > list > Cc: Agostino Funel > Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance > in simple spectrum scale/gpfs cluster with a storage-server SAN > Date: Thu, Jun 11, 2020 10:53 > Comments and updates in the text: > > On 05/06/20 19:02, Jan-Frode Myklebust wrote: > > fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco > > >: > > > > ? ? answer in the text > > > > ? ? On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > ? ? ?> > > ? ? 
?> Could maybe be interesting to drop the NSD servers, and > let all > > ? ? nodes > > ? ? ?> access the storage via srp ? > > > > ? ? no we can not: the production clusters fabric is a mix of a > QDR based > > ? ? cluster and a OPA based cluster and NSD nodes provide the > service to > > ? ? both. > > > > > > You could potentially still do SRP from QDR nodes, and via NSD > for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > > NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at > the moment 3 different generations of DDN storages are connected, > 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less > expensive storage, to be used when performance is not the first > priority. > > > > > > > > > ? ? ?> > > ? ? ?> Maybe turn off readahead, since it can cause performance > degradation > > ? ? ?> when GPFS reads 1 MB blocks scattered on the NSDs, so that > > ? ? read-ahead > > ? ? ?> always reads too much. This might be the cause of the slow > read > > ? ? seen ? > > ? ? ?> maybe you?ll also overflow it if reading from both > NSD-servers at > > ? ? the > > ? ? ?> same time? > > > > ? ? I have switched the readahead off and this produced a small > (~10%) > > ? ? increase of performances when reading from a NSD server, but > no change > > ? ? in the bad behaviour for the GPFS clients > > > > > > ? ? ?> > > ? ? ?> > > ? ? ?> Plus.. it?s always nice to give a bit more pagepool to hhe > > ? ? clients than > > ? ? ?> the default.. I would prefer to start with 4 GB. > > > > ? ? we'll do also that and we'll let you know! > > > > > > Could you show your mmlsconfig? Likely you should set maxMBpS to > > indicate what kind of throughput a client can do (affects GPFS > > readahead/writebehind).? Would typically also increase > workerThreads on > > your NSD servers. > > At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it: > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it: > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip > size. > > When you write one GPFS block, less than a half RAID stripe is > written, > > which means you ?need to read back some data to calculate new > parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so > that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > ?? ?-jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are > 1000MB/s > > Any other suggestion is welcomed! 
> > Giovanni > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From luis.bolinches at fi.ibm.com Thu Jun 11 15:11:14 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 11 Jun 2020 14:11:14 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <050593b2-8256-1f84-1a3a-978583103211@enea.it> Message-ID: 8 data * 256K does not align to your 1MB Raid 6 is already not the best option for writes. I would look into use multiples of 2MB block sizes. -- Cheers > On 11. Jun 2020, at 17.07, Giovanni Bracco wrote: > > ?256K > > Giovanni > >> On 11/06/20 10:01, Luis Bolinches wrote: >> On that RAID 6 what is the logical RAID block size? 128K, 256K, other? >> -- >> Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations >> / Salutacions >> Luis Bolinches >> Consultant IT Specialist >> IBM Spectrum Scale development >> ESS & client adoption teams >> Mobile Phone: +358503112585 >> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= >> Ab IBM Finland Oy >> Laajalahdentie 23 >> 00330 Helsinki >> Uusimaa - Finland >> >> *"If you always give you will always have" -- Anonymous* >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: Jan-Frode Myklebust , gpfsug main discussion >> list >> Cc: Agostino Funel >> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance >> in simple spectrum scale/gpfs cluster with a storage-server SAN >> Date: Thu, Jun 11, 2020 10:53 >> Comments and updates in the text: >> >>> On 05/06/20 19:02, Jan-Frode Myklebust wrote: >>> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco >>> >: >>> >>> answer in the text >>> >>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote: >>> > >>> > Could maybe be interesting to drop the NSD servers, and >> let all >>> nodes >>> > access the storage via srp ? >>> >>> no we can not: the production clusters fabric is a mix of a >> QDR based >>> cluster and a OPA based cluster and NSD nodes provide the >> service to >>> both. >>> >>> >>> You could potentially still do SRP from QDR nodes, and via NSD >> for your >>> omnipath nodes. Going via NSD seems like a bit pointless indirection. >> >> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share >> the same data lake in Spectrum Scale/GPFS so the NSD servers support the >> flexibility of the setup. >> >> NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at >> the moment 3 different generations of DDN storages are connected, >> 9900/QDR 7700/FDR and 7990/EDR. 
The idea was to be able to add some less >> expensive storage, to be used when performance is not the first >> priority. >> >>> >>> >>> >>> > >>> > Maybe turn off readahead, since it can cause performance >> degradation >>> > when GPFS reads 1 MB blocks scattered on the NSDs, so that >>> read-ahead >>> > always reads too much. This might be the cause of the slow >> read >>> seen ? >>> > maybe you?ll also overflow it if reading from both >> NSD-servers at >>> the >>> > same time? >>> >>> I have switched the readahead off and this produced a small >> (~10%) >>> increase of performances when reading from a NSD server, but >> no change >>> in the bad behaviour for the GPFS clients >>> >>> >>> > >>> > >>> > Plus.. it?s always nice to give a bit more pagepool to hhe >>> clients than >>> > the default.. I would prefer to start with 4 GB. >>> >>> we'll do also that and we'll let you know! >>> >>> >>> Could you show your mmlsconfig? Likely you should set maxMBpS to >>> indicate what kind of throughput a client can do (affects GPFS >>> readahead/writebehind). Would typically also increase >> workerThreads on >>> your NSD servers. >> >> At this moment this is the output of mmlsconfig >> >> # mmlsconfig >> Configuration data for cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------------- >> clusterName GPFSEXP.portici.enea.it >> clusterId 13274694257874519577 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 5.0.4.0 >> ccrEnabled yes >> cipherList AUTHONLY >> verbsRdma enable >> verbsPorts qib0/1 >> [cresco-gpfq7,cresco-gpfq8] >> verbsPorts qib0/2 >> [common] >> pagepool 4G >> adminMode central >> >> File systems in cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------ >> /dev/vsd_gexp2 >> /dev/vsd_gexp3 >> >> >>> >>> >>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip >> size. >>> When you write one GPFS block, less than a half RAID stripe is >> written, >>> which means you need to read back some data to calculate new >> parities. >>> I would prefer 4 MB block size, and maybe also change to 8+p+q so >> that >>> one GPFS is a multiple of a full 2 MB stripe. >>> >>> >>> -jf >> >> we have now added another file system based on 2 NSD on RAID6 8+p+q, >> keeping the 1MB block size just not to change too many things at the >> same time, but no substantial change in very low readout performances, >> that are still of the order of 50 MB/s while write performance are >> 1000MB/s >> >> Any other suggestion is welcomed! >> >> Giovanni >> >> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=q-8zfr3t0TGWOicysbq0ezzL2xpk3dzDg2m1plcsWm0&e= >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=CZv204_tsb3M3xIwxRyIyvTjptoQL-gD-VhzUkMRyrc&e= >> >> >> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=CZv204_tsb3M3xIwxRyIyvTjptoQL-gD-VhzUkMRyrc&e= >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=q-8zfr3t0TGWOicysbq0ezzL2xpk3dzDg2m1plcsWm0&e= > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Jun 11 16:10:06 2020 From: david_johnson at brown.edu (David Johnson) Date: Thu, 11 Jun 2020 11:10:06 -0400 Subject: [gpfsug-discuss] mmremotecluster access from SS 5.0.x to 4.2.3-x refuses id_rsa.pub Message-ID: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> I'm trying to access an old GPFS filesystem from a new cluster. It is good up to the point of adding the SSL keys of the old cluster on the new one. I get from mmremotecluster add command: File ...._id_rsa.pub does not contain a nist sp 800-131a compliance key Is there any way to override this? The old cluster will go away before the end of the summer. From jamervi at sandia.gov Thu Jun 11 16:13:32 2020 From: jamervi at sandia.gov (Mervini, Joseph A) Date: Thu, 11 Jun 2020 15:13:32 +0000 Subject: [gpfsug-discuss] [EXTERNAL] mmremotecluster access from SS 5.0.x to 4.2.3-x refuses id_rsa.pub In-Reply-To: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> References: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> Message-ID: mmchconfig nistCompliance=off on the newer system should work. ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov ?On 6/11/20, 9:10 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: I'm trying to access an old GPFS filesystem from a new cluster. It is good up to the point of adding the SSL keys of the old cluster on the new one. I get from mmremotecluster add command: File ...._id_rsa.pub does not contain a nist sp 800-131a compliance key Is there any way to override this? The old cluster will go away before the end of the summer. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Thu Jun 11 21:41:52 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 Jun 2020 22:41:52 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <050593b2-8256-1f84-1a3a-978583103211@enea.it> Message-ID: While that point (block size should be an integer multiple of the RAID stripe width) is a good one, its violation would explain slow writes, but Giovanni talks of slow reads ... Mit freundlichen Gr??en / Kind regards Dr. 
Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Luis Bolinches" To: "Giovanni Bracco" Cc: gpfsug main discussion list , agostino.funel at enea.it Date: 11/06/2020 16:11 Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Sent by: gpfsug-discuss-bounces at spectrumscale.org 8 data * 256K does not align to your 1MB Raid 6 is already not the best option for writes. I would look into use multiples of 2MB block sizes. -- Cheers > On 11. Jun 2020, at 17.07, Giovanni Bracco wrote: > > 256K > > Giovanni > >> On 11/06/20 10:01, Luis Bolinches wrote: >> On that RAID 6 what is the logical RAID block size? 128K, 256K, other? >> -- >> Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations >> / Salutacions >> Luis Bolinches >> Consultant IT Specialist >> IBM Spectrum Scale development >> ESS & client adoption teams >> Mobile Phone: +358503112585 >> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= >> Ab IBM Finland Oy >> Laajalahdentie 23 >> 00330 Helsinki >> Uusimaa - Finland >> >> *"If you always give you will always have" -- Anonymous* >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: Jan-Frode Myklebust , gpfsug main discussion >> list >> Cc: Agostino Funel >> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance >> in simple spectrum scale/gpfs cluster with a storage-server SAN >> Date: Thu, Jun 11, 2020 10:53 >> Comments and updates in the text: >> >>> On 05/06/20 19:02, Jan-Frode Myklebust wrote: >>> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco >>> >: >>> >>> answer in the text >>> >>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote: >>> > >>> > Could maybe be interesting to drop the NSD servers, and >> let all >>> nodes >>> > access the storage via srp ? >>> >>> no we can not: the production clusters fabric is a mix of a >> QDR based >>> cluster and a OPA based cluster and NSD nodes provide the >> service to >>> both. >>> >>> >>> You could potentially still do SRP from QDR nodes, and via NSD >> for your >>> omnipath nodes. Going via NSD seems like a bit pointless indirection. >> >> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share >> the same data lake in Spectrum Scale/GPFS so the NSD servers support the >> flexibility of the setup. >> >> NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at >> the moment 3 different generations of DDN storages are connected, >> 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less >> expensive storage, to be used when performance is not the first >> priority. >> >>> >>> >>> >>> > >>> > Maybe turn off readahead, since it can cause performance >> degradation >>> > when GPFS reads 1 MB blocks scattered on the NSDs, so that >>> read-ahead >>> > always reads too much. This might be the cause of the slow >> read >>> seen ? 
>>> > maybe you?ll also overflow it if reading from both >> NSD-servers at >>> the >>> > same time? >>> >>> I have switched the readahead off and this produced a small >> (~10%) >>> increase of performances when reading from a NSD server, but >> no change >>> in the bad behaviour for the GPFS clients >>> >>> >>> > >>> > >>> > Plus.. it?s always nice to give a bit more pagepool to hhe >>> clients than >>> > the default.. I would prefer to start with 4 GB. >>> >>> we'll do also that and we'll let you know! >>> >>> >>> Could you show your mmlsconfig? Likely you should set maxMBpS to >>> indicate what kind of throughput a client can do (affects GPFS >>> readahead/writebehind). Would typically also increase >> workerThreads on >>> your NSD servers. >> >> At this moment this is the output of mmlsconfig >> >> # mmlsconfig >> Configuration data for cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------------- >> clusterName GPFSEXP.portici.enea.it >> clusterId 13274694257874519577 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 5.0.4.0 >> ccrEnabled yes >> cipherList AUTHONLY >> verbsRdma enable >> verbsPorts qib0/1 >> [cresco-gpfq7,cresco-gpfq8] >> verbsPorts qib0/2 >> [common] >> pagepool 4G >> adminMode central >> >> File systems in cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------ >> /dev/vsd_gexp2 >> /dev/vsd_gexp3 >> >> >>> >>> >>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip >> size. >>> When you write one GPFS block, less than a half RAID stripe is >> written, >>> which means you need to read back some data to calculate new >> parities. >>> I would prefer 4 MB block size, and maybe also change to 8+p+q so >> that >>> one GPFS is a multiple of a full 2 MB stripe. >>> >>> >>> -jf >> >> we have now added another file system based on 2 NSD on RAID6 8+p+q, >> keeping the 1MB block size just not to change too many things at the >> same time, but no substantial change in very low readout performances, >> that are still of the order of 50 MB/s while write performance are >> 1000MB/s >> >> Any other suggestion is welcomed! >> >> Giovanni >> >> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=CPBLf7s53vCFL0esHIl8ZkeC7BiuNZUHD6JVWkcy48c&s=wfe9UKg6bKylrLyuepv2J4jNN4BEfLQK6A46yX9IB-Q&e= From UWEFALKE at de.ibm.com Thu Jun 11 21:41:52 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 Jun 2020 22:41:52 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: Hi Giovanni, how do the waiters look on your clients when reading? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Giovanni Bracco To: gpfsug-discuss at spectrumscale.org Cc: Agostino Funel Date: 05/06/2020 14:22 Subject: [EXTERNAL] [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Sent by: gpfsug-discuss-bounces at spectrumscale.org In our lab we have received two storage-servers, Super micro SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID controller (2 GB cache) and before putting them in production for other purposes we have setup a small GPFS test cluster to verify if they can be used as storage (our gpfs production cluster has the licenses based on the NSD sockets, so it would be interesting to expand the storage size just by adding storage-servers in a infiniband based SAN, without changing the number of NSD servers) The test cluster consists of: 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. 2) a Mellanox FDR switch used as a SAN switch 3) a Truescale QDR switch as GPFS cluster switch 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. All the nodes run CentOS 7.7. On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been configured and it is exported via infiniband as an iSCSI target so that both appear as devices accessed by the srp_daemon on the NSD servers, where multipath (not really necessary in this case) has been configured for these two LIO-ORG devices. GPFS version 5.0.4-0 has been installed and the RDMA has been properly configured Two NSD disk have been created and a GPFS file system has been configured. Very simple tests have been performed using lmdd serial write/read. 
1) storage-server local performance: before configuring the RAID6 volume as NSD disk, a local xfs file system was created and lmdd write/read performance for 100 GB file was verified to be about 1 GB/s 2) once the GPFS cluster has been created write/read test have been performed directly from one of the NSD server at a time: write performance 2 GB/s, read performance 1 GB/s for 100 GB file By checking with iostat, it was observed that the I/O in this case involved only the NSD server where the test was performed, so when writing, the double of base performances was obtained, while in reading the same performance as on a local file system, this seems correct. Values are stable when the test is repeated. 3) when the same test is performed from the GPFS clients the lmdd result for a 100 GB file are: write - 900 MB/s and stable, not too bad but half of what is seen from the NSD servers. read - 30 MB/s to 300 MB/s: very low and unstable values No tuning of any kind in all the configuration of the involved system, only default values. Any suggestion to explain the very bad read performance from a GPFS client? Giovanni here are the configuration of the virtual drive on the storage-server and the file system configuration in GPFS Virtual drive ============== Virtual Drive: 2 (Target Id: 2) Name : RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 Size : 81.856 TB Sector Size : 512 Is VD emulated : Yes Parity Size : 18.190 TB State : Optimal Strip Size : 256 KB Number Of Drives : 11 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disabled GPFS file system from mmlsfs ============================ mmlsfs vsd_gexp2 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j cluster Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 22.00 (5.0.4.0) File system version --create-time Fri Apr 3 19:26:27 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 134217728 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d nsdfs4lun2;nsdfs5lun2 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gexp2 Default mount point --mount-priority 0 Mount priority -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=TbQFSz77fWm4Q3StvVLSfZ2GTQPDdwkd6S2eY5OvOME&s=CcbPtQrTI4xzh5gK0P-ol8uQcAc8yQKi5LjHZZaJBD4&e= ================================================== Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate e la casella di posta elettronica da cui e' stata inviata e' da qualificarsi quale strumento aziendale. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione. Grazie. This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only. Dissemination, copying, printing or use by anybody else is unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE 2016/679). If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail. Thanks. ================================================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=TbQFSz77fWm4Q3StvVLSfZ2GTQPDdwkd6S2eY5OvOME&s=XPiIgZtPIPdc6gXrjff_D1jtNnLkXF9i2m_gLeB0DYU&e= From luis.bolinches at fi.ibm.com Fri Jun 12 05:19:31 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Fri, 12 Jun 2020 04:19:31 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: , <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Fri Jun 12 11:51:52 2020 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Fri, 12 Jun 2020 11:51:52 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> Message-ID: <7ae90490-e505-9823-1696-96d8b83b48b4@qsplace.co.uk> I seem to remember Marc Kaplan discussing using the ILM and mmfind for this. There is a presentation from 2018 which skims on an example http://files.gpfsug.org/presentations/2018/USA/SpectrumScalePolicyBP.pdf -- Lauz On 10/06/2020 23:40, Kerner, Chad A wrote: > You can do a policy scan though and get a list of files that have ACLs applied to them. Then you would not have to check every file with a shell utility or C, just process that list. Likewise, you can get the uid/gid as well and process that list with the new mapping(split it into multiple lists, processing multiple threads on multiple machines). 
> > While it is by no means the prettiest or possibly best way to handle the POSIX ACLs, I had whipped up a python api for it: https://github.com/ckerner/ssacl . It only does POSIX though. We use it in conjunction with acls (https://github.com/ckerner/acls), an ls replacement that shows effective user/group permissions based off of the acl's because most often the user would just look at the POSIX perms and say something is broken, without checking the acl. > > -- > Chad Kerner, Senior Storage Engineer > Storage Enabling Technologies > National Center for Supercomputing Applications > University of Illinois, Urbana-Champaign > > ?On 6/10/20, 5:30 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: > > On 10/06/2020 16:31, Lohit Valleru wrote: > > [SNIP] > > > I might mostly start small with a single lab, and only change files > > without ACLs. May I know if anyone has a method/tool to find out > which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > > fileset/lab, but it would be good to confirm if we have them set > > across any other files/dirs in the filesystem. The usual methods do > > not seem to work. > > Use mmgetacl a file at a time and try and do something with the output? > > Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, > and have been that way for over a decade. Last time I raised this with > IBM I was told that was by design... > > If they are CES then look at it client side from a Windows node? > > The alternative is to write something in C that calls gpfs_getacl. > > However it was an evening to get a basic UID remap code working in C. It > would not take much more effort to make it handle ACL's. As such I would > work on the premise that there are ACL's and handle it. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.knister at gmail.com Fri Jun 12 14:25:15 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 12 Jun 2020 09:25:15 -0400 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: Message-ID: <01AF73E5-4722-4502-B7CC-60E7A62FEE65@gmail.com> I would double check your cpu frequency scaling settings in your NSD servers (cpupower frequency-info) and look at the governor. You?ll want it to be the performance governor. If it?s not what can happen is the CPUs scale back their clock rate which hurts RDMA performance. Running the I/o test on the NSD servers themselves may have been enough to kick the processors up into a higher frequency which afforded you good performance. Sent from my iPhone > On Jun 12, 2020, at 00:19, Luis Bolinches wrote: > > ? > Hi > > the block for writes increases the IOPS on those cards that might be already at the limit so I would not discard taht lowering the IOPS for writes has a positive effect on reads or not but it is a smoking gun that needs to be addressed. My experience of ignoring those is not a positive one. > > In regards of this HW I woudl love to see a baseline at RAW. 
run FIO (or any other tool that is not DD) on RAW device (not scale) to see what actually each drive can do AND then all the drives at the same time. We seen RAID controllers got to its needs even on reads when parallel access to many drives are put into the RAID controller. That is why we had to create a tool to get KPIs for ECE but can be applied here as way to see what the system can do. I would build numbers for RAW before I start looking into any filesystem numbers. > > you can use whatever tool you like but this one if just a FIO frontend that will do what I mention above https://github.com/IBM/SpectrumScale_ECE_STORAGE_READINESS. If you can I would also do the write part, as reads is part of the story, and you need to understand what the HW can do (+1 to Lego comment before) > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions > Luis Bolinches > Consultant IT Specialist > IBM Spectrum Scale development > ESS & client adoption teams > Mobile Phone: +358503112585 > > https://www.youracclaim.com/user/luis-bolinches > > Ab IBM Finland Oy > Laajalahdentie 23 > 00330 Helsinki > Uusimaa - Finland > > "If you always give you will always have" -- Anonymous > > > > ----- Original message ----- > From: "Uwe Falke" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org, Agostino Funel > Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN > Date: Thu, Jun 11, 2020 23:42 > > Hi Giovanni, how do the waiters look on your clients when reading? > > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Global Technology Services / Project Services Delivery / High Performance > Computing > +49 175 575 2877 Mobile > Rathausstr. 7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > From: Giovanni Bracco > To: gpfsug-discuss at spectrumscale.org > Cc: Agostino Funel > Date: 05/06/2020 14:22 > Subject: [EXTERNAL] [gpfsug-discuss] very low read performance in > simple spectrum scale/gpfs cluster with a storage-server SAN > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. 
> > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. > > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained, while in reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name : > RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size : 81.856 TB > Sector Size : 512 > Is VD emulated : Yes > Parity Size : 18.190 TB > State : Optimal > Strip Size : 256 KB > Number Of Drives : 11 > Span Depth : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy : Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment > (subblock) size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j cluster Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 1048576 Block size > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota > enforcement > --filesetdf No Fileset df enabled? > -V 22.00 (5.0.4.0) File system version > --create-time Fri Apr 3 19:26:27 2020 File system creation time > -z No Is DMAPI enabled? 
> -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 134217728 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per > full block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d nsdfs4lun2;nsdfs5lun2 Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gexp2 Default mount point > --mount-priority 0 Mount priority > > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW > http://www.afs.enea.it/bracco > > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle > persone indicate e la casella di posta elettronica da cui e' stata inviata > e' da qualificarsi quale strumento aziendale. > La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza > di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. > 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete cortesemente > pregati di darne immediata comunicazione al mittente e di provvedere alla > sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain privileged > information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is unauthorised > (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE > 2016/679). > If you are not the intended recipient, please delete this message and any > attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
From giovanni.bracco at enea.it Tue Jun 16 14:32:53 2020
From: giovanni.bracco at enea.it (Giovanni Bracco)
Date: Tue, 16 Jun 2020 15:32:53 +0200
Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN: effect of ignorePrefetchLUNCount
In-Reply-To:
References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it>
Message-ID: <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it>

On 11/06/20 12:13, Jan-Frode Myklebust wrote:
> On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco wrote:
>
>>> You could potentially still do SRP from QDR nodes, and via NSD for your omnipath nodes. Going via NSD seems like a bit pointless indirection.
>>
>> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share the same data lake in Spectrum Scale/GPFS so the NSD servers support the flexibility of the setup.
>
> Maybe there's something I don't understand, but couldn't you use the NSD-servers to serve to your OPA nodes, and then SRP directly for your 300 QDR-nodes?

not in an easy way without losing the flexibility of the system, where the NSD servers are the hubs between the three different fabrics: QDR compute, OPA compute, Mellanox FDR SAN. The storages have QDR, FDR and EDR interfaces and Mellanox guarantees the compatibility QDR-FDR and FDR-EDR but not, as far as I know, QDR-EDR.

So in this configuration all the compute nodes can access all the storages.

>> At this moment this is the output of mmlsconfig
>>
>> # mmlsconfig
>> Configuration data for cluster GPFSEXP.portici.enea.it:
>> -------------------------------------------------------
>> clusterName GPFSEXP.portici.enea.it
>> clusterId 13274694257874519577
>> autoload no
>> dmapiFileHandleSize 32
>> minReleaseLevel 5.0.4.0
>> ccrEnabled yes
>> cipherList AUTHONLY
>> verbsRdma enable
>> verbsPorts qib0/1
>> [cresco-gpfq7,cresco-gpfq8]
>> verbsPorts qib0/2
>> [common]
>> pagepool 4G
>> adminMode central
>>
>> File systems in cluster GPFSEXP.portici.enea.it:
>> ------------------------------------------------
>> /dev/vsd_gexp2
>> /dev/vsd_gexp3
>
> So, trivial close to default config.. assume the same for the client cluster.
>
> I would correct MaxMBpS -- put it at something reasonable, enable verbsRdmaSend=yes and ignorePrefetchLUNCount=yes.

Now we have set:

verbsRdmaSend yes
ignorePrefetchLUNCount yes
maxMBpS 8000

but the only parameter which has a strong effect by itself is

ignorePrefetchLUNCount yes

and the readout performance increased by a factor of at least 4, from 50MB/s to 210 MB/s.

So from the client the situation is now: sequential write 800 MB/s, sequential read 200 MB/s. Much better than before, but still a factor of 3, for both write and read, compared to what is observed from the NSD node: sequential write 2300 MB/s, sequential read 600 MB/s.

As far as the test is concerned, I have seen that the lmdd results are very similar to

fio --name=seqwrite --rw=write --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60
fio --name=seqread --rw=read --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60

In the present situation the read-ahead settings on the RAID controllers have practically no effect; we have also checked that, by the way.

Giovanni

>>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
>>> When you write one GPFS block, less than a half RAID stripe is written, which means you need to read back some data to calculate new parities. I would prefer 4 MB block size, and maybe also change to 8+p+q so that one GPFS block is a multiple of a full 2 MB stripe.
>>>
>>> -jf
>>
>> we have now added another file system based on 2 NSDs on RAID6 8+p+q, keeping the 1MB block size just not to change too many things at the same time, but no substantial change in the very low readout performance, which is still of the order of 50 MB/s while write performance is 1000MB/s.
>>
>> Any other suggestion is welcomed!
>
> Maybe rule out the storage, and check if you get proper throughput from nsdperf?
>
> Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your full settings -- so that we see that the benchmark is sane :-)
>
> -jf

-- 
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bracco at enea.it
WWW http://www.afs.enea.it/bracco

From janfrode at tanso.net Tue Jun 16 18:54:41 2020
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Tue, 16 Jun 2020 19:54:41 +0200
Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN: effect of ignorePrefetchLUNCount
In-Reply-To: <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it>
References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it>
Message-ID:

tir. 16. jun. 2020 kl. 15:32 skrev Giovanni Bracco:

>> I would correct MaxMBpS -- put it at something reasonable, enable verbsRdmaSend=yes and ignorePrefetchLUNCount=yes.
>
> Now we have set:
>
> verbsRdmaSend yes
> ignorePrefetchLUNCount yes
> maxMBpS 8000
>
> but the only parameter which has a strong effect by itself is
>
> ignorePrefetchLUNCount yes
>
> and the readout performance increased by a factor of at least 4, from 50MB/s to 210 MB/s

That's interesting.. ignoreprefetchluncount=yes should mean it schedules IO more aggressively. Did you also try lowering maxMBpS? I'm thinking maybe something is getting flooded somewhere.. Another knob would be to increase workerThreads, and/or prefetchPct (I don't quite remember how these influence each other). And it would be useful to run nsdperf between client and NSD-servers, to verify/rule out any network issue.

> fio --name=seqwrite --rw=write --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60
>
> fio --name=seqread --rw=read --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60

Not too familiar with fio, but ... does it help to increase numjobs? And.. do you tell both sides which fabric number they're on ("verbsPorts qib0/1/1") so that GPFS knows not to try to connect verbsPorts that can't communicate?

-jf

From jonathan.buzzard at strath.ac.uk Wed Jun 17 10:58:59 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 17 Jun 2020 10:58:59 +0100
Subject: [gpfsug-discuss] Mass UID/GID change program (uidremap)
Message-ID: <242341d1-7557-1c6c-d0a4-b9af1124a775@strath.ac.uk>

My university has been giving me Fridays off during lockdown, so I have spent a bit of time and added modification of Posix ACL's through the standard library and tidied up the code a bit. Much of it is based on preexisting code, which did speed things up.
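For the Posix ACL side the shape of it is just the stock libacl calls. A stripped-down sketch -- not the code from the tarball, with the error checking left out, and with map_uid/map_gid standing in for the mapping file lookup -- looks something like this:

/* Sketch only: remap the qualifiers of the named user/group entries in
 * one file's Posix ACLs.  map_uid()/map_gid() stand in for the mapping
 * table from the mapping file.  Build with: cc -o aclsketch aclsketch.c -lacl
 */
#include <sys/types.h>
#include <sys/acl.h>
#include <stdio.h>

static uid_t map_uid(uid_t old) { return old; /* mapping table lookup */ }
static gid_t map_gid(gid_t old) { return old; /* mapping table lookup */ }

static void remap_acl(const char *path, acl_type_t type)
{
    acl_t acl = acl_get_file(path, type);   /* ACL_TYPE_ACCESS or _DEFAULT */
    acl_entry_t entry;
    acl_tag_t tag;
    int more;

    if (acl == NULL)
        return;                              /* no ACL of this type */

    for (more = acl_get_entry(acl, ACL_FIRST_ENTRY, &entry); more == 1;
         more = acl_get_entry(acl, ACL_NEXT_ENTRY, &entry)) {
        acl_get_tag_type(entry, &tag);
        if (tag == ACL_USER) {
            uid_t *uid = acl_get_qualifier(entry);
            uid_t newuid = map_uid(*uid);
            acl_set_qualifier(entry, &newuid);
            acl_free(uid);
        } else if (tag == ACL_GROUP) {
            gid_t *gid = acl_get_qualifier(entry);
            gid_t newgid = map_gid(*gid);
            acl_set_qualifier(entry, &newgid);
            acl_free(gid);
        }
    }

    if (acl_set_file(path, type, acl) != 0)
        perror(path);
    acl_free(acl);
}

int main(int argc, char *argv[])
{
    int i;
    for (i = 1; i < argc; i++) {
        remap_acl(argv[i], ACL_TYPE_ACCESS);
        remap_acl(argv[i], ACL_TYPE_DEFAULT);   /* directories only */
    }
    return 0;
}
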
The error checking is still rather basic, and there had better be no errors in the mapping file. I have some ideas on how to extend it to do ACL's either through libacl or the GPFS API at compile time that I will probably look at on Friday. There is however the issue of incomplete documentation on the gpfs_acl_t structure. It will also be a lot slower if only a subset of the files have an ACL because you are going to need to attempt to get the ACL on every file. It uses the standard C library call nftw so can be pointed at a directory rather than a whole file system, which in the absence of a test GPFS file system that I could wreck would make testing difficult. Besides which the GPFS inode scan is lacking in features to make it suitable for this sort of application IMHO. There is a test directory in the tarball that is mostly full of a version of the Linux source code where all the files have been truncated to zero bytes. There is also some files for symlink, and access/default ACL testing. Targets exist in the Makefile to setup the testing directory correctly and run the test. There is no automated testing that it works correctly however. I have also included a script to generate a mapping file for all the users on the local system to AD ones based on the idmap_rid algorithm. Though in retrospect calling mmrepquota to find all the users and groups on the file system might have been a better idea. It's all under GPL v3 and can be downloaded at http://www.buzzard.me.uk/jonathan/downloads/uidremap-0.2.tar.gz Yeah I should probably use GitHub, but I am old school. Anyway take a look, the C code is very readable and if you take out the comments at the top only 300 lines. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Fri Jun 19 13:29:17 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 19 Jun 2020 12:29:17 +0000 Subject: [gpfsug-discuss] mmapplypolicy oddity Message-ID: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> I have a policy scan that walks a fileset and creates a report. In some cases, the SHOW output doesn?t happen and I have no idea why. Here is a case in point. Both lines are the same sym-link, the ?.bad? one fails to output the information. Ideas on how to debug this? 
<1> /gpfs/fs1/some-path /liblinear.bad [2019-08-05 at 22:19:23 6233 100 50 system 2020-06-18 at 13:36:36 64 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) <5> /gpfs/fs1/some-path /liblinear [2020-06-18 at 13:39:40 6233 100 50 system 2020-06-18 at 13:39:40 0 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) SHOW( |6233|100|lrwxrwxrwx|50|0|1|1592487581 |1592487581 |1592487581 |L|) In that directory: lrwxrwxrwx 1 build users 50 Jun 18 09:39 liblinear -> ../../path1/UIMA/liblinear <- A new one I created that identical lrwxrwxrwx 1 build users 50 Aug 5 2019 liblinear.bad -> ../../path1/UIMA/liblinear <- the original one that fails The list rule looks like this: rule 'dumpall' list '"$fileset_name"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Fri Jun 19 14:22:20 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 19 Jun 2020 13:22:20 +0000 Subject: [gpfsug-discuss] Beta participants for 5.1.0 for NFS 4.1 Message-ID: Folks, We are looking for one or two users willing to be Beta participants specifically for NFS 4.1. In order to participate, your company has to be willing to sign NDAs and other legal documents - I know that?s always a challenge for some of us! And for complicated reasons, you need to be an end user company not a Business Partner. Sorry. If you are interested please contact Jodi Everdon - jeverdon at us.ibm.com *off-list*. If you happen to know who your IBM acct rep is and can provide that name to Jodi, that will jump-start the process. Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1636781822] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From ulmer at ulmer.org Sat Jun 20 14:16:27 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Sat, 20 Jun 2020 09:16:27 -0400 Subject: [gpfsug-discuss] mmapplypolicy oddity In-Reply-To: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> References: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> Message-ID: <015CDB70-9E32-4B5D-A082-FC1F2C98C3F6@ulmer.org> Just to be clear, the .bad one failed before the other one existed? If you add a third one, do you still only get one set of output? Maybe the uniqueness of the target is important, and there is another symlink you don?t know about? -- Stephen > On Jun 19, 2020, at 8:08 AM, Oesterlin, Robert wrote: > > ? > I have a policy scan that walks a fileset and creates a report. In some cases, the SHOW output doesn?t happen and I have no idea why. Here is a case in point. Both lines are the same sym-link, the ?.bad? one fails to output the information. Ideas on how to debug this? 
> > <1> /gpfs/fs1/some-path /liblinear.bad [2019-08-05 at 22:19:23 6233 100 50 system 2020-06-18 at 13:36:36 64 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) > > <5> /gpfs/fs1/some-path /liblinear [2020-06-18 at 13:39:40 6233 100 50 system 2020-06-18 at 13:39:40 0 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) SHOW( |6233|100|lrwxrwxrwx|50|0|1|1592487581 |1592487581 |1592487581 |L|) > > In that directory: > > lrwxrwxrwx 1 build users 50 Jun 18 09:39 liblinear -> ../../path1/UIMA/liblinear <- A new one I created that identical > lrwxrwxrwx 1 build users 50 Aug 5 2019 liblinear.bad -> ../../path1/UIMA/liblinear <- the original one that fails > > The list rule looks like this: > > rule 'dumpall' list '"$fileset_name"' DIRECTORIES_PLUS > SHOW( '|' || > varchar(user_id) || '|' || > varchar(group_id) || '|' || > char(mode) || '|' || > varchar(file_size) || '|' || > varchar(kb_allocated) || '|' || > varchar(nlink) || '|' || > unixTS(access_time,19) || '|' || > unixTS(modification_time) || '|' || > unixTS(creation_time) || '|' || > char(misc_attributes,1) || '|' > ) > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Wed Jun 24 09:59:19 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Wed, 24 Jun 2020 08:59:19 +0000 Subject: [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? Message-ID: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> Hello, I?m looking for an example script /var/mmfs/etc/eventsCallback to add callbacks for system health events. I searched the installation and googled but didn?t found one. As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. I think I?ve seen something similar before, but can?t find it. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From PSAFRE at de.ibm.com Wed Jun 24 10:22:02 2020 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Wed, 24 Jun 2020 11:22:02 +0200 Subject: [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? In-Reply-To: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> References: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> Message-ID: Hello Heiner, you can find an example callback script, which writes an e-mail to the storage admin after a specific event occurs in the slide 21 of the presentation "Keep your Spectrum Scale cluster HEALTHY with MAPS": https://www.spectrumscaleug.org/wp-content/uploads/2020/04/SSSD20DE-Keep-your-Spectrum-Scale-cluster-HEALTHY-with-MAPS.pdf >> As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. The script startup + the check must be quick. The call in the case if the relevant event occurs, should be quick, but does not have to, if we assume, that this only happens rarely. 
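A minimal sketch of such a mediator could look like the following -- note
that the per-event script directory is just a convention chosen here, and
the positional parameter used for the event name is an assumption that has
to be checked against the argument list documented on the Knowledge Center
page above:

#!/bin/bash
# /var/mmfs/etc/eventsCallback - dispatch system health events to per-event scripts
EVENT="$2"                     # assumed position of the event name - verify against the docs
DIR=/var/mmfs/etc/events.d     # per-event scripts live here (own convention)

# keep the common path cheap: one existence test, then hand off in the background
if [ -x "$DIR/$EVENT" ]; then
    "$DIR/$EVENT" "$@" >/dev/null 2>&1 &
fi
exit 0

The script has to be made executable, and each per-event script simply
carries the name of the event it handles.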
Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Am Weiher 24 65451 Kelsterbach IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 24.06.2020 10:59 Subject: [EXTERNAL] [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I?m looking for an example script /var/mmfs/etc/eventsCallback to add callbacks for system health events. I searched the installation and googled but didn?t found one. As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. I think I?ve seen something similar before, but can?t find it. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HpP3ZY4lzoY061XISjKwWNDf8lPpYfwOC8vIoe9GoQ4&m=Hc5qL6HxhxEhM4HUBV4RlUww_xybP1YDBLJE4kufPGg&s=nxhqIUDvyK1EZLPzZNuOkgTb5gZRbRojsoMq7m5vWbU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Jun 25 11:31:55 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 25 Jun 2020 10:31:55 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Message-ID: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Hi all, I would like to use CES for exporting Samba and NFS. However, when reading the documentation (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_setcessharedroot.htm), is recommended (but not enforced) the use of a dedicated filesystem (of at least 4GB). Is there any best practice or recommendation for configuring this filesystem? This is: inode / block sizes, number of expected files in the filesystem, ideal size for this filesystem (from my understanding, 4GB should be enough, but I am not sure if there are conditions that would require a bigger one). Thanks a lot and best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Jun 25 12:08:39 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 25 Jun 2020 11:08:39 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> References: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From marc.caubet at psi.ch Thu Jun 25 15:53:26 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 25 Jun 2020 14:53:26 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: References: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch>, Message-ID: <67efac8e93fa4a7a88c82fdd50c240e9@psi.ch> Hi Fred, thanks a lot for the hints. Hence, I'll try with 3WayReplication as this is the only raid code supporting 256KB, and data and metadata in the same pool (I guess is not worth to split it here). Thanks a lot for your help, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Thursday, June 25, 2020 1:08:39 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Generally these file systems are configured with a block size of 256KB. As for inodes I would not pre-allocate any and set the initial maximum size to value such as 5000 since it can be increased if necessary. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Caubet Serrabou Marc (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Date: Thu, Jun 25, 2020 6:47 AM Hi all, I would like to use CES for exporting Samba and NFS. However, when reading the documentation (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_setcessharedroot.htm), is recommended (but not enforced) the use of a dedicated filesystem (of at least 4GB). Is there any best practice or recommendation for configuring this filesystem? This is: inode / block sizes, number of expected files in the filesystem, ideal size for this filesystem (from my understanding, 4GB should be enough, but I am not sure if there are conditions that would require a bigger one). Thanks a lot and best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Fri Jun 26 08:43:53 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 26 Jun 2020 07:43:53 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: References: , <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From pebaptista at deloitte.pt Tue Jun 30 12:46:44 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 11:46:44 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories Message-ID: Hi all, I'm finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I've linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image001.png at 01D64EDC.48F92EF0] If I list the usercache folder, I see differences. Local: [cid:image002.png at 01D64EDC.48F92EF0] GPFS [cid:image003.png at 01D64EDC.48F92EF0] I see that GPFS is working ok in the node [cid:image004.png at 01D64EDC.48F92EF0] However, if I check the node health: [cid:image007.png at 01D64EDC.8102E2F0] I'm new to Spectrum Scale and I don't know what's csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image008.png at 01D64EDC.8102E2F0] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 15587 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 11114 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 17038 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 5902 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 23157 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 2030 bytes Desc: image008.png URL: From lgayne at us.ibm.com Tue Jun 30 12:56:59 2020 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 30 Jun 2020 11:56:59 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.image001.png at 01D64EDC.48F92EF0.png Type: image/png Size: 15587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image002.png at 01D64EDC.48F92EF0.png Type: image/png Size: 11114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image003.png at 01D64EDC.48F92EF0.png Type: image/png Size: 17038 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image004.png at 01D64EDC.48F92EF0.png Type: image/png Size: 5902 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image007.png at 01D64EDC.8102E2F0.png Type: image/png Size: 23157 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image008.png at 01D64EDC.8102E2F0.png Type: image/png Size: 2030 bytes Desc: not available URL: From YARD at il.ibm.com Tue Jun 30 13:06:20 2020 From: YARD at il.ibm.com (Yaron Daniel) Date: Tue, 30 Jun 2020 15:06:20 +0300 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: HI what is the output of : #df -h Look like /, /var is full ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: "Baptista, Pedro Real" To: "gpfsug-discuss at spectrumscale.org" Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: If I list the usercache folder, I see differences. Local: GPFS I see that GPFS is working ok in the node However, if I check the node health: I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=rVpIPboGf4QVbz63gkd94guqyq6BVzIaoyaeZGhiW2M&s=Vo4g7g6qVc-vlv9_JEhmMiR3zM7QkVsQ-XAorWEPPPk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3847 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3747 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3793 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4301 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3739 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3855 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4084 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3776 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 15587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 11114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 17038 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 5902 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 23157 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 2030 bytes Desc: not available URL: From MDIETZ at de.ibm.com Tue Jun 30 13:13:44 2020 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Tue, 30 Jun 2020 12:13:44 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image._1_0E9D87D40E9D824800427FAFC2258597.gif Type: image/gif Size: 1114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._1_0E9F8BE80E9F87D000427FAFC2258597.gif Type: image/gif Size: 4105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F8DF40E9F87D000427FAFC2258597.jpg Type: image/jpeg Size: 3847 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F90000E9F87D000427FAFC2258597.jpg Type: image/jpeg Size: 4266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F920C0E9F87D000427FAFC2258597.jpg Type: image/jpeg Size: 3747 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F94180E9F87D000427FAFC2258597.jpg Type: image/jpeg Size: 3793 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F9A9C0E9F968400427FAFC2258597.jpg Type: image/jpeg Size: 4301 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F9CA80E9F968400427FAFC2258597.jpg Type: image/jpeg Size: 3739 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9F9EB40E9F968400427FAFC2258597.jpg Type: image/jpeg Size: 3855 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._1_0E9FA0C00E9F968400427FAFC2258597.gif Type: image/gif Size: 4084 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._2_0E9FA2CC0E9F968400427FAFC2258597.jpg Type: image/jpeg Size: 3776 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._4_0E9BD6740E9BD25C00427FAFC2258597.png Type: image/png Size: 15587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._4_0E9BF0340E9BEDC000427FAFC2258597.png Type: image/png Size: 11114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._4_0E7780B40E777E1800427FAFC2258597.png Type: image/png Size: 17038 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._4_0E77926C0E778FF800427FAFC2258597.png Type: image/png Size: 5902 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._4_0E77A3C40E77A15000427FAFC2258597.png Type: image/png Size: 23157 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image._4_0E77D2040E77CE9400427FAFC2258597.png Type: image/png Size: 2030 bytes Desc: not available URL: From pebaptista at deloitte.pt Tue Jun 30 13:28:33 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 12:28:33 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. 
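For reference, the two health events named earlier in this thread can be
looked up directly on the node; assuming the standard mmhealth command set,
the following prints the description, cause and recommended user action for
each of them, plus the recent event history:

mmhealth event show local_fs_filled
mmhealth event show csm_resync_needed
mmhealth node eventlog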
Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image019.png at 01D64EE2.55AE9AD0] From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image002.gif at 01D64EE2.55A85940] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image004.jpg at 01D64EE2.55A85940] [cid:image005.jpg at 01D64EE2.55A85940] [cid:image006.jpg at 01D64EE2.55A85940] [cid:image007.jpg at 01D64EE2.55A85940] [cid:image008.jpg at 01D64EE2.55A85940][cid:image009.jpg at 01D64EE2.55A85940][cid:image010.jpg at 01D64EE2.55A85940] [IBM Storage and Cloud Essentials] [cid:image012.jpg at 01D64EE2.55A85940] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I've linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image013.png at 01D64EE2.55A85940] If I list the usercache folder, I see differences. Local: [cid:image014.png at 01D64EE2.55A85940] GPFS [cid:image015.png at 01D64EE2.55A85940] I see that GPFS is working ok in the node [cid:image016.png at 01D64EE2.55A85940] However, if I check the node health: [cid:image017.png at 01D64EE2.55A85940] I'm new to Spectrum Scale and I don't know what's csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image018.png at 01D64EE2.55A85940] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited ("DTTL"). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. 
Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1114 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 4105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 3847 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 4266 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 3747 bytes Desc: image006.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 3793 bytes Desc: image007.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.jpg Type: image/jpeg Size: 4301 bytes Desc: image008.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.jpg Type: image/jpeg Size: 3739 bytes Desc: image009.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.jpg Type: image/jpeg Size: 3855 bytes Desc: image010.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.gif Type: image/gif Size: 4084 bytes Desc: image011.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.jpg Type: image/jpeg Size: 3776 bytes Desc: image012.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 15587 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 11114 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image015.png Type: image/png Size: 17038 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 5902 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 23157 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 2030 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image019.png Type: image/png Size: 2030 bytes Desc: image019.png URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 30 13:59:08 2020 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 30 Jun 2020 12:59:08 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> Hallo Pedro, what do you mean you had linked the local hadoop-Directory with the gpfs fs. Can you clarify? Do you use transparency on the gpfs-nodes? You should use on the local-hadoop site hdfs dfs ?ls cmd?s here. No os cmds like ls only! Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Baptista, Pedro Real Gesendet: Dienstag, 30. Juni 2020 14:29 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Mismatch between local and Scale directories Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image001.png at 01D64EEF.016D9F40] From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list > Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? 
Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image002.gif at 01D64EEF.016D9F40] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image004.jpg at 01D64EEF.016D9F40] [cid:image005.jpg at 01D64EEF.016D9F40] [cid:image006.jpg at 01D64EEF.016D9F40] [cid:image007.jpg at 01D64EEF.016D9F40] [cid:image008.jpg at 01D64EEF.016D9F40][cid:image009.jpg at 01D64EEF.016D9F40][cid:image010.jpg at 01D64EEF.016D9F40] [IBM Storage and Cloud Essentials] [cid:image012.jpg at 01D64EEF.016D9F40] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image013.png at 01D64EEF.016D9F40] If I list the usercache folder, I see differences. Local: [cid:image014.png at 01D64EEF.016D9F40] GPFS [cid:image015.png at 01D64EEF.016D9F40] I see that GPFS is working ok in the node [cid:image016.png at 01D64EEF.016D9F40] However, if I check the node health: [cid:image017.png at 01D64EEF.016D9F40] I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image001.png at 01D64EEF.016D9F40] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. 
If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2030 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1114 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 4105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 3847 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 4266 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 3747 bytes Desc: image006.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 3793 bytes Desc: image007.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.jpg Type: image/jpeg Size: 4301 bytes Desc: image008.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.jpg Type: image/jpeg Size: 3739 bytes Desc: image009.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.jpg Type: image/jpeg Size: 3855 bytes Desc: image010.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.gif Type: image/gif Size: 4084 bytes Desc: image011.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.jpg Type: image/jpeg Size: 3776 bytes Desc: image012.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 15587 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 11114 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 17038 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 5902 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 23157 bytes Desc: image017.png URL: From lcham at us.ibm.com Tue Jun 30 14:17:17 2020 From: lcham at us.ibm.com (Linda Cham) Date: Tue, 30 Jun 2020 13:17:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 101, Issue 39 In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From pebaptista at deloitte.pt Tue Jun 30 14:22:09 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 13:22:09 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> References: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> Message-ID: Hi Renar, Yes, not Hadoop directories but Yarn and mapreduce. They were configured as follows: mmdsh -N $host "ln -s %GPFS_DIR/$host $LOCAL_DIR" Yarn Namenode Directories are storing into $LOCAL_DIR. And yes I?m using transparency on the gpfs-nodes. Thank you. Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image002.png at 01D64EE9.D259DF20] From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Grunenberg, Renar Sent: 30 de junho de 2020 13:59 To: gpfsug main discussion list Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories Hallo Pedro, what do you mean you had linked the local hadoop-Directory with the gpfs fs. Can you clarify? Do you use transparency on the gpfs-nodes? You should use on the local-hadoop site hdfs dfs ?ls cmd?s here. No os cmds like ls only! Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org > Im Auftrag von Baptista, Pedro Real Gesendet: Dienstag, 30. Juni 2020 14:29 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Mismatch between local and Scale directories Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. 
Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image020.png at 01D64EE8.4B08B790] From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list > Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image021.gif at 01D64EE8.4B08B790] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image023.jpg at 01D64EE8.4B08B790] [cid:image024.jpg at 01D64EE8.4B08B790] [cid:image025.jpg at 01D64EE8.4B08B790] [cid:image026.jpg at 01D64EE8.4B08B790] [cid:image027.jpg at 01D64EE8.4B08B790][cid:image028.jpg at 01D64EE8.4B08B790][cid:image029.jpg at 01D64EE8.4B08B790] [IBM Storage and Cloud Essentials] [cid:image031.jpg at 01D64EE8.4B08B790] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image032.png at 01D64EE8.4B08B790] If I list the usercache folder, I see differences. Local: [cid:image033.png at 01D64EE8.4B08B790] GPFS [cid:image034.png at 01D64EE8.4B08B790] I see that GPFS is working ok in the node [cid:image035.png at 01D64EE8.4B08B790] However, if I check the node health: [cid:image036.png at 01D64EE8.4B08B790] I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image020.png at 01D64EE8.4B08B790] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. 
Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 2030 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.gif Type: image/gif Size: 1114 bytes Desc: image021.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.gif Type: image/gif Size: 4105 bytes Desc: image022.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image023.jpg Type: image/jpeg Size: 3847 bytes Desc: image023.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image024.jpg Type: image/jpeg Size: 4266 bytes Desc: image024.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image025.jpg Type: image/jpeg Size: 3747 bytes Desc: image025.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image026.jpg Type: image/jpeg Size: 3793 bytes Desc: image026.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image027.jpg Type: image/jpeg Size: 4301 bytes Desc: image027.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image028.jpg
Type: image/jpeg
Size: 3739 bytes
Desc: image028.jpg
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image029.jpg
Type: image/jpeg
Size: 3855 bytes
Desc: image029.jpg
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image030.gif
Type: image/gif
Size: 4084 bytes
Desc: image030.gif
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image031.jpg
Type: image/jpeg
Size: 3776 bytes
Desc: image031.jpg
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image032.png
Type: image/png
Size: 15587 bytes
Desc: image032.png
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image033.png
Type: image/png
Size: 11114 bytes
Desc: image033.png
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image034.png
Type: image/png
Size: 17038 bytes
Desc: image034.png
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image035.png
Type: image/png
Size: 5902 bytes
Desc: image035.png
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image036.png
Type: image/png
Size: 23157 bytes
Desc: image036.png
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 2030 bytes
Desc: image002.png
URL: 

From chrisjscott at gmail.com  Mon Jun 1 14:14:02 2020
From: chrisjscott at gmail.com (Chris Scott)
Date: Mon, 1 Jun 2020 14:14:02 +0100
Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster
In-Reply-To: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com>
References: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com>
Message-ID: 

Sounds like it would work fine. I recently exported a 3.5 version
filesystem from a GPFS 3.5 cluster to a 'Scale cluster at 5.0.2.3 software
and 5.0.2.0 cluster version.
I concurrently mapped the NSDs to new NSD servers in the 'Scale cluster, mmexported the filesystem and changed the NSD servers configuration of the NSDs using the mmimportfs ChangeSpecFile. The original (creation) filesystem version of this filesystem is 3.2.1.5. To my pleasant surprise the filesystem mounted and worked fine while still at 3.5 filesystem version. Plan B would have been to "mmchfs -V full" and then mmmount, but I was able to update the filesystem to 5.0.2.0 version while already mounted. This was further pleasantly successful as the filesystem in question is DMAPI-enabled, with the majority of the data on tape using Spectrum Protect for Space Management than the volume resident/pre-migrated on disk. The complexity is further compounded by this filesystem being associated to a different Spectrum Protect server than an existing DMAPI-enabled filesystem in the 'Scale cluster. Preparation of configs and subsequent commands to enable and use Spectrum Protect for Space Management multiserver for migration and backup all worked smoothly as per the docs. I was thus able to get rid of the GPFS 3.5 cluster on legacy hardware, OS, GPFS and homebrew CTDB SMB and NFS and retain the filesystem with its majority of tape-stored data on current hardware, OS and 'Scale/'Protect with CES SMB and NFS. The future objective remains to move all the data from this historical filesystem to a newer one to get the benefits of larger block and inode sizes, etc, although since the data is mostly dormant and kept for compliance/best-practice purposes, the main goal will be to head off original file system version 3.2 era going end of support. Cheers Chris On Thu, 28 May 2020 at 23:31, Prasad Surampudi < prasad.surampudi at theatsgroup.com> wrote: > We have two scale clusters, cluster-A running version Scale 4.2.3 and > RHEL6/7 and Cluster-B running Spectrum Scale 5.0.4 and RHEL 8.1. All the > nodes in both Cluster-A and Cluster-B are direct attached and no NSD > servers. We have our current filesystem gpfs_4 in Cluster-A and new > filesystem gpfs_5 in Cluster-B. We want to copy all our data from gpfs_4 > filesystem into gpfs_5 which has variable block size. So, can we map NSDs > of gpfs_4 to Cluster-B nodes and do a mmexportfs of gpfs_4 from Cluster-A > and mmimportfs into Cluster-B so that we have both filesystems available on > same node in Cluster-B for copying data across fiber channel? If > mmexportfs/mmimportfs works, can we delete nodes from Cluster-A and add > them to Cluster-B without upgrading RHEL or GPFS versions for now and plan > upgrading them at a later time? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Mon Jun 1 16:32:09 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Mon, 1 Jun 2020 08:32:09 -0700 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Hi: Just a note on this: the pidof fix was accepted upstream but has not made its way into rhel 8.2 yet Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu On Mon, Feb 17, 2020 at 12:02 AM Malahal R Naineni wrote: > > I filed a defect here, let us see what Redhat says. Yes, it doesn't work for any kernel threads. 
It doesn't work for user level threads/processes. > > https://bugzilla.redhat.com/show_bug.cgi?id=1803640 > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 > Date: Mon, Feb 17, 2020 8:26 AM > > Ah wait, I see what you might mean. pidof works but not specifically for processes like nfsd. That is odd. > > Thanks, > Bryan > > > > On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill wrote: > > Hi Malahal: > > Just to clarify, are you saying that on your VM pidof is missing? Or that it is there and not working as it did prior to RHEL/CentOS 8? pidof is returning pid numbers on my system. I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet. > > > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > > On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni wrote: > > I am not familiar with CNFS but looking at git source seems to indicate that it uses 'pidof' to check if a program is running or not. "pidof nfsd" works on RHEL7.x but it fails on my centos8.1 I just created. So either we need to make sure pidof works on kernel threads or fix CNFS scripts. > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 > Date: Fri, Feb 14, 2020 11:40 PM > > Hi All: > > I'm performing a rolling upgrade of one of our GPFS clusters. This particular cluster has 2 CNFS servers for some of our NFS clients. I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this). The thing is, it actually does start nfsd processes when running mmstartup on the node. Doing a "ps" shows 32 nfsd threads are running. > > Below is the CNFS-specific output from an attempt to start the node: > > CNFS[27243]: Restarting lockd to start grace > CNFS[27588]: Enabling 172.16.69.76 > CNFS[27694]: Restarting lockd to start grace > CNFS[27699]: Starting NFS services > CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[27910]: Monitor has started pid=27787 > CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28705]: Starting NFS services > CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28758]: Starting NFS services > CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28816]: Starting NFS services > CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28874]: Monitoring detected NFSD is inactive. 
mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured. > CNFS[28924]: Unexporting all GPFS filesystems > > Any thoughts? My other CNFS node is handling everything for the time being, thankfully! > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From prasad.surampudi at theatsgroup.com Mon Jun 1 17:33:05 2020 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Mon, 1 Jun 2020 16:33:05 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 101, Issue 1 In-Reply-To: References: Message-ID: <6B872411-80A0-475B-A8A3-E2BD828BB2F6@theatsgroup.com> So, if cluster_A is running Spec Scale 4.3.2 and Cluster_B is running 5.0.4, then would I be able to mount the filesystem from Cluster_A in Cluster_B as a remote filesystem? And if cluster_B nodes have direct SAN access to the remote cluster_A filesystem, would they be sending all filesystem I/O directly to the disk via Fiber Channel? I am assuming that this should work based on IBM link below. Can anyone from IBM support please confirm this? https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_admmcch.htm ?On 6/1/20, 4:45 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Jan-Frode Myklebust) 2. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Avila, Geoffrey) 3. Re: gpfsug-discuss Digest, Vol 100, Issue 32 (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) 4. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Sun, 31 May 2020 18:47:40 +0200 From: Jan-Frode Myklebust To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset="utf-8" No, this is a common misconception. You don?t need any NSD servers. NSD servers are only needed if you have nodes without direct block access. 
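A quick way to check which path a given node is actually using is mmlsdisk with the -m flag, which reports whether I/O for each disk is performed on the local node or through an NSD server. A minimal sketch, assuming a hypothetical file system name gpfs_4 and run on the remote-cluster node in question:

   # on the NSD client / remote-cluster node
   mmlsdisk gpfs_4 -m
   # "IO performed on node" shows "localhost" when the node is using its own
   # block-device path (SAN mode) and an NSD server name otherwise

If the LUNs are zoned and visible to the node as local block devices, the output should report localhost rather than one of cluster A's NSD servers.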
Remote cluster or not, disk access will be over local block device (without involving NSD servers in any way), or NSD server if local access isn?t available. NSD-servers are not ?arbitrators? over access to a disk, they?re just stupid proxies of IO commands. -jf s?n. 31. mai 2020 kl. 11:31 skrev Jonathan Buzzard < jonathan.buzzard at strath.ac.uk>: > On 29/05/2020 20:55, Stephen Ulmer wrote: > > I have a question about multi-cluster, but it is related to this thread > > (it would be solving the same problem). > > > > Let?s say we have two clusters A and B, both clusters are normally > > shared-everything with no NSD servers defined. > > Er, even in a shared-everything all nodes fibre channel attached you > still have to define NSD servers. That is a given NSD has a server (or > ideally a list of servers) that arbitrate the disk. Unless it has > changed since 3.x days. Never run a 4.x or later with all the disks SAN > attached on all the nodes. > > > We want cluster B to be > > able to use a file system in cluster A. If I zone the SAN such that > > cluster B can see all of cluster A?s disks, can I then define a > > multi-cluster relationship between them and mount a file system from A > on B? > > > > To state it another way, must B's I/O for the foreign file system pass > > though NSD servers in A, or can B?s nodes discover that they have > > FibreChannel paths to those disks and use them? > > > > My understanding is that remote cluster mounts have to pass through the > NSD servers. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sun, 31 May 2020 21:44:12 -0400 From: "Avila, Geoffrey" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset="utf-8" The local-block-device method of I/O is what is usually termed "SAN mode"; right? On Sun, May 31, 2020 at 12:47 PM Jan-Frode Myklebust wrote: > > No, this is a common misconception. You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > > Remote cluster or not, disk access will be over local block device > (without involving NSD servers in any way), or NSD server if local access > isn?t available. NSD-servers are not ?arbitrators? over access to a disk, > they?re just stupid proxies of IO commands. > > > -jf > > s?n. 31. mai 2020 kl. 11:31 skrev Jonathan Buzzard < > jonathan.buzzard at strath.ac.uk>: > >> On 29/05/2020 20:55, Stephen Ulmer wrote: >> > I have a question about multi-cluster, but it is related to this thread >> > (it would be solving the same problem). >> > >> > Let?s say we have two clusters A and B, both clusters are normally >> > shared-everything with no NSD servers defined. >> >> Er, even in a shared-everything all nodes fibre channel attached you >> still have to define NSD servers. That is a given NSD has a server (or >> ideally a list of servers) that arbitrate the disk. Unless it has >> changed since 3.x days. Never run a 4.x or later with all the disks SAN >> attached on all the nodes. 
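For reference, the server list being discussed is the optional servers= line of an NSD stanza fed to mmcrnsd; a minimal sketch with hypothetical device and node names:

   %nsd: device=/dev/mapper/lun001
     nsd=nsd001
     servers=nsdserver01-ib,nsdserver02-ib
     usage=dataAndMetadata
     failureGroup=1
     pool=system

   mmcrnsd -F nsd_stanzas.txt

On current releases the servers= line can be omitted, or kept purely as a fallback, when every node that will mount the file system has direct block access to the disk.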
>> >> > We want cluster B to be >> > able to use a file system in cluster A. If I zone the SAN such that >> > cluster B can see all of cluster A?s disks, can I then define a >> > multi-cluster relationship between them and mount a file system from A >> on B? >> > >> > To state it another way, must B's I/O for the foreign file system pass >> > though NSD servers in A, or can B?s nodes discover that they have >> > FibreChannel paths to those disks and use them? >> > >> >> My understanding is that remote cluster mounts have to pass through the >> NSD servers. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Sun, 31 May 2020 22:54:11 -0400 From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 100, Issue 32 Message-ID: <83255.1590980051 at turing-police> Content-Type: text/plain; charset="us-ascii" On Fri, 29 May 2020 22:30:08 +0100, Jonathan Buzzard said: > Ethernet goes *very* fast these days you know :-) In fact *much* faster > than fibre channel. Yes, but the justification, purchase, and installation of 40G or 100G Ethernet interfaces in the machines involved, plus the routers/switches along the way, can go very slowly indeed. So finding a way to replace 10G Ether with 16G FC can be a win..... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: ------------------------------ Message: 4 Date: Mon, 1 Jun 2020 09:45:25 +0100 From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset=utf-8; format=flowed On 31/05/2020 17:47, Jan-Frode Myklebust wrote: > > No, this is a common misconception.? You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > I see that has changed then. In the past mmcrnsd would simply fail without a server list passed to it. If you have been a long term GPFS user (I started with 2.2 on a site that had been running since 1.x days) then we are not always aware of things that have changed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 1 ********************************************** From stockf at us.ibm.com Mon Jun 1 17:53:33 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Jun 2020 16:53:33 +0000 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: References: , <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Jun 2 09:14:27 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 2 Jun 2020 08:14:27 +0000 Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number Message-ID: <7EA77DD1-8FEB-4151-838F-C8E983422BFE@id.ethz.ch> Hello, I?m quite upset of the form and usefulness of some IBM announcements like this one: IJ24518: NVME SCSI EMULATION ISSUE How do I translate an APAR number to the spectrum scale or ess release which fix it? And which versions are affected? Need I to download all Readmes and grep for the APAR number? Or do I just don?t know where to get this information? How do you deal with such announcements? I?m tempted to just open a PMR and ask ?. This probably relates to previous posts and RFE for a proper changelog. Excuse if it?s a duplicate or if I did miss the answer in a previous post. Still the quality of this announcements is not what I expect. Just for completeness, maybe someone from IBM takes notice: All I get is an APAR number and the fact that it?s CRITICAL, so I can?t just ignore, but I don?t get * Which ESS versions are affected ? all previous or only since a certain version? * What is the first ESS version fixed? * When am I vulnerable? always, or only certain hardware or configurations or ?.? * What is the impact ? crash due to temporary corruption or permanent data corruption, or metadata or filesystem structure or ..? * How do I know if I?m already affected, what is the fingerprint? * Does a workaround exist? * If this is critical and about a possible data corruption, why isn?t it already indicated in the title/subject but hidden? * why is the error description so cryptic and needs some guessing about the meaning? It?s no sentences, just quick notes. So there is no explicit statement at all. https://www.ibm.com/support/pages/node/6203365?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Kind regards, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Tue Jun 2 14:31:05 2020 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 2 Jun 2020 14:31:05 +0100 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: References: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: Hi Fred The imported filesystem has ~1.5M files that are migrated to Spectrum Protect. Spot checking transparent and selective recalls of a handful of files has been successful after associating them with their correct Spectrum Protect server. 
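A minimal sketch of that sort of spot check with the Spectrum Protect for Space Management (HSM) client, assuming the client is installed on the node and using a hypothetical path:

   # show the HSM state of a file (resident, premigrated or migrated)
   dsmls /gpfs/fs1/project/somefile
   # force a selective recall instead of waiting for a transparent recall on read
   dsmrecall /gpfs/fs1/project/somefile

dsmls is a convenient way to confirm a stub is associated with the expected Spectrum Protect server before attempting recalls in bulk.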
They're all also backed up to primary and copy pools in the Spectrum Protect server so having to do a restore instead of recall if it wasn't working was an acceptable risk in favour of trying to persist the GPFS 3.5 cluster on dying hardware and insecure OS, etc. Cheers Chris On Mon, 1 Jun 2020 at 17:53, Frederick Stock wrote: > Chris, it was not clear to me if the file system you imported had files > migrated to Spectrum Protect, that is stub files in GPFS. If the file > system does contain files migrated to Spectrum Protect with just a stub > file in the file system, have you tried to recall any of them to see if > that still works? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Chris Scott > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Importing a Spectrum Scale a > filesystem from 4.2.3 cluster to 5.0.4.3 cluster > Date: Mon, Jun 1, 2020 9:14 AM > > Sounds like it would work fine. > > I recently exported a 3.5 version filesystem from a GPFS 3.5 cluster to a > 'Scale cluster at 5.0.2.3 software and 5.0.2.0 cluster version. I > concurrently mapped the NSDs to new NSD servers in the 'Scale cluster, > mmexported the filesystem and changed the NSD servers configuration of the > NSDs using the mmimportfs ChangeSpecFile. The original (creation) > filesystem version of this filesystem is 3.2.1.5. > > To my pleasant surprise the filesystem mounted and worked fine while still > at 3.5 filesystem version. Plan B would have been to "mmchfs > -V full" and then mmmount, but I was able to update the filesystem to > 5.0.2.0 version while already mounted. > > This was further pleasantly successful as the filesystem in question is > DMAPI-enabled, with the majority of the data on tape using Spectrum Protect > for Space Management than the volume resident/pre-migrated on disk. > > The complexity is further compounded by this filesystem being associated > to a different Spectrum Protect server than an existing DMAPI-enabled > filesystem in the 'Scale cluster. Preparation of configs and subsequent > commands to enable and use Spectrum Protect for Space Management > multiserver for migration and backup all worked smoothly as per the docs. > > I was thus able to get rid of the GPFS 3.5 cluster on legacy hardware, OS, > GPFS and homebrew CTDB SMB and NFS and retain the filesystem with its > majority of tape-stored data on current hardware, OS and 'Scale/'Protect > with CES SMB and NFS. > > The future objective remains to move all the data from this historical > filesystem to a newer one to get the benefits of larger block and inode > sizes, etc, although since the data is mostly dormant and kept for > compliance/best-practice purposes, the main goal will be to head off > original file system version 3.2 era going end of support. > > Cheers > Chris > > On Thu, 28 May 2020 at 23:31, Prasad Surampudi < > prasad.surampudi at theatsgroup.com> wrote: > > We have two scale clusters, cluster-A running version Scale 4.2.3 and > RHEL6/7 and Cluster-B running Spectrum Scale 5.0.4 and RHEL 8.1. All the > nodes in both Cluster-A and Cluster-B are direct attached and no NSD > servers. We have our current filesystem gpfs_4 in Cluster-A and new > filesystem gpfs_5 in Cluster-B. We want to copy all our data from gpfs_4 > filesystem into gpfs_5 which has variable block size. 
So, can we map NSDs > of gpfs_4 to Cluster-B nodes and do a mmexportfs of gpfs_4 from Cluster-A > and mmimportfs into Cluster-B so that we have both filesystems available on > same node in Cluster-B for copying data across fiber channel? If > mmexportfs/mmimportfs works, can we delete nodes from Cluster-A and add > them to Cluster-B without upgrading RHEL or GPFS versions for now and plan > upgrading them at a later time? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yassin at us.ibm.com Wed Jun 3 00:43:52 2020 From: yassin at us.ibm.com (Mustafa Mah) Date: Tue, 2 Jun 2020 19:43:52 -0400 Subject: [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number Message-ID: Heiner, Here is an alert for this APAR IJ24518 which has more details. https://www.ibm.com/support/pages/node/6210439 Regards, Mustafa > ---------- Forwarded message --------- > From: Billich Heinrich Rainer (ID SD) > Date: Tue, Jun 2, 2020 at 4:29 AM > Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what > do do with this announcement, all I get is an APAR number > To: gpfsug main discussion list > > Hello, > > I?m quite upset of the form and usefulness of some IBM announcements > like this one: > > IJ24518: NVME SCSI EMULATION ISSUE > > How do I translate an APAR number to the spectrum scale or ess > release which fix it? And which versions are affected? Need I to > download all Readmes and grep for the APAR number? Or do I just > don?t know where to get this information? How do you deal with such > announcements? I?m tempted to just open a PMR and ask ?. > > This probably relates to previous posts and RFE for a proper > changelog. Excuse if it?s a duplicate or if I did miss the answer in > a previous post. Still the quality of ?this announcements is not > what I expect. > > Just for completeness, maybe someone from IBM takes notice: > > All I get is an APAR number and the fact that it?s CRITICAL, so I > can?t just ignore, but I don?t get > > Which ESS versions are affected ? all previous or only since a > certain version? > What is the first ESS version fixed? > When am I vulnerable? always, or only certain hardware or > configurations or ?.? > What is the impact ? crash due to temporary corruption or permanent > data corruption, or metadata or filesystem structure or ..? > How do I know if I?m already affected, what is the fingerprint? > Does a workaround exist? > If this is critical and about a possible data corruption, why isn?t > it already indicated in the title/subject but hidden? > why is the error description so cryptic and needs some guessing > about the meaning? It?s no sentences, just quick notes. So there is > no explicit statement at all. > > > https://www.ibm.com/support/pages/node/6203365? 
> myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > Kind regards, > > Heiner > > -- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 3 16:16:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 3 Jun 2020 16:16:05 +0100 Subject: [gpfsug-discuss] Immutible attribute Message-ID: Hum, on a "normal" Linux file system only the root user can change the immutible attribute on a file. Running on 4.2.3 I have just removed the immutible attribute as an ordinary user if I am the owner of the file. I would suggest that this is a bug as the manual page for mmchattr does not mention this. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Wed Jun 3 16:25:43 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 3 Jun 2020 15:25:43 +0000 Subject: [gpfsug-discuss] Immutible attribute In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 3 16:45:02 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 3 Jun 2020 16:45:02 +0100 Subject: [gpfsug-discuss] Immutible attribute In-Reply-To: References: Message-ID: <518af0ad-fa75-70a1-20c4-6a77e55817bb@strath.ac.uk> On 03/06/2020 16:25, Frederick Stock wrote: > Could you please provide the exact Scale version, or was it really 4.2.3.0? > 4.2.3-7 with setuid taken off a bunch of the utilities per relevant CVE while I work on the upgrade to 5.0.5 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Wed Jun 3 17:50:37 2020 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 3 Jun 2020 18:50:37 +0200 Subject: [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number In-Reply-To: References: Message-ID: Also, 5.0.4 PTF4 efix1 contains a fix for that. Mit freundlichen Gr??en / Kind regards Achim Rehor gpfsug-discuss-bounces at spectrumscale.org wrote on 03/06/2020 01:43:52: > From: "Mustafa Mah" > To: gpfsug-discuss at spectrumscale.org > Date: 03/06/2020 01:44 > Subject: [EXTERNAL] [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI > EMULATION ISSUE - what do do with this announcement, all I get is an > APAR number > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Heiner, > > Here is an alert for this APAR IJ24518 which has more details. 
> > https://www.ibm.com/support/pages/node/6210439 > > Regards, > Mustafa > > > ---------- Forwarded message --------- > > From: Billich Heinrich Rainer (ID SD) > > Date: Tue, Jun 2, 2020 at 4:29 AM > > Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what > > do do with this announcement, all I get is an APAR number > > To: gpfsug main discussion list > > > > > Hello, > > > > I?m quite upset of the form and usefulness of some IBM announcements > > like this one: > > > > IJ24518: NVME SCSI EMULATION ISSUE > > > > How do I translate an APAR number to the spectrum scale or ess > > release which fix it? And which versions are affected? Need I to > > download all Readmes and grep for the APAR number? Or do I just > > don?t know where to get this information? How do you deal with such > > announcements? I?m tempted to just open a PMR and ask ?. > > > > This probably relates to previous posts and RFE for a proper > > changelog. Excuse if it?s a duplicate or if I did miss the answer in > > a previous post. Still the quality of this announcements is not > > what I expect. > > > > Just for completeness, maybe someone from IBM takes notice: > > > > All I get is an APAR number and the fact that it?s CRITICAL, so I > > can?t just ignore, but I don?t get > > > > Which ESS versions are affected ? all previous or only since a > > certain version? > > What is the first ESS version fixed? > > When am I vulnerable? always, or only certain hardware or > > configurations or ?.? > > What is the impact ? crash due to temporary corruption or permanent > > data corruption, or metadata or filesystem structure or ..? > > How do I know if I?m already affected, what is the fingerprint? > > Does a workaround exist? > > If this is critical and about a possible data corruption, why isn?t > > it already indicated in the title/subject but hidden? > > why is the error description so cryptic and needs some guessing > > about the meaning? It?s no sentences, just quick notes. So there is > > no explicit statement at all. > > > > > > https://www.ibm.com/support/pages/node/6203365? > > myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > Kind regards, > > > > Heiner > > > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=W0o9gusq8r9RXIck94yh8Db326oZ63-ctZOFhRGuJ9A&s=drq- > La060No88jLIMNwJCD6U67UYmALEzbQ58qyI65c&e= From Robert.Oesterlin at nuance.com Wed Jun 3 17:16:41 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 3 Jun 2020 16:16:41 +0000 Subject: [gpfsug-discuss] File heat - remote file systems? Message-ID: Is it possible to collect file heat data on a remote mounted file system? if I enable file heat in the remote cluster, will that get picked up? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chair at spectrumscale.org Wed Jun 3 20:11:17 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 03 Jun 2020 20:11:17 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Wed Jun 3 22:45:05 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 3 Jun 2020 21:45:05 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jun 3 22:56:04 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 3 Jun 2020 21:56:04 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Wed Jun 3 23:23:40 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 3 Jun 2020 22:23:40 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Frederick, Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... 
NSD servers per disks are called in round-robin fashion as well, for example: gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib Any other potential culprits to investigate? I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): [nsd03-ib ~]# mmdiag --waiters === mmdiag: waiters === Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion nsd04-ib: Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion tsm01-ib: Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion nsd01-ib: Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 4:56 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 6 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair)) 2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi) 3. Re: Client Latency and High NSD Server Load Average (Frederick Stock) ---------------------------------------------------------------------- Message: 1 Date: Wed, 03 Jun 2020 20:11:17 +0100 From: "Simon Thompson (Spectrum Scale User Group Chair)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Content-Type: text/plain; charset="utf-8" Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. 
That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 3 Jun 2020 21:45:05 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Wed, 3 Jun 2020 21:56:04 +0000 From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 6 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jun 3 23:21:54 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 3 Jun 2020 15:21:54 -0700 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <0D7E06C2-3BFC-434C-8A81-CC57D9F375B4@lbl.gov> Are you running ESS? > On Jun 3, 2020, at 2:56 PM, Frederick Stock wrote: > > Does the output of mmdf show that data is evenly distributed across your NSDs? If not that could be contributing to your problem. Also, are your NSDs evenly distributed across your NSD servers, and the NSD configured so the first NSD server for each is not the same one? 
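A minimal sketch of those two checks, with a hypothetical file system name:

   # free space per NSD; look for disks noticeably fuller or emptier than the rest
   mmdf gpfs1
   # NSD-to-server mapping; check that the first server in each list is rotated
   mmlsnsd -f gpfs1

(The names above are placeholders only.)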
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: "Saula, Oluwasijibomi" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average > Date: Wed, Jun 3, 2020 5:45 PM > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 4 01:16:13 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 4 Jun 2020 02:16:13 +0200 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Hello, Oluwasijibomi , I suppose you are not running ESS (might be wrong on this). I'd check the IO history on the NSD servers (high IO times?) and in addition the IO traffic at the block device level , e.g. with iostat or the like (still high IO times there? Are the IO sizes ok or too low on the NSD servers with high write latencies? ). What's the picture on your storage back-end? All caches active? Is the storage backend fully loaded or rather idle? How is storage connected? SAS? FC? IB? What is the actual IO pattern when you see these high latencies? Do you run additional apps on some or all of youre NSD servers? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Date: 03/06/2020 23:45 Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=ql8z1YSfrzUgT8kXQBMEUuA8uyuprz6-fpvC660vG5A&s=JSYPIzNMZFNp17VaqcNWNuwwUE_nQMKu47mOOUonLp0&e= From ewahl at osc.edu Thu Jun 4 00:56:07 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 3 Jun 2020 23:56:07 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: I saw something EXACTLY like this way back in the 3.x days when I had a backend storage unit that had a flaky main memory issue and some enclosures were constantly flapping between controllers for ownership. Some NSDs were affected, some were not. I can imagine this could still happen in 4.x and 5.0.x with the right hardware problem. Were things working before or is this a new installation? What is the backend storage? If you are using device-mapper-multipath, look for events in the messages/syslog. Incorrect path weighting? Using ALUA when it isn't supported? (that can be comically bad! helped a friend diagnose that one at a customer once) Perhaps using the wrong rr_weight or rr_min_io so you have some wacky long io queueing issues where your path_selector cannot keep up with the IO queue? Most of this is easily fixed by using most vendor's suggested settings anymore, IF the hardware is healthy... Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Saula, Oluwasijibomi Sent: Wednesday, June 3, 2020 5:45 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 4 03:19:49 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 3 Jun 2020 22:19:49 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear offline, then nsd04-ib would be next in line to get the load of all 3. The two servers with the problems are in line after the one that is off. This is based on the candy striping of the NSD server order (which I think most of us do). NSD fail-over is ?straight-forward? so to speak - the last I checked, it is really fail-over in the listed order not load balancing among the servers (which is why you stripe them). I do *not* know if individual clients make the decision that the I/O for a disk should go through the ?next? 
NSD server, or if it is done cluster-wide (in the case of intermittently super-slow I/O). Hopefully someone with source code access will answer that, because now I?m curious... Check what path the clients are using to the NSDs, i.e. which server. See if you are surprised. :) -- Stephen > On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi wrote: > > ? > Frederick, > > Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... > > NSD servers per disks are called in round-robin fashion as well, for example: > > gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib > gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib > gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib > gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib > > Any other potential culprits to investigate? > > I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): > [nsd03-ib ~]# mmdiag --waiters > === mmdiag: waiters === > Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion > Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion > Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion > > nsd04-ib: > Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion > Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion > Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion > > tsm01-ib: > Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion > Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion > Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion > Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion > > nsd01-ib: > Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion > Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Wednesday, June 3, 2020 4:56 PM > To: gpfsug-discuss at spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 101, Issue 6 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Introducing SSUG::Digital > (Simon Thompson (Spectrum Scale User Group Chair)) > 2. Client Latency and High NSD Server Load Average > (Saula, Oluwasijibomi) > 3. 
Re: Client Latency and High NSD Server Load Average > (Frederick Stock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 03 Jun 2020 20:11:17 +0100 > From: "Simon Thompson (Spectrum Scale User Group Chair)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Introducing SSUG::Digital > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi All., > > > > I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. > > > > I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: > San Francisco, USA at 08:00 PDT > New York, USA at 11:00 EDT > London, United Kingdom at 16:00 BST > Frankfurt, Germany at 17:00 CEST > Pune, India at 20:30 IST > We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! > > > > (I know the times don?t work for everyone in the global community!) > > > > Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. > > More details at: > > https://www.spectrumscaleug.org/introducing-ssugdigital/ > > > > (We?ll announce the speakers and topic of the first session in the next few days ?) > > > > Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. > > > > We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. > > > > Simon Thompson > > SSUG Group Chair > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Wed, 3 Jun 2020 21:45:05 +0000 > From: "Saula, Oluwasijibomi" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > Thanks, > > > Oluwasijibomi (Siji) Saula > > HPC Systems Administrator / Information Technology > > > > Research 2 Building 220B / Fargo ND 58108-6050 > > p: 701.231.7749 / www.ndsu.edu > > > > [cid:image001.gif at 01D57DE0.91C300C0] > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 3 > Date: Wed, 3 Jun 2020 21:56:04 +0000 > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 101, Issue 6 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Jun 4 12:08:19 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 4 Jun 2020 11:08:19 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu Jun 4 16:19:18 2020 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 Jun 2020 11:19:18 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: Hi, >> I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): Please issue "mmlsdisk -m" in NSD client to ascertain the active NSD server serving a NSD. Since nsd02-ib is offlined, it is possible that some servers would be serving higher NSDs than the rest. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_PoorPerformanceDuetoDiskFailure.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_HealthStateOfNSDserver.htm >> From the waiters you provided I would guess there is something amiss with some of your storage systems. Please ensure there are no "disk rebuild" pertaining to certain NSDs/storage volumes in progress (in the storage subsystem) as this can sometimes impact block-level performance and thus impact latency, especially for write operations. Please ensure that the hardware components constituting the Spectrum Scale stack are healthy and performing optimally. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_pspduetosyslevelcompissue.htm Please refer to the Spectrum Scale documentation (link below) for potential causes (e.g. Scale maintenance operation such as mmapplypolicy/mmestripefs in progress, slow disks) that can be contributing to this issue: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_performanceissues.htm Thanks and Regards, -Kums Kumaran Rajaram Spectrum Scale Development, IBM Systems kums at us.ibm.com From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 06/04/2020 07:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Sent by: gpfsug-discuss-bounces at spectrumscale.org >From the waiters you provided I would guess there is something amiss with some of your storage systems. Since those waiters are on NSD servers they are waiting for IO requests to the kernel to complete. Generally IOs are expected to complete in milliseconds, not seconds. You could look at the output of "mmfsadm dump nsd" to see how the GPFS IO queues are working but that would be secondary to checking your storage systems. 
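A quick way to keep an eye on those "for I/O completion" waiters while the storage underneath is being checked is a small wrapper around mmdiag --waiters. This is only a rough sketch: it assumes the waiter lines look exactly like the ones quoted above ("Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion"), and the 5-second threshold and 30-second poll interval are arbitrary illustrative values; run it on each NSD server and compare against iostat from the same interval.

#!/usr/bin/env python3
# Rough sketch: poll "mmdiag --waiters" and flag long "for I/O completion" waiters.
# Assumes the output format quoted in this thread; adjust the regex if yours differs.
import re
import subprocess
import time

WAITER_RE = re.compile(r"Waiting\s+([0-9.]+)\s+sec.*for I/O completion")
THRESHOLD_SEC = 5.0    # arbitrary illustrative threshold
INTERVAL_SEC = 30      # arbitrary poll interval

def io_waiters():
    out = subprocess.run(["mmdiag", "--waiters"],
                         capture_output=True, text=True, check=False).stdout
    return sorted((float(m.group(1)) for m in WAITER_RE.finditer(out)), reverse=True)

if __name__ == "__main__":
    while True:
        slow = [w for w in io_waiters() if w >= THRESHOLD_SEC]
        if slow:
            print(f"{time.strftime('%H:%M:%S')} {len(slow)} I/O waiters over "
                  f"{THRESHOLD_SEC}s, worst {slow[0]:.1f}s")
        time.sleep(INTERVAL_SEC)

Logging that alongside per-LUN statistics over the same window usually makes it obvious whether the long waiters line up with one particular LUN, controller or path.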
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Saula, Oluwasijibomi" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Date: Wed, Jun 3, 2020 6:24 PM Frederick, Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... NSD servers per disks are called in round-robin fashion as well, for example: gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib Any other potential culprits to investigate? I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): [nsd03-ib ~]# mmdiag --waiters === mmdiag: waiters === Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion nsd04-ib: Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion tsm01-ib: Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion nsd01-ib: Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 4:56 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 6 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair)) 2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi) 3. 
Re: Client Latency and High NSD Server Load Average (Frederick Stock) ---------------------------------------------------------------------- Message: 1 Date: Wed, 03 Jun 2020 20:11:17 +0100 From: "Simon Thompson (Spectrum Scale User Group Chair)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Content-Type: text/plain; charset="utf-8" Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/e839fc73/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 3 Jun 2020 21:45:05 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/2ac14173/attachment-0001.html > ------------------------------ Message: 3 Date: Wed, 3 Jun 2020 21:56:04 +0000 From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/c252f3b9/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 6 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=McIf98wfiVqHU8ZygezLrQ&m=LdN47e1J6DuQfVtCUGylXISVvrHRgD19C_zEOo8SaJ0&s=ec3M7xE47VugZito3VvpZGvrFrl0faoZl6Oq0-iB-3Y&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Thu Jun 4 16:33:18 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Thu, 4 Jun 2020 15:33:18 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Stephen, Looked into client requests, and it doesn't seem to lean heavily on any one NSD server. Of course, this is an eyeball assessment after reviewing IO request percentages to the different NSD servers from just a few nodes. By the way, I later discovered our TSM/NSD server couldn't handle restoring a read-only file and ended-up writing my output file into GBs asking for my response...that seemed to have contributed to some unnecessary high write IO. However, I still can't understand why write IO operations are 5x more latent than ready operations to the same class of disks. Maybe it's time for a GPFS support ticket... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 9:19 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 9 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Client Latency and High NSD Server Load Average (Stephen Ulmer) ---------------------------------------------------------------------- Message: 1 Date: Wed, 3 Jun 2020 22:19:49 -0400 From: Stephen Ulmer To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="utf-8" Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear offline, then nsd04-ib would be next in line to get the load of all 3. The two servers with the problems are in line after the one that is off. This is based on the candy striping of the NSD server order (which I think most of us do). NSD fail-over is ?straight-forward? so to speak - the last I checked, it is really fail-over in the listed order not load balancing among the servers (which is why you stripe them). I do *not* know if individual clients make the decision that the I/O for a disk should go through the ?next? NSD server, or if it is done cluster-wide (in the case of intermittently super-slow I/O). Hopefully someone with source code access will answer that, because now I?m curious... Check what path the clients are using to the NSDs, i.e. which server. See if you are surprised. :) -- Stephen > On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi wrote: > > ? > Frederick, > > Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... > > NSD servers per disks are called in round-robin fashion as well, for example: > > gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib > gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib > gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib > gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib > > Any other potential culprits to investigate? 
> > I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): > [nsd03-ib ~]# mmdiag --waiters > === mmdiag: waiters === > Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion > Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion > Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion > > nsd04-ib: > Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion > Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion > Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion > > tsm01-ib: > Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion > Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion > Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion > Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion > > nsd01-ib: > Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion > Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Wednesday, June 3, 2020 4:56 PM > To: gpfsug-discuss at spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 101, Issue 6 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Introducing SSUG::Digital > (Simon Thompson (Spectrum Scale User Group Chair)) > 2. Client Latency and High NSD Server Load Average > (Saula, Oluwasijibomi) > 3. Re: Client Latency and High NSD Server Load Average > (Frederick Stock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 03 Jun 2020 20:11:17 +0100 > From: "Simon Thompson (Spectrum Scale User Group Chair)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Introducing SSUG::Digital > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi All., > > > > I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. > > > > I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. 
That?s: > San Francisco, USA at 08:00 PDT > New York, USA at 11:00 EDT > London, United Kingdom at 16:00 BST > Frankfurt, Germany at 17:00 CEST > Pune, India at 20:30 IST > We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! > > > > (I know the times don?t work for everyone in the global community!) > > > > Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. > > More details at: > > https://www.spectrumscaleug.org/introducing-ssugdigital/ > > > > (We?ll announce the speakers and topic of the first session in the next few days ?) > > > > Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. > > > > We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. > > > > Simon Thompson > > SSUG Group Chair > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Wed, 3 Jun 2020 21:45:05 +0000 > From: "Saula, Oluwasijibomi" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > Thanks, > > > Oluwasijibomi (Siji) Saula > > HPC Systems Administrator / Information Technology > > > > Research 2 Building 220B / Fargo ND 58108-6050 > > p: 701.231.7749 / www.ndsu.edu > > > > [cid:image001.gif at 01D57DE0.91C300C0] > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 3 > Date: Wed, 3 Jun 2020 21:56:04 +0000 > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 101, Issue 6 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 9 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valdis.kletnieks at vt.edu Fri Jun 5 02:17:08 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Thu, 04 Jun 2020 21:17:08 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <309214.1591319828@turing-police> On Thu, 04 Jun 2020 15:33:18 -0000, "Saula, Oluwasijibomi" said: > However, I still can't understand why write IO operations are 5x more latent > than ready operations to the same class of disks. Two things that may be biting you: First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical reads (data and parity block). To do a write, you have to read the old parity block, compute the new value, and write the data block and new parity block. This is often called the "RAID write penalty". Second, if a read size is smaller than the physical block size, the storage array can read a block, and return only the fragment needed. But on a write, it has to read the whole block, splice in the new data, and write back the block - a RMW (read modify write) cycle. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From giovanni.bracco at enea.it Fri Jun 5 12:21:55 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 5 Jun 2020 13:21:55 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Message-ID: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> In our lab we have received two storage-servers, Super micro SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID controller (2 GB cache) and before putting them in production for other purposes we have setup a small GPFS test cluster to verify if they can be used as storage (our gpfs production cluster has the licenses based on the NSD sockets, so it would be interesting to expand the storage size just by adding storage-servers in a infiniband based SAN, without changing the number of NSD servers) The test cluster consists of: 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. 2) a Mellanox FDR switch used as a SAN switch 3) a Truescale QDR switch as GPFS cluster switch 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. All the nodes run CentOS 7.7. On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been configured and it is exported via infiniband as an iSCSI target so that both appear as devices accessed by the srp_daemon on the NSD servers, where multipath (not really necessary in this case) has been configured for these two LIO-ORG devices. GPFS version 5.0.4-0 has been installed and the RDMA has been properly configured Two NSD disk have been created and a GPFS file system has been configured. Very simple tests have been performed using lmdd serial write/read. 
1) storage-server local performance: before configuring the RAID6 volume as NSD disk, a local xfs file system was created and lmdd write/read performance for 100 GB file was verified to be about 1 GB/s 2) once the GPFS cluster has been created write/read test have been performed directly from one of the NSD server at a time: write performance 2 GB/s, read performance 1 GB/s for 100 GB file By checking with iostat, it was observed that the I/O in this case involved only the NSD server where the test was performed, so when writing, the double of base performances was obtained, while in reading the same performance as on a local file system, this seems correct. Values are stable when the test is repeated. 3) when the same test is performed from the GPFS clients the lmdd result for a 100 GB file are: write - 900 MB/s and stable, not too bad but half of what is seen from the NSD servers. read - 30 MB/s to 300 MB/s: very low and unstable values No tuning of any kind in all the configuration of the involved system, only default values. Any suggestion to explain the very bad read performance from a GPFS client? Giovanni here are the configuration of the virtual drive on the storage-server and the file system configuration in GPFS Virtual drive ============== Virtual Drive: 2 (Target Id: 2) Name : RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 Size : 81.856 TB Sector Size : 512 Is VD emulated : Yes Parity Size : 18.190 TB State : Optimal Strip Size : 256 KB Number Of Drives : 11 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disabled GPFS file system from mmlsfs ============================ mmlsfs vsd_gexp2 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j cluster Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 22.00 (5.0.4.0) File system version --create-time Fri Apr 3 19:26:27 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 134217728 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d nsdfs4lun2;nsdfs5lun2 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gexp2 Default mount point --mount-priority 0 Mount priority -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco ================================================== Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate e la casella di posta elettronica da cui e' stata inviata e' da qualificarsi quale strumento aziendale. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione. Grazie. This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only. Dissemination, copying, printing or use by anybody else is unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE 2016/679). If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail. Thanks. ================================================== From janfrode at tanso.net Fri Jun 5 13:58:39 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 5 Jun 2020 14:58:39 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: Could maybe be interesting to drop the NSD servers, and let all nodes access the storage via srp ? Maybe turn off readahead, since it can cause performance degradation when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead always reads too much. This might be the cause of the slow read seen ? maybe you?ll also overflow it if reading from both NSD-servers at the same time? Plus.. it?s always nice to give a bit more pagepool to hhe clients than the default.. I would prefer to start with 4 GB. -jf fre. 5. jun. 2020 kl. 14:22 skrev Giovanni Bracco : > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. > > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. 
> > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained, while in reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name : > RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size : 81.856 TB > Sector Size : 512 > Is VD emulated : Yes > Parity Size : 18.190 TB > State : Optimal > Strip Size : 256 KB > Number Of Drives : 11 > Span Depth : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy : Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment > (subblock) size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j cluster Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 1048576 Block size > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 22.00 (5.0.4.0) File system version > --create-time Fri Apr 3 19:26:27 2020 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 134217728 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? 
> --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per > full block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d nsdfs4lun2;nsdfs5lun2 Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gexp2 Default mount point > --mount-priority 0 Mount priority > > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle > persone indicate e la casella di posta elettronica da cui e' stata inviata > e' da qualificarsi quale strumento aziendale. > La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza > di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. > 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete cortesemente > pregati di darne immediata comunicazione al mittente e di provvedere alla > sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain privileged > information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is unauthorised > (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE > 2016/679). > If you are not the intended recipient, please delete this message and any > attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Fri Jun 5 14:53:23 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 5 Jun 2020 15:53:23 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> answer in the text On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > Could maybe be interesting to drop the NSD servers, and let all nodes > access the storage via srp ? no we can not: the production clusters fabric is a mix of a QDR based cluster and a OPA based cluster and NSD nodes provide the service to both. > > Maybe turn off readahead, since it can cause performance degradation > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead > always reads too much. This might be the cause of the slow read seen ? > maybe you?ll also overflow it if reading from both NSD-servers at the > same time? I have switched the readahead off and this produced a small (~10%) increase of performances when reading from a NSD server, but no change in the bad behaviour for the GPFS clients > > > Plus.. it?s always nice to give a bit more pagepool to hhe clients than > the default.. I would prefer to start with 4 GB. we'll do also that and we'll let you know! Giovanni > > > > ? -jf > > fre. 5. jun. 2020 kl. 
14:22 skrev Giovanni Bracco > >: > > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale > each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. > > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. > > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been > configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 > volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained,? while in > reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd > result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad? read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name? ? ? ? ? ? ? ? : > RAID Level? ? ? ? ? : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size? ? ? ? ? ? ? ? : 81.856 TB > Sector Size? ? ? ? ?: 512 > Is VD emulated? ? ? : Yes > Parity Size? ? ? ? ?: 18.190 TB > State? ? ? ? ? ? ? ?: Optimal > Strip Size? ? ? ? ? : 256 KB > Number Of Drives? ? : 11 > Span Depth? ? ? ? ? : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy? ?: Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag? ? ? ? ? ? ? ? value? ? 
? ? ? ? ? ? ? ? description > ------------------- ------------------------ > ----------------------------------- > ? -f? ? ? ? ? ? ? ? ?8192? ? ? ? ? ? ? ? ? ? ?Minimum fragment > (subblock) size in bytes > ? -i? ? ? ? ? ? ? ? ?4096? ? ? ? ? ? ? ? ? ? ?Inode size in bytes > ? -I? ? ? ? ? ? ? ? ?32768? ? ? ? ? ? ? ? ? ? Indirect block size > in bytes > ? -m? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? Default number of > metadata > replicas > ? -M? ? ? ? ? ? ? ? ?2? ? ? ? ? ? ? ? ? ? ? ? Maximum number of > metadata > replicas > ? -r? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? Default number of data > replicas > ? -R? ? ? ? ? ? ? ? ?2? ? ? ? ? ? ? ? ? ? ? ? Maximum number of data > replicas > ? -j? ? ? ? ? ? ? ? ?cluster? ? ? ? ? ? ? ? ? Block allocation type > ? -D? ? ? ? ? ? ? ? ?nfs4? ? ? ? ? ? ? ? ? ? ?File locking > semantics in > effect > ? -k? ? ? ? ? ? ? ? ?all? ? ? ? ? ? ? ? ? ? ? ACL semantics in effect > ? -n? ? ? ? ? ? ? ? ?512? ? ? ? ? ? ? ? ? ? ? Estimated number of > nodes > that will mount file system > ? -B? ? ? ? ? ? ? ? ?1048576? ? ? ? ? ? ? ? ? Block size > ? -Q? ? ? ? ? ? ? ? ?user;group;fileset? ? ? ?Quotas accounting enabled > ? ? ? ? ? ? ? ? ? ? ?user;group;fileset? ? ? ?Quotas enforced > ? ? ? ? ? ? ? ? ? ? ?none? ? ? ? ? ? ? ? ? ? ?Default quotas enabled > ? --perfileset-quota No? ? ? ? ? ? ? ? ? ? ? ?Per-fileset quota > enforcement > ? --filesetdf? ? ? ? No? ? ? ? ? ? ? ? ? ? ? ?Fileset df enabled? > ? -V? ? ? ? ? ? ? ? ?22.00 (5.0.4.0)? ? ? ? ? File system version > ? --create-time? ? ? Fri Apr? 3 19:26:27 2020 File system creation time > ? -z? ? ? ? ? ? ? ? ?No? ? ? ? ? ? ? ? ? ? ? ?Is DMAPI enabled? > ? -L? ? ? ? ? ? ? ? ?33554432? ? ? ? ? ? ? ? ?Logfile size > ? -E? ? ? ? ? ? ? ? ?Yes? ? ? ? ? ? ? ? ? ? ? Exact mtime mount option > ? -S? ? ? ? ? ? ? ? ?relatime? ? ? ? ? ? ? ? ?Suppress atime mount > option > ? -K? ? ? ? ? ? ? ? ?whenpossible? ? ? ? ? ? ?Strict replica > allocation > option > ? --fastea? ? ? ? ? ?Yes? ? ? ? ? ? ? ? ? ? ? Fast external attributes > enabled? > ? --encryption? ? ? ?No? ? ? ? ? ? ? ? ? ? ? ?Encryption enabled? > ? --inode-limit? ? ? 134217728? ? ? ? ? ? ? ? Maximum number of inodes > ? --log-replicas? ? ?0? ? ? ? ? ? ? ? ? ? ? ? Number of log replicas > ? --is4KAligned? ? ? Yes? ? ? ? ? ? ? ? ? ? ? is4KAligned? > ? --rapid-repair? ? ?Yes? ? ? ? ? ? ? ? ? ? ? rapidRepair enabled? > ? --write-cache-threshold 0? ? ? ? ? ? ? ? ? ?HAWC Threshold (max > 65536) > ? --subblocks-per-full-block 128? ? ? ? ? ? ? Number of subblocks per > full block > ? -P? ? ? ? ? ? ? ? ?system? ? ? ? ? ? ? ? ? ?Disk storage pools in > file > system > ? --file-audit-log? ?No? ? ? ? ? ? ? ? ? ? ? ?File Audit Logging > enabled? > ? --maintenance-mode No? ? ? ? ? ? ? ? ? ? ? ?Maintenance Mode enabled? > ? -d? ? ? ? ? ? ? ? ?nsdfs4lun2;nsdfs5lun2? ? Disks in file system > ? -A? ? ? ? ? ? ? ? ?yes? ? ? ? ? ? ? ? ? ? ? Automatic mount option > ? -o? ? ? ? ? ? ? ? ?none? ? ? ? ? ? ? ? ? ? ?Additional mount options > ? -T? ? ? ? ? ? ? ? ?/gexp2? ? ? ? ? ? ? ? ? ?Default mount point > ? --mount-priority? ?0? ? ? ? ? ? ? ? ? ? ? ? Mount priority > > > -- > Giovanni Bracco > phone? +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente > alle persone indicate e la casella di posta elettronica da cui e' > stata inviata e' da qualificarsi quale strumento aziendale. 
> La diffusione, copia o qualsiasi altra azione derivante dalla > conoscenza di queste informazioni sono rigorosamente vietate (art. > 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete > cortesemente pregati di darne immediata comunicazione al mittente e > di provvedere alla sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain > privileged information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is > unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent > amendments and GDPR UE 2016/679). > If you are not the intended recipient, please delete this message > and any attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From oluwasijibomi.saula at ndsu.edu Fri Jun 5 15:24:27 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 5 Jun 2020 14:24:27 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Vladis/Kums/Fred/Kevin/Stephen, Thanks so much for your insights, thoughts, and pointers! - Certainly increased my knowledge and understanding of potential culprits to watch for... So we finally discovered the root issue to this problem: An unattended TSM restore exercise profusely writing to a single file, over and over again into the GBs!!..I'm opening up a ticket with TSM support to learn how to mitigate this in the future. But with the RAID 6 writing costs Vladis explained, it now makes sense why the write IO was badly affected... Excerpt from output file: --- User Action is Required --- File '/gpfs1/X/Y/Z/fileABC' is write protected Select an appropriate action 1. Force an overwrite for this object 2. Force an overwrite on all objects that are write protected 3. Skip this object 4. Skip all objects that are write protected A. Abort this operation Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] ... 
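For rough numbers behind that RAID-6 write cost Valdis described: a sub-stripe update is commonly counted as about six disk operations (read the old data, P and Q blocks, then write all three back), versus one for a read. The sketch below is purely illustrative arithmetic, with assumed drive counts and per-drive IOPS rather than measurements from this system:

# Illustrative arithmetic only -- drive count and per-drive IOPS are assumptions.
drives = 10                  # assumed number of spindles in the RAID-6 set
iops_per_drive = 150         # assumed random IOPS for a nearline SAS drive
raid6_write_penalty = 6      # read data+P+Q, write data+P+Q for a sub-stripe update

read_iops = drives * iops_per_drive
small_write_iops = read_iops / raid6_write_penalty
print(f"~{read_iops} random read IOPS vs ~{small_write_iops:.0f} small random write IOPS")

Which is why a runaway process appending tiny records to one file, like the prompt loop above, hurts far more than the equivalent amount of reading would.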
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Friday, June 5, 2020 6:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 12 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Client Latency and High NSD Server Load Average (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) ---------------------------------------------------------------------- Message: 1 Date: Thu, 04 Jun 2020 21:17:08 -0400 From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: <309214.1591319828 at turing-police> Content-Type: text/plain; charset="us-ascii" On Thu, 04 Jun 2020 15:33:18 -0000, "Saula, Oluwasijibomi" said: > However, I still can't understand why write IO operations are 5x more latent > than ready operations to the same class of disks. Two things that may be biting you: First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical reads (data and parity block). To do a write, you have to read the old parity block, compute the new value, and write the data block and new parity block. This is often called the "RAID write penalty". Second, if a read size is smaller than the physical block size, the storage array can read a block, and return only the fragment needed. But on a write, it has to read the whole block, splice in the new data, and write back the block - a RMW (read modify write) cycle. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 12 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Fri Jun 5 18:02:49 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 5 Jun 2020 19:02:49 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco : > answer in the text > > On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > > > Could maybe be interesting to drop the NSD servers, and let all nodes > > access the storage via srp ? 
> > no we can not: the production clusters fabric is a mix of a QDR based > cluster and a OPA based cluster and NSD nodes provide the service to both. > You could potentially still do SRP from QDR nodes, and via NSD for your omnipath nodes. Going via NSD seems like a bit pointless indirection. > > > > Maybe turn off readahead, since it can cause performance degradation > > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead > > always reads too much. This might be the cause of the slow read seen ? > > maybe you?ll also overflow it if reading from both NSD-servers at the > > same time? > > I have switched the readahead off and this produced a small (~10%) > increase of performances when reading from a NSD server, but no change > in the bad behaviour for the GPFS clients > > > > > > Plus.. it?s always nice to give a bit more pagepool to hhe clients than > > the default.. I would prefer to start with 4 GB. > > we'll do also that and we'll let you know! Could you show your mmlsconfig? Likely you should set maxMBpS to indicate what kind of throughput a client can do (affects GPFS readahead/writebehind). Would typically also increase workerThreads on your NSD servers. 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. When you write one GPFS block, less than a half RAID stripe is written, which means you need to read back some data to calculate new parities. I would prefer 4 MB block size, and maybe also change to 8+p+q so that one GPFS is a multiple of a full 2 MB stripe. -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Jun 6 06:38:31 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sat, 06 Jun 2020 01:38:31 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <403018.1591421911@turing-police> On Fri, 05 Jun 2020 14:24:27 -0000, "Saula, Oluwasijibomi" said: > But with the RAID 6 writing costs Vladis explained, it now makes sense why the write IO was badly affected... > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] And a read-modify-write on each one.. Ouch. Stuff like that is why making sure program output goes to /var or other local file system is usually a good thing. I seem to remember us getting bit by a similar misbehavior in TSM, but I don't know the details because I was busier with GPFS and LTFS/EE than TSM. Though I have to wonder how TSM could be a decades-old product and still have misbehaviors in basic things like failed reads on input prompts... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Sat Jun 6 07:57:06 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 6 Jun 2020 06:57:06 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: <403018.1591421911@turing-police> References: <403018.1591421911@turing-police>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: attf6izb.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From valleru at cbio.mskcc.org Mon Jun 8 18:44:07 2020 From: valleru at cbio.mskcc.org (Lohit Valleru) Date: Mon, 8 Jun 2020 12:44:07 -0500 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files Message-ID: Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Tue Jun 9 01:56:45 2020 From: jjdoherty at yahoo.com (Jim Doherty) Date: Tue, 9 Jun 2020 00:56:45 +0000 (UTC) Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <41260341.922388.1591664205443@mail.yahoo.com> You will need to do this with chown from the? c library functions? (could do this from perl or python).?? If you try to change this from a shell script? you will hit the Linux command? which will have a lot more overhead.???? I had a customer attempt this using the shell and it ended up taking forever due to a brain damaged NIS service :-). ?? Jim? On Monday, June 8, 2020, 2:01:39 PM EDT, Lohit Valleru wrote: #yiv6988452566 body{font-family:Helvetica, Arial;font-size:13px;}Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner?We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards,Lohit_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Jun 9 03:52:16 2020 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 9 Jun 2020 02:52:16 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Change_uidNumber_and_gidNumber_for_bil?= =?utf-8?q?lions=09of=09files?= In-Reply-To: <41260341.922388.1591664205443@mail.yahoo.com> References: <41260341.922388.1591664205443@mail.yahoo.com>, Message-ID: An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jun 9 07:53:00 2020 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 9 Jun 2020 07:53:00 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <41260341.922388.1591664205443@mail.yahoo.com> Message-ID: <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> Hi Lohit (hey Jim & Christof), ? 
Whilst you _could_ trawl your entire filesystem, flip uids and work out how to successfully replace ACL ids without actually pushing ACLs (which could break defined inheritance options somewhere in your file tree if you had not first audited your filesystem) the systems head in me says: "We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve." Here's the problem: to what SSSD or Centrify would resolve I've done this a few times in the past in a previous life.? In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. E.G. if user foo is id 1234, ensure user foo is 1234 in AD when you move your LDAP world over. Windows ldifde utility can import an ldif from openldap to take the config across. Automation or inline munging can be achieved with powershell or python. I presume there is a large technical blocker which is why you are looking at remapping the filesystem? Best, Jez On 09/06/2020 03:52, Christof Schmitt wrote: > If there are ACLs, then you also need to update all ACLs? > (gpfs_getacl(), update uids and gids in all entries, gpfs_putacl()), > in addition to the chown() call. > ? > Regards, > ? > Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ > christof.schmitt at us.ibm.com? ||? +1-520-799-2469??? (T/L: 321-2469) > ? > ? > > ----- Original message ----- > From: Jim Doherty > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Change uidNumber and > gidNumber for billions of files > Date: Mon, Jun 8, 2020 5:57 PM > ? > ? > You will need to do this with chown from the? c library functions? > (could do this from perl or python).?? If you try to change this > from a shell script? you will hit the Linux command? which will > have a lot more overhead.???? I had a customer attempt this using > the shell and it ended up taking forever due to a brain damaged > NIS service :-). ?? > ? > Jim? > ? > On Monday, June 8, 2020, 2:01:39 PM EDT, Lohit Valleru > wrote: > ? > ? > Hello Everyone, > ? > We are planning to migrate from LDAP to AD, and one of the best > solution was to change the uidNumber and gidNumber to what SSSD or > Centrify would resolve. > ? > May I know, if anyone has come across a tool/tools that can change > the uidNumbers and gidNumbers of billions of files efficiently and > in a reliable manner? > We could spend some time to write a custom script, but wanted to > know if a tool already exists. > ? > Please do let me know, if any one else has come across a similar > situation, and the steps/tools used to resolve the same. > ? > Regards, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* VP Research and Development | Pixit Media e: jtucker at pixitmedia.com Visit www.pixitmedia.com -- ? This email is confidential in that it is? intended for the exclusive attention of?the addressee(s) indicated. 
If you are?not the intended recipient, this email?should not be read or disclosed to?any other person. Please notify the?sender immediately and delete this? email from your computer system.?Any opinions expressed are not?necessarily those of the company?from which this email was sent and,?whilst to the best of our knowledge no?viruses or defects exist, no?responsibility can be accepted for any?loss or damage arising from its?receipt or subsequent use of this?email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jun 9 09:51:03 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Jun 2020 08:51:03 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> References: <41260341.922388.1591664205443@mail.yahoo.com> <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> Message-ID: > I presume there is a large technical blocker which is why you are looking at remapping the filesystem? Like anytime there is a corporate AD with mandated attributes? ? Though isn?t there an AD thing now for doing schema view type things now which allow you to inherit certain attributes and overwrite others? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Jun 9 10:03:44 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 09 Jun 2020 10:03:44 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: First talk: https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talk-what-is-new-in-spectrum-scale-5-0-5/ What is new in Spectrum Scale 5.0.5? 18th June 2020. No registration required, just click the Webex link in the page above. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 3 June 2020 at 20:11 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Jun 9 12:20:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 12:20:45 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> On 08/06/2020 18:44, Lohit Valleru wrote: > Hello Everyone, > > We are planning to migrate from LDAP to AD, and one of the best solution > was to change the uidNumber and gidNumber to what SSSD or Centrify would > resolve. > > May I know, if anyone has come across a tool/tools that can change the > uidNumbers and gidNumbers of billions of files efficiently and in a > reliable manner? Not to my knowledge. > We could spend some time to write a custom script, but wanted to know if > a tool already exists. > If you can be sure that all files under a specific directory belong to a specific user and you have no ACL's then a whole bunch of "chown -R" would be reasonable. That is you have a lot of user home directories for example. What I do in these scenarios is use a small sqlite database, say in this scenario which has the directory that I want to chown on, the target UID and GID and a status field. Initially I set the status field to -1 which indicates they have not been processed. The script sets the status field to -2 when it starts processing an entry and on completion sets the status field to the exit code of the command you are running. This way when the script is finished you can see any directory hierarchies that had a problem and if it dies early you can see where it got up to (that -2). You can also do things like set all none zero status codes back to -1 and run again with a simple SQL update on the database from the sqlite CLI. If you don't need to modify ACL's but have mixed ownership under directory hierarchies then a script is reasonable but not a shell script. The overhead of execing chown billions of times on individual files will be astronomical. You need something like Perl or Python and make use of the builtin chown facilities of the language to avoid all those exec's. That said I suspect you will see a significant speed up from using C. If you have ACL's to contend with then I would definitely spend some time and write some C code using the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmpgetacl and mmputacl in any script is horrendous and you will have billions of exec's of each command. As I understand it GPFS stores each ACL once and each file then points to the ACL. Theoretically it would be possible to just modify the stored ACL's for a very speedy update of all the ACL's on the files/directories. However I would imagine you need to engage IBM and bend over while they empty your wallet for that option :-) The biggest issue to take care of IMHO is do any of the input UID/GID numbers exist in the output set??? If so life just got a lot harder as you don't get a second chance to run the script/program if there is a problem. In this case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch and update your LDAP in to match. Finally be aware that if you are using TSM for backup you will probably need to back every file up again after the change of ownership as far as I am aware. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From ulmer at ulmer.org Tue Jun 9 14:07:32 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 9 Jun 2020 09:07:32 -0400 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> Message-ID: <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> Jonathan brings up a good point that you?ll only get one shot at this ? if you?re using the file system as your record of who owns what. You might want to use the policy engine to record the existing file names and ownership (and then provide updates using the same policy engine for the things that changed after the last time you ran it). At that point, you?ve got the list of who should own what from before you started. You could even do some things to see how complex your problem is, like "how many directories have files owned by more than one UID?? With respect to that, it is surprising how easy the sqlite C API is to use (though I would still recommend Perl or Python), and equally surprising how *bad* the JOIN performance is. If you go with sqlite, denormalize *everything* as it?s collected. If that is too dirty for you, then just use MariaDB or something else. -- Stephen > On Jun 9, 2020, at 7:20 AM, Jonathan Buzzard wrote: > > On 08/06/2020 18:44, Lohit Valleru wrote: >> Hello Everyone, >> We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. >> May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? > > Not to my knowledge. > >> We could spend some time to write a custom script, but wanted to know if a tool already exists. > > If you can be sure that all files under a specific directory belong to a specific user and you have no ACL's then a whole bunch of "chown -R" would be reasonable. That is you have a lot of user home directories for example. > > What I do in these scenarios is use a small sqlite database, say in this scenario which has the directory that I want to chown on, the target UID and GID and a status field. Initially I set the status field to -1 which indicates they have not been processed. The script sets the status field to -2 when it starts processing an entry and on completion sets the status field to the exit code of the command you are running. This way when the script is finished you can see any directory hierarchies that had a problem and if it dies early you can see where it got up to (that -2). > > You can also do things like set all none zero status codes back to -1 and run again with a simple SQL update on the database from the sqlite CLI. > > If you don't need to modify ACL's but have mixed ownership under directory hierarchies then a script is reasonable but not a shell script. The overhead of execing chown billions of times on individual files will be astronomical. You need something like Perl or Python and make use of the builtin chown facilities of the language to avoid all those exec's. That said I suspect you will see a significant speed up from using C. > > If you have ACL's to contend with then I would definitely spend some time and write some C code using the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmpgetacl and mmputacl in any script is horrendous and you will have billions of exec's of each command. 
> > As I understand it GPFS stores each ACL once and each file then points to the ACL. Theoretically it would be possible to just modify the stored ACL's for a very speedy update of all the ACL's on the files/directories. However I would imagine you need to engage IBM and bend over while they empty your wallet for that option :-) > > The biggest issue to take care of IMHO is do any of the input UID/GID numbers exist in the output set??? If so life just got a lot harder as you don't get a second chance to run the script/program if there is a problem. > > In this case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch and update your LDAP in to match. > > Finally be aware that if you are using TSM for backup you will probably need to back every file up again after the change of ownership as far as I am aware. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jun 9 14:57:08 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 14:57:08 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> Message-ID: <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> On 09/06/2020 14:07, Stephen Ulmer wrote: > Jonathan brings up a good point that you?ll only get one shot at this ? > if you?re using the file system as your record of who owns what. Not strictly true if my existing UID's are in the range 10000-19999 and my target UID's are in the range 50000-99999 for example then I get an infinite number of shots at it. It is only if the target and source ranges have any overlap that there is a problem and that should be easy to work out in advance. If it where me and there was overlap between input and output states I would go via an intermediate state where there is no overlap. Linux has had 32bit UID's since a very long time now (we are talking kernel versions <1.0 from memory) so none overlapping mappings are perfectly possible to arrange. > With respect to that, it is surprising how easy the sqlite C API is to > use (though I would still recommend Perl or Python), and equally > surprising how *bad* the JOIN performance is. If you go with sqlite, > denormalize *everything* as it?s collected. If that is too dirty for > you, then just use MariaDB or something else. I actually thinking on it more thought a generic C random UID/GID to UID/GID mapping program is a really simple piece of code and should be nearly as fast as chown -R. It will be very slightly slower as you have to look the mapping up for each file. Read the mappings in from a CSV file into memory and just use nftw/lchown calls to walk the file system and change the UID/GID as necessary. If you are willing to sacrifice some error checking on the input mapping file (not unreasonable to assume it is good) and have some hard coded site settings (avoiding processing command line arguments) then 200 lines of C tops should do it. 
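For illustration, a minimal sketch of the kind of nftw()/lchown() walker described above. The "old,new" CSV layout, the fixed-size identity-default maps and every name in it are assumptions made for the sketch, not details of the 200-line program being discussed:

    /* Sketch only: not the program from the thread.  The "old,new" CSV
     * layout, the fixed-size maps and all names are assumptions. */
    #define _XOPEN_SOURCE 700
    #include <ftw.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define MAXID 65536                    /* assumed upper bound on old IDs */

    static unsigned int uid_map[MAXID];
    static unsigned int gid_map[MAXID];

    /* Read "old,new" pairs; anything not listed maps to itself. */
    static void load_map(const char *file, unsigned int *map)
    {
        unsigned int from, to;
        FILE *fp = fopen(file, "r");

        if (fp == NULL) { perror(file); exit(1); }
        for (from = 0; from < MAXID; from++)
            map[from] = from;
        while (fscanf(fp, "%u,%u", &from, &to) == 2)
            if (from < MAXID)
                map[from] = to;
        fclose(fp);
    }

    static int visit(const char *path, const struct stat *st, int type,
                     struct FTW *ftwbuf)
    {
        unsigned int new_uid = st->st_uid < MAXID ? uid_map[st->st_uid] : st->st_uid;
        unsigned int new_gid = st->st_gid < MAXID ? gid_map[st->st_gid] : st->st_gid;

        (void)type; (void)ftwbuf;

        if (new_uid == st->st_uid && new_gid == st->st_gid)
            return 0;                      /* nothing to change */

        /* lchown() changes a symlink itself, never what it points at */
        if (lchown(path, (uid_t)new_uid, (gid_t)new_gid) != 0)
            fprintf(stderr, "lchown failed: %s\n", path);

        return 0;                          /* log the failure and keep walking */
    }

    int main(int argc, char **argv)
    {
        if (argc != 4) {
            fprintf(stderr, "usage: %s uid.csv gid.csv /gpfs/fileset\n", argv[0]);
            return 1;
        }
        load_map(argv[1], uid_map);
        load_map(argv[2], gid_map);
        /* FTW_PHYS: report symlinks themselves rather than following them */
        return nftw(argv[3], visit, 64, FTW_PHYS | FTW_MOUNT) ? 2 : 0;
    }

FTW_PHYS stops nftw() from following symlinks, so together with lchown() a link is always changed itself rather than its target.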
Depending on how big your input UID/GID ranges are you could even use array indexing for the mapping. For example on our system the UID's start at just over 5000 and end just below 6000 with quite a lot of holes. Just allocate an array of 6000 int's which is only ~24KB and off you go with something like new_uid = uid_mapping[uid]; Nice super speedy lookup of mappings. If you need to manipulate ACL's then C is the only way to go anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Tue Jun 9 23:40:33 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 23:40:33 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> Message-ID: <21a0686f-f080-e81c-0e3e-6974116ba141@strath.ac.uk> On 09/06/2020 14:57, Jonathan Buzzard wrote: [SNIP] > > I actually thinking on it more thought a generic C random UID/GID to > UID/GID mapping program is a really simple piece of code and should be > nearly as fast as chown -R. It will be very slightly slower as you have > to look the mapping up for each file. Read the mappings in from a CSV > file into memory and just use nftw/lchown calls to walk the file system > and change the UID/GID as necessary. > Because I was curious I thought I would have a go this evening coding something up in C. It's standing at 213 lines of code put there is some extra fluff and some error checking and a large copyright comment. Updating ACL's would increase the size too. It would however be relatively simple I think. The public GPFS API documentation on ACL's is incomplete so some guess work and testing would be required. It's stupidly fast on my laptop changing the ownership of the latest version of gcc untarred. However there is only one user in the map file and it's an SSD. Obviously if you have billions of files it is going to take longer :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.knister at gmail.com Wed Jun 10 02:15:55 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Tue, 9 Jun 2020 21:15:55 -0400 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: Lohit, I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here. I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :) >From what I recall, I used uthash and the gpfs API's to store in-memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs API's and could lookup the given inode in the in-memory hash to view its ownership details. Both the inode traversal and directory walk were parallelized/threaded. They way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. 
All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My work around was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right) and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flags to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speed up. I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions. One of the gotchas to be aware of, is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough. The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. That code is here ( https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary. If you're using NFS4 ACLs... I don't have an easy answer for you :) We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway. -Aaron On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru wrote: > Hello Everyone, > > We are planning to migrate from LDAP to AD, and one of the best solution > was to change the uidNumber and gidNumber to what SSSD or Centrify would > resolve. > > May I know, if anyone has come across a tool/tools that can change the > uidNumbers and gidNumbers of billions of files efficiently and in a > reliable manner? > We could spend some time to write a custom script, but wanted to know if a > tool already exists. > > Please do let me know, if any one else has come across a similar > situation, and the steps/tools used to resolve the same. 
> > Regards, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Wed Jun 10 08:25:08 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 10 Jun 2020 08:25:08 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital In-Reply-To: References: Message-ID: So someone pointed out we?re using webex events for this, in theory there is a ?join in browser? option, if you don?t have the webex client already installed. However that also doesn?t appear to work in Chrome/Ubuntu20.04 ? so you might want to check your browser/plugin works *before* next week. You can use https://www.webex.com/test-meeting.html to do a test of Webex. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 June 2020 at 10:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Introducing SSUG::Digital First talk: https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talk-what-is-new-in-spectrum-scale-5-0-5/ What is new in Spectrum Scale 5.0.5? 18th June 2020. No registration required, just click the Webex link in the page above. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 3 June 2020 at 20:11 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jun 10 08:33:03 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Jun 2020 07:33:03 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: Quota ? I thought there was a work around for this. I think it went along the lines of. Set the soft quota to what you want. Set the hard quota 150% more. Set the grace period to 1 second. 
I think the issue is that when you are over soft quota, each operation has to queisce each time until you hit hard/grace period. Whereas once you hit grace, it no longer does this. I was just looking for the slide deck about this, but can?t find it at the moment! Tomer spoke about it at one point. Simon From: on behalf of "aaron.knister at gmail.com" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 10 June 2020 at 02:16 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files Lohit, I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here. I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :) From what I recall, I used uthash and the gpfs API's to store in-memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs API's and could lookup the given inode in the in-memory hash to view its ownership details. Both the inode traversal and directory walk were parallelized/threaded. They way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My work around was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right) and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flags to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speed up. I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions. One of the gotchas to be aware of, is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough. The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. 
That code is here ( https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary. If you're using NFS4 ACLs... I don't have an easy answer for you :) We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway. -Aaron On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru > wrote: Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 10 12:33:09 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 12:33:09 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> On 10/06/2020 02:15, Aaron Knister wrote: > Lohit, > > I did this while working @ NASA. I had two tools I used, one > affectionately known as "luke file walker" (to modify traditional unix > permissions) and the other known as the "milleniumfacl" (to modify posix > ACLs). Stupid jokes aside, there were some real technical challenges here. > > I don't know if anyone from the NCCS team at NASA is on the list, but if > they are perhaps they'll jump in if they're willing to share the code :) > > From what I recall, I used uthash and the gpfs API's to store in-memory > a hash of inodes and their uid/gid information. I then walked the > filesystem using the gpfs API's and could lookup the given inode in the > in-memory hash to view its ownership details. Both the inode traversal > and directory walk were parallelized/threaded. They way I actually > executed the chown was particularly security-minded. There is a race > condition that exists if you chown /path/to/file. All it takes is either > a malicious user or someone monkeying around with the filesystem while > it's live to accidentally chown the wrong file if a symbolic link ends > up in the file path. Well I would expect this needs to be done with no user access to the system. Or at the very least no user access for the bits you are currently modifying. Otherwise you are going to end up in a complete mess. > My work around was to use openat() and fchmod (I > think that was it, I played with this quite a bit to get it right) and > for every path to be chown'd I would walk the hierarchy, opening each > component with the O_NOFOLLOW flags to be sure I didn't accidentally > stumble across a symlink in the way. 
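For illustration, a rough sketch of the component-by-component O_NOFOLLOW walk described in the quoted paragraph above; the use of fchownat() for the final component, the helper names and the example path are assumptions, not the actual NASA tool:

    /* Sketch only: open each path component with O_NOFOLLOW so a symlink
     * planted anywhere in the path makes the walk fail instead of being
     * silently followed.  Handles absolute paths only. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Returns a descriptor for the directory holding the last component,
     * or -1 if any intermediate component is a symlink or fails to open. */
    static int open_parent_nofollow(char *path, char **leaf)
    {
        int fd = open("/", O_RDONLY | O_DIRECTORY);
        char *comp, *next, *save = NULL;

        comp = strtok_r(path, "/", &save);
        while (fd >= 0 && comp != NULL) {
            next = strtok_r(NULL, "/", &save);
            if (next == NULL) {                   /* comp is the leaf */
                *leaf = comp;
                return fd;
            }
            /* O_NOFOLLOW: fail rather than follow a symlinked component */
            int sub = openat(fd, comp, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            close(fd);
            fd = sub;
            comp = next;
        }
        if (fd >= 0)
            close(fd);
        return -1;
    }

    static int chown_nofollow(char *path, uid_t uid, gid_t gid)
    {
        char *leaf = NULL;
        int rc = -1;
        int dirfd = open_parent_nofollow(path, &leaf);

        if (dirfd >= 0) {
            /* AT_SYMLINK_NOFOLLOW: a symlink leaf has the link itself changed */
            rc = fchownat(dirfd, leaf, uid, gid, AT_SYMLINK_NOFOLLOW);
            close(dirfd);
        }
        return rc;
    }

    int main(void)
    {
        char path[] = "/gpfs/fs0/lab/somefile";   /* illustrative path only */
        if (chown_nofollow(path, 50001, 50001) != 0)
            perror("chown_nofollow");
        return 0;
    }

In a real run the parent directory descriptors would be cached rather than re-walked for every file, as described in the quoted text.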
Or you could just use lchown so you change the ownership of the symbolic link rather than the file it is pointing to. You need to change the ownership of the symbolic link not the file it is linking to, that will be picked up elsewhere in the scan. If you don't change the ownership of the symbolic link you are going to be left with a bunch of links owned by none existent users. No race condition exists if you are doing it properly in the first place :-) I concluded that the standard nftw system call was more suited to this than the GPFS inode scan. I could see no way to turn an inode into a path to the file which lchownn, gpfs_getacl and gpfs_putacl all use. I think the problem with the GPFS inode scan is that is is for a backup application. Consequently there are some features it is lacking for more general purpose programs looking for a quick way to traverse the file system. An other example is that the gpfs_iattr_t structure returned from gpfs_stat_inode does not contain any information as to whether the file is a symbolic link like a standard stat call does. > I also implemented caching of open > path component file descriptors since odds are I would be > chowning/chgrp'ing files in the same directory. That bought me some > speed up. > More reasons to use nftw for now, no need to open any files :-) > I opened up RFE's at one point, I believe, for gpfs API calls to do this > type of operation. I would ideally have liked a mechanism to do this > based on inode number rather than path which would help avoid issues of > race conditions. > Well lchown to the rescue, but that does require a path to the file. The biggest problem is the inability to get a path given an inode using the GPFS inode scan which is why I steered away from it. In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the UID/GID of the file, but they are returned in an opaque format, so it's not possible :-( > One of the gotchas to be aware of, is quotas. My wrapper script would > clone quotas from the old uid to the new uid. That's easy enough. > However, keep in mind, if the uid is over their quota your chown > operation will absolutely kill your cluster. Once a user is over their > quota the filesystem seems to want to quiesce all of its accounting > information on every filesystem operation for that user. I would check > for adequate quota headroom for the user in question and abort if there > wasn't enough. Had not thought of that one. Surely the simple solution would be to set the quota's on the mapped UID/GID's after the change has been made. Then the filesystem operation would not be for the user over quota but for the new user? The other alternative is to dump the quotas to file and remove them. Change the UID's and GID's then restore the quotas on the new UID/GID's. As I said earlier surely the end users have no access to the file system while the modifications are being made. If they do all hell is going to break loose IMHO. > > The ACL changes were much more tricky. There's no way, of which I'm > aware, to atomically update ACL entries. You run the risk that you could > clobber a user's ACL update if it occurs in the milliseconds between you > reading the ACL and updating it as part of the UID/GID update. > Thankfully we were using Posix ACLs which were easier for me to deal > with programmatically. I still had the security concern over symbolic > links appearing in paths to have their ACLs updated either maliciously > or organically. 
I was able to deal with that by modifying libacl to > implement ACL calls that used variants of xattr calls that took file > descriptors as arguments and allowed me to throw nofollow flags. That > code is here ( > https://github.com/aaronknister/acl/commits/nofollow > ). > I couldn't take advantage of the GPFS API's here to meet my > requirements, so I just walked the filesystem tree in parallel if I > recall correctly, retrieved every ACL and updated if necessary. > > If you're using NFS4 ACLs... I don't have an easy answer for you :) You call gpfs_getacl, walk the array of ACL's returned changing any UID/GID's as required and then call gpfs_putacl. You can modify both Posix and NFSv4 ACL's with this call. Given they only take a path to the file another reason to use nftw rather than GPFS inode scan. As I understand even if your file system is set to an ACL type of "all", any individual file/directory can only have either Posix *or* NSFv4 ACLS (ignoring the fact you can set your filesystem ACL's type to the undocumented Samba), so can all be handled automatically. Note if you are using nftw to walk the file system then you get a standard system stat structure for every file/directory and you could just skip symbolic links. I don't think you can set an ACL on a symbolic link anyway. You certainly can't set standard permissions on them. It would be sensible to wrap the main loop in gpfs_lib_init/gpfs_lib_term in this scenario. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Wed Jun 10 12:58:07 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 12:58:07 +0100 Subject: [gpfsug-discuss] Infiniband/Ethernet gateway Message-ID: <7e656f24-e00b-a95f-2f6e-a8223310e708@strath.ac.uk> We have a mixture of 10Gb Ethernet and Infiniband connected (using IPoIB) nodes on our compute cluster using a DSS-G for storage. Each SR650 has a bonded pair of 40Gb Ethernet connections and a 40Gb Infiniband connection. Performance and stability are *way* better than the old Lustre system. Now for this to work the Ethernet connected nodes have to be able to talk to the Infiniband connected ones so I have a server acting as a gateway. This has been running fine for a couple of years now. However it occurs to me now that instead of having a dedicated server performing these duties it would make more sense to use the SR650's of the DSS-G. It would be one less thing for me to look after :-) Can anyone think of a reason not to do this? It also occurs to me that one could do some sort of VRRP style failover to remove the single point of failure that is currently the gateway machine. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valleru at cbio.mskcc.org Wed Jun 10 16:31:20 2020 From: valleru at cbio.mskcc.org (Lohit Valleru) Date: Wed, 10 Jun 2020 10:31:20 -0500 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: Thank you everyone for the Inputs. The answers to some of the questions are as follows: > From Jez: I've done this a few times in the past in a previous life.? In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. 
- Yes we had considered/attempted this, and it does work pretty good. It is actually much faster than using SSSD auto id mapping. But the main issue with this approach was to automate entry of uidNumbers and gidNumbers for all the enterprise users/groups across the agency. Both the approaches have there pros and cons. For now, we wanted to see the amount of effort that would be needed to change the uidNumbers and gidNumbers on the filesystem side, in case the other option of entering existing uidNumber/gidNumber data on AD does not work out. > Does the filesystem have ACLs? And which ACLs? ?Since we have CES servers that export the filesystems on SMB protocol -> The filesystems use NFS4 ACL mode. As far as we know - We know of only one fileset that is extensively using NFS4 ACLs. > Can we take a downtime to do this change? For the current GPFS storage clusters which are are production - we are thinking of taking a downtime to do the same per cluster. For new clusters/storage clusters, we are thinking of changing to AD before any new data is written to the storage.? > Do the uidNumbers/gidNumbers conflict? No. The current uidNumber and gidNumber are in 1000 - 8000 range, while the new uidNumbers,gidNumbers are above 1000000.? I was thinking of taking a backup of the current state of the filesystem, with respect to posix permissions/owner/group and the respective quotas. Disable quotas with a downtime before making changes. I might mostly start small with a single lab, and only change files without ACLs. May I know if anyone has a method/tool to find out which files/dirs have NFS4 ACLs set? As far as we know - it is just one fileset/lab, but it would be good to confirm if we have them set across any other files/dirs in the filesystem. The usual methods do not seem to work. ? Jonathan/Aaron, Thank you for the inputs regarding the scripts/APIs/symlinks and ACLs. I will try to see what I can do given the current state. I too wish GPFS API could be better at managing this kind of scenarios ?but I understand that this kind of huge changes might be pretty rare. Thank you, Lohit On June 10, 2020 at 6:33:45 AM, Jonathan Buzzard (jonathan.buzzard at strath.ac.uk) wrote: On 10/06/2020 02:15, Aaron Knister wrote: > Lohit, > > I did this while working @ NASA. I had two tools I used, one > affectionately known as "luke file walker" (to modify traditional unix > permissions) and the other known as the "milleniumfacl" (to modify posix > ACLs). Stupid jokes aside, there were some real technical challenges here. > > I don't know if anyone from the NCCS team at NASA is on the list, but if > they are perhaps they'll jump in if they're willing to share the code :) > > From what I recall, I used uthash and the gpfs API's to store in-memory > a hash of inodes and their uid/gid information. I then walked the > filesystem using the gpfs API's and could lookup the given inode in the > in-memory hash to view its ownership details. Both the inode traversal > and directory walk were parallelized/threaded. They way I actually > executed the chown was particularly security-minded. There is a race > condition that exists if you chown /path/to/file. All it takes is either > a malicious user or someone monkeying around with the filesystem while > it's live to accidentally chown the wrong file if a symbolic link ends > up in the file path. Well I would expect this needs to be done with no user access to the system. Or at the very least no user access for the bits you are currently modifying. 
Otherwise you are going to end up in a complete mess. > My work around was to use openat() and fchmod (I > think that was it, I played with this quite a bit to get it right) and > for every path to be chown'd I would walk the hierarchy, opening each > component with the O_NOFOLLOW flags to be sure I didn't accidentally > stumble across a symlink in the way. Or you could just use lchown so you change the ownership of the symbolic link rather than the file it is pointing to. You need to change the ownership of the symbolic link not the file it is linking to, that will be picked up elsewhere in the scan. If you don't change the ownership of the symbolic link you are going to be left with a bunch of links owned by none existent users. No race condition exists if you are doing it properly in the first place :-) I concluded that the standard nftw system call was more suited to this than the GPFS inode scan. I could see no way to turn an inode into a path to the file which lchownn, gpfs_getacl and gpfs_putacl all use. I think the problem with the GPFS inode scan is that is is for a backup application. Consequently there are some features it is lacking for more general purpose programs looking for a quick way to traverse the file system. An other example is that the gpfs_iattr_t structure returned from gpfs_stat_inode does not contain any information as to whether the file is a symbolic link like a standard stat call does. > I also implemented caching of open > path component file descriptors since odds are I would be > chowning/chgrp'ing files in the same directory. That bought me some > speed up. > More reasons to use nftw for now, no need to open any files :-) > I opened up RFE's at one point, I believe, for gpfs API calls to do this > type of operation. I would ideally have liked a mechanism to do this > based on inode number rather than path which would help avoid issues of > race conditions. > Well lchown to the rescue, but that does require a path to the file. The biggest problem is the inability to get a path given an inode using the GPFS inode scan which is why I steered away from it. In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the UID/GID of the file, but they are returned in an opaque format, so it's not possible :-( > One of the gotchas to be aware of, is quotas. My wrapper script would > clone quotas from the old uid to the new uid. That's easy enough. > However, keep in mind, if the uid is over their quota your chown > operation will absolutely kill your cluster. Once a user is over their > quota the filesystem seems to want to quiesce all of its accounting > information on every filesystem operation for that user. I would check > for adequate quota headroom for the user in question and abort if there > wasn't enough. Had not thought of that one. Surely the simple solution would be to set the quota's on the mapped UID/GID's after the change has been made. Then the filesystem operation would not be for the user over quota but for the new user? The other alternative is to dump the quotas to file and remove them. Change the UID's and GID's then restore the quotas on the new UID/GID's. As I said earlier surely the end users have no access to the file system while the modifications are being made. If they do all hell is going to break loose IMHO. > > The ACL changes were much more tricky. There's no way, of which I'm > aware, to atomically update ACL entries. 
You run the risk that you could > clobber a user's ACL update if it occurs in the milliseconds between you > reading the ACL and updating it as part of the UID/GID update. > Thankfully we were using Posix ACLs which were easier for me to deal > with programmatically. I still had the security concern over symbolic > links appearing in paths to have their ACLs updated either maliciously > or organically. I was able to deal with that by modifying libacl to > implement ACL calls that used variants of xattr calls that took file > descriptors as arguments and allowed me to throw nofollow flags. That > code is here ( > https://github.com/aaronknister/acl/commits/nofollow > ). > I couldn't take advantage of the GPFS API's here to meet my > requirements, so I just walked the filesystem tree in parallel if I > recall correctly, retrieved every ACL and updated if necessary. > > If you're using NFS4 ACLs... I don't have an easy answer for you :) You call gpfs_getacl, walk the array of ACL's returned changing any UID/GID's as required and then call gpfs_putacl. You can modify both Posix and NFSv4 ACL's with this call. Given they only take a path to the file another reason to use nftw rather than GPFS inode scan. As I understand even if your file system is set to an ACL type of "all", any individual file/directory can only have either Posix *or* NSFv4 ACLS (ignoring the fact you can set your filesystem ACL's type to the undocumented Samba), so can all be handled automatically. Note if you are using nftw to walk the file system then you get a standard system stat structure for every file/directory and you could just skip symbolic links. I don't think you can set an ACL on a symbolic link anyway. You certainly can't set standard permissions on them. It would be sensible to wrap the main loop in gpfs_lib_init/gpfs_lib_term in this scenario. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 10 23:29:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 23:29:53 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: On 10/06/2020 16:31, Lohit Valleru wrote: [SNIP] > I might mostly start small with a single lab, and only change files > without ACLs. May I know if anyone has a method/tool to find out which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > fileset/lab, but it would be good to confirm if we have them set > across any other files/dirs in the filesystem. The usual methods do > not seem to work. Use mmgetacl a file at a time and try and do something with the output? Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, and have been that way for over a decade. Last time I raised this with IBM I was told that was by design... If they are CES then look at it client side from a Windows node? The alternative is to write something in C that calls gpfs_getacl. However it was an evening to get a basic UID remap code working in C. It would not take much more effort to make it handle ACL's. 
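For what it is worth, a heavily hedged sketch of that gpfs_getacl()/gpfs_putacl() remapping loop, to be called from the same tree walk. The struct and constant names below are from memory and must be checked against the installed gpfs.h (the public documentation is thin, as noted earlier in the thread); it reuses the uid_map/gid_map arrays from the walker sketch, handles only POSIX (v1) entries, omits bounds checks and needs linking with -lgpfs. NFSv4 (ace_v4) entries would need the same treatment of their who field:

    /* Sketch only: verify every gpfs.h name before trusting this. */
    #include <stdlib.h>
    #include <gpfs.h>

    extern unsigned int uid_map[];   /* assumed: same maps as the walker sketch */
    extern unsigned int gid_map[];

    static int remap_posix_acl(const char *path)
    {
        const int bufsz = 0x8000;
        gpfs_acl_t *acl = malloc(bufsz);
        int i, changed = 0, rc = -1;

        if (acl == NULL)
            return -1;
        acl->acl_len = bufsz;
        acl->acl_level = 0;
        acl->acl_version = 0;                    /* accept whatever is stored */
        acl->acl_type = GPFS_ACL_TYPE_ACCESS;

        if (gpfs_getacl(path, GPFS_GETACL_STRUCT, acl) != 0)
            goto out;
        if (acl->acl_version != GPFS_ACL_VERSION_POSIX) {
            rc = 0;                              /* NFSv4 ACL: not handled here */
            goto out;
        }

        /* Only named user/group entries carry a uid/gid; the owner, owning
         * group and other entries follow the file's owner, which the chown
         * pass already fixes. */
        for (i = 0; i < (int)acl->acl_nace; i++) {
            gpfs_ace_v1_t *ace = &acl->ace_v1[i];
            if (ace->ace_type == GPFS_ACL_USER) {
                ace->ace_who = uid_map[ace->ace_who];
                changed = 1;
            } else if (ace->ace_type == GPFS_ACL_GROUP) {
                ace->ace_who = gid_map[ace->ace_who];
                changed = 1;
            }
        }

        rc = changed ? gpfs_putacl(path, GPFS_PUTACL_STRUCT, acl) : 0;
    out:
        free(acl);
        return rc;
    }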
As such I would work on the premise that there are ACL's and handle it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ckerner at illinois.edu Wed Jun 10 23:40:41 2020 From: ckerner at illinois.edu (Kerner, Chad A) Date: Wed, 10 Jun 2020 22:40:41 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> You can do a policy scan though and get a list of files that have ACLs applied to them. Then you would not have to check every file with a shell utility or C, just process that list. Likewise, you can get the uid/gid as well and process that list with the new mapping(split it into multiple lists, processing multiple threads on multiple machines). While it is by no means the prettiest or possibly best way to handle the POSIX ACLs, I had whipped up a python api for it: https://github.com/ckerner/ssacl . It only does POSIX though. We use it in conjunction with acls (https://github.com/ckerner/acls), an ls replacement that shows effective user/group permissions based off of the acl's because most often the user would just look at the POSIX perms and say something is broken, without checking the acl. -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 6/10/20, 5:30 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 10/06/2020 16:31, Lohit Valleru wrote: [SNIP] > I might mostly start small with a single lab, and only change files > without ACLs. May I know if anyone has a method/tool to find out which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > fileset/lab, but it would be good to confirm if we have them set > across any other files/dirs in the filesystem. The usual methods do > not seem to work. Use mmgetacl a file at a time and try and do something with the output? Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, and have been that way for over a decade. Last time I raised this with IBM I was told that was by design... If they are CES then look at it client side from a Windows node? The alternative is to write something in C that calls gpfs_getacl. However it was an evening to get a basic UID remap code working in C. It would not take much more effort to make it handle ACL's. As such I would work on the premise that there are ACL's and handle it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From giovanni.bracco at enea.it Thu Jun 11 08:53:01 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 11 Jun 2020 09:53:01 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Comments and updates in the text: On 05/06/20 19:02, Jan-Frode Myklebust wrote: > fre. 5. jun. 2020 kl. 
15:53 skrev Giovanni Bracco > >: > > answer in the text > > On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > > > Could maybe be interesting to drop the NSD servers, and let all > nodes > > access the storage via srp ? > > no we can not: the production clusters fabric is a mix of a QDR based > cluster and a OPA based cluster and NSD nodes provide the service to > both. > > > You could potentially still do SRP from QDR nodes, and via NSD for your > omnipath nodes. Going via NSD seems like a bit pointless indirection. not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share the same data lake in Spectrum Scale/GPFS so the NSD servers support the flexibility of the setup. NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at the moment 3 different generations of DDN storages are connected, 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less expensive storage, to be used when performance is not the first priority. > > > > > > > Maybe turn off readahead, since it can cause performance degradation > > when GPFS reads 1 MB blocks scattered on the NSDs, so that > read-ahead > > always reads too much. This might be the cause of the slow read > seen ? > > maybe you?ll also overflow it if reading from both NSD-servers at > the > > same time? > > I have switched the readahead off and this produced a small (~10%) > increase of performances when reading from a NSD server, but no change > in the bad behaviour for the GPFS clients > > > > > > > > Plus.. it?s always nice to give a bit more pagepool to hhe > clients than > > the default.. I would prefer to start with 4 GB. > > we'll do also that and we'll let you know! > > > Could you show your mmlsconfig? Likely you should set maxMBpS to > indicate what kind of throughput a client can do (affects GPFS > readahead/writebehind).? Would typically also increase workerThreads on > your NSD servers. At this moment this is the output of mmlsconfig # mmlsconfig Configuration data for cluster GPFSEXP.portici.enea.it: ------------------------------------------------------- clusterName GPFSEXP.portici.enea.it clusterId 13274694257874519577 autoload no dmapiFileHandleSize 32 minReleaseLevel 5.0.4.0 ccrEnabled yes cipherList AUTHONLY verbsRdma enable verbsPorts qib0/1 [cresco-gpfq7,cresco-gpfq8] verbsPorts qib0/2 [common] pagepool 4G adminMode central File systems in cluster GPFSEXP.portici.enea.it: ------------------------------------------------ /dev/vsd_gexp2 /dev/vsd_gexp3 > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. > When you write one GPFS block, less than a half RAID stripe is written, > which means you ?need to read back some data to calculate new parities. > I would prefer 4 MB block size, and maybe also change to 8+p+q so that > one GPFS is a multiple of a full 2 MB stripe. > > > ? ?-jf we have now added another file system based on 2 NSD on RAID6 8+p+q, keeping the 1MB block size just not to change too many things at the same time, but no substantial change in very low readout performances, that are still of the order of 50 MB/s while write performance are 1000MB/s Any other suggestion is welcomed! 
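(For reference, the stripe arithmetic behind the block size suggestion quoted above: with 9 data + 2 parity disks and a 256 KB strip, a full stripe is 9 x 256 KB = 2304 KB, so a 1 MB GPFS block can never fill a stripe and every full-block write turns into a read-modify-write. With 8 data + 2 parity the full stripe is 8 x 256 KB = 2048 KB = 2 MB, so a 2 MB or 4 MB GPFS block size writes whole stripes; a 1 MB block on the 8+2 array is still only half a stripe, so the new file system keeps the same write penalty. This by itself explains slow writes rather than slow reads, but it is worth fixing anyway.)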
Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From luis.bolinches at fi.ibm.com Thu Jun 11 09:01:46 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 11 Jun 2020 08:01:46 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it>, <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it><4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jun 11 09:45:50 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Jun 2020 09:45:50 +0100 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: On 11/06/2020 08:53, Giovanni Bracco wrote: [SNIP] > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > > NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at > the moment 3 different generations of DDN storages are connected, > 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less > expensive storage, to be used when performance is not the first priority. > Ring up Lenovo and get a pricing on some DSS-G storage :-) They can be configured with OPA and Infiniband (though I am not sure if both at the same time) and are only slightly more expensive than the traditional DIY Lego brick approach. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From janfrode at tanso.net Thu Jun 11 11:13:36 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 11 Jun 2020 12:13:36 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco wrote: > > > > > You could potentially still do SRP from QDR nodes, and via NSD for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > Maybe there's something I don't understand, but couldn't you use the NSD-servers to serve to your OPA nodes, and then SRP directly for your 300 QDR-nodes?? 
> At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it: > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it: > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > So, trivial close to default config.. assume the same for the client cluster. I would correct MaxMBpS -- put it at something reasonable, enable verbsRdmaSend=yes and ignorePrefetchLUNCount=yes. > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. > > When you write one GPFS block, less than a half RAID stripe is written, > > which means you need to read back some data to calculate new parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > -jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are 1000MB/s > > Any other suggestion is welcomed! > > Maybe rule out the storage, and check if you get proper throughput from nsdperf? Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your full settings -- so that we see that the benchmark is sane :-) -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Jun 11 15:06:45 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 11 Jun 2020 16:06:45 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: <050593b2-8256-1f84-1a3a-978583103211@enea.it> 256K Giovanni On 11/06/20 10:01, Luis Bolinches wrote: > On that RAID 6 what is the logical RAID block size? 128K, 256K, other? > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > / Salutacions > Luis Bolinches > Consultant IT Specialist > IBM Spectrum Scale development > ESS & client adoption teams > Mobile Phone: +358503112585 > *https://www.youracclaim.com/user/luis-bolinches* > Ab IBM Finland Oy > Laajalahdentie 23 > 00330 Helsinki > Uusimaa - Finland > > *"If you always give you will always have" -- ?Anonymous* > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: Jan-Frode Myklebust , gpfsug main discussion > list > Cc: Agostino Funel > Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance > in simple spectrum scale/gpfs cluster with a storage-server SAN > Date: Thu, Jun 11, 2020 10:53 > Comments and updates in the text: > > On 05/06/20 19:02, Jan-Frode Myklebust wrote: > > fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco > > >: > > > > ? ? answer in the text > > > > ? ? On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > ? ? ?> > > ? ? 
?> Could maybe be interesting to drop the NSD servers, and > let all > > ? ? nodes > > ? ? ?> access the storage via srp ? > > > > ? ? no we can not: the production clusters fabric is a mix of a > QDR based > > ? ? cluster and a OPA based cluster and NSD nodes provide the > service to > > ? ? both. > > > > > > You could potentially still do SRP from QDR nodes, and via NSD > for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > > NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at > the moment 3 different generations of DDN storages are connected, > 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less > expensive storage, to be used when performance is not the first > priority. > > > > > > > > > ? ? ?> > > ? ? ?> Maybe turn off readahead, since it can cause performance > degradation > > ? ? ?> when GPFS reads 1 MB blocks scattered on the NSDs, so that > > ? ? read-ahead > > ? ? ?> always reads too much. This might be the cause of the slow > read > > ? ? seen ? > > ? ? ?> maybe you?ll also overflow it if reading from both > NSD-servers at > > ? ? the > > ? ? ?> same time? > > > > ? ? I have switched the readahead off and this produced a small > (~10%) > > ? ? increase of performances when reading from a NSD server, but > no change > > ? ? in the bad behaviour for the GPFS clients > > > > > > ? ? ?> > > ? ? ?> > > ? ? ?> Plus.. it?s always nice to give a bit more pagepool to hhe > > ? ? clients than > > ? ? ?> the default.. I would prefer to start with 4 GB. > > > > ? ? we'll do also that and we'll let you know! > > > > > > Could you show your mmlsconfig? Likely you should set maxMBpS to > > indicate what kind of throughput a client can do (affects GPFS > > readahead/writebehind).? Would typically also increase > workerThreads on > > your NSD servers. > > At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it: > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it: > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip > size. > > When you write one GPFS block, less than a half RAID stripe is > written, > > which means you ?need to read back some data to calculate new > parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so > that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > ?? ?-jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are > 1000MB/s > > Any other suggestion is welcomed! 
> > Giovanni > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From luis.bolinches at fi.ibm.com Thu Jun 11 15:11:14 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 11 Jun 2020 14:11:14 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <050593b2-8256-1f84-1a3a-978583103211@enea.it> Message-ID: 8 data * 256K does not align to your 1MB Raid 6 is already not the best option for writes. I would look into use multiples of 2MB block sizes. -- Cheers > On 11. Jun 2020, at 17.07, Giovanni Bracco wrote: > > ?256K > > Giovanni > >> On 11/06/20 10:01, Luis Bolinches wrote: >> On that RAID 6 what is the logical RAID block size? 128K, 256K, other? >> -- >> Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations >> / Salutacions >> Luis Bolinches >> Consultant IT Specialist >> IBM Spectrum Scale development >> ESS & client adoption teams >> Mobile Phone: +358503112585 >> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= >> Ab IBM Finland Oy >> Laajalahdentie 23 >> 00330 Helsinki >> Uusimaa - Finland >> >> *"If you always give you will always have" -- Anonymous* >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: Jan-Frode Myklebust , gpfsug main discussion >> list >> Cc: Agostino Funel >> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance >> in simple spectrum scale/gpfs cluster with a storage-server SAN >> Date: Thu, Jun 11, 2020 10:53 >> Comments and updates in the text: >> >>> On 05/06/20 19:02, Jan-Frode Myklebust wrote: >>> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco >>> >: >>> >>> answer in the text >>> >>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote: >>> > >>> > Could maybe be interesting to drop the NSD servers, and >> let all >>> nodes >>> > access the storage via srp ? >>> >>> no we can not: the production clusters fabric is a mix of a >> QDR based >>> cluster and a OPA based cluster and NSD nodes provide the >> service to >>> both. >>> >>> >>> You could potentially still do SRP from QDR nodes, and via NSD >> for your >>> omnipath nodes. Going via NSD seems like a bit pointless indirection. >> >> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share >> the same data lake in Spectrum Scale/GPFS so the NSD servers support the >> flexibility of the setup. >> >> NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at >> the moment 3 different generations of DDN storages are connected, >> 9900/QDR 7700/FDR and 7990/EDR. 
The idea was to be able to add some less >> expensive storage, to be used when performance is not the first >> priority. >> >>> >>> >>> >>> > >>> > Maybe turn off readahead, since it can cause performance >> degradation >>> > when GPFS reads 1 MB blocks scattered on the NSDs, so that >>> read-ahead >>> > always reads too much. This might be the cause of the slow >> read >>> seen ? >>> > maybe you?ll also overflow it if reading from both >> NSD-servers at >>> the >>> > same time? >>> >>> I have switched the readahead off and this produced a small >> (~10%) >>> increase of performances when reading from a NSD server, but >> no change >>> in the bad behaviour for the GPFS clients >>> >>> >>> > >>> > >>> > Plus.. it?s always nice to give a bit more pagepool to hhe >>> clients than >>> > the default.. I would prefer to start with 4 GB. >>> >>> we'll do also that and we'll let you know! >>> >>> >>> Could you show your mmlsconfig? Likely you should set maxMBpS to >>> indicate what kind of throughput a client can do (affects GPFS >>> readahead/writebehind). Would typically also increase >> workerThreads on >>> your NSD servers. >> >> At this moment this is the output of mmlsconfig >> >> # mmlsconfig >> Configuration data for cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------------- >> clusterName GPFSEXP.portici.enea.it >> clusterId 13274694257874519577 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 5.0.4.0 >> ccrEnabled yes >> cipherList AUTHONLY >> verbsRdma enable >> verbsPorts qib0/1 >> [cresco-gpfq7,cresco-gpfq8] >> verbsPorts qib0/2 >> [common] >> pagepool 4G >> adminMode central >> >> File systems in cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------ >> /dev/vsd_gexp2 >> /dev/vsd_gexp3 >> >> >>> >>> >>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip >> size. >>> When you write one GPFS block, less than a half RAID stripe is >> written, >>> which means you need to read back some data to calculate new >> parities. >>> I would prefer 4 MB block size, and maybe also change to 8+p+q so >> that >>> one GPFS is a multiple of a full 2 MB stripe. >>> >>> >>> -jf >> >> we have now added another file system based on 2 NSD on RAID6 8+p+q, >> keeping the 1MB block size just not to change too many things at the >> same time, but no substantial change in very low readout performances, >> that are still of the order of 50 MB/s while write performance are >> 1000MB/s >> >> Any other suggestion is welcomed! >> >> Giovanni >> >> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=q-8zfr3t0TGWOicysbq0ezzL2xpk3dzDg2m1plcsWm0&e= >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=CZv204_tsb3M3xIwxRyIyvTjptoQL-gD-VhzUkMRyrc&e= >> >> >> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=CZv204_tsb3M3xIwxRyIyvTjptoQL-gD-VhzUkMRyrc&e= >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=q-8zfr3t0TGWOicysbq0ezzL2xpk3dzDg2m1plcsWm0&e= > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Jun 11 16:10:06 2020 From: david_johnson at brown.edu (David Johnson) Date: Thu, 11 Jun 2020 11:10:06 -0400 Subject: [gpfsug-discuss] mmremotecluster access from SS 5.0.x to 4.2.3-x refuses id_rsa.pub Message-ID: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> I'm trying to access an old GPFS filesystem from a new cluster. It is good up to the point of adding the SSL keys of the old cluster on the new one. I get from mmremotecluster add command: File ...._id_rsa.pub does not contain a nist sp 800-131a compliance key Is there any way to override this? The old cluster will go away before the end of the summer. From jamervi at sandia.gov Thu Jun 11 16:13:32 2020 From: jamervi at sandia.gov (Mervini, Joseph A) Date: Thu, 11 Jun 2020 15:13:32 +0000 Subject: [gpfsug-discuss] [EXTERNAL] mmremotecluster access from SS 5.0.x to 4.2.3-x refuses id_rsa.pub In-Reply-To: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> References: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> Message-ID: mmchconfig nistCompliance=off on the newer system should work. ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov ?On 6/11/20, 9:10 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: I'm trying to access an old GPFS filesystem from a new cluster. It is good up to the point of adding the SSL keys of the old cluster on the new one. I get from mmremotecluster add command: File ...._id_rsa.pub does not contain a nist sp 800-131a compliance key Is there any way to override this? The old cluster will go away before the end of the summer. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Thu Jun 11 21:41:52 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 Jun 2020 22:41:52 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <050593b2-8256-1f84-1a3a-978583103211@enea.it> Message-ID: While that point (block size should be an integer multiple of the RAID stripe width) is a good one, its violation would explain slow writes, but Giovanni talks of slow reads ... Mit freundlichen Gr??en / Kind regards Dr. 
Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Luis Bolinches" To: "Giovanni Bracco" Cc: gpfsug main discussion list , agostino.funel at enea.it Date: 11/06/2020 16:11 Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Sent by: gpfsug-discuss-bounces at spectrumscale.org 8 data * 256K does not align to your 1MB Raid 6 is already not the best option for writes. I would look into use multiples of 2MB block sizes. -- Cheers > On 11. Jun 2020, at 17.07, Giovanni Bracco wrote: > > 256K > > Giovanni > >> On 11/06/20 10:01, Luis Bolinches wrote: >> On that RAID 6 what is the logical RAID block size? 128K, 256K, other? >> -- >> Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations >> / Salutacions >> Luis Bolinches >> Consultant IT Specialist >> IBM Spectrum Scale development >> ESS & client adoption teams >> Mobile Phone: +358503112585 >> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= >> Ab IBM Finland Oy >> Laajalahdentie 23 >> 00330 Helsinki >> Uusimaa - Finland >> >> *"If you always give you will always have" -- Anonymous* >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: Jan-Frode Myklebust , gpfsug main discussion >> list >> Cc: Agostino Funel >> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance >> in simple spectrum scale/gpfs cluster with a storage-server SAN >> Date: Thu, Jun 11, 2020 10:53 >> Comments and updates in the text: >> >>> On 05/06/20 19:02, Jan-Frode Myklebust wrote: >>> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco >>> >: >>> >>> answer in the text >>> >>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote: >>> > >>> > Could maybe be interesting to drop the NSD servers, and >> let all >>> nodes >>> > access the storage via srp ? >>> >>> no we can not: the production clusters fabric is a mix of a >> QDR based >>> cluster and a OPA based cluster and NSD nodes provide the >> service to >>> both. >>> >>> >>> You could potentially still do SRP from QDR nodes, and via NSD >> for your >>> omnipath nodes. Going via NSD seems like a bit pointless indirection. >> >> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share >> the same data lake in Spectrum Scale/GPFS so the NSD servers support the >> flexibility of the setup. >> >> NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at >> the moment 3 different generations of DDN storages are connected, >> 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less >> expensive storage, to be used when performance is not the first >> priority. >> >>> >>> >>> >>> > >>> > Maybe turn off readahead, since it can cause performance >> degradation >>> > when GPFS reads 1 MB blocks scattered on the NSDs, so that >>> read-ahead >>> > always reads too much. This might be the cause of the slow >> read >>> seen ? 
>>> > maybe you?ll also overflow it if reading from both >> NSD-servers at >>> the >>> > same time? >>> >>> I have switched the readahead off and this produced a small >> (~10%) >>> increase of performances when reading from a NSD server, but >> no change >>> in the bad behaviour for the GPFS clients >>> >>> >>> > >>> > >>> > Plus.. it?s always nice to give a bit more pagepool to hhe >>> clients than >>> > the default.. I would prefer to start with 4 GB. >>> >>> we'll do also that and we'll let you know! >>> >>> >>> Could you show your mmlsconfig? Likely you should set maxMBpS to >>> indicate what kind of throughput a client can do (affects GPFS >>> readahead/writebehind). Would typically also increase >> workerThreads on >>> your NSD servers. >> >> At this moment this is the output of mmlsconfig >> >> # mmlsconfig >> Configuration data for cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------------- >> clusterName GPFSEXP.portici.enea.it >> clusterId 13274694257874519577 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 5.0.4.0 >> ccrEnabled yes >> cipherList AUTHONLY >> verbsRdma enable >> verbsPorts qib0/1 >> [cresco-gpfq7,cresco-gpfq8] >> verbsPorts qib0/2 >> [common] >> pagepool 4G >> adminMode central >> >> File systems in cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------ >> /dev/vsd_gexp2 >> /dev/vsd_gexp3 >> >> >>> >>> >>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip >> size. >>> When you write one GPFS block, less than a half RAID stripe is >> written, >>> which means you need to read back some data to calculate new >> parities. >>> I would prefer 4 MB block size, and maybe also change to 8+p+q so >> that >>> one GPFS is a multiple of a full 2 MB stripe. >>> >>> >>> -jf >> >> we have now added another file system based on 2 NSD on RAID6 8+p+q, >> keeping the 1MB block size just not to change too many things at the >> same time, but no substantial change in very low readout performances, >> that are still of the order of 50 MB/s while write performance are >> 1000MB/s >> >> Any other suggestion is welcomed! >> >> Giovanni >> >> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=CPBLf7s53vCFL0esHIl8ZkeC7BiuNZUHD6JVWkcy48c&s=wfe9UKg6bKylrLyuepv2J4jNN4BEfLQK6A46yX9IB-Q&e= From UWEFALKE at de.ibm.com Thu Jun 11 21:41:52 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 Jun 2020 22:41:52 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: Hi Giovanni, how do the waiters look on your clients when reading? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Giovanni Bracco To: gpfsug-discuss at spectrumscale.org Cc: Agostino Funel Date: 05/06/2020 14:22 Subject: [EXTERNAL] [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Sent by: gpfsug-discuss-bounces at spectrumscale.org In our lab we have received two storage-servers, Super micro SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID controller (2 GB cache) and before putting them in production for other purposes we have setup a small GPFS test cluster to verify if they can be used as storage (our gpfs production cluster has the licenses based on the NSD sockets, so it would be interesting to expand the storage size just by adding storage-servers in a infiniband based SAN, without changing the number of NSD servers) The test cluster consists of: 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. 2) a Mellanox FDR switch used as a SAN switch 3) a Truescale QDR switch as GPFS cluster switch 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. All the nodes run CentOS 7.7. On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been configured and it is exported via infiniband as an iSCSI target so that both appear as devices accessed by the srp_daemon on the NSD servers, where multipath (not really necessary in this case) has been configured for these two LIO-ORG devices. GPFS version 5.0.4-0 has been installed and the RDMA has been properly configured Two NSD disk have been created and a GPFS file system has been configured. Very simple tests have been performed using lmdd serial write/read. 
1) storage-server local performance: before configuring the RAID6 volume as NSD disk, a local xfs file system was created and lmdd write/read performance for 100 GB file was verified to be about 1 GB/s 2) once the GPFS cluster has been created write/read test have been performed directly from one of the NSD server at a time: write performance 2 GB/s, read performance 1 GB/s for 100 GB file By checking with iostat, it was observed that the I/O in this case involved only the NSD server where the test was performed, so when writing, the double of base performances was obtained, while in reading the same performance as on a local file system, this seems correct. Values are stable when the test is repeated. 3) when the same test is performed from the GPFS clients the lmdd result for a 100 GB file are: write - 900 MB/s and stable, not too bad but half of what is seen from the NSD servers. read - 30 MB/s to 300 MB/s: very low and unstable values No tuning of any kind in all the configuration of the involved system, only default values. Any suggestion to explain the very bad read performance from a GPFS client? Giovanni here are the configuration of the virtual drive on the storage-server and the file system configuration in GPFS Virtual drive ============== Virtual Drive: 2 (Target Id: 2) Name : RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 Size : 81.856 TB Sector Size : 512 Is VD emulated : Yes Parity Size : 18.190 TB State : Optimal Strip Size : 256 KB Number Of Drives : 11 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disabled GPFS file system from mmlsfs ============================ mmlsfs vsd_gexp2 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j cluster Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 22.00 (5.0.4.0) File system version --create-time Fri Apr 3 19:26:27 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 134217728 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d nsdfs4lun2;nsdfs5lun2 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gexp2 Default mount point --mount-priority 0 Mount priority -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=TbQFSz77fWm4Q3StvVLSfZ2GTQPDdwkd6S2eY5OvOME&s=CcbPtQrTI4xzh5gK0P-ol8uQcAc8yQKi5LjHZZaJBD4&e= ================================================== Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate e la casella di posta elettronica da cui e' stata inviata e' da qualificarsi quale strumento aziendale. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione. Grazie. This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only. Dissemination, copying, printing or use by anybody else is unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE 2016/679). If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail. Thanks. ================================================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=TbQFSz77fWm4Q3StvVLSfZ2GTQPDdwkd6S2eY5OvOME&s=XPiIgZtPIPdc6gXrjff_D1jtNnLkXF9i2m_gLeB0DYU&e= From luis.bolinches at fi.ibm.com Fri Jun 12 05:19:31 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Fri, 12 Jun 2020 04:19:31 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: , <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Fri Jun 12 11:51:52 2020 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Fri, 12 Jun 2020 11:51:52 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> Message-ID: <7ae90490-e505-9823-1696-96d8b83b48b4@qsplace.co.uk> I seem to remember Marc Kaplan discussing using the ILM and mmfind for this. There is a presentation from 2018 which skims on an example http://files.gpfsug.org/presentations/2018/USA/SpectrumScalePolicyBP.pdf -- Lauz On 10/06/2020 23:40, Kerner, Chad A wrote: > You can do a policy scan though and get a list of files that have ACLs applied to them. Then you would not have to check every file with a shell utility or C, just process that list. Likewise, you can get the uid/gid as well and process that list with the new mapping(split it into multiple lists, processing multiple threads on multiple machines). 
> > While it is by no means the prettiest or possibly best way to handle the POSIX ACLs, I had whipped up a python api for it: https://github.com/ckerner/ssacl . It only does POSIX though. We use it in conjunction with acls (https://github.com/ckerner/acls), an ls replacement that shows effective user/group permissions based off of the acl's because most often the user would just look at the POSIX perms and say something is broken, without checking the acl. > > -- > Chad Kerner, Senior Storage Engineer > Storage Enabling Technologies > National Center for Supercomputing Applications > University of Illinois, Urbana-Champaign > > ?On 6/10/20, 5:30 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: > > On 10/06/2020 16:31, Lohit Valleru wrote: > > [SNIP] > > > I might mostly start small with a single lab, and only change files > > without ACLs. May I know if anyone has a method/tool to find out > which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > > fileset/lab, but it would be good to confirm if we have them set > > across any other files/dirs in the filesystem. The usual methods do > > not seem to work. > > Use mmgetacl a file at a time and try and do something with the output? > > Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, > and have been that way for over a decade. Last time I raised this with > IBM I was told that was by design... > > If they are CES then look at it client side from a Windows node? > > The alternative is to write something in C that calls gpfs_getacl. > > However it was an evening to get a basic UID remap code working in C. It > would not take much more effort to make it handle ACL's. As such I would > work on the premise that there are ACL's and handle it. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.knister at gmail.com Fri Jun 12 14:25:15 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 12 Jun 2020 09:25:15 -0400 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: Message-ID: <01AF73E5-4722-4502-B7CC-60E7A62FEE65@gmail.com> I would double check your cpu frequency scaling settings in your NSD servers (cpupower frequency-info) and look at the governor. You?ll want it to be the performance governor. If it?s not what can happen is the CPUs scale back their clock rate which hurts RDMA performance. Running the I/o test on the NSD servers themselves may have been enough to kick the processors up into a higher frequency which afforded you good performance. Sent from my iPhone > On Jun 12, 2020, at 00:19, Luis Bolinches wrote: > > ? > Hi > > the block for writes increases the IOPS on those cards that might be already at the limit so I would not discard taht lowering the IOPS for writes has a positive effect on reads or not but it is a smoking gun that needs to be addressed. My experience of ignoring those is not a positive one. > > In regards of this HW I woudl love to see a baseline at RAW. 
run FIO (or any other tool that is not DD) on RAW device (not scale) to see what actually each drive can do AND then all the drives at the same time. We seen RAID controllers got to its needs even on reads when parallel access to many drives are put into the RAID controller. That is why we had to create a tool to get KPIs for ECE but can be applied here as way to see what the system can do. I would build numbers for RAW before I start looking into any filesystem numbers. > > you can use whatever tool you like but this one if just a FIO frontend that will do what I mention above https://github.com/IBM/SpectrumScale_ECE_STORAGE_READINESS. If you can I would also do the write part, as reads is part of the story, and you need to understand what the HW can do (+1 to Lego comment before) > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions > Luis Bolinches > Consultant IT Specialist > IBM Spectrum Scale development > ESS & client adoption teams > Mobile Phone: +358503112585 > > https://www.youracclaim.com/user/luis-bolinches > > Ab IBM Finland Oy > Laajalahdentie 23 > 00330 Helsinki > Uusimaa - Finland > > "If you always give you will always have" -- Anonymous > > > > ----- Original message ----- > From: "Uwe Falke" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org, Agostino Funel > Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN > Date: Thu, Jun 11, 2020 23:42 > > Hi Giovanni, how do the waiters look on your clients when reading? > > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Global Technology Services / Project Services Delivery / High Performance > Computing > +49 175 575 2877 Mobile > Rathausstr. 7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > From: Giovanni Bracco > To: gpfsug-discuss at spectrumscale.org > Cc: Agostino Funel > Date: 05/06/2020 14:22 > Subject: [EXTERNAL] [gpfsug-discuss] very low read performance in > simple spectrum scale/gpfs cluster with a storage-server SAN > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. 
> > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. > > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained, while in reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name : > RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size : 81.856 TB > Sector Size : 512 > Is VD emulated : Yes > Parity Size : 18.190 TB > State : Optimal > Strip Size : 256 KB > Number Of Drives : 11 > Span Depth : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy : Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment > (subblock) size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j cluster Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 1048576 Block size > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota > enforcement > --filesetdf No Fileset df enabled? > -V 22.00 (5.0.4.0) File system version > --create-time Fri Apr 3 19:26:27 2020 File system creation time > -z No Is DMAPI enabled? 
> -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 134217728 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per > full block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d nsdfs4lun2;nsdfs5lun2 Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gexp2 Default mount point > --mount-priority 0 Mount priority > > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW > http://www.afs.enea.it/bracco > > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle > persone indicate e la casella di posta elettronica da cui e' stata inviata > e' da qualificarsi quale strumento aziendale. > La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza > di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. > 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete cortesemente > pregati di darne immediata comunicazione al mittente e di provvedere alla > sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain privileged > information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is unauthorised > (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE > 2016/679). > If you are not the intended recipient, please delete this message and any > attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From giovanni.bracco at enea.it Tue Jun 16 14:32:53 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Tue, 16 Jun 2020 15:32:53 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN: effect of ignorePrefetchLUNCount In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it> On 11/06/20 12:13, Jan-Frode Myklebust wrote: > On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco > wrote: > > > > > > You could potentially still do SRP from QDR nodes, and via NSD > for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes > share > the same data lake in Spectrum Scale/GPFS so the NSD servers support > the > flexibility of the setup. > > > Maybe there's something I don't understand, but couldn't you use the > NSD-servers to serve to your > OPA nodes, and then SRP directly for your 300 QDR-nodes?? not in an easy way without loosing the flexibility of the system where NSD are the hubs between the three different fabrics, QDR compute, OPA compute, Mellanox FDR SAN. The storages have QDR,FDR and EDR interfaces and Mellanox guarantees the compatibility QDR-FDR and FDR-EDR but not, as far as I know, QDR-EDR So in this configuration, all the compute nodes can access to all the storages. > > > At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it > : > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it > : > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > > > So, trivial close to default config.. assume the same for the client > cluster. > > I would correct MaxMBpS -- put it at something reasonable, enable > verbsRdmaSend=yes and > ignorePrefetchLUNCount=yes. Now we have set: verbsRdmaSend yes ignorePrefetchLUNCount yes maxMBpS 8000 but the only parameter which has a strong effect by itself is ignorePrefetchLUNCount yes and the readout performance increased of a factor at least 4, from 50MB/s to 210 MB/s So from the client now the situation is: Sequential write 800 MB/s, sequential read 200 MB/s, much better then before but still a factor 3, both Write/Read compared what is observed from the NSD node: Sequential write 2300 MB/s, sequential read 600 MB/s As far as the test is concerned I have seen that the lmdd results are very similar to fio --name=seqwrite --rw=write --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60 fio --name=seqread --rw=wread --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60 In the present situation the settings of read-ahead on the RAID controllers has practically non effect, we have also checked that by the way. Giovanni > > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip > size. 
> > When you write one GPFS block, less than a half RAID stripe is > written, > > which means you ?need to read back some data to calculate new > parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so > that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > >? ? ?-jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are > 1000MB/s > > Any other suggestion is welcomed! > > > > Maybe rule out the storage, and check if you get proper throughput from > nsdperf? > > Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your > full settings -- so that > we see that the benchmark is sane :-) > > > > ? -jf -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From janfrode at tanso.net Tue Jun 16 18:54:41 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 16 Jun 2020 19:54:41 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN: effect of ignorePrefetchLUNCount In-Reply-To: <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it> Message-ID: tir. 16. jun. 2020 kl. 15:32 skrev Giovanni Bracco : > > > I would correct MaxMBpS -- put it at something reasonable, enable > > verbsRdmaSend=yes and > > ignorePrefetchLUNCount=yes. > > Now we have set: > verbsRdmaSend yes > ignorePrefetchLUNCount yes > maxMBpS 8000 > > but the only parameter which has a strong effect by itself is > > ignorePrefetchLUNCount yes > > and the readout performance increased of a factor at least 4, from > 50MB/s to 210 MB/s That?s interesting.. ignoreprefetchluncount=yes should mean it more aggresively schedules IO. Did you also try lowering maxMBpS? I?m thinking maybe something is getting flooded somewhere.. Another knob would be to increase workerThreads, and/or prefetchPct (don?t quite renember how these influence each other). And it would be useful to run nsdperf between client and nsd-servers, to verify/rule out any network issue. > fio --name=seqwrite --rw=write --buffered=1 --ioengine=posixaio --bs=1m > --numjobs=1 --size=100G --runtime=60 > > fio --name=seqread --rw=wread --buffered=1 --ioengine=posixaio --bs=1m > --numjobs=1 --size=100G --runtime=60 > > Not too familiar with fio, but ... does it help to increase numjobs? And.. do you tell both sides which fabric number they?re on (?verbsPorts qib0/1/1?) so the GPFS knows not to try to connect verbsPorts that can?t communicate? -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 17 10:58:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 17 Jun 2020 10:58:59 +0100 Subject: [gpfsug-discuss] Mass UID/GID change program (uidremap) Message-ID: <242341d1-7557-1c6c-d0a4-b9af1124a775@strath.ac.uk> My university has been giving me Fridays off during lock down so I I have spent a bit of time and added modification of Posix ACL's through the standard library and tidied up the code a bit. Much of it is based on preexisting code which did speed things up. 
The error checking is still rather basic, and there had better be no errors in the mapping file. I have some ideas on how to extend it to do ACL's either through libacl or the GPFS API at compile time that I will probably look at on Friday. There is however the issue of incomplete documentation on the gpfs_acl_t structure. It will also be a lot slower if only a subset of the files have an ACL because you are going to need to attempt to get the ACL on every file. It uses the standard C library call nftw so can be pointed at a directory rather than a whole file system, which in the absence of a test GPFS file system that I could wreck would make testing difficult. Besides which the GPFS inode scan is lacking in features to make it suitable for this sort of application IMHO. There is a test directory in the tarball that is mostly full of a version of the Linux source code where all the files have been truncated to zero bytes. There is also some files for symlink, and access/default ACL testing. Targets exist in the Makefile to setup the testing directory correctly and run the test. There is no automated testing that it works correctly however. I have also included a script to generate a mapping file for all the users on the local system to AD ones based on the idmap_rid algorithm. Though in retrospect calling mmrepquota to find all the users and groups on the file system might have been a better idea. It's all under GPL v3 and can be downloaded at http://www.buzzard.me.uk/jonathan/downloads/uidremap-0.2.tar.gz Yeah I should probably use GitHub, but I am old school. Anyway take a look, the C code is very readable and if you take out the comments at the top only 300 lines. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Fri Jun 19 13:29:17 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 19 Jun 2020 12:29:17 +0000 Subject: [gpfsug-discuss] mmapplypolicy oddity Message-ID: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> I have a policy scan that walks a fileset and creates a report. In some cases, the SHOW output doesn?t happen and I have no idea why. Here is a case in point. Both lines are the same sym-link, the ?.bad? one fails to output the information. Ideas on how to debug this? 
<1> /gpfs/fs1/some-path /liblinear.bad [2019-08-05 at 22:19:23 6233 100 50 system 2020-06-18 at 13:36:36 64 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) <5> /gpfs/fs1/some-path /liblinear [2020-06-18 at 13:39:40 6233 100 50 system 2020-06-18 at 13:39:40 0 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) SHOW( |6233|100|lrwxrwxrwx|50|0|1|1592487581 |1592487581 |1592487581 |L|) In that directory: lrwxrwxrwx 1 build users 50 Jun 18 09:39 liblinear -> ../../path1/UIMA/liblinear <- A new one I created that identical lrwxrwxrwx 1 build users 50 Aug 5 2019 liblinear.bad -> ../../path1/UIMA/liblinear <- the original one that fails The list rule looks like this: rule 'dumpall' list '"$fileset_name"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Fri Jun 19 14:22:20 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 19 Jun 2020 13:22:20 +0000 Subject: [gpfsug-discuss] Beta participants for 5.1.0 for NFS 4.1 Message-ID: Folks, We are looking for one or two users willing to be Beta participants specifically for NFS 4.1. In order to participate, your company has to be willing to sign NDAs and other legal documents - I know that?s always a challenge for some of us! And for complicated reasons, you need to be an end user company not a Business Partner. Sorry. If you are interested please contact Jodi Everdon - jeverdon at us.ibm.com *off-list*. If you happen to know who your IBM acct rep is and can provide that name to Jodi, that will jump-start the process. Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1636781822] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From ulmer at ulmer.org Sat Jun 20 14:16:27 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Sat, 20 Jun 2020 09:16:27 -0400 Subject: [gpfsug-discuss] mmapplypolicy oddity In-Reply-To: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> References: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> Message-ID: <015CDB70-9E32-4B5D-A082-FC1F2C98C3F6@ulmer.org> Just to be clear, the .bad one failed before the other one existed? If you add a third one, do you still only get one set of output? Maybe the uniqueness of the target is important, and there is another symlink you don?t know about? -- Stephen > On Jun 19, 2020, at 8:08 AM, Oesterlin, Robert wrote: > > ? > I have a policy scan that walks a fileset and creates a report. In some cases, the SHOW output doesn?t happen and I have no idea why. Here is a case in point. Both lines are the same sym-link, the ?.bad? one fails to output the information. Ideas on how to debug this? 
> > <1> /gpfs/fs1/some-path /liblinear.bad [2019-08-05 at 22:19:23 6233 100 50 system 2020-06-18 at 13:36:36 64 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) > > <5> /gpfs/fs1/some-path /liblinear [2020-06-18 at 13:39:40 6233 100 50 system 2020-06-18 at 13:39:40 0 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) SHOW( |6233|100|lrwxrwxrwx|50|0|1|1592487581 |1592487581 |1592487581 |L|) > > In that directory: > > lrwxrwxrwx 1 build users 50 Jun 18 09:39 liblinear -> ../../path1/UIMA/liblinear <- A new one I created that identical > lrwxrwxrwx 1 build users 50 Aug 5 2019 liblinear.bad -> ../../path1/UIMA/liblinear <- the original one that fails > > The list rule looks like this: > > rule 'dumpall' list '"$fileset_name"' DIRECTORIES_PLUS > SHOW( '|' || > varchar(user_id) || '|' || > varchar(group_id) || '|' || > char(mode) || '|' || > varchar(file_size) || '|' || > varchar(kb_allocated) || '|' || > varchar(nlink) || '|' || > unixTS(access_time,19) || '|' || > unixTS(modification_time) || '|' || > unixTS(creation_time) || '|' || > char(misc_attributes,1) || '|' > ) > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Wed Jun 24 09:59:19 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Wed, 24 Jun 2020 08:59:19 +0000 Subject: [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? Message-ID: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> Hello, I?m looking for an example script /var/mmfs/etc/eventsCallback to add callbacks for system health events. I searched the installation and googled but didn?t found one. As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. I think I?ve seen something similar before, but can?t find it. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From PSAFRE at de.ibm.com Wed Jun 24 10:22:02 2020 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Wed, 24 Jun 2020 11:22:02 +0200 Subject: [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? In-Reply-To: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> References: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> Message-ID: Hello Heiner, you can find an example callback script, which writes an e-mail to the storage admin after a specific event occurs in the slide 21 of the presentation "Keep your Spectrum Scale cluster HEALTHY with MAPS": https://www.spectrumscaleug.org/wp-content/uploads/2020/04/SSSD20DE-Keep-your-Spectrum-Scale-cluster-HEALTHY-with-MAPS.pdf >> As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. The script startup + the check must be quick. The call in the case if the relevant event occurs, should be quick, but does not have to, if we assume, that this only happens rarely. 
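A minimal mediator along those lines might look like the sketch below. It is only a sketch: the exact parameter layout that the health monitor passes to eventsCallback should be checked against the Knowledge Center (the event name is assumed here to arrive as the second argument), and the events.d directory is just a naming convention invented for this example.

#!/bin/bash
# /var/mmfs/etc/eventsCallback - thin dispatcher, must return quickly.
# ASSUMPTION: the event name is the second argument; adjust after checking
# the documented callback parameters for your Scale level.
EVENT="$2"
HANDLER="/var/mmfs/etc/events.d/${EVENT}"

# Only act if a handler exists for this specific event, and run it in the
# background so the health monitor is never blocked by a slow handler.
if [ -n "$EVENT" ] && [ -x "$HANDLER" ]; then
    "$HANDLER" "$@" >/dev/null 2>&1 &
fi
exit 0

Keeping the dispatcher this dumb means adding a new per-event action is just a matter of dropping a new executable into the directory, with no change to the callback script itself.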
Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Am Weiher 24 65451 Kelsterbach IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 24.06.2020 10:59 Subject: [EXTERNAL] [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I?m looking for an example script /var/mmfs/etc/eventsCallback to add callbacks for system health events. I searched the installation and googled but didn?t found one. As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. I think I?ve seen something similar before, but can?t find it. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HpP3ZY4lzoY061XISjKwWNDf8lPpYfwOC8vIoe9GoQ4&m=Hc5qL6HxhxEhM4HUBV4RlUww_xybP1YDBLJE4kufPGg&s=nxhqIUDvyK1EZLPzZNuOkgTb5gZRbRojsoMq7m5vWbU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Jun 25 11:31:55 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 25 Jun 2020 10:31:55 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Message-ID: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Hi all, I would like to use CES for exporting Samba and NFS. However, when reading the documentation (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_setcessharedroot.htm), is recommended (but not enforced) the use of a dedicated filesystem (of at least 4GB). Is there any best practice or recommendation for configuring this filesystem? This is: inode / block sizes, number of expected files in the filesystem, ideal size for this filesystem (from my understanding, 4GB should be enough, but I am not sure if there are conditions that would require a bigger one). Thanks a lot and best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Jun 25 12:08:39 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 25 Jun 2020 11:08:39 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> References: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From marc.caubet at psi.ch Thu Jun 25 15:53:26 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 25 Jun 2020 14:53:26 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: References: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch>, Message-ID: <67efac8e93fa4a7a88c82fdd50c240e9@psi.ch> Hi Fred, thanks a lot for the hints. Hence, I'll try with 3WayReplication as this is the only raid code supporting 256KB, and data and metadata in the same pool (I guess is not worth to split it here). Thanks a lot for your help, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Thursday, June 25, 2020 1:08:39 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Generally these file systems are configured with a block size of 256KB. As for inodes I would not pre-allocate any and set the initial maximum size to value such as 5000 since it can be increased if necessary. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Caubet Serrabou Marc (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Date: Thu, Jun 25, 2020 6:47 AM Hi all, I would like to use CES for exporting Samba and NFS. However, when reading the documentation (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_setcessharedroot.htm), is recommended (but not enforced) the use of a dedicated filesystem (of at least 4GB). Is there any best practice or recommendation for configuring this filesystem? This is: inode / block sizes, number of expected files in the filesystem, ideal size for this filesystem (from my understanding, 4GB should be enough, but I am not sure if there are conditions that would require a bigger one). Thanks a lot and best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Fri Jun 26 08:43:53 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 26 Jun 2020 07:43:53 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: References: , <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Message-ID: An HTML attachment was scrubbed... 
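For reference, a dedicated cesSharedRoot file system along the lines of the block size and inode suggestions quoted above might be created roughly as follows. This is only a sketch: the stanza file name is an assumption, the replication settings assume two failure groups in the stanza, and on ESS/GNR you would create the small vdisk-based NSDs with mmvdisk first rather than mmcrnsd.

# NSDs described in cesroot.stanza (data and metadata in the system pool)
mmcrnsd -F cesroot.stanza
# 256K block size and a small inode maximum, as suggested above
# (mmcrfs may round the inode limit up to its minimum)
mmcrfs cesroot -F cesroot.stanza -B 256K -m 2 -r 2 -T /gpfs/cesroot --inode-limit 5000
mmmount cesroot -a
# point CES at it; set this before enabling CES, or stop CES first
mmchconfig cesSharedRoot=/gpfs/cesroot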
URL: From pebaptista at deloitte.pt Tue Jun 30 12:46:44 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 11:46:44 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories Message-ID: Hi all, I'm finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I've linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image001.png at 01D64EDC.48F92EF0] If I list the usercache folder, I see differences. Local: [cid:image002.png at 01D64EDC.48F92EF0] GPFS [cid:image003.png at 01D64EDC.48F92EF0] I see that GPFS is working ok in the node [cid:image004.png at 01D64EDC.48F92EF0] However, if I check the node health: [cid:image007.png at 01D64EDC.8102E2F0] I'm new to Spectrum Scale and I don't know what's csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image008.png at 01D64EDC.8102E2F0] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 15587 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 11114 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 17038 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 5902 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 23157 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 2030 bytes Desc: image008.png URL: From lgayne at us.ibm.com Tue Jun 30 12:56:59 2020 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 30 Jun 2020 11:56:59 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.image001.png at 01D64EDC.48F92EF0.png Type: image/png Size: 15587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image002.png at 01D64EDC.48F92EF0.png Type: image/png Size: 11114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image003.png at 01D64EDC.48F92EF0.png Type: image/png Size: 17038 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image004.png at 01D64EDC.48F92EF0.png Type: image/png Size: 5902 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image007.png at 01D64EDC.8102E2F0.png Type: image/png Size: 23157 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image008.png at 01D64EDC.8102E2F0.png Type: image/png Size: 2030 bytes Desc: not available URL: From YARD at il.ibm.com Tue Jun 30 13:06:20 2020 From: YARD at il.ibm.com (Yaron Daniel) Date: Tue, 30 Jun 2020 15:06:20 +0300 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: HI what is the output of : #df -h Look like /, /var is full ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: "Baptista, Pedro Real" To: "gpfsug-discuss at spectrumscale.org" Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: If I list the usercache folder, I see differences. Local: GPFS I see that GPFS is working ok in the node However, if I check the node health: I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Tue Jun 30 13:13:44 2020 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Tue, 30 Jun 2020 12:13:44 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL:
From pebaptista at deloitte.pt Tue Jun 30 13:28:33 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 12:28:33 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also.
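For anyone chasing the same two events, mmhealth itself can explain them. Something along these lines on the affected worker node should show what raised them (option names are per 5.0.x, check the mmhealth man page if they differ on your level):

# full health state of this node, including the entities behind the events
mmhealth node show --verbose
# history of raised and cleared events on this node
mmhealth node eventlog
# description, cause and suggested user action for the two events
mmhealth event show csm_resync_needed
mmhealth event show local_fs_filled
# local_fs_filled concerns local file systems such as /var or /tmp
df -h /var /tmp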
Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image019.png at 01D64EE2.55AE9AD0] From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image002.gif at 01D64EE2.55A85940] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image004.jpg at 01D64EE2.55A85940] [cid:image005.jpg at 01D64EE2.55A85940] [cid:image006.jpg at 01D64EE2.55A85940] [cid:image007.jpg at 01D64EE2.55A85940] [cid:image008.jpg at 01D64EE2.55A85940][cid:image009.jpg at 01D64EE2.55A85940][cid:image010.jpg at 01D64EE2.55A85940] [IBM Storage and Cloud Essentials] [cid:image012.jpg at 01D64EE2.55A85940] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I've linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image013.png at 01D64EE2.55A85940] If I list the usercache folder, I see differences. Local: [cid:image014.png at 01D64EE2.55A85940] GPFS [cid:image015.png at 01D64EE2.55A85940] I see that GPFS is working ok in the node [cid:image016.png at 01D64EE2.55A85940] However, if I check the node health: [cid:image017.png at 01D64EE2.55A85940] I'm new to Spectrum Scale and I don't know what's csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image018.png at 01D64EE2.55A85940] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited ("DTTL"). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. 
Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1114 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 4105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 3847 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 4266 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 3747 bytes Desc: image006.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 3793 bytes Desc: image007.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.jpg Type: image/jpeg Size: 4301 bytes Desc: image008.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.jpg Type: image/jpeg Size: 3739 bytes Desc: image009.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.jpg Type: image/jpeg Size: 3855 bytes Desc: image010.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.gif Type: image/gif Size: 4084 bytes Desc: image011.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.jpg Type: image/jpeg Size: 3776 bytes Desc: image012.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 15587 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 11114 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image015.png Type: image/png Size: 17038 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 5902 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 23157 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 2030 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image019.png Type: image/png Size: 2030 bytes Desc: image019.png URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 30 13:59:08 2020 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 30 Jun 2020 12:59:08 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> Hallo Pedro, what do you mean you had linked the local hadoop-Directory with the gpfs fs. Can you clarify? Do you use transparency on the gpfs-nodes? You should use on the local-hadoop site hdfs dfs ?ls cmd?s here. No os cmds like ls only! Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Baptista, Pedro Real Gesendet: Dienstag, 30. Juni 2020 14:29 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Mismatch between local and Scale directories Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image001.png at 01D64EEF.016D9F40] From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list > Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? 
Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image002.gif at 01D64EEF.016D9F40] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image004.jpg at 01D64EEF.016D9F40] [cid:image005.jpg at 01D64EEF.016D9F40] [cid:image006.jpg at 01D64EEF.016D9F40] [cid:image007.jpg at 01D64EEF.016D9F40] [cid:image008.jpg at 01D64EEF.016D9F40][cid:image009.jpg at 01D64EEF.016D9F40][cid:image010.jpg at 01D64EEF.016D9F40] [IBM Storage and Cloud Essentials] [cid:image012.jpg at 01D64EEF.016D9F40] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image013.png at 01D64EEF.016D9F40] If I list the usercache folder, I see differences. Local: [cid:image014.png at 01D64EEF.016D9F40] GPFS [cid:image015.png at 01D64EEF.016D9F40] I see that GPFS is working ok in the node [cid:image016.png at 01D64EEF.016D9F40] However, if I check the node health: [cid:image017.png at 01D64EEF.016D9F40] I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image001.png at 01D64EEF.016D9F40] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. 
If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lcham at us.ibm.com Tue Jun 30 14:17:17 2020 From: lcham at us.ibm.com (Linda Cham) Date: Tue, 30 Jun 2020 13:17:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 101, Issue 39 In-Reply-To: Message-ID: An HTML attachment was scrubbed...
URL: From pebaptista at deloitte.pt Tue Jun 30 14:22:09 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 13:22:09 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> References: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> Message-ID: Hi Renar, Yes, not Hadoop directories but Yarn and mapreduce. They were configured as follows: mmdsh -N $host "ln -s %GPFS_DIR/$host $LOCAL_DIR" Yarn Namenode Directories are storing into $LOCAL_DIR. And yes I?m using transparency on the gpfs-nodes. Thank you. Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image002.png at 01D64EE9.D259DF20] From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Grunenberg, Renar Sent: 30 de junho de 2020 13:59 To: gpfsug main discussion list Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories Hallo Pedro, what do you mean you had linked the local hadoop-Directory with the gpfs fs. Can you clarify? Do you use transparency on the gpfs-nodes? You should use on the local-hadoop site hdfs dfs ?ls cmd?s here. No os cmds like ls only! Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org > Im Auftrag von Baptista, Pedro Real Gesendet: Dienstag, 30. Juni 2020 14:29 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Mismatch between local and Scale directories Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. 
Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image020.png at 01D64EE8.4B08B790] From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list > Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image021.gif at 01D64EE8.4B08B790] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image023.jpg at 01D64EE8.4B08B790] [cid:image024.jpg at 01D64EE8.4B08B790] [cid:image025.jpg at 01D64EE8.4B08B790] [cid:image026.jpg at 01D64EE8.4B08B790] [cid:image027.jpg at 01D64EE8.4B08B790][cid:image028.jpg at 01D64EE8.4B08B790][cid:image029.jpg at 01D64EE8.4B08B790] [IBM Storage and Cloud Essentials] [cid:image031.jpg at 01D64EE8.4B08B790] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image032.png at 01D64EE8.4B08B790] If I list the usercache folder, I see differences. Local: [cid:image033.png at 01D64EE8.4B08B790] GPFS [cid:image034.png at 01D64EE8.4B08B790] I see that GPFS is working ok in the node [cid:image035.png at 01D64EE8.4B08B790] However, if I check the node health: [cid:image036.png at 01D64EE8.4B08B790] I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image020.png at 01D64EE8.4B08B790] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. 
Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From chrisjscott at gmail.com Mon Jun 1 14:14:02 2020 From: chrisjscott at gmail.com (Chris Scott) Date: Mon, 1 Jun 2020 14:14:02 +0100 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> References: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: Sounds like it would work fine. I recently exported a 3.5 version filesystem from a GPFS 3.5 cluster to a 'Scale cluster at 5.0.2.3 software and 5.0.2.0 cluster version.
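In outline, the mmexportfs/mmimportfs sequence described in this and the following paragraph looks roughly like the sketch below; it reuses the gpfs_4 device name from the question being answered, while the paths and the stanza file are hypothetical (the change-spec file simply lists each NSD with its new server list):

# On the exporting cluster: unmount the file system and capture its definition
mmumount gpfs_4 -a
mmexportfs gpfs_4 -o /tmp/gpfs_4.exp

# On the importing cluster: import it, re-pointing the NSDs at the new
# NSD servers via a change-spec stanza file
mmimportfs gpfs_4 -i /tmp/gpfs_4.exp -S /tmp/new_nsd_servers.stanza

# Mount it, then (optionally) raise the file system format version
mmmount gpfs_4 -a
mmchfs gpfs_4 -V full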
I concurrently mapped the NSDs to new NSD servers in the 'Scale cluster, mmexported the filesystem and changed the NSD servers configuration of the NSDs using the mmimportfs ChangeSpecFile. The original (creation) filesystem version of this filesystem is 3.2.1.5. To my pleasant surprise the filesystem mounted and worked fine while still at 3.5 filesystem version. Plan B would have been to "mmchfs -V full" and then mmmount, but I was able to update the filesystem to 5.0.2.0 version while already mounted. This was further pleasantly successful as the filesystem in question is DMAPI-enabled, with the majority of the data on tape using Spectrum Protect for Space Management than the volume resident/pre-migrated on disk. The complexity is further compounded by this filesystem being associated to a different Spectrum Protect server than an existing DMAPI-enabled filesystem in the 'Scale cluster. Preparation of configs and subsequent commands to enable and use Spectrum Protect for Space Management multiserver for migration and backup all worked smoothly as per the docs. I was thus able to get rid of the GPFS 3.5 cluster on legacy hardware, OS, GPFS and homebrew CTDB SMB and NFS and retain the filesystem with its majority of tape-stored data on current hardware, OS and 'Scale/'Protect with CES SMB and NFS. The future objective remains to move all the data from this historical filesystem to a newer one to get the benefits of larger block and inode sizes, etc, although since the data is mostly dormant and kept for compliance/best-practice purposes, the main goal will be to head off original file system version 3.2 era going end of support. Cheers Chris On Thu, 28 May 2020 at 23:31, Prasad Surampudi < prasad.surampudi at theatsgroup.com> wrote: > We have two scale clusters, cluster-A running version Scale 4.2.3 and > RHEL6/7 and Cluster-B running Spectrum Scale 5.0.4 and RHEL 8.1. All the > nodes in both Cluster-A and Cluster-B are direct attached and no NSD > servers. We have our current filesystem gpfs_4 in Cluster-A and new > filesystem gpfs_5 in Cluster-B. We want to copy all our data from gpfs_4 > filesystem into gpfs_5 which has variable block size. So, can we map NSDs > of gpfs_4 to Cluster-B nodes and do a mmexportfs of gpfs_4 from Cluster-A > and mmimportfs into Cluster-B so that we have both filesystems available on > same node in Cluster-B for copying data across fiber channel? If > mmexportfs/mmimportfs works, can we delete nodes from Cluster-A and add > them to Cluster-B without upgrading RHEL or GPFS versions for now and plan > upgrading them at a later time? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhill at physics.ucsd.edu Mon Jun 1 16:32:09 2020 From: bhill at physics.ucsd.edu (Bryan Hill) Date: Mon, 1 Jun 2020 08:32:09 -0700 Subject: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 In-Reply-To: References: Message-ID: Hi: Just a note on this: the pidof fix was accepted upstream but has not made its way into rhel 8.2 yet Thanks, Bryan --- Bryan Hill Lead System Administrator UCSD Physics Computing Facility 9500 Gilman Dr. # 0319 La Jolla, CA 92093 +1-858-534-5538 bhill at ucsd.edu On Mon, Feb 17, 2020 at 12:02 AM Malahal R Naineni wrote: > > I filed a defect here, let us see what Redhat says. Yes, it doesn't work for any kernel threads. 
It doesn't work for user level threads/processes. > > https://bugzilla.redhat.com/show_bug.cgi?id=1803640 > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 > Date: Mon, Feb 17, 2020 8:26 AM > > Ah wait, I see what you might mean. pidof works but not specifically for processes like nfsd. That is odd. > > Thanks, > Bryan > > > > On Sun, Feb 16, 2020 at 10:19 AM Bryan Hill wrote: > > Hi Malahal: > > Just to clarify, are you saying that on your VM pidof is missing? Or that it is there and not working as it did prior to RHEL/CentOS 8? pidof is returning pid numbers on my system. I've been looking at the mmnfsmonitor script and trying to see where the check for nfsd might be failing, but I've not been able to figure it out yet. > > > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > > On Sat, Feb 15, 2020 at 2:03 AM Malahal R Naineni wrote: > > I am not familiar with CNFS but looking at git source seems to indicate that it uses 'pidof' to check if a program is running or not. "pidof nfsd" works on RHEL7.x but it fails on my centos8.1 I just created. So either we need to make sure pidof works on kernel threads or fix CNFS scripts. > > Regards, Malahal. > > > ----- Original message ----- > From: Bryan Hill > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] CNFS issue after upgrading from 4.2.3.11 to 5.0.4.2 > Date: Fri, Feb 14, 2020 11:40 PM > > Hi All: > > I'm performing a rolling upgrade of one of our GPFS clusters. This particular cluster has 2 CNFS servers for some of our NFS clients. I wiped one of the nodes and installed RHEL 8.1 and GPFS 5.0.4.2. The filesystem mounts fine on the node when I disable CNFS on the node, but with it enabled it's a no go. It appears mmnfsmonitor doesn't recognize that nfsd has started, so it assumes the worst and shuts down the file system (I currently have reboot on failure disabled to debug this). The thing is, it actually does start nfsd processes when running mmstartup on the node. Doing a "ps" shows 32 nfsd threads are running. > > Below is the CNFS-specific output from an attempt to start the node: > > CNFS[27243]: Restarting lockd to start grace > CNFS[27588]: Enabling 172.16.69.76 > CNFS[27694]: Restarting lockd to start grace > CNFS[27699]: Starting NFS services > CNFS[27764]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[27910]: Monitor has started pid=27787 > CNFS[28702]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28705]: Starting NFS services > CNFS[28730]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28755]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28758]: Starting NFS services > CNFS[28789]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28813]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28816]: Starting NFS services > CNFS[28844]: NFS clients of node 172.16.69.122 notified to reclaim NLM locks > CNFS[28867]: Monitor detected nfsd was not running, will attempt to start it > CNFS[28874]: Monitoring detected NFSD is inactive. 
mmnfsmonitor: NFS server is not running or responding. Node failure initiated as configured. > CNFS[28924]: Unexporting all GPFS filesystems > > Any thoughts? My other CNFS node is handling everything for the time being, thankfully! > > Thanks, > Bryan > > --- > Bryan Hill > Lead System Administrator > UCSD Physics Computing Facility > > 9500 Gilman Dr. # 0319 > La Jolla, CA 92093 > +1-858-534-5538 > bhill at ucsd.edu > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From prasad.surampudi at theatsgroup.com Mon Jun 1 17:33:05 2020 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Mon, 1 Jun 2020 16:33:05 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 101, Issue 1 In-Reply-To: References: Message-ID: <6B872411-80A0-475B-A8A3-E2BD828BB2F6@theatsgroup.com> So, if cluster_A is running Spec Scale 4.3.2 and Cluster_B is running 5.0.4, then would I be able to mount the filesystem from Cluster_A in Cluster_B as a remote filesystem? And if cluster_B nodes have direct SAN access to the remote cluster_A filesystem, would they be sending all filesystem I/O directly to the disk via Fiber Channel? I am assuming that this should work based on IBM link below. Can anyone from IBM support please confirm this? https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_admmcch.htm ?On 6/1/20, 4:45 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Jan-Frode Myklebust) 2. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Avila, Geoffrey) 3. Re: gpfsug-discuss Digest, Vol 100, Issue 32 (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) 4. Re: Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) (Jonathan Buzzard) ---------------------------------------------------------------------- Message: 1 Date: Sun, 31 May 2020 18:47:40 +0200 From: Jan-Frode Myklebust To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset="utf-8" No, this is a common misconception. You don?t need any NSD servers. NSD servers are only needed if you have nodes without direct block access. 
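For reference, the remote-cluster mount being asked about here is configured with mmauth, mmremotecluster and mmremotefs; a minimal sketch follows, with hypothetical cluster, node and key-file names (key generation and exchange may require the daemon to be down, depending on version):

# On the owning cluster (clusterA): generate/exchange keys and grant access
mmauth genkey new
mmauth update . -l AUTHONLY
mmauth add clusterB.example.com -k /tmp/clusterB_id_rsa.pub
mmauth grant clusterB.example.com -f gpfs_4

# On the accessing cluster (clusterB): register the remote cluster and file system
mmremotecluster add clusterA.example.com -n nodeA1,nodeA2 -k /tmp/clusterA_id_rsa.pub
mmremotefs add rgpfs_4 -f gpfs_4 -C clusterA.example.com -T /gpfs_4
mmmount rgpfs_4 -a

Whether a node in the accessing cluster then performs its I/O directly against the LUNs over the SAN or routes it through the owning cluster's NSD servers depends on whether it can see the disks as local block devices, as described in the surrounding messages.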
Remote cluster or not, disk access will be over local block device (without involving NSD servers in any way), or NSD server if local access isn?t available. NSD-servers are not ?arbitrators? over access to a disk, they?re just stupid proxies of IO commands. -jf s?n. 31. mai 2020 kl. 11:31 skrev Jonathan Buzzard < jonathan.buzzard at strath.ac.uk>: > On 29/05/2020 20:55, Stephen Ulmer wrote: > > I have a question about multi-cluster, but it is related to this thread > > (it would be solving the same problem). > > > > Let?s say we have two clusters A and B, both clusters are normally > > shared-everything with no NSD servers defined. > > Er, even in a shared-everything all nodes fibre channel attached you > still have to define NSD servers. That is a given NSD has a server (or > ideally a list of servers) that arbitrate the disk. Unless it has > changed since 3.x days. Never run a 4.x or later with all the disks SAN > attached on all the nodes. > > > We want cluster B to be > > able to use a file system in cluster A. If I zone the SAN such that > > cluster B can see all of cluster A?s disks, can I then define a > > multi-cluster relationship between them and mount a file system from A > on B? > > > > To state it another way, must B's I/O for the foreign file system pass > > though NSD servers in A, or can B?s nodes discover that they have > > FibreChannel paths to those disks and use them? > > > > My understanding is that remote cluster mounts have to pass through the > NSD servers. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Sun, 31 May 2020 21:44:12 -0400 From: "Avila, Geoffrey" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset="utf-8" The local-block-device method of I/O is what is usually termed "SAN mode"; right? On Sun, May 31, 2020 at 12:47 PM Jan-Frode Myklebust wrote: > > No, this is a common misconception. You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > > Remote cluster or not, disk access will be over local block device > (without involving NSD servers in any way), or NSD server if local access > isn?t available. NSD-servers are not ?arbitrators? over access to a disk, > they?re just stupid proxies of IO commands. > > > -jf > > s?n. 31. mai 2020 kl. 11:31 skrev Jonathan Buzzard < > jonathan.buzzard at strath.ac.uk>: > >> On 29/05/2020 20:55, Stephen Ulmer wrote: >> > I have a question about multi-cluster, but it is related to this thread >> > (it would be solving the same problem). >> > >> > Let?s say we have two clusters A and B, both clusters are normally >> > shared-everything with no NSD servers defined. >> >> Er, even in a shared-everything all nodes fibre channel attached you >> still have to define NSD servers. That is a given NSD has a server (or >> ideally a list of servers) that arbitrate the disk. Unless it has >> changed since 3.x days. Never run a 4.x or later with all the disks SAN >> attached on all the nodes. 
>> >> > We want cluster B to be >> > able to use a file system in cluster A. If I zone the SAN such that >> > cluster B can see all of cluster A?s disks, can I then define a >> > multi-cluster relationship between them and mount a file system from A >> on B? >> > >> > To state it another way, must B's I/O for the foreign file system pass >> > though NSD servers in A, or can B?s nodes discover that they have >> > FibreChannel paths to those disks and use them? >> > >> >> My understanding is that remote cluster mounts have to pass through the >> NSD servers. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Sun, 31 May 2020 22:54:11 -0400 From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 100, Issue 32 Message-ID: <83255.1590980051 at turing-police> Content-Type: text/plain; charset="us-ascii" On Fri, 29 May 2020 22:30:08 +0100, Jonathan Buzzard said: > Ethernet goes *very* fast these days you know :-) In fact *much* faster > than fibre channel. Yes, but the justification, purchase, and installation of 40G or 100G Ethernet interfaces in the machines involved, plus the routers/switches along the way, can go very slowly indeed. So finding a way to replace 10G Ether with 16G FC can be a win..... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: ------------------------------ Message: 4 Date: Mon, 1 Jun 2020 09:45:25 +0100 From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Multi-cluster question (was Re: gpfsug-discuss Digest, Vol 100, Issue 32) Message-ID: Content-Type: text/plain; charset=utf-8; format=flowed On 31/05/2020 17:47, Jan-Frode Myklebust wrote: > > No, this is a common misconception.? You don?t need any NSD servers. NSD > servers are only needed if you have nodes without direct block access. > I see that has changed then. In the past mmcrnsd would simply fail without a server list passed to it. If you have been a long term GPFS user (I started with 2.2 on a site that had been running since 1.x days) then we are not always aware of things that have changed. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 1 ********************************************** From stockf at us.ibm.com Mon Jun 1 17:53:33 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Jun 2020 16:53:33 +0000 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: References: , <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Jun 2 09:14:27 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 2 Jun 2020 08:14:27 +0000 Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number Message-ID: <7EA77DD1-8FEB-4151-838F-C8E983422BFE@id.ethz.ch> Hello, I?m quite upset of the form and usefulness of some IBM announcements like this one: IJ24518: NVME SCSI EMULATION ISSUE How do I translate an APAR number to the spectrum scale or ess release which fix it? And which versions are affected? Need I to download all Readmes and grep for the APAR number? Or do I just don?t know where to get this information? How do you deal with such announcements? I?m tempted to just open a PMR and ask ?. This probably relates to previous posts and RFE for a proper changelog. Excuse if it?s a duplicate or if I did miss the answer in a previous post. Still the quality of this announcements is not what I expect. Just for completeness, maybe someone from IBM takes notice: All I get is an APAR number and the fact that it?s CRITICAL, so I can?t just ignore, but I don?t get * Which ESS versions are affected ? all previous or only since a certain version? * What is the first ESS version fixed? * When am I vulnerable? always, or only certain hardware or configurations or ?.? * What is the impact ? crash due to temporary corruption or permanent data corruption, or metadata or filesystem structure or ..? * How do I know if I?m already affected, what is the fingerprint? * Does a workaround exist? * If this is critical and about a possible data corruption, why isn?t it already indicated in the title/subject but hidden? * why is the error description so cryptic and needs some guessing about the meaning? It?s no sentences, just quick notes. So there is no explicit statement at all. https://www.ibm.com/support/pages/node/6203365?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Kind regards, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrisjscott at gmail.com Tue Jun 2 14:31:05 2020 From: chrisjscott at gmail.com (Chris Scott) Date: Tue, 2 Jun 2020 14:31:05 +0100 Subject: [gpfsug-discuss] Importing a Spectrum Scale a filesystem from 4.2.3 cluster to 5.0.4.3 cluster In-Reply-To: References: <8A9F7C61-E669-41F7-B74D-70B9BC4B3DB1@theatsgroup.com> Message-ID: Hi Fred The imported filesystem has ~1.5M files that are migrated to Spectrum Protect. Spot checking transparent and selective recalls of a handful of files has been successful after associating them with their correct Spectrum Protect server. 
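As a quick illustration of the kind of spot check being described, the Spectrum Protect for Space Management client commands below show a stub's state and trigger a selective recall; the file and path names are hypothetical:

dsmls /gpfs_4/project/somefile.dat       # shows whether the file is resident, premigrated or migrated
dsmrecall /gpfs_4/project/somefile.dat   # selective recall of the stub back to disk
dsmmigfs query -Detail /gpfs_4           # per-file-system Space Management settings, including the server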
They're all also backed up to primary and copy pools in the Spectrum Protect server so having to do a restore instead of recall if it wasn't working was an acceptable risk in favour of trying to persist the GPFS 3.5 cluster on dying hardware and insecure OS, etc. Cheers Chris On Mon, 1 Jun 2020 at 17:53, Frederick Stock wrote: > Chris, it was not clear to me if the file system you imported had files > migrated to Spectrum Protect, that is stub files in GPFS. If the file > system does contain files migrated to Spectrum Protect with just a stub > file in the file system, have you tried to recall any of them to see if > that still works? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: Chris Scott > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Importing a Spectrum Scale a > filesystem from 4.2.3 cluster to 5.0.4.3 cluster > Date: Mon, Jun 1, 2020 9:14 AM > > Sounds like it would work fine. > > I recently exported a 3.5 version filesystem from a GPFS 3.5 cluster to a > 'Scale cluster at 5.0.2.3 software and 5.0.2.0 cluster version. I > concurrently mapped the NSDs to new NSD servers in the 'Scale cluster, > mmexported the filesystem and changed the NSD servers configuration of the > NSDs using the mmimportfs ChangeSpecFile. The original (creation) > filesystem version of this filesystem is 3.2.1.5. > > To my pleasant surprise the filesystem mounted and worked fine while still > at 3.5 filesystem version. Plan B would have been to "mmchfs > -V full" and then mmmount, but I was able to update the filesystem to > 5.0.2.0 version while already mounted. > > This was further pleasantly successful as the filesystem in question is > DMAPI-enabled, with the majority of the data on tape using Spectrum Protect > for Space Management than the volume resident/pre-migrated on disk. > > The complexity is further compounded by this filesystem being associated > to a different Spectrum Protect server than an existing DMAPI-enabled > filesystem in the 'Scale cluster. Preparation of configs and subsequent > commands to enable and use Spectrum Protect for Space Management > multiserver for migration and backup all worked smoothly as per the docs. > > I was thus able to get rid of the GPFS 3.5 cluster on legacy hardware, OS, > GPFS and homebrew CTDB SMB and NFS and retain the filesystem with its > majority of tape-stored data on current hardware, OS and 'Scale/'Protect > with CES SMB and NFS. > > The future objective remains to move all the data from this historical > filesystem to a newer one to get the benefits of larger block and inode > sizes, etc, although since the data is mostly dormant and kept for > compliance/best-practice purposes, the main goal will be to head off > original file system version 3.2 era going end of support. > > Cheers > Chris > > On Thu, 28 May 2020 at 23:31, Prasad Surampudi < > prasad.surampudi at theatsgroup.com> wrote: > > We have two scale clusters, cluster-A running version Scale 4.2.3 and > RHEL6/7 and Cluster-B running Spectrum Scale 5.0.4 and RHEL 8.1. All the > nodes in both Cluster-A and Cluster-B are direct attached and no NSD > servers. We have our current filesystem gpfs_4 in Cluster-A and new > filesystem gpfs_5 in Cluster-B. We want to copy all our data from gpfs_4 > filesystem into gpfs_5 which has variable block size. 
So, can we map NSDs > of gpfs_4 to Cluster-B nodes and do a mmexportfs of gpfs_4 from Cluster-A > and mmimportfs into Cluster-B so that we have both filesystems available on > same node in Cluster-B for copying data across fiber channel? If > mmexportfs/mmimportfs works, can we delete nodes from Cluster-A and add > them to Cluster-B without upgrading RHEL or GPFS versions for now and plan > upgrading them at a later time? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yassin at us.ibm.com Wed Jun 3 00:43:52 2020 From: yassin at us.ibm.com (Mustafa Mah) Date: Tue, 2 Jun 2020 19:43:52 -0400 Subject: [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number Message-ID: Heiner, Here is an alert for this APAR IJ24518 which has more details. https://www.ibm.com/support/pages/node/6210439 Regards, Mustafa > ---------- Forwarded message --------- > From: Billich Heinrich Rainer (ID SD) > Date: Tue, Jun 2, 2020 at 4:29 AM > Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what > do do with this announcement, all I get is an APAR number > To: gpfsug main discussion list > > Hello, > > I?m quite upset of the form and usefulness of some IBM announcements > like this one: > > IJ24518: NVME SCSI EMULATION ISSUE > > How do I translate an APAR number to the spectrum scale or ess > release which fix it? And which versions are affected? Need I to > download all Readmes and grep for the APAR number? Or do I just > don?t know where to get this information? How do you deal with such > announcements? I?m tempted to just open a PMR and ask ?. > > This probably relates to previous posts and RFE for a proper > changelog. Excuse if it?s a duplicate or if I did miss the answer in > a previous post. Still the quality of ?this announcements is not > what I expect. > > Just for completeness, maybe someone from IBM takes notice: > > All I get is an APAR number and the fact that it?s CRITICAL, so I > can?t just ignore, but I don?t get > > Which ESS versions are affected ? all previous or only since a > certain version? > What is the first ESS version fixed? > When am I vulnerable? always, or only certain hardware or > configurations or ?.? > What is the impact ? crash due to temporary corruption or permanent > data corruption, or metadata or filesystem structure or ..? > How do I know if I?m already affected, what is the fingerprint? > Does a workaround exist? > If this is critical and about a possible data corruption, why isn?t > it already indicated in the title/subject but hidden? > why is the error description so cryptic and needs some guessing > about the meaning? It?s no sentences, just quick notes. So there is > no explicit statement at all. > > > https://www.ibm.com/support/pages/node/6203365? 
> myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > Kind regards, > > Heiner > > -- > ======================= > Heinrich Billich > ETH Z?rich > Informatikdienste > Tel.: +41 44 632 72 56 > heinrich.billich at id.ethz.ch > ======================== > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 3 16:16:05 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 3 Jun 2020 16:16:05 +0100 Subject: [gpfsug-discuss] Immutible attribute Message-ID: Hum, on a "normal" Linux file system only the root user can change the immutible attribute on a file. Running on 4.2.3 I have just removed the immutible attribute as an ordinary user if I am the owner of the file. I would suggest that this is a bug as the manual page for mmchattr does not mention this. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Wed Jun 3 16:25:43 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 3 Jun 2020 15:25:43 +0000 Subject: [gpfsug-discuss] Immutible attribute In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 3 16:45:02 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 3 Jun 2020 16:45:02 +0100 Subject: [gpfsug-discuss] Immutible attribute In-Reply-To: References: Message-ID: <518af0ad-fa75-70a1-20c4-6a77e55817bb@strath.ac.uk> On 03/06/2020 16:25, Frederick Stock wrote: > Could you please provide the exact Scale version, or was it really 4.2.3.0? > 4.2.3-7 with setuid taken off a bunch of the utilities per relevant CVE while I work on the upgrade to 5.0.5 JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Wed Jun 3 17:50:37 2020 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 3 Jun 2020 18:50:37 +0200 Subject: [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI EMULATION ISSUE - what do do with this announcement, all I get is an APAR number In-Reply-To: References: Message-ID: Also, 5.0.4 PTF4 efix1 contains a fix for that. Mit freundlichen Gr??en / Kind regards Achim Rehor gpfsug-discuss-bounces at spectrumscale.org wrote on 03/06/2020 01:43:52: > From: "Mustafa Mah" > To: gpfsug-discuss at spectrumscale.org > Date: 03/06/2020 01:44 > Subject: [EXTERNAL] [gpfsug-discuss] Fw: Fwd: IJ24518: NVME SCSI > EMULATION ISSUE - what do do with this announcement, all I get is an > APAR number > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Heiner, > > Here is an alert for this APAR IJ24518 which has more details. 
> > https://www.ibm.com/support/pages/node/6210439 > > Regards, > Mustafa > > > ---------- Forwarded message --------- > > From: Billich Heinrich Rainer (ID SD) > > Date: Tue, Jun 2, 2020 at 4:29 AM > > Subject: [gpfsug-discuss] IJ24518: NVME SCSI EMULATION ISSUE - what > > do do with this announcement, all I get is an APAR number > > To: gpfsug main discussion list > > > > > Hello, > > > > I?m quite upset of the form and usefulness of some IBM announcements > > like this one: > > > > IJ24518: NVME SCSI EMULATION ISSUE > > > > How do I translate an APAR number to the spectrum scale or ess > > release which fix it? And which versions are affected? Need I to > > download all Readmes and grep for the APAR number? Or do I just > > don?t know where to get this information? How do you deal with such > > announcements? I?m tempted to just open a PMR and ask ?. > > > > This probably relates to previous posts and RFE for a proper > > changelog. Excuse if it?s a duplicate or if I did miss the answer in > > a previous post. Still the quality of this announcements is not > > what I expect. > > > > Just for completeness, maybe someone from IBM takes notice: > > > > All I get is an APAR number and the fact that it?s CRITICAL, so I > > can?t just ignore, but I don?t get > > > > Which ESS versions are affected ? all previous or only since a > > certain version? > > What is the first ESS version fixed? > > When am I vulnerable? always, or only certain hardware or > > configurations or ?.? > > What is the impact ? crash due to temporary corruption or permanent > > data corruption, or metadata or filesystem structure or ..? > > How do I know if I?m already affected, what is the fingerprint? > > Does a workaround exist? > > If this is critical and about a possible data corruption, why isn?t > > it already indicated in the title/subject but hidden? > > why is the error description so cryptic and needs some guessing > > about the meaning? It?s no sentences, just quick notes. So there is > > no explicit statement at all. > > > > > > https://www.ibm.com/support/pages/node/6203365? > > myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > Kind regards, > > > > Heiner > > > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=W0o9gusq8r9RXIck94yh8Db326oZ63-ctZOFhRGuJ9A&s=drq- > La060No88jLIMNwJCD6U67UYmALEzbQ58qyI65c&e= From Robert.Oesterlin at nuance.com Wed Jun 3 17:16:41 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 3 Jun 2020 16:16:41 +0000 Subject: [gpfsug-discuss] File heat - remote file systems? Message-ID: Is it possible to collect file heat data on a remote mounted file system? if I enable file heat in the remote cluster, will that get picked up? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
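For context, file heat collection is switched on with two mmchconfig variables and is then visible to the policy engine; a minimal sketch follows, with illustrative values, an illustrative file system name and a hypothetical policy file (whether heat is recorded for a remotely mounted file system is the open question above):

# Enable heat tracking (values shown are only examples)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# hot.pol: list files together with their computed heat
RULE EXTERNAL LIST 'hot' EXEC ''
RULE 'hotfiles' LIST 'hot' SHOW('heat=' || VARCHAR(FILE_HEAT))

# Generate the list without executing any data movement
mmapplypolicy gpfs1 -P hot.pol -I defer -f /tmp/hotreport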
URL: From chair at spectrumscale.org Wed Jun 3 20:11:17 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 03 Jun 2020 20:11:17 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Wed Jun 3 22:45:05 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 3 Jun 2020 21:45:05 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jun 3 22:56:04 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 3 Jun 2020 21:56:04 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Wed Jun 3 23:23:40 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Wed, 3 Jun 2020 22:23:40 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Frederick, Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... 
NSD servers per disks are called in round-robin fashion as well, for example: gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib Any other potential culprits to investigate? I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): [nsd03-ib ~]# mmdiag --waiters === mmdiag: waiters === Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion nsd04-ib: Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion tsm01-ib: Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion nsd01-ib: Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 4:56 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 6 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair)) 2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi) 3. Re: Client Latency and High NSD Server Load Average (Frederick Stock) ---------------------------------------------------------------------- Message: 1 Date: Wed, 03 Jun 2020 20:11:17 +0100 From: "Simon Thompson (Spectrum Scale User Group Chair)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Content-Type: text/plain; charset="utf-8" Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. 
That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 3 Jun 2020 21:45:05 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Wed, 3 Jun 2020 21:56:04 +0000 From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 6 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Jun 3 23:21:54 2020 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 3 Jun 2020 15:21:54 -0700 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <0D7E06C2-3BFC-434C-8A81-CC57D9F375B4@lbl.gov> Are you running ESS? > On Jun 3, 2020, at 2:56 PM, Frederick Stock wrote: > > Does the output of mmdf show that data is evenly distributed across your NSDs? If not that could be contributing to your problem. Also, are your NSDs evenly distributed across your NSD servers, and the NSD configured so the first NSD server for each is not the same one? 
> > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > ----- Original message ----- > From: "Saula, Oluwasijibomi" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug-discuss at spectrumscale.org" > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average > Date: Wed, Jun 3, 2020 5:45 PM > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Thu Jun 4 01:16:13 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 4 Jun 2020 02:16:13 +0200 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Hello, Oluwasijibomi , I suppose you are not running ESS (might be wrong on this). I'd check the IO history on the NSD servers (high IO times?) and in addition the IO traffic at the block device level , e.g. with iostat or the like (still high IO times there? Are the IO sizes ok or too low on the NSD servers with high write latencies? ). What's the picture on your storage back-end? All caches active? Is the storage backend fully loaded or rather idle? How is storage connected? SAS? FC? IB? What is the actual IO pattern when you see these high latencies? Do you run additional apps on some or all of youre NSD servers? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Date: 03/06/2020 23:45 Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. 
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=ql8z1YSfrzUgT8kXQBMEUuA8uyuprz6-fpvC660vG5A&s=JSYPIzNMZFNp17VaqcNWNuwwUE_nQMKu47mOOUonLp0&e= From ewahl at osc.edu Thu Jun 4 00:56:07 2020 From: ewahl at osc.edu (Wahl, Edward) Date: Wed, 3 Jun 2020 23:56:07 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: I saw something EXACTLY like this way back in the 3.x days when I had a backend storage unit that had a flaky main memory issue and some enclosures were constantly flapping between controllers for ownership. Some NSDs were affected, some were not. I can imagine this could still happen in 4.x and 5.0.x with the right hardware problem. Were things working before or is this a new installation? What is the backend storage? If you are using device-mapper-multipath, look for events in the messages/syslog. Incorrect path weighting? Using ALUA when it isn't supported? (that can be comically bad! helped a friend diagnose that one at a customer once) Perhaps using the wrong rr_weight or rr_min_io so you have some wacky long io queueing issues where your path_selector cannot keep up with the IO queue? Most of this is easily fixed by using most vendor's suggested settings anymore, IF the hardware is healthy... Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Saula, Oluwasijibomi Sent: Wednesday, June 3, 2020 5:45 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Jun 4 03:19:49 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 3 Jun 2020 22:19:49 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear offline, then nsd04-ib would be next in line to get the load of all 3. The two servers with the problems are in line after the one that is off. This is based on the candy striping of the NSD server order (which I think most of us do). NSD fail-over is ?straight-forward? so to speak - the last I checked, it is really fail-over in the listed order not load balancing among the servers (which is why you stripe them). I do *not* know if individual clients make the decision that the I/O for a disk should go through the ?next? 
NSD server, or if it is done cluster-wide (in the case of intermittently super-slow I/O). Hopefully someone with source code access will answer that, because now I?m curious... Check what path the clients are using to the NSDs, i.e. which server. See if you are surprised. :) -- Stephen > On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi wrote: > > ? > Frederick, > > Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... > > NSD servers per disks are called in round-robin fashion as well, for example: > > gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib > gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib > gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib > gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib > > Any other potential culprits to investigate? > > I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): > [nsd03-ib ~]# mmdiag --waiters > === mmdiag: waiters === > Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion > Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion > Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion > > nsd04-ib: > Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion > Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion > Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion > > tsm01-ib: > Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion > Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion > Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion > Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion > > nsd01-ib: > Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion > Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Wednesday, June 3, 2020 4:56 PM > To: gpfsug-discuss at spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 101, Issue 6 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Introducing SSUG::Digital > (Simon Thompson (Spectrum Scale User Group Chair)) > 2. Client Latency and High NSD Server Load Average > (Saula, Oluwasijibomi) > 3. 
Re: Client Latency and High NSD Server Load Average > (Frederick Stock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 03 Jun 2020 20:11:17 +0100 > From: "Simon Thompson (Spectrum Scale User Group Chair)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Introducing SSUG::Digital > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi All., > > > > I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. > > > > I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: > San Francisco, USA at 08:00 PDT > New York, USA at 11:00 EDT > London, United Kingdom at 16:00 BST > Frankfurt, Germany at 17:00 CEST > Pune, India at 20:30 IST > We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! > > > > (I know the times don?t work for everyone in the global community!) > > > > Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. > > More details at: > > https://www.spectrumscaleug.org/introducing-ssugdigital/ > > > > (We?ll announce the speakers and topic of the first session in the next few days ?) > > > > Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. > > > > We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. > > > > Simon Thompson > > SSUG Group Chair > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Wed, 3 Jun 2020 21:45:05 +0000 > From: "Saula, Oluwasijibomi" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > Thanks, > > > Oluwasijibomi (Siji) Saula > > HPC Systems Administrator / Information Technology > > > > Research 2 Building 220B / Fargo ND 58108-6050 > > p: 701.231.7749 / www.ndsu.edu > > > > [cid:image001.gif at 01D57DE0.91C300C0] > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 3 > Date: Wed, 3 Jun 2020 21:56:04 +0000 > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 101, Issue 6 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Jun 4 12:08:19 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 4 Jun 2020 11:08:19 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu Jun 4 16:19:18 2020 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 Jun 2020 11:19:18 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: , Message-ID: Hi, >> I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): Please issue "mmlsdisk -m" in NSD client to ascertain the active NSD server serving a NSD. Since nsd02-ib is offlined, it is possible that some servers would be serving higher NSDs than the rest. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_PoorPerformanceDuetoDiskFailure.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_HealthStateOfNSDserver.htm >> From the waiters you provided I would guess there is something amiss with some of your storage systems. Please ensure there are no "disk rebuild" pertaining to certain NSDs/storage volumes in progress (in the storage subsystem) as this can sometimes impact block-level performance and thus impact latency, especially for write operations. Please ensure that the hardware components constituting the Spectrum Scale stack are healthy and performing optimally. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_pspduetosyslevelcompissue.htm Please refer to the Spectrum Scale documentation (link below) for potential causes (e.g. Scale maintenance operation such as mmapplypolicy/mmestripefs in progress, slow disks) that can be contributing to this issue: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_performanceissues.htm Thanks and Regards, -Kums Kumaran Rajaram Spectrum Scale Development, IBM Systems kums at us.ibm.com From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 06/04/2020 07:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Sent by: gpfsug-discuss-bounces at spectrumscale.org >From the waiters you provided I would guess there is something amiss with some of your storage systems. Since those waiters are on NSD servers they are waiting for IO requests to the kernel to complete. Generally IOs are expected to complete in milliseconds, not seconds. You could look at the output of "mmfsadm dump nsd" to see how the GPFS IO queues are working but that would be secondary to checking your storage systems. 
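As an illustration of the kind of quick check being suggested here, the sketch below (not an IBM tool, just a rough helper under stated assumptions) shells out to "mmdiag --waiters" -- the same command used earlier in this thread -- and reports the longest outstanding waiters on the node it runs on. It assumes the usual /usr/lpp/mmfs/bin install path and the "Waiting N.NNNN sec since ..." output format shown above, which may vary by release.

```python
#!/usr/bin/env python3
"""Rough helper: summarize the longest GPFS waiters on the local node
by parsing 'mmdiag --waiters' output lines such as
  Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion
"""
import re
import subprocess

MMDIAG = "/usr/lpp/mmfs/bin/mmdiag"            # assumed default install path
WAIT_RE = re.compile(r"Waiting\s+([0-9.]+)\s+sec")

def longest_waiters(top=5):
    # universal_newlines/PIPE keep this runnable on the Python 3.6 shipped with EL7
    out = subprocess.run([MMDIAG, "--waiters"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    waits = []
    for line in out.splitlines():
        m = WAIT_RE.search(line)
        if m:
            waits.append((float(m.group(1)), line.strip()))
    return sorted(waits, reverse=True)[:top]

if __name__ == "__main__":
    for secs, line in longest_waiters():
        print("{:8.2f}s  {}".format(secs, line))
```

Running it on each NSD server (for example via mmdsh or any parallel ssh) makes it easy to spot which servers have I/O-completion waiters sitting in the seconds range, which, as noted above, points at the underlying storage rather than at GPFS itself.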
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Saula, Oluwasijibomi" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Date: Wed, Jun 3, 2020 6:24 PM Frederick, Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... NSD servers per disks are called in round-robin fashion as well, for example: gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib Any other potential culprits to investigate? I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): [nsd03-ib ~]# mmdiag --waiters === mmdiag: waiters === Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion nsd04-ib: Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion tsm01-ib: Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion nsd01-ib: Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 4:56 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 6 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair)) 2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi) 3. 
Re: Client Latency and High NSD Server Load Average (Frederick Stock) ---------------------------------------------------------------------- Message: 1 Date: Wed, 03 Jun 2020 20:11:17 +0100 From: "Simon Thompson (Spectrum Scale User Group Chair)" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: Content-Type: text/plain; charset="utf-8" Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/e839fc73/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 3 Jun 2020 21:45:05 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hello, Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/2ac14173/attachment-0001.html > ------------------------------ Message: 3 Date: Wed, 3 Jun 2020 21:56:04 +0000 From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200603/c252f3b9/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 6 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=McIf98wfiVqHU8ZygezLrQ&m=LdN47e1J6DuQfVtCUGylXISVvrHRgD19C_zEOo8SaJ0&s=ec3M7xE47VugZito3VvpZGvrFrl0faoZl6Oq0-iB-3Y&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Thu Jun 4 16:33:18 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Thu, 4 Jun 2020 15:33:18 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Stephen, Looked into client requests, and it doesn't seem to lean heavily on any one NSD server. Of course, this is an eyeball assessment after reviewing IO request percentages to the different NSD servers from just a few nodes. By the way, I later discovered our TSM/NSD server couldn't handle restoring a read-only file and ended-up writing my output file into GBs asking for my response...that seemed to have contributed to some unnecessary high write IO. However, I still can't understand why write IO operations are 5x more latent than ready operations to the same class of disks. Maybe it's time for a GPFS support ticket... Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, June 3, 2020 9:19 PM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 9 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Client Latency and High NSD Server Load Average (Stephen Ulmer) ---------------------------------------------------------------------- Message: 1 Date: Wed, 3 Jun 2020 22:19:49 -0400 From: Stephen Ulmer To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: Content-Type: text/plain; charset="utf-8" Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear offline, then nsd04-ib would be next in line to get the load of all 3. The two servers with the problems are in line after the one that is off. This is based on the candy striping of the NSD server order (which I think most of us do). NSD fail-over is ?straight-forward? so to speak - the last I checked, it is really fail-over in the listed order not load balancing among the servers (which is why you stripe them). I do *not* know if individual clients make the decision that the I/O for a disk should go through the ?next? NSD server, or if it is done cluster-wide (in the case of intermittently super-slow I/O). Hopefully someone with source code access will answer that, because now I?m curious... Check what path the clients are using to the NSDs, i.e. which server. See if you are surprised. :) -- Stephen > On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi wrote: > > ? > Frederick, > > Yes on both counts! - mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)... > > NSD servers per disks are called in round-robin fashion as well, for example: > > gpfs1 tier2_001 nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib > gpfs1 tier2_002 nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib > gpfs1 tier2_003 nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib > gpfs1 tier2_004 tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib > > Any other potential culprits to investigate? 
> > I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): > [nsd03-ib ~]# mmdiag --waiters > === mmdiag: waiters === > Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion > Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion > Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion > > nsd04-ib: > Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion > Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion > Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion > > tsm01-ib: > Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion > Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion > Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion > Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion > > nsd01-ib: > Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion > Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion > > > Thanks, > > Oluwasijibomi (Siji) Saula > HPC Systems Administrator / Information Technology > > Research 2 Building 220B / Fargo ND 58108-6050 > p: 701.231.7749 / www.ndsu.edu > > > > > > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org > Sent: Wednesday, June 3, 2020 4:56 PM > To: gpfsug-discuss at spectrumscale.org > Subject: gpfsug-discuss Digest, Vol 101, Issue 6 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Introducing SSUG::Digital > (Simon Thompson (Spectrum Scale User Group Chair)) > 2. Client Latency and High NSD Server Load Average > (Saula, Oluwasijibomi) > 3. Re: Client Latency and High NSD Server Load Average > (Frederick Stock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 03 Jun 2020 20:11:17 +0100 > From: "Simon Thompson (Spectrum Scale User Group Chair)" > > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Introducing SSUG::Digital > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi All., > > > > I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. > > > > I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. 
That?s: > San Francisco, USA at 08:00 PDT > New York, USA at 11:00 EDT > London, United Kingdom at 16:00 BST > Frankfurt, Germany at 17:00 CEST > Pune, India at 20:30 IST > We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! > > > > (I know the times don?t work for everyone in the global community!) > > > > Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. > > More details at: > > https://www.spectrumscaleug.org/introducing-ssugdigital/ > > > > (We?ll announce the speakers and topic of the first session in the next few days ?) > > > > Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. > > > > We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. > > > > Simon Thompson > > SSUG Group Chair > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Wed, 3 Jun 2020 21:45:05 +0000 > From: "Saula, Oluwasijibomi" > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > > Hello, > > Anyone faced a situation where a majority of NSDs have a high load average and a minority don't? > > Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? > > We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance. > > > > Thanks, > > > Oluwasijibomi (Siji) Saula > > HPC Systems Administrator / Information Technology > > > > Research 2 Building 220B / Fargo ND 58108-6050 > > p: 701.231.7749 / www.ndsu.edu > > > > [cid:image001.gif at 01D57DE0.91C300C0] > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 3 > Date: Wed, 3 Jun 2020 21:56:04 +0000 > From: "Frederick Stock" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load > Average > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 101, Issue 6 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 9 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From valdis.kletnieks at vt.edu Fri Jun 5 02:17:08 2020
From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks)
Date: Thu, 04 Jun 2020 21:17:08 -0400
Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average
In-Reply-To: References: Message-ID: <309214.1591319828@turing-police>

On Thu, 04 Jun 2020 15:33:18 -0000, "Saula, Oluwasijibomi" said:
> However, I still can't understand why write IO operations are 5x more latent
> than read operations to the same class of disks.

Two things that may be biting you:

First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical reads (data and parity block). To do a write, you have to read the old parity block, compute the new value, and write the data block and new parity block. This is often called the "RAID write penalty".

Second, if a read size is smaller than the physical block size, the storage array can read a block, and return only the fragment needed. But on a write, it has to read the whole block, splice in the new data, and write back the block - a RMW (read modify write) cycle.

-------------- next part --------------
A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL:

From giovanni.bracco at enea.it Fri Jun 5 12:21:55 2020
From: giovanni.bracco at enea.it (Giovanni Bracco)
Date: Fri, 5 Jun 2020 13:21:55 +0200
Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
Message-ID: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it>

In our lab we have received two storage-servers, Supermicro SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with an Avago 3108 RAID controller (2 GB cache), and before putting them into production for other purposes we have set up a small GPFS test cluster to verify whether they can be used as storage (our GPFS production cluster has licenses based on the NSD sockets, so it would be interesting to expand the storage size just by adding storage-servers to an InfiniBand based SAN, without changing the number of NSD servers).

The test cluster consists of:
1) two NSD servers (IBM x3550M2), each with a dual port IB QDR TrueScale adapter.
2) a Mellanox FDR switch used as a SAN switch
3) a TrueScale QDR switch as GPFS cluster switch
4) two GPFS clients (Supermicro AMD nodes), one QDR port each.

All the nodes run CentOS 7.7.

On each storage-server a RAID 6 volume of 11 disks, 80 TB, has been configured and it is exported via InfiniBand as an iSCSI target, so that both appear as devices accessed by the srp_daemon on the NSD servers, where multipath (not really necessary in this case) has been configured for these two LIO-ORG devices.

GPFS version 5.0.4-0 has been installed and RDMA has been properly configured.

Two NSD disks have been created and a GPFS file system has been configured.

Very simple tests have been performed using lmdd serial write/read.
1) storage-server local performance: before configuring the RAID6 volume as NSD disk, a local xfs file system was created and lmdd write/read performance for 100 GB file was verified to be about 1 GB/s 2) once the GPFS cluster has been created write/read test have been performed directly from one of the NSD server at a time: write performance 2 GB/s, read performance 1 GB/s for 100 GB file By checking with iostat, it was observed that the I/O in this case involved only the NSD server where the test was performed, so when writing, the double of base performances was obtained, while in reading the same performance as on a local file system, this seems correct. Values are stable when the test is repeated. 3) when the same test is performed from the GPFS clients the lmdd result for a 100 GB file are: write - 900 MB/s and stable, not too bad but half of what is seen from the NSD servers. read - 30 MB/s to 300 MB/s: very low and unstable values No tuning of any kind in all the configuration of the involved system, only default values. Any suggestion to explain the very bad read performance from a GPFS client? Giovanni here are the configuration of the virtual drive on the storage-server and the file system configuration in GPFS Virtual drive ============== Virtual Drive: 2 (Target Id: 2) Name : RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 Size : 81.856 TB Sector Size : 512 Is VD emulated : Yes Parity Size : 18.190 TB State : Optimal Strip Size : 256 KB Number Of Drives : 11 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disabled GPFS file system from mmlsfs ============================ mmlsfs vsd_gexp2 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j cluster Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 22.00 (5.0.4.0) File system version --create-time Fri Apr 3 19:26:27 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 134217728 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d nsdfs4lun2;nsdfs5lun2 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gexp2 Default mount point --mount-priority 0 Mount priority -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco ================================================== Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate e la casella di posta elettronica da cui e' stata inviata e' da qualificarsi quale strumento aziendale. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione. Grazie. This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only. Dissemination, copying, printing or use by anybody else is unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE 2016/679). If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail. Thanks. ================================================== From janfrode at tanso.net Fri Jun 5 13:58:39 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 5 Jun 2020 14:58:39 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: Could maybe be interesting to drop the NSD servers, and let all nodes access the storage via srp ? Maybe turn off readahead, since it can cause performance degradation when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead always reads too much. This might be the cause of the slow read seen ? maybe you?ll also overflow it if reading from both NSD-servers at the same time? Plus.. it?s always nice to give a bit more pagepool to hhe clients than the default.. I would prefer to start with 4 GB. -jf fre. 5. jun. 2020 kl. 14:22 skrev Giovanni Bracco : > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. > > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. 
> > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained, while in reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name : > RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size : 81.856 TB > Sector Size : 512 > Is VD emulated : Yes > Parity Size : 18.190 TB > State : Optimal > Strip Size : 256 KB > Number Of Drives : 11 > Span Depth : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy : Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment > (subblock) size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j cluster Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 1048576 Block size > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 22.00 (5.0.4.0) File system version > --create-time Fri Apr 3 19:26:27 2020 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 134217728 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? 
> --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per > full block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d nsdfs4lun2;nsdfs5lun2 Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gexp2 Default mount point > --mount-priority 0 Mount priority > > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle > persone indicate e la casella di posta elettronica da cui e' stata inviata > e' da qualificarsi quale strumento aziendale. > La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza > di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. > 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete cortesemente > pregati di darne immediata comunicazione al mittente e di provvedere alla > sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain privileged > information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is unauthorised > (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE > 2016/679). > If you are not the intended recipient, please delete this message and any > attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Fri Jun 5 14:53:23 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Fri, 5 Jun 2020 15:53:23 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> answer in the text On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > Could maybe be interesting to drop the NSD servers, and let all nodes > access the storage via srp ? no we can not: the production clusters fabric is a mix of a QDR based cluster and a OPA based cluster and NSD nodes provide the service to both. > > Maybe turn off readahead, since it can cause performance degradation > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead > always reads too much. This might be the cause of the slow read seen ? > maybe you?ll also overflow it if reading from both NSD-servers at the > same time? I have switched the readahead off and this produced a small (~10%) increase of performances when reading from a NSD server, but no change in the bad behaviour for the GPFS clients > > > Plus.. it?s always nice to give a bit more pagepool to hhe clients than > the default.. I would prefer to start with 4 GB. we'll do also that and we'll let you know! Giovanni > > > > ? -jf > > fre. 5. jun. 2020 kl. 
14:22 skrev Giovanni Bracco > >: > > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale > each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. > > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. > > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been > configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 > volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained,? while in > reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd > result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad? read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name? ? ? ? ? ? ? ? : > RAID Level? ? ? ? ? : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size? ? ? ? ? ? ? ? : 81.856 TB > Sector Size? ? ? ? ?: 512 > Is VD emulated? ? ? : Yes > Parity Size? ? ? ? ?: 18.190 TB > State? ? ? ? ? ? ? ?: Optimal > Strip Size? ? ? ? ? : 256 KB > Number Of Drives? ? : 11 > Span Depth? ? ? ? ? : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy? ?: Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag? ? ? ? ? ? ? ? value? ? 
? ? ? ? ? ? ? ? description > ------------------- ------------------------ > ----------------------------------- > ? -f? ? ? ? ? ? ? ? ?8192? ? ? ? ? ? ? ? ? ? ?Minimum fragment > (subblock) size in bytes > ? -i? ? ? ? ? ? ? ? ?4096? ? ? ? ? ? ? ? ? ? ?Inode size in bytes > ? -I? ? ? ? ? ? ? ? ?32768? ? ? ? ? ? ? ? ? ? Indirect block size > in bytes > ? -m? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? Default number of > metadata > replicas > ? -M? ? ? ? ? ? ? ? ?2? ? ? ? ? ? ? ? ? ? ? ? Maximum number of > metadata > replicas > ? -r? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? Default number of data > replicas > ? -R? ? ? ? ? ? ? ? ?2? ? ? ? ? ? ? ? ? ? ? ? Maximum number of data > replicas > ? -j? ? ? ? ? ? ? ? ?cluster? ? ? ? ? ? ? ? ? Block allocation type > ? -D? ? ? ? ? ? ? ? ?nfs4? ? ? ? ? ? ? ? ? ? ?File locking > semantics in > effect > ? -k? ? ? ? ? ? ? ? ?all? ? ? ? ? ? ? ? ? ? ? ACL semantics in effect > ? -n? ? ? ? ? ? ? ? ?512? ? ? ? ? ? ? ? ? ? ? Estimated number of > nodes > that will mount file system > ? -B? ? ? ? ? ? ? ? ?1048576? ? ? ? ? ? ? ? ? Block size > ? -Q? ? ? ? ? ? ? ? ?user;group;fileset? ? ? ?Quotas accounting enabled > ? ? ? ? ? ? ? ? ? ? ?user;group;fileset? ? ? ?Quotas enforced > ? ? ? ? ? ? ? ? ? ? ?none? ? ? ? ? ? ? ? ? ? ?Default quotas enabled > ? --perfileset-quota No? ? ? ? ? ? ? ? ? ? ? ?Per-fileset quota > enforcement > ? --filesetdf? ? ? ? No? ? ? ? ? ? ? ? ? ? ? ?Fileset df enabled? > ? -V? ? ? ? ? ? ? ? ?22.00 (5.0.4.0)? ? ? ? ? File system version > ? --create-time? ? ? Fri Apr? 3 19:26:27 2020 File system creation time > ? -z? ? ? ? ? ? ? ? ?No? ? ? ? ? ? ? ? ? ? ? ?Is DMAPI enabled? > ? -L? ? ? ? ? ? ? ? ?33554432? ? ? ? ? ? ? ? ?Logfile size > ? -E? ? ? ? ? ? ? ? ?Yes? ? ? ? ? ? ? ? ? ? ? Exact mtime mount option > ? -S? ? ? ? ? ? ? ? ?relatime? ? ? ? ? ? ? ? ?Suppress atime mount > option > ? -K? ? ? ? ? ? ? ? ?whenpossible? ? ? ? ? ? ?Strict replica > allocation > option > ? --fastea? ? ? ? ? ?Yes? ? ? ? ? ? ? ? ? ? ? Fast external attributes > enabled? > ? --encryption? ? ? ?No? ? ? ? ? ? ? ? ? ? ? ?Encryption enabled? > ? --inode-limit? ? ? 134217728? ? ? ? ? ? ? ? Maximum number of inodes > ? --log-replicas? ? ?0? ? ? ? ? ? ? ? ? ? ? ? Number of log replicas > ? --is4KAligned? ? ? Yes? ? ? ? ? ? ? ? ? ? ? is4KAligned? > ? --rapid-repair? ? ?Yes? ? ? ? ? ? ? ? ? ? ? rapidRepair enabled? > ? --write-cache-threshold 0? ? ? ? ? ? ? ? ? ?HAWC Threshold (max > 65536) > ? --subblocks-per-full-block 128? ? ? ? ? ? ? Number of subblocks per > full block > ? -P? ? ? ? ? ? ? ? ?system? ? ? ? ? ? ? ? ? ?Disk storage pools in > file > system > ? --file-audit-log? ?No? ? ? ? ? ? ? ? ? ? ? ?File Audit Logging > enabled? > ? --maintenance-mode No? ? ? ? ? ? ? ? ? ? ? ?Maintenance Mode enabled? > ? -d? ? ? ? ? ? ? ? ?nsdfs4lun2;nsdfs5lun2? ? Disks in file system > ? -A? ? ? ? ? ? ? ? ?yes? ? ? ? ? ? ? ? ? ? ? Automatic mount option > ? -o? ? ? ? ? ? ? ? ?none? ? ? ? ? ? ? ? ? ? ?Additional mount options > ? -T? ? ? ? ? ? ? ? ?/gexp2? ? ? ? ? ? ? ? ? ?Default mount point > ? --mount-priority? ?0? ? ? ? ? ? ? ? ? ? ? ? Mount priority > > > -- > Giovanni Bracco > phone? +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente > alle persone indicate e la casella di posta elettronica da cui e' > stata inviata e' da qualificarsi quale strumento aziendale. 
> La diffusione, copia o qualsiasi altra azione derivante dalla > conoscenza di queste informazioni sono rigorosamente vietate (art. > 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete > cortesemente pregati di darne immediata comunicazione al mittente e > di provvedere alla sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain > privileged information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is > unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent > amendments and GDPR UE 2016/679). > If you are not the intended recipient, please delete this message > and any attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From oluwasijibomi.saula at ndsu.edu Fri Jun 5 15:24:27 2020 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Fri, 5 Jun 2020 14:24:27 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: Vladis/Kums/Fred/Kevin/Stephen, Thanks so much for your insights, thoughts, and pointers! - Certainly increased my knowledge and understanding of potential culprits to watch for... So we finally discovered the root issue to this problem: An unattended TSM restore exercise profusely writing to a single file, over and over again into the GBs!!..I'm opening up a ticket with TSM support to learn how to mitigate this in the future. But with the RAID 6 writing costs Vladis explained, it now makes sense why the write IO was badly affected... Excerpt from output file: --- User Action is Required --- File '/gpfs1/X/Y/Z/fileABC' is write protected Select an appropriate action 1. Force an overwrite for this object 2. Force an overwrite on all objects that are write protected 3. Skip this object 4. Skip all objects that are write protected A. Abort this operation Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] ... 
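To put rough numbers on the "RAID 6 writing costs" mentioned above, the back-of-the-envelope sketch below uses the textbook read-modify-write accounting for small, sub-stripe updates (4 disk I/Os per write on RAID 5, 6 on RAID 6). The drive count and per-drive IOPS are made-up example values, and a controller with a healthy write-back cache or full-stripe writes will do better, but the ratio shows why small writes can lag reads by several times on the same disks.

```python
"""Back-of-the-envelope RAID write penalty arithmetic (classic model,
small sub-stripe updates; drive count and IOPS below are assumed
example values, not measurements from this cluster)."""

IOS_PER_OP = {
    "RAID6 read":  1,  # read just the data strip
    "RAID6 write": 6,  # read old data, P and Q; write new data, P and Q
    "RAID5 write": 4,  # read old data and parity; write new data and parity
}

def effective_iops(n_drives, iops_per_drive, workload):
    """Array-level small-I/O rate once the penalty is paid."""
    return n_drives * iops_per_drive / IOS_PER_OP[workload]

if __name__ == "__main__":
    for wl in IOS_PER_OP:
        print("{:12s} ~{:6.0f} IOPS".format(wl, effective_iops(12, 150, wl)))
    # RAID6 read vs. write differs by ~6x here -- the same order of
    # magnitude as the write-vs-read latency gap discussed in this thread.
```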
Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Friday, June 5, 2020 6:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 101, Issue 12 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Client Latency and High NSD Server Load Average (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) ---------------------------------------------------------------------- Message: 1 Date: Thu, 04 Jun 2020 21:17:08 -0400 From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average Message-ID: <309214.1591319828 at turing-police> Content-Type: text/plain; charset="us-ascii" On Thu, 04 Jun 2020 15:33:18 -0000, "Saula, Oluwasijibomi" said: > However, I still can't understand why write IO operations are 5x more latent > than ready operations to the same class of disks. Two things that may be biting you: First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical reads (data and parity block). To do a write, you have to read the old parity block, compute the new value, and write the data block and new parity block. This is often called the "RAID write penalty". Second, if a read size is smaller than the physical block size, the storage array can read a block, and return only the fragment needed. But on a write, it has to read the whole block, splice in the new data, and write back the block - a RMW (read modify write) cycle. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 101, Issue 12 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Fri Jun 5 18:02:49 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 5 Jun 2020 19:02:49 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco : > answer in the text > > On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > > > Could maybe be interesting to drop the NSD servers, and let all nodes > > access the storage via srp ? 
> > no we can not: the production clusters fabric is a mix of a QDR based > cluster and a OPA based cluster and NSD nodes provide the service to both. > You could potentially still do SRP from QDR nodes, and via NSD for your omnipath nodes. Going via NSD seems like a bit pointless indirection. > > > > Maybe turn off readahead, since it can cause performance degradation > > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead > > always reads too much. This might be the cause of the slow read seen ? > > maybe you?ll also overflow it if reading from both NSD-servers at the > > same time? > > I have switched the readahead off and this produced a small (~10%) > increase of performances when reading from a NSD server, but no change > in the bad behaviour for the GPFS clients > > > > > > Plus.. it?s always nice to give a bit more pagepool to hhe clients than > > the default.. I would prefer to start with 4 GB. > > we'll do also that and we'll let you know! Could you show your mmlsconfig? Likely you should set maxMBpS to indicate what kind of throughput a client can do (affects GPFS readahead/writebehind). Would typically also increase workerThreads on your NSD servers. 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. When you write one GPFS block, less than a half RAID stripe is written, which means you need to read back some data to calculate new parities. I would prefer 4 MB block size, and maybe also change to 8+p+q so that one GPFS is a multiple of a full 2 MB stripe. -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Jun 6 06:38:31 2020 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Sat, 06 Jun 2020 01:38:31 -0400 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: References: Message-ID: <403018.1591421911@turing-police> On Fri, 05 Jun 2020 14:24:27 -0000, "Saula, Oluwasijibomi" said: > But with the RAID 6 writing costs Vladis explained, it now makes sense why the write IO was badly affected... > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] > Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A] And a read-modify-write on each one.. Ouch. Stuff like that is why making sure program output goes to /var or other local file system is usually a good thing. I seem to remember us getting bit by a similar misbehavior in TSM, but I don't know the details because I was busier with GPFS and LTFS/EE than TSM. Though I have to wonder how TSM could be a decades-old product and still have misbehaviors in basic things like failed reads on input prompts... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From luis.bolinches at fi.ibm.com Sat Jun 6 07:57:06 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Sat, 6 Jun 2020 06:57:06 +0000 Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average In-Reply-To: <403018.1591421911@turing-police> References: <403018.1591421911@turing-police>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: attf6izb.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From valleru at cbio.mskcc.org Mon Jun 8 18:44:07 2020 From: valleru at cbio.mskcc.org (Lohit Valleru) Date: Mon, 8 Jun 2020 12:44:07 -0500 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files Message-ID: Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Tue Jun 9 01:56:45 2020 From: jjdoherty at yahoo.com (Jim Doherty) Date: Tue, 9 Jun 2020 00:56:45 +0000 (UTC) Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <41260341.922388.1591664205443@mail.yahoo.com> You will need to do this with chown from the? c library functions? (could do this from perl or python).?? If you try to change this from a shell script? you will hit the Linux command? which will have a lot more overhead.???? I had a customer attempt this using the shell and it ended up taking forever due to a brain damaged NIS service :-). ?? Jim? On Monday, June 8, 2020, 2:01:39 PM EDT, Lohit Valleru wrote: #yiv6988452566 body{font-family:Helvetica, Arial;font-size:13px;}Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner?We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards,Lohit_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Jun 9 03:52:16 2020 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 9 Jun 2020 02:52:16 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Change_uidNumber_and_gidNumber_for_bil?= =?utf-8?q?lions=09of=09files?= In-Reply-To: <41260341.922388.1591664205443@mail.yahoo.com> References: <41260341.922388.1591664205443@mail.yahoo.com>, Message-ID: An HTML attachment was scrubbed... URL: From jtucker at pixitmedia.com Tue Jun 9 07:53:00 2020 From: jtucker at pixitmedia.com (Jez Tucker) Date: Tue, 9 Jun 2020 07:53:00 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <41260341.922388.1591664205443@mail.yahoo.com> Message-ID: <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> Hi Lohit (hey Jim & Christof), ? 
Whilst you _could_ trawl your entire filesystem, flip uids and work out how to successfully replace ACL ids without actually pushing ACLs (which could break defined inheritance options somewhere in your file tree if you had not first audited your filesystem) the systems head in me says: "We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve." Here's the problem: to what SSSD or Centrify would resolve I've done this a few times in the past in a previous life.? In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. E.G. if user foo is id 1234, ensure user foo is 1234 in AD when you move your LDAP world over. Windows ldifde utility can import an ldif from openldap to take the config across. Automation or inline munging can be achieved with powershell or python. I presume there is a large technical blocker which is why you are looking at remapping the filesystem? Best, Jez On 09/06/2020 03:52, Christof Schmitt wrote: > If there are ACLs, then you also need to update all ACLs? > (gpfs_getacl(), update uids and gids in all entries, gpfs_putacl()), > in addition to the chown() call. > ? > Regards, > ? > Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ > christof.schmitt at us.ibm.com? ||? +1-520-799-2469??? (T/L: 321-2469) > ? > ? > > ----- Original message ----- > From: Jim Doherty > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Change uidNumber and > gidNumber for billions of files > Date: Mon, Jun 8, 2020 5:57 PM > ? > ? > You will need to do this with chown from the? c library functions? > (could do this from perl or python).?? If you try to change this > from a shell script? you will hit the Linux command? which will > have a lot more overhead.???? I had a customer attempt this using > the shell and it ended up taking forever due to a brain damaged > NIS service :-). ?? > ? > Jim? > ? > On Monday, June 8, 2020, 2:01:39 PM EDT, Lohit Valleru > wrote: > ? > ? > Hello Everyone, > ? > We are planning to migrate from LDAP to AD, and one of the best > solution was to change the uidNumber and gidNumber to what SSSD or > Centrify would resolve. > ? > May I know, if anyone has come across a tool/tools that can change > the uidNumbers and gidNumbers of billions of files efficiently and > in a reliable manner? > We could spend some time to write a custom script, but wanted to > know if a tool already exists. > ? > Please do let me know, if any one else has come across a similar > situation, and the steps/tools used to resolve the same. > ? > Regards, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > ? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* VP Research and Development | Pixit Media e: jtucker at pixitmedia.com Visit www.pixitmedia.com -- ? This email is confidential in that it is? intended for the exclusive attention of?the addressee(s) indicated. 
If you are?not the intended recipient, this email?should not be read or disclosed to?any other person. Please notify the?sender immediately and delete this? email from your computer system.?Any opinions expressed are not?necessarily those of the company?from which this email was sent and,?whilst to the best of our knowledge no?viruses or defects exist, no?responsibility can be accepted for any?loss or damage arising from its?receipt or subsequent use of this?email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Jun 9 09:51:03 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Jun 2020 08:51:03 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> References: <41260341.922388.1591664205443@mail.yahoo.com> <82800a82-b8d5-2c6e-f054-5318a770d12d@pixitmedia.com> Message-ID: > I presume there is a large technical blocker which is why you are looking at remapping the filesystem? Like anytime there is a corporate AD with mandated attributes? ? Though isn?t there an AD thing now for doing schema view type things now which allow you to inherit certain attributes and overwrite others? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Tue Jun 9 10:03:44 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Tue, 09 Jun 2020 10:03:44 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital Message-ID: First talk: https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talk-what-is-new-in-spectrum-scale-5-0-5/ What is new in Spectrum Scale 5.0.5? 18th June 2020. No registration required, just click the Webex link in the page above. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 3 June 2020 at 20:11 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Jun 9 12:20:45 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 12:20:45 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> On 08/06/2020 18:44, Lohit Valleru wrote: > Hello Everyone, > > We are planning to migrate from LDAP to AD, and one of the best solution > was to change the uidNumber and gidNumber to what SSSD or Centrify would > resolve. > > May I know, if anyone has come across a tool/tools that can change the > uidNumbers and gidNumbers of billions of files efficiently and in a > reliable manner? Not to my knowledge. > We could spend some time to write a custom script, but wanted to know if > a tool already exists. > If you can be sure that all files under a specific directory belong to a specific user and you have no ACL's then a whole bunch of "chown -R" would be reasonable. That is you have a lot of user home directories for example. What I do in these scenarios is use a small sqlite database, say in this scenario which has the directory that I want to chown on, the target UID and GID and a status field. Initially I set the status field to -1 which indicates they have not been processed. The script sets the status field to -2 when it starts processing an entry and on completion sets the status field to the exit code of the command you are running. This way when the script is finished you can see any directory hierarchies that had a problem and if it dies early you can see where it got up to (that -2). You can also do things like set all none zero status codes back to -1 and run again with a simple SQL update on the database from the sqlite CLI. If you don't need to modify ACL's but have mixed ownership under directory hierarchies then a script is reasonable but not a shell script. The overhead of execing chown billions of times on individual files will be astronomical. You need something like Perl or Python and make use of the builtin chown facilities of the language to avoid all those exec's. That said I suspect you will see a significant speed up from using C. If you have ACL's to contend with then I would definitely spend some time and write some C code using the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmpgetacl and mmputacl in any script is horrendous and you will have billions of exec's of each command. As I understand it GPFS stores each ACL once and each file then points to the ACL. Theoretically it would be possible to just modify the stored ACL's for a very speedy update of all the ACL's on the files/directories. However I would imagine you need to engage IBM and bend over while they empty your wallet for that option :-) The biggest issue to take care of IMHO is do any of the input UID/GID numbers exist in the output set??? If so life just got a lot harder as you don't get a second chance to run the script/program if there is a problem. In this case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch and update your LDAP in to match. Finally be aware that if you are using TSM for backup you will probably need to back every file up again after the change of ownership as far as I am aware. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From ulmer at ulmer.org Tue Jun 9 14:07:32 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Tue, 9 Jun 2020 09:07:32 -0400 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> Message-ID: <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> Jonathan brings up a good point that you?ll only get one shot at this ? if you?re using the file system as your record of who owns what. You might want to use the policy engine to record the existing file names and ownership (and then provide updates using the same policy engine for the things that changed after the last time you ran it). At that point, you?ve got the list of who should own what from before you started. You could even do some things to see how complex your problem is, like "how many directories have files owned by more than one UID?? With respect to that, it is surprising how easy the sqlite C API is to use (though I would still recommend Perl or Python), and equally surprising how *bad* the JOIN performance is. If you go with sqlite, denormalize *everything* as it?s collected. If that is too dirty for you, then just use MariaDB or something else. -- Stephen > On Jun 9, 2020, at 7:20 AM, Jonathan Buzzard wrote: > > On 08/06/2020 18:44, Lohit Valleru wrote: >> Hello Everyone, >> We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. >> May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? > > Not to my knowledge. > >> We could spend some time to write a custom script, but wanted to know if a tool already exists. > > If you can be sure that all files under a specific directory belong to a specific user and you have no ACL's then a whole bunch of "chown -R" would be reasonable. That is you have a lot of user home directories for example. > > What I do in these scenarios is use a small sqlite database, say in this scenario which has the directory that I want to chown on, the target UID and GID and a status field. Initially I set the status field to -1 which indicates they have not been processed. The script sets the status field to -2 when it starts processing an entry and on completion sets the status field to the exit code of the command you are running. This way when the script is finished you can see any directory hierarchies that had a problem and if it dies early you can see where it got up to (that -2). > > You can also do things like set all none zero status codes back to -1 and run again with a simple SQL update on the database from the sqlite CLI. > > If you don't need to modify ACL's but have mixed ownership under directory hierarchies then a script is reasonable but not a shell script. The overhead of execing chown billions of times on individual files will be astronomical. You need something like Perl or Python and make use of the builtin chown facilities of the language to avoid all those exec's. That said I suspect you will see a significant speed up from using C. > > If you have ACL's to contend with then I would definitely spend some time and write some C code using the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmpgetacl and mmputacl in any script is horrendous and you will have billions of exec's of each command. 
> > As I understand it GPFS stores each ACL once and each file then points to the ACL. Theoretically it would be possible to just modify the stored ACL's for a very speedy update of all the ACL's on the files/directories. However I would imagine you need to engage IBM and bend over while they empty your wallet for that option :-) > > The biggest issue to take care of IMHO is do any of the input UID/GID numbers exist in the output set??? If so life just got a lot harder as you don't get a second chance to run the script/program if there is a problem. > > In this case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch and update your LDAP in to match. > > Finally be aware that if you are using TSM for backup you will probably need to back every file up again after the change of ownership as far as I am aware. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jun 9 14:57:08 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 14:57:08 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> Message-ID: <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> On 09/06/2020 14:07, Stephen Ulmer wrote: > Jonathan brings up a good point that you?ll only get one shot at this ? > if you?re using the file system as your record of who owns what. Not strictly true if my existing UID's are in the range 10000-19999 and my target UID's are in the range 50000-99999 for example then I get an infinite number of shots at it. It is only if the target and source ranges have any overlap that there is a problem and that should be easy to work out in advance. If it where me and there was overlap between input and output states I would go via an intermediate state where there is no overlap. Linux has had 32bit UID's since a very long time now (we are talking kernel versions <1.0 from memory) so none overlapping mappings are perfectly possible to arrange. > With respect to that, it is surprising how easy the sqlite C API is to > use (though I would still recommend Perl or Python), and equally > surprising how *bad* the JOIN performance is. If you go with sqlite, > denormalize *everything* as it?s collected. If that is too dirty for > you, then just use MariaDB or something else. I actually thinking on it more thought a generic C random UID/GID to UID/GID mapping program is a really simple piece of code and should be nearly as fast as chown -R. It will be very slightly slower as you have to look the mapping up for each file. Read the mappings in from a CSV file into memory and just use nftw/lchown calls to walk the file system and change the UID/GID as necessary. If you are willing to sacrifice some error checking on the input mapping file (not unreasonable to assume it is good) and have some hard coded site settings (avoiding processing command line arguments) then 200 lines of C tops should do it. 
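A minimal sketch of what such a walker could look like follows - this is only an illustration rather than a tested tool: the idmap.csv name, its "u,old,new" / "g,old,new" line format and the fixed table sizes are assumptions, and ACL handling, reporting and the clash/quota checks discussed elsewhere in the thread are left out.

    /*
     * remap_ids.c - illustrative sketch only, not a finished tool.
     * Walks a tree with nftw() and remaps UID/GID numbers with lchown()
     * so that symbolic links themselves are updated, never their targets.
     *
     * Build: cc -O2 -o remap_ids remap_ids.c
     * Run:   ./remap_ids /gpfs/fs0/somelab     (with idmap.csv present)
     */
    #define _XOPEN_SOURCE 700
    #include <ftw.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define MAX_MAPPINGS 4096            /* assumed site limit */

    struct idpair { unsigned old_id, new_id; };

    static struct idpair uid_map[MAX_MAPPINGS], gid_map[MAX_MAPPINGS];
    static size_t n_uids, n_gids;

    /* Read the CSV mapping file into memory: one "u,old,new" or
     * "g,old,new" line per UID or GID mapping (assumed format). */
    static int load_map(const char *file)
    {
        char kind;
        unsigned old_id, new_id;
        FILE *fp = fopen(file, "r");

        if (fp == NULL)
            return -1;
        while (fscanf(fp, " %c,%u,%u", &kind, &old_id, &new_id) == 3) {
            if (kind == 'u' && n_uids < MAX_MAPPINGS)
                uid_map[n_uids++] = (struct idpair){ old_id, new_id };
            else if (kind == 'g' && n_gids < MAX_MAPPINGS)
                gid_map[n_gids++] = (struct idpair){ old_id, new_id };
        }
        fclose(fp);
        return 0;
    }

    /* Return the mapped id, or the id unchanged if it is not in the map. */
    static unsigned lookup(const struct idpair *map, size_t n, unsigned id)
    {
        for (size_t i = 0; i < n; i++)
            if (map[i].old_id == id)
                return map[i].new_id;
        return id;
    }

    /* Called by nftw() for every file, directory and (with FTW_PHYS)
     * symbolic link; lchown() changes the link itself, not its target. */
    static int visit(const char *path, const struct stat *st, int type,
                     struct FTW *ftw)
    {
        uid_t u = lookup(uid_map, n_uids, st->st_uid);
        gid_t g = lookup(gid_map, n_gids, st->st_gid);

        (void)type; (void)ftw;
        if ((u != st->st_uid || g != st->st_gid) && lchown(path, u, g) != 0)
            fprintf(stderr, "lchown failed on %s\n", path);
        return 0;                        /* keep walking regardless */
    }

    int main(int argc, char **argv)
    {
        if (argc != 2 || load_map("idmap.csv") != 0) {
            fprintf(stderr, "usage: %s <directory> (idmap.csv must exist)\n",
                    argv[0]);
            return 1;
        }
        /* FTW_PHYS: never follow symlinks; FTW_MOUNT: stay on one filesystem. */
        return nftw(argv[1], visit, 64, FTW_PHYS | FTW_MOUNT) != 0;
    }

A real run would still want a dry-run mode and the non-overlapping-range check before being let loose on billions of files.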
Depending on how big your input UID/GID ranges are you could even use array indexing for the mapping. For example on our system the UID's start at just over 5000 and end just below 6000 with quite a lot of holes. Just allocate an array of 6000 int's which is only ~24KB and off you go with something like new_uid = uid_mapping[uid]; Nice super speedy lookup of mappings. If you need to manipulate ACL's then C is the only way to go anyway. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Tue Jun 9 23:40:33 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Jun 2020 23:40:33 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> References: <2257c9db-b311-32b1-e001-14923eccc5a7@strath.ac.uk> <9F6A4DD4-E715-48C3-A431-3B159FEC5C63@ulmer.org> <11f8feb9-1e66-75f7-72c5-90afda46cb30@strath.ac.uk> Message-ID: <21a0686f-f080-e81c-0e3e-6974116ba141@strath.ac.uk> On 09/06/2020 14:57, Jonathan Buzzard wrote: [SNIP] > > I actually thinking on it more thought a generic C random UID/GID to > UID/GID mapping program is a really simple piece of code and should be > nearly as fast as chown -R. It will be very slightly slower as you have > to look the mapping up for each file. Read the mappings in from a CSV > file into memory and just use nftw/lchown calls to walk the file system > and change the UID/GID as necessary. > Because I was curious I thought I would have a go this evening coding something up in C. It's standing at 213 lines of code put there is some extra fluff and some error checking and a large copyright comment. Updating ACL's would increase the size too. It would however be relatively simple I think. The public GPFS API documentation on ACL's is incomplete so some guess work and testing would be required. It's stupidly fast on my laptop changing the ownership of the latest version of gcc untarred. However there is only one user in the map file and it's an SSD. Obviously if you have billions of files it is going to take longer :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From aaron.knister at gmail.com Wed Jun 10 02:15:55 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Tue, 9 Jun 2020 21:15:55 -0400 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: Lohit, I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here. I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :) >From what I recall, I used uthash and the gpfs API's to store in-memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs API's and could lookup the given inode in the in-memory hash to view its ownership details. Both the inode traversal and directory walk were parallelized/threaded. They way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. 
All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My work around was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right) and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flags to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speed up. I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions. One of the gotchas to be aware of, is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough. The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. That code is here ( https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary. If you're using NFS4 ACLs... I don't have an easy answer for you :) We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway. -Aaron On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru wrote: > Hello Everyone, > > We are planning to migrate from LDAP to AD, and one of the best solution > was to change the uidNumber and gidNumber to what SSSD or Centrify would > resolve. > > May I know, if anyone has come across a tool/tools that can change the > uidNumbers and gidNumbers of billions of files efficiently and in a > reliable manner? > We could spend some time to write a custom script, but wanted to know if a > tool already exists. > > Please do let me know, if any one else has come across a similar > situation, and the steps/tools used to resolve the same. 
> > Regards, > Lohit > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Wed Jun 10 08:25:08 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 10 Jun 2020 08:25:08 +0100 Subject: [gpfsug-discuss] Introducing SSUG::Digital In-Reply-To: References: Message-ID: So someone pointed out we?re using webex events for this, in theory there is a ?join in browser? option, if you don?t have the webex client already installed. However that also doesn?t appear to work in Chrome/Ubuntu20.04 ? so you might want to check your browser/plugin works *before* next week. You can use https://www.webex.com/test-meeting.html to do a test of Webex. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 June 2020 at 10:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Introducing SSUG::Digital First talk: https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talk-what-is-new-in-spectrum-scale-5-0-5/ What is new in Spectrum Scale 5.0.5? 18th June 2020. No registration required, just click the Webex link in the page above. Simon From: on behalf of "chair at spectrumscale.org" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 3 June 2020 at 20:11 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Introducing SSUG::Digital Hi All., I happy that we can finally announce SSUG:Digital, which will be a series of online session based on the types of topic we present at our in-person events. I know it?s taken use a while to get this up and running, but we?ve been working on trying to get the format right. So save the date for the first SSUG:Digital event which will take place on Thursday 18th June 2020 at 4pm BST. That?s: San Francisco, USA at 08:00 PDT New York, USA at 11:00 EDT London, United Kingdom at 16:00 BST Frankfurt, Germany at 17:00 CEST Pune, India at 20:30 IST We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going! (I know the times don?t work for everyone in the global community!) Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session. More details at: https://www.spectrumscaleug.org/introducing-ssugdigital/ (We?ll announce the speakers and topic of the first session in the next few days ?) Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in getting this going. We?re keen to include some user talks and site updates later in the series, so please let me know if you might be interested in presenting in this format. Simon Thompson SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jun 10 08:33:03 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Jun 2020 07:33:03 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: Quota ? I thought there was a work around for this. I think it went along the lines of. Set the soft quota to what you want. Set the hard quota 150% more. Set the grace period to 1 second. 
I think the issue is that when you are over soft quota, each operation has to queisce each time until you hit hard/grace period. Whereas once you hit grace, it no longer does this. I was just looking for the slide deck about this, but can?t find it at the moment! Tomer spoke about it at one point. Simon From: on behalf of "aaron.knister at gmail.com" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 10 June 2020 at 02:16 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files Lohit, I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here. I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :) From what I recall, I used uthash and the gpfs API's to store in-memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs API's and could lookup the given inode in the in-memory hash to view its ownership details. Both the inode traversal and directory walk were parallelized/threaded. They way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My work around was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right) and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flags to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speed up. I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions. One of the gotchas to be aware of, is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough. The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. 
That code is here ( https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary. If you're using NFS4 ACLs... I don't have an easy answer for you :) We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway. -Aaron On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru > wrote: Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solution was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner? We could spend some time to write a custom script, but wanted to know if a tool already exists. Please do let me know, if any one else has come across a similar situation, and the steps/tools used to resolve the same. Regards, Lohit _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 10 12:33:09 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 12:33:09 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: Message-ID: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> On 10/06/2020 02:15, Aaron Knister wrote: > Lohit, > > I did this while working @ NASA. I had two tools I used, one > affectionately known as "luke file walker" (to modify traditional unix > permissions) and the other known as the "milleniumfacl" (to modify posix > ACLs). Stupid jokes aside, there were some real technical challenges here. > > I don't know if anyone from the NCCS team at NASA is on the list, but if > they are perhaps they'll jump in if they're willing to share the code :) > > From what I recall, I used uthash and the gpfs API's to store in-memory > a hash of inodes and their uid/gid information. I then walked the > filesystem using the gpfs API's and could lookup the given inode in the > in-memory hash to view its ownership details. Both the inode traversal > and directory walk were parallelized/threaded. They way I actually > executed the chown was particularly security-minded. There is a race > condition that exists if you chown /path/to/file. All it takes is either > a malicious user or someone monkeying around with the filesystem while > it's live to accidentally chown the wrong file if a symbolic link ends > up in the file path. Well I would expect this needs to be done with no user access to the system. Or at the very least no user access for the bits you are currently modifying. Otherwise you are going to end up in a complete mess. > My work around was to use openat() and fchmod (I > think that was it, I played with this quite a bit to get it right) and > for every path to be chown'd I would walk the hierarchy, opening each > component with the O_NOFOLLOW flags to be sure I didn't accidentally > stumble across a symlink in the way. 
Or you could just use lchown so you change the ownership of the symbolic link rather than the file it is pointing to. You need to change the ownership of the symbolic link not the file it is linking to, that will be picked up elsewhere in the scan. If you don't change the ownership of the symbolic link you are going to be left with a bunch of links owned by none existent users. No race condition exists if you are doing it properly in the first place :-) I concluded that the standard nftw system call was more suited to this than the GPFS inode scan. I could see no way to turn an inode into a path to the file which lchownn, gpfs_getacl and gpfs_putacl all use. I think the problem with the GPFS inode scan is that is is for a backup application. Consequently there are some features it is lacking for more general purpose programs looking for a quick way to traverse the file system. An other example is that the gpfs_iattr_t structure returned from gpfs_stat_inode does not contain any information as to whether the file is a symbolic link like a standard stat call does. > I also implemented caching of open > path component file descriptors since odds are I would be > chowning/chgrp'ing files in the same directory. That bought me some > speed up. > More reasons to use nftw for now, no need to open any files :-) > I opened up RFE's at one point, I believe, for gpfs API calls to do this > type of operation. I would ideally have liked a mechanism to do this > based on inode number rather than path which would help avoid issues of > race conditions. > Well lchown to the rescue, but that does require a path to the file. The biggest problem is the inability to get a path given an inode using the GPFS inode scan which is why I steered away from it. In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the UID/GID of the file, but they are returned in an opaque format, so it's not possible :-( > One of the gotchas to be aware of, is quotas. My wrapper script would > clone quotas from the old uid to the new uid. That's easy enough. > However, keep in mind, if the uid is over their quota your chown > operation will absolutely kill your cluster. Once a user is over their > quota the filesystem seems to want to quiesce all of its accounting > information on every filesystem operation for that user. I would check > for adequate quota headroom for the user in question and abort if there > wasn't enough. Had not thought of that one. Surely the simple solution would be to set the quota's on the mapped UID/GID's after the change has been made. Then the filesystem operation would not be for the user over quota but for the new user? The other alternative is to dump the quotas to file and remove them. Change the UID's and GID's then restore the quotas on the new UID/GID's. As I said earlier surely the end users have no access to the file system while the modifications are being made. If they do all hell is going to break loose IMHO. > > The ACL changes were much more tricky. There's no way, of which I'm > aware, to atomically update ACL entries. You run the risk that you could > clobber a user's ACL update if it occurs in the milliseconds between you > reading the ACL and updating it as part of the UID/GID update. > Thankfully we were using Posix ACLs which were easier for me to deal > with programmatically. I still had the security concern over symbolic > links appearing in paths to have their ACLs updated either maliciously > or organically. 
I was able to deal with that by modifying libacl to > implement ACL calls that used variants of xattr calls that took file > descriptors as arguments and allowed me to throw nofollow flags. That > code is here ( > https://github.com/aaronknister/acl/commits/nofollow > ). > I couldn't take advantage of the GPFS API's here to meet my > requirements, so I just walked the filesystem tree in parallel if I > recall correctly, retrieved every ACL and updated if necessary. > > If you're using NFS4 ACLs... I don't have an easy answer for you :) You call gpfs_getacl, walk the array of ACL's returned changing any UID/GID's as required and then call gpfs_putacl. You can modify both Posix and NFSv4 ACL's with this call. Given they only take a path to the file another reason to use nftw rather than GPFS inode scan. As I understand even if your file system is set to an ACL type of "all", any individual file/directory can only have either Posix *or* NSFv4 ACLS (ignoring the fact you can set your filesystem ACL's type to the undocumented Samba), so can all be handled automatically. Note if you are using nftw to walk the file system then you get a standard system stat structure for every file/directory and you could just skip symbolic links. I don't think you can set an ACL on a symbolic link anyway. You certainly can't set standard permissions on them. It would be sensible to wrap the main loop in gpfs_lib_init/gpfs_lib_term in this scenario. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Wed Jun 10 12:58:07 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 12:58:07 +0100 Subject: [gpfsug-discuss] Infiniband/Ethernet gateway Message-ID: <7e656f24-e00b-a95f-2f6e-a8223310e708@strath.ac.uk> We have a mixture of 10Gb Ethernet and Infiniband connected (using IPoIB) nodes on our compute cluster using a DSS-G for storage. Each SR650 has a bonded pair of 40Gb Ethernet connections and a 40Gb Infiniband connection. Performance and stability are *way* better than the old Lustre system. Now for this to work the Ethernet connected nodes have to be able to talk to the Infiniband connected ones so I have a server acting as a gateway. This has been running fine for a couple of years now. However it occurs to me now that instead of having a dedicated server performing these duties it would make more sense to use the SR650's of the DSS-G. It would be one less thing for me to look after :-) Can anyone think of a reason not to do this? It also occurs to me that one could do some sort of VRRP style failover to remove the single point of failure that is currently the gateway machine. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From valleru at cbio.mskcc.org Wed Jun 10 16:31:20 2020 From: valleru at cbio.mskcc.org (Lohit Valleru) Date: Wed, 10 Jun 2020 10:31:20 -0500 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: Thank you everyone for the Inputs. The answers to some of the questions are as follows: > From Jez: I've done this a few times in the past in a previous life.? In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. 
- Yes we had considered/attempted this, and it does work pretty good. It is actually much faster than using SSSD auto id mapping. But the main issue with this approach was to automate entry of uidNumbers and gidNumbers for all the enterprise users/groups across the agency. Both the approaches have there pros and cons. For now, we wanted to see the amount of effort that would be needed to change the uidNumbers and gidNumbers on the filesystem side, in case the other option of entering existing uidNumber/gidNumber data on AD does not work out. > Does the filesystem have ACLs? And which ACLs? ?Since we have CES servers that export the filesystems on SMB protocol -> The filesystems use NFS4 ACL mode. As far as we know - We know of only one fileset that is extensively using NFS4 ACLs. > Can we take a downtime to do this change? For the current GPFS storage clusters which are are production - we are thinking of taking a downtime to do the same per cluster. For new clusters/storage clusters, we are thinking of changing to AD before any new data is written to the storage.? > Do the uidNumbers/gidNumbers conflict? No. The current uidNumber and gidNumber are in 1000 - 8000 range, while the new uidNumbers,gidNumbers are above 1000000.? I was thinking of taking a backup of the current state of the filesystem, with respect to posix permissions/owner/group and the respective quotas. Disable quotas with a downtime before making changes. I might mostly start small with a single lab, and only change files without ACLs. May I know if anyone has a method/tool to find out which files/dirs have NFS4 ACLs set? As far as we know - it is just one fileset/lab, but it would be good to confirm if we have them set across any other files/dirs in the filesystem. The usual methods do not seem to work. ? Jonathan/Aaron, Thank you for the inputs regarding the scripts/APIs/symlinks and ACLs. I will try to see what I can do given the current state. I too wish GPFS API could be better at managing this kind of scenarios ?but I understand that this kind of huge changes might be pretty rare. Thank you, Lohit On June 10, 2020 at 6:33:45 AM, Jonathan Buzzard (jonathan.buzzard at strath.ac.uk) wrote: On 10/06/2020 02:15, Aaron Knister wrote: > Lohit, > > I did this while working @ NASA. I had two tools I used, one > affectionately known as "luke file walker" (to modify traditional unix > permissions) and the other known as the "milleniumfacl" (to modify posix > ACLs). Stupid jokes aside, there were some real technical challenges here. > > I don't know if anyone from the NCCS team at NASA is on the list, but if > they are perhaps they'll jump in if they're willing to share the code :) > > From what I recall, I used uthash and the gpfs API's to store in-memory > a hash of inodes and their uid/gid information. I then walked the > filesystem using the gpfs API's and could lookup the given inode in the > in-memory hash to view its ownership details. Both the inode traversal > and directory walk were parallelized/threaded. They way I actually > executed the chown was particularly security-minded. There is a race > condition that exists if you chown /path/to/file. All it takes is either > a malicious user or someone monkeying around with the filesystem while > it's live to accidentally chown the wrong file if a symbolic link ends > up in the file path. Well I would expect this needs to be done with no user access to the system. Or at the very least no user access for the bits you are currently modifying. 
Otherwise you are going to end up in a complete mess. > My work around was to use openat() and fchmod (I > think that was it, I played with this quite a bit to get it right) and > for every path to be chown'd I would walk the hierarchy, opening each > component with the O_NOFOLLOW flags to be sure I didn't accidentally > stumble across a symlink in the way. Or you could just use lchown so you change the ownership of the symbolic link rather than the file it is pointing to. You need to change the ownership of the symbolic link not the file it is linking to, that will be picked up elsewhere in the scan. If you don't change the ownership of the symbolic link you are going to be left with a bunch of links owned by none existent users. No race condition exists if you are doing it properly in the first place :-) I concluded that the standard nftw system call was more suited to this than the GPFS inode scan. I could see no way to turn an inode into a path to the file which lchownn, gpfs_getacl and gpfs_putacl all use. I think the problem with the GPFS inode scan is that is is for a backup application. Consequently there are some features it is lacking for more general purpose programs looking for a quick way to traverse the file system. An other example is that the gpfs_iattr_t structure returned from gpfs_stat_inode does not contain any information as to whether the file is a symbolic link like a standard stat call does. > I also implemented caching of open > path component file descriptors since odds are I would be > chowning/chgrp'ing files in the same directory. That bought me some > speed up. > More reasons to use nftw for now, no need to open any files :-) > I opened up RFE's at one point, I believe, for gpfs API calls to do this > type of operation. I would ideally have liked a mechanism to do this > based on inode number rather than path which would help avoid issues of > race conditions. > Well lchown to the rescue, but that does require a path to the file. The biggest problem is the inability to get a path given an inode using the GPFS inode scan which is why I steered away from it. In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the UID/GID of the file, but they are returned in an opaque format, so it's not possible :-( > One of the gotchas to be aware of, is quotas. My wrapper script would > clone quotas from the old uid to the new uid. That's easy enough. > However, keep in mind, if the uid is over their quota your chown > operation will absolutely kill your cluster. Once a user is over their > quota the filesystem seems to want to quiesce all of its accounting > information on every filesystem operation for that user. I would check > for adequate quota headroom for the user in question and abort if there > wasn't enough. Had not thought of that one. Surely the simple solution would be to set the quota's on the mapped UID/GID's after the change has been made. Then the filesystem operation would not be for the user over quota but for the new user? The other alternative is to dump the quotas to file and remove them. Change the UID's and GID's then restore the quotas on the new UID/GID's. As I said earlier surely the end users have no access to the file system while the modifications are being made. If they do all hell is going to break loose IMHO. > > The ACL changes were much more tricky. There's no way, of which I'm > aware, to atomically update ACL entries. 
You run the risk that you could > clobber a user's ACL update if it occurs in the milliseconds between you > reading the ACL and updating it as part of the UID/GID update. > Thankfully we were using Posix ACLs which were easier for me to deal > with programmatically. I still had the security concern over symbolic > links appearing in paths to have their ACLs updated either maliciously > or organically. I was able to deal with that by modifying libacl to > implement ACL calls that used variants of xattr calls that took file > descriptors as arguments and allowed me to throw nofollow flags. That > code is here ( > https://github.com/aaronknister/acl/commits/nofollow > ). > I couldn't take advantage of the GPFS API's here to meet my > requirements, so I just walked the filesystem tree in parallel if I > recall correctly, retrieved every ACL and updated if necessary. > > If you're using NFS4 ACLs... I don't have an easy answer for you :) You call gpfs_getacl, walk the array of ACL's returned changing any UID/GID's as required and then call gpfs_putacl. You can modify both Posix and NFSv4 ACL's with this call. Given they only take a path to the file another reason to use nftw rather than GPFS inode scan. As I understand even if your file system is set to an ACL type of "all", any individual file/directory can only have either Posix *or* NSFv4 ACLS (ignoring the fact you can set your filesystem ACL's type to the undocumented Samba), so can all be handled automatically. Note if you are using nftw to walk the file system then you get a standard system stat structure for every file/directory and you could just skip symbolic links. I don't think you can set an ACL on a symbolic link anyway. You certainly can't set standard permissions on them. It would be sensible to wrap the main loop in gpfs_lib_init/gpfs_lib_term in this scenario. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Jun 10 23:29:53 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Jun 2020 23:29:53 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: On 10/06/2020 16:31, Lohit Valleru wrote: [SNIP] > I might mostly start small with a single lab, and only change files > without ACLs. May I know if anyone has a method/tool to find out which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > fileset/lab, but it would be good to confirm if we have them set > across any other files/dirs in the filesystem. The usual methods do > not seem to work. Use mmgetacl a file at a time and try and do something with the output? Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, and have been that way for over a decade. Last time I raised this with IBM I was told that was by design... If they are CES then look at it client side from a Windows node? The alternative is to write something in C that calls gpfs_getacl. However it was an evening to get a basic UID remap code working in C. It would not take much more effort to make it handle ACL's. 
As such I would work on the premise that there are ACL's and handle it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ckerner at illinois.edu Wed Jun 10 23:40:41 2020 From: ckerner at illinois.edu (Kerner, Chad A) Date: Wed, 10 Jun 2020 22:40:41 +0000 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> Message-ID: <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> You can do a policy scan though and get a list of files that have ACLs applied to them. Then you would not have to check every file with a shell utility or C, just process that list. Likewise, you can get the uid/gid as well and process that list with the new mapping(split it into multiple lists, processing multiple threads on multiple machines). While it is by no means the prettiest or possibly best way to handle the POSIX ACLs, I had whipped up a python api for it: https://github.com/ckerner/ssacl . It only does POSIX though. We use it in conjunction with acls (https://github.com/ckerner/acls), an ls replacement that shows effective user/group permissions based off of the acl's because most often the user would just look at the POSIX perms and say something is broken, without checking the acl. -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 6/10/20, 5:30 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 10/06/2020 16:31, Lohit Valleru wrote: [SNIP] > I might mostly start small with a single lab, and only change files > without ACLs. May I know if anyone has a method/tool to find out which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > fileset/lab, but it would be good to confirm if we have them set > across any other files/dirs in the filesystem. The usual methods do > not seem to work. Use mmgetacl a file at a time and try and do something with the output? Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, and have been that way for over a decade. Last time I raised this with IBM I was told that was by design... If they are CES then look at it client side from a Windows node? The alternative is to write something in C that calls gpfs_getacl. However it was an evening to get a basic UID remap code working in C. It would not take much more effort to make it handle ACL's. As such I would work on the premise that there are ACL's and handle it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From giovanni.bracco at enea.it Thu Jun 11 08:53:01 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 11 Jun 2020 09:53:01 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Comments and updates in the text: On 05/06/20 19:02, Jan-Frode Myklebust wrote: > fre. 5. jun. 2020 kl. 
15:53 skrev Giovanni Bracco > >: > > answer in the text > > On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > > > Could maybe be interesting to drop the NSD servers, and let all > nodes > > access the storage via srp ? > > no we can not: the production clusters fabric is a mix of a QDR based > cluster and a OPA based cluster and NSD nodes provide the service to > both. > > > You could potentially still do SRP from QDR nodes, and via NSD for your > omnipath nodes. Going via NSD seems like a bit pointless indirection. not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share the same data lake in Spectrum Scale/GPFS so the NSD servers support the flexibility of the setup. NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at the moment 3 different generations of DDN storages are connected, 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less expensive storage, to be used when performance is not the first priority. > > > > > > > Maybe turn off readahead, since it can cause performance degradation > > when GPFS reads 1 MB blocks scattered on the NSDs, so that > read-ahead > > always reads too much. This might be the cause of the slow read > seen ? > > maybe you?ll also overflow it if reading from both NSD-servers at > the > > same time? > > I have switched the readahead off and this produced a small (~10%) > increase of performances when reading from a NSD server, but no change > in the bad behaviour for the GPFS clients > > > > > > > > Plus.. it?s always nice to give a bit more pagepool to hhe > clients than > > the default.. I would prefer to start with 4 GB. > > we'll do also that and we'll let you know! > > > Could you show your mmlsconfig? Likely you should set maxMBpS to > indicate what kind of throughput a client can do (affects GPFS > readahead/writebehind).? Would typically also increase workerThreads on > your NSD servers. At this moment this is the output of mmlsconfig # mmlsconfig Configuration data for cluster GPFSEXP.portici.enea.it: ------------------------------------------------------- clusterName GPFSEXP.portici.enea.it clusterId 13274694257874519577 autoload no dmapiFileHandleSize 32 minReleaseLevel 5.0.4.0 ccrEnabled yes cipherList AUTHONLY verbsRdma enable verbsPorts qib0/1 [cresco-gpfq7,cresco-gpfq8] verbsPorts qib0/2 [common] pagepool 4G adminMode central File systems in cluster GPFSEXP.portici.enea.it: ------------------------------------------------ /dev/vsd_gexp2 /dev/vsd_gexp3 > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. > When you write one GPFS block, less than a half RAID stripe is written, > which means you ?need to read back some data to calculate new parities. > I would prefer 4 MB block size, and maybe also change to 8+p+q so that > one GPFS is a multiple of a full 2 MB stripe. > > > ? ?-jf we have now added another file system based on 2 NSD on RAID6 8+p+q, keeping the 1MB block size just not to change too many things at the same time, but no substantial change in very low readout performances, that are still of the order of 50 MB/s while write performance are 1000MB/s Any other suggestion is welcomed! 
Giovanni -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From luis.bolinches at fi.ibm.com Thu Jun 11 09:01:46 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 11 Jun 2020 08:01:46 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it>, <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it><4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Jun 11 09:45:50 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Jun 2020 09:45:50 +0100 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: On 11/06/2020 08:53, Giovanni Bracco wrote: [SNIP] > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > > NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at > the moment 3 different generations of DDN storages are connected, > 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less > expensive storage, to be used when performance is not the first priority. > Ring up Lenovo and get a pricing on some DSS-G storage :-) They can be configured with OPA and Infiniband (though I am not sure if both at the same time) and are only slightly more expensive than the traditional DIY Lego brick approach. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From janfrode at tanso.net Thu Jun 11 11:13:36 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 11 Jun 2020 12:13:36 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco wrote: > > > > > You could potentially still do SRP from QDR nodes, and via NSD for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > Maybe there's something I don't understand, but couldn't you use the NSD-servers to serve to your OPA nodes, and then SRP directly for your 300 QDR-nodes?? 
> At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it: > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it: > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > So, trivial close to default config.. assume the same for the client cluster. I would correct MaxMBpS -- put it at something reasonable, enable verbsRdmaSend=yes and ignorePrefetchLUNCount=yes. > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size. > > When you write one GPFS block, less than a half RAID stripe is written, > > which means you need to read back some data to calculate new parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > -jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are 1000MB/s > > Any other suggestion is welcomed! > > Maybe rule out the storage, and check if you get proper throughput from nsdperf? Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your full settings -- so that we see that the benchmark is sane :-) -jf -------------- next part -------------- An HTML attachment was scrubbed... URL: From giovanni.bracco at enea.it Thu Jun 11 15:06:45 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Thu, 11 Jun 2020 16:06:45 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> Message-ID: <050593b2-8256-1f84-1a3a-978583103211@enea.it> 256K Giovanni On 11/06/20 10:01, Luis Bolinches wrote: > On that RAID 6 what is the logical RAID block size? 128K, 256K, other? > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations > / Salutacions > Luis Bolinches > Consultant IT Specialist > IBM Spectrum Scale development > ESS & client adoption teams > Mobile Phone: +358503112585 > *https://www.youracclaim.com/user/luis-bolinches* > Ab IBM Finland Oy > Laajalahdentie 23 > 00330 Helsinki > Uusimaa - Finland > > *"If you always give you will always have" -- ?Anonymous* > > ----- Original message ----- > From: Giovanni Bracco > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: Jan-Frode Myklebust , gpfsug main discussion > list > Cc: Agostino Funel > Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance > in simple spectrum scale/gpfs cluster with a storage-server SAN > Date: Thu, Jun 11, 2020 10:53 > Comments and updates in the text: > > On 05/06/20 19:02, Jan-Frode Myklebust wrote: > > fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco > > >: > > > > ? ? answer in the text > > > > ? ? On 05/06/20 14:58, Jan-Frode Myklebust wrote: > > ? ? ?> > > ? ? 
?> Could maybe be interesting to drop the NSD servers, and > let all > > ? ? nodes > > ? ? ?> access the storage via srp ? > > > > ? ? no we can not: the production clusters fabric is a mix of a > QDR based > > ? ? cluster and a OPA based cluster and NSD nodes provide the > service to > > ? ? both. > > > > > > You could potentially still do SRP from QDR nodes, and via NSD > for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share > the same data lake in Spectrum Scale/GPFS so the NSD servers support the > flexibility of the setup. > > NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at > the moment 3 different generations of DDN storages are connected, > 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less > expensive storage, to be used when performance is not the first > priority. > > > > > > > > > ? ? ?> > > ? ? ?> Maybe turn off readahead, since it can cause performance > degradation > > ? ? ?> when GPFS reads 1 MB blocks scattered on the NSDs, so that > > ? ? read-ahead > > ? ? ?> always reads too much. This might be the cause of the slow > read > > ? ? seen ? > > ? ? ?> maybe you?ll also overflow it if reading from both > NSD-servers at > > ? ? the > > ? ? ?> same time? > > > > ? ? I have switched the readahead off and this produced a small > (~10%) > > ? ? increase of performances when reading from a NSD server, but > no change > > ? ? in the bad behaviour for the GPFS clients > > > > > > ? ? ?> > > ? ? ?> > > ? ? ?> Plus.. it?s always nice to give a bit more pagepool to hhe > > ? ? clients than > > ? ? ?> the default.. I would prefer to start with 4 GB. > > > > ? ? we'll do also that and we'll let you know! > > > > > > Could you show your mmlsconfig? Likely you should set maxMBpS to > > indicate what kind of throughput a client can do (affects GPFS > > readahead/writebehind).? Would typically also increase > workerThreads on > > your NSD servers. > > At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it: > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it: > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip > size. > > When you write one GPFS block, less than a half RAID stripe is > written, > > which means you ?need to read back some data to calculate new > parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so > that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > ?? ?-jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are > 1000MB/s > > Any other suggestion is welcomed! 
> > Giovanni > > > > -- > Giovanni Bracco > phone ?+39 351 8804788 > E-mail ?giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From luis.bolinches at fi.ibm.com Thu Jun 11 15:11:14 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 11 Jun 2020 14:11:14 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <050593b2-8256-1f84-1a3a-978583103211@enea.it> Message-ID: 8 data * 256K does not align to your 1MB Raid 6 is already not the best option for writes. I would look into use multiples of 2MB block sizes. -- Cheers > On 11. Jun 2020, at 17.07, Giovanni Bracco wrote: > > ?256K > > Giovanni > >> On 11/06/20 10:01, Luis Bolinches wrote: >> On that RAID 6 what is the logical RAID block size? 128K, 256K, other? >> -- >> Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations >> / Salutacions >> Luis Bolinches >> Consultant IT Specialist >> IBM Spectrum Scale development >> ESS & client adoption teams >> Mobile Phone: +358503112585 >> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= >> Ab IBM Finland Oy >> Laajalahdentie 23 >> 00330 Helsinki >> Uusimaa - Finland >> >> *"If you always give you will always have" -- Anonymous* >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: Jan-Frode Myklebust , gpfsug main discussion >> list >> Cc: Agostino Funel >> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance >> in simple spectrum scale/gpfs cluster with a storage-server SAN >> Date: Thu, Jun 11, 2020 10:53 >> Comments and updates in the text: >> >>> On 05/06/20 19:02, Jan-Frode Myklebust wrote: >>> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco >>> >: >>> >>> answer in the text >>> >>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote: >>> > >>> > Could maybe be interesting to drop the NSD servers, and >> let all >>> nodes >>> > access the storage via srp ? >>> >>> no we can not: the production clusters fabric is a mix of a >> QDR based >>> cluster and a OPA based cluster and NSD nodes provide the >> service to >>> both. >>> >>> >>> You could potentially still do SRP from QDR nodes, and via NSD >> for your >>> omnipath nodes. Going via NSD seems like a bit pointless indirection. >> >> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share >> the same data lake in Spectrum Scale/GPFS so the NSD servers support the >> flexibility of the setup. >> >> NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at >> the moment 3 different generations of DDN storages are connected, >> 9900/QDR 7700/FDR and 7990/EDR. 
The idea was to be able to add some less >> expensive storage, to be used when performance is not the first >> priority. >> >>> >>> >>> >>> > >>> > Maybe turn off readahead, since it can cause performance >> degradation >>> > when GPFS reads 1 MB blocks scattered on the NSDs, so that >>> read-ahead >>> > always reads too much. This might be the cause of the slow >> read >>> seen ? >>> > maybe you?ll also overflow it if reading from both >> NSD-servers at >>> the >>> > same time? >>> >>> I have switched the readahead off and this produced a small >> (~10%) >>> increase of performances when reading from a NSD server, but >> no change >>> in the bad behaviour for the GPFS clients >>> >>> >>> > >>> > >>> > Plus.. it?s always nice to give a bit more pagepool to hhe >>> clients than >>> > the default.. I would prefer to start with 4 GB. >>> >>> we'll do also that and we'll let you know! >>> >>> >>> Could you show your mmlsconfig? Likely you should set maxMBpS to >>> indicate what kind of throughput a client can do (affects GPFS >>> readahead/writebehind). Would typically also increase >> workerThreads on >>> your NSD servers. >> >> At this moment this is the output of mmlsconfig >> >> # mmlsconfig >> Configuration data for cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------------- >> clusterName GPFSEXP.portici.enea.it >> clusterId 13274694257874519577 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 5.0.4.0 >> ccrEnabled yes >> cipherList AUTHONLY >> verbsRdma enable >> verbsPorts qib0/1 >> [cresco-gpfq7,cresco-gpfq8] >> verbsPorts qib0/2 >> [common] >> pagepool 4G >> adminMode central >> >> File systems in cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------ >> /dev/vsd_gexp2 >> /dev/vsd_gexp3 >> >> >>> >>> >>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip >> size. >>> When you write one GPFS block, less than a half RAID stripe is >> written, >>> which means you need to read back some data to calculate new >> parities. >>> I would prefer 4 MB block size, and maybe also change to 8+p+q so >> that >>> one GPFS is a multiple of a full 2 MB stripe. >>> >>> >>> -jf >> >> we have now added another file system based on 2 NSD on RAID6 8+p+q, >> keeping the 1MB block size just not to change too many things at the >> same time, but no substantial change in very low readout performances, >> that are still of the order of 50 MB/s while write performance are >> 1000MB/s >> >> Any other suggestion is welcomed! >> >> Giovanni >> >> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=q-8zfr3t0TGWOicysbq0ezzL2xpk3dzDg2m1plcsWm0&e= >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=CZv204_tsb3M3xIwxRyIyvTjptoQL-gD-VhzUkMRyrc&e= >> >> >> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=CZv204_tsb3M3xIwxRyIyvTjptoQL-gD-VhzUkMRyrc&e= >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=q-8zfr3t0TGWOicysbq0ezzL2xpk3dzDg2m1plcsWm0&e= > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Jun 11 16:10:06 2020 From: david_johnson at brown.edu (David Johnson) Date: Thu, 11 Jun 2020 11:10:06 -0400 Subject: [gpfsug-discuss] mmremotecluster access from SS 5.0.x to 4.2.3-x refuses id_rsa.pub Message-ID: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> I'm trying to access an old GPFS filesystem from a new cluster. It is good up to the point of adding the SSL keys of the old cluster on the new one. I get from mmremotecluster add command: File ...._id_rsa.pub does not contain a nist sp 800-131a compliance key Is there any way to override this? The old cluster will go away before the end of the summer. From jamervi at sandia.gov Thu Jun 11 16:13:32 2020 From: jamervi at sandia.gov (Mervini, Joseph A) Date: Thu, 11 Jun 2020 15:13:32 +0000 Subject: [gpfsug-discuss] [EXTERNAL] mmremotecluster access from SS 5.0.x to 4.2.3-x refuses id_rsa.pub In-Reply-To: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> References: <37B478B3-46A8-4A1F-87F1-DC949BCE84DA@brown.edu> Message-ID: mmchconfig nistCompliance=off on the newer system should work. ==== Joe Mervini Sandia National Laboratories High Performance Computing 505.844.6770 jamervi at sandia.gov ?On 6/11/20, 9:10 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: I'm trying to access an old GPFS filesystem from a new cluster. It is good up to the point of adding the SSL keys of the old cluster on the new one. I get from mmremotecluster add command: File ...._id_rsa.pub does not contain a nist sp 800-131a compliance key Is there any way to override this? The old cluster will go away before the end of the summer. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From UWEFALKE at de.ibm.com Thu Jun 11 21:41:52 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 Jun 2020 22:41:52 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: <050593b2-8256-1f84-1a3a-978583103211@enea.it> Message-ID: While that point (block size should be an integer multiple of the RAID stripe width) is a good one, its violation would explain slow writes, but Giovanni talks of slow reads ... Mit freundlichen Gr??en / Kind regards Dr. 
Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Luis Bolinches" To: "Giovanni Bracco" Cc: gpfsug main discussion list , agostino.funel at enea.it Date: 11/06/2020 16:11 Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Sent by: gpfsug-discuss-bounces at spectrumscale.org 8 data * 256K does not align to your 1MB Raid 6 is already not the best option for writes. I would look into use multiples of 2MB block sizes. -- Cheers > On 11. Jun 2020, at 17.07, Giovanni Bracco wrote: > > 256K > > Giovanni > >> On 11/06/20 10:01, Luis Bolinches wrote: >> On that RAID 6 what is the logical RAID block size? 128K, 256K, other? >> -- >> Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations >> / Salutacions >> Luis Bolinches >> Consultant IT Specialist >> IBM Spectrum Scale development >> ESS & client adoption teams >> Mobile Phone: +358503112585 >> *https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youracclaim.com_user_luis-2Dbolinches-2A&d=DwIDaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=1mZ896psa5caYzBeaugTlc7TtRejJp3uvKYxas3S7Xc&m=_W83R8yjwX9boyrXDzvfuHOE2zMl1Ggo4JBio7nGUKk&s=0sBbPyJrNuU4BjRb4Cv2f8Z0ot7MiVpqshdkyAHqiuE&e= >> Ab IBM Finland Oy >> Laajalahdentie 23 >> 00330 Helsinki >> Uusimaa - Finland >> >> *"If you always give you will always have" -- Anonymous* >> >> ----- Original message ----- >> From: Giovanni Bracco >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: Jan-Frode Myklebust , gpfsug main discussion >> list >> Cc: Agostino Funel >> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance >> in simple spectrum scale/gpfs cluster with a storage-server SAN >> Date: Thu, Jun 11, 2020 10:53 >> Comments and updates in the text: >> >>> On 05/06/20 19:02, Jan-Frode Myklebust wrote: >>> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco >>> >: >>> >>> answer in the text >>> >>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote: >>> > >>> > Could maybe be interesting to drop the NSD servers, and >> let all >>> nodes >>> > access the storage via srp ? >>> >>> no we can not: the production clusters fabric is a mix of a >> QDR based >>> cluster and a OPA based cluster and NSD nodes provide the >> service to >>> both. >>> >>> >>> You could potentially still do SRP from QDR nodes, and via NSD >> for your >>> omnipath nodes. Going via NSD seems like a bit pointless indirection. >> >> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes share >> the same data lake in Spectrum Scale/GPFS so the NSD servers support the >> flexibility of the setup. >> >> NSD servers make use of a IB SAN fabric (Mellanox FDR switch) where at >> the moment 3 different generations of DDN storages are connected, >> 9900/QDR 7700/FDR and 7990/EDR. The idea was to be able to add some less >> expensive storage, to be used when performance is not the first >> priority. >> >>> >>> >>> >>> > >>> > Maybe turn off readahead, since it can cause performance >> degradation >>> > when GPFS reads 1 MB blocks scattered on the NSDs, so that >>> read-ahead >>> > always reads too much. This might be the cause of the slow >> read >>> seen ? 
>>> > maybe you?ll also overflow it if reading from both >> NSD-servers at >>> the >>> > same time? >>> >>> I have switched the readahead off and this produced a small >> (~10%) >>> increase of performances when reading from a NSD server, but >> no change >>> in the bad behaviour for the GPFS clients >>> >>> >>> > >>> > >>> > Plus.. it?s always nice to give a bit more pagepool to hhe >>> clients than >>> > the default.. I would prefer to start with 4 GB. >>> >>> we'll do also that and we'll let you know! >>> >>> >>> Could you show your mmlsconfig? Likely you should set maxMBpS to >>> indicate what kind of throughput a client can do (affects GPFS >>> readahead/writebehind). Would typically also increase >> workerThreads on >>> your NSD servers. >> >> At this moment this is the output of mmlsconfig >> >> # mmlsconfig >> Configuration data for cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------------- >> clusterName GPFSEXP.portici.enea.it >> clusterId 13274694257874519577 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 5.0.4.0 >> ccrEnabled yes >> cipherList AUTHONLY >> verbsRdma enable >> verbsPorts qib0/1 >> [cresco-gpfq7,cresco-gpfq8] >> verbsPorts qib0/2 >> [common] >> pagepool 4G >> adminMode central >> >> File systems in cluster GPFSEXP.portici.enea.it: >> ------------------------------------------------ >> /dev/vsd_gexp2 >> /dev/vsd_gexp3 >> >> >>> >>> >>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip >> size. >>> When you write one GPFS block, less than a half RAID stripe is >> written, >>> which means you need to read back some data to calculate new >> parities. >>> I would prefer 4 MB block size, and maybe also change to 8+p+q so >> that >>> one GPFS is a multiple of a full 2 MB stripe. >>> >>> >>> -jf >> >> we have now added another file system based on 2 NSD on RAID6 8+p+q, >> keeping the 1MB block size just not to change too many things at the >> same time, but no substantial change in very low readout performances, >> that are still of the order of 50 MB/s while write performance are >> 1000MB/s >> >> Any other suggestion is welcomed! >> >> Giovanni >> >> >> >> -- >> Giovanni Bracco >> phone +39 351 8804788 >> E-mail giovanni.bracco at enea.it >> WWW http://www.afs.enea.it/bracco >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW http://www.afs.enea.it/bracco > Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=CPBLf7s53vCFL0esHIl8ZkeC7BiuNZUHD6JVWkcy48c&s=wfe9UKg6bKylrLyuepv2J4jNN4BEfLQK6A46yX9IB-Q&e= From UWEFALKE at de.ibm.com Thu Jun 11 21:41:52 2020 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 Jun 2020 22:41:52 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: Hi Giovanni, how do the waiters look on your clients when reading? Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist Global Technology Services / Project Services Delivery / High Performance Computing +49 175 575 2877 Mobile Rathausstr. 7, 09111 Chemnitz, Germany uwefalke at de.ibm.com IBM Services IBM Data Privacy Statement IBM Deutschland Business & Technology Services GmbH Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss Sitz der Gesellschaft: Ehningen Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Giovanni Bracco To: gpfsug-discuss at spectrumscale.org Cc: Agostino Funel Date: 05/06/2020 14:22 Subject: [EXTERNAL] [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN Sent by: gpfsug-discuss-bounces at spectrumscale.org In our lab we have received two storage-servers, Super micro SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID controller (2 GB cache) and before putting them in production for other purposes we have setup a small GPFS test cluster to verify if they can be used as storage (our gpfs production cluster has the licenses based on the NSD sockets, so it would be interesting to expand the storage size just by adding storage-servers in a infiniband based SAN, without changing the number of NSD servers) The test cluster consists of: 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. 2) a Mellanox FDR switch used as a SAN switch 3) a Truescale QDR switch as GPFS cluster switch 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. All the nodes run CentOS 7.7. On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been configured and it is exported via infiniband as an iSCSI target so that both appear as devices accessed by the srp_daemon on the NSD servers, where multipath (not really necessary in this case) has been configured for these two LIO-ORG devices. GPFS version 5.0.4-0 has been installed and the RDMA has been properly configured Two NSD disk have been created and a GPFS file system has been configured. Very simple tests have been performed using lmdd serial write/read. 
1) storage-server local performance: before configuring the RAID6 volume as NSD disk, a local xfs file system was created and lmdd write/read performance for 100 GB file was verified to be about 1 GB/s 2) once the GPFS cluster has been created write/read test have been performed directly from one of the NSD server at a time: write performance 2 GB/s, read performance 1 GB/s for 100 GB file By checking with iostat, it was observed that the I/O in this case involved only the NSD server where the test was performed, so when writing, the double of base performances was obtained, while in reading the same performance as on a local file system, this seems correct. Values are stable when the test is repeated. 3) when the same test is performed from the GPFS clients the lmdd result for a 100 GB file are: write - 900 MB/s and stable, not too bad but half of what is seen from the NSD servers. read - 30 MB/s to 300 MB/s: very low and unstable values No tuning of any kind in all the configuration of the involved system, only default values. Any suggestion to explain the very bad read performance from a GPFS client? Giovanni here are the configuration of the virtual drive on the storage-server and the file system configuration in GPFS Virtual drive ============== Virtual Drive: 2 (Target Id: 2) Name : RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 Size : 81.856 TB Sector Size : 512 Is VD emulated : Yes Parity Size : 18.190 TB State : Optimal Strip Size : 256 KB Number Of Drives : 11 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disabled GPFS file system from mmlsfs ============================ mmlsfs vsd_gexp2 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j cluster Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 1048576 Block size -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 22.00 (5.0.4.0) File system version --create-time Fri Apr 3 19:26:27 2020 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 134217728 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d nsdfs4lun2;nsdfs5lun2 Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gexp2 Default mount point --mount-priority 0 Mount priority -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW https://urldefense.proofpoint.com/v2/url?u=http-3A__www.afs.enea.it_bracco&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=TbQFSz77fWm4Q3StvVLSfZ2GTQPDdwkd6S2eY5OvOME&s=CcbPtQrTI4xzh5gK0P-ol8uQcAc8yQKi5LjHZZaJBD4&e= ================================================== Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate e la casella di posta elettronica da cui e' stata inviata e' da qualificarsi quale strumento aziendale. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione. Grazie. This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only. Dissemination, copying, printing or use by anybody else is unauthorised (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE 2016/679). If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail. Thanks. ================================================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=fTuVGtgq6A14KiNeaGfNZzOOgtHW5Lm4crZU6lJxtB8&m=TbQFSz77fWm4Q3StvVLSfZ2GTQPDdwkd6S2eY5OvOME&s=XPiIgZtPIPdc6gXrjff_D1jtNnLkXF9i2m_gLeB0DYU&e= From luis.bolinches at fi.ibm.com Fri Jun 12 05:19:31 2020 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Fri, 12 Jun 2020 04:19:31 +0000 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: , <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Fri Jun 12 11:51:52 2020 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Fri, 12 Jun 2020 11:51:52 +0100 Subject: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files In-Reply-To: <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> References: <90fed97c-0178-a0f3-ae13-810518f1da2d@strath.ac.uk> <7775CB36-AEB5-4AFB-B8E3-64B608AAAC46@illinois.edu> Message-ID: <7ae90490-e505-9823-1696-96d8b83b48b4@qsplace.co.uk> I seem to remember Marc Kaplan discussing using the ILM and mmfind for this. There is a presentation from 2018 which skims on an example http://files.gpfsug.org/presentations/2018/USA/SpectrumScalePolicyBP.pdf -- Lauz On 10/06/2020 23:40, Kerner, Chad A wrote: > You can do a policy scan though and get a list of files that have ACLs applied to them. Then you would not have to check every file with a shell utility or C, just process that list. Likewise, you can get the uid/gid as well and process that list with the new mapping(split it into multiple lists, processing multiple threads on multiple machines). 
> > While it is by no means the prettiest or possibly best way to handle the POSIX ACLs, I had whipped up a python api for it: https://github.com/ckerner/ssacl . It only does POSIX though. We use it in conjunction with acls (https://github.com/ckerner/acls), an ls replacement that shows effective user/group permissions based off of the acl's because most often the user would just look at the POSIX perms and say something is broken, without checking the acl. > > -- > Chad Kerner, Senior Storage Engineer > Storage Enabling Technologies > National Center for Supercomputing Applications > University of Illinois, Urbana-Champaign > > ?On 6/10/20, 5:30 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: > > On 10/06/2020 16:31, Lohit Valleru wrote: > > [SNIP] > > > I might mostly start small with a single lab, and only change files > > without ACLs. May I know if anyone has a method/tool to find out > which > files/dirs have NFS4 ACLs set? As far as we know - it is just one > > fileset/lab, but it would be good to confirm if we have them set > > across any other files/dirs in the filesystem. The usual methods do > > not seem to work. > > Use mmgetacl a file at a time and try and do something with the output? > > Tools to manipulate ACL's from on GPFS mounted nodes suck donkey balls, > and have been that way for over a decade. Last time I raised this with > IBM I was told that was by design... > > If they are CES then look at it client side from a Windows node? > > The alternative is to write something in C that calls gpfs_getacl. > > However it was an evening to get a basic UID remap code working in C. It > would not take much more effort to make it handle ACL's. As such I would > work on the premise that there are ACL's and handle it. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.knister at gmail.com Fri Jun 12 14:25:15 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 12 Jun 2020 09:25:15 -0400 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN In-Reply-To: References: Message-ID: <01AF73E5-4722-4502-B7CC-60E7A62FEE65@gmail.com> I would double check your cpu frequency scaling settings in your NSD servers (cpupower frequency-info) and look at the governor. You?ll want it to be the performance governor. If it?s not what can happen is the CPUs scale back their clock rate which hurts RDMA performance. Running the I/o test on the NSD servers themselves may have been enough to kick the processors up into a higher frequency which afforded you good performance. Sent from my iPhone > On Jun 12, 2020, at 00:19, Luis Bolinches wrote: > > ? > Hi > > the block for writes increases the IOPS on those cards that might be already at the limit so I would not discard taht lowering the IOPS for writes has a positive effect on reads or not but it is a smoking gun that needs to be addressed. My experience of ignoring those is not a positive one. > > In regards of this HW I woudl love to see a baseline at RAW. 
run FIO (or any other tool that is not DD) on RAW device (not scale) to see what actually each drive can do AND then all the drives at the same time. We seen RAID controllers got to its needs even on reads when parallel access to many drives are put into the RAID controller. That is why we had to create a tool to get KPIs for ECE but can be applied here as way to see what the system can do. I would build numbers for RAW before I start looking into any filesystem numbers. > > you can use whatever tool you like but this one if just a FIO frontend that will do what I mention above https://github.com/IBM/SpectrumScale_ECE_STORAGE_READINESS. If you can I would also do the write part, as reads is part of the story, and you need to understand what the HW can do (+1 to Lego comment before) > -- > Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions > Luis Bolinches > Consultant IT Specialist > IBM Spectrum Scale development > ESS & client adoption teams > Mobile Phone: +358503112585 > > https://www.youracclaim.com/user/luis-bolinches > > Ab IBM Finland Oy > Laajalahdentie 23 > 00330 Helsinki > Uusimaa - Finland > > "If you always give you will always have" -- Anonymous > > > > ----- Original message ----- > From: "Uwe Falke" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: gpfsug-discuss-bounces at spectrumscale.org, Agostino Funel > Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN > Date: Thu, Jun 11, 2020 23:42 > > Hi Giovanni, how do the waiters look on your clients when reading? > > > Mit freundlichen Gr??en / Kind regards > > Dr. Uwe Falke > IT Specialist > Global Technology Services / Project Services Delivery / High Performance > Computing > +49 175 575 2877 Mobile > Rathausstr. 7, 09111 Chemnitz, Germany > uwefalke at de.ibm.com > > IBM Services > > IBM Data Privacy Statement > > IBM Deutschland Business & Technology Services GmbH > Gesch?ftsf?hrung: Dr. Thomas Wolter, Sven Schooss > Sitz der Gesellschaft: Ehningen > Registergericht: Amtsgericht Stuttgart, HRB 17122 > > > > From: Giovanni Bracco > To: gpfsug-discuss at spectrumscale.org > Cc: Agostino Funel > Date: 05/06/2020 14:22 > Subject: [EXTERNAL] [gpfsug-discuss] very low read performance in > simple spectrum scale/gpfs cluster with a storage-server SAN > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > In our lab we have received two storage-servers, Super micro > SSG-6049P-E1CR24L, 24 HD each (9TB SAS3), with Avago 3108 RAID > controller (2 GB cache) and before putting them in production for other > purposes we have setup a small GPFS test cluster to verify if they can > be used as storage (our gpfs production cluster has the licenses based > on the NSD sockets, so it would be interesting to expand the storage > size just by adding storage-servers in a infiniband based SAN, without > changing the number of NSD servers) > > The test cluster consists of: > > 1) two NSD servers (IBM x3550M2) with a dual port IB QDR Trues scale each. > 2) a Mellanox FDR switch used as a SAN switch > 3) a Truescale QDR switch as GPFS cluster switch > 4) two GPFS clients (Supermicro AMD nodes) one port QDR each. > > All the nodes run CentOS 7.7. 
> > On each storage-server a RAID 6 volume of 11 disk, 80 TB, has been > configured and it is exported via infiniband as an iSCSI target so that > both appear as devices accessed by the srp_daemon on the NSD servers, > where multipath (not really necessary in this case) has been configured > for these two LIO-ORG devices. > > GPFS version 5.0.4-0 has been installed and the RDMA has been properly > configured > > Two NSD disk have been created and a GPFS file system has been configured. > > Very simple tests have been performed using lmdd serial write/read. > > 1) storage-server local performance: before configuring the RAID6 volume > as NSD disk, a local xfs file system was created and lmdd write/read > performance for 100 GB file was verified to be about 1 GB/s > > 2) once the GPFS cluster has been created write/read test have been > performed directly from one of the NSD server at a time: > > write performance 2 GB/s, read performance 1 GB/s for 100 GB file > > By checking with iostat, it was observed that the I/O in this case > involved only the NSD server where the test was performed, so when > writing, the double of base performances was obtained, while in reading > the same performance as on a local file system, this seems correct. > Values are stable when the test is repeated. > > 3) when the same test is performed from the GPFS clients the lmdd result > for a 100 GB file are: > > write - 900 MB/s and stable, not too bad but half of what is seen from > the NSD servers. > > read - 30 MB/s to 300 MB/s: very low and unstable values > > No tuning of any kind in all the configuration of the involved system, > only default values. > > Any suggestion to explain the very bad read performance from a GPFS > client? > > Giovanni > > here are the configuration of the virtual drive on the storage-server > and the file system configuration in GPFS > > > Virtual drive > ============== > > Virtual Drive: 2 (Target Id: 2) > Name : > RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3 > Size : 81.856 TB > Sector Size : 512 > Is VD emulated : Yes > Parity Size : 18.190 TB > State : Optimal > Strip Size : 256 KB > Number Of Drives : 11 > Span Depth : 1 > Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if > Bad BBU > Default Access Policy: Read/Write > Current Access Policy: Read/Write > Disk Cache Policy : Disabled > > > GPFS file system from mmlsfs > ============================ > > mmlsfs vsd_gexp2 > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment > (subblock) size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j cluster Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 1048576 Block size > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota > enforcement > --filesetdf No Fileset df enabled? > -V 22.00 (5.0.4.0) File system version > --create-time Fri Apr 3 19:26:27 2020 File system creation time > -z No Is DMAPI enabled? 
> -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 134217728 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per > full block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d nsdfs4lun2;nsdfs5lun2 Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gexp2 Default mount point > --mount-priority 0 Mount priority > > > -- > Giovanni Bracco > phone +39 351 8804788 > E-mail giovanni.bracco at enea.it > WWW > http://www.afs.enea.it/bracco > > > > ================================================== > > Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle > persone indicate e la casella di posta elettronica da cui e' stata inviata > e' da qualificarsi quale strumento aziendale. > La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza > di queste informazioni sono rigorosamente vietate (art. 616 c.p, D.Lgs. n. > 196/2003 s.m.i. e GDPR Regolamento - UE 2016/679). > Qualora abbiate ricevuto questo documento per errore siete cortesemente > pregati di darne immediata comunicazione al mittente e di provvedere alla > sua distruzione. Grazie. > > This e-mail and any attachments is confidential and may contain privileged > information intended for the addressee(s) only. > Dissemination, copying, printing or use by anybody else is unauthorised > (art. 616 c.p, D.Lgs. n. 196/2003 and subsequent amendments and GDPR UE > 2016/679). > If you are not the intended recipient, please delete this message and any > attachments and advise the sender by return e-mail. Thanks. > > ================================================== > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From giovanni.bracco at enea.it Tue Jun 16 14:32:53 2020 From: giovanni.bracco at enea.it (Giovanni Bracco) Date: Tue, 16 Jun 2020 15:32:53 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN: effect of ignorePrefetchLUNCount In-Reply-To: References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> Message-ID: <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it> On 11/06/20 12:13, Jan-Frode Myklebust wrote: > On Thu, Jun 11, 2020 at 9:53 AM Giovanni Bracco > wrote: > > > > > > You could potentially still do SRP from QDR nodes, and via NSD > for your > > omnipath nodes. Going via NSD seems like a bit pointless indirection. > > not really: both clusters, the 400 OPA nodes and the 300 QDR nodes > share > the same data lake in Spectrum Scale/GPFS so the NSD servers support > the > flexibility of the setup. > > > Maybe there's something I don't understand, but couldn't you use the > NSD-servers to serve to your > OPA nodes, and then SRP directly for your 300 QDR-nodes?? not in an easy way without loosing the flexibility of the system where NSD are the hubs between the three different fabrics, QDR compute, OPA compute, Mellanox FDR SAN. The storages have QDR,FDR and EDR interfaces and Mellanox guarantees the compatibility QDR-FDR and FDR-EDR but not, as far as I know, QDR-EDR So in this configuration, all the compute nodes can access to all the storages. > > > At this moment this is the output of mmlsconfig > > # mmlsconfig > Configuration data for cluster GPFSEXP.portici.enea.it > : > ------------------------------------------------------- > clusterName GPFSEXP.portici.enea.it > clusterId 13274694257874519577 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 5.0.4.0 > ccrEnabled yes > cipherList AUTHONLY > verbsRdma enable > verbsPorts qib0/1 > [cresco-gpfq7,cresco-gpfq8] > verbsPorts qib0/2 > [common] > pagepool 4G > adminMode central > > File systems in cluster GPFSEXP.portici.enea.it > : > ------------------------------------------------ > /dev/vsd_gexp2 > /dev/vsd_gexp3 > > > > So, trivial close to default config.. assume the same for the client > cluster. > > I would correct MaxMBpS -- put it at something reasonable, enable > verbsRdmaSend=yes and > ignorePrefetchLUNCount=yes. Now we have set: verbsRdmaSend yes ignorePrefetchLUNCount yes maxMBpS 8000 but the only parameter which has a strong effect by itself is ignorePrefetchLUNCount yes and the readout performance increased of a factor at least 4, from 50MB/s to 210 MB/s So from the client now the situation is: Sequential write 800 MB/s, sequential read 200 MB/s, much better then before but still a factor 3, both Write/Read compared what is observed from the NSD node: Sequential write 2300 MB/s, sequential read 600 MB/s As far as the test is concerned I have seen that the lmdd results are very similar to fio --name=seqwrite --rw=write --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60 fio --name=seqread --rw=wread --buffered=1 --ioengine=posixaio --bs=1m --numjobs=1 --size=100G --runtime=60 In the present situation the settings of read-ahead on the RAID controllers has practically non effect, we have also checked that by the way. Giovanni > > > > > > > > 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip > size. 
> > When you write one GPFS block, less than a half RAID stripe is > written, > > which means you need to read back some data to calculate new > parities. > > I would prefer 4 MB block size, and maybe also change to 8+p+q so > that > > one GPFS is a multiple of a full 2 MB stripe. > > > > > > -jf > > we have now added another file system based on 2 NSD on RAID6 8+p+q, > keeping the 1MB block size just not to change too many things at the > same time, but no substantial change in very low readout performances, > that are still of the order of 50 MB/s while write performance are > 1000MB/s > > Any other suggestion is welcomed! > > > > Maybe rule out the storage, and check if you get proper throughput from > nsdperf? > > Maybe also benchmark using "gpfsperf" instead of "lmdd", and show your > full settings -- so that > we see that the benchmark is sane :-) > > > > -jf -- Giovanni Bracco phone +39 351 8804788 E-mail giovanni.bracco at enea.it WWW http://www.afs.enea.it/bracco From janfrode at tanso.net Tue Jun 16 18:54:41 2020 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 16 Jun 2020 19:54:41 +0200 Subject: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN: effect of ignorePrefetchLUNCount In-Reply-To: <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it> References: <8c59eb9c-0eef-ccdf-7e59-5b8ea6bb9bf4@enea.it> <4c221d1e-8531-3ee9-083c-8aa5ec62fd62@enea.it> <81131f13-eef2-2a29-28a8-2de22f905a54@enea.it> <8be03df0-ed57-ece6-f6bf-b89463378a38@enea.it> Message-ID: tir. 16. jun. 2020 kl. 15:32 skrev Giovanni Bracco : > > > I would correct MaxMBpS -- put it at something reasonable, enable > > verbsRdmaSend=yes and > > ignorePrefetchLUNCount=yes. > > Now we have set: > verbsRdmaSend yes > ignorePrefetchLUNCount yes > maxMBpS 8000 > > but the only parameter which has a strong effect by itself is > > ignorePrefetchLUNCount yes > > and the readout performance increased of a factor at least 4, from > 50MB/s to 210 MB/s That's interesting.. ignorePrefetchLUNCount=yes should mean it more aggressively schedules IO. Did you also try lowering maxMBpS? I'm thinking maybe something is getting flooded somewhere.. Another knob would be to increase workerThreads, and/or prefetchPct (don't quite remember how these influence each other). And it would be useful to run nsdperf between client and nsd-servers, to verify/rule out any network issue. > fio --name=seqwrite --rw=write --buffered=1 --ioengine=posixaio --bs=1m > --numjobs=1 --size=100G --runtime=60 > > fio --name=seqread --rw=read --buffered=1 --ioengine=posixaio --bs=1m > --numjobs=1 --size=100G --runtime=60 > > Not too familiar with fio, but ... does it help to increase numjobs? And.. do you tell both sides which fabric number they're on ('verbsPorts qib0/1/1') so the GPFS knows not to try to connect verbsPorts that can't communicate? -jf From jonathan.buzzard at strath.ac.uk Wed Jun 17 10:58:59 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 17 Jun 2020 10:58:59 +0100 Subject: [gpfsug-discuss] Mass UID/GID change program (uidremap) Message-ID: <242341d1-7557-1c6c-d0a4-b9af1124a775@strath.ac.uk> My university has been giving me Fridays off during lockdown so I have spent a bit of time and added modification of Posix ACL's through the standard library and tidied up the code a bit. Much of it is based on preexisting code which did speed things up.
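For anyone who just wants the shape of it before fetching the tarball, the core is nothing more exotic than an nftw() walk along the lines of the sketch below -- an illustration rather than the actual code, with map_uid()/map_gid() standing in for the mapping file lookup:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

extern uid_t map_uid(uid_t old);   /* hypothetical mapping-file lookups */
extern gid_t map_gid(gid_t old);

static int remap_one(const char *path, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
{
        uid_t new_uid = map_uid(sb->st_uid);
        gid_t new_gid = map_gid(sb->st_gid);

        (void)ftwbuf;

        /* lchown() so a symbolic link itself is changed, never its target */
        if ((new_uid != sb->st_uid || new_gid != sb->st_gid) &&
            lchown(path, new_uid, new_gid) != 0)
                perror(path);

        /* ACL's only make sense on real files and directories, so symbolic
           links (FTW_SL) are simply skipped for that part */
        if (typeflag == FTW_F || typeflag == FTW_D) {
                /* remap any Posix ACL's here with acl_get_file()/acl_set_file();
                   directories get ACL_TYPE_DEFAULT as well */
        }
        return 0;       /* keep walking */
}

int main(int argc, char *argv[])
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s directory\n", argv[0]);
                return 1;
        }
        /* FTW_PHYS: do not follow symbolic links; FTW_MOUNT: stay on one fs */
        return nftw(argv[1], remap_one, 64, FTW_PHYS | FTW_MOUNT) ? 1 : 0;
}

The code in the tarball does rather more than that of course, but it gives you the idea.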
The error checking is still rather basic, and there had better be no errors in the mapping file. I have some ideas on how to extend it to do ACL's either through libacl or the GPFS API at compile time that I will probably look at on Friday. There is however the issue of incomplete documentation on the gpfs_acl_t structure. It will also be a lot slower if only a subset of the files have an ACL because you are going to need to attempt to get the ACL on every file. It uses the standard C library call nftw so can be pointed at a directory rather than a whole file system, which in the absence of a test GPFS file system that I could wreck would make testing difficult. Besides which the GPFS inode scan is lacking in features to make it suitable for this sort of application IMHO. There is a test directory in the tarball that is mostly full of a version of the Linux source code where all the files have been truncated to zero bytes. There is also some files for symlink, and access/default ACL testing. Targets exist in the Makefile to setup the testing directory correctly and run the test. There is no automated testing that it works correctly however. I have also included a script to generate a mapping file for all the users on the local system to AD ones based on the idmap_rid algorithm. Though in retrospect calling mmrepquota to find all the users and groups on the file system might have been a better idea. It's all under GPL v3 and can be downloaded at http://www.buzzard.me.uk/jonathan/downloads/uidremap-0.2.tar.gz Yeah I should probably use GitHub, but I am old school. Anyway take a look, the C code is very readable and if you take out the comments at the top only 300 lines. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Fri Jun 19 13:29:17 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 19 Jun 2020 12:29:17 +0000 Subject: [gpfsug-discuss] mmapplypolicy oddity Message-ID: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> I have a policy scan that walks a fileset and creates a report. In some cases, the SHOW output doesn?t happen and I have no idea why. Here is a case in point. Both lines are the same sym-link, the ?.bad? one fails to output the information. Ideas on how to debug this? 
<1> /gpfs/fs1/some-path /liblinear.bad [2019-08-05 at 22:19:23 6233 100 50 system 2020-06-18 at 13:36:36 64 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) <5> /gpfs/fs1/some-path /liblinear [2020-06-18 at 13:39:40 6233 100 50 system 2020-06-18 at 13:39:40 0 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) SHOW( |6233|100|lrwxrwxrwx|50|0|1|1592487581 |1592487581 |1592487581 |L|) In that directory: lrwxrwxrwx 1 build users 50 Jun 18 09:39 liblinear -> ../../path1/UIMA/liblinear <- A new one I created that identical lrwxrwxrwx 1 build users 50 Aug 5 2019 liblinear.bad -> ../../path1/UIMA/liblinear <- the original one that fails The list rule looks like this: rule 'dumpall' list '"$fileset_name"' DIRECTORIES_PLUS SHOW( '|' || varchar(user_id) || '|' || varchar(group_id) || '|' || char(mode) || '|' || varchar(file_size) || '|' || varchar(kb_allocated) || '|' || varchar(nlink) || '|' || unixTS(access_time,19) || '|' || unixTS(modification_time) || '|' || unixTS(creation_time) || '|' || char(misc_attributes,1) || '|' ) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Fri Jun 19 14:22:20 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 19 Jun 2020 13:22:20 +0000 Subject: [gpfsug-discuss] Beta participants for 5.1.0 for NFS 4.1 Message-ID: Folks, We are looking for one or two users willing to be Beta participants specifically for NFS 4.1. In order to participate, your company has to be willing to sign NDAs and other legal documents - I know that?s always a challenge for some of us! And for complicated reasons, you need to be an end user company not a Business Partner. Sorry. If you are interested please contact Jodi Everdon - jeverdon at us.ibm.com *off-list*. If you happen to know who your IBM acct rep is and can provide that name to Jodi, that will jump-start the process. Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1636781822] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From ulmer at ulmer.org Sat Jun 20 14:16:27 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Sat, 20 Jun 2020 09:16:27 -0400 Subject: [gpfsug-discuss] mmapplypolicy oddity In-Reply-To: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> References: <657D6013-F5DE-4FFA-8A55-FD6339D741D6@nuance.com> Message-ID: <015CDB70-9E32-4B5D-A082-FC1F2C98C3F6@ulmer.org> Just to be clear, the .bad one failed before the other one existed? If you add a third one, do you still only get one set of output? Maybe the uniqueness of the target is important, and there is another symlink you don?t know about? -- Stephen > On Jun 19, 2020, at 8:08 AM, Oesterlin, Robert wrote: > > ? > I have a policy scan that walks a fileset and creates a report. In some cases, the SHOW output doesn?t happen and I have no idea why. Here is a case in point. Both lines are the same sym-link, the ?.bad? one fails to output the information. Ideas on how to debug this? 
> > <1> /gpfs/fs1/some-path /liblinear.bad [2019-08-05 at 22:19:23 6233 100 50 system 2020-06-18 at 13:36:36 64 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) > > <5> /gpfs/fs1/some-path /liblinear [2020-06-18 at 13:39:40 6233 100 50 system 2020-06-18 at 13:39:40 0 nlu] RULE 'dumpall' LIST 'nlu' DIRECTORIES_PLUS WEIGHT(inf) SHOW( |6233|100|lrwxrwxrwx|50|0|1|1592487581 |1592487581 |1592487581 |L|) > > In that directory: > > lrwxrwxrwx 1 build users 50 Jun 18 09:39 liblinear -> ../../path1/UIMA/liblinear <- A new one I created that identical > lrwxrwxrwx 1 build users 50 Aug 5 2019 liblinear.bad -> ../../path1/UIMA/liblinear <- the original one that fails > > The list rule looks like this: > > rule 'dumpall' list '"$fileset_name"' DIRECTORIES_PLUS > SHOW( '|' || > varchar(user_id) || '|' || > varchar(group_id) || '|' || > char(mode) || '|' || > varchar(file_size) || '|' || > varchar(kb_allocated) || '|' || > varchar(nlink) || '|' || > unixTS(access_time,19) || '|' || > unixTS(modification_time) || '|' || > unixTS(creation_time) || '|' || > char(misc_attributes,1) || '|' > ) > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Wed Jun 24 09:59:19 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Wed, 24 Jun 2020 08:59:19 +0000 Subject: [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? Message-ID: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> Hello, I?m looking for an example script /var/mmfs/etc/eventsCallback to add callbacks for system health events. I searched the installation and googled but didn?t found one. As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. I think I?ve seen something similar before, but can?t find it. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From PSAFRE at de.ibm.com Wed Jun 24 10:22:02 2020 From: PSAFRE at de.ibm.com (Pavel Safre) Date: Wed, 24 Jun 2020 11:22:02 +0200 Subject: [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? In-Reply-To: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> References: <5DC858CD-F075-429C-8021-112B7170EAD9@id.ethz.ch> Message-ID: Hello Heiner, you can find an example callback script, which writes an e-mail to the storage admin after a specific event occurs in the slide 21 of the presentation "Keep your Spectrum Scale cluster HEALTHY with MAPS": https://www.spectrumscaleug.org/wp-content/uploads/2020/04/SSSD20DE-Keep-your-Spectrum-Scale-cluster-HEALTHY-with-MAPS.pdf >> As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. The script startup + the check must be quick. The call in the case if the relevant event occurs, should be quick, but does not have to, if we assume, that this only happens rarely. 
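(For readers looking for a concrete starting point, a minimal sketch of the mediator idea described above. It assumes the event name arrives as the first argument to /var/mmfs/etc/eventsCallback and that per-event handlers live in a directory of your choosing; both assumptions should be checked against the callback documentation linked above before relying on this.)

#!/bin/bash
# Hypothetical /var/mmfs/etc/eventsCallback mediator (a sketch, not IBM-supplied):
# look for a handler named after the event and run it detached, so the health
# monitor is never blocked. That the event name is passed as $1 is an
# assumption - verify it against the documented argument list for your release.
EVENT="$1"
HANDLER_DIR="/var/mmfs/etc/events.d"   # assumed location for per-event handlers
HANDLER="${HANDLER_DIR}/${EVENT}"
if [ -x "$HANDLER" ]; then
    nohup "$HANDLER" "$@" >/dev/null 2>&1 &   # asynchronous, fire and forget
fi
exit 0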
Mit freundlichen Gr??en / Kind regards Pavel Safre Software Engineer IBM Systems Group, IBM Spectrum Scale Development Dept. M925 Phone: IBM Deutschland Research & Development GmbH Email: psafre at de.ibm.com Am Weiher 24 65451 Kelsterbach IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 24.06.2020 10:59 Subject: [EXTERNAL] [gpfsug-discuss] Example /var/mmfs/etc/eventsCallback script? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I?m looking for an example script /var/mmfs/etc/eventsCallback to add callbacks for system health events. I searched the installation and googled but didn?t found one. As there is just one script to handle all events the script probably should be a small mediator that just checks if an event-specific script exists and calls this asynchronously. I think I?ve seen something similar before, but can?t find it. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HpP3ZY4lzoY061XISjKwWNDf8lPpYfwOC8vIoe9GoQ4&m=Hc5qL6HxhxEhM4HUBV4RlUww_xybP1YDBLJE4kufPGg&s=nxhqIUDvyK1EZLPzZNuOkgTb5gZRbRojsoMq7m5vWbU&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Thu Jun 25 11:31:55 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 25 Jun 2020 10:31:55 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Message-ID: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Hi all, I would like to use CES for exporting Samba and NFS. However, when reading the documentation (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_setcessharedroot.htm), is recommended (but not enforced) the use of a dedicated filesystem (of at least 4GB). Is there any best practice or recommendation for configuring this filesystem? This is: inode / block sizes, number of expected files in the filesystem, ideal size for this filesystem (from my understanding, 4GB should be enough, but I am not sure if there are conditions that would require a bigger one). Thanks a lot and best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Jun 25 12:08:39 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 25 Jun 2020 11:08:39 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> References: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From marc.caubet at psi.ch Thu Jun 25 15:53:26 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Thu, 25 Jun 2020 14:53:26 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: References: <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch>, Message-ID: <67efac8e93fa4a7a88c82fdd50c240e9@psi.ch> Hi Fred, thanks a lot for the hints. Hence, I'll try with 3WayReplication as this is the only raid code supporting 256KB, and data and metadata in the same pool (I guess is not worth to split it here). Thanks a lot for your help, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederick Stock Sent: Thursday, June 25, 2020 1:08:39 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Generally these file systems are configured with a block size of 256KB. As for inodes I would not pre-allocate any and set the initial maximum size to value such as 5000 since it can be increased if necessary. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Caubet Serrabou Marc (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Dedicated filesystem for cesSharedRoot Date: Thu, Jun 25, 2020 6:47 AM Hi all, I would like to use CES for exporting Samba and NFS. However, when reading the documentation (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_setcessharedroot.htm), is recommended (but not enforced) the use of a dedicated filesystem (of at least 4GB). Is there any best practice or recommendation for configuring this filesystem? This is: inode / block sizes, number of expected files in the filesystem, ideal size for this filesystem (from my understanding, 4GB should be enough, but I am not sure if there are conditions that would require a bigger one). Thanks a lot and best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Fri Jun 26 08:43:53 2020 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 26 Jun 2020 07:43:53 +0000 Subject: [gpfsug-discuss] Dedicated filesystem for cesSharedRoot In-Reply-To: References: , <5848a1345ee74f22ba4f00dbb4b24edc@psi.ch> Message-ID: An HTML attachment was scrubbed... 
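(To put Fred's numbers into commands: a rough sketch, with made-up stanza file, mount point and node names, of creating a small dedicated cesSharedRoot file system on conventional NSDs. On ESS/GNR storage the equivalent would be a small 3WayReplication vdisk set, as Marc notes above.)

# ces.stanza is an assumed stanza file describing two small NSDs in different failure groups
mmcrnsd -F ces.stanza
# 256K block size and a small initial inode limit, with metadata and data replicated
mmcrfs cesroot -F ces.stanza -B 256K -m 2 -r 2 -T /gpfs/cesroot --inode-limit 5000
mmmount cesroot -a
# point CES at the new file system before enabling the protocol nodes (names are placeholders)
mmchconfig cesSharedRoot=/gpfs/cesroot
mmchnode --ces-enable -N prot01,prot02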
URL: From pebaptista at deloitte.pt Tue Jun 30 12:46:44 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 11:46:44 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories Message-ID: Hi all, I'm finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I've linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image001.png at 01D64EDC.48F92EF0] If I list the usercache folder, I see differences. Local: [cid:image002.png at 01D64EDC.48F92EF0] GPFS [cid:image003.png at 01D64EDC.48F92EF0] I see that GPFS is working ok in the node [cid:image004.png at 01D64EDC.48F92EF0] However, if I check the node health: [cid:image007.png at 01D64EDC.8102E2F0] I'm new to Spectrum Scale and I don't know what's csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image008.png at 01D64EDC.8102E2F0] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 15587 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 11114 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 17038 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 5902 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 23157 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 2030 bytes Desc: image008.png URL: From lgayne at us.ibm.com Tue Jun 30 12:56:59 2020 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 30 Jun 2020 11:56:59 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.image001.png at 01D64EDC.48F92EF0.png Type: image/png Size: 15587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image002.png at 01D64EDC.48F92EF0.png Type: image/png Size: 11114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image003.png at 01D64EDC.48F92EF0.png Type: image/png Size: 17038 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image004.png at 01D64EDC.48F92EF0.png Type: image/png Size: 5902 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image007.png at 01D64EDC.8102E2F0.png Type: image/png Size: 23157 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image008.png at 01D64EDC.8102E2F0.png Type: image/png Size: 2030 bytes Desc: not available URL: From YARD at il.ibm.com Tue Jun 30 13:06:20 2020 From: YARD at il.ibm.com (Yaron Daniel) Date: Tue, 30 Jun 2020 15:06:20 +0300 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: HI what is the output of : #df -h Look like /, /var is full ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: "Baptista, Pedro Real" To: "gpfsug-discuss at spectrumscale.org" Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: If I list the usercache folder, I see differences. Local: GPFS I see that GPFS is working ok in the node However, if I check the node health: I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=rVpIPboGf4QVbz63gkd94guqyq6BVzIaoyaeZGhiW2M&s=Vo4g7g6qVc-vlv9_JEhmMiR3zM7QkVsQ-XAorWEPPPk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3847 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3747 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3793 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4301 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3739 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3855 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4084 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3776 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 15587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 11114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 17038 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 5902 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 23157 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 2030 bytes Desc: not available URL: From MDIETZ at de.ibm.com Tue Jun 30 13:13:44 2020 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Tue, 30 Jun 2020 12:13:44 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
URL: From pebaptista at deloitte.pt Tue Jun 30 13:28:33 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 12:28:33 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also.
Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image019.png at 01D64EE2.55AE9AD0] From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image002.gif at 01D64EE2.55A85940] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image004.jpg at 01D64EE2.55A85940] [cid:image005.jpg at 01D64EE2.55A85940] [cid:image006.jpg at 01D64EE2.55A85940] [cid:image007.jpg at 01D64EE2.55A85940] [cid:image008.jpg at 01D64EE2.55A85940][cid:image009.jpg at 01D64EE2.55A85940][cid:image010.jpg at 01D64EE2.55A85940] [IBM Storage and Cloud Essentials] [cid:image012.jpg at 01D64EE2.55A85940] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I've linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image013.png at 01D64EE2.55A85940] If I list the usercache folder, I see differences. Local: [cid:image014.png at 01D64EE2.55A85940] GPFS [cid:image015.png at 01D64EE2.55A85940] I see that GPFS is working ok in the node [cid:image016.png at 01D64EE2.55A85940] However, if I check the node health: [cid:image017.png at 01D64EE2.55A85940] I'm new to Spectrum Scale and I don't know what's csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image018.png at 01D64EE2.55A85940] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited ("DTTL"). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. 
Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1114 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 4105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 3847 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 4266 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 3747 bytes Desc: image006.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 3793 bytes Desc: image007.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.jpg Type: image/jpeg Size: 4301 bytes Desc: image008.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.jpg Type: image/jpeg Size: 3739 bytes Desc: image009.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.jpg Type: image/jpeg Size: 3855 bytes Desc: image010.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.gif Type: image/gif Size: 4084 bytes Desc: image011.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.jpg Type: image/jpeg Size: 3776 bytes Desc: image012.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 15587 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 11114 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image015.png Type: image/png Size: 17038 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 5902 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 23157 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 2030 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image019.png Type: image/png Size: 2030 bytes Desc: image019.png URL: From Renar.Grunenberg at huk-coburg.de Tue Jun 30 13:59:08 2020 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 30 Jun 2020 12:59:08 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: References: Message-ID: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> Hallo Pedro, what do you mean you had linked the local hadoop-Directory with the gpfs fs. Can you clarify? Do you use transparency on the gpfs-nodes? You should use on the local-hadoop site hdfs dfs ?ls cmd?s here. No os cmds like ls only! Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Baptista, Pedro Real Gesendet: Dienstag, 30. Juni 2020 14:29 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Mismatch between local and Scale directories Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image001.png at 01D64EEF.016D9F40] From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list > Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? 
Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image002.gif at 01D64EEF.016D9F40] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image004.jpg at 01D64EEF.016D9F40] [cid:image005.jpg at 01D64EEF.016D9F40] [cid:image006.jpg at 01D64EEF.016D9F40] [cid:image007.jpg at 01D64EEF.016D9F40] [cid:image008.jpg at 01D64EEF.016D9F40][cid:image009.jpg at 01D64EEF.016D9F40][cid:image010.jpg at 01D64EEF.016D9F40] [IBM Storage and Cloud Essentials] [cid:image012.jpg at 01D64EEF.016D9F40] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image013.png at 01D64EEF.016D9F40] If I list the usercache folder, I see differences. Local: [cid:image014.png at 01D64EEF.016D9F40] GPFS [cid:image015.png at 01D64EEF.016D9F40] I see that GPFS is working ok in the node [cid:image016.png at 01D64EEF.016D9F40] However, if I check the node health: [cid:image017.png at 01D64EEF.016D9F40] I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image001.png at 01D64EEF.016D9F40] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. 
If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 2030 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 1114 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 4105 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.jpg Type: image/jpeg Size: 3847 bytes Desc: image004.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.jpg Type: image/jpeg Size: 4266 bytes Desc: image005.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.jpg Type: image/jpeg Size: 3747 bytes Desc: image006.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 3793 bytes Desc: image007.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.jpg Type: image/jpeg Size: 4301 bytes Desc: image008.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.jpg Type: image/jpeg Size: 3739 bytes Desc: image009.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.jpg Type: image/jpeg Size: 3855 bytes Desc: image010.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.gif Type: image/gif Size: 4084 bytes Desc: image011.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.jpg Type: image/jpeg Size: 3776 bytes Desc: image012.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 15587 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 11114 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 17038 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 5902 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 23157 bytes Desc: image017.png URL: From lcham at us.ibm.com Tue Jun 30 14:17:17 2020 From: lcham at us.ibm.com (Linda Cham) Date: Tue, 30 Jun 2020 13:17:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 101, Issue 39 In-Reply-To: Message-ID: An HTML attachment was scrubbed... 
URL: From pebaptista at deloitte.pt Tue Jun 30 14:22:09 2020 From: pebaptista at deloitte.pt (Baptista, Pedro Real) Date: Tue, 30 Jun 2020 13:22:09 +0000 Subject: [gpfsug-discuss] Mismatch between local and Scale directories In-Reply-To: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> References: <5581cd4530e9427881e30f0b4e805c18@huk-coburg.de> Message-ID: Hi Renar, Yes, not Hadoop directories but Yarn and mapreduce. They were configured as follows: mmdsh -N $host "ln -s %GPFS_DIR/$host $LOCAL_DIR" Yarn Namenode Directories are storing into $LOCAL_DIR. And yes I?m using transparency on the gpfs-nodes. Thank you. Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image002.png at 01D64EE9.D259DF20] From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Grunenberg, Renar Sent: 30 de junho de 2020 13:59 To: gpfsug main discussion list Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories Hallo Pedro, what do you mean you had linked the local hadoop-Directory with the gpfs fs. Can you clarify? Do you use transparency on the gpfs-nodes? You should use on the local-hadoop site hdfs dfs ?ls cmd?s here. No os cmds like ls only! Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org > Im Auftrag von Baptista, Pedro Real Gesendet: Dienstag, 30. Juni 2020 14:29 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Mismatch between local and Scale directories Hi Yaron, Thank you for your reply. No, /var is not full and all other disks have free space also. Best regards Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. 
Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt | www.deloitte.pt [cid:image020.png at 01D64EE8.4B08B790] From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Yaron Daniel Sent: 30 de junho de 2020 13:06 To: gpfsug main discussion list > Subject: [EXT] Re: [gpfsug-discuss] Mismatch between local and Scale directories HI what is the output of : #df -h Look like /, /var is full ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image021.gif at 01D64EE8.4B08B790] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel [IBM Storage for Cyber Resiliency & Modern Data Protection V1] [cid:image023.jpg at 01D64EE8.4B08B790] [cid:image024.jpg at 01D64EE8.4B08B790] [cid:image025.jpg at 01D64EE8.4B08B790] [cid:image026.jpg at 01D64EE8.4B08B790] [cid:image027.jpg at 01D64EE8.4B08B790][cid:image028.jpg at 01D64EE8.4B08B790][cid:image029.jpg at 01D64EE8.4B08B790] [IBM Storage and Cloud Essentials] [cid:image031.jpg at 01D64EE8.4B08B790] From: "Baptista, Pedro Real" > To: "gpfsug-discuss at spectrumscale.org" > Date: 30/06/2020 14:54 Subject: [EXTERNAL] [gpfsug-discuss] Mismatch between local and Scale directories Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I?m finding diferences between my local directories (Hadoop cluster) and GPFS filesystem. I?ve linked both yarn and mapreduce directories to Scale. For example, in one specific worker node: [cid:image032.png at 01D64EE8.4B08B790] If I list the usercache folder, I see differences. Local: [cid:image033.png at 01D64EE8.4B08B790] GPFS [cid:image034.png at 01D64EE8.4B08B790] I see that GPFS is working ok in the node [cid:image035.png at 01D64EE8.4B08B790] However, if I check the node health: [cid:image036.png at 01D64EE8.4B08B790] I?m new to Spectrum Scale and I don?t know what?s csm_resync_needed and local_fs_filled. Can anyone give a hand with this? Best regards, Pedro Real Baptista Consulting | Analytics & Cognitive Deloitte Consultores, S.A. Av. Eng. Duarte Pacheco, 7, 1070-100 Lisboa, Portugal M: +351 962369236 pebaptista at deloitte.pt| www.deloitte.pt [cid:image020.png at 01D64EE8.4B08B790] *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/aboutto learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. 
Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. *Disclaimer:* Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited (?DTTL?). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 2030 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.gif Type: image/gif Size: 1114 bytes Desc: image021.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.gif Type: image/gif Size: 4105 bytes Desc: image022.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image023.jpg Type: image/jpeg Size: 3847 bytes Desc: image023.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image024.jpg Type: image/jpeg Size: 4266 bytes Desc: image024.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image025.jpg Type: image/jpeg Size: 3747 bytes Desc: image025.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image026.jpg Type: image/jpeg Size: 3793 bytes Desc: image026.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image027.jpg Type: image/jpeg Size: 4301 bytes Desc: image027.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... 
URL:
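(For the directory-mismatch thread that closes this digest, a short sketch of checks that may help narrow things down. The paths are placeholders for the actual yarn/mapreduce local directories, and the meaning of the two reported events should be confirmed from the mmhealth output itself.)

# verify on every node that the local path really is a symlink into GPFS and that neither side is full
mmdsh -N all 'ls -ld /hadoop/yarn/local; df -h /var /gpfs'
# inspect the node states and the two reported events
mmhealth node show -N all --verbose
mmhealth event show local_fs_filled
mmhealth event show csm_resync_needed
mmhealth node eventlog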