From jonathan.buzzard at strath.ac.uk Mon Mar 1 07:58:43 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 1 Mar 2021 07:58:43 +0000
Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's
In-Reply-To:
References:
Message-ID: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk>

On 28/02/2021 09:31, Jan-Frode Myklebust wrote:
>
> I've tried benchmarking many vs. few vdisks per RG, and never could see
> any performance difference.

That's encouraging.

>
> Usually we create 1 vdisk per enclosure per RG, thinking this will
> allow us to grow with same size vdisks when adding additional enclosures
> in the future.
>
> Don't think mmvdisk can be told to create multiple vdisks per RG
> directly, so you have to manually create multiple vdisk sets each with
> the appropriate size.
>

Thing is, back in the day (GPFS v2.x/v3.x) there were strict warnings
that you needed a minimum of six NSD's for optimal performance. I have
sat in presentations where IBM employees have said so. What we were told
back then is that GPFS needs a minimum number of NSD's in order to be
able to spread the I/O's out, so if an NSD is being pounded for reads
and a write comes in, it can direct it to a less busy NSD.

Now I can imagine that in an ESS/DSS-G, where everything is scattered to
the winds under the hood, this is no longer relevant. But some notes to
that effect would be nice for us old timers, if that is the case, to put
our minds to rest.

JAB.

--
Jonathan A. Buzzard                   Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From Achim.Rehor at de.ibm.com Mon Mar 1 08:16:43 2021
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Mon, 1 Mar 2021 09:16:43 +0100
Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's
In-Reply-To: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk>
References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk>
Message-ID:

The reason for having multiple NSDs in legacy NSD (non-GNR) handling is
the increased parallelism, which gives you 'more spindles' and thus more
performance. In GNR the drives are used in parallel anyway through the
GNR striping. Therefore, you are using all drives of an ESS/GSS/DSS model
under the hood in the vdisks anyway.

The only reason for having more NSDs is for using them for different
filesystems.

Mit freundlichen Grüßen / Kind regards

Achim Rehor

IBM EMEA ESS/Spectrum Scale Support

gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43:

> From: Jonathan Buzzard
> To: gpfsug-discuss at spectrumscale.org
> Date: 01/03/2021 08:58
> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> On 28/02/2021 09:31, Jan-Frode Myklebust wrote:
> >
> > I've tried benchmarking many vs. few vdisks per RG, and never could see
> > any performance difference.
>
> That's encouraging.
>
> >
> > Usually we create 1 vdisk per enclosure per RG, thinking this will
> > allow us to grow with same size vdisks when adding additional enclosures
> > in the future.
> >
> > Don't think mmvdisk can be told to create multiple vdisks per RG
> > directly, so you have to manually create multiple vdisk sets each with
> > the appropriate size.
> >
>
> Thing is, back in the day (GPFS v2.x/v3.x) there were strict warnings
> that you needed a minimum of six NSD's for optimal performance. I have
> sat in presentations where IBM employees have said so.
What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > From S.J.Thompson at bham.ac.uk Mon Mar 1 09:06:07 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Mar 2021 09:06:07 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Or for hedging your bets about how you might want to use it in future. We are never quite sure if we want to do something different in the future with some of the storage, sure that might mean we want to steal some space from a file-system, but that is perfectly valid. And we have done this, both in temporary transient states (data migration between systems), or permanently (found we needed something on a separate file-system) So yes whilst there might be no performance impact on doing this, we still do. I vaguely recall some of the old reasoning was around IO queues in the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD server, you have 16 IO queues passing to multipath, which can help keep the data pipes full. I suspect there was some optimal number of NSDs for different storage controllers, but I don't know if anyone ever benchmarked that. Simon ?On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com" wrote: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. 
> > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Mon Mar 1 09:08:20 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 1 Mar 2021 09:08:20 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Mar 1 09:34:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 09:34:26 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Mon Mar 1 09:46:06 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 10:46:06 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Correct, there was. The OS is dealing with pdisks, while GPFS is striping over Vdisks/NSDs. For GNR there is a differetnt queuing setup in GPFS, than there was for NSDs. See "mmfsadm dump nsd" and check for NsdQueueTraditional versus NsdQueueGNR And yes, i was too strict, with "> The only reason for having more NSDs is for using them for different > filesystems." there are other management reasons to run with a reasonable number of vdisks, just not performance reasons. Mit freundlichen Gruessen / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 10:06:07: > From: Simon Thompson > To: gpfsug main discussion list > Date: 01/03/2021 10:06 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Or for hedging your bets about how you might want to use it in future. > > We are never quite sure if we want to do something different in the > future with some of the storage, sure that might mean we want to > steal some space from a file-system, but that is perfectly valid. 
> And we have done this, both in temporary transient states (data > migration between systems), or permanently (found we needed > something on a separate file-system) > > So yes whilst there might be no performance impact on doing this, westill do. > > I vaguely recall some of the old reasoning was around IO queues in > the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD > server, you have 16 IO queues passing to multipath, which can help > keep the data pipes full. I suspect there was some optimal number of > NSDs for different storage controllers, but I don't know if anyone > ever benchmarked that. > > Simon > > On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Achim.Rehor at de.ibm.com" bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com> wrote: > > The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > the increased parallelism, that gives you 'more spindles' and thus more > performance. > In GNR the drives are used in parallel anyway through the GNRstriping. > Therfore, you are using all drives of a ESS/GSS/DSS model under the hood > in the vdisks anyway. > > The only reason for having more NSDs is for using them for different > filesystems. > > > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > > > From: Jonathan Buzzard > > To: gpfsug-discuss at spectrumscale.org > > Date: 01/03/2021 08:58 > > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of > NSD's > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could > see > > > any performance difference. > > > > That's encouraging. > > > > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > > allow us to grow with same size vdisks when adding additional > enclosures > > > in the future. > > > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > > directly, so you have to manually create multiple vdisk setseach with > > > > the apropriate size. > > > > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > > that you needed a minimum of six NSD's for optimal performance. I have > > sat in presentations where IBM employees have said so. What we where > > told back then is that GPFS needs a minimum number of NSD's inorder to > > be able to spread the I/O's out. So if an NSD is being poundedfor reads > > > and a write comes in it. can direct it to a less busy NSD. > > > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > > the winds under the hood this is no longer relevant. But some notes to > > the effect for us old timers would be nice if that is the case to put > > our minds to rest. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? 
> > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > From jonathan.buzzard at strath.ac.uk Mon Mar 1 11:45:45 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 11:45:45 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: On 01/03/2021 09:08, Luis Bolinches wrote: > Hi > > There other reasons to have more than 1. It is management of those. When > you have to add or remove NSDs of a FS having more than 1 makes it > possible to empty some space and manage those in and out. Manually but > possible. If you have one big NSD or even 1 per enclosure it might > difficult or even not possible depending the number of enclosures and FS > utilization. > > Starting some ESS version (not DSS, cant comment on that) that I do not > recall but in the last 6 months, we have change the default (for those > that use the default) to 4 NSDs per enclosure for ESS 5000. There is no > impact on performance either way on ESS, we tested it. But management of > those on the long run should be easier. Question how does one create a none default number of vdisks per enclosure then? I tried creating a stanza file and then doing mmcrvdisk but it was not happy, presumably because of the "new style" recovery group management mmcrvdisk: [E] This command is not supported by recovery groups under management of mmvdisk. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Mar 1 11:53:32 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 1 Mar 2021 11:53:32 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: Jonathan, You need to create vdisk sets which will create multiple vdisks, you can then assign vdisk sets to your filesystem. (Assigning multiple vdisks at a time) Things to watch - free space calculations are more complex as it?s building multiple vdisks under the cover using multiple raid parameters Also it?s worth assuming a 10% reserve or approx - drive per disk shelf for rebuild space Mmvdisk vdisk set ... insert parameters https://www.ibm.com/support/knowledgecenter/mk/SSYSP8_5.3.2/com.ibm.spectrum.scale.raid.v5r02.adm.doc/bl8adm_mmvdisk.htm Sent from my iPhone > On 1 Mar 2021, at 21:45, Jonathan Buzzard wrote: > > ?On 01/03/2021 09:08, Luis Bolinches wrote: >> Hi >> >> There other reasons to have more than 1. It is management of those. 
When >> you have to add or remove NSDs of a FS having more than 1 makes it >> possible to empty some space and manage those in and out. Manually but >> possible. If you have one big NSD or even 1 per enclosure it might >> difficult or even not possible depending the number of enclosures and FS >> utilization. >> >> Starting some ESS version (not DSS, cant comment on that) that I do not >> recall but in the last 6 months, we have change the default (for those >> that use the default) to 4 NSDs per enclosure for ESS 5000. There is no >> impact on performance either way on ESS, we tested it. But management of >> those on the long run should be easier. > Question how does one create a none default number of vdisks per > enclosure then? > > I tried creating a stanza file and then doing mmcrvdisk but it was not > happy, presumably because of the "new style" recovery group management > > mmcrvdisk: [E] This command is not supported by recovery groups under > management of mmvdisk. > > > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=9HlRHByoByQcM0mY0elL-l4DgA6MzHkAGzE70Rl2p2E&s=eWRfWGpdZB-PZ_InCCjgmdQOCy6rgWj9Oi3TGGA38yY&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scl at virginia.edu Mon Mar 1 12:31:37 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Mon, 1 Mar 2021 12:31:37 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl Message-ID: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Hi folks, Experimenting with POSIX ACLs on GPFS 4.2 and noticed that the Linux command setfacl clears "c" permissions that were set with mmputacl. So if I have this: ... group:group1:rwxc mask::rwxc ... and I modify a different entry with: setfacl -m group:group2:r-x dirname then the "c" permissions above get cleared and I end up with ... group:group1:rwx- mask::rwx- ... I discovered that chmod does not clear the "c" mode. Is there any filesystem option to change this behavior to leave "c" modes in place? Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From olaf.weiser at de.ibm.com Mon Mar 1 12:45:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 12:45:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 1 12:58:44 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Mar 2021 12:58:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 13:14:38 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 13:14:38 +0000 Subject: [gpfsug-discuss] Using setfacl vs. 
mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: On 01/03/2021 12:45, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Hallo Stephen, > behavior ... or better to say ... predicted behavior for chmod and ACLs > .. is not an easy thing or only? , if? you stay in either POSIX world or > NFSv4 world > to be POSIX compliant, a chmod overwrites ACLs One might argue that the general rubbishness of the mmputacl cammand, and if a mmsetfacl command (or similar) existed it would negate messing with Linux utilities to change ACL's on GPFS file systems Only been bringing it up for over a decade now ;-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 15:18:59 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 15:18:59 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Mar 1 08:59:35 2021 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 01 Mar 2021 08:59:35 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: <6F478E88-E350-46BF-9993-82C21ADD2262@qsplace.co.uk> Like Jan, I did some benchmarking a few years ago when the default recommended RG's dropped to 1 per DA to meet rebuild requirements. I couldn't see any discernable difference. As Achim has also mentioned, I just use vdisks for creating additional filesystems. Unless there is going to be a lot of shuffling of space or future filesystem builds, then I divide the RG's into say 10 vdisks to give some flexibility and granularity There is also a flag iirc that changes the gpfs magic to consider multiple under lying disks, when I find it again........ Which can provide increased performance on traditional RAID builds. -- Lauz On 1 March 2021 08:16:43 GMT, Achim Rehor wrote: >The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > >the increased parallelism, that gives you 'more spindles' and thus more > >performance. >In GNR the drives are used in parallel anyway through the GNRstriping. >Therfore, you are using all drives of a ESS/GSS/DSS model under the >hood >in the vdisks anyway. > >The only reason for having more NSDs is for using them for different >filesystems. > > >Mit freundlichen Gr??en / Kind regards > >Achim Rehor > >IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > >gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > >> From: Jonathan Buzzard >> To: gpfsug-discuss at spectrumscale.org >> Date: 01/03/2021 08:58 >> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of >NSD's >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: >> > >> > I?ve tried benchmarking many vs. few vdisks per RG, and never could > >see >> > any performance difference. >> >> That's encouraging. >> >> > >> > Usually we create 1 vdisk per enclosure per RG, thinking this >will >> > allow us to grow with same size vdisks when adding additional >enclosures >> > in the future. 
>> > >> > Don?t think mmvdisk can be told to create multiple vdisks per RG >> > directly, so you have to manually create multiple vdisk sets each >with > >> > the apropriate size. >> > >> >> Thing is back in the day so GPFS v2.x/v3.x there where strict >warnings >> that you needed a minimum of six NSD's for optimal performance. I >have >> sat in presentations where IBM employees have said so. What we where >> told back then is that GPFS needs a minimum number of NSD's in order >to >> be able to spread the I/O's out. So if an NSD is being pounded for >reads > >> and a write comes in it. can direct it to a less busy NSD. >> >> Now I can imagine that in a ESS/DSS-G that as it's being scattered to > >> the winds under the hood this is no longer relevant. But some notes >to >> the effect for us old timers would be nice if that is the case to put > >> our minds to rest. >> >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url? >> >u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- >> M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- >> IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= >> > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 16:50:31 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 16:50:31 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> On 01/03/2021 15:18, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > JAB, > yes-this is in argument ;-) ... and personally I like the idea of having > smth like setfacl also for GPFS ..? for years... > *but* it would not take away the generic challenge , what to do, if > there are competing standards / definitions to meet > at least that is most likely just one reason, why there's no tool yet > there are several hits on RFE page for "ACL".. some of them could be > also addressed with a (mm)setfacl tool > but I was not able to find a request for a tool itself > (I quickly? searched? public but? not found it there, maybe there is > already one in private...) > So - dependent on how important this item for others? is? ... its time > to fire an RFE ?!? ... Well when I asked I was told by an IBM representative that it was by design there was no proper way to set ACLs directly from Linux. The expectation was that you would do this over NFSv4 or Samba. So filing an RFE would be pointless under those conditions and I have never bothered as a result. This was pre 2012 so IBM's outlook might have changed in the meantime. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG

From olaf.weiser at de.ibm.com Mon Mar 1 17:57:11 2021
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 1 Mar 2021 17:57:11 +0000
Subject: [gpfsug-discuss] Using setfacl vs. mmputacl
In-Reply-To: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>
References: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu>
Message-ID:

An HTML attachment was scrubbed...
URL:

From A.Wolf-Reber at de.ibm.com Tue Mar 2 09:36:48 2021
From: A.Wolf-Reber at de.ibm.com (Alexander Wolf)
Date: Tue, 2 Mar 2021 09:36:48 +0000
Subject: [gpfsug-discuss] Using setfacl vs. mmputacl
In-Reply-To:
References: , <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu>
Message-ID:

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.16146770920000.png  Type: image/png  Size: 1134 bytes  Desc: not available  URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.16146770920001.png  Type: image/png  Size: 1134 bytes  Desc: not available  URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Image.16146770920002.png  Type: image/png  Size: 1172 bytes  Desc: not available  URL:

From russell at nordquist.info Tue Mar 2 19:31:24 2021
From: russell at nordquist.info (Russell Nordquist)
Date: Tue, 2 Mar 2021 14:31:24 -0500
Subject: [gpfsug-discuss] Self service creation of filesets
Message-ID:

Hi all

We are trying to use filesets quite a bit, but it's a hassle that only
the admins can create them. To the users it's just a directory, so it
slows things down. Has anyone deployed a self-service model for creating
filesets? Maybe using the API? This feels like shared pain that someone
has already worked on...

thanks
Russell

From anacreo at gmail.com Tue Mar 2 20:58:29 2021
From: anacreo at gmail.com (Alec)
Date: Tue, 2 Mar 2021 12:58:29 -0800
Subject: [gpfsug-discuss] Self service creation of filesets
In-Reply-To:
References:
Message-ID:

This does feel like another situation where I may use a custom attribute
and a periodic script to do the fileset creation. Honestly I would want
the change management around fileset creation. But I could see a few
custom attributes on a newly created user dir... Like maybe just setting
user.quota=10TB... Then have a policy that discovers these and does the
work of creating the fileset, setting the quotas, migrating data to the
fileset, and then mounting the fileset over the original directory.

Honestly that sounds so nice I may have to implement this... Lol. Like I
could see doing something like discovering directories that have
user.archive=true and automatically gzipping large files within.

Would be nice if the GPFS policy engine could have an IF_ANCESTOR_ATTRIBUTE=.

Alec

On Tue, Mar 2, 2021, 11:40 AM Russell Nordquist wrote:

> Hi all
>
> We are trying to use filesets quite a bit, but it's a hassle that only the
> admins can create them. To the users it's just a directory so it slows
> things down. Has anyone deployed a self service model for creating
> filesets? Maybe using the API? This feels like shared pain that someone has
> already worked on...
>
> thanks
> Russell
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
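The attribute-driven idea above maps fairly directly onto a LIST rule plus
a small wrapper around mmcrfileset/mmlinkfileset/mmsetquota. A rough,
untested sketch follows; the user.quota attribute, the fs1 device and the
list-file paths are illustrative choices, not a documented convention:

  /* newprojects.pol - directories tagged with a user.quota attribute */
  RULE EXTERNAL LIST 'newprojects' EXEC ''
  RULE 'tagged' LIST 'newprojects' DIRECTORIES_PLUS
       SHOW(XATTR('user.quota'))
       WHERE XATTR('user.quota') IS NOT NULL

  # run e.g.:  mmapplypolicy /gpfs/fs1 -P newprojects.pol -I defer -f /tmp/newproj
  # list lines look roughly like "<inode> <gen> <snap> <quota> -- /path";
  # adjust the field handling to what your mmapplypolicy version emits
  FS=fs1
  while read -r line; do
      quota=$(echo "$line" | awk '{print $4}')        # value of user.quota
      dir=$(echo "$line" | sed 's/.* -- //')          # the tagged directory
      name=$(basename "$dir")
      mmlsfileset "$FS" "$name" >/dev/null 2>&1 && continue   # already provisioned
      mmcrfileset "$FS" "$name" &&
        mmlinkfileset "$FS" "$name" -J "${dir}.fileset" &&    # junction must not already exist
        mmsetquota "${FS}:${name}" --block "${quota}:${quota}"
      # migrating the data into the fileset and swapping it over the original
      # directory is the manual step Alec mentions above
  done < /tmp/newproj.list.newprojects

DIRECTORIES_PLUS is there because, by default, policy rules only consider
regular files.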
URL: From S.J.Thompson at bham.ac.uk Tue Mar 2 22:38:17 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Mar 2021 22:38:17 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. Simon ?On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ckerner at illinois.edu Tue Mar 2 22:59:01 2021 From: ckerner at illinois.edu (Kerner, Chad A) Date: Tue, 2 Mar 2021 22:59:01 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> References: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Message-ID: <52196DB3-E8D3-47F7-92F6-3A123B46F615@illinois.edu> We have a similar process. One of our customers has a web app that their managers use to provision spaces. That web app drops a json file into a specific location and a cron job kicks off a python script every so often to process the files and provision the space(fileset creation, link, quota, owner, group, perms, etc). Failures are queued and a jira ticket opened. Successes update the database for the web app. They are not requiring instant processing, so we process hourly on the back end side of things. Chad -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 3/2/21, 4:38 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. 
Simon On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ From tortay at cc.in2p3.fr Wed Mar 3 08:06:37 2021 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 3 Mar 2021 09:06:37 +0100 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> On 02/03/2021 20:31, Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). Delegation authorization (identifying "power-users") is external to the tool. Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From russell at nordquist.info Wed Mar 3 17:14:37 2021 From: russell at nordquist.info (Russell Nordquist) Date: Wed, 3 Mar 2021 12:14:37 -0500 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? 
access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. What I would want is to be able to grant the the following calls + maybe a few more. The related REST API calls. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesets.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesetlink.htm Russell > On Mar 3, 2021, at 3:06 AM, Loic Tortay wrote: > > On 02/03/2021 20:31, Russell Nordquist wrote: >> Hi all >> We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, > We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. > > Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". > In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. > > This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). > > Delegation authorization (identifying "power-users") is external to the tool. > > Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). > > There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) > > The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html > > Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Thu Mar 4 09:51:45 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 4 Mar 2021 09:51:45 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: <566f81f3bfd243f1b0258562b627e4e1b6869863.camel@icr.ac.uk> On Wed, 2021-03-03 at 12:14 -0500, Russell Nordquist wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. 
I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. That reminds me... We use a Python wrapper around the REST API to monitor usage against fileset quotas etc. In principle this will also set quotas (and create filesets) but it means giving it storage administrator access. It would be nice if the GUI had sufficiently fine grained permissions that you could set quotas without being able to delete the filesystem. Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 10:04:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 10:04:22 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's Message-ID: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> I am seeing that whenever I try and restore a file with an ACL I get the a ANS1589W error in /var/log/dsmerror.log ANS1589W Unable to write extended attributes for ****** due to errno: 13, reason: Permission denied But bizarrely the ACL is actually restored. At least as far as I can tell. This is the 8.1.11-0 TSM client with GPFS version 5.0.5-1 against a 8.1.10-0 TSM server. Running on RHEL 7.7 to match the DSS-G 2.7b install. The backup node makes the third quorum node for the cluster being as that it runs genuine RHEL (unlike all the compute nodes which are running CentOS). Googling I can't find any references to this being fixed in a later version of the GPFS software, though being on RHEL7 and it's derivatives I am stuck on 5.0.5 Surely root has permissions to write the extended attributes for anyone? It would seem perverse if you have to be the owner of a file to restore the ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Fri Mar 5 12:15:38 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 12:15:38 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 13:07:56 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 13:07:56 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem.? There recently was an issue with Protect and how it used the > GPFS API for ACLs.? If I recall Protect was not properly handling a > return code.? I do not know if it is relevant to your problem but? it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Fri Mar 5 18:06:43 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 5 Mar 2021 18:06:43 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> Hallo All, thge mentioned problem with protect was this: https://www.ibm.com/support/pages/node/6415985?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jonathan Buzzard Gesendet: Freitag, 5. 
M?rz 2021 14:08 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] TSM errors restoring files with ACL's On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem. There recently was an issue with Protect and how it used the > GPFS API for ACLs. If I recall Protect was not properly handling a > return code. I do not know if it is relevant to your problem but it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Fri Mar 5 19:12:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 19:12:47 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de>, <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 20:31:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 20:31:54 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <696e96cc-da52-a24f-d53e-6510407e51e7@strath.ac.uk> On 05/03/2021 19:12, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > I was referring to this flash, > https://www.ibm.com/support/pages/node/6381354?myns=swgtiv&mynp=OCSSEQVQ&mync=E&cm_sp=swgtiv-_-OCSSEQVQ-_-E > > > Spectrum Protect 8.1.11 client has the fix so this should not be an > issue for Jonathan.? Probably best to open a help case against Spectrum > Protect and begin the investigation there. > Also the fix is to stop an unchanged file with an ACL from being backed up again, but only one more time. I suspect we where hit with that issue, but given we only have ~90GB of files with ACL's on them I would not have noticed. That is significantly less than the normal daily churn. This however is an issue with the *restore*. Everything looks to get restored correctly even the ACL's. At the end of the restore all looks good given the headline report from dsmc. However there are ANS1589W warnings in dsmerror.log and dsmc exits with an error code of 8 rather than zero. Will open a case against Spectrum Protect on Monday. I am pretty confident the warnings are false. The current plan is to do carefully curated hand restores of the three afflicted users when the rest of the restore if finished to double check the ACL's are the only issue. 
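One low-tech way of doing that double check is to dump the ACL of
everything under the restored trees and diff it against the same dump
taken from a known-good copy, if one exists. A sketch only - the paths
are examples and there is nothing Spectrum Protect specific in it:

  mkdir -p /tmp/acl-dump
  for tree in /gpfs/users/user1 /gpfs/users/user2 /gpfs/users/user3; do
      # one mmgetacl record per file, prefixed with the file name
      find "$tree" -print0 | while IFS= read -r -d '' f; do
          printf '### %s\n' "$f"
          mmgetacl "$f" 2>/dev/null
      done > /tmp/acl-dump/$(basename "$tree").txt
  done
  # diff -u /tmp/acl-dump/user1.txt <baseline dump of the same tree>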
Quite how the Spectrum Protect team have missed this bug is beyond me. Do
they not have some unit tests to check this stuff before pushing out
updates? I know in the past it worked, though that was many years ago now.
However, I restored many TB of data from backup with ACL's on them.

JAB.

--
Jonathan A. Buzzard                   Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From Robert.Oesterlin at nuance.com Mon Mar 8 14:49:59 2021
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Mon, 8 Mar 2021 14:49:59 +0000
Subject: [gpfsug-discuss] Policy scan of symbolic links with contents?
Message-ID: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com>

Looking to craft a policy scan that pulls out symbolic links to a
particular destination. For instance:

file1.py -> /fs1/patha/pathb/file1.py (I want to include these)
file2.py -> /fs2/patha/pathb/file2.py (exclude these)

The easy way would be to pull out all sym-links and just grep for the
ones I want, but was hoping for a more elegant solution?

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stockf at us.ibm.com Mon Mar 8 15:29:42 2021
From: stockf at us.ibm.com (Frederick Stock)
Date: Mon, 8 Mar 2021 15:29:42 +0000
Subject: [gpfsug-discuss] Policy scan of symbolic links with contents?
In-Reply-To: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com>
References: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com Mon Mar 8 15:34:21 2021
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Mon, 8 Mar 2021 15:34:21 +0000
Subject: [gpfsug-discuss] Policy scan of symbolic links with contents?
Message-ID:

Well - the case here is that the file system has, let's say, 100M files.
Some percentage of these are sym-links to a location that's not in this
file system. I want a report of all these off-file-system links. However,
not all of the sym-links off file system are of interest, just some of
them. I can't say for sure where in the file system they are (and I don't
care).
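One way to get that report is to let the policy engine list every symlink
and do the filtering on the link target afterwards, since (as comes up
below) the policy engine only ever sees the link's own path, not where it
points. A rough, untested sketch, assuming GNU readlink and taking the 'L'
flag and the /fs1 prefix from the rules quoted later in this thread:

  /* all-symlinks.pol */
  RULE EXTERNAL LIST 'symlinks' EXEC ''
  RULE 'links' LIST 'symlinks' DIRECTORIES_PLUS
       WHERE MISC_ATTRIBUTES LIKE '%L%'

  # mmapplypolicy /fs1 -P all-symlinks.pol -I defer -f /tmp/links
  sed 's/.* -- //' /tmp/links.list.symlinks | while IFS= read -r l; do
      t=$(readlink -m -- "$l")                   # resolved target, need not exist
      case "$t" in
          /fs1/*) printf '%s -> %s\n' "$l" "$t" ;;   # targets of interest
      esac
  done

Adjust the case pattern to whichever target locations actually matter;
DIRECTORIES_PLUS is there because, by default, only regular files are
considered by the rules.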
Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=i6m1zVXf4peZo0yo02IiRaQ_pUX95MN3wU53M0NiWcI&s=z-ibh2kAPHbehAsrGavNIg2AJdXmHkpUwy5YhZfUbpc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 8 16:07:48 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 16:07:48 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 8 20:45:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 20:45:05 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 16:07, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Presumably the only feature that would help here is if policy could > determine that the end location pointed to by a symbolic link is within > the current file system.? I am not aware of any such feature or > attribute which policy could check so I think all you can do is run > policy to find the symbolic links and then check each link to see if it > points into the same file system.? You might find the mmfind command > useful for this purpose.? I expect it would eliminate the need to create > a policy to find the symbolic links. > Unless you are using bind mounts if the symbolic link points outside the mount point of the file system it is not within the current file system. So noting that you can write very SQL like statements something like the following should in theory do it RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' Note the above is not checked in any way shape or form for working. Even if you do have bind mounts of other GPFS file systems you just need a more complicated WHERE statement. When doing policy engine stuff I find having that section of the GPFS manual printed out and bound, along with an SQL book for reference is very helpful. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Mar 8 21:00:04 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 21:00:04 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 20:45, Jonathan Buzzard wrote: [SNIP] > So noting that you can write very SQL like statements something like the > following should in theory do it > > RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND > SUBSTR(PATH_NAME,0,4)='/fs1/' > > Note the above is not checked in any way shape or form for working. Even > if you do have bind mounts of other GPFS file systems you just need a > more complicated WHERE statement. Duh, of course as soon as I sent it, I realized there is a missing SHOW RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' You could replace the SUBSTR with a REGEX if you prefer JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ulmer at ulmer.org Mon Mar 8 22:33:38 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 8 Mar 2021 17:33:38 -0500 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Mar 8, 2021, at 3:34 PM, Jonathan Buzzard wrote: > > ?On 08/03/2021 20:45, Jonathan Buzzard wrote: > > [SNIP] > >> So noting that you can write very SQL like statements something like the >> following should in theory do it >> RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND >> SUBSTR(PATH_NAME,0,4)='/fs1/' >> Note the above is not checked in any way shape or form for working. Even >> if you do have bind mounts of other GPFS file systems you just need a >> more complicated WHERE statement. > > Duh, of course as soon as I sent it, I realized there is a missing SHOW > > RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' > > You could replace the SUBSTR with a REGEX if you prefer > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Tue Mar 9 12:25:56 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 9 Mar 2021 12:25:56 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Policy scan of symbolic links with contents? In-Reply-To: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> References: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Message-ID: <3B0AD02E-335F-4540-B109-EC5301C3188A@nuance.com> RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' In this case PATH_NAME is the path within the GPFS file system, not the target of the link, correct? That's not what I want. I want the path of the *link target*. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 3/8/21, 4:41 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stephen Ulmer" wrote: CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ---------------------------------------------------------------------- Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). From bill.burke.860 at gmail.com Wed Mar 10 02:19:02 2021 From: bill.burke.860 at gmail.com (William Burke) Date: Tue, 9 Mar 2021 21:19:02 -0500 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. 
Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Mar 10 02:21:54 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Mar 2021 02:21:54 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Wed Mar 10 02:59:18 2021 From: anacreo at gmail.com (Alec) Date: Tue, 9 Mar 2021 18:59:18 -0800 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: You would definitely be able to search by inode creation date and find the files you want... our 1.25m file filesystem takes about 47 seconds to query... One thing I would worry about though is inode deletion and inter-fileset file moves. The SQL based engine wouldn't be able to identify those changes and so you'd not be able to replicate deletes and such. Alternatively.... 
I have a script that runs in about 4 minutes and it pulls all the data out of the backup indexes, and compares the pre-built hourly file index on our system and identifies files that don't exist in the backup, so I have a daily backup validation... I filter the file list using ksh's printf date manipulation to filter out files that are less than 2 days old, to reduce the noise. A modification to this could simply compare a daily file index with the previous day's index, and send rsync a list of files (existing or deleted) based on just a delta of the two indexes (sort|diff), then you could properly account for all the changes. If you don't care about file modifications just produce both lists based on creation time instead of modification time. The mmfind command or GPFS policy engine should be able to produce a full file list/index very rapidly. In another thread there was a conversation with ACL's... I don't think our backup system backs up ACL's so I just have GPFS produce a list of all ACL applied objects on the daily, and have a script that just makes a null delimited backup file of every single ACL on our file system... and have a script to apply the ACL's as a "restore". It's a pretty simple thing to write-up and keeping 90 day history on this lets me compare the ACL evolution on a file very easily. Alec MVH Most Victorious Hunting (Why should Scandinavians own this cool sign off) On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski wrote: > Yup, you want to use the policy engine: > > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m > reluctant to provide examples as I?m actually suspicious that we don?t have > it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Mar 9, 2021, at 9:19 PM, William Burke > wrote: > > > > I would like to know what files were modified/created/deleted (only for > the current day) on the GPFS's file system so that I could rsync ONLY those > files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not > have to traverse the filesystem looking for these files? If i use the rsync > tool it will scan the file system which is 400+ million files. Obviously > this will be problematic to complete a scan in a day, if it would ever > complete single-threaded. There are tools or scripts that run multithreaded > rsync but it's still a brute force attempt. and it would be nice to know > where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not > sure if this is the best approach to looking at the GPFS metadata - inodes, > modify times, creation times, etc. 
> > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Mar 10 15:15:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Mar 2021 15:15:58 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: <641ea714-579b-1d74-4b86-d0e0b2e8e9c3@strath.ac.uk> On 10/03/2021 02:59, Alec wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > You would definitely be able to search by inode creation date and find > the files you want... our 1.25m file filesystem takes about 47 seconds > to query...? One thing I would worry about though is inode deletion and > inter-fileset file moves.? ?The SQL based engine wouldn't be able to > identify those changes and so you'd not be able to replicate deletes and > such. > This is the problem with rsync "backups", you need to run it with --delete otherwise any restore will "upset" your users as they find large numbers of file they had deleted unhelpfully "restored" > Alternatively.... > I have a script that runs in about 4 minutes and it pulls all the data > out of the backup indexes, and compares the pre-built hourly file index > on our system and identifies files that don't exist in the backup, so I > have a daily backup validation...? I filter the file list using > ksh's?printf date manipulation to filter out files that are less than 2 > days old, to reduce the noise.? A modification to this could simply > compare a daily file index with the previous day's index, and send rsync > a list of files (existing or deleted) based on just a delta of the two > indexes (sort|diff), then you could properly account for all the > changes.? If you don't care about file modifications just produce both > lists based on creation time instead of modification time.? The mmfind > command or GPFS policy engine should be able to produce a full file > list/index very rapidly. > My view would be somewhere along the lines of this is a lot of work and if you have the space to rsync your GPFS file system to, presumably with a server attached to said storage then for under 500 PVU of Spectrum Protect licensing you can have a fully supported client/server Spectrum Protect/TSM backup solution and just use mmbackup. You need to play the game and use older hardware ;-) I use an ancient pimped out Dell PowerEdge R300 as my TSM client node. Why this old, well it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be quad core and 70 PVU per core which would cost an additional ~$1000 in licensing. If it breaks down they are under $100 on eBay. It's never skipped a beat and I have just finished a complete planned restore of our DSS-G using it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Wed Mar 10 19:09:13 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Mar 2021 19:09:13 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: I was looking for the original source for this, but it was on dev works ... which is now dead. But you can use something like: tsbuhelper clustermigdiff \ $migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist "mmmigrate.list.latest.filelist" would be the output of a policyscan of your files today "mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policyscan This then generates the changed and deleted list of files for you. tsbuhelper is what is used internally in mmbackup, though is not very documented... We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses: RULE EXTERNAL LIST 'latest.filelist' EXEC '' \ RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \ SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \ VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \ ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \ WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \ ELSE 'resdnt' END )) \ WHERE \ ( \ NOT \ ( (PATH_NAME LIKE '/%/.mmbackup%') OR \ (PATH_NAME LIKE '/%/.mmmigrate%') OR \ (PATH_NAME LIKE '/%/.afm%') OR \ (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \ (PATH_NAME LIKE '/%/.mmLockDir/%') OR \ (MODE LIKE 's%') \ ) \ ) \ AND \ (MISC_ATTRIBUTES LIKE '%u%') \ AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \ AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%')) On our file-system, both the scan and diff took a long time (hours), but hundreds of millions of files. This comes with no warranty ... We don't use this for backup, Spectrum Protect and mmbackup are our friends ... Simon ?On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" wrote: Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From enrico.tagliavini at fmi.ch Thu Mar 11 09:22:46 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 09:22:46 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync References: <8d58f5c6c8ee4f44a5e09c4f9e3a6dac@ex2013mbx2.fmi.ch> Message-ID: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org? On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > > > \\UTGERS,?? |---------------------------*O*--------------------------- > > > _// the State |???????? Ryan Novosielski - novosirj at rutgers.edu > > > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > ?\\??? 
of NJ | Office of Advanced Research Computing - MSB C630, Newark > ???? `' > > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > > > ?I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > > i use the rsync tool it will scan the file system which is 400+ million files.? Obviously this will be problematic to complete a > > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > > brute force attempt. and it would be nice to know where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > > metadata - inodes, modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu Mar 11 13:17:30 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:17:30 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Message-ID: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >> Sent: Wednesday, March 10, 2021 3:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >> >> Yup, you want to use the policy engine: >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >> >> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >> don?t have it quite right and are passing far too much stuff to rsync). >> >> -- >> #BlackLivesMatter >> ____ >>>> \\UTGERS, |---------------------------*O*--------------------------- >>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >> `' >> >>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>> >>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>> >>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>> >>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>> metadata - inodes, modify times, creation times, etc. >>> >>> >>> >>> -- >>> >>> Best Regards, >>> >>> William Burke (he/him) >>> Lead HPC Engineer >>> Advance Research Computing >>> 860.255.8832 m | LinkedIn >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From enrico.tagliavini at fmi.ch Thu Mar 11 13:24:47 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 13:24:47 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> Message-ID: Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. 
Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Mar 11 13:47:44 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:47:44 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > >> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: >> I?m going to ask what may be a dumb question: >> >> Given that you have GPFS on both ends, what made you decide to NOT use AFM? 
>> >> -- >> Stephen >> >> >>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: >>> >>> ?Hello William, >>> >>> I've got your email forwarded my another user and I decided to subscribe to give you my two cents. >>> >>> I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is >>> easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example >>> if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. >>> >>> DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me >>> enough not to go that route. >>> >>> What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just >>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which >>> the ctime changes in the last couple of days (to update metadata info). >>> >>> Good luck. >>> Kind regards. >>> >>> -- >>> >>> Enrico Tagliavini >>> Systems / Software Engineer >>> >>> enrico.tagliavini at fmi.ch >>> >>> Friedrich Miescher Institute for Biomedical Research >>> Infomatics >>> >>> Maulbeerstrasse 66 >>> 4058 Basel >>> Switzerland >>> >>> >>> >>> >>> -------- Forwarded Message -------- >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >>>> Sent: Wednesday, March 10, 2021 3:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >>>> >>>> Yup, you want to use the policy engine: >>>> >>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >>>> >>>> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >>>> don?t have it quite right and are passing far too much stuff to rsync). >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>>>> \\UTGERS, |---------------------------*O*--------------------------- >>>>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >>>> `' >>>> >>>>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>>>> >>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>>>> >>>>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>>>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>>>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>>>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>>>> >>>>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>>>> metadata - inodes, modify times, creation times, etc. 
>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> William Burke (he/him) >>>>> Lead HPC Engineer >>>>> Advance Research Computing >>>>> 860.255.8832 m | LinkedIn >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Mar 11 14:20:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Mar 2021 14:20:05 +0000 Subject: [gpfsug-discuss] Synchronization/Restore of file systems Message-ID: As promised last year I having just completed a storage upgrade, I have sanitized my scripts and put them up on Github for other people to have a look at the methodology I use in these sorts of scenarios. This time the upgrade involved pulling out all the existing disks and fitting large ones then restoring from backup, rather than synchronizing to a new system, but the principles are the same. Bear in mind the code is written in Perl because it's history is ancient now and with few opportunities to test it in anger, rewriting it in the latest fashionable scripting language is unappealing. https://github.com/digitalcabbage/syncrestore JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Thu Mar 11 14:24:43 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 14:24:43 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2 . When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM. The horror stories about data loss are becoming more rare with modern setups, especially in the non DR case scenario. However AFM is still a very complicated tool, way to complicated if what you are looking for is a "simple" rsync style backup (but faster). The 3000+ pages of documentation for GPFS do not help our small team and many of those pages are dedicated to just AFM. The performance problem is also still a real issue with modern versions as far as I was told. We can have a quite erratic data turnover in our setup, tied to very big scientific instruments capable of generating many TB of data per hour. Having good performance is important. I used the same tool we use for backups also to migrate the data from the old storage to the new storage (and from GPFS 4 to GPFS 5), and I managed to reach speeds of 17 - 19 GB / s data transfer (when hitting big files that is) using only two servers equipped with Infiniband EDR. 
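The core of it is nothing more than two policy-generated path lists and a diff feeding rsync. A rough, untested sketch (file system, host and path names made up, list-file parsing simplified):

   # run the same scan on the production and on the backup cluster
   cat > /tmp/list-all.pol <<'EOF'
   RULE EXTERNAL LIST 'all' EXEC ''
   RULE 'everything' LIST 'all' DIRECTORIES_PLUS
   EOF
   mmapplypolicy fs1 -P /tmp/list-all.pol -I defer -f /tmp/scan

   # keep only the path column (lines normally end in " -- /path"), sorted
   awk -F' -- ' '{print $2}' /tmp/scan.list.all | sort > prod.paths
   # backup.paths is the same list from the backup cluster, with its
   # mount prefix rewritten so the paths are comparable
   comm -23 prod.paths backup.paths > copy.paths      # exists only on production
   comm -13 prod.paths backup.paths > delete.paths    # exists only on the backup

   # a second rule catches recent content/metadata changes, e.g.
   #   WHERE (CURRENT_TIMESTAMP - CHANGE_TIME) < INTERVAL '2' DAYS
   # and those paths get appended to copy.paths as well

   rsync -a --files-from=copy.paths / backup-host:/backup/
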
I made a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . Combined with another program using the policy engine to generate the file list to avoid the painful crawling. As I said we are a small team, so we have to be efficient. Developing that tool costed me time, but the ROI is there as I can use the same tool with non GPFS powered storage system, and we had many occasions where this was the case, for example when moving data from old system to be decommissioned to the GPFS storage. And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer. We might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us. With AFM we would have to start from scratch. Originally we were not really planning to move as we didn't expect this change in licensing with the associated increased cost. But now, this turns out to be a small time saver if we indeed have to switch. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sadaniel at us.ibm.com Thu Mar 11 16:08:11 2021 From: sadaniel at us.ibm.com (Steven Daniels) Date: Thu, 11 Mar 2021 09:08:11 -0700 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. 
We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com http://www.ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. 
We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Thu Mar 11 16:28:57 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 11 Mar 2021 16:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <1298DFDD-9701-4FE4-9B06-1541455E0F52@rutgers.edu> Agreed. Since 5.0.4.1 on the client side (we do rely on it for home directories that are geographically distributed), we have effectively not had any more problems. Our server side are all 5.0.3.2-3. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 11, 2021, at 11:08 AM, Steven Daniels wrote: > > Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. > > I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. > > The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. > > I'll leave it to Venkat and others on the development team to share more details about improvements. > > > Steven A. Daniels > Cross-brand Client Architect > Senior Certified IT Specialist > National Programs > Fax and Voice: 3038101229 > sadaniel at us.ibm.com > http://www.ibm.com > <1A816397.jpg> > > Stephen Ulmer ---03/11/2021 06:47:59 AM---Thank you! 
Would you mind letting me know in what era you made your evaluation? I?m not suggesting y > > From: Stephen Ulmer > To: gpfsug main discussion list > Cc: bill.burke.860 at gmail.com > Date: 03/11/2021 06:47 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Thank you! Would you mind letting me know in what era you made your evaluation? > > I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. > > Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. > > Your original post was very thoughtful, and I appreciate your time. > > -- > Stephen > > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > > On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: > I?m going to ask what may be a dumb question: > > Given that you have GPFS on both ends, what made you decide to NOT use AFM? > > -- > Stephen > > > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > \\UTGERS, |---------------------------*O*--------------------------- > _// the State | Ryan Novosielski - novosirj at rutgers.edu > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > metadata - inodes, modify times, creation times, etc. 
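To make the policy-engine suggestion concrete: the usual pattern is an external LIST rule handed to mmapplypolicy, which drives a parallel metadata scan rather than a serial walk of the directory tree. The sketch below is illustrative only -- the device name fs1, the file names and the one-day window are invented, and the prefix.list.changed output file assumes mmapplypolicy's deferred-list behaviour:

# changed_today.pol
RULE EXTERNAL LIST 'changed' EXEC ''
RULE 'recent' LIST 'changed'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1
        OR (DAYS(CURRENT_TIMESTAMP) - DAYS(CHANGE_TIME)) <= 1

mmapplypolicy fs1 -P changed_today.pol -f /tmp/daily -I defer
# /tmp/daily.list.changed now holds the candidate files for rsync

As warned earlier in the thread, a scan like this still misses deletions and renamed parent directories, so it complements rather than replaces an occasional full reconciliation.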
> > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From honwai.leong at sydney.edu.au Thu Mar 11 22:28:57 2021 From: honwai.leong at sydney.edu.au (Honwai Leong) Date: Thu, 11 Mar 2021 22:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: This paper might provide some ideas, not the best solution but works fine https://github.com/HPCSYSPROS/Workshop20/blob/master/Parallelized_data_replication_of_multi-petabyte_storage_systems/ws_hpcsysp103s1-file1.pdf It is a two-part workflow to replicate files from production to DR site. It leverages on snapshot ID to determine which files have been updated/modified after a snapshot was taken. It doesn't take care of deletion of files moved from one directory to another, so it uses dsync to take care of that part. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Friday, March 12, 2021 3:08 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 20 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Fwd: FW: Backing up GPFS with Rsync (Steven Daniels) ---------------------------------------------------------------------- Message: 1 Date: Thu, 11 Mar 2021 09:08:11 -0700 From: "Steven Daniels" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org, bill.burke.860 at gmail.com Subject: Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Message-ID: Content-Type: text/plain; charset="utf-8" Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com https://protect-au.mimecast.com/s/ZnryCr81nyt88D8ZkuztwY-?domain=ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. 
I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://protect-au.mimecast.com/s/5FXFCvl1rKi77y78YhzCNU5?domain=ibm.com Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
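On the rsync side, a list generated by a policy scan can be fed in directly instead of letting rsync walk the 400+ million files itself. This is only a sketch -- the base path /gpfs/fs1, the destination backuphost:/backup/fs1/ and the eight parallel streams are assumptions, not anything tested in this thread:

# paths.txt holds one path per line, relative to /gpfs/fs1
split -n l/8 /tmp/paths.txt /tmp/chunk.     # split into 8 chunks (GNU split)
for f in /tmp/chunk.*; do
    rsync -a --files-from="$f" /gpfs/fs1/ backuphost:/backup/fs1/ &
done
wait

--files-from turns on --relative, so the directory structure is recreated under the destination; for file names containing newlines or other odd characters a NUL-separated list plus --from0 is the safer variant.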
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/uNqKCwV1vMfGGRGxqcKIIVS?domain=urldefense.proofpoint.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org End of gpfsug-discuss Digest, Vol 110, Issue 20 *********************************************** From juergen.hannappel at desy.de Mon Mar 15 16:20:51 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Mon, 15 Mar 2021 17:20:51 +0100 (CET) Subject: [gpfsug-discuss] Detecting open files Message-ID: <1985303510.24419797.1615825251660.JavaMail.zimbra@desy.de> Hi, when unlinking filesets that sometimes fails because some open files on that fileset still exist. Is there a way to find which files are open, and from which node? Without running a mmdsh -N all lsof on serveral (big) remote clusters, that is. -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From Robert.Oesterlin at nuance.com Wed Mar 17 11:59:57 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 11:59:57 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Anyone run into this error from the GUI task ?FILESYSTEM_MOUNT? or ideas on how to fix it? Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 07:55:14.051000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. 
Call getNextException to see other errors in the batch.,Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg5_tools','ems1-hs','RO','2021-03-17 07:55:15.686000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg5_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 14:18:56 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 14:18:56 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898090.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898091.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898092.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 14:30:36 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 14:30:36 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Can you give me details on how to do this? I tried this: [root at ess1ems ~]# su postgres -c 'psql -d postgres -c "delete from fscc.filesystem_mounts"' could not change directory to "/root" psql: FATAL: Peer authentication failed for user "postgres" Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 9:19 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ This is strange, the Java code should only try to insert rows that are not already there. If it was just the insert for the duplicate row we could ignore it. But this is a batch insert failing and therefore the FILESYSTEM_MOUNTS table does not get updated anymore. A quick fix is to launch the psql client and do a "delete from fscc.filesystem_mounts" to clear the table and run the FILESYSTEM_MOUNT task afterwards to repopulate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 15:09:51 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 15:09:51 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898093.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898094.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898095.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 15:33:54 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 15:33:54 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> The command completed, and I re-ran the FILESYSTEM_MOUNT, but it failed the same way. [root at ess1ems ~]# psql postgres postgres -c "delete from fscc.filesystem_mounts" DELETE 20 /usr/lpp/mmfs/gui/cli/runtask FILESYSTEM_MOUNT -debug 10:32 AM Operation Failed 10:32 AM Error: debug: locale=en_US debug: Running 'mmlsmount 'fs1' -Y ' on node localhost debug: Running 'mmlsmount 'fs2' -Y ' on node localhost debug: Running 'mmlsmount 'fs3' -Y ' on node localhost debug: Running 'mmlsmount 'fs4' -Y ' on node localhost debug: Running 'mmlsmount 'nrg1_tools' -Y ' on node localhost debug: Running 'mmlsmount 'nrg5_tools' -Y ' on node localhost err: java.sql.BatchUpdateException: Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 11:32:38.830000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 10:10 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ I think psql postgres postgres -c "delete from fscc.filesystem_mounts"' ran as root should do the trick. Mit freundlichen Gr??en / Kind regards [cid:image001.png at 01D71B19.07732D00] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1135 bytes Desc: image001.png URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 17:05:11 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 17:05:11 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> References: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898096.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898097.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D71B19.07732D00.png Type: image/png Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898098.png Type: image/png Size: 1172 bytes Desc: not available URL: From robert.horton at icr.ac.uk Thu Mar 18 15:47:07 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 18 Mar 2021 15:47:07 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Message-ID: Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 06:32:00 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 12:02:00 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? 
_ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? __audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Mar 19 09:42:22 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 19 Mar 2021 09:42:22 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Robert, What is the scale version ? This issue may be related to these alerts. 
https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 09:50:04 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 15:20:04 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> References: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Message-ID: Hi Robert, So you might have started seeing problem after upgrading the gateway nodes to 5.0.5.2. Upgrading gateway nodes at cache cluster to 5.0.5.6 would resolve this problem. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/19/2021 03:13 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. 
Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=KgYs-kXBKE5JoAaGYRiU9iIxNkJSZeicxpSTmL39_B8&s=6FodZ_EQ8VAOE_xoEkfoUzmJpaiF7bgbERvA9avLZfg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 09:32:10 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 10:32:10 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly Message-ID: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Mar 22 09:54:28 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 22 Mar 2021 10:54:28 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > Hello, > > we usually create filesets for project dirs and homes. > > Unfortunately we have discovered that this convention has been ignored for > some dirs and their data > no resides in the root fileset. We would like to move the data to > independent filesets. > > Is there a way to do this without having to schedule a downtime for the > dirs in question? > > I mean, is there a way to transparently move data to an independent > fileset at the same path? > > > Kind regards, > > Ulrich Sibiller > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Mar 22 12:24:59 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Mar 2021 12:24:59 +0000 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: You could maybe create the new file-set, link in a different place, copy the data ? Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially reducing the time to do the copy. Simon From: on behalf of "janfrode at tanso.net" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 22 March 2021 at 09:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Move data to fileset seamlessly No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller >: Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? 
I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 13:20:46 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:20:46 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: On 22.03.21 13:24, Simon Thompson wrote: > You could maybe create the new file-set, link in a different place, copy the data ? > > Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially > reducing the time to do the copy. Yes, but this does not help if a file is open all the time, e.g. during a long-running job. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Mon Mar 22 13:41:39 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:41:39 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: <6f626186-cb7a-46d5-781c-8f3a21b7e270@science-computing.de> On 22.03.21 10:54, Jan-Frode Myklebust wrote: > No ? all copying between filesets require full data copy. No simple rename. > > This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. Yes, your are right. So please vote here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=149429 Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From robert.horton at icr.ac.uk Tue Mar 23 19:02:05 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Tue, 23 Mar 2021 19:02:05 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. 
Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From vpuvvada at in.ibm.com Wed Mar 24 02:36:31 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 24 Mar 2021 08:06:31 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: ># mmafmlocal rm /.afm/.afmctl >/bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted This step is only required if home cluster is on 5.0.5.2/5.0.5.3. You can ignore this issue, and restart AFM filesets at cache. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 12:33 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
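For anyone retracing the same recovery, the stop and start Venkat refers to are issued per fileset on the cache cluster; a minimal sketch with invented names (device fs1, fileset projects):

mmafmctl fs1 stop -j projects       # quiesce the fileset before upgrading the gateway nodes
# ... upgrade gateway nodes to 5.0.5.6 ...
mmafmctl fs1 start -j projects
mmafmctl fs1 getstate -j projects   # check that the fileset leaves the Stopped state

mmafmctl getstate also reports the queue length, which is a quick way to confirm replication resumes after the restart.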
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=OLf3tBvTItpLRieM34xb8Xd69tBYbwTDYAecT0D_B7k&s=FCJEEoTWGIoM4eY4SMzE55qskwhAnxC_noZu7fJHoqw&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasad.surampudi at theatsgroup.com Wed Mar 24 14:32:30 2021 From: prasad.surampudi at theatsgroup.com (Prasad Surampudi) Date: Wed, 24 Mar 2021 14:32:30 +0000 Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems Message-ID: Recently while checking fileset quotas in a ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Does anyone else also saw this issue? Please see the output below. The root fileset shows up for 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas? /usr/lpp/mmfs/bin/mmrepquota -j prod-private Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace xFIN root FILESET 12028144 0 0 0 none | 4524237 0 0 0 none /usr/lpp/mmfs/bin/mmrepquota -j prod Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace root root FILESET 7106656 0 0 1273643728 none | 7 0 0 400 none xxx_tick root FILESET 0 0 0 0 none | 1 0 0 0 none -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Mar 25 16:33:48 2021 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 25 Mar 2021 11:33:48 -0500 Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems In-Reply-To: References: Message-ID: Prasad, This is unexpected. Please open a PMR so that data can be collected and looked at. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Prasad Surampudi To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 10:32 AM Subject: [EXTERNAL] [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems Sent by: gpfsug-discuss-bounces at spectrumscale.org Recently while checking fileset quotas in a ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Does anyone else also saw this issue? Please see the output below. The root fileset shows up for 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas? 
/usr/lpp/mmfs/bin/mmrepquota -j prod-private Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace xFIN root FILESET 12028144 0 0 0 none | 4524237 0 0 0 none /usr/lpp/mmfs/bin/mmrepquota -j prod Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace root root FILESET 7106656 0 0 1273643728 none | 7 0 0 400 none xxx_tick root FILESET 0 0 0 0 none | 1 0 0 0 none _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Mon Mar 29 19:38:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Mon, 29 Mar 2021 18:38:00 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Tue Mar 30 07:06:54 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Mar 2021 06:06:54 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Tue Mar 30 19:24:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Tue, 30 Mar 2021 18:24:00 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 110, Issue 34 In-Reply-To: References: Message-ID: Hey Olaf, We'll investigate as suggested. I'm hopeful the journald logs would provide some additional insight. As for OFED versions, we use the same Mellanox version across the cluster and haven't seen any issues with working nodes that mount the filesystem. We also have a PMR open with IBM but we'll send a follow-up if we discover something more for group discussion. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, March 30, 2021 1:07 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 34 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Filesystem mount attempt hangs GPFS client node (Saula, Oluwasijibomi) 2. Re: Filesystem mount attempt hangs GPFS client node (Olaf Weiser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 29 Mar 2021 18:38:00 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="utf-8" Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. 
Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 30 Mar 2021 06:06:54 +0000 From: "Olaf Weiser" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 110, Issue 34 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 07:58:43 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 07:58:43 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > any performance difference. That's encouraging. > > Usually we create 1 vdisk per enclosure per RG, ? thinking this will > allow us to grow with same size vdisks when adding additional enclosures > in the future. > > Don?t think mmvdisk can be told to create multiple vdisks per RG > directly, so you have to manually create multiple vdisk sets each with > the apropriate size. > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings that you needed a minimum of six NSD's for optimal performance. I have sat in presentations where IBM employees have said so. What we where told back then is that GPFS needs a minimum number of NSD's in order to be able to spread the I/O's out. So if an NSD is being pounded for reads and a write comes in it. 
can direct it to a less busy NSD. Now I can imagine that in a ESS/DSS-G that as it's being scattered to the winds under the hood this is no longer relevant. But some notes to the effect for us old timers would be nice if that is the case to put our minds to rest. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Mon Mar 1 08:16:43 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 09:16:43 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > From S.J.Thompson at bham.ac.uk Mon Mar 1 09:06:07 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Mar 2021 09:06:07 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Or for hedging your bets about how you might want to use it in future. We are never quite sure if we want to do something different in the future with some of the storage, sure that might mean we want to steal some space from a file-system, but that is perfectly valid. And we have done this, both in temporary transient states (data migration between systems), or permanently (found we needed something on a separate file-system) So yes whilst there might be no performance impact on doing this, we still do. I vaguely recall some of the old reasoning was around IO queues in the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD server, you have 16 IO queues passing to multipath, which can help keep the data pipes full. I suspect there was some optimal number of NSDs for different storage controllers, but I don't know if anyone ever benchmarked that. Simon ?On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com" wrote: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. 
Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Mon Mar 1 09:08:20 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 1 Mar 2021 09:08:20 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Mar 1 09:34:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 09:34:26 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Mon Mar 1 09:46:06 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 10:46:06 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Correct, there was. The OS is dealing with pdisks, while GPFS is striping over Vdisks/NSDs. For GNR there is a differetnt queuing setup in GPFS, than there was for NSDs. See "mmfsadm dump nsd" and check for NsdQueueTraditional versus NsdQueueGNR And yes, i was too strict, with "> The only reason for having more NSDs is for using them for different > filesystems." there are other management reasons to run with a reasonable number of vdisks, just not performance reasons. Mit freundlichen Gruessen / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 10:06:07: > From: Simon Thompson > To: gpfsug main discussion list > Date: 01/03/2021 10:06 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Or for hedging your bets about how you might want to use it in future. > > We are never quite sure if we want to do something different in the > future with some of the storage, sure that might mean we want to > steal some space from a file-system, but that is perfectly valid. > And we have done this, both in temporary transient states (data > migration between systems), or permanently (found we needed > something on a separate file-system) > > So yes whilst there might be no performance impact on doing this, westill do. > > I vaguely recall some of the old reasoning was around IO queues in > the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD > server, you have 16 IO queues passing to multipath, which can help > keep the data pipes full. I suspect there was some optimal number of > NSDs for different storage controllers, but I don't know if anyone > ever benchmarked that. 
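As an aside, Achim's pointer to "mmfsadm dump nsd" is easy to try on an NSD/IO server. A rough sketch only (mmfsadm is an unsupported debugging command, so treat the output, and these grep patterns taken from Achim's NsdQueueTraditional/NsdQueueGNR hint, as informational rather than authoritative):

    # which queue model the daemon is using on this server
    mmfsadm dump nsd | grep -iE 'NsdQueue(Traditional|GNR)'

    # eyeball the per-queue counters and depths
    mmfsadm dump nsd | grep -i queue | head -40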
> > Simon > > On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Achim.Rehor at de.ibm.com" bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com> wrote: > > The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > the increased parallelism, that gives you 'more spindles' and thus more > performance. > In GNR the drives are used in parallel anyway through the GNRstriping. > Therfore, you are using all drives of a ESS/GSS/DSS model under the hood > in the vdisks anyway. > > The only reason for having more NSDs is for using them for different > filesystems. > > > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > > > From: Jonathan Buzzard > > To: gpfsug-discuss at spectrumscale.org > > Date: 01/03/2021 08:58 > > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of > NSD's > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could > see > > > any performance difference. > > > > That's encouraging. > > > > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > > allow us to grow with same size vdisks when adding additional > enclosures > > > in the future. > > > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > > directly, so you have to manually create multiple vdisk setseach with > > > > the apropriate size. > > > > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > > that you needed a minimum of six NSD's for optimal performance. I have > > sat in presentations where IBM employees have said so. What we where > > told back then is that GPFS needs a minimum number of NSD's inorder to > > be able to spread the I/O's out. So if an NSD is being poundedfor reads > > > and a write comes in it. can direct it to a less busy NSD. > > > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > > the winds under the hood this is no longer relevant. But some notes to > > the effect for us old timers would be nice if that is the case to put > > our minds to rest. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > From jonathan.buzzard at strath.ac.uk Mon Mar 1 11:45:45 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 11:45:45 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: On 01/03/2021 09:08, Luis Bolinches wrote: > Hi > > There other reasons to have more than 1. It is management of those. When > you have to add or remove NSDs of a FS having more than 1 makes it > possible to empty some space and manage those in and out. Manually but > possible. If you have one big NSD or even 1 per enclosure it might > difficult or even not possible depending the number of enclosures and FS > utilization. > > Starting some ESS version (not DSS, cant comment on that) that I do not > recall but in the last 6 months, we have change the default (for those > that use the default) to 4 NSDs per enclosure for ESS 5000. There is no > impact on performance either way on ESS, we tested it. But management of > those on the long run should be easier. Question how does one create a none default number of vdisks per enclosure then? I tried creating a stanza file and then doing mmcrvdisk but it was not happy, presumably because of the "new style" recovery group management mmcrvdisk: [E] This command is not supported by recovery groups under management of mmvdisk. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Mar 1 11:53:32 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 1 Mar 2021 11:53:32 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: Jonathan, You need to create vdisk sets which will create multiple vdisks, you can then assign vdisk sets to your filesystem. (Assigning multiple vdisks at a time) Things to watch - free space calculations are more complex as it?s building multiple vdisks under the cover using multiple raid parameters Also it?s worth assuming a 10% reserve or approx - drive per disk shelf for rebuild space Mmvdisk vdisk set ... insert parameters https://www.ibm.com/support/knowledgecenter/mk/SSYSP8_5.3.2/com.ibm.spectrum.scale.raid.v5r02.adm.doc/bl8adm_mmvdisk.htm Sent from my iPhone > On 1 Mar 2021, at 21:45, Jonathan Buzzard wrote: > > ?On 01/03/2021 09:08, Luis Bolinches wrote: >> Hi >> >> There other reasons to have more than 1. It is management of those. When >> you have to add or remove NSDs of a FS having more than 1 makes it >> possible to empty some space and manage those in and out. Manually but >> possible. If you have one big NSD or even 1 per enclosure it might >> difficult or even not possible depending the number of enclosures and FS >> utilization. >> >> Starting some ESS version (not DSS, cant comment on that) that I do not >> recall but in the last 6 months, we have change the default (for those >> that use the default) to 4 NSDs per enclosure for ESS 5000. There is no >> impact on performance either way on ESS, we tested it. But management of >> those on the long run should be easier. > Question how does one create a none default number of vdisks per > enclosure then? 
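Picking up Andrew's pointer to vdisk sets, a hedged sketch of what that flow looks like (the option names follow the mmvdisk documentation linked above but should be double-checked against your DSS/ESS level; rg_1/rg_2, vs_a and fs1 are placeholders):

    # define a vdisk set spanning both recovery groups; a smaller --set-size,
    # defined and created several times over under different set names, is one
    # way to end up with more than the default number of vdisks/NSDs per RG
    mmvdisk vdiskset define --vdisk-set vs_a \
            --recovery-group rg_1,rg_2 \
            --code 8+2p --block-size 8M --set-size 25%

    mmvdisk vdiskset create --vdisk-set vs_a          # instantiate the vdisks/NSDs

    # then either build a new filesystem from it or grow an existing one
    mmvdisk filesystem create --file-system fs1 --vdisk-set vs_a
    # mmvdisk filesystem add  --file-system fs1 --vdisk-set vs_a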
> > I tried creating a stanza file and then doing mmcrvdisk but it was not > happy, presumably because of the "new style" recovery group management > > mmcrvdisk: [E] This command is not supported by recovery groups under > management of mmvdisk. > > > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=9HlRHByoByQcM0mY0elL-l4DgA6MzHkAGzE70Rl2p2E&s=eWRfWGpdZB-PZ_InCCjgmdQOCy6rgWj9Oi3TGGA38yY&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scl at virginia.edu Mon Mar 1 12:31:37 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Mon, 1 Mar 2021 12:31:37 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl Message-ID: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Hi folks, Experimenting with POSIX ACLs on GPFS 4.2 and noticed that the Linux command setfacl clears "c" permissions that were set with mmputacl. So if I have this: ... group:group1:rwxc mask::rwxc ... and I modify a different entry with: setfacl -m group:group2:r-x dirname then the "c" permissions above get cleared and I end up with ... group:group1:rwx- mask::rwx- ... I discovered that chmod does not clear the "c" mode. Is there any filesystem option to change this behavior to leave "c" modes in place? Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From olaf.weiser at de.ibm.com Mon Mar 1 12:45:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 12:45:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 1 12:58:44 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Mar 2021 12:58:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 13:14:38 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 13:14:38 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: On 01/03/2021 12:45, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Hallo Stephen, > behavior ... or better to say ... predicted behavior for chmod and ACLs > .. is not an easy thing or only? , if? you stay in either POSIX world or > NFSv4 world > to be POSIX compliant, a chmod overwrites ACLs One might argue that the general rubbishness of the mmputacl cammand, and if a mmsetfacl command (or similar) existed it would negate messing with Linux utilities to change ACL's on GPFS file systems Only been bringing it up for over a decade now ;-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 15:18:59 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 15:18:59 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Mar 1 08:59:35 2021 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 01 Mar 2021 08:59:35 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: <6F478E88-E350-46BF-9993-82C21ADD2262@qsplace.co.uk> Like Jan, I did some benchmarking a few years ago when the default recommended RG's dropped to 1 per DA to meet rebuild requirements. I couldn't see any discernable difference. As Achim has also mentioned, I just use vdisks for creating additional filesystems. Unless there is going to be a lot of shuffling of space or future filesystem builds, then I divide the RG's into say 10 vdisks to give some flexibility and granularity There is also a flag iirc that changes the gpfs magic to consider multiple under lying disks, when I find it again........ Which can provide increased performance on traditional RAID builds. -- Lauz On 1 March 2021 08:16:43 GMT, Achim Rehor wrote: >The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > >the increased parallelism, that gives you 'more spindles' and thus more > >performance. >In GNR the drives are used in parallel anyway through the GNRstriping. >Therfore, you are using all drives of a ESS/GSS/DSS model under the >hood >in the vdisks anyway. > >The only reason for having more NSDs is for using them for different >filesystems. > > >Mit freundlichen Gr??en / Kind regards > >Achim Rehor > >IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > >gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > >> From: Jonathan Buzzard >> To: gpfsug-discuss at spectrumscale.org >> Date: 01/03/2021 08:58 >> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of >NSD's >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: >> > >> > I?ve tried benchmarking many vs. few vdisks per RG, and never could > >see >> > any performance difference. >> >> That's encouraging. >> >> > >> > Usually we create 1 vdisk per enclosure per RG, thinking this >will >> > allow us to grow with same size vdisks when adding additional >enclosures >> > in the future. >> > >> > Don?t think mmvdisk can be told to create multiple vdisks per RG >> > directly, so you have to manually create multiple vdisk sets each >with > >> > the apropriate size. >> > >> >> Thing is back in the day so GPFS v2.x/v3.x there where strict >warnings >> that you needed a minimum of six NSD's for optimal performance. I >have >> sat in presentations where IBM employees have said so. What we where >> told back then is that GPFS needs a minimum number of NSD's in order >to >> be able to spread the I/O's out. So if an NSD is being pounded for >reads > >> and a write comes in it. can direct it to a less busy NSD. >> >> Now I can imagine that in a ESS/DSS-G that as it's being scattered to > >> the winds under the hood this is no longer relevant. But some notes >to >> the effect for us old timers would be nice if that is the case to put > >> our minds to rest. >> >> >> JAB. >> >> -- >> Jonathan A. 
Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url? >> >u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- >> M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- >> IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= >> > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 16:50:31 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 16:50:31 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> On 01/03/2021 15:18, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > JAB, > yes-this is in argument ;-) ... and personally I like the idea of having > smth like setfacl also for GPFS ..? for years... > *but* it would not take away the generic challenge , what to do, if > there are competing standards / definitions to meet > at least that is most likely just one reason, why there's no tool yet > there are several hits on RFE page for "ACL".. some of them could be > also addressed with a (mm)setfacl tool > but I was not able to find a request for a tool itself > (I quickly? searched? public but? not found it there, maybe there is > already one in private...) > So - dependent on how important this item for others? is? ... its time > to fire an RFE ?!? ... Well when I asked I was told by an IBM representative that it was by design there was no proper way to set ACLs directly from Linux. The expectation was that you would do this over NFSv4 or Samba. So filing an RFE would be pointless under those conditions and I have never bothered as a result. This was pre 2012 so IBM's outlook might have changed in the meantime. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 17:57:11 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 17:57:11 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> References: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Mar 2 09:36:48 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 2 Mar 2021 09:36:48 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
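Coming back to Stephen's original question about the 'c' permission, one workaround is to make ACL changes with the GPFS tools rather than setfacl, since they understand the GPFS-specific control bit. A small sketch, with the directory name as a placeholder and behaviour to be verified on your own level:

    mmgetacl -o /tmp/acl.$$ /gpfs/fs1/somedir    # dump the current ACL, 'c' entries included
    # edit /tmp/acl.$$ and add the new entry, e.g.:  group:group2:r-x-
    mmputacl -i /tmp/acl.$$ /gpfs/fs1/somedir    # apply the edited ACL back
    mmgetacl /gpfs/fs1/somedir                   # confirm group1 still shows rwxc

mmeditacl should do the same get/edit/put round trip in a single step.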
Name: Image.16146770920000.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920001.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920002.png Type: image/png Size: 1172 bytes Desc: not available URL: From russell at nordquist.info Tue Mar 2 19:31:24 2021 From: russell at nordquist.info (Russell Nordquist) Date: Tue, 2 Mar 2021 14:31:24 -0500 Subject: [gpfsug-discuss] Self service creation of filesets Message-ID: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell From anacreo at gmail.com Tue Mar 2 20:58:29 2021 From: anacreo at gmail.com (Alec) Date: Tue, 2 Mar 2021 12:58:29 -0800 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: This does feel like another situation where I may use a custom attribute and a periodic script to do the fileset creation. Honestly I would want the change management around fileset creation. But I could see a few custom attributes on a newly created user dir... Like maybe just setting user.quota=10TB... Then have a policy that discovers these does the work of creating the fileset, setting the quotas, migrating data to the fileset, and then mounting the fileset over the original directory. Honestly that sounds so nice I may have to implement this... Lol. Like I could see doing something like discovering directories that have user.archive=true and automatically gzipping large files within. Would be nice if GPFS policy engine could have a IF_ANCESTOR_ATTRIBUTE=. Alec On Tue, Mar 2, 2021, 11:40 AM Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the > admins can create them. To the users it?s just a directory so it slows > things down. Has anyone deployed a self service model for creating > filesets? Maybe using the API? This feels like shared pain that someone has > already worked on?. > > thanks > Russell > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Mar 2 22:38:17 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Mar 2021 22:38:17 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. 
restripefs running), so we can always just requeue the requests again. Simon ?On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ckerner at illinois.edu Tue Mar 2 22:59:01 2021 From: ckerner at illinois.edu (Kerner, Chad A) Date: Tue, 2 Mar 2021 22:59:01 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> References: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Message-ID: <52196DB3-E8D3-47F7-92F6-3A123B46F615@illinois.edu> We have a similar process. One of our customers has a web app that their managers use to provision spaces. That web app drops a json file into a specific location and a cron job kicks off a python script every so often to process the files and provision the space(fileset creation, link, quota, owner, group, perms, etc). Failures are queued and a jira ticket opened. Successes update the database for the web app. They are not requiring instant processing, so we process hourly on the back end side of things. Chad -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 3/2/21, 4:38 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. Simon On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. 
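For comparison, the core provisioning step such a worker ends up running is only a handful of commands. A bare-bones sketch (fs1, projX, the junction path, owner, group and quota values are all placeholders, and the idempotency/locking Simon describes is deliberately left out):

    fs=fs1; fset=projX; owner=alice; group=projx_grp

    mmcrfileset   "$fs" "$fset" --inode-space new              # independent fileset
    mmlinkfileset "$fs" "$fset" -J "/gpfs/$fs/projects/$fset"  # the "directory" users see
    mmsetquota    "$fs:$fset" --block 10T:10T --files 5M:5M    # block and inode quotas
    chown "$owner:$group" "/gpfs/$fs/projects/$fset"
    chmod 2770            "/gpfs/$fs/projects/$fset"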
thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!DZ3fjg!uQVokpQk0pPyjpae7a_Aui1wGk3k7xJzIxzX1DBNfOyNOfzZeJFUjVOqN3OVEyVqdw$ From tortay at cc.in2p3.fr Wed Mar 3 08:06:37 2021 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 3 Mar 2021 09:06:37 +0100 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> On 02/03/2021 20:31, Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). Delegation authorization (identifying "power-users") is external to the tool. Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From russell at nordquist.info Wed Mar 3 17:14:37 2021 From: russell at nordquist.info (Russell Nordquist) Date: Wed, 3 Mar 2021 12:14:37 -0500 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. What I would want is to be able to grant the the following calls + maybe a few more. 
The related REST API calls. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesets.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesetlink.htm Russell > On Mar 3, 2021, at 3:06 AM, Loic Tortay wrote: > > On 02/03/2021 20:31, Russell Nordquist wrote: >> Hi all >> We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, > We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. > > Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". > In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. > > This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). > > Delegation authorization (identifying "power-users") is external to the tool. > > Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). > > There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) > > The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html > > Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Thu Mar 4 09:51:45 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 4 Mar 2021 09:51:45 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: <566f81f3bfd243f1b0258562b627e4e1b6869863.camel@icr.ac.uk> On Wed, 2021-03-03 at 12:14 -0500, Russell Nordquist wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. That reminds me... We use a Python wrapper around the REST API to monitor usage against fileset quotas etc. In principle this will also set quotas (and create filesets) but it means giving it storage administrator access. 
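For reference, a hedged curl sketch against those two endpoints plus a quota read (host, credentials, filesystem and fileset names are placeholders, and the JSON field names are from memory; take the exact request schema from the KC pages above for your release):

    GUI="https://gui-node:443/scalemgmt/v2"
    CRED="svc_account:secret"    # ideally a least-privilege account, which is exactly the gap discussed here

    # create a fileset (POST .../filesystems/{fs}/filesets)
    curl -k -u "$CRED" -X POST "$GUI/filesystems/fs1/filesets" \
         -H 'Content-Type: application/json' \
         -d '{"filesetName": "projX", "inodeSpace": "new"}'

    # link it (POST .../filesystems/{fs}/filesets/{fileset}/link)
    curl -k -u "$CRED" -X POST "$GUI/filesystems/fs1/filesets/projX/link" \
         -H 'Content-Type: application/json' \
         -d '{"path": "/gpfs/fs1/projects/projX"}'

    # read fileset quota/usage, as in the monitoring wrapper mentioned above
    curl -k -u "$CRED" "$GUI/filesystems/fs1/quotas"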
It would be nice if the GUI had sufficiently fine grained permissions that you could set quotas without being able to delete the filesystem. Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 10:04:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 10:04:22 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's Message-ID: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> I am seeing that whenever I try and restore a file with an ACL I get the a ANS1589W error in /var/log/dsmerror.log ANS1589W Unable to write extended attributes for ****** due to errno: 13, reason: Permission denied But bizarrely the ACL is actually restored. At least as far as I can tell. This is the 8.1.11-0 TSM client with GPFS version 5.0.5-1 against a 8.1.10-0 TSM server. Running on RHEL 7.7 to match the DSS-G 2.7b install. The backup node makes the third quorum node for the cluster being as that it runs genuine RHEL (unlike all the compute nodes which are running CentOS). Googling I can't find any references to this being fixed in a later version of the GPFS software, though being on RHEL7 and it's derivatives I am stuck on 5.0.5 Surely root has permissions to write the extended attributes for anyone? It would seem perverse if you have to be the owner of a file to restore the ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Fri Mar 5 12:15:38 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 12:15:38 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 13:07:56 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 13:07:56 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem.? There recently was an issue with Protect and how it used the > GPFS API for ACLs.? If I recall Protect was not properly handling a > return code.? I do not know if it is relevant to your problem but? it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Fri Mar 5 18:06:43 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 5 Mar 2021 18:06:43 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> Hallo All, thge mentioned problem with protect was this: https://www.ibm.com/support/pages/node/6415985?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jonathan Buzzard Gesendet: Freitag, 5. M?rz 2021 14:08 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] TSM errors restoring files with ACL's On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem. There recently was an issue with Protect and how it used the > GPFS API for ACLs. If I recall Protect was not properly handling a > return code. I do not know if it is relevant to your problem but it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Fri Mar 5 19:12:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 19:12:47 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de>, <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 20:31:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 20:31:54 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <696e96cc-da52-a24f-d53e-6510407e51e7@strath.ac.uk> On 05/03/2021 19:12, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > I was referring to this flash, > https://www.ibm.com/support/pages/node/6381354?myns=swgtiv&mynp=OCSSEQVQ&mync=E&cm_sp=swgtiv-_-OCSSEQVQ-_-E > > > Spectrum Protect 8.1.11 client has the fix so this should not be an > issue for Jonathan.? Probably best to open a help case against Spectrum > Protect and begin the investigation there. > Also the fix is to stop an unchanged file with an ACL from being backed up again, but only one more time. I suspect we where hit with that issue, but given we only have ~90GB of files with ACL's on them I would not have noticed. That is significantly less than the normal daily churn. This however is an issue with the *restore*. Everything looks to get restored correctly even the ACL's. At the end of the restore all looks good given the headline report from dsmc. However there are ANS1589W warnings in dsmerror.log and dsmc exits with an error code of 8 rather than zero. Will open a case against Spectrum Protect on Monday. I am pretty confident the warnings are false. The current plan is to do carefully curated hand restores of the three afflicted users when the rest of the restore if finished to double check the ACL's are the only issue. Quite how the Spectrum Protect team have missed this bug is beyond me. Do they not have some unit tests to check this stuff before pushing out updates. I know in the past it worked, though that was many years ago now. However I restored many TB of data from backup with ACL's on them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Mon Mar 8 14:49:59 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 14:49:59 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? 
Message-ID: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 8 15:29:42 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 15:29:42 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> References: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Mar 8 15:34:21 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 15:34:21 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? Message-ID: Well - the case here is that the file system has, let?s say, 100M files. Some percentage of these are sym-links to a location that?s not in this file system. I want a report of all these off file system links. However, not all of the sym-links off file system are of interest, just some of them. I can?t say for sure where in the file system they are (and I don?t care). Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frederick Stock Reply-To: gpfsug main discussion list Date: Monday, March 8, 2021 at 9:29 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Policy scan of symbolic links with contents? CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ Could you use the PATHNAME LIKE statement to limit the location to the files of interest? Fred _______________________________________________________ Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Policy scan of symbolic links with contents? Date: Mon, Mar 8, 2021 10:12 AM Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=i6m1zVXf4peZo0yo02IiRaQ_pUX95MN3wU53M0NiWcI&s=z-ibh2kAPHbehAsrGavNIg2AJdXmHkpUwy5YhZfUbpc&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stockf at us.ibm.com Mon Mar 8 16:07:48 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 16:07:48 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 8 20:45:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 20:45:05 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 16:07, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Presumably the only feature that would help here is if policy could > determine that the end location pointed to by a symbolic link is within > the current file system.? I am not aware of any such feature or > attribute which policy could check so I think all you can do is run > policy to find the symbolic links and then check each link to see if it > points into the same file system.? You might find the mmfind command > useful for this purpose.? I expect it would eliminate the need to create > a policy to find the symbolic links. > Unless you are using bind mounts if the symbolic link points outside the mount point of the file system it is not within the current file system. So noting that you can write very SQL like statements something like the following should in theory do it RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' Note the above is not checked in any way shape or form for working. Even if you do have bind mounts of other GPFS file systems you just need a more complicated WHERE statement. When doing policy engine stuff I find having that section of the GPFS manual printed out and bound, along with an SQL book for reference is very helpful. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Mar 8 21:00:04 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 21:00:04 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 20:45, Jonathan Buzzard wrote: [SNIP] > So noting that you can write very SQL like statements something like the > following should in theory do it > > RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND > SUBSTR(PATH_NAME,0,4)='/fs1/' > > Note the above is not checked in any way shape or form for working. Even > if you do have bind mounts of other GPFS file systems you just need a > more complicated WHERE statement. Duh, of course as soon as I sent it, I realized there is a missing SHOW RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' You could replace the SUBSTR with a REGEX if you prefer JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ulmer at ulmer.org Mon Mar 8 22:33:38 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 8 Mar 2021 17:33:38 -0500 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Does that check the target of the symlink, or the path to the link itself? 
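A note on the question above: the SUBSTR and REGEX tests in the rules earlier in the thread match PATH_NAME, which is where the link itself lives in the file system, not where it points. One way to select links by target is to have the policy engine (or the mmfind command Fred mentions) list every symlink, then post-filter that list with readlink. A minimal, untested sketch that reads one link path per line on stdin; the /fs1/ prefix is only an example:

# Keep only the symlinks whose target starts with a given prefix.
PREFIX=${1:-/fs1/}

while IFS= read -r link; do
    target=$(readlink -- "$link") || continue      # skip unreadable entries
    # use readlink -f instead if relative targets need to be canonicalised
    case "$target" in
        "$PREFIX"*) printf '%s -> %s\n' "$link" "$target" ;;
    esac
done

Because the input is just the links the policy scan produced, the extra readlink pass only touches those files, not the full 100M-file namespace.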
I think the OP was checking the target (or I misunderstood). -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Mar 8, 2021, at 3:34 PM, Jonathan Buzzard wrote: > > ?On 08/03/2021 20:45, Jonathan Buzzard wrote: > > [SNIP] > >> So noting that you can write very SQL like statements something like the >> following should in theory do it >> RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND >> SUBSTR(PATH_NAME,0,4)='/fs1/' >> Note the above is not checked in any way shape or form for working. Even >> if you do have bind mounts of other GPFS file systems you just need a >> more complicated WHERE statement. > > Duh, of course as soon as I sent it, I realized there is a missing SHOW > > RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' > > You could replace the SUBSTR with a REGEX if you prefer > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Tue Mar 9 12:25:56 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 9 Mar 2021 12:25:56 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Policy scan of symbolic links with contents? In-Reply-To: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> References: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Message-ID: <3B0AD02E-335F-4540-B109-EC5301C3188A@nuance.com> RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' In this case PATH_NAME is the path within the GPFS file system, not the target of the link, correct? That's not what I want. I want the path of the *link target*. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 3/8/21, 4:41 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stephen Ulmer" wrote: CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ---------------------------------------------------------------------- Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). From bill.burke.860 at gmail.com Wed Mar 10 02:19:02 2021 From: bill.burke.860 at gmail.com (William Burke) Date: Tue, 9 Mar 2021 21:19:02 -0500 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
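For reference, the replies that follow converge on the policy engine rather than the DM API: produce a full path list once a day with a LIST rule, compare today's list with yesterday's to get the new and deleted sets, and hand the changed paths to rsync. A rough, untested sketch of that cycle; the mount point, working directory, list name and destination are all assumptions:

# Rough sketch of the daily policy-scan + diff cycle described in the replies.
FSPATH=/gpfs/fs1                    # file system mount point (assumption)
WORK=/var/tmp/gpfs-delta
TODAY="$WORK/paths.$(date +%Y%m%d)"
YESTERDAY="$WORK/paths.$(date -d yesterday +%Y%m%d)"
mkdir -p "$WORK"

# 1. Policy that lists every path (narrow it with WHERE clauses as needed).
cat > "$WORK/list.pol" <<'EOF'
RULE EXTERNAL LIST 'all' EXEC ''
RULE 'every' LIST 'all' WHERE TRUE
EOF

# 2. With -I defer the candidate records land in $WORK/scan.list.all
#    instead of being passed to an EXEC script.
mmapplypolicy "$FSPATH" -P "$WORK/list.pol" -I defer -f "$WORK/scan"

# 3. Keep the path portion of each record (it follows the ' -- ' separator)
#    and sort it so the lists can be compared.
sed 's/^.* -- //' "$WORK/scan.list.all" | sort > "$TODAY"

# 4. comm: lines only in today's list are new, lines only in yesterday's
#    were deleted. Files modified in place need an extra mtime test.
if [ -f "$YESTERDAY" ]; then
    comm -13 "$YESTERDAY" "$TODAY" > "$WORK/new.paths"
    comm -23 "$YESTERDAY" "$TODAY" > "$WORK/deleted.paths"
    rsync -a --files-from="$WORK/new.paths" / backuphost:/backup/   # destination is an assumption
fi

The tsbuhelper note further down the thread does the same two-list comparison in a single (largely undocumented) command, and Enrico's message describes running the scan on both the source and the backup cluster.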
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Mar 10 02:21:54 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Mar 2021 02:21:54 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Wed Mar 10 02:59:18 2021 From: anacreo at gmail.com (Alec) Date: Tue, 9 Mar 2021 18:59:18 -0800 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: You would definitely be able to search by inode creation date and find the files you want... our 1.25m file filesystem takes about 47 seconds to query... One thing I would worry about though is inode deletion and inter-fileset file moves. The SQL based engine wouldn't be able to identify those changes and so you'd not be able to replicate deletes and such. Alternatively.... I have a script that runs in about 4 minutes and it pulls all the data out of the backup indexes, and compares the pre-built hourly file index on our system and identifies files that don't exist in the backup, so I have a daily backup validation... I filter the file list using ksh's printf date manipulation to filter out files that are less than 2 days old, to reduce the noise. 
A modification to this could simply compare a daily file index with the previous day's index, and send rsync a list of files (existing or deleted) based on just a delta of the two indexes (sort|diff), then you could properly account for all the changes. If you don't care about file modifications just produce both lists based on creation time instead of modification time. The mmfind command or GPFS policy engine should be able to produce a full file list/index very rapidly. In another thread there was a conversation with ACL's... I don't think our backup system backs up ACL's so I just have GPFS produce a list of all ACL applied objects on the daily, and have a script that just makes a null delimited backup file of every single ACL on our file system... and have a script to apply the ACL's as a "restore". It's a pretty simple thing to write-up and keeping 90 day history on this lets me compare the ACL evolution on a file very easily. Alec MVH Most Victorious Hunting (Why should Scandinavians own this cool sign off) On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski wrote: > Yup, you want to use the policy engine: > > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m > reluctant to provide examples as I?m actually suspicious that we don?t have > it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Mar 9, 2021, at 9:19 PM, William Burke > wrote: > > > > I would like to know what files were modified/created/deleted (only for > the current day) on the GPFS's file system so that I could rsync ONLY those > files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not > have to traverse the filesystem looking for these files? If i use the rsync > tool it will scan the file system which is 400+ million files. Obviously > this will be problematic to complete a scan in a day, if it would ever > complete single-threaded. There are tools or scripts that run multithreaded > rsync but it's still a brute force attempt. and it would be nice to know > where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not > sure if this is the best approach to looking at the GPFS metadata - inodes, > modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Wed Mar 10 15:15:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Mar 2021 15:15:58 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: <641ea714-579b-1d74-4b86-d0e0b2e8e9c3@strath.ac.uk> On 10/03/2021 02:59, Alec wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > You would definitely be able to search by inode creation date and find > the files you want... our 1.25m file filesystem takes about 47 seconds > to query...? One thing I would worry about though is inode deletion and > inter-fileset file moves.? ?The SQL based engine wouldn't be able to > identify those changes and so you'd not be able to replicate deletes and > such. > This is the problem with rsync "backups", you need to run it with --delete otherwise any restore will "upset" your users as they find large numbers of file they had deleted unhelpfully "restored" > Alternatively.... > I have a script that runs in about 4 minutes and it pulls all the data > out of the backup indexes, and compares the pre-built hourly file index > on our system and identifies files that don't exist in the backup, so I > have a daily backup validation...? I filter the file list using > ksh's?printf date manipulation to filter out files that are less than 2 > days old, to reduce the noise.? A modification to this could simply > compare a daily file index with the previous day's index, and send rsync > a list of files (existing or deleted) based on just a delta of the two > indexes (sort|diff), then you could properly account for all the > changes.? If you don't care about file modifications just produce both > lists based on creation time instead of modification time.? The mmfind > command or GPFS policy engine should be able to produce a full file > list/index very rapidly. > My view would be somewhere along the lines of this is a lot of work and if you have the space to rsync your GPFS file system to, presumably with a server attached to said storage then for under 500 PVU of Spectrum Protect licensing you can have a fully supported client/server Spectrum Protect/TSM backup solution and just use mmbackup. You need to play the game and use older hardware ;-) I use an ancient pimped out Dell PowerEdge R300 as my TSM client node. Why this old, well it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be quad core and 70 PVU per core which would cost an additional ~$1000 in licensing. If it breaks down they are under $100 on eBay. It's never skipped a beat and I have just finished a complete planned restore of our DSS-G using it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Wed Mar 10 19:09:13 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Mar 2021 19:09:13 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: I was looking for the original source for this, but it was on dev works ... which is now dead. 
But you can use something like: tsbuhelper clustermigdiff \ $migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist "mmmigrate.list.latest.filelist" would be the output of a policyscan of your files today "mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policyscan This then generates the changed and deleted list of files for you. tsbuhelper is what is used internally in mmbackup, though is not very documented... We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses: RULE EXTERNAL LIST 'latest.filelist' EXEC '' \ RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \ SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \ VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \ ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \ WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \ ELSE 'resdnt' END )) \ WHERE \ ( \ NOT \ ( (PATH_NAME LIKE '/%/.mmbackup%') OR \ (PATH_NAME LIKE '/%/.mmmigrate%') OR \ (PATH_NAME LIKE '/%/.afm%') OR \ (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \ (PATH_NAME LIKE '/%/.mmLockDir/%') OR \ (MODE LIKE 's%') \ ) \ ) \ AND \ (MISC_ATTRIBUTES LIKE '%u%') \ AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \ AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%')) On our file-system, both the scan and diff took a long time (hours), but hundreds of millions of files. This comes with no warranty ... We don't use this for backup, Spectrum Protect and mmbackup are our friends ... Simon ?On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" wrote: Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. 
> > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From enrico.tagliavini at fmi.ch Thu Mar 11 09:22:46 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 09:22:46 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync References: <8d58f5c6c8ee4f44a5e09c4f9e3a6dac@ex2013mbx2.fmi.ch> Message-ID: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org? On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > > > \\UTGERS,?? |---------------------------*O*--------------------------- > > > _// the State |???????? Ryan Novosielski - novosirj at rutgers.edu > > > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > ?\\??? of NJ | Office of Advanced Research Computing - MSB C630, Newark > ???? `' > > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > > > ?I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > > could rsync ONLY those files to a predetermined external location. 
I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > > i use the rsync tool it will scan the file system which is 400+ million files.? Obviously this will be problematic to complete a > > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > > brute force attempt. and it would be nice to know where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > > metadata - inodes, modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu Mar 11 13:17:30 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:17:30 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Message-ID: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >> Sent: Wednesday, March 10, 2021 3:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >> >> Yup, you want to use the policy engine: >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >> >> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >> don?t have it quite right and are passing far too much stuff to rsync). >> >> -- >> #BlackLivesMatter >> ____ >>>> \\UTGERS, |---------------------------*O*--------------------------- >>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >> `' >> >>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>> >>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>> >>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>> >>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>> metadata - inodes, modify times, creation times, etc. >>> >>> >>> >>> -- >>> >>> Best Regards, >>> >>> William Burke (he/him) >>> Lead HPC Engineer >>> Advance Research Computing >>> 860.255.8832 m | LinkedIn >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From enrico.tagliavini at fmi.ch Thu Mar 11 13:24:47 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 13:24:47 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> Message-ID: Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. 
Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Mar 11 13:47:44 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:47:44 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > >> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: >> I?m going to ask what may be a dumb question: >> >> Given that you have GPFS on both ends, what made you decide to NOT use AFM? 
>> >> -- >> Stephen >> >> >>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: >>> >>> ?Hello William, >>> >>> I've got your email forwarded my another user and I decided to subscribe to give you my two cents. >>> >>> I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is >>> easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example >>> if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. >>> >>> DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me >>> enough not to go that route. >>> >>> What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just >>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which >>> the ctime changes in the last couple of days (to update metadata info). >>> >>> Good luck. >>> Kind regards. >>> >>> -- >>> >>> Enrico Tagliavini >>> Systems / Software Engineer >>> >>> enrico.tagliavini at fmi.ch >>> >>> Friedrich Miescher Institute for Biomedical Research >>> Infomatics >>> >>> Maulbeerstrasse 66 >>> 4058 Basel >>> Switzerland >>> >>> >>> >>> >>> -------- Forwarded Message -------- >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >>>> Sent: Wednesday, March 10, 2021 3:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >>>> >>>> Yup, you want to use the policy engine: >>>> >>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >>>> >>>> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >>>> don?t have it quite right and are passing far too much stuff to rsync). >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>>>> \\UTGERS, |---------------------------*O*--------------------------- >>>>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >>>> `' >>>> >>>>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>>>> >>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>>>> >>>>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>>>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>>>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>>>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>>>> >>>>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>>>> metadata - inodes, modify times, creation times, etc. 
>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> William Burke (he/him) >>>>> Lead HPC Engineer >>>>> Advance Research Computing >>>>> 860.255.8832 m | LinkedIn >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Mar 11 14:20:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Mar 2021 14:20:05 +0000 Subject: [gpfsug-discuss] Synchronization/Restore of file systems Message-ID: As promised last year I having just completed a storage upgrade, I have sanitized my scripts and put them up on Github for other people to have a look at the methodology I use in these sorts of scenarios. This time the upgrade involved pulling out all the existing disks and fitting large ones then restoring from backup, rather than synchronizing to a new system, but the principles are the same. Bear in mind the code is written in Perl because it's history is ancient now and with few opportunities to test it in anger, rewriting it in the latest fashionable scripting language is unappealing. https://github.com/digitalcabbage/syncrestore JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Thu Mar 11 14:24:43 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 14:24:43 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2 . When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM. The horror stories about data loss are becoming more rare with modern setups, especially in the non DR case scenario. However AFM is still a very complicated tool, way to complicated if what you are looking for is a "simple" rsync style backup (but faster). The 3000+ pages of documentation for GPFS do not help our small team and many of those pages are dedicated to just AFM. The performance problem is also still a real issue with modern versions as far as I was told. We can have a quite erratic data turnover in our setup, tied to very big scientific instruments capable of generating many TB of data per hour. Having good performance is important. I used the same tool we use for backups also to migrate the data from the old storage to the new storage (and from GPFS 4 to GPFS 5), and I managed to reach speeds of 17 - 19 GB / s data transfer (when hitting big files that is) using only two servers equipped with Infiniband EDR. 
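The bare-bones version of that parallel transfer (splitrsync, mentioned next, is a fuller implementation) is simply to cut the policy-generated path list into chunks and run one rsync per chunk. A minimal, untested sketch; the list, destination and process count are assumptions:

# Fan a path list out to N parallel rsync processes.
FILELIST=/var/tmp/changed.paths     # one path per line, e.g. from a policy scan
DEST=backuphost:/backup/fs1         # assumption
N=8

tmp=$(mktemp -d)
split -n l/$N "$FILELIST" "$tmp/chunk."    # GNU split: N chunks, never mid-line

for chunk in "$tmp"/chunk.*; do
    # --files-from stops rsync crawling the whole tree; entries are taken
    # relative to the source argument "/".
    rsync -a --files-from="$chunk" / "$DEST" &
done
wait
rm -rf "$tmp"

Note this only pushes the named files; propagating deletions needs a separate step, such as the deleted-paths list from a two-scan comparison.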
I made a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . Combined with another program using the policy engine to generate the file list to avoid the painful crawling. As I said we are a small team, so we have to be efficient. Developing that tool costed me time, but the ROI is there as I can use the same tool with non GPFS powered storage system, and we had many occasions where this was the case, for example when moving data from old system to be decommissioned to the GPFS storage. And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer. We might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us. With AFM we would have to start from scratch. Originally we were not really planning to move as we didn't expect this change in licensing with the associated increased cost. But now, this turns out to be a small time saver if we indeed have to switch. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sadaniel at us.ibm.com Thu Mar 11 16:08:11 2021 From: sadaniel at us.ibm.com (Steven Daniels) Date: Thu, 11 Mar 2021 09:08:11 -0700 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. 
We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com http://www.ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. 
We have about 250 million files and this is surprisingly fast. On top of that I add all the files for which the ctime changed in the last couple of days (to update metadata info).

Good luck.
Kind regards.

--
Enrico Tagliavini
Systems / Software Engineer
enrico.tagliavini at fmi.ch

Friedrich Miescher Institute for Biomedical Research
Informatics
Maulbeerstrasse 66
4058 Basel
Switzerland

-------- Forwarded Message --------

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski
Sent: Wednesday, March 10, 2021 3:22 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync

Yup, you want to use the policy engine:

https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm

Something in here ought to help. We do something like this (but I'm reluctant to provide examples as I'm actually suspicious that we don't have it quite right and are passing far too much stuff to rsync).

--
#BlackLivesMatter
____
\\UTGERS, |---------------------------*O*---------------------------
_// the State | Ryan Novosielski - novosirj at rutgers.edu
\\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
\\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

On Mar 9, 2021, at 9:19 PM, William Burke wrote:

I would like to know what files were modified/created/deleted (only for the current day) on the GPFS file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9.

Is there a way to access GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If I use the rsync tool it will scan the file system, which is 400+ million files. Obviously that will be problematic to complete in a day, if it would ever complete single-threaded at all. There are tools and scripts that run multithreaded rsync, but that is still a brute-force approach, and it would be nice to know the delta of files that have changed.

I began looking at the Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc.
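For the narrower case above (only what was modified or created in the last day), a rough sketch along the same lines. The policy attributes and mmapplypolicy options are as available on 4.2.x and later to the best of my knowledge; the split/xargs part is only a crude stand-in for the parallel rsync wrappers mentioned in this thread; and, as warned earlier, renamed directory trees and deletions will not show up in such a list:

    /* /tmp/daily.pol: files modified or created within the last day */
    RULE 'ext-daily' EXTERNAL LIST 'daily' EXEC ''
    RULE 'new-or-modified' LIST 'daily'
         WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1
            OR (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME)) <= 1

    mmapplypolicy /gpfs/fs1 -P /tmp/daily.pol -I defer -f /tmp/daily

    # trim the list lines to bare paths (check the exact format on your
    # release), split into 8 chunks and run 8 rsyncs in parallel
    sed 's|^.* -- /|/|' /tmp/daily.list.daily > /tmp/paths
    split -d -n l/8 /tmp/paths /tmp/paths.
    ls /tmp/paths.0* | xargs -P 8 -I{} \
        rsync -a --files-from={} / backuphost:/gpfs/backupfs/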
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Thu Mar 11 16:28:57 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 11 Mar 2021 16:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <1298DFDD-9701-4FE4-9B06-1541455E0F52@rutgers.edu> Agreed. Since 5.0.4.1 on the client side (we do rely on it for home directories that are geographically distributed), we have effectively not had any more problems. Our server side are all 5.0.3.2-3. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 11, 2021, at 11:08 AM, Steven Daniels wrote: > > Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. > > I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. > > The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. > > I'll leave it to Venkat and others on the development team to share more details about improvements. > > > Steven A. Daniels > Cross-brand Client Architect > Senior Certified IT Specialist > National Programs > Fax and Voice: 3038101229 > sadaniel at us.ibm.com > http://www.ibm.com > <1A816397.jpg> > > Stephen Ulmer ---03/11/2021 06:47:59 AM---Thank you! 
Would you mind letting me know in what era you made your evaluation? I?m not suggesting y > > From: Stephen Ulmer > To: gpfsug main discussion list > Cc: bill.burke.860 at gmail.com > Date: 03/11/2021 06:47 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Thank you! Would you mind letting me know in what era you made your evaluation? > > I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. > > Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. > > Your original post was very thoughtful, and I appreciate your time. > > -- > Stephen > > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > > On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: > I?m going to ask what may be a dumb question: > > Given that you have GPFS on both ends, what made you decide to NOT use AFM? > > -- > Stephen > > > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > \\UTGERS, |---------------------------*O*--------------------------- > _// the State | Ryan Novosielski - novosirj at rutgers.edu > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > metadata - inodes, modify times, creation times, etc. 
> > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From honwai.leong at sydney.edu.au Thu Mar 11 22:28:57 2021 From: honwai.leong at sydney.edu.au (Honwai Leong) Date: Thu, 11 Mar 2021 22:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: This paper might provide some ideas, not the best solution but works fine https://github.com/HPCSYSPROS/Workshop20/blob/master/Parallelized_data_replication_of_multi-petabyte_storage_systems/ws_hpcsysp103s1-file1.pdf It is a two-part workflow to replicate files from production to DR site. It leverages on snapshot ID to determine which files have been updated/modified after a snapshot was taken. It doesn't take care of deletion of files moved from one directory to another, so it uses dsync to take care of that part. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Friday, March 12, 2021 3:08 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 20 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Fwd: FW: Backing up GPFS with Rsync (Steven Daniels)
------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org

End of gpfsug-discuss Digest, Vol 110, Issue 20
***********************************************

From juergen.hannappel at desy.de Mon Mar 15 16:20:51 2021
From: juergen.hannappel at desy.de (Hannappel, Juergen)
Date: Mon, 15 Mar 2021 17:20:51 +0100 (CET)
Subject: [gpfsug-discuss] Detecting open files
Message-ID: <1985303510.24419797.1615825251660.JavaMail.zimbra@desy.de>

Hi,
when unlinking filesets, that sometimes fails because some open files still exist on the fileset. Is there a way to find which files are open, and from which node? Without running an 'mmdsh -N all lsof' on several (big) remote clusters, that is.

--
Dr. Jürgen Hannappel DESY/IT Tel. : +49 40 8998-4616
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1711 bytes
Desc: S/MIME Cryptographic Signature
URL:

From Robert.Oesterlin at nuance.com Wed Mar 17 11:59:57 2021
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 17 Mar 2021 11:59:57 +0000
Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint
Message-ID:

Anyone run into this error from the GUI task 'FILESYSTEM_MOUNT', or ideas on how to fix it?

Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 07:55:14.051000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists.
Call getNextException to see other errors in the batch.,Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg5_tools','ems1-hs','RO','2021-03-17 07:55:15.686000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg5_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 14:18:56 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 14:18:56 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898090.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898091.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898092.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 14:30:36 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 14:30:36 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Can you give me details on how to do this? I tried this: [root at ess1ems ~]# su postgres -c 'psql -d postgres -c "delete from fscc.filesystem_mounts"' could not change directory to "/root" psql: FATAL: Peer authentication failed for user "postgres" Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 9:19 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ This is strange, the Java code should only try to insert rows that are not already there. If it was just the insert for the duplicate row we could ignore it. But this is a batch insert failing and therefore the FILESYSTEM_MOUNTS table does not get updated anymore. A quick fix is to launch the psql client and do a "delete from fscc.filesystem_mounts" to clear the table and run the FILESYSTEM_MOUNT task afterwards to repopulate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 15:09:51 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 15:09:51 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898093.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898094.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898095.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 15:33:54 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 15:33:54 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> The command completed, and I re-ran the FILESYSTEM_MOUNT, but it failed the same way. [root at ess1ems ~]# psql postgres postgres -c "delete from fscc.filesystem_mounts" DELETE 20 /usr/lpp/mmfs/gui/cli/runtask FILESYSTEM_MOUNT -debug 10:32 AM Operation Failed 10:32 AM Error: debug: locale=en_US debug: Running 'mmlsmount 'fs1' -Y ' on node localhost debug: Running 'mmlsmount 'fs2' -Y ' on node localhost debug: Running 'mmlsmount 'fs3' -Y ' on node localhost debug: Running 'mmlsmount 'fs4' -Y ' on node localhost debug: Running 'mmlsmount 'nrg1_tools' -Y ' on node localhost debug: Running 'mmlsmount 'nrg5_tools' -Y ' on node localhost err: java.sql.BatchUpdateException: Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 11:32:38.830000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 10:10 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ I think psql postgres postgres -c "delete from fscc.filesystem_mounts"' ran as root should do the trick. Mit freundlichen Gr??en / Kind regards [cid:image001.png at 01D71B19.07732D00] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1135 bytes Desc: image001.png URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 17:05:11 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 17:05:11 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> References: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898096.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898097.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D71B19.07732D00.png Type: image/png Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898098.png Type: image/png Size: 1172 bytes Desc: not available URL: From robert.horton at icr.ac.uk Thu Mar 18 15:47:07 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 18 Mar 2021 15:47:07 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Message-ID: Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 06:32:00 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 12:02:00 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? 
_ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? __audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Mar 19 09:42:22 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 19 Mar 2021 09:42:22 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Robert, What is the scale version ? This issue may be related to these alerts. 
https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 09:50:04 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 15:20:04 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> References: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Message-ID: Hi Robert, So you might have started seeing problem after upgrading the gateway nodes to 5.0.5.2. Upgrading gateway nodes at cache cluster to 5.0.5.6 would resolve this problem. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/19/2021 03:13 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. 
Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca

--
Robert Horton | Research Data Storage Lead
The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB
T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London
Facebook: www.facebook.com/theinstituteofcancerresearch

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Robert Horton | Research Data Storage Lead
The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB
T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London
Facebook: www.facebook.com/theinstituteofcancerresearch

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From u.sibiller at science-computing.de Mon Mar 22 09:32:10 2021
From: u.sibiller at science-computing.de (Ulrich Sibiller)
Date: Mon, 22 Mar 2021 10:32:10 +0100
Subject: [gpfsug-discuss] Move data to fileset seamlessly
Message-ID: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de>

Hello,

we usually create filesets for project dirs and homes.

Unfortunately we have discovered that this convention has been ignored for some dirs, and their data now resides in the root fileset. We would like to move that data to independent filesets.

Is there a way to do this without having to schedule a downtime for the dirs in question?

I mean, is there a way to transparently move data to an independent fileset at the same path?

Kind regards,

Ulrich Sibiller
--
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management:
Dr.
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Mar 22 09:54:28 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 22 Mar 2021 10:54:28 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > Hello, > > we usually create filesets for project dirs and homes. > > Unfortunately we have discovered that this convention has been ignored for > some dirs and their data > no resides in the root fileset. We would like to move the data to > independent filesets. > > Is there a way to do this without having to schedule a downtime for the > dirs in question? > > I mean, is there a way to transparently move data to an independent > fileset at the same path? > > > Kind regards, > > Ulrich Sibiller > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Mar 22 12:24:59 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Mar 2021 12:24:59 +0000 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: You could maybe create the new file-set, link in a different place, copy the data ? Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially reducing the time to do the copy. Simon From: on behalf of "janfrode at tanso.net" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 22 March 2021 at 09:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Move data to fileset seamlessly No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller >: Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? 
I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 13:20:46 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:20:46 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: On 22.03.21 13:24, Simon Thompson wrote: > You could maybe create the new file-set, link in a different place, copy the data ? > > Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially > reducing the time to do the copy. Yes, but this does not help if a file is open all the time, e.g. during a long-running job. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Mon Mar 22 13:41:39 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:41:39 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: <6f626186-cb7a-46d5-781c-8f3a21b7e270@science-computing.de> On 22.03.21 10:54, Jan-Frode Myklebust wrote: > No ? all copying between filesets require full data copy. No simple rename. > > This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. Yes, your are right. So please vote here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=149429 Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From robert.horton at icr.ac.uk Tue Mar 23 19:02:05 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Tue, 23 Mar 2021 19:02:05 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. 
Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From vpuvvada at in.ibm.com Wed Mar 24 02:36:31 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 24 Mar 2021 08:06:31 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: ># mmafmlocal rm /.afm/.afmctl >/bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted This step is only required if home cluster is on 5.0.5.2/5.0.5.3. You can ignore this issue, and restart AFM filesets at cache. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 12:33 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
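For completeness, a minimal sketch of the stop/upgrade/start cycle being discussed here, with the device name fs1 and the fileset name projcache as placeholders only:

  # before the upgrade: stop the AFM fileset at the cache
  mmafmctl fs1 stop -j projcache

  # ... upgrade home cluster and cache gateway nodes to 5.0.5.6 ...

  # after the upgrade: start the fileset again and check it goes back to Active
  mmafmctl fs1 start -j projcache
  mmafmctl fs1 getstate -j projcache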
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From prasad.surampudi at theatsgroup.com  Wed Mar 24 14:32:30 2021
From: prasad.surampudi at theatsgroup.com (Prasad Surampudi)
Date: Wed, 24 Mar 2021 14:32:30 +0000
Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
Message-ID:

Recently while checking fileset quotas in an ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Has anyone else seen this issue? Please see the output below. The root fileset shows up for the 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas?
/usr/lpp/mmfs/bin/mmrepquota -j prod-private
                             Block Limits                                  |     File Limits
Name      fileset  type            KB    quota    limit    in_doubt  grace |     files  quota  limit  in_doubt  grace
xFIN      root     FILESET   12028144        0        0           0   none |   4524237      0      0         0   none

/usr/lpp/mmfs/bin/mmrepquota -j prod
                             Block Limits                                  |     File Limits
Name      fileset  type            KB    quota    limit    in_doubt  grace |     files  quota  limit  in_doubt  grace
root      root     FILESET    7106656        0        0  1273643728   none |         7      0      0       400   none
xxx_tick  root     FILESET          0        0        0           0   none |         1      0      0         0   none

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From scale at us.ibm.com  Thu Mar 25 16:33:48 2021
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Thu, 25 Mar 2021 11:33:48 -0500
Subject: Re: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
In-Reply-To:
References:
Message-ID:

Prasad,

This is unexpected. Please open a PMR so that data can be collected and looked at.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: Prasad Surampudi
To: "gpfsug-discuss at spectrumscale.org"
Date: 03/24/2021 10:32 AM
Subject: [EXTERNAL] [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Recently while checking fileset quotas in a ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Does anyone else also saw this issue? Please see the output below. The root fileset shows up for 'prod' filesystem and does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas?
URL: From olaf.weiser at de.ibm.com Tue Mar 30 07:06:54 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Mar 2021 06:06:54 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Tue Mar 30 19:24:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Tue, 30 Mar 2021 18:24:00 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 110, Issue 34 In-Reply-To: References: Message-ID: Hey Olaf, We'll investigate as suggested. I'm hopeful the journald logs would provide some additional insight. As for OFED versions, we use the same Mellanox version across the cluster and haven't seen any issues with working nodes that mount the filesystem. We also have a PMR open with IBM but we'll send a follow-up if we discover something more for group discussion. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, March 30, 2021 1:07 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 34 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Filesystem mount attempt hangs GPFS client node (Saula, Oluwasijibomi) 2. Re: Filesystem mount attempt hangs GPFS client node (Olaf Weiser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 29 Mar 2021 18:38:00 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="utf-8" Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. 
Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 30 Mar 2021 06:06:54 +0000 From: "Olaf Weiser" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 110, Issue 34 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 07:58:43 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 07:58:43 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > any performance difference. That's encouraging. > > Usually we create 1 vdisk per enclosure per RG, ? thinking this will > allow us to grow with same size vdisks when adding additional enclosures > in the future. > > Don?t think mmvdisk can be told to create multiple vdisks per RG > directly, so you have to manually create multiple vdisk sets each with > the apropriate size. > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings that you needed a minimum of six NSD's for optimal performance. I have sat in presentations where IBM employees have said so. What we where told back then is that GPFS needs a minimum number of NSD's in order to be able to spread the I/O's out. So if an NSD is being pounded for reads and a write comes in it. 
can direct it to a less busy NSD. Now I can imagine that in a ESS/DSS-G that as it's being scattered to the winds under the hood this is no longer relevant. But some notes to the effect for us old timers would be nice if that is the case to put our minds to rest. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Achim.Rehor at de.ibm.com Mon Mar 1 08:16:43 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 09:16:43 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > From S.J.Thompson at bham.ac.uk Mon Mar 1 09:06:07 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 1 Mar 2021 09:06:07 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Or for hedging your bets about how you might want to use it in future. We are never quite sure if we want to do something different in the future with some of the storage, sure that might mean we want to steal some space from a file-system, but that is perfectly valid. And we have done this, both in temporary transient states (data migration between systems), or permanently (found we needed something on a separate file-system) So yes whilst there might be no performance impact on doing this, we still do. I vaguely recall some of the old reasoning was around IO queues in the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD server, you have 16 IO queues passing to multipath, which can help keep the data pipes full. I suspect there was some optimal number of NSDs for different storage controllers, but I don't know if anyone ever benchmarked that. Simon ?On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com" wrote: The reason for having multiple NSDs in legacy NSD (non-GNR) handling is the increased parallelism, that gives you 'more spindles' and thus more performance. In GNR the drives are used in parallel anyway through the GNRstriping. Therfore, you are using all drives of a ESS/GSS/DSS model under the hood in the vdisks anyway. The only reason for having more NSDs is for using them for different filesystems. Mit freundlichen Gr??en / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org > Date: 01/03/2021 08:58 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could see > > any performance difference. > > That's encouraging. > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > allow us to grow with same size vdisks when adding additional enclosures > > in the future. > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > directly, so you have to manually create multiple vdisk sets each with > > the apropriate size. > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > that you needed a minimum of six NSD's for optimal performance. I have > sat in presentations where IBM employees have said so. What we where > told back then is that GPFS needs a minimum number of NSD's in order to > be able to spread the I/O's out. So if an NSD is being pounded for reads > and a write comes in it. can direct it to a less busy NSD. > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > the winds under the hood this is no longer relevant. But some notes to > the effect for us old timers would be nice if that is the case to put > our minds to rest. > > > JAB. > > -- > Jonathan A. 
Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Mon Mar 1 09:08:20 2021 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Mon, 1 Mar 2021 09:08:20 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Mar 1 09:34:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 09:34:26 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Mon Mar 1 09:46:06 2021 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Mon, 1 Mar 2021 10:46:06 +0100 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: Correct, there was. The OS is dealing with pdisks, while GPFS is striping over Vdisks/NSDs. For GNR there is a differetnt queuing setup in GPFS, than there was for NSDs. See "mmfsadm dump nsd" and check for NsdQueueTraditional versus NsdQueueGNR And yes, i was too strict, with "> The only reason for having more NSDs is for using them for different > filesystems." there are other management reasons to run with a reasonable number of vdisks, just not performance reasons. Mit freundlichen Gruessen / Kind regards Achim Rehor IBM EMEA ESS/Spectrum Scale Support gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 10:06:07: > From: Simon Thompson > To: gpfsug main discussion list > Date: 01/03/2021 10:06 > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Or for hedging your bets about how you might want to use it in future. > > We are never quite sure if we want to do something different in the > future with some of the storage, sure that might mean we want to > steal some space from a file-system, but that is perfectly valid. > And we have done this, both in temporary transient states (data > migration between systems), or permanently (found we needed > something on a separate file-system) > > So yes whilst there might be no performance impact on doing this, westill do. > > I vaguely recall some of the old reasoning was around IO queues in > the OS, i.e. if you had 6 NSDs vs 16 NSDs attached to the NSD > server, you have 16 IO queues passing to multipath, which can help > keep the data pipes full. I suspect there was some optimal number of > NSDs for different storage controllers, but I don't know if anyone > ever benchmarked that. 
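The queue handling mentioned above can be eyeballed on an NSD server or ESS/DSS IO node with something like the following. mmfsadm is an unsupported debug tool, so treat this as read-only poking, and the exact strings vary by release:

  # dump the NSD queue information and look for the GNR vs. traditional queues
  mmfsadm dump nsd > /tmp/nsd.dump
  grep -i queue /tmp/nsd.dump | head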
> > Simon > > On 01/03/2021, 08:16, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Achim.Rehor at de.ibm.com" bounces at spectrumscale.org on behalf of Achim.Rehor at de.ibm.com> wrote: > > The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > the increased parallelism, that gives you 'more spindles' and thus more > performance. > In GNR the drives are used in parallel anyway through the GNRstriping. > Therfore, you are using all drives of a ESS/GSS/DSS model under the hood > in the vdisks anyway. > > The only reason for having more NSDs is for using them for different > filesystems. > > > Mit freundlichen Gr??en / Kind regards > > Achim Rehor > > IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > > gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > > > From: Jonathan Buzzard > > To: gpfsug-discuss at spectrumscale.org > > Date: 01/03/2021 08:58 > > Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of > NSD's > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On 28/02/2021 09:31, Jan-Frode Myklebust wrote: > > > > > > I?ve tried benchmarking many vs. few vdisks per RG, and never could > see > > > any performance difference. > > > > That's encouraging. > > > > > > > > Usually we create 1 vdisk per enclosure per RG, thinking this will > > > allow us to grow with same size vdisks when adding additional > enclosures > > > in the future. > > > > > > Don?t think mmvdisk can be told to create multiple vdisks per RG > > > directly, so you have to manually create multiple vdisk setseach with > > > > the apropriate size. > > > > > > > Thing is back in the day so GPFS v2.x/v3.x there where strict warnings > > that you needed a minimum of six NSD's for optimal performance. I have > > sat in presentations where IBM employees have said so. What we where > > told back then is that GPFS needs a minimum number of NSD's inorder to > > be able to spread the I/O's out. So if an NSD is being poundedfor reads > > > and a write comes in it. can direct it to a less busy NSD. > > > > Now I can imagine that in a ESS/DSS-G that as it's being scattered to > > the winds under the hood this is no longer relevant. But some notes to > > the effect for us old timers would be nice if that is the case to put > > our minds to rest. > > > > > > JAB. > > > > -- > > Jonathan A. Buzzard Tel: +44141-5483420 > > HPC System Administrator, ARCHIE-WeSt. > > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url? > > > > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > > M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- > > IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- > M&m=gU9xf_Z6rrdOa4- > WKodSyFPbnGGbAGC_LK7hgYPB3yQ&s=L_VtTqSwQbqfIR5VVmn6mYxmidgnH37osHrFPX0E-Ck&e= > From jonathan.buzzard at strath.ac.uk Mon Mar 1 11:45:45 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 11:45:45 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: Message-ID: On 01/03/2021 09:08, Luis Bolinches wrote: > Hi > > There other reasons to have more than 1. It is management of those. When > you have to add or remove NSDs of a FS having more than 1 makes it > possible to empty some space and manage those in and out. Manually but > possible. If you have one big NSD or even 1 per enclosure it might > difficult or even not possible depending the number of enclosures and FS > utilization. > > Starting some ESS version (not DSS, cant comment on that) that I do not > recall but in the last 6 months, we have change the default (for those > that use the default) to 4 NSDs per enclosure for ESS 5000. There is no > impact on performance either way on ESS, we tested it. But management of > those on the long run should be easier. Question how does one create a none default number of vdisks per enclosure then? I tried creating a stanza file and then doing mmcrvdisk but it was not happy, presumably because of the "new style" recovery group management mmcrvdisk: [E] This command is not supported by recovery groups under management of mmvdisk. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Mon Mar 1 11:53:32 2021 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Mon, 1 Mar 2021 11:53:32 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: Message-ID: Jonathan, You need to create vdisk sets which will create multiple vdisks, you can then assign vdisk sets to your filesystem. (Assigning multiple vdisks at a time) Things to watch - free space calculations are more complex as it?s building multiple vdisks under the cover using multiple raid parameters Also it?s worth assuming a 10% reserve or approx - drive per disk shelf for rebuild space Mmvdisk vdisk set ... insert parameters https://www.ibm.com/support/knowledgecenter/mk/SSYSP8_5.3.2/com.ibm.spectrum.scale.raid.v5r02.adm.doc/bl8adm_mmvdisk.htm Sent from my iPhone > On 1 Mar 2021, at 21:45, Jonathan Buzzard wrote: > > ?On 01/03/2021 09:08, Luis Bolinches wrote: >> Hi >> >> There other reasons to have more than 1. It is management of those. When >> you have to add or remove NSDs of a FS having more than 1 makes it >> possible to empty some space and manage those in and out. Manually but >> possible. If you have one big NSD or even 1 per enclosure it might >> difficult or even not possible depending the number of enclosures and FS >> utilization. >> >> Starting some ESS version (not DSS, cant comment on that) that I do not >> recall but in the last 6 months, we have change the default (for those >> that use the default) to 4 NSDs per enclosure for ESS 5000. There is no >> impact on performance either way on ESS, we tested it. But management of >> those on the long run should be easier. > Question how does one create a none default number of vdisks per > enclosure then? 
> > I tried creating a stanza file and then doing mmcrvdisk but it was not > happy, presumably because of the "new style" recovery group management > > mmcrvdisk: [E] This command is not supported by recovery groups under > management of mmvdisk. > > > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=STXkGEO2XATS_s2pRCAAh2wXtuUgwVcx1XjUX7ELNdk&m=9HlRHByoByQcM0mY0elL-l4DgA6MzHkAGzE70Rl2p2E&s=eWRfWGpdZB-PZ_InCCjgmdQOCy6rgWj9Oi3TGGA38yY&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scl at virginia.edu Mon Mar 1 12:31:37 2021 From: scl at virginia.edu (Losen, Stephen C (scl)) Date: Mon, 1 Mar 2021 12:31:37 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl Message-ID: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Hi folks, Experimenting with POSIX ACLs on GPFS 4.2 and noticed that the Linux command setfacl clears "c" permissions that were set with mmputacl. So if I have this: ... group:group1:rwxc mask::rwxc ... and I modify a different entry with: setfacl -m group:group2:r-x dirname then the "c" permissions above get cleared and I end up with ... group:group1:rwx- mask::rwx- ... I discovered that chmod does not clear the "c" mode. Is there any filesystem option to change this behavior to leave "c" modes in place? Steve Losen Research Computing University of Virginia scl at virginia.edu 434-924-0640 From olaf.weiser at de.ibm.com Mon Mar 1 12:45:44 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 12:45:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 1 12:58:44 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 1 Mar 2021 12:58:44 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 13:14:38 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 13:14:38 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: On 01/03/2021 12:45, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Hallo Stephen, > behavior ... or better to say ... predicted behavior for chmod and ACLs > .. is not an easy thing or only? , if? you stay in either POSIX world or > NFSv4 world > to be POSIX compliant, a chmod overwrites ACLs One might argue that the general rubbishness of the mmputacl cammand, and if a mmsetfacl command (or similar) existed it would negate messing with Linux utilities to change ACL's on GPFS file systems Only been bringing it up for over a decade now ;-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 15:18:59 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 15:18:59 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From laurence at qsplace.co.uk Mon Mar 1 08:59:35 2021 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Mon, 01 Mar 2021 08:59:35 +0000 Subject: [gpfsug-discuss] dssgmkfs.mmvdisk number of NSD's In-Reply-To: References: <0b800bf2-6eb2-442b-142f-f0814d745873@strath.ac.uk> Message-ID: <6F478E88-E350-46BF-9993-82C21ADD2262@qsplace.co.uk> Like Jan, I did some benchmarking a few years ago when the default recommended RG's dropped to 1 per DA to meet rebuild requirements. I couldn't see any discernable difference. As Achim has also mentioned, I just use vdisks for creating additional filesystems. Unless there is going to be a lot of shuffling of space or future filesystem builds, then I divide the RG's into say 10 vdisks to give some flexibility and granularity There is also a flag iirc that changes the gpfs magic to consider multiple under lying disks, when I find it again........ Which can provide increased performance on traditional RAID builds. -- Lauz On 1 March 2021 08:16:43 GMT, Achim Rehor wrote: >The reason for having multiple NSDs in legacy NSD (non-GNR) handling is > >the increased parallelism, that gives you 'more spindles' and thus more > >performance. >In GNR the drives are used in parallel anyway through the GNRstriping. >Therfore, you are using all drives of a ESS/GSS/DSS model under the >hood >in the vdisks anyway. > >The only reason for having more NSDs is for using them for different >filesystems. > > >Mit freundlichen Gr??en / Kind regards > >Achim Rehor > >IBM EMEA ESS/Spectrum Scale Support > > > > > > > > > > > > >gpfsug-discuss-bounces at spectrumscale.org wrote on 01/03/2021 08:58:43: > >> From: Jonathan Buzzard >> To: gpfsug-discuss at spectrumscale.org >> Date: 01/03/2021 08:58 >> Subject: [EXTERNAL] Re: [gpfsug-discuss] dssgmkfs.mmvdisk number of >NSD's >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> On 28/02/2021 09:31, Jan-Frode Myklebust wrote: >> > >> > I?ve tried benchmarking many vs. few vdisks per RG, and never could > >see >> > any performance difference. >> >> That's encouraging. >> >> > >> > Usually we create 1 vdisk per enclosure per RG, thinking this >will >> > allow us to grow with same size vdisks when adding additional >enclosures >> > in the future. >> > >> > Don?t think mmvdisk can be told to create multiple vdisks per RG >> > directly, so you have to manually create multiple vdisk sets each >with > >> > the apropriate size. >> > >> >> Thing is back in the day so GPFS v2.x/v3.x there where strict >warnings >> that you needed a minimum of six NSD's for optimal performance. I >have >> sat in presentations where IBM employees have said so. What we where >> told back then is that GPFS needs a minimum number of NSD's in order >to >> be able to spread the I/O's out. So if an NSD is being pounded for >reads > >> and a write comes in it. can direct it to a less busy NSD. >> >> Now I can imagine that in a ESS/DSS-G that as it's being scattered to > >> the winds under the hood this is no longer relevant. But some notes >to >> the effect for us old timers would be nice if that is the case to put > >> our minds to rest. >> >> >> JAB. >> >> -- >> Jonathan A. 
Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://urldefense.proofpoint.com/v2/url? >> >u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=RGTETs2tk0Kz_VOpznDVDkqChhnfLapOTkxLvgmR2- >> M&m=Mr4A8ROO2t7qFYTfTRM_LoPLllETw72h51FK07dye7Q&s=z6yRHIKsH- >> IaOjtto4ZyUjFFe0vTGhqzYUiM23rEShg&e= >> > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 1 16:50:31 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 1 Mar 2021 16:50:31 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> On 01/03/2021 15:18, Olaf Weiser wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > JAB, > yes-this is in argument ;-) ... and personally I like the idea of having > smth like setfacl also for GPFS ..? for years... > *but* it would not take away the generic challenge , what to do, if > there are competing standards / definitions to meet > at least that is most likely just one reason, why there's no tool yet > there are several hits on RFE page for "ACL".. some of them could be > also addressed with a (mm)setfacl tool > but I was not able to find a request for a tool itself > (I quickly? searched? public but? not found it there, maybe there is > already one in private...) > So - dependent on how important this item for others? is? ... its time > to fire an RFE ?!? ... Well when I asked I was told by an IBM representative that it was by design there was no proper way to set ACLs directly from Linux. The expectation was that you would do this over NFSv4 or Samba. So filing an RFE would be pointless under those conditions and I have never bothered as a result. This was pre 2012 so IBM's outlook might have changed in the meantime. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Mon Mar 1 17:57:11 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 1 Mar 2021 17:57:11 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk> References: <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Mar 2 09:36:48 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 2 Mar 2021 09:36:48 +0000 Subject: [gpfsug-discuss] Using setfacl vs. mmputacl In-Reply-To: References: , <122a1eab-e27c-0dbf-f0f8-be407c3e1110@strath.ac.uk>, <488C1E07-5CF5-4C7E-8FC7-2CEE8463CC62@virginia.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16146770920000.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920001.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16146770920002.png Type: image/png Size: 1172 bytes Desc: not available URL: From russell at nordquist.info Tue Mar 2 19:31:24 2021 From: russell at nordquist.info (Russell Nordquist) Date: Tue, 2 Mar 2021 14:31:24 -0500 Subject: [gpfsug-discuss] Self service creation of filesets Message-ID: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell From anacreo at gmail.com Tue Mar 2 20:58:29 2021 From: anacreo at gmail.com (Alec) Date: Tue, 2 Mar 2021 12:58:29 -0800 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: This does feel like another situation where I may use a custom attribute and a periodic script to do the fileset creation. Honestly I would want the change management around fileset creation. But I could see a few custom attributes on a newly created user dir... Like maybe just setting user.quota=10TB... Then have a policy that discovers these does the work of creating the fileset, setting the quotas, migrating data to the fileset, and then mounting the fileset over the original directory. Honestly that sounds so nice I may have to implement this... Lol. Like I could see doing something like discovering directories that have user.archive=true and automatically gzipping large files within. Would be nice if GPFS policy engine could have a IF_ANCESTOR_ATTRIBUTE=. Alec On Tue, Mar 2, 2021, 11:40 AM Russell Nordquist wrote: > Hi all > > We are trying to use filesets quite a bit, but it?s a hassle that only the > admins can create them. To the users it?s just a directory so it slows > things down. Has anyone deployed a self service model for creating > filesets? Maybe using the API? This feels like shared pain that someone has > already worked on?. > > thanks > Russell > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Mar 2 22:38:17 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 2 Mar 2021 22:38:17 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: Message-ID: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. 
restripefs running), so we can always just requeue the requests again. Simon ?On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. thanks Russell _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ckerner at illinois.edu Tue Mar 2 22:59:01 2021 From: ckerner at illinois.edu (Kerner, Chad A) Date: Tue, 2 Mar 2021 22:59:01 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> References: <3E66A36A-67A1-49BC-B374-2A7F1EF123F6@bham.ac.uk> Message-ID: <52196DB3-E8D3-47F7-92F6-3A123B46F615@illinois.edu> We have a similar process. One of our customers has a web app that their managers use to provision spaces. That web app drops a json file into a specific location and a cron job kicks off a python script every so often to process the files and provision the space(fileset creation, link, quota, owner, group, perms, etc). Failures are queued and a jira ticket opened. Successes update the database for the web app. They are not requiring instant processing, so we process hourly on the back end side of things. Chad -- Chad Kerner, Senior Storage Engineer Storage Enabling Technologies National Center for Supercomputing Applications University of Illinois, Urbana-Champaign ?On 3/2/21, 4:38 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Not quite user self-service .... But we have some web tooling for project registration that pushes sanitised messages onto a redis (rq) backed message bus which then does "stuff". For example create and populate groups in AD and LDAP. Create and link a fileset, set quota etc ... Our consumer code is all built to be tolerant to running it a second time safely and has quite a bit of internal locking to prevent multiple instances running at the same time (though we have multiple consumer entities to handle fail-over). The fault tolerant thing is quite important as create a fileset can fail for a number of reasons (e.g. restripefs running), so we can always just requeue the requests again. Simon On 02/03/2021, 19:40, "gpfsug-discuss-bounces at spectrumscale.org on behalf of russell at nordquist.info" wrote: Hi all We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. 
thanks Russell
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From tortay at cc.in2p3.fr  Wed Mar 3 08:06:37 2021
From: tortay at cc.in2p3.fr (Loic Tortay)
Date: Wed, 3 Mar 2021 09:06:37 +0100
Subject: [gpfsug-discuss] Self service creation of filesets
In-Reply-To:
References:
Message-ID: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr>

On 02/03/2021 20:31, Russell Nordquist wrote:
> Hi all
>
> We are trying to use filesets quite a bit, but it's a hassle that only the admins can create them. To the users it's just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on...
>
Hello,
We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage user quotas for the groups/projects they're heading.

Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas.

This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking).

Delegation authorization (identifying "power-users") is external to the tool.

Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes).

There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.)

The dated-looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html

Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring).


Loïc.
--
| Loïc Tortay - IN2P3 Computing Centre |

From russell at nordquist.info  Wed Mar 3 17:14:37 2021
From: russell at nordquist.info (Russell Nordquist)
Date: Wed, 3 Mar 2021 12:14:37 -0500
Subject: [gpfsug-discuss] Self service creation of filesets
In-Reply-To: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr>
References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr>
Message-ID:

Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can't restrict the GUI role account to just the commands they need. They need "storage administrator" access, which means they could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that's old fashioned :)

Too bad we can't make an API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 - am I missing something?

What I would want is to be able to grant the following calls + maybe a few more.
The related REST API calls. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesets.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_apiv2postfilesystemfilesetlink.htm Russell > On Mar 3, 2021, at 3:06 AM, Loic Tortay wrote: > > On 02/03/2021 20:31, Russell Nordquist wrote: >> Hi all >> We are trying to use filesets quite a bit, but it?s a hassle that only the admins can create them. To the users it?s just a directory so it slows things down. Has anyone deployed a self service model for creating filesets? Maybe using the API? This feels like shared pain that someone has already worked on?. > Hello, > We have a quota management delegation (CLI) tool that allows "power-users" (PI and such) to create and remove filesets and manage users quotas for the groups/projects they're heading. > > Like someone else said, from their point of view they're just directories, so they create a "directory with quotas". > In our experience, "directories with quotas" are the most convenient way for end-users to understand and use quotas. > > This is a tool written in C, about 13 years ago, using the GPFS API (and a few calls to GPFS commands where there is no API or it's lacking). > > Delegation authorization (identifying "power-users") is external to the tool. > > Permissions & ACLs are also set on the junction when a fileset is created so that it's both immediately usable ("instant processing") and accessible to "power-users" (for space management purposes). > > There are extra features for staff to allow higher-level operations (e.g. create an independent fileset for a group/project, change the group/project quotas, etc.) > > The dated looking user documentation is https://ccspsmon.in2p3.fr/spsquota.html > > Both the tool and the documentation have a few site-specific things, so it's not open-source (and it has become a "legacy" tool in need of a rewrite/refactoring). > > > Lo?c. > -- > | Lo?c Tortay - IN2P3 Computing Centre | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Thu Mar 4 09:51:45 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 4 Mar 2021 09:51:45 +0000 Subject: [gpfsug-discuss] Self service creation of filesets In-Reply-To: References: <2c6a8d32-26ee-67e0-9d29-48cde6ed312c@cc.in2p3.fr> Message-ID: <566f81f3bfd243f1b0258562b627e4e1b6869863.camel@icr.ac.uk> On Wed, 2021-03-03 at 12:14 -0500, Russell Nordquist wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Sounds like I am not the only one that needs this. The REST API has everything needed to do this, but the problem is we can?t restrict the GUI role account to just the commands they need. They need ?storage administrator? access which means the could also make/delete filesystems. I guess you could use sudo and wrap the CLI, but I am told that?s old fashioned :) Too bad we can?t make a API role with specific POST commands tied to it. I am surprised there is no RFE for that yet. The closest I see is http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=148244 am I missing something. That reminds me... We use a Python wrapper around the REST API to monitor usage against fileset quotas etc. In principle this will also set quotas (and create filesets) but it means giving it storage administrator access. 
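For reference, a bare-bones sketch of calling the fileset-creation endpoint linked earlier in this thread might look like the following. The GUI host, service account and request-body field names are assumptions to check against the Knowledge Center pages for your release, and (as noted) the account currently needs a fairly broad GUI role:

#!/usr/bin/env python3
# Bare-bones sketch of driving the POST fileset endpoint through the Scale
# REST API. Host, credentials and body field names are assumptions.
import requests
import urllib3

urllib3.disable_warnings()                        # GUI certificate is often self-signed

API = "https://gui-node.example.com:443/scalemgmt/v2"
AUTH = ("svc-provision", "secret")                # a dedicated GUI service account
FS = "fs1"

def create_fileset(name, junction):
    body = {"filesetName": name, "path": junction}
    r = requests.post(f"{API}/filesystems/{FS}/filesets",
                      json=body, auth=AUTH, verify=False)
    r.raise_for_status()
    # The API queues an asynchronous job; production code should poll the
    # returned job until it completes before relying on the fileset.
    return r.json()

print(create_fileset("projx", "/gpfs/fs1/projects/projx"))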
It would be nice if the GUI had sufficiently fine grained permissions that you could set quotas without being able to delete the filesystem. Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 10:04:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 10:04:22 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's Message-ID: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> I am seeing that whenever I try and restore a file with an ACL I get the a ANS1589W error in /var/log/dsmerror.log ANS1589W Unable to write extended attributes for ****** due to errno: 13, reason: Permission denied But bizarrely the ACL is actually restored. At least as far as I can tell. This is the 8.1.11-0 TSM client with GPFS version 5.0.5-1 against a 8.1.10-0 TSM server. Running on RHEL 7.7 to match the DSS-G 2.7b install. The backup node makes the third quorum node for the cluster being as that it runs genuine RHEL (unlike all the compute nodes which are running CentOS). Googling I can't find any references to this being fixed in a later version of the GPFS software, though being on RHEL7 and it's derivatives I am stuck on 5.0.5 Surely root has permissions to write the extended attributes for anyone? It would seem perverse if you have to be the owner of a file to restore the ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Fri Mar 5 12:15:38 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 12:15:38 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 13:07:56 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 13:07:56 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem.? There recently was an issue with Protect and how it used the > GPFS API for ACLs.? If I recall Protect was not properly handling a > return code.? I do not know if it is relevant to your problem but? it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Renar.Grunenberg at huk-coburg.de Fri Mar 5 18:06:43 2021 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 5 Mar 2021 18:06:43 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> Hallo All, thge mentioned problem with protect was this: https://www.ibm.com/support/pages/node/6415985?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder, Sarah R?ssler, Thomas Sehn, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= -----Urspr?ngliche Nachricht----- Von: gpfsug-discuss-bounces at spectrumscale.org Im Auftrag von Jonathan Buzzard Gesendet: Freitag, 5. M?rz 2021 14:08 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] TSM errors restoring files with ACL's On 05/03/2021 12:15, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Have you checked to see if Spectrum Protect (TSM) has addressed this > problem. There recently was an issue with Protect and how it used the > GPFS API for ACLs. If I recall Protect was not properly handling a > return code. I do not know if it is relevant to your problem but it > seemed worth mentioning. As far as I am aware 8.1.11.0 is the most recent version of the Spectrum Protect/TSM client. 
There is nothing newer showing on the IBM FTP site ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/ Checking on fix central also seems to show that 8.1.11.0 is the latest version, and the only fix over 8.1.10.0 is a security update to do with the client web user interface. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Fri Mar 5 19:12:47 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Fri, 5 Mar 2021 19:12:47 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de>, <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Mar 5 20:31:54 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 5 Mar 2021 20:31:54 +0000 Subject: [gpfsug-discuss] TSM errors restoring files with ACL's In-Reply-To: References: <1abffe1a35c54075bd662d6da217a9f0@huk-coburg.de> <1b0792f3-bd28-90a1-5446-16647f57e941@strath.ac.uk> Message-ID: <696e96cc-da52-a24f-d53e-6510407e51e7@strath.ac.uk> On 05/03/2021 19:12, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > I was referring to this flash, > https://www.ibm.com/support/pages/node/6381354?myns=swgtiv&mynp=OCSSEQVQ&mync=E&cm_sp=swgtiv-_-OCSSEQVQ-_-E > > > Spectrum Protect 8.1.11 client has the fix so this should not be an > issue for Jonathan.? Probably best to open a help case against Spectrum > Protect and begin the investigation there. > Also the fix is to stop an unchanged file with an ACL from being backed up again, but only one more time. I suspect we where hit with that issue, but given we only have ~90GB of files with ACL's on them I would not have noticed. That is significantly less than the normal daily churn. This however is an issue with the *restore*. Everything looks to get restored correctly even the ACL's. At the end of the restore all looks good given the headline report from dsmc. However there are ANS1589W warnings in dsmerror.log and dsmc exits with an error code of 8 rather than zero. Will open a case against Spectrum Protect on Monday. I am pretty confident the warnings are false. The current plan is to do carefully curated hand restores of the three afflicted users when the rest of the restore if finished to double check the ACL's are the only issue. Quite how the Spectrum Protect team have missed this bug is beyond me. Do they not have some unit tests to check this stuff before pushing out updates. I know in the past it worked, though that was many years ago now. However I restored many TB of data from backup with ACL's on them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Robert.Oesterlin at nuance.com Mon Mar 8 14:49:59 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 14:49:59 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? 
Message-ID: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Mar 8 15:29:42 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 15:29:42 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> References: <3053B07C-5509-4610-BF45-E04E3F54C7D7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Mar 8 15:34:21 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 8 Mar 2021 15:34:21 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? Message-ID: Well - the case here is that the file system has, let?s say, 100M files. Some percentage of these are sym-links to a location that?s not in this file system. I want a report of all these off file system links. However, not all of the sym-links off file system are of interest, just some of them. I can?t say for sure where in the file system they are (and I don?t care). Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Frederick Stock Reply-To: gpfsug main discussion list Date: Monday, March 8, 2021 at 9:29 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Policy scan of symbolic links with contents? CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ Could you use the PATHNAME LIKE statement to limit the location to the files of interest? Fred _______________________________________________________ Fred Stock | Spectrum Scale Development Advocacy | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "Oesterlin, Robert" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Policy scan of symbolic links with contents? Date: Mon, Mar 8, 2021 10:12 AM Looking to craft a policy scan that pulls out symbolic links to a particular destination. For instance: file1.py -> /fs1/patha/pathb/file1.py (I want to include these) file2.py -> /fs2/patha/pathb/file2.py (exclude these) The easy way would be to pull out all sym-links and just grep for the ones I want but was hoping for a more elegant solution? Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p_1XEUyoJ7-VJxF_w8h9gJh8_Wj0Pey73LCLLoxodpw&m=i6m1zVXf4peZo0yo02IiRaQ_pUX95MN3wU53M0NiWcI&s=z-ibh2kAPHbehAsrGavNIg2AJdXmHkpUwy5YhZfUbpc&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stockf at us.ibm.com Mon Mar 8 16:07:48 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 8 Mar 2021 16:07:48 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Mon Mar 8 20:45:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 20:45:05 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 16:07, Frederick Stock wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > Presumably the only feature that would help here is if policy could > determine that the end location pointed to by a symbolic link is within > the current file system.? I am not aware of any such feature or > attribute which policy could check so I think all you can do is run > policy to find the symbolic links and then check each link to see if it > points into the same file system.? You might find the mmfind command > useful for this purpose.? I expect it would eliminate the need to create > a policy to find the symbolic links. > Unless you are using bind mounts if the symbolic link points outside the mount point of the file system it is not within the current file system. So noting that you can write very SQL like statements something like the following should in theory do it RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' Note the above is not checked in any way shape or form for working. Even if you do have bind mounts of other GPFS file systems you just need a more complicated WHERE statement. When doing policy engine stuff I find having that section of the GPFS manual printed out and bound, along with an SQL book for reference is very helpful. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Mon Mar 8 21:00:04 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 8 Mar 2021 21:00:04 +0000 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: On 08/03/2021 20:45, Jonathan Buzzard wrote: [SNIP] > So noting that you can write very SQL like statements something like the > following should in theory do it > > RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND > SUBSTR(PATH_NAME,0,4)='/fs1/' > > Note the above is not checked in any way shape or form for working. Even > if you do have bind mounts of other GPFS file systems you just need a > more complicated WHERE statement. Duh, of course as soon as I sent it, I realized there is a missing SHOW RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' You could replace the SUBSTR with a REGEX if you prefer JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From ulmer at ulmer.org Mon Mar 8 22:33:38 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 8 Mar 2021 17:33:38 -0500 Subject: [gpfsug-discuss] Policy scan of symbolic links with contents? In-Reply-To: References: Message-ID: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Does that check the target of the symlink, or the path to the link itself? 
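One note on the rule above: PATH_NAME is the path of the link itself, not of its target. If the target is what matters, post-processing the policy-generated list is one option; a sketch, assuming the usual " -- "-delimited external-list layout written by mmapplypolicy:

#!/usr/bin/env python3
# Sketch: keep only those symlinks from a policy-generated list whose *target*
# is under a given prefix. Adjust the parsing if your SHOW() clause differs.
import os
import sys

WANTED_PREFIX = "/fs1/"                       # keep links pointing into this file system

with open(sys.argv[1]) as listfile:           # list file produced with -f / -I defer
    for line in listfile:
        path = line.rstrip("\n").split(" -- ", 1)[-1]
        try:
            target = os.readlink(path)
        except OSError:
            continue                          # gone, or not a symlink any more
        if not target.startswith("/"):        # resolve relative targets
            target = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if target.startswith(WANTED_PREFIX):
            print(f"{path} -> {target}")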
I think the OP was checking the target (or I misunderstood). -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. > On Mar 8, 2021, at 3:34 PM, Jonathan Buzzard wrote: > > ?On 08/03/2021 20:45, Jonathan Buzzard wrote: > > [SNIP] > >> So noting that you can write very SQL like statements something like the >> following should in theory do it >> RULE finddangling LIST dangle WHERE MISC_ATTRIBUTES='L' AND >> SUBSTR(PATH_NAME,0,4)='/fs1/' >> Note the above is not checked in any way shape or form for working. Even >> if you do have bind mounts of other GPFS file systems you just need a >> more complicated WHERE statement. > > Duh, of course as soon as I sent it, I realized there is a missing SHOW > > RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' > > You could replace the SUBSTR with a REGEX if you prefer > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Tue Mar 9 12:25:56 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 9 Mar 2021 12:25:56 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Policy scan of symbolic links with contents? In-Reply-To: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> References: <23501705-F0E8-4B5D-B5D3-9D42D71FAA82@ulmer.org> Message-ID: <3B0AD02E-335F-4540-B109-EC5301C3188A@nuance.com> RULE finddangling LIST dangle SHOW(PATH_NAME) WHERE MISC_ATTRIBUTES='L' AND SUBSTR(PATH_NAME,0,4)='/fs1/' In this case PATH_NAME is the path within the GPFS file system, not the target of the link, correct? That's not what I want. I want the path of the *link target*. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 3/8/21, 4:41 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Stephen Ulmer" wrote: CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ---------------------------------------------------------------------- Does that check the target of the symlink, or the path to the link itself? I think the OP was checking the target (or I misunderstood). From bill.burke.860 at gmail.com Wed Mar 10 02:19:02 2021 From: bill.burke.860 at gmail.com (William Burke) Date: Tue, 9 Mar 2021 21:19:02 -0500 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
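A rough sketch of the policy-engine route suggested in the replies that follow: let mmapplypolicy build the candidate list (no user-space crawl of 400M files), then hand the paths to rsync. The device name, scratch paths and list-file naming here are assumptions to confirm on a small fileset first:

#!/usr/bin/env python3
# Rough sketch: policy scan for files changed in the last day, written out as
# a path list that rsync can consume via --files-from.
import os
import subprocess
import tempfile

DEVICE = "fs1"
POLICY = """
RULE EXTERNAL LIST 'changed' EXEC ''
RULE 'lastday' LIST 'changed'
  WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
"""

with tempfile.TemporaryDirectory() as tmp:
    polfile = os.path.join(tmp, "changed.pol")
    with open(polfile, "w") as f:
        f.write(POLICY)
    # -I defer only writes the list files, it does not execute anything
    subprocess.run(["mmapplypolicy", DEVICE, "-P", polfile,
                    "-f", os.path.join(tmp, "cand"), "-I", "defer"], check=True)
    # Each record ends in " -- <path>"; strip the bookkeeping columns for rsync
    with open(os.path.join(tmp, "cand.list.changed")) as src, \
         open("/tmp/changed.paths", "w") as dst:
        for line in src:
            dst.write(line.rstrip("\n").split(" -- ", 1)[-1] + "\n")

# The path list can then drive rsync, e.g.:
#   rsync -a --files-from=/tmp/changed.paths / backuphost:/backup/fs1/
# Note this alone will not propagate deletes or renames -- see the follow-ups.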
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Wed Mar 10 02:21:54 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Mar 2021 02:21:54 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Wed Mar 10 02:59:18 2021 From: anacreo at gmail.com (Alec) Date: Tue, 9 Mar 2021 18:59:18 -0800 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: You would definitely be able to search by inode creation date and find the files you want... our 1.25m file filesystem takes about 47 seconds to query... One thing I would worry about though is inode deletion and inter-fileset file moves. The SQL based engine wouldn't be able to identify those changes and so you'd not be able to replicate deletes and such. Alternatively.... I have a script that runs in about 4 minutes and it pulls all the data out of the backup indexes, and compares the pre-built hourly file index on our system and identifies files that don't exist in the backup, so I have a daily backup validation... I filter the file list using ksh's printf date manipulation to filter out files that are less than 2 days old, to reduce the noise. 
A modification to this could simply compare a daily file index with the previous day's index, and send rsync a list of files (existing or deleted) based on just a delta of the two indexes (sort|diff), then you could properly account for all the changes. If you don't care about file modifications just produce both lists based on creation time instead of modification time. The mmfind command or GPFS policy engine should be able to produce a full file list/index very rapidly. In another thread there was a conversation with ACL's... I don't think our backup system backs up ACL's so I just have GPFS produce a list of all ACL applied objects on the daily, and have a script that just makes a null delimited backup file of every single ACL on our file system... and have a script to apply the ACL's as a "restore". It's a pretty simple thing to write-up and keeping 90 day history on this lets me compare the ACL evolution on a file very easily. Alec MVH Most Victorious Hunting (Why should Scandinavians own this cool sign off) On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski wrote: > Yup, you want to use the policy engine: > > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m > reluctant to provide examples as I?m actually suspicious that we don?t have > it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > > On Mar 9, 2021, at 9:19 PM, William Burke > wrote: > > > > I would like to know what files were modified/created/deleted (only for > the current day) on the GPFS's file system so that I could rsync ONLY those > files to a predetermined external location. I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not > have to traverse the filesystem looking for these files? If i use the rsync > tool it will scan the file system which is 400+ million files. Obviously > this will be problematic to complete a scan in a day, if it would ever > complete single-threaded. There are tools or scripts that run multithreaded > rsync but it's still a brute force attempt. and it would be nice to know > where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not > sure if this is the best approach to looking at the GPFS metadata - inodes, > modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Wed Mar 10 15:15:58 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 10 Mar 2021 15:15:58 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: <641ea714-579b-1d74-4b86-d0e0b2e8e9c3@strath.ac.uk> On 10/03/2021 02:59, Alec wrote: > CAUTION: This email originated outside the University. Check before > clicking links or attachments. > You would definitely be able to search by inode creation date and find > the files you want... our 1.25m file filesystem takes about 47 seconds > to query...? One thing I would worry about though is inode deletion and > inter-fileset file moves.? ?The SQL based engine wouldn't be able to > identify those changes and so you'd not be able to replicate deletes and > such. > This is the problem with rsync "backups", you need to run it with --delete otherwise any restore will "upset" your users as they find large numbers of file they had deleted unhelpfully "restored" > Alternatively.... > I have a script that runs in about 4 minutes and it pulls all the data > out of the backup indexes, and compares the pre-built hourly file index > on our system and identifies files that don't exist in the backup, so I > have a daily backup validation...? I filter the file list using > ksh's?printf date manipulation to filter out files that are less than 2 > days old, to reduce the noise.? A modification to this could simply > compare a daily file index with the previous day's index, and send rsync > a list of files (existing or deleted) based on just a delta of the two > indexes (sort|diff), then you could properly account for all the > changes.? If you don't care about file modifications just produce both > lists based on creation time instead of modification time.? The mmfind > command or GPFS policy engine should be able to produce a full file > list/index very rapidly. > My view would be somewhere along the lines of this is a lot of work and if you have the space to rsync your GPFS file system to, presumably with a server attached to said storage then for under 500 PVU of Spectrum Protect licensing you can have a fully supported client/server Spectrum Protect/TSM backup solution and just use mmbackup. You need to play the game and use older hardware ;-) I use an ancient pimped out Dell PowerEdge R300 as my TSM client node. Why this old, well it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be quad core and 70 PVU per core which would cost an additional ~$1000 in licensing. If it breaks down they are under $100 on eBay. It's never skipped a beat and I have just finished a complete planned restore of our DSS-G using it. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Wed Mar 10 19:09:13 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Mar 2021 19:09:13 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> References: <05584755-C218-46D9-93C4-0373C2045589@rutgers.edu> Message-ID: I was looking for the original source for this, but it was on dev works ... which is now dead. 
But you can use something like: tsbuhelper clustermigdiff \ $migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \ $migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist "mmmigrate.list.latest.filelist" would be the output of a policyscan of your files today "mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policyscan This then generates the changed and deleted list of files for you. tsbuhelper is what is used internally in mmbackup, though is not very documented... We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses: RULE EXTERNAL LIST 'latest.filelist' EXEC '' \ RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \ SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \ VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \ ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \ WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \ ELSE 'resdnt' END )) \ WHERE \ ( \ NOT \ ( (PATH_NAME LIKE '/%/.mmbackup%') OR \ (PATH_NAME LIKE '/%/.mmmigrate%') OR \ (PATH_NAME LIKE '/%/.afm%') OR \ (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \ (PATH_NAME LIKE '/%/.mmLockDir/%') OR \ (MODE LIKE 's%') \ ) \ ) \ AND \ (MISC_ATTRIBUTES LIKE '%u%') \ AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \ AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%')) On our file-system, both the scan and diff took a long time (hours), but hundreds of millions of files. This comes with no warranty ... We don't use this for backup, Spectrum Protect and mmbackup are our friends ... Simon ?On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" wrote: Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. 
> > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. > > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From enrico.tagliavini at fmi.ch Thu Mar 11 09:22:46 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 09:22:46 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync References: <8d58f5c6c8ee4f44a5e09c4f9e3a6dac@ex2013mbx2.fmi.ch> Message-ID: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org? On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > > > \\UTGERS,?? |---------------------------*O*--------------------------- > > > _// the State |???????? Ryan Novosielski - novosirj at rutgers.edu > > > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > > > ?\\??? of NJ | Office of Advanced Research Computing - MSB C630, Newark > ???? `' > > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > > > ?I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > > could rsync ONLY those files to a predetermined external location. 
I am running GPFS 4.2.3.9 > > > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > > i use the rsync tool it will scan the file system which is 400+ million files.? Obviously this will be problematic to complete a > > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > > brute force attempt. and it would be nice to know where the delta of files that have changed. > > > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > > metadata - inodes, modify times, creation times, etc. > > > > > > > > -- > > > > Best Regards, > > > > William Burke (he/him) > > Lead HPC Engineer > > Advance Research Computing > > 860.255.8832 m | LinkedIn > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu Mar 11 13:17:30 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:17:30 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> Message-ID: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
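A bare sketch of that "scan both days and build up the differences" idea follows. The list locations and one-path-per-line format are assumptions, and in-place modifications still need the ctime/mtime-based rule mentioned above:

#!/usr/bin/env python3
# Bare sketch: compare today's policy scan against yesterday's so that deletes
# and renames are propagated as well as new files.
def load(listfile):
    with open(listfile) as f:
        return {line.rstrip("\n") for line in f if line.strip()}

yesterday = load("/gpfs/admin/scan/yesterday.paths")
today = load("/gpfs/admin/scan/today.paths")

with open("/gpfs/admin/scan/to_sync.paths", "w") as out:
    for path in sorted(today - yesterday):        # new or moved-in paths
        out.write(path + "\n")

with open("/gpfs/admin/scan/to_delete.paths", "w") as out:
    for path in sorted(yesterday - today):        # gone since yesterday
        out.write(path + "\n")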
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >> Sent: Wednesday, March 10, 2021 3:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >> >> Yup, you want to use the policy engine: >> >> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >> >> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >> don?t have it quite right and are passing far too much stuff to rsync). >> >> -- >> #BlackLivesMatter >> ____ >>>> \\UTGERS, |---------------------------*O*--------------------------- >>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >> `' >> >>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>> >>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>> >>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>> >>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>> metadata - inodes, modify times, creation times, etc. >>> >>> >>> >>> -- >>> >>> Best Regards, >>> >>> William Burke (he/him) >>> Lead HPC Engineer >>> Advance Research Computing >>> 860.255.8832 m | LinkedIn >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From enrico.tagliavini at fmi.ch Thu Mar 11 13:24:47 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 13:24:47 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> References: <96bae625e90c9d92ddfd335ea429fae8c0601bb6.camel@fmi.ch> <9268BFE4-27BA-4D71-84C7-834250A552D2@ulmer.org> Message-ID: Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. 
Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? 
If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. -- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Mar 11 13:47:44 2021 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 11 Mar 2021 08:47:44 -0500 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > >> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: >> I?m going to ask what may be a dumb question: >> >> Given that you have GPFS on both ends, what made you decide to NOT use AFM? 
>> >> -- >> Stephen >> >> >>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: >>> >>> ?Hello William, >>> >>> I've got your email forwarded my another user and I decided to subscribe to give you my two cents. >>> >>> I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is >>> easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example >>> if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. >>> >>> DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me >>> enough not to go that route. >>> >>> What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just >>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which >>> the ctime changes in the last couple of days (to update metadata info). >>> >>> Good luck. >>> Kind regards. >>> >>> -- >>> >>> Enrico Tagliavini >>> Systems / Software Engineer >>> >>> enrico.tagliavini at fmi.ch >>> >>> Friedrich Miescher Institute for Biomedical Research >>> Infomatics >>> >>> Maulbeerstrasse 66 >>> 4058 Basel >>> Switzerland >>> >>> >>> >>> >>> -------- Forwarded Message -------- >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski >>>> Sent: Wednesday, March 10, 2021 3:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync >>>> >>>> Yup, you want to use the policy engine: >>>> >>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm >>>> >>>> Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we >>>> don?t have it quite right and are passing far too much stuff to rsync). >>>> >>>> -- >>>> #BlackLivesMatter >>>> ____ >>>>>> \\UTGERS, |---------------------------*O*--------------------------- >>>>>> _// the State | Ryan Novosielski - novosirj at rutgers.edu >>>>>> \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus >>>>>> \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark >>>> `' >>>> >>>>>> On Mar 9, 2021, at 9:19 PM, William Burke wrote: >>>>> >>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I >>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 >>>>> >>>>> Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If >>>>> i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a >>>>> scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a >>>>> brute force attempt. and it would be nice to know where the delta of files that have changed. >>>>> >>>>> I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS >>>>> metadata - inodes, modify times, creation times, etc. 
>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> William Burke (he/him) >>>>> Lead HPC Engineer >>>>> Advance Research Computing >>>>> 860.255.8832 m | LinkedIn >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Mar 11 14:20:05 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 11 Mar 2021 14:20:05 +0000 Subject: [gpfsug-discuss] Synchronization/Restore of file systems Message-ID: As promised last year I having just completed a storage upgrade, I have sanitized my scripts and put them up on Github for other people to have a look at the methodology I use in these sorts of scenarios. This time the upgrade involved pulling out all the existing disks and fitting large ones then restoring from backup, rather than synchronizing to a new system, but the principles are the same. Bear in mind the code is written in Perl because it's history is ancient now and with few opportunities to test it in anger, rewriting it in the latest fashionable scripting language is unappealing. https://github.com/digitalcabbage/syncrestore JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From enrico.tagliavini at fmi.ch Thu Mar 11 14:24:43 2021 From: enrico.tagliavini at fmi.ch (Tagliavini, Enrico) Date: Thu, 11 Mar 2021 14:24:43 +0000 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2 . When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM. The horror stories about data loss are becoming more rare with modern setups, especially in the non DR case scenario. However AFM is still a very complicated tool, way to complicated if what you are looking for is a "simple" rsync style backup (but faster). The 3000+ pages of documentation for GPFS do not help our small team and many of those pages are dedicated to just AFM. The performance problem is also still a real issue with modern versions as far as I was told. We can have a quite erratic data turnover in our setup, tied to very big scientific instruments capable of generating many TB of data per hour. Having good performance is important. I used the same tool we use for backups also to migrate the data from the old storage to the new storage (and from GPFS 4 to GPFS 5), and I managed to reach speeds of 17 - 19 GB / s data transfer (when hitting big files that is) using only two servers equipped with Infiniband EDR. 
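(For anyone who has not wired this up before, the general shape of driving rsync from a pre-built path list in parallel streams is roughly the following. The chunk count, the list file /tmp/changed.paths, the one-path-per-line format and the target "backuphost" are all invented for the example; Enrico's splitrsync tool, linked just below, is a much more complete implementation of the same idea.)

  # toy version only: /tmp/changed.paths holds one absolute path per line
  split -n l/8 /tmp/changed.paths /tmp/chunk.
  for c in /tmp/chunk.*; do
      rsync -a --files-from="$c" / backuphost:/backup/ &
  done
  wait
  # no retries or error handling, and deletions/renames on the source are
  # not propagated, which is the caveat raised earlier in the thread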
I made a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . Combined with another program using the policy engine to generate the file list to avoid the painful crawling. As I said we are a small team, so we have to be efficient. Developing that tool costed me time, but the ROI is there as I can use the same tool with non GPFS powered storage system, and we had many occasions where this was the case, for example when moving data from old system to be decommissioned to the GPFS storage. And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer. We might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us. With AFM we would have to start from scratch. Originally we were not really planning to move as we didn't expect this change in licensing with the associated increased cost. But now, this turns out to be a small time saver if we indeed have to switch. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote: Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sadaniel at us.ibm.com Thu Mar 11 16:08:11 2021 From: sadaniel at us.ibm.com (Steven Daniels) Date: Thu, 11 Mar 2021 09:08:11 -0700 Subject: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync In-Reply-To: References: Message-ID: Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. 
We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com http://www.ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. 
We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From novosirj at rutgers.edu Thu Mar 11 16:28:57 2021 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 11 Mar 2021 16:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync In-Reply-To: References: Message-ID: <1298DFDD-9701-4FE4-9B06-1541455E0F52@rutgers.edu> Agreed. Since 5.0.4.1 on the client side (we do rely on it for home directories that are geographically distributed), we have effectively not had any more problems. Our server side are all 5.0.3.2-3. -- #BlackLivesMatter ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Mar 11, 2021, at 11:08 AM, Steven Daniels wrote: > > Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. > > I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. > > The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. > > I'll leave it to Venkat and others on the development team to share more details about improvements. > > > Steven A. Daniels > Cross-brand Client Architect > Senior Certified IT Specialist > National Programs > Fax and Voice: 3038101229 > sadaniel at us.ibm.com > http://www.ibm.com > <1A816397.jpg> > > Stephen Ulmer ---03/11/2021 06:47:59 AM---Thank you! 
Would you mind letting me know in what era you made your evaluation? I?m not suggesting y > > From: Stephen Ulmer > To: gpfsug main discussion list > Cc: bill.burke.860 at gmail.com > Date: 03/11/2021 06:47 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > Thank you! Would you mind letting me know in what era you made your evaluation? > > I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. > > Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. > > Your original post was very thoughtful, and I appreciate your time. > > -- > Stephen > > On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: > > ? > Hello Stephen, > > actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. > > The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. > > Kind regards. > > -- > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > > On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: > I?m going to ask what may be a dumb question: > > Given that you have GPFS on both ends, what made you decide to NOT use AFM? > > -- > Stephen > > > On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: > > ?Hello William, > > I've got your email forwarded my another user and I decided to subscribe to give you my two cents. > > I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is > easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example > if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. > > DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me > enough not to go that route. > > What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just > build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which > the ctime changes in the last couple of days (to update metadata info). > > Good luck. > Kind regards. 
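(A bare-bones illustration of "building up the differences" between the two listings could look like the following. The two input files are hypothetical and assumed to hold one path per line as produced on the main and backup clusters; content changes are picked up by the extra mtime/ctime rule mentioned above rather than by this comparison.)

  sort /tmp/main.paths   > /tmp/main.sorted
  sort /tmp/backup.paths > /tmp/backup.sorted
  # paths present on main but not yet on backup: need to be copied
  comm -23 /tmp/main.sorted /tmp/backup.sorted > /tmp/to_copy
  # paths only on backup: deleted or renamed on main, prune them there
  comm -13 /tmp/main.sorted /tmp/backup.sorted > /tmp/to_delete

The expensive directory walk is avoided on both sides, which is what makes this workable at the 250 million file scale mentioned above.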
> > -- > > Enrico Tagliavini > Systems / Software Engineer > > enrico.tagliavini at fmi.ch > > Friedrich Miescher Institute for Biomedical Research > Infomatics > > Maulbeerstrasse 66 > 4058 Basel > Switzerland > > > > > -------- Forwarded Message -------- > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski > Sent: Wednesday, March 10, 2021 3:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync > > Yup, you want to use the policy engine: > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm > > Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we > don?t have it quite right and are passing far too much stuff to rsync). > > -- > #BlackLivesMatter > ____ > \\UTGERS, |---------------------------*O*--------------------------- > _// the State | Ryan Novosielski - novosirj at rutgers.edu > \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > On Mar 9, 2021, at 9:19 PM, William Burke wrote: > > I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I > could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 > > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If > i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a > scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a > brute force attempt. and it would be nice to know where the delta of files that have changed. > > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS > metadata - inodes, modify times, creation times, etc. 
> > > > -- > > Best Regards, > > William Burke (he/him) > Lead HPC Engineer > Advance Research Computing > 860.255.8832 m | LinkedIn > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=6mf8yZ-lDnfsy3mVONFq1RV1ypXT67SthQnq3D6Ym4Q&m=hSVvvIGpqQhKt_u_TKHdjoXyU-z7P14pCBQ5pA7MMFA&s=g2hkl0Raj7QbLvqRZfDk6nska0crl4Peh4kd8YwiO6k&e= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From honwai.leong at sydney.edu.au Thu Mar 11 22:28:57 2021 From: honwai.leong at sydney.edu.au (Honwai Leong) Date: Thu, 11 Mar 2021 22:28:57 +0000 Subject: [gpfsug-discuss] Backing up GPFS with Rsync Message-ID: This paper might provide some ideas, not the best solution but works fine https://github.com/HPCSYSPROS/Workshop20/blob/master/Parallelized_data_replication_of_multi-petabyte_storage_systems/ws_hpcsysp103s1-file1.pdf It is a two-part workflow to replicate files from production to DR site. It leverages on snapshot ID to determine which files have been updated/modified after a snapshot was taken. It doesn't take care of deletion of files moved from one directory to another, so it uses dsync to take care of that part. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Friday, March 12, 2021 3:08 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 20 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. 
Re: Fwd: FW: Backing up GPFS with Rsync (Steven Daniels) ---------------------------------------------------------------------- Message: 1 Date: Thu, 11 Mar 2021 09:08:11 -0700 From: "Steven Daniels" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org, bill.burke.860 at gmail.com Subject: Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Message-ID: Content-Type: text/plain; charset="utf-8" Also, be aware there have been massive improvements in AFM, in terms of usability, reliablity and performance. I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st Gen IBM GSS) to a new ESS. We were able to get considerable performance but not without effort, it allowed the client to continue operations and migrate to new hardware seamlessly. The new v5.1 AFM feature supports filesystem level AFM which would have greatly simplified the effort and I believe will make AFM vastly easier to implement in the general case. I'll leave it to Venkat and others on the development team to share more details about improvements. Steven A. Daniels Cross-brand Client Architect Senior Certified IT Specialist National Programs Fax and Voice: 3038101229 sadaniel at us.ibm.com https://protect-au.mimecast.com/s/ZnryCr81nyt88D8ZkuztwY-?domain=ibm.com From: Stephen Ulmer To: gpfsug main discussion list Cc: bill.burke.860 at gmail.com Date: 03/11/2021 06:47 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync Sent by: gpfsug-discuss-bounces at spectrumscale.org Thank you! Would you mind letting me know in what era you made your evaluation? I?m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product. Different clients have different requirements, so every implementation could be different. When I add someone else?s judgement to my own, I just like getting as close to their actual evaluation scenario as possible. Your original post was very thoughtful, and I appreciate your time. -- Stephen On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico wrote: ? Hello Stephen, actually not a dumb question at all. We evaluated AFM quite a bit before turning it down. The horror stories about it and massive data loss are too scary. Plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe, we don't need active / active DR or anything like that. While AFM can technically do what we need the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worst, data loss. We are a very small institute with a small IT team, so investing time in making it right was also not really worth it due to the high TCO. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote: I?m going to ask what may be a dumb question: Given that you have GPFS on both ends, what made you decide to NOT use AFM? -- Stephen On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico wrote: ?Hello William, I've got your email forwarded my another user and I decided to subscribe to give you my two cents. 
I would like to warn you about the risk of dong what you have in mind. Using the GPFS policy engine to get a list of file to rsync is easily going to get you with missing data in the backup. The problem is that there are cases that are not covered by it. For example if you mv a folder with a lot of nested subfolders and files none of the subfolders would show up in your list of files to be updated. DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me enough not to go that route. What I ended up doing instead: we run GPFS on both side, main and backup storage. So I use the policy engine on both sides and just build up the differences. We have about 250 million files and this is surprisingly fast. On top of that add all the files for which the ctime changes in the last couple of days (to update metadata info). Good luck. Kind regards. -- Enrico Tagliavini Systems / Software Engineer enrico.tagliavini at fmi.ch Friedrich Miescher Institute for Biomedical Research Infomatics Maulbeerstrasse 66 4058 Basel Switzerland -------- Forwarded Message -------- -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Wednesday, March 10, 2021 3:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync Yup, you want to use the policy engine: https://protect-au.mimecast.com/s/5FXFCvl1rKi77y78YhzCNU5?domain=ibm.com Something in here ought to help. We do something like this (but I?m reluctant to provide examples as I?m actually suspicious that we don?t have it quite right and are passing far too much stuff to rsync). -- #BlackLivesMatter ____ \\UTGERS, |---------------------------*O*--------------------------- _// the State | Ryan Novosielski - novosirj at rutgers.edu \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Mar 9, 2021, at 9:19 PM, William Burke wrote: I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9 Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If i use the rsync tool it will scan the file system which is 400+ million files. Obviously this will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync but it's still a brute force attempt. and it would be nice to know where the delta of files that have changed. I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc. 
-- Best Regards, William Burke (he/him) Lead HPC Engineer Advance Research Computing 860.255.8832 m | LinkedIn _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/uNqKCwV1vMfGGRGxqcKIIVS?domain=urldefense.proofpoint.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A816397.jpg Type: image/jpeg Size: 4919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://protect-au.mimecast.com/s/NW07Cq71mwf8878NEuZEWhS?domain=gpfsug.org End of gpfsug-discuss Digest, Vol 110, Issue 20 *********************************************** From juergen.hannappel at desy.de Mon Mar 15 16:20:51 2021 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Mon, 15 Mar 2021 17:20:51 +0100 (CET) Subject: [gpfsug-discuss] Detecting open files Message-ID: <1985303510.24419797.1615825251660.JavaMail.zimbra@desy.de> Hi, when unlinking filesets that sometimes fails because some open files on that fileset still exist. Is there a way to find which files are open, and from which node? Without running a mmdsh -N all lsof on serveral (big) remote clusters, that is. -- Dr. J?rgen Hannappel DESY/IT Tel. : +49 40 8998-4616 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1711 bytes Desc: S/MIME Cryptographic Signature URL: From Robert.Oesterlin at nuance.com Wed Mar 17 11:59:57 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 11:59:57 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Anyone run into this error from the GUI task ?FILESYSTEM_MOUNT? or ideas on how to fix it? Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 07:55:14.051000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. 
Call getNextException to see other errors in the batch.,Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg5_tools','ems1-hs','RO','2021-03-17 07:55:15.686000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg5_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 14:18:56 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 14:18:56 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898090.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898091.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898092.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 14:30:36 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 14:30:36 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: Can you give me details on how to do this? I tried this: [root at ess1ems ~]# su postgres -c 'psql -d postgres -c "delete from fscc.filesystem_mounts"' could not change directory to "/root" psql: FATAL: Peer authentication failed for user "postgres" Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 9:19 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ This is strange, the Java code should only try to insert rows that are not already there. If it was just the insert for the duplicate row we could ignore it. But this is a batch insert failing and therefore the FILESYSTEM_MOUNTS table does not get updated anymore. A quick fix is to launch the psql client and do a "delete from fscc.filesystem_mounts" to clear the table and run the FILESYSTEM_MOUNT task afterwards to repopulate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 15:09:51 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 15:09:51 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898093.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898094.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898095.png Type: image/png Size: 1172 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Wed Mar 17 15:33:54 2021 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 Mar 2021 15:33:54 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint Message-ID: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> The command completed, and I re-ran the FILESYSTEM_MOUNT, but it failed the same way. [root at ess1ems ~]# psql postgres postgres -c "delete from fscc.filesystem_mounts" DELETE 20 /usr/lpp/mmfs/gui/cli/runtask FILESYSTEM_MOUNT -debug 10:32 AM Operation Failed 10:32 AM Error: debug: locale=en_US debug: Running 'mmlsmount 'fs1' -Y ' on node localhost debug: Running 'mmlsmount 'fs2' -Y ' on node localhost debug: Running 'mmlsmount 'fs3' -Y ' on node localhost debug: Running 'mmlsmount 'fs4' -Y ' on node localhost debug: Running 'mmlsmount 'nrg1_tools' -Y ' on node localhost debug: Running 'mmlsmount 'nrg5_tools' -Y ' on node localhost err: java.sql.BatchUpdateException: Batch entry 5 INSERT INTO FSCC.FILESYSTEM_MOUNTS (CLUSTER_ID, DEVICENAME, HOST_NAME, MOUNT_MODE, LAST_UPDATE) VALUES ('16677155164911809171','nrg1_tools','ems1-hs','RO','2021-03-17 11:32:38.830000-04'::timestamp) was aborted: ERROR: duplicate key value violates unique constraint "filesystem_mounts_pk" Detail: Key (host_name, cluster_id, devicename)=(ems1-hs, 16677155164911809171, nrg1_tools) already exists. Call getNextException to see other errors in the batch. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Alexander Wolf Reply-To: gpfsug main discussion list Date: Wednesday, March 17, 2021 at 10:10 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint CAUTION: This Email is from an EXTERNAL source. Ensure you trust this sender before clicking on any links or attachments. ________________________________ I think psql postgres postgres -c "delete from fscc.filesystem_mounts"' ran as root should do the trick. Mit freundlichen Gr??en / Kind regards [cid:image001.png at 01D71B19.07732D00] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1135 bytes Desc: image001.png URL: From A.Wolf-Reber at de.ibm.com Wed Mar 17 17:05:11 2021 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 17 Mar 2021 17:05:11 +0000 Subject: [gpfsug-discuss] GUI: FILESYSTEM_MOUNT returns ERROR: duplicate key value violates unique constraint In-Reply-To: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> References: <1EBD5AE9-C2D0-460D-A030-6B8CFD1253E2@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898096.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Image.16159683898097.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D71B19.07732D00.png Type: image/png Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.16159683898098.png Type: image/png Size: 1172 bytes Desc: not available URL: From robert.horton at icr.ac.uk Thu Mar 18 15:47:07 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Thu, 18 Mar 2021 15:47:07 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Message-ID: Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 06:32:00 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 12:02:00 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? 
_ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? __audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Mar 19 09:42:22 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 19 Mar 2021 09:42:22 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. Robert, What is the scale version ? This issue may be related to these alerts. 
https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Mar 19 09:50:04 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 19 Mar 2021 15:20:04 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> References: <459e3051d6fb702682cdf0fc43c814591b9d85a3.camel@icr.ac.uk> Message-ID: Hi Robert, So you might have started seeing problem after upgrading the gateway nodes to 5.0.5.2. Upgrading gateway nodes at cache cluster to 5.0.5.6 would resolve this problem. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/19/2021 03:13 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Venkat, Thanks for getting back to me. On the cache side we're running 5.0.4-3 on the nsd servers and 5.0.5-2 everywhere else, including gateway nodes. The home cluster is 4.2.3-22 - unfortunately we're stuck on 4.x due to the licensing but we're in the process of replacing that system. The actual AFM seems to be behaving fine though so I'm not sure that's our issue. I guess our next job is to see if we can reproduce it in a non-AFM fileset. Rob On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: CAUTION: This email originated from outside of the ICR. Do not click links or open attachments unless you recognize the sender's email address and know the content is safe. 
Robert, What is the scale version ? This issue may be related to these alerts. https://www.ibm.com/support/pages/node/6355983 https://www.ibm.com/support/pages/node/6380740 These are the recommended steps to resolve the issue, but need more details on the scale version. 1. Stop all AFM filesets at cache using "mmafmctl device stop -j fileset" command. 2. Perform rolling upgrade parallely at both cache and home clusters a. All nodes on home cluster to 5.0.5.6 b. All gateway nodes in cache cluster to 5.0.5.6 3. At home cluster, for each fileset target path, repeat below steps a. Remove .afmctl file mmafmlocal rm /.afm/.afmctl b. Enable AFM mmafmconfig enable 4. Start all AFM filesets at cache using "mmafmctl device start -j fileset" command. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/18/2021 09:17 PM Subject: [EXTERNAL] [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We've recently started having an issue where processes running in a singularity container get stuck in a soft lockup and eventually the node needs to be forcibly rebooted. I have included a sample call trace below. Additionally, other (non-singularity) processes on other nodes accessing the same fileset seem to get into the same state. It's also an AFM IW fileset just to add to the complexity ;) Does anyone have any thoughts on what might be happening / how to proceed? I'm not really sure if it's a GPFS issue or a Singularity / Kernel issue - although fact it seems to spread to other nodes would seem to suggest some GPFS involvement. It's possible the user is doing something inadvisable with Singularity (it's difficult to work out what's happening in the Nextflow pipeline) but even if they are it would be good to find a way of preventing them taking nodes down. I'm assuming the AFM is unlikely to be relevant - any views on that? Thanks, Rob Call Trace: ? _Z11kSFSGetattrP15KernelOperationP13gpfsVfsData_tP10gpfsNode_tiP10cxiVattr_tP12gpfs_iattr64+0x1e4/0x5d0 [mmfs26] _ZL17refreshCacheAttrsP13gpfsVfsData_tP15KernelOperationP9cxiNode_tP10pcacheAttriPcj+0x441/0x450 [mmfs26] _Z21pcacheHandleCollisionP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tS4_PcPvP9MMFSVInfoiP10pcacheAttriS5_10PcacheModej+0xa21/0x11b0 [mmfs26] ? _ZN6ThCond6signalEv+0x82/0x190 [mmfs26] ? _ZN10MemoryPool6shFreeEPv9MallocUse+0x1a5/0x2a0 [mmfs26] ? _ZL14kSFSPcacheSendP13gpfsVfsData_tP15KernelOperation7FileUIDS3_PciiPPv+0x387/0x570 [mmfs26] ? _ZL17pcacheNeedRefresh10PcacheModejlijj+0x206/0x230 [mmfs26] _Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1dcf/0x25c0 [mmfs26] ? _Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26] _Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0 [mmfs26] gpfs_i_lookup+0x189/0x3f0 [mmfslinux] ? _Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6e0/0x6e0 [mmfs26] ? d_alloc_parallel+0x99/0x4a0 ? _Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26] __lookup_slow+0x97/0x150 lookup_slow+0x35/0x50 walk_component+0x1bf/0x330 ? _ZL12gpfsGetattrxP13gpfsVfsData_tP9cxiNode_tP10cxiVattr_tP12gpfs_iattr64i+0x147/0x390 [mmfs26] path_lookupat.isra.49+0x75/0x200 filename_lookup.part.63+0xa0/0x170 ? strncpy_from_user+0x4f/0x1b0 vfs_statx+0x73/0xe0 __do_sys_newlstat+0x39/0x70 ? syscall_trace_enter+0x1d3/0x2c0 ? 
__audit_syscall_exit+0x249/0x2a0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk| W www.icr.ac.uk| Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=gHmKEtEM3EvdWRefAF0Cs8N2qXPZg5flGutpiJu_bfg&s=dnKFsINgU63_3b-7i3z3uDnxnij6iT-y8L_mmYHr8IE&e= -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=KgYs-kXBKE5JoAaGYRiU9iIxNkJSZeicxpSTmL39_B8&s=6FodZ_EQ8VAOE_xoEkfoUzmJpaiF7bgbERvA9avLZfg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 09:32:10 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 10:32:10 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly Message-ID: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From janfrode at tanso.net Mon Mar 22 09:54:28 2021 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Mon, 22 Mar 2021 10:54:28 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller < u.sibiller at science-computing.de>: > Hello, > > we usually create filesets for project dirs and homes. > > Unfortunately we have discovered that this convention has been ignored for > some dirs and their data > no resides in the root fileset. We would like to move the data to > independent filesets. > > Is there a way to do this without having to schedule a downtime for the > dirs in question? > > I mean, is there a way to transparently move data to an independent > fileset at the same path? > > > Kind regards, > > Ulrich Sibiller > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Mar 22 12:24:59 2021 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Mar 2021 12:24:59 +0000 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: You could maybe create the new file-set, link in a different place, copy the data ? Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially reducing the time to do the copy. Simon From: on behalf of "janfrode at tanso.net" Reply to: "gpfsug-discuss at spectrumscale.org" Date: Monday, 22 March 2021 at 09:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Move data to fileset seamlessly No ? all copying between filesets require full data copy. No simple rename. This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. -jf man. 22. mar. 2021 kl. 10:39 skrev Ulrich Sibiller >: Hello, we usually create filesets for project dirs and homes. Unfortunately we have discovered that this convention has been ignored for some dirs and their data no resides in the root fileset. We would like to move the data to independent filesets. Is there a way to do this without having to schedule a downtime for the dirs in question? 
I mean, is there a way to transparently move data to an independent fileset at the same path? Kind regards, Ulrich Sibiller -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Mon Mar 22 13:20:46 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:20:46 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: On 22.03.21 13:24, Simon Thompson wrote: > You could maybe create the new file-set, link in a different place, copy the data ? > > Then at somepoint, unlink and relink and resync. Still some user access, but you are potentially > reducing the time to do the copy. Yes, but this does not help if a file is open all the time, e.g. during a long-running job. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Mon Mar 22 13:41:39 2021 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Mon, 22 Mar 2021 14:41:39 +0100 Subject: [gpfsug-discuss] Move data to fileset seamlessly In-Reply-To: References: <653685c0-bbe6-5e3a-b275-ff68217746d7@science-computing.de> Message-ID: <6f626186-cb7a-46d5-781c-8f3a21b7e270@science-computing.de> On 22.03.21 10:54, Jan-Frode Myklebust wrote: > No ? all copying between filesets require full data copy. No simple rename. > > This might be worthy of an RFE, as it?s a bit unexpected, and could potentially work more efficiently.. Yes, your are right. So please vote here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=149429 Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From robert.horton at icr.ac.uk Tue Mar 23 19:02:05 2021 From: robert.horton at icr.ac.uk (Robert Horton) Date: Tue, 23 Mar 2021 19:02:05 +0000 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. 
Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From vpuvvada at in.ibm.com Wed Mar 24 02:36:31 2021 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Wed, 24 Mar 2021 08:06:31 +0530 Subject: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups In-Reply-To: References: Message-ID: ># mmafmlocal rm /.afm/.afmctl >/bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted This step is only required if home cluster is on 5.0.5.2/5.0.5.3. You can ignore this issue, and restart AFM filesets at cache. ~Venkat (vpuvvada at in.ibm.com) From: Robert Horton To: "gpfsug-discuss at spectrumscale.org" Date: 03/24/2021 12:33 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] SpectrumScale / AFM / Singularity soft lockups Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Sorry for the delay... On Fri, 2021-03-19 at 12:02 +0530, Venkateswara R Puvvada wrote: > ... > 1. Stop all AFM filesets at cache using "mmafmctl device stop -j > fileset" command. > 2. Perform rolling upgrade parallely at both cache and home clusters > a. All nodes on home cluster to 5.0.5.6 > b. All gateway nodes in cache cluster to 5.0.5.6 > 3. At home cluster, for each fileset target path, repeat below steps > a. Remove .afmctl file > mmafmlocal rm /.afm/.afmctl > b. Enable AFM At point 3 I'm getting: # mmafmlocal rm /.afm/.afmctl /bin/rm: cannot remove ?/.afm/.afmctl?: Operation not permitted afmconfig disable is the same. Any idea what the issue is? Thanks, Rob -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. 
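For reference, the stop/start steps above lend themselves to a small loop. A minimal sketch, assuming bash on a node in the cache cluster; the filesystem device name fs0 and the fileset names are placeholders, and "mmlsfileset <device> --afm" should list the AFM filesets if you would rather generate the list than hard-code it:

  fs=fs0                              # placeholder filesystem (device) name
  afm_filesets="fileset1 fileset2"    # placeholder AFM fileset names

  # Step 1: stop all AFM filesets at the cache before the rolling upgrade
  for f in $afm_filesets; do
      mmafmctl $fs stop -j $f
  done

  # ... rolling upgrade of the home nodes and the cache gateway nodes to 5.0.5.6,
  # plus the per-target mmafmconfig enable at home where applicable ...

  # Step 4: start the filesets again at the cache
  for f in $afm_filesets; do
      mmafmctl $fs start -j $f
  done
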
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=OLf3tBvTItpLRieM34xb8Xd69tBYbwTDYAecT0D_B7k&s=FCJEEoTWGIoM4eY4SMzE55qskwhAnxC_noZu7fJHoqw&e=
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From prasad.surampudi at theatsgroup.com Wed Mar 24 14:32:30 2021
From: prasad.surampudi at theatsgroup.com (Prasad Surampudi)
Date: Wed, 24 Mar 2021 14:32:30 +0000
Subject: [gpfsug-discuss] mmrepquota is not reporting root fileset statistics for some filesystems
Message-ID: 
Recently while checking fileset quotas in an ESS cluster, we noticed that the mmrepquota command is not reporting the root fileset quota and inode details for some filesystems. Has anyone else seen this issue? Please see the output below. The root fileset shows up for the 'prod' filesystem but does not show up for 'prod-private'. I could not figure out why it does not show up for prod-private. Any ideas?
/usr/lpp/mmfs/bin/mmrepquota -j prod-private Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace xFIN root FILESET 12028144 0 0 0 none | 4524237 0 0 0 none /usr/lpp/mmfs/bin/mmrepquota -j prod Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace root root FILESET 7106656 0 0 1273643728 none | 7 0 0 400 none xxx_tick root FILESET 0 0 0 0 none | 1 0 0 0 none _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oluwasijibomi.saula at ndsu.edu Mon Mar 29 19:38:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Mon, 29 Mar 2021 18:38:00 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... 
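To help narrow down whether a hang like this is on the GPFS side or in the kernel/RDMA stack, here is a rough sketch of things worth capturing in the first seconds after issuing mmmount, from a second session on the client and on the NSD servers. The commands are standard Spectrum Scale admin tools, but the exact sequence is only a suggestion, not a prescribed procedure:

  # on the client, immediately after "mmmount mmfs1":
  mmdiag --waiters        # long-running waiters usually name the blocked operation
  mmdiag --network        # state of the connections to the NSD servers
  dmesg -T | tail -n 50   # soft-lockup, OOM or RDMA errors from the kernel side

  # on the NSD servers at the same time:
  mmdiag --waiters        # check whether the servers are stuck waiting on this client
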
URL: From olaf.weiser at de.ibm.com Tue Mar 30 07:06:54 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 30 Mar 2021 06:06:54 +0000 Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From oluwasijibomi.saula at ndsu.edu Tue Mar 30 19:24:00 2021 From: oluwasijibomi.saula at ndsu.edu (Saula, Oluwasijibomi) Date: Tue, 30 Mar 2021 18:24:00 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 110, Issue 34 In-Reply-To: References: Message-ID: Hey Olaf, We'll investigate as suggested. I'm hopeful the journald logs would provide some additional insight. As for OFED versions, we use the same Mellanox version across the cluster and haven't seen any issues with working nodes that mount the filesystem. We also have a PMR open with IBM but we'll send a follow-up if we discover something more for group discussion. Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, March 30, 2021 1:07 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 110, Issue 34 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Filesystem mount attempt hangs GPFS client node (Saula, Oluwasijibomi) 2. Re: Filesystem mount attempt hangs GPFS client node (Olaf Weiser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 29 Mar 2021 18:38:00 +0000 From: "Saula, Oluwasijibomi" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="utf-8" Hello Folks, So we are experiencing a mind-boggling issue where just a couple of nodes in our cluster, at GPFS boot up, get hung so badly that the node must be power reset. These AMD client nodes are diskless in nature and have at least 256G of memory. We have other AMD nodes that are working just fine in a separate GPFS cluster albeit on RHEL7. Just before GPFS (or related processes) seize up the node, the following lines of /var/mmfs/gen/mmfslog are noted: 2021-03-29_12:47:37.343-0500: [N] mmfsd ready 2021-03-29_12:47:37.426-0500: mmcommon mmfsup invoked. 
Parameters: 10.12.50.47 10.12.50.242 all 2021-03-29_12:47:37.587-0500: mounting /dev/mmfs1 2021-03-29_12:47:37.590-0500: [I] Command: mount mmfs1 2021-03-29_12:47:37.859-0500: [N] Connecting to 10.12.50.243 tier1-sn-02.pixstor 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01.pixstor) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.864-0500: [I] VERBS RDMA connecting to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.866-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 0 2021-03-29_12:47:37.867-0500: [I] VERBS RDMA connected to 10.12.50.242 (tier1-sn-01) on mlx5_0 port 1 fabnum 0 sl 0 index 1 2021-03-29_12:47:37.868-0500: [I] Connected to 10.12.50.243 tier1-sn-02 There have been hunches that this might be a network issue, however, other nodes connected to the IB network switch are mounting the filesystem without incident. I'm inclined to believe there's a GPFS/OS-specific setting that might be causing these crashes especially when we note that disabling the automount on the client node doesn't result in the node hanging. However, once we issue mmmount, we see the node seize up shortly... Please let me know if you have any thoughts on where to look for root-causes as I and a few fellows are stuck here ? Thanks, Oluwasijibomi (Siji) Saula HPC Systems Administrator / Information Technology Research 2 Building 220B / Fargo ND 58108-6050 p: 701.231.7749 / www.ndsu.edu [cid:image001.gif at 01D57DE0.91C300C0] -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 30 Mar 2021 06:06:54 +0000 From: "Olaf Weiser" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Filesystem mount attempt hangs GPFS client node Message-ID: Content-Type: text/plain; charset="us-ascii" An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 110, Issue 34 *********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL:
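Since the thread closes with a PMR in progress, a short sketch of the data that usually helps with this kind of hang: a gpfs.snap from the affected client plus the kernel journal for the boot in which the hang occurred. The node name client-47 is a placeholder, and "journalctl -b -1" assumes persistent journald storage, which a diskless node may not have, so the journal may need to be shipped to a remote collector instead:

  # from a node that can reach the affected client (placeholder name: client-47)
  gpfs.snap -N client-47

  # after the forced reboot, kernel messages from the previous boot:
  ssh client-47 'journalctl -k -b -1' > client-47-prev-boot-kernel.log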