From ewahl at osc.edu Mon Oct 4 23:23:59 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Mon, 4 Oct 2021 22:23:59 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?

I know I've run into this before way back, but my notes on how I solved this aren't getting the job done in Scale 5.0.5.8 and my notes are from 3.5.

Anyone know a way to get a LIST policy to properly feed bad filenames into the output or an external script?

When I say bad I mean things like control characters, spaces, etc. Not concerned about the dreaded 'newline' as we force users to fix those or the files do not get backed up in Tivoli.

Ed Wahl
OSC

From jonathan.buzzard at strath.ac.uk Tue Oct 5 00:28:57 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Tue, 5 Oct 2021 00:28:57 +0100
Subject: [gpfsug-discuss] Handling bad file names in policies?

On 04/10/2021 23:23, Wahl, Edward wrote:

> I know I've run into this before way back, but my notes on how I solved
> this aren't getting the job done in Scale 5.0.5.8 and my notes are from
> 3.5.
>
> Anyone know a way to get a LIST policy to properly feed bad filenames
> into the output or an external script?
>
> When I say bad I mean things like control characters, spaces, etc. Not
> concerned about the dreaded 'newline' as we force users to fix those or
> the files do not get backed up in Tivoli.
>

Since when? Last time I checked, which was admittedly circa 2008, TSM would back up files with newlines in them no problem. mmbackup, on the other hand, in that time frame would simply die and back up nothing if there was a single file on the file system with a newline in it.

I would take a look at the mmbackup scripts, which can handle such stuff (leastways in >4.2), which would also suggest dsmc can handle it.

As an aside I now think I know how you end up with newlines in file names.
Basically you cut and paste the file name complete with newlines (most likely at the end) into a text field when saving the file. Personally I think any program should baulk at that point, but what do I know.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From olaf.weiser at de.ibm.com Tue Oct 5 07:10:26 2021
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Tue, 5 Oct 2021 06:10:26 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?

An HTML attachment was scrubbed...

From peter.chase at metoffice.gov.uk Tue Oct 5 11:00:17 2021
From: peter.chase at metoffice.gov.uk (Chase, Peter)
Date: Tue, 5 Oct 2021 10:00:17 +0000
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 117, Issue 1

Morning Ed,

I'm not sure how useful this would be if you're wanting to hunt for bad file names, but in the past we've used the built-in HEX function to convert problem strings to hex and have an external script convert them back into ASCII/Unicode (whatever it should be). That way all the intelligence goes into an external script and there's no digging around in ILM to find a solution.

I don't have an example to hand, but if you're interested in the approach I can probably get one for you.

Regards,

Pete Chase
Met Office SVM team
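
The external-script half of the HEX approach Pete Chase describes might look roughly like the sketch below. It is only an illustration under stated assumptions: it takes for granted that the policy emits the file name as a plain string of hex digits, and the helper names decode_hex_name and printable are made up here, not taken from any GPFS tooling. It turns the hex back into the raw name bytes and renders them for a report without printing control characters.

```python
def decode_hex_name(hex_name: str) -> bytes:
    """Turn a hex-encoded file name back into the raw bytes of the name.

    Assumption: the policy wrote the name as plain hex digits, e.g. "626164".
    """
    return bytes.fromhex(hex_name.strip())


def printable(raw: bytes) -> str:
    """Render raw name bytes for a report without emitting control characters."""
    # Bytes that are not valid UTF-8 become literal \xNN sequences.
    text = raw.decode("utf-8", errors="backslashreplace")
    # Remaining non-printable characters (tab, newline, ...) are shown escaped.
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )


if __name__ == "__main__":
    raw = decode_hex_name("626164096e616d6520c3a4")
    print(printable(raw))
```

Keeping the name as bytes until the last moment means the script never has to guess an encoding before it decides how to display or act on the file.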
From chair at spectrumscale.org Fri Oct 8 16:29:31 2021
From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair))
Date: Fri, 08 Oct 2021 16:29:31 +0100
Subject: [gpfsug-discuss] IBM Webinar: Spectrum Scale Information Lifecycle Management (ILM)

Hi All,

IBM are running a webinar on 20th October and 21st October titled:

"Spectrum Scale Information Lifecycle Management (ILM)"

which might be of interest to the group. Details and registration are at:

https://www.ibm.com/support/pages/node/6480851

The webinar will be running in two timezones; please check the web page for details.

Thanks

Simon
SSUG Group Chair

From ewahl at osc.edu Fri Oct 8 19:14:26 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Fri, 8 Oct 2021 18:14:26 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?

This goes back as far as I can recall to <=GPFS 3.5 days. And no, I cannot recall what version of TSM-EE that was. But newline has been the only stopping point for what seems like forever. Having filed many an mmbackup bug, I don't recall ever crashing on filenames (tons of OTHER reasons, but not character set). We even generate an error report from this and email users to fix it. We accept basically almost everything else, and I have to say, we see some really crazy things sometimes. I think my current favorite is a full Windows path as a filename (e.g. "Y:\Temp\temp\290\work\0\Material_ERTi-5.in").

Current IBM documentation doesn't go back past 4.2, but it says:

"For IBM Spectrum Scale
file systems with special characters frequently used in the names of files or directories, backup failures might occur. Known special characters that require special handling include: *, ?, ", ', carriage return, and the new line character.

In such cases, enable the Tivoli Storage Manager client options WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used in backup activities and make sure that the mmbackup option --noquote is used when invoking mmbackup."

So maybe we could handle newlines somehow. But my lazy searches didn't show what TSM doesn't accept.

Ed Wahl
OSC

From anacreo at gmail.com Fri Oct 8 20:36:03 2021
From: anacreo at gmail.com (Alec)
Date: Fri, 8 Oct 2021 12:36:03 -0700
Subject: [gpfsug-discuss] Handling bad file names in policies?

Why not just configure a file placement policy using a non-existent pool or a bad encryption key to prevent files with non-printable characters from even being created in the first place?

Alec

From ewahl at osc.edu Fri Oct 8 21:42:00 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Fri, 8 Oct 2021 20:42:00 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?

Sadly the ESCAPE only works for EXTERNAL LISTs, correct? Not sure that I can easily modify an EXTERNAL LIST to do what I want, which is a LIST policy using MISC_ATTRIBUTES to find all files without X, etc.

And using mmlsattr on hundreds of millions of files will take until the next millennium, so I really would like to stick with the policy engine. Perhaps I can do some "RULE 1 feeds RULE 2" type thing?

Sort of thing I'm looking at:

define( immut, MISC_ATTRIBUTES LIKE '%X%')
RULE 'listimmut' LIST 'not-immut' WHERE NOT (exclude_list) and NOT (immut)

Ed Wahl
OSC

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Olaf Weiser
Sent: Tuesday, October 5, 2021 2:10 AM
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Handling bad file names in policies?

Hi Ed,

not a ready to run for "everything".. but just to remind, there is an ESCAPE statement

by this you can

cat policy2
RULE EXTERNAL LIST 'allfiles' EXEC '/var/mmfs/etc/list.exe' ESCAPE '%/#'

and turn a file name into something that a policy can use

I haven't used it for a while, but here is an example from a while ago .. ;-)

[root@c25m4n03 stupid_files]# ll
total 0
-rw-r--r-- 1 root root 21 Mar 22 03:44 dämlicher filename
-rw-r--r-- 1 root root 2 Mar 22 03:59 ???????? spacefilen
[root@c25m4n03 stupid_files]#

policy:

101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename
101381 1945364096 0 -- /gpfs/fpofs/files/stupid_files/%C3%BC%C3%BC%C3%BC%C3%B6%C3%B6%C3%A4%C3%A4%3F%3F%3F%C3%9F%C3%9F%20spacefilename
[I]2013-03-22@13:12:58.687 Policy execution. 2 files dispatched.

verify with policy (ESCAPE '%/? ')

101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/dämlicher filename
[...]

hope this helps..
cheers

From ewahl at osc.edu Fri Oct 8 21:44:14 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Fri, 8 Oct 2021 20:44:14 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?

This is an interesting idea, but not at all what I was working towards, and is getting me off track. (And I'm known to get distracted and explore interesting rabbit holes, red herrings, et al.)
I've next to no issues with the filenames in day-to-day operations.

On the positive side, this is a one-off. What I need is a LIST policy, and the return leaves off the entire filename.

Ed Wahl

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Alec
Sent: Friday, October 8, 2021 3:36 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Handling bad file names in policies?

Why not just configure a file placement policy using a non-existent pool or a bad encryption key to prevent files with non-printable characters from even being created in the first place?

Alec

From anacreo at gmail.com Fri Oct 8 22:02:47 2021
From: anacreo at gmail.com (Alec)
Date: Fri, 8 Oct 2021 14:02:47 -0700
Subject: [gpfsug-discuss] Handling bad file names in policies?

Well.... How about:

define(DISPLAY_NEWLINE,[CASE WHEN ($1) *HAS NEWLINE* THEN *REPLACE NEWLINE WITH ALTERNATE CHARACTER* ELSE varchar(1) END])

Define your SHOW to have the DISPLAY_NEWLINE in place of the file name?

Sorry, I don't know offhand how to do the find-newline and replace-newline SQL string code; I don't have GPFS at home, sadly.
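
For anyone consuming the percent-encoded records from Olaf Weiser's ESCAPE '%/#' example earlier in the thread, a post-processing step could look something like the sketch below. It assumes each record has the shape shown in that example (numeric fields, then " -- ", then the escaped path) and that the first numeric field is the inode number; parse_list_line is a hypothetical helper name, not part of any Spectrum Scale tooling.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def parse_list_line(line: str) -> Tuple[int, bytes]:
    """Split one list record at ' -- ' and undo the %XX escaping of the path.

    Assumption: the record looks like '101378 247907919 0 -- /escaped/path'
    and the first numeric field is the inode number.
    """
    fields, _, escaped_path = line.rstrip("\n").partition(" -- ")
    inode = int(fields.split()[0])
    # Keep the path as bytes so names that are not valid UTF-8 survive intact.
    return inode, unquote_to_bytes(escaped_path)


if __name__ == "__main__":
    record = ("101378 247907919 0 -- "
              "/gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename")
    inode, path = parse_list_line(record)
    print(inode, path.decode("utf-8", errors="backslashreplace"))
```

Because spaces are escaped in that output, the first " -- " in a record can be treated as the separator, which is why a simple partition is enough here.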
From jonathan.buzzard at strath.ac.uk Sat Oct 9 10:09:22 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 9 Oct 2021 10:09:22 +0100
Subject: [gpfsug-discuss] Handling bad file names in policies?
Message-ID: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk>

On 08/10/2021 19:14, Wahl, Edward wrote:

> This goes back as far as I can recall to <=GPFS 3.5 days. And no, I
> cannot recall what version of TSM-EE that was. But newline has been
> the only stopping point, for what seems like forever. Having filed
> many an mmbackup bug, I don't recall ever crashing on filenames.
> (tons of OTHER reasons, but not character set) We even generate an
> error report from this and email users to fix it. We accept basically
> almost everything else, and I have to say, we see some really crazy
> things sometimes. I think my current favorite is the full windows
> paths as a filename. (eg:
> "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" )
>

I will have to do a test but I am sure newlines have worked just fine in the past. At the very least they have not stopped an entire backup from working when using dsmc incr.

Now mmbackup, that's a different kettle of fish. If you have not seen mmbackup fail entirely because of a random "special" character you simply have not been using it long enough :-)

For the longest of times I would simply not go anywhere near it because it was not fit for purpose.

>
> Current IBM documentation doesn't go backwards past 4.2 but it says:
>
> "For IBM Spectrum Scale file systems with special characters
> frequently used in the names of files or directories, backup failures
> might occur. Known special characters that require special handling
> include: *, ?, ", ', carriage return, and the new line character.
>
> In such cases, enable the Tivoli Storage Manager client options
> WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used
> in backup activities and make sure that the mmbackup option --noquote
> is used when invoking mmbackup."
>
> So maybe we could handle newlines somehow. But my lazy searches
> didn't show what TSM doesn't accept.
>

We strongly advise our users (our GPFS file system is for an HPC system) in training not to use "special" characters. That is followed with a warning that if they do then we don't make any promises to back up their files :-)

From time to time I run a dsmc incr in a screen and capture the output to a log file and then look at the list of failed files and prompt users to "fix" them. Though sometimes I just "fix" them myself if the correction is going to be obvious and then email them to tell them what has happened.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From s.j.thompson at bham.ac.uk Mon Oct 11 09:35:31 2021
From: s.j.thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 11 Oct 2021 08:35:31 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk>
References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk>
Message-ID: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>

We have both:

WILDCARDSARELITERAL yes
QUOTESARELITERAL yes

set, and use --noquote for mmbackup. The backup runs, but creates a file:

/filesystem/mmbackup.unsupported.CLIENTNAME

which contains a list of files that are not backed up due to \n in the filename. So it doesn't break backup, but they don't get backed up either.

I believe this is because the TSM client can't back the file up rather than mmbackup no longer allowing them. I had an RFE at some point to get dsmc changed ... but it got closed WONTFIX.

Simon

From p.childs at qmul.ac.uk Mon Oct 11 09:55:45 2021
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 11 Oct 2021 08:55:45 +0000
Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies?
In-Reply-To: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>
References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>

We've had this same issue with characters that are fine in Scale but Protect can't handle. Normally it's because some script has embedded a newline in the middle of a file name, and normally we end up renaming that file by inode number:

find . -inum 9975226749 -exec mv {} badfilename \;

mostly because we can't even type the filename at the command prompt.
> > In such cases, enable the Tivoli Storage Manager client options > WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used > in backup activities and make sure that the mmbackup option --noquote > is used when invoking mmbackup." > > So maybe we could handle newlines somehow. But my lazy searches > didn't show what TSM doesn't accept. > We strongly advise our users (our GPFS file system is for an HPC system) in training not to use "special" characters. That is followed with a warning that if they do then we don't make any promises to backup their files :-) From time to time I run a dsmc incr in a screen and capture the output to a log file and then look at the list of failed files and prompt users to "fix" them. Though sometimes I just "fix" them myself if the correction is going to be obvious and then email them to tell them what has happened. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From p.childs at qmul.ac.uk Mon Oct 11 09:55:45 2021 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 11 Oct 2021 08:55:45 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies? In-Reply-To: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk> References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk> Message-ID: We've had this same issue with characters that are fine in Scale but Protect can't handle. Normally its because some script has embedded a newline in the middle of a file name, and normally we end up renaming that file by inode number find . -inum 9975226749 -exec mv {} badfilename \; mostly because we can't even type the filename at the command prompt. 
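
The rename-by-inode step can also be done without passing the name through a shell at all. What follows is only a minimal Python sketch of that idea, not anything shipped with Scale or Protect: the helper name rename_by_inode is hypothetical, it searches a single directory rather than recursing like find does, and it refuses to overwrite an existing target.

```python
import os


def rename_by_inode(directory, inode, new_name):
    """Rename the entry in `directory` whose inode number is `inode`.

    Hypothetical helper for illustration: it looks at one directory only
    (no recursion) and will not overwrite an existing file. Returns the
    old name as bytes, or None if no entry has that inode.
    """
    directory = os.fsencode(directory)
    target = os.path.join(directory, os.fsencode(new_name))
    # Scanning with a bytes path yields bytes names, so names containing
    # newlines, control characters or invalid UTF-8 never need decoding.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.inode() == inode:
                if os.path.lexists(target):
                    raise FileExistsError(os.fsdecode(target))
                os.rename(entry.path, target)
                return entry.name
    return None
```

If a file has several hard links in the same directory, only the first matching entry is renamed; the inode number itself would come from something like ls -i or the backup log.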
However it's not always just newline characters; currently we've got a few files with unprintable characters in their names. But it's normally fewer than 50 files every few months, so it's easy to handle manually.

I normally end up looking at /data/mmbackup.unsupported, which is the standard output from mmapplypolicy, extracting the file names from it and emailing the users concerned to assist them in working out what went wrong.

I guess you could automate the parsing of this file at the end of the backup process and do something interesting with it.

Peter Childs

From jonathan.buzzard at strath.ac.uk Mon Oct 11 11:47:49 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 11 Oct 2021 11:47:49 +0100
Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies?
In-Reply-To:
References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>
Message-ID: <750aa707-8949-c416-c432-0b07cb8498f8@strath.ac.uk>

On 11/10/2021 09:55, Peter Childs wrote:

> We've had this same issue with characters that are fine in Scale but
> Protect can't handle. Normally its because some script has embedded a
> newline in the middle of a file name, and normally we end up renaming
> that file by inode number
>
> find . -inum 9975226749 -exec mv {} badfilename \;
>
> mostly because we can't even type the filename at the command
> prompt.
>

You can, it just requires know-how. I will freely admit it took me a long time to work out how to do it. The dirty alternative that sometimes works is to use wildcards.

What gets me is I have never created a single file with "problem" characters in the filename in over 30 years of computing. Well, apart from deliberately trying to work out how the hell you do it, and it's not easy.

I think the most likely answer for newlines in file names is cut and paste into a file save dialogue box.
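
Before a name like that can be typed or matched with a wildcard, it helps to see exactly which bytes it contains. The following is a minimal Python sketch under stated assumptions: the helper names has_control_chars and find_control_names are made up for illustration, and "problem" is taken to mean ASCII control characters only (which covers newline and carriage return).

```python
import os


def has_control_chars(name):
    """True if a bytes filename contains an ASCII control character
    (0x00-0x1f, which includes newline and carriage return, or 0x7f)."""
    return any(b < 0x20 or b == 0x7F for b in name)


def find_control_names(top):
    """Yield (directory, name) for every entry under `top` whose name
    contains a control character. Paths are kept as bytes throughout,
    so names in any encoding are handled without decode errors."""
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for name in dirnames + filenames:
            if has_control_chars(name):
                yield dirpath, name
```

Printing repr(name) for each result shows the offending bytes as escapes (a newline appears as \n), which is usually enough to work out how to address the file or what it should be renamed to.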
> However its not always just new line characters currently we've got a
> few files with unprintable characters in it. but its normally less
> than 50 files every few months, so is easy to handle manually.

Mostly I find the non-newline issues are down to "foreigners" using something other than UTF-8 (aka random stupid Windows code pages) to give files names in their native language. You can usually work out what the filename is supposed to be once you know the nationality of the file owner.

Again I think this happens due to cut and paste from text documents in non-UTF-8 encodings. So for example take something Cyrillic in codepage 1251, copy and paste it into a file save dialogue box and end up with a filename containing unprintable characters.

> I normally end up looking at /data/mmbackup.unsupported which is the
> standard output from mmapplypolicy and extracting the file names from
> it and emailing the users concerned to assist them in working out
> what went wrong.
>
> I guess you could automate the parsing of this file at the end of the
> backup process and do something interesting with it.
>

Email the owner of the file and tell them it's not being backed up and won't be till they "fix" the file name so that backup software can process it.

If it is just a newline I would be tempted to have them automatically renamed sans the newline, and then send the file owner an email (per file) letting them know what has happened. If their inbox is spammed that will hopefully prompt them to stop doing it :-)

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG

From stuartb at 4gh.net Tue Oct 19 18:16:54 2021
From: stuartb at 4gh.net (Stuart Barkley)
Date: Tue, 19 Oct 2021 13:16:54 -0400 (EDT)
Subject: [gpfsug-discuss] alphafold and mmap performance
Message-ID:

Over the years there have been several discussions about performance problems with mmap() on GPFS/Spectrum Scale.

We are currently having problems with mmap() performance on our systems with new alphafold protein folding software. Things look similar to previous times we have had mmap() problems.

The software component "hhblits" appears to mmap a large file with genomic data and then does random reads throughout the file. GPFS appears to be doing 4K reads for each block limiting the performance.

The first run takes 20+ hours to run. Subsequent identical runs complete in just 1-2 hours. After clearing the linux system cache (echo 3 > /proc/sys/vm/drop_caches) the slow performance returns for the next run.

GPFS Server is 4.2.3-5 running on DDN hardware. CentOS 7.3
Default GPFS Client is 4.2.3-22. CentOS 7.9

We have tried a number of things including Spectrum Scale client version 5.0.5-9 which should have Sven's recent mmap performance improvements. Are the recent mmap performance improvements in the client code or the server code?

Only now do I notice a suggestion:

    mmchconfig prefetchAggressivenessRead=0 -i

I did not use this. Would a performance change be expected?

Would the pagepool size be involved in this?

Stuart Barkley
--
I've never been lost; I was once bewildered for three days, but never lost!
                                        -- Daniel Boone

From stockf at us.ibm.com Tue Oct 19 18:58:40 2021
From: stockf at us.ibm.com (Frederick Stock)
Date: Tue, 19 Oct 2021 17:58:40 +0000
Subject: [gpfsug-discuss] alphafold and mmap performance
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From jon at well.ox.ac.uk Tue Oct 19 19:12:34 2021
From: jon at well.ox.ac.uk (Jon Diprose)
Date: Tue, 19 Oct 2021 18:12:34 +0000
Subject: [gpfsug-discuss] alphafold and mmap performance
In-Reply-To:
References:
Message-ID:

Not that it answers Stuart's questions in any way, but we gave up on the same problem on a similar setup, rescued an old fileserver off the scrapheap (RAID6 of 12 x 7.2k rpm SAS on a PERC H710P) and just served the reference data by nfs - good enough to keep the compute busy rather than in cxiWaitEventWait. If there's significant demand for Alphafold then somebody's arm will be twisted for a new server with some NVMe. If I remember right, the reference data is ~2.3TB, ruling out our usual approach of just reading the problematic files into a ramdisk first.

We are also interested in hearing how it might be usably served from GPFS.

Thanks,

Jon

--
Dr. Jonathan Diprose                         Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN

From olaf.weiser at de.ibm.com Tue Oct 19 21:27:39 2021
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Tue, 19 Oct 2021 20:27:39 +0000
Subject: [gpfsug-discuss] alphafold and mmap performance
In-Reply-To:
References: ,
Message-ID:

An HTML attachment was scrubbed...
URL:

From stuartb at 4gh.net Thu Oct 21 00:19:51 2021
From: stuartb at 4gh.net (Stuart Barkley)
Date: Wed, 20 Oct 2021 19:19:51 -0400 (EDT)
Subject: [gpfsug-discuss] alphafold and mmap performance
In-Reply-To:
References: ,
Message-ID:

Thanks Olaf, Jon and Fred. Some more details below.

We may just need to wait on things to evolve (us getting Spectrum Scale 5 installed, alphafold getting HPC specific improvements). It will also be driven by whether our users have a real need for alphafold or are just enthusiastic due to the press releases.

On Tue, 19 Oct 2021 at 16:27 -0000, Olaf Weiser wrote:

> > [...] We have tried a number of things including Spectrum Scale
> > client version 5.0.5-9[...]

> in the client code or the server code?

Our main client code is 4.2.3-22 but I'm trying 5.0.5-9 on a test client.
The server code is (very old) 4.2.3-5.

> there are going multiple improvements in the code.. continuously...
> Since your version 4.2.3 / 5.0.5 a lot of them are in the area of
> NSD server/GNR (which is server based) and also a lot of
> enhancements went into the client part. Some are on both .. such as
> RoCE, or using multiple TCP/IP sockets per communication pair,
> etc.... All this influences your performance..

Thanks for the information. Some of this sounds good.

We had upgrade issues with DDN but we now have a license for Spectrum Scale 5. It's now mostly a matter of getting enough cycles to do the update.

> But Id like to try to give you some answers to your specific Q -
> > Only now do I notice a suggestion:
> >     mmchconfig prefetchAggressivenessRead=0 -i
> > I did not use this. Would a performance change be expected?

> YES;-) .. this parameter should really help..

I'm trying this now with the 5.0 client. Initial indications are that there may be about 50% performance improvement but that is still significantly lower than we would hope.

Using "mmdiag --iohist" we were seeing 750-900 8 sector reads per second.

With prefetchAggressivenessRead=0 it looks like the 8 sector reads are about as frequent but there are often (5-10/second) reads of 100-2000 sectors in the mix. A rough estimate is the large reads are for about the same amount of data as the 8 sector reads. The number of large sector reads seems to be decreasing over time.

I don't know the specifics of the algorithm but I imagine there is a lot of jumping around in the data. The early large reads may have brought in the more common regions and now it is filling the less dense regions. Just a thought.

> from the UG expert talk 2020 we shared some numbers/charts on it
> https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talks-update-on-performance-enhancements-in-spectrum-scale/
> starting ~ 8:30 minutes /
> just 2 slides ...
> let us know, if you need more information

Yes, I had looked at the slides but not listened to the talk, which was a mistake. There were some other interesting tidbits. In particular if we can get this to work we may try a scheduler prolog/epilog to change the parameter. We can look at that after our move from Grid Engine to Slurm which requires other cycles.

On Tue, 19 Oct 2021 at 14:12 -0000, Jon Diprose wrote:

> If I remember right, the reference data is ~2.3TB, ruling out our
> usual approach of just reading the problematic files into a ramdisk
> first.

We found the critical file is about 1.5TB and we are able to load that into ramdisk on a 2TB system (but it doesn't have any GPUs).

We also have some old "spare" hardware that might be built as an NFS appliance for this purpose. I would prefer to see the ~10 year old hardware die.

The alphafold application is one large monolith. The first phase does some large I/O and CPU intensive operations. The second phase does some GPU operations. We would prefer to separate the non-GPU code from the GPU code so we could have the GPU systems doing GPU stuff. We do this quite effectively with some of our other GPU code with CPU based pre/post processing.

Stuart
--
I've never been lost; I was once bewildered for three days, but never lost!
                                        -- Daniel Boone

From chair at spectrumscale.org Fri Oct 8 16:29:31 2021
From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair))
Date: Fri, 08 Oct 2021 16:29:31 +0100
Subject: [gpfsug-discuss] IBM Webinar: Spectrum Scale Information Lifecycle Management (ILM)
Message-ID:

Hi All,

IBM are running a Webinar on 20th October and 21st October titled:

"Spectrum Scale Information Lifecycle Management (ILM)"

which might be of interest to the group. Details and registration are at:

https://www.ibm.com/support/pages/node/6480851

The webinar will be running in two timezones, please check the web page for details.

Thanks

Simon
SSUG Group Chair
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ewahl at osc.edu Fri Oct 8 19:14:26 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Fri, 8 Oct 2021 18:14:26 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To:
References:
Message-ID:

This goes back as far as I can recall to <=GPFS 3.5 days. And no, I cannot recall what version of TSM-EE that was. But newline has been the only stopping point, for what seems like forever.
Having filed many an mmbackup bug, I don't recall ever crashing on filenames. (tons of OTHER reasons, but not character set) We even generate an error report from this and email users to fix it.
We accept basically almost everything else, and I have to say, we see some really crazy things sometimes. I think my current favorite is the full windows paths as a filename.
(eg: "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" )

Current IBM documentation doesn't go backwards past 4.2 but it says:

"For IBM Spectrum Scale file systems with special characters frequently used in the names of files or directories, backup failures might occur. Known special characters that require special handling include: *, ?, ", ', carriage return, and the new line character.

In such cases, enable the Tivoli Storage Manager client options WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used in backup activities and make sure that the mmbackup option --noquote is used when invoking mmbackup."

So maybe we could handle newlines somehow. But my lazy searches didn't show what TSM doesn't accept.

Ed Wahl
OSC

From anacreo at gmail.com Fri Oct 8 20:36:03 2021
From: anacreo at gmail.com (Alec)
Date: Fri, 8 Oct 2021 12:36:03 -0700
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To:
References:
Message-ID:

Why not just configure a file placement policy using a non-existent pool or a bad encryption key to prevent files with non-printable characters from even being created in the first place?

Alec
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ewahl at osc.edu Fri Oct 8 21:42:00 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Fri, 8 Oct 2021 20:42:00 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To:
References: ,
Message-ID:

Sadly the ESCAPE only works for EXTERNAL LISTs, correct?
Not sure that I can easily modify an EXTERNAL LIST to do what I want, which is a LIST policy using MISC_ATTRIBUTES and find all files without X, etc.

And using mmlsattr on hundreds of millions of files will take until the next millennium, so I really would like to stick with the policy engine. Perhaps I can do some "RULE 1 feeds RULE 2" type thing?

Sort of thing I'm looking at:

define( immut, MISC_ATTRIBUTES LIKE '%X%')
RULE 'listimmut' LIST 'not-immut' WHERE NOT (exclude_list) and NOT (immut)

Ed Wahl
OSC

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Olaf Weiser
Sent: Tuesday, October 5, 2021 2:10 AM
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Handling bad file names in policies?

Hi Ed,

not a ready to run for "everything".. but just to remind, there is an ESCAPE statement. By this you can

cat policy2
RULE EXTERNAL LIST 'allfiles' EXEC '/var/mmfs/etc/list.exe' ESCAPE '%/#'

and turn a file name into something that a policy can use.

I haven't used it for a while, but here is an example from a while ago .. ;-)

[root at c25m4n03 stupid_files]# ll
total 0
-rw-r--r-- 1 root root 21 Mar 22 03:44 d?mlicher filename
-rw-r--r-- 1 root root 2 Mar 22 03:59 ???????? spacefilen
[root at c25m4n03 stupid_files]#

policy:

101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename
101381 1945364096 0 -- /gpfs/fpofs/files/stupid_files/%C3%BC%C3%BC%C3%BC%C3%B6%C3%B6%C3%A4%C3%A4%3F%3F%3F%C3%9F%C3%9F%20spacefilename
[I]2013-03-22 at 13:12:58.687 Policy execution. 2 files dispatched.

verify with policy (ESCAPE '%/? ')

101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/d?mlicher filename
[...]

hope this helps..
cheers

From ewahl at osc.edu Fri Oct 8 21:44:14 2021
From: ewahl at osc.edu (Wahl, Edward)
Date: Fri, 8 Oct 2021 20:44:14 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To:
References:
Message-ID:

This is an interesting idea, but not at all what I was working towards, and is getting me off track.
(and I'm known to get distracted and explore interesting Rabbit Holes, red herrings, et al)

I've next to no issues with the filenames in day to day operations. On the positive side, this is a one off. What I need is a LIST policy, and the return leaves off the entire filename.

Ed Wahl

From anacreo at gmail.com Fri Oct 8 22:02:47 2021
From: anacreo at gmail.com (Alec)
Date: Fri, 8 Oct 2021 14:02:47 -0700
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To:
References:
Message-ID:

Well.... How about:

define(DISPLAY_NEWLINE,[CASE WHEN ($1) *HAS NEWLINE* THEN *REPLACE NEWLINE WITH ALTERNATE CHARACTER* ELSE varchar(1) END])

Define your show to have the DISPLAY_NEWLINE in place of the file name?

Sorry I don't know offhand how to do the find newline and replace newline sql string code, I don't have gpfs at home sadly.

-------------- next part --------------
An HTML attachment was scrubbed...
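For anyone picking up Olaf's ESCAPE suggestion from earlier in the thread: the list records come out percent-encoded, so whatever consumes them has to decode the path again. Below is a minimal Python sketch of that step. It assumes each record looks like the sample output quoted in the thread (inode, two further numeric fields, " -- ", then the encoded path). The function name decode_record and that record layout are assumptions for illustration, not taken from the Scale documentation.

```python
from urllib.parse import unquote_to_bytes


def decode_record(line):
    """Split one percent-encoded list record into (inode, raw_path_bytes)."""
    # Everything before " -- " is metadata; everything after is the path.
    meta, sep, encoded = line.rstrip("\n").partition(" -- ")
    if not sep:
        raise ValueError("no ' -- ' separator in record")
    inode = int(meta.split()[0])
    # Decode to bytes: a file name is a byte string and need not be valid UTF-8.
    return inode, unquote_to_bytes(encoded)


record = "101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename"
inode, path = decode_record(record)
# backslashreplace keeps the output printable even if the bytes are not UTF-8.
print(inode, path.decode("utf-8", errors="backslashreplace"))
```

Keeping the decoded path as bytes means control characters and odd code pages survive untouched until the script decides how to report them.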
URL:

From jonathan.buzzard at strath.ac.uk Sat Oct 9 10:09:22 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Sat, 9 Oct 2021 10:09:22 +0100
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To:
References:
Message-ID: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk>

On 08/10/2021 19:14, Wahl, Edward wrote:
> This goes back as far as I can recall to <=GPFS 3.5 days. And no, I
> cannot recall what version of TSM-EE that was. But newline has been
> the only stopping point, for what seems like forever. Having filed
> many an mmbackup bug, I don't recall ever crashing on filenames.
> (tons of OTHER reasons, but not character set) We even generate an
> error report from this and email users to fix it. We accept basically
> almost everything else, and I have to say, we see some really crazy
> things sometimes. I think my current favorite is the full windows
> paths as a filename. (eg:
> "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" )
>

I will have to do a test but I am sure newlines have worked just fine in the past. At the very least they have not stopped an entire backup from working when using dsmc incr.

Now mmbackup that's a different kettle of fish. If you have not seen mmbackup fail entirely because of a random "special" character you simply have not been using it long enough :-) For the longest of times I would simply not go anywhere near it because it was not fit for purpose.

>
> Current IBM documentation doesn't go backwards past 4.2 but it says:
>
> "For IBM Spectrum Scale™ file systems with special characters
> frequently used in the names of files or directories, backup failures
> might occur. Known special characters that require special handling
> include: *, ?, ", ', carriage return, and the new line character.
>
> In such cases, enable the Tivoli Storage Manager client options
> WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used
> in backup activities and make sure that the mmbackup option --noquote
> is used when invoking mmbackup."
>
> So maybe we could handle newlines somehow. But my lazy searches
> didn't show what TSM doesn't accept.
>

We strongly advise our users (our GPFS file system is for an HPC system) in training not to use "special" characters. That is followed with a warning that if they do then we don't make any promises to backup their files :-)

From time to time I run a dsmc incr in a screen and capture the output to a log file and then look at the list of failed files and prompt users to "fix" them. Though sometimes I just "fix" them myself if the correction is going to be obvious and then email them to tell them what has happened.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From s.j.thompson at bham.ac.uk Mon Oct 11 09:35:31 2021
From: s.j.thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 11 Oct 2021 08:35:31 +0000
Subject: [gpfsug-discuss] Handling bad file names in policies?
In-Reply-To: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk>
References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk>
Message-ID: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>

We have both:

WILDCARDSARELITERAL yes
QUOTESARELITERAL yes

Set. And use --noquote for mmbackup, the backup runs, but creates a file:

/filesystem/mmbackup.unsupported.CLIENTNAME

Which contains a list of files that are not backed up due to \n in the filename. So it doesn't break backup, but they don't get backed up either.

I believe this is because the TSM client can't back the file up rather than mmbackup no longer allowing them. I had an RFE at some point to get dsmc changed ... but it got closed WONTFIX.
Simon

From p.childs at qmul.ac.uk Mon Oct 11 09:55:45 2021
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 11 Oct 2021 08:55:45 +0000
Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies?
In-Reply-To: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>
References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>
Message-ID:

We've had this same issue with characters that are fine in Scale but Protect can't handle. Normally it's because some script has embedded a newline in the middle of a file name, and normally we end up renaming that file by inode number

find . -inum 9975226749 -exec mv {} badfilename \;

mostly because we can't even type the filename at the command prompt.

However, it's not always just newline characters; currently we've got a few files with unprintable characters in them. But it's normally fewer than 50 files every few months, so it is easy to handle manually.
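The rename-by-inode approach above can also be scripted, which avoids typing or quoting the awkward name at all. Here is a minimal Python sketch, assuming the bad entry sits directly in a known directory. The helper names rename_by_inode and sanitise are hypothetical, and replacing control characters with "_" is just one possible convention, not anything mmbackup or dsmc requires.

```python
import os


def sanitise(name: bytes) -> bytes:
    """Replace ASCII control characters (including newline) and DEL with '_'."""
    return bytes(b if (b >= 32 and b != 127) else 0x5F for b in name)


def rename_by_inode(directory: bytes, inode: int) -> bytes:
    """Rename the entry in `directory` that has this inode to a sanitised name.

    Uses bytes paths so names that are not valid UTF-8 survive intact.
    Returns the resulting name; raises FileNotFoundError if nothing matches.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.inode() != inode:
                continue
            new_name = sanitise(entry.name)
            if new_name != entry.name:
                target = os.path.join(directory, new_name)
                # Refuse to clobber an existing file with the cleaned-up name.
                if os.path.lexists(target):
                    raise FileExistsError(target)
                os.rename(entry.path, target)
            return new_name
    raise FileNotFoundError(f"no entry with inode {inode}")
```

Unlike the find command, this sketch only looks in one directory and does not recurse, so the directory has to be known already (for example from the backup log).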
I normally end up looking at /data/mmbackup.unsupported, which is the standard output from mmapplypolicy, and extracting the file names from it and emailing the users concerned to assist them in working out what went wrong.

I guess you could automate the parsing of this file at the end of the backup process and do something interesting with it.

Peter Childs

From jonathan.buzzard at strath.ac.uk Mon Oct 11 11:47:49 2021
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Mon, 11 Oct 2021 11:47:49 +0100
Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies?
In-Reply-To:
References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk>
Message-ID: <750aa707-8949-c416-c432-0b07cb8498f8@strath.ac.uk>

On 11/10/2021 09:55, Peter Childs wrote:
> We've had this same issue with characters that are fine in Scale but
> Protect can't handle. Normally its because some script has embedded a
> newline in the middle of a file name, and normally we end up renaming
> that file by inode number
>
> find . -inum 9975226749 -exec mv {} badfilename \;
>
> mostly because we can't even type the filename at the command
> prompt.
>

You can, it just requires know-how. I will freely admit it took me a long time to work out how to do it. The dirty alternative that sometimes works is to use wildcards.

What gets me is I have never created a single file with "problem" characters in the filename in over 30 years of computing. Well, apart from deliberately trying to work out how the hell you do it, and it's not easy.

I think the most likely answer for newlines in file names is cut and paste into a file save dialogue box.

> However its not always just new line characters currently we've got a
> few files with unprintable characters in it. but its normally less
> than 50 files every few months, so is easy to handle manually.
Mostly I find the non-newline issues are down to "foreigners" using something other than UTF-8 (aka random stupid Windows code pages) to give files names in their native language. You can usually work out what the filename is supposed to be once you know the nationality of the file owner.

Again I think this happens due to cut and paste from text documents in non-UTF-8 encodings. So for example take something Cyrillic in codepage 1251, copy and paste it into a file save dialogue box and end up with a filename containing unprintable characters.

> I normally end up looking at /data/mmbackup.unsupported which is the
> standard output from mmapplypolicy and extracting the file names from
> it and emailing the users concerned to assist them in working out
> what went wrong.
>
> I guess you could automate the parsing of this file at the end of the
> backup process and do something interesting with it.
>

Email the owner of the file and tell them it's not being backed up and won't be till they "fix" the file name so that backup software can process it. If it is just a newline I would be tempted to have them automatically renamed sans the newline, and then send the file owner an email (per file) letting them know what has happened. If their inbox is spammed that will hopefully prompt them to stop doing it :-)

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From stuartb at 4gh.net Tue Oct 19 18:16:54 2021
From: stuartb at 4gh.net (Stuart Barkley)
Date: Tue, 19 Oct 2021 13:16:54 -0400 (EDT)
Subject: [gpfsug-discuss] alphafold and mmap performance
Message-ID:

Over the years there have been several discussions about performance problems with mmap() on GPFS/Spectrum Scale.

We are currently having problems with mmap() performance on our systems with new alphafold protein folding software. Things look similar to previous times we have had mmap() problems.
The software component "hhblits" appears to mmap a large file with genomic data and then does random reads throughout the file. GPFS appears to be doing 4K reads for each block, limiting the performance.

The first run takes 20+ hours to run. Subsequent identical runs complete in just 1-2 hours. After clearing the Linux system cache (echo 3 > /proc/sys/vm/drop_caches) the slow performance returns for the next run.

GPFS Server is 4.2.3-5 running on DDN hardware. CentOS 7.3
Default GPFS Client is 4.2.3-22. CentOS 7.9

We have tried a number of things including Spectrum Scale client version 5.0.5-9 which should have Sven's recent mmap performance improvements. Are the recent mmap performance improvements in the client code or the server code?

Only now do I notice a suggestion:

mmchconfig prefetchAggressivenessRead=0 -i

I did not use this. Would a performance change be expected?

Would the pagepool size be involved in this?

Stuart Barkley
--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone

From stockf at us.ibm.com Tue Oct 19 18:58:40 2021
From: stockf at us.ibm.com (Frederick Stock)
Date: Tue, 19 Oct 2021 17:58:40 +0000
Subject: [gpfsug-discuss] alphafold and mmap performance
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From jon at well.ox.ac.uk Tue Oct 19 19:12:34 2021
From: jon at well.ox.ac.uk (Jon Diprose)
Date: Tue, 19 Oct 2021 18:12:34 +0000
Subject: [gpfsug-discuss] alphafold and mmap performance
In-Reply-To:
References:
Message-ID:

Not that it answers Stuart's questions in any way, but we gave up on the same problem on a similar setup, rescued an old fileserver off the scrapheap (RAID6 of 12 x 7.2k rpm SAS on a PERC H710P) and just served the reference data by NFS - good enough to keep the compute busy rather than in cxiWaitEventWait. If there's significant demand for Alphafold then somebody's arm will be twisted for a new server with some NVMe. If I remember right, the reference data is ~2.3TB, ruling out our usual approach of just reading the problematic files into a ramdisk first.

We are also interested in hearing how it might be usably served from GPFS.

Thanks,

Jon

--
Dr. Jonathan Diprose Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN

________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Stuart Barkley [stuartb at 4gh.net]
Sent: 19 October 2021 18:16
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] alphafold and mmap performance

Over the years there have been several discussions about performance problems with mmap() on GPFS/Spectrum Scale.

We are currently having problems with mmap() performance on our systems with new alphafold protein folding software. Things look similar to previous times we have had mmap() problems.

The software component "hhblits" appears to mmap a large file with genomic data and then does random reads throughout the file. GPFS appears to be doing 4K reads for each block, limiting the performance.

The first run takes 20+ hours to run. Subsequent identical runs complete in just 1-2 hours. After clearing the Linux system cache (echo 3 > /proc/sys/vm/drop_caches) the slow performance returns for the next run.

GPFS Server is 4.2.3-5 running on DDN hardware. CentOS 7.3
Default GPFS Client is 4.2.3-22. CentOS 7.9

We have tried a number of things including Spectrum Scale client version 5.0.5-9 which should have Sven's recent mmap performance improvements. Are the recent mmap performance improvements in the client code or the server code?

Only now do I notice a suggestion:

mmchconfig prefetchAggressivenessRead=0 -i

I did not use this. Would a performance change be expected?

Would the pagepool size be involved in this?
Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Tue Oct 19 21:27:39 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 19 Oct 2021 20:27:39 +0000 Subject: [gpfsug-discuss] alphafold and mmap performance In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Thu Oct 21 00:19:51 2021 From: stuartb at 4gh.net (Stuart Barkley) Date: Wed, 20 Oct 2021 19:19:51 -0400 (EDT) Subject: [gpfsug-discuss] alphafold and mmap performance In-Reply-To: References: , Message-ID: Thanks Olaf, Jon and Fred. Some more details below. We may just need to wait on things to evolve (us getting Spectrum Scale 5 installed, alphafold getting HPC specific improvements). It will also be driven by whether our users have a real need for alphafold or are just enthusiastic due to the press releases. On Tue, 19 Oct 2021 at 16:27 -0000, Olaf Weiser wrote: > > [...] We have tried a number of things including Spectrum Scale > > client version 5.0.5-9[...] > in the client code or the server code? Our main client code is 4.2.3-22 but I'm trying 5.0.5-9 on a test client. The server code is (very old) 4.2.3-5. > there are going multiple improvements in the code.. continuously... > Since your version 4.2.3 / 5.0.5 a lot of them are in the area of > NSD server/GNR (which is server based) and also a lot of > enhancements went into the client part. Some are on both .. such as > RoCE, or using multiple TCP/IP sockets per communication pair, > etc.... All this influences your performance.. Thanks for the information. Some of this sounds good. We had upgrade issues with DDN but we now have a license for Spectrum Scale 5. It's now mostly getting enough cycles to do the update.
> But I'd like to try to give you some answers to your specific Q - > > Only now do I notice a suggestion: > > mmchconfig prefetchAggressivenessRead=0 -i > > I did not use this. Would a performance change be expected? > YES;-) .. this parameter should really help.. I'm trying this now with the 5.0 client. Initial indications are that there may be about 50% performance improvement but that is still significantly lower than we would hope. Using "mmdiag --iohist" we were seeing 750-900 8 sector reads per second. With prefetchAggressivenessRead=0 it looks like the 8 sector reads are about as frequent but there are often (5-10/second) reads of 100-2000 sectors in the mix. A rough estimate is the large reads are for about the same amount of data as the 8 sector reads. The number of large sector reads seems to be decreasing over time. I don't know the specifics of the algorithm but I imagine there is a lot of jumping around in the data. The early large reads may have brought in the more common regions and now it is filling the less dense regions. Just a thought. > from the UG expert talk 2020 we shared some numbers/charts on it > https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talks-update-on-performance-enhancements-in-spectrum-scale/ > starting ~ 8:30 minutes / just 2 slides ... let us know, if you need more information Yes, I had looked at the slides but not listened to the talk which was a mistake. There were some other interesting tidbits. In particular if we can get this to work we may try a scheduler prolog/epilog to change the parameter. We can look at that after our move from Grid Engine to Slurm which requires other cycles. On Tue, 19 Oct 2021 at 14:12 -0000, Jon Diprose wrote: > If I remember right, the reference data is ~2.3TB, ruling out our > usual approach of just reading the problematic files into a ramdisk > first.
We found the critical file is about 1.5TB and we are able to load that into ramdisk on a 2TB system (but it doesn't have any GPUs). We also have some old "spare" hardware that might be built as an NFS appliance for this purpose. I would prefer to see the ~10 year old hardware die. The alphafold application is one large monolith. The first phase does some large I/O and CPU intensive operations. The second phase does some GPU operations. We would prefer to separate the non-GPU code from the GPU code so we could have the GPU systems doing GPU stuff. We do this quite effectively with some of our other GPU code with CPU based pre/post processing. Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From ewahl at osc.edu Mon Oct 4 23:23:59 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 4 Oct 2021 22:23:59 +0000 Subject: [gpfsug-discuss] Handling bad file names in policies? Message-ID: I know I've run into this before way back, but my notes on how I solved this aren't getting the job done in Scale 5.0.5.8 and my notes are from 3.5. ? Anyone know a way to get a LIST policy to properly feed bad filenames into the output or an external script? When I say bad I mean things like control characters, spaces, etc. Not concerned about the dreaded 'newline' as we force users to fix those or the files do not get backed up in Tivoli. Ed Wahl OSC -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 5 00:28:57 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 5 Oct 2021 00:28:57 +0100 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: Message-ID: On 04/10/2021 23:23, Wahl, Edward wrote: > I know I've run into this before way back, but my notes on how I solved > this aren't getting the job done in Scale 5.0.5.8 and my notes are from > 3.5.? ? 
> Anyone know a way to get a LIST policy to properly feed bad filenames > into the output or an external script? > > When I say bad I mean things like control characters, spaces, etc.? ?Not > concerned?about the dreaded 'newline' as we force users to fix those or > the files do not get backed up in Tivoli. > Since when? Last time I checked which was admittedly circa 2008, TSM would backup files with newlines in them no problem. mmbackup on the other hand in that time frame would simply die and backup nothing if there was a single file on the file system with a newline in it. I would take a look at the mmbackup scripts which can handle such stuff (least ways in >4.2) which would also suggest dsmc can handle it. As an aside I now think I know how you end up with newlines in file names. Basically you cut and paste the file name complete with newlines (most likely at the end) into a text field when saving the file. Personally I think any program should baulk at that point but what do I know. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From olaf.weiser at de.ibm.com Tue Oct 5 07:10:26 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 5 Oct 2021 06:10:26 +0000 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From peter.chase at metoffice.gov.uk Tue Oct 5 11:00:17 2021 From: peter.chase at metoffice.gov.uk (Chase, Peter) Date: Tue, 5 Oct 2021 10:00:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 117, Issue 1 Message-ID: Morning Ed, I'm not sure how useful this would be if you're wanting to hunt for bad file names, but in the past we've used the built in HEX function to convert problem strings to hex and have an external script convert it back into ASCII/Unicode (whatever it should be). 
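The hex round trip described here could be handled by a small external script. What follows is only a minimal Python sketch, not the Met Office script: it assumes the policy writes each file name as a plain string of hexadecimal digits (for example via the HEX function mentioned above) and that names are normally UTF-8. The function names decode_hex_name and printable are hypothetical.

```python
def decode_hex_name(hex_name: str) -> bytes:
    """Convert a hex-encoded file name back into the raw bytes of the name."""
    # bytes.fromhex accepts upper or lower case digits and ignores whitespace.
    return bytes.fromhex(hex_name.strip())


def printable(raw: bytes) -> str:
    """Render raw name bytes safely for a report or an email to the file owner.

    Bytes that are not valid UTF-8 become \\xNN escapes, and control
    characters such as newline become their escaped form (\\n).
    """
    text = raw.decode("utf-8", errors="backslashreplace")
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )


if __name__ == "__main__":
    raw = decode_hex_name("64C3A46D6C6963686572")
    print(printable(raw))
```

Keeping the name as bytes until the last moment means the script can still rename or stat the file even when the name is not valid in any encoding; only the report uses the escaped form.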
That way all the intelligence goes into an external script and there's no digging around in ILM to find a solution. I don't have an example to hand, but if you're interested in the approach I can probably get one for you. Regards, Pete Chase Met Office SVM team
From chair at spectrumscale.org Fri Oct 8 16:29:31 2021 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Fri, 08 Oct 2021 16:29:31 +0100 Subject: [gpfsug-discuss] IBM Webinar: Spectrum Scale Information Lifecycle Management (ILM) Message-ID: Hi All, IBM are running a Webinar on 20th October and 21st October titled: "Spectrum Scale Information Lifecycle Management (ILM)", which might be of interest to the group. Details and registration are at: https://www.ibm.com/support/pages/node/6480851 The webinar will be running in two timezones, please check the web page for details. Thanks Simon SSUG Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Fri Oct 8 19:14:26 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 8 Oct 2021 18:14:26 +0000 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: Message-ID: This goes back as far as I can recall to <=GPFS 3.5 days. And no, I cannot recall what version of TSM-EE that was. But newline has been the only stopping point, for what seems like forever. Having filed many an mmbackup bug, I don't recall ever crashing on filenames. (tons of OTHER reasons, but not character set) We even generate an error report from this and email users to fix it.
We accept basically almost everything else, and I have to say, we see some really crazy things sometimes. I think my current favorite is the full Windows path as a filename. (eg: "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" ) Current IBM documentation doesn't go backwards past 4.2 but it says: "For IBM Spectrum Scale file systems with special characters frequently used in the names of files or directories, backup failures might occur. Known special characters that require special handling include: *, ?, ", ', carriage return, and the new line character. In such cases, enable the Tivoli Storage Manager client options WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used in backup activities and make sure that the mmbackup option --noquote is used when invoking mmbackup." So maybe we could handle newlines somehow. But my lazy searches didn't show what TSM doesn't accept. Ed Wahl OSC -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Monday, October 4, 2021 7:29 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Handling bad file names in policies? On 04/10/2021 23:23, Wahl, Edward wrote: > I know I've run into this before way back, but my notes on how I > solved this aren't getting the job done in Scale 5.0.5.8 and my notes > are from 3.5. > Anyone know a way to get a LIST policy to properly feed bad filenames > into the output or an external script? > > When I say bad I mean things like control characters, spaces, etc. > Not concerned about the dreaded 'newline' as we force users to fix > those or the files do not get backed up in Tivoli. > Since when? Last time I checked which was admittedly circa 2008, TSM would backup files with newlines in them no problem. mmbackup on the other hand in that time frame would simply die and backup nothing if there was a single file on the file system with a newline in it.
I would take a look at the mmbackup scripts which can handle such stuff (least ways in >4.2) which would also suggest dsmc can handle it. As an aside I now think I know how you end up with newlines in file names. Basically you cut and paste the file name complete with newlines (most likely at the end) into a text field when saving the file. Personally I think any program should baulk at that point but what do I know. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From anacreo at gmail.com Fri Oct 8 20:36:03 2021 From: anacreo at gmail.com (Alec) Date: Fri, 8 Oct 2021 12:36:03 -0700 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: Message-ID: Why not just configure a file placement policy using a nonexistent pool or a bad encryption key to prevent files with non-printable characters from even being created in the first place? Alec On Fri, Oct 8, 2021, 11:49 AM Wahl, Edward wrote: > This goes back as far as I can recall to <=GPFS 3.5 days. And no, I cannot > recall what version of TSM-EE that was. But newline has been the only > stopping point, for what seems like forever. > Having filed many an mmbackup bug, I don't recall ever crashing on > filenames. (tons of OTHER reasons, but not character set) We even > generate an error report from this and email users to fix it. > We accept basically almost everything else, and I have to say, we see some > really crazy things sometimes. I think my current favorite is the full > windows paths as a filename.
> (eg: "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" ) > > > Current IBM documentation doesn't go backwards past 4.2 but it says: > > "For IBM Spectrum Scale? file systems with special characters frequently > used in the names of files or directories, backup failures might occur. > Known special characters that require special handling include: *, ?, ", ?, > carriage return, and the new line character. > > In such cases, enable the Tivoli Storage Manager client options > WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used in > backup activities and make sure that the mmbackup option --noquote is used > when invoking mmbackup." > > So maybe we could handle newlines somehow. But my lazy searches didn't > show what TSM doesn't accept. > > Ed Wahl > OSC > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Jonathan Buzzard > Sent: Monday, October 4, 2021 7:29 PM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Handling bad file names in policies? > > On 04/10/2021 23:23, Wahl, Edward wrote: > > > I know I've run into this before way back, but my notes on how I > > solved this aren't getting the job done in Scale 5.0.5.8 and my notes > > are from 3.5. ? > > Anyone know a way to get a LIST policy to properly feed bad filenames > > into the output or an external script? > > > > When I say bad I mean things like control characters, spaces, etc. > > Not concerned about the dreaded 'newline' as we force users to fix > > those or the files do not get backed up in Tivoli. > > > > Since when? Last time I checked which was admittedly circa 2008, TSM would > backup files with newlines in them no problem. mmbackup on the other hand > in that time frame would simply die and backup nothing if there was a > single file on the file system with a newline in it. 
> > I would take a look at the mmbackup scripts which can handle such stuff > (least ways in >4.2) which would also suggest dsmc can handle it. > > As an aside I now think I know how you end up with newlines in file names. > Basically you cut and paste the file name complete with newlines (most > likely at the end) into a text field when saving the file. > Personally I think any program should baulk at that point but what do I > know. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Fri Oct 8 21:42:00 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 8 Oct 2021 20:42:00 +0000 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: , Message-ID: Sadly the ESCAPE only works for EXTERNAL LISTs, correct? Not sure that I can easily modify an EXTERNAL LIST to do what I want, which is a LIST policy using MISC_ATTRIBUTES to find all files without X, etc. And using mmlsattr on hundreds of millions of files will take until the next millennium, so I really would like to stick with the policy engine. Perhaps I can do some RULE 1 feeds RULE 2 type thing?
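Where a list is written with an ESCAPE clause, as in the example quoted in this thread, the percent-encoded path names can be decoded again outside the policy engine. A minimal Python sketch, assuming each record consists of three numeric fields, then " -- ", then the escaped path (the layout shown in that example) and that no SHOW output sits between them; parse_record is a hypothetical helper name.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def parse_record(line: str) -> Tuple[int, bytes]:
    """Split one list record into (inode number, raw path bytes)."""
    # Everything before the first " -- " is the numeric prefix; with spaces
    # escaped as %20 the separator cannot appear inside the path itself.
    fields, _, escaped = line.rstrip("\n").partition(" -- ")
    inode = int(fields.split()[0])
    # Undo the %XX escaping. The result stays as bytes because file names
    # are not guaranteed to be valid UTF-8.
    return inode, unquote_to_bytes(escaped)


if __name__ == "__main__":
    record = ("101378 247907919 0 -- "
              "/gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename")
    inode, path = parse_record(record)
    print(inode, path)
```

Because the escaped form contains no spaces or control characters, each record can be read line by line without any special quoting.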
Sort of thing I'm looking at:

define( immut, MISC_ATTRIBUTES LIKE '%X%')
RULE 'listimmut' LIST 'not-immut' WHERE NOT (exclude_list) and NOT (immut)

Ed Wahl
OSC

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Olaf Weiser
Sent: Tuesday, October 5, 2021 2:10 AM
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Handling bad file names in policies?

Hi Ed,

not a ready-to-run answer for "everything".. but just as a reminder, there is an ESCAPE statement. With it you can, for example,

cat policy2
RULE EXTERNAL LIST 'allfiles' EXEC '/var/mmfs/etc/list.exe' ESCAPE '%/#'

and turn a file name into something that a policy can use.

I haven't used it for a while, but here is an example from a while ago .. ;-)

[root at c25m4n03 stupid_files]# ll
total 0
-rw-r--r-- 1 root root 21 Mar 22 03:44 dämlicher filename
-rw-r--r-- 1 root root 2 Mar 22 03:59 ???????? spacefilen
[root at c25m4n03 stupid_files]#

policy:

101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename
101381 1945364096 0 -- /gpfs/fpofs/files/stupid_files/%C3%BC%C3%BC%C3%BC%C3%B6%C3%B6%C3%A4%C3%A4%3F%3F%3F%C3%9F%C3%9F%20spacefilename
[I]2013-03-22 at 13:12:58.687 Policy execution. 2 files dispatched.

verify with policy (ESCAPE '%/? ')

101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/dämlicher filename
[...]

hope this helps..
cheers

----- Original Message -----
From: "Jonathan Buzzard"
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug-discuss at spectrumscale.org
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Handling bad file names in policies?
Date: Tue, 5 Oct 2021 01:29

On 04/10/2021 23:23, Wahl, Edward wrote: > I know I've run into this before way back, but my notes on how I solved > this aren't getting the job done in Scale 5.0.5.8 and my notes are from > 3.5. > Anyone know a way to get a LIST policy to properly feed bad filenames > into the output or an external script?
> > When I say bad I mean things like control characters, spaces, etc. Not > concerned about the dreaded 'newline' as we force users to fix those or > the files do not get backed up in Tivoli. > Since when? Last time I checked which was admittedly circa 2008, TSM would backup files with newlines in them no problem. mmbackup on the other hand in that time frame would simply die and backup nothing if there was a single file on the file system with a newline in it. I would take a look at the mmbackup scripts which can handle such stuff (least ways in >4.2) which would also suggest dsmc can handle it. As an aside I now think I know how you end up with newlines in file names. Basically you cut and paste the file name complete with newlines (most likely at the end) into a text field when saving the file. Personally I think any program should baulk at that point but what do I know. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Fri Oct 8 21:44:14 2021 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 8 Oct 2021 20:44:14 +0000 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: Message-ID: This is an interesting idea, but not at all what I was working towards, and is getting me off track. (and I'm known to get distracted and explore interesting Rabbit Holes, red herrings, et al) I've next to no issues with the filenames in day to day operations. On the positive side, this is a one off. What I need is a LIST policy, and the return leaves off the entire filename. 
Ed Wahl ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Alec Sent: Friday, October 8, 2021 3:36 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Handling bad file names in policies? Why not just configure a file placement policy using a non existent pool or a bad encryption key to prevent files with non-printables characters from even being created in the first place. Alec On Fri, Oct 8, 2021, 11:49 AM Wahl, Edward > wrote: This goes back as far as I can recall to <=GPFS 3.5 days. And no, I cannot recall what version of TSM-EE that was. But newline has been the only stopping point, for what seems like forever. Having filed many an mmbackup bug, I don't recall ever crashing on filenames. (tons of OTHER reasons, but not character set) We even generate an error report from this and email users to fix it. We accept basically almost everything else, and I have to say, we see some really crazy things sometimes. I think my current favorite is the full windows paths as a filename. (eg: "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" ) Current IBM documentation doesn't go backwards past 4.2 but it says: "For IBM Spectrum Scale? file systems with special characters frequently used in the names of files or directories, backup failures might occur. Known special characters that require special handling include: *, ?, ", ?, carriage return, and the new line character. In such cases, enable the Tivoli Storage Manager client options WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used in backup activities and make sure that the mmbackup option --noquote is used when invoking mmbackup." So maybe we could handle newlines somehow. But my lazy searches didn't show what TSM doesn't accept. 
Ed Wahl OSC -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard Sent: Monday, October 4, 2021 7:29 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Handling bad file names in policies? On 04/10/2021 23:23, Wahl, Edward wrote: > I know I've run into this before way back, but my notes on how I > solved this aren't getting the job done in Scale 5.0.5.8 and my notes > are from 3.5. ? > Anyone know a way to get a LIST policy to properly feed bad filenames > into the output or an external script? > > When I say bad I mean things like control characters, spaces, etc. > Not concerned about the dreaded 'newline' as we force users to fix > those or the files do not get backed up in Tivoli. > Since when? Last time I checked which was admittedly circa 2008, TSM would backup files with newlines in them no problem. mmbackup on the other hand in that time frame would simply die and backup nothing if there was a single file on the file system with a newline in it. I would take a look at the mmbackup scripts which can handle such stuff (least ways in >4.2) which would also suggest dsmc can handle it. As an aside I now think I know how you end up with newlines in file names. Basically you cut and paste the file name complete with newlines (most likely at the end) into a text field when saving the file. Personally I think any program should baulk at that point but what do I know. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!nVH69Xr88S0X5DmO8QbaI7eozd9pDvmtMN40tZU8vWuduEF4J01ZTfnypvOy$ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacreo at gmail.com Fri Oct 8 22:02:47 2021 From: anacreo at gmail.com (Alec) Date: Fri, 8 Oct 2021 14:02:47 -0700 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: Message-ID: Well.... How about: define(DISPLAY_NEWLINE,[CASE WHEN ($1) *HAS NEWLINE* THEN *REPLACE NEWLINE WITH ALTERNATE CHARACTER* ELSE varchar(1) END]) Define your show to have the DISPLAY_NEWLINE in place of the file name? Sorry I don't know offhand how to do the find newline and replace newline sql string code, I don't have gpfs at home sadly. On Fri, Oct 8, 2021, 1:42 PM Wahl, Edward wrote: > Sadly the ESCAPE only works for EXTERNAL LISTs, correct? Not sure that > I can easily modify an EXERNAL LIST to do what I want, which is a LIST > policy using MISC_ATTRIBUTES and find all files without X, etc. > > And using mmlsattr on hundreds of millions of files will take until the > next millennium, so I really would like to stick with the policy engine. Perhaps > I can do some RULE 1 feeds RULE 2 type thing? 
> > Sort of thing I'm looking at: > > define( immut, MISC_ATTRIBUTES LIKE '%X%') > > RULE 'listimmut' LIST 'not-immut' WHERE NOT (exclude_list) and NOT (immut) > > Ed Wahl > > OSC > > *From:* gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> *On Behalf Of *Olaf Weiser > *Sent:* Tuesday, October 5, 2021 2:10 AM > *To:* gpfsug-discuss at spectrumscale.org > *Cc:* gpfsug-discuss at spectrumscale.org > *Subject:* Re: [gpfsug-discuss] Handling bad file names in policies? > > Hi Ed, > > not a ready-to-run answer for "everything", but just a reminder that there is an > ESCAPE statement. > > With it you can > > cat policy2 > RULE EXTERNAL LIST 'allfiles' EXEC '/var/mmfs/etc/list.exe' ESCAPE '%/#' > > and turn a file name into something a policy can use. > > I haven't used it for a while, but here is an example from a while ago .. > ;-) > > [root at c25m4n03 stupid_files]# ll > total 0 > -rw-r--r-- 1 root root 21 Mar 22 03:44 dämlicher filename > -rw-r--r-- 1 root root 2 Mar 22 03:59 ???????? spacefilen > [root at c25m4n03 stupid_files]# > > policy: > > 101378 247907919 0 -- > /gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename > 101381 1945364096 0 -- > /gpfs/fpofs/files/stupid_files/%C3%BC%C3%BC%C3%BC%C3%B6%C3%B6%C3%A4%C3%A4%3F%3F%3F%C3%9F%C3%9F%20spacefilename > [I]2013-03-22 at 13:12:58.687 Policy execution. 2 files dispatched. > > verify with policy (ESCAPE '%/? ') > > 101378 247907919 0 -- /gpfs/fpofs/files/stupid_files/dämlicher filename > [...] > > hope this helps.. > > cheers
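A note on consuming the ESCAPE output quoted earlier in this message: the percent-encoded paths can be turned back into real file names by an external script. The sketch below is an illustration only. It assumes each record has the shape shown in the quoted output (inode, generation, snapshot id, then " -- " and the encoded path) and that names are normally UTF-8. The function name decode_policy_line is hypothetical. Bytes that are not valid UTF-8 are kept via surrogateescape rather than dropped.

```python
from urllib.parse import unquote_to_bytes


def decode_policy_line(line):
    """Split one list record at ' -- ' and percent-decode the path.

    Assumed record shape (taken from the quoted example output):
        <inode> <generation> <snapid> -- <percent-encoded path>
    Returns (fields, path), where fields is a list of strings.
    """
    head, sep, encoded = line.rstrip("\n").partition(" -- ")
    if not sep:
        raise ValueError("no ' -- ' separator in record: %r" % line)
    raw = unquote_to_bytes(encoded)
    # surrogateescape keeps invalid UTF-8 bytes round-trippable
    return head.split(), raw.decode("utf-8", "surrogateescape")


fields, path = decode_policy_line(
    "101378 247907919 0 -- "
    "/gpfs/fpofs/files/stupid_files/d%C3%A4mlicher%20filename"
)
print(fields, ascii(path))
```

To hand the name back to the operating system, encode it again with path.encode("utf-8", "surrogateescape") so that the original bytes are reproduced exactly.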
From jonathan.buzzard at strath.ac.uk Sat Oct 9 10:09:22 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 9 Oct 2021 10:09:22 +0100 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: References: Message-ID: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> On 08/10/2021 19:14, Wahl, Edward wrote: > This goes back as far as I can recall to <=GPFS 3.5 days. And no, I > cannot recall what version of TSM-EE that was. But newline has been > the only stopping point, for what seems like forever. Having filed > many an mmbackup bug, I don't recall ever crashing on filenames. > (tons of OTHER reasons, but not character set) We even generate an > error report from this and email users to fix it. We accept basically > almost everything else, and I have to say, we see some really crazy > things sometimes. I think my current favorite is the full windows > paths as a filename. (eg: > "Y:\Temp\temp\290\work\0\Material_ERTi-5.in" ) > I will have to do a test but I am sure newlines have worked just fine in the past. At the very least they have not stopped an entire backup from working when using dsmc incr. Now mmbackup that's a different kettle of fish. If you have not seen mmbackup fail entirely because of a random "special" character you simply have not been using it long enough :-) For the longest of times I would simply not go anywhere near it because it was not fit for purpose. > > Current IBM documentation doesn't go backwards past 4.2 but it says: > > "For IBM Spectrum Scale™ file systems with special characters > frequently used in the names of files or directories, backup failures > might occur. Known special characters that require special handling > include: *, ?, ", ', carriage return, and the new line character.
> > In such cases, enable the Tivoli Storage Manager client options > WILDCARDSARELITERAL and QUOTESARELITERAL on all nodes that are used > in backup activities and make sure that the mmbackup option --noquote > is used when invoking mmbackup." > > So maybe we could handle newlines somehow. But my lazy searches > didn't show what TSM doesn't accept. > We strongly advise our users (our GPFS file system is for an HPC system) in training not to use "special" characters. That is followed with a warning that if they do then we don't make any promises to backup their files :-) From time to time I run a dsmc incr in a screen and capture the output to a log file and then look at the list of failed files and prompt users to "fix" them. Though sometimes I just "fix" them myself if the correction is going to be obvious and then email them to tell them what has happened. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From s.j.thompson at bham.ac.uk Mon Oct 11 09:35:31 2021 From: s.j.thompson at bham.ac.uk (Simon Thompson) Date: Mon, 11 Oct 2021 08:35:31 +0000 Subject: [gpfsug-discuss] Handling bad file names in policies? In-Reply-To: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> Message-ID: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk> We have both: WILDCARDSARELITERAL yes QUOTESARELITERAL yes Set. And use --noquote for mmbackup, the backup runs, but creates a file: /filesystem/mmbackup.unsupported.CLIENTNAME Which contains a list of files that are not backed up due to \n in the filename. So it doesn't break backup, but they don't get backed up either. I believe this is because the TSM client can't back the file up rather than mmbackup no longer allowing them. I had an RFE at some point to get dsmc changed ... but it got closed WONTFIX. 
Simon From p.childs at qmul.ac.uk Mon Oct 11 09:55:45 2021 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 11 Oct 2021 08:55:45 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies? In-Reply-To: <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk> References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk> Message-ID: We've had this same issue with characters that are fine in Scale but Protect can't handle. Normally it's because some script has embedded a newline in the middle of a file name, and normally we end up renaming that file by inode number find . -inum 9975226749 -exec mv {} badfilename \; mostly because we can't even type the filename at the command prompt. However, it's not always just newline characters; currently we've got a few files with unprintable characters in their names. But it's normally fewer than 50 files every few months, so it is easy to handle manually.
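The rename-by-inode step above can also be scripted, which avoids ever typing or quoting the bad name. The following is a minimal sketch under stated assumptions, not a tested tool: the function name rename_by_inode is hypothetical, it only looks at entries directly inside the given directory, and if several hard links in that directory share the inode it renames whichever one is seen first. Names are handled as bytes so that newlines or invalid UTF-8 cause no trouble.

```python
import os


def rename_by_inode(directory, inode, new_name):
    """Rename the entry in `directory` whose inode number is `inode`.

    Returns the old name as bytes, or None if no entry matched.
    Refuses to overwrite an existing entry called `new_name`.
    """
    directory = os.fsencode(directory)
    target = os.path.join(directory, os.fsencode(new_name))
    if os.path.lexists(target):
        raise FileExistsError(os.fsdecode(target))
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.inode() == inode:
                os.rename(entry.path, target)
                return entry.name
    return None
```

For the case above this would be called as rename_by_inode(".", 9975226749, "badfilename"). Returning the old name as bytes makes it easy to log exactly what was changed before emailing the file owner.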
I normally end up looking at /data/mmbackup.unsupported which is the standard output from mmapplypolicy and extracting the file names from it and emailing the users concerned to assist them in working out what went wrong. I guess you could automate the parsing of this file at the end of the backup process and do something interesting with it. Peter Childs From jonathan.buzzard at strath.ac.uk Mon Oct 11 11:47:49 2021 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 11 Oct 2021 11:47:49 +0100 Subject: [gpfsug-discuss] [EXTERNAL] Re: Handling bad file names in policies? In-Reply-To: References: <1c88a766-77e3-0ccb-8377-66df7f144003@strath.ac.uk> <82703BA1-6C53-4B18-828E-96EB2122F1E5@bham.ac.uk> Message-ID: <750aa707-8949-c416-c432-0b07cb8498f8@strath.ac.uk> On 11/10/2021 09:55, Peter Childs wrote: > We've had this same issue with characters that are fine in Scale but > Protect can't handle. Normally its because some script has embedded a > newline in the middle of a file name, and normally we end up renaming > that file by inode number > > find . -inum 9975226749 -exec mv {} badfilename \; > > mostly because we can't even type the filename at the command > prompt. > You can; it just requires know-how. I will freely admit it took me a long time to work out how to do it. The dirty alternative that sometimes works is to use wildcards. What gets me is I have never created a single file with "problem" characters in the filename in over 30 years of computing. Well apart from deliberately trying to work out how the hell you do it, and it's not easy. I think the most likely answer for newlines in file names is cut and paste into a file save dialogue box. > However its not always just new line characters currently we've got a > few files with unprintable characters in it. but its normally less > than 50 files every few months, so is easy to handle manually.
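As a rough illustration of what "problem" names means in this thread, the sketch below flags names that contain a newline or carriage return, other control characters, or bytes that are not valid UTF-8. The names classify_name and scan and the three labels are hypothetical. It walks the tree with os.walk, so it is only meant for spot checks on a subtree and is not a substitute for the policy engine on hundreds of millions of files.

```python
import os


def classify_name(name):
    """Return a list of problems found in one file name given as bytes."""
    problems = []
    if b"\n" in name or b"\r" in name:
        problems.append("newline")
    # Other C0 control characters and DEL; newline and CR are reported above.
    if any((b < 0x20 and b not in (0x0A, 0x0D)) or b == 0x7F for b in name):
        problems.append("control")
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not-utf8")
    return problems


def scan(top):
    """Yield (path_bytes, problems) for every badly named entry under top."""
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for name in dirnames + filenames:
            problems = classify_name(name)
            if problems:
                yield os.path.join(dirpath, name), problems
```

Working on bytes throughout means the scan itself cannot trip over the names it is looking for.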
Mostly I find the non-newline issues are down to "foreigners" using something other than UTF-8 (aka random stupid Windows code pages) to give files names in their native language. You can usually work out what the filename is supposed to be once you know the nationality of the file owner. Again I think this happens due to cut and paste from text documents in non-UTF-8 encodings. So for example take something Cyrillic in codepage 1251, copy and paste it into a file save dialogue box and end up with a filename containing unprintable characters. > I normally end up looking at /data/mmbackup.unsupported which is the > standard output from mmapplypolicy and extracting the file names from > it and emailing the users concerned to assist them in working out > what went wrong. > > I guess you could automate the parsing of this file at the end of the > backup process and do something interesting with it. > Email the owner of the file and tell them it's not being backed up and won't be till they "fix" the file name so that backup software can process it. If it is just a newline I would be tempted to have them automatically renamed sans the newline, and then send the file owner an email (per file) letting them know what has happened. If their inbox is spammed that will hopefully prompt them to stop doing it :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stuartb at 4gh.net Tue Oct 19 18:16:54 2021 From: stuartb at 4gh.net (Stuart Barkley) Date: Tue, 19 Oct 2021 13:16:54 -0400 (EDT) Subject: [gpfsug-discuss] alphafold and mmap performance Message-ID: Over the years there have been several discussions about performance problems with mmap() on GPFS/Spectrum Scale. We are currently having problems with mmap() performance on our systems with new alphafold protein folding software. Things look similar to previous times we have had mmap() problems.
The software component "hhblits" appears to mmap a large file with genomic data and then does random reads throughout the file. GPFS appears to be doing 4K reads for each block limiting the performance. The first run takes 20+ hours to run. Subsequent identical runs complete in just 1-2 hours. After clearing the linux system cache (echo 3 > /proc/sys/vm/drop_caches) the slow performance returns for the next run. GPFS Server is 4.2.3-5 running on DDN hardware. CentOS 7.3 Default GPFS Client is 4.2.3-22. CentOS 7.9 We have tried a number of things including Spectrum Scale client version 5.0.5-9 which should have Sven's recent mmap performance improvements. Are the recent mmap performance improvements in the client code or the server code? Only now do I notice a suggestion: mmchconfig prefetchAggressivenessRead=0 -i I did not use this. Would a performance change be expected? Would the pagepool size be involved in this? Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From stockf at us.ibm.com Tue Oct 19 18:58:40 2021 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 19 Oct 2021 17:58:40 +0000 Subject: [gpfsug-discuss] alphafold and mmap performance In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jon at well.ox.ac.uk Tue Oct 19 19:12:34 2021 From: jon at well.ox.ac.uk (Jon Diprose) Date: Tue, 19 Oct 2021 18:12:34 +0000 Subject: [gpfsug-discuss] alphafold and mmap performance In-Reply-To: References: Message-ID: Not that it answers Stuart's questions in any way, but we gave up on the same problem on a similar setup, rescued an old fileserver off the scrapheap (RAID6 of 12 x 7.2k rpm SAS on a PERC H710P) and just served the reference data by nfs - good enough to keep the compute busy rather than in cxiWaitEventWait. If there's significant demand for Alphafold then somebody's arm will be twisted for a new server with some NVMe. 
If I remember right, the reference data is ~2.3TB, ruling out our usual approach of just reading the problematic files into a ramdisk first. We are also interested in hearing how it might be usably served from GPFS. Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287873 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN From olaf.weiser at de.ibm.com Tue Oct 19 21:27:39 2021 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 19 Oct 2021 20:27:39 +0000 Subject: [gpfsug-discuss] alphafold and mmap performance In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From stuartb at 4gh.net Thu Oct 21 00:19:51 2021 From: stuartb at 4gh.net (Stuart Barkley) Date: Wed, 20 Oct 2021 19:19:51 -0400 (EDT) Subject: [gpfsug-discuss] alphafold and mmap performance In-Reply-To: References: , Message-ID: Thanks Olaf, Jon and Fred. Some more details below. We may just need to wait on things to evolve (us getting Spectrum Scale 5 installed, alphafold getting HPC specific improvements). It will also be driven by whether our users have a real need for alphafold or are just enthusiastic due to the press releases. On Tue, 19 Oct 2021 at 16:27 -0000, Olaf Weiser wrote: > > [...] We have tried a number of things including Spectrum Scale > > client version 5.0.5-9[...] > in the client code or the server code? Our main client code is 4.2.3-22 but I'm trying 5.0.5-9 on a test client. The server code is (very old) 4.2.3-5. > there are going multiple improvements in the code.. continuously... > Since your version 4.2.3 / 5.0.5 a lot of them are in the area of > NSD server/GNR (which is server based) and also a lot of > enhancements went into the client part. Some are on both .. such as > RoCE, or using multiple TCP/IP sockets per communication pair, > etc.... All this influences your performance.. Thanks for the information. Some of this sounds good. We had upgrade issues with DDN but we now have a license for Spectrum Scale 5. It's now mostly a matter of finding enough cycles to do the update.
> But I'd like to try to give you some answers to your specific Q - > > Only now do I notice a suggestion: > > mmchconfig prefetchAggressivenessRead=0 -i > > I did not use this. Would a performance change be expected? > YES;-) .. this parameter should really help.. I'm trying this now with the 5.0 client. Initial indications are that there may be about 50% performance improvement but that is still significantly lower than we would hope. Using "mmdiag --iohist" we were seeing 750-900 8 sector reads per second. With prefetchAggressivenessRead=0 it looks like the 8 sector reads are about as frequent but there are often (5-10/second) reads of 100-2000 sectors in the mix. A rough estimate is the large reads are for about the same amount of data as the 8 sector reads. The number of large sector reads seems to be decreasing over time. I don't know the specifics of the algorithm but I imagine there is a lot of jumping around in the data. The early large reads may have brought in the more common regions and now it is filling the less dense regions. Just a thought. > from the UG expert talk 2020 we shared some numbers/charts on it > https://www.spectrumscaleug.org/event/ssugdigital-spectrum-scale-expert-talks-update-on-performance-enhancements-in-spectrum-scale/ starting ~ 8:30 minutes / > just 2 slides ... let us know, if you need more information Yes, I had looked at the slides but not listened to the talk which was a mistake. There were some other interesting tidbits. In particular if we can get this to work we may try a scheduler prolog/epilog to change the parameter. We can look at that after our move from Grid Engine to Slurm which requires other cycles. On Tue, 19 Oct 2021 at 14:12 -0000, Jon Diprose wrote: > If I remember right, the reference data is ~2.3TB, ruling out our > usual approach of just reading the problematic files into a ramdisk > first.
We found the critical file is about 1.5TB and we are able to load that into ramdisk on a 2TB system (but it doesn't have any GPUs). We also have some old "spare" hardware that might be built as an NFS appliance for this purpose. I would prefer to see the ~10 year old hardware die. The alphafold application is one large monolith. The first phase does some large I/O and CPU intensive operations. The second phase does some GPU operations. We would prefer to separate the non-GPU code from the GPU code so we could have the GPU systems doing GPU stuff. We do this quite effectively with some of our other GPU code with CPU based pre/post processing. Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone
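The ramdisk approach mentioned in this thread could be scripted along the following lines, for example from a job prolog. This is a sketch under stated assumptions only: the function name stage_to_ramdisk is hypothetical, /dev/shm is assumed to be a tmpfs mount large enough for the file, and nothing here has been tested against a 1.5TB file. It checks free space first so that a copy that cannot fit fails early rather than filling memory.

```python
import os
import shutil


def stage_to_ramdisk(src, ramdisk_dir="/dev/shm"):
    """Copy `src` into `ramdisk_dir` if it fits and return the new path.

    ramdisk_dir defaults to /dev/shm, which is an assumption about the
    node; pass another tmpfs/ramfs mount point if that is what you use.
    """
    size = os.path.getsize(src)
    free = shutil.disk_usage(ramdisk_dir).free
    if size > free:
        raise OSError(
            "%s needs %d bytes, only %d free in %s"
            % (src, size, free, ramdisk_dir)
        )
    dst = os.path.join(ramdisk_dir, os.path.basename(src))
    shutil.copyfile(src, dst)
    return dst
```

The job would then be pointed at the returned path instead of the GPFS copy, and a matching epilog would remove the staged file again.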