[gpfsug-discuss] Special characters in filenames

Alec anacreo at gmail.com
Fri Jul 7 21:26:09 BST 2023


Building on this, you could use the file placement engine to assign a bad
(invalid) pool name to files that have bad names, so that new files couldn't
be created with bad names. However, files can still be renamed, so you would
still need a policy to deal with those cases.

Alec

On Fri, Jul 7, 2023, 11:11 AM Wayne Sawdon <wsawdon at us.ibm.com> wrote:

>
>
> The policy code uses a more or less standard Linux regexp library, so the
> regular expressions you use with grep should work. The catch is that the
> policy file is preprocessed with M4, which makes writing regexes a bit
> tricky. I grabbed a comment from the code:
>
>
>
>
>
>        The policy SQL parser normally does M4 macros processing with [ ]
> set as the quote characters.
>
>        SOOOO…. We highly recommend you add an extra set of [ ] around your
> REGEX pattern string like this:
>
>
>
>        . . . WHERE REGEX(name, ['^[a-z]*$'])
>
>
>
>        To only match lowercase alphabetic names.
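To see why the extra [ ] matters, here is a toy Python model of what M4 does when [ ] are the quote characters: it strips one level of quotes from the text it reads. This is only a sketch of that one behaviour, not real M4 (no macro expansion), and the helper name strip_one_quote_level is made up for illustration.

```python
def strip_one_quote_level(text: str, open_q: str = "[", close_q: str = "]") -> str:
    """Toy model of M4 quote handling: drop the outermost quote pair only.

    Nested quote characters survive, which is why wrapping the pattern in an
    extra [ ] protects the brackets of the regular expression itself.
    """
    out = []
    depth = 0
    for ch in text:
        if ch == open_q:
            depth += 1
            if depth == 1:
                continue  # outermost opening quote is consumed
        elif ch == close_q:
            if depth == 1:
                depth -= 1
                continue  # outermost closing quote is consumed
            if depth > 1:
                depth -= 1  # inner closing quote is kept
        out.append(ch)
    return "".join(out)


# With the extra [ ] the regex reaches the policy parser intact.
protected = strip_one_quote_level("REGEX(name, ['^[a-z]*$'])")

# Without it, the bracket expression loses its own brackets.
unprotected = strip_one_quote_level("REGEX(name, '^[a-z]*$')")

print(protected)
print(unprotected)
```

In this model the unprotected pattern comes out as '^a-z*$', which is a different regular expression from the one intended.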
>
>
>
>
>
> Once you’ve gotten past M4, you can match either any character outside a
> set of good characters, or the bad characters directly
>
>
>
> REGEX(FILENAME, ['[^a-zA-Z0-9\_\-\.]'])   ### match when you find a character not in the good set
>
> REGEX(FILENAME, ['[\n\*\\]'])             ### match when you find a bad character
>
>
>
> I am not sure which is more difficult to enumerate.
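The difference between the two approaches above can be tried out away from the policy engine. The following is a minimal Python sketch, assuming Python's re module rather than the regexp library the policy code uses (escaping rules inside bracket expressions differ between the two), and assuming an illustrative deny-list that also includes \r and the backtick from the original question. The function names are hypothetical.

```python
import re

# Allow-list: flag any character that is NOT a letter, digit, '_', '.' or '-'.
OUTSIDE_GOOD_SET = re.compile(r"[^a-zA-Z0-9_.-]")

# Deny-list: flag only specific troublemakers (illustrative set, an assumption):
# newline, carriage return, '*', backslash and backtick.
BAD_CHARS = re.compile(r"[\n\r*\\`]")


def has_char_outside_good_set(name: str) -> bool:
    """True if the name contains any character not in the good set."""
    return OUTSIDE_GOOD_SET.search(name) is not None


def has_bad_char(name: str) -> bool:
    """True if the name contains one of the explicitly listed bad characters."""
    return BAD_CHARS.search(name) is not None


for name in ["report_2023.txt", "my file.txt", "a*b", "line\nbreak"]:
    print(repr(name), has_char_outside_good_set(name), has_bad_char(name))
```

Note that "my file.txt" is flagged by the allow-list but not by the deny-list: the allow-list catches everything you forgot to enumerate, at the cost of also flagging characters you might consider harmless.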
>
>
>
> The ESCAPE clause described by Olaf is the trick we use to pass file names
> with bad characters through the surrounding scripts (like mmbackup, mmxcp,
> etc). There is code in samples/ilm that shows how to use it.
>
>
>
>
>
> -Wayne
>
>
>
>
>
> *From: *gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of
> Olaf Weiser <olaf.weiser at de.ibm.com>
> *Date: *Wednesday, July 5, 2023 at 7:06 PM
> *To: *gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
> *Subject: *[EXTERNAL] Re: [gpfsug-discuss] Special characters in filenames
>
> Hello Jonathan,
>
>
>
> I haven't used it for a while, but I remember a customer where we
> masked "all" special characters with *ESCAPE*.
>
> In fact, as far as I remember, this was an iterative process ... 😉 😉
>
>
>
> You're right, the docs are not really self-explanatory here.
>
>
>
> In my personal notes I found a slightly better example:
>
>
>
> In GPFS 3.5 we introduce an (optional) ESCAPE clause to the EXTERNAL LIST
> and EXTERNAL POOL rules, which allows the user-administrator to specify that
> path names and SHOW(strings) within the associated file lists are encoded
> using an encoding based on the RFC3986 URI-percent-encoding scheme. For
> example:
>
> RULE 'xp' EXTERNAL POOL 'pool-name' EXEC 'script-name' ESCAPE '%'
>
> RULE 'xl' EXTERNAL LIST 'list-name' EXEC 'script-name' ESCAPE '%/+@#'
>
>
>
> ESCAPE '%' specifies that all characters except the "unreserved"
> characters in the set a-zA-Z0-9-_.~ are encoded as %XX where XX comprises 2
> hexadecimal digits. The GPFS ESCAPE clause allows you to add to the set of
> "unreserved" characters.
>
> For example, ESCAPE '%/+@#', specifies that none of the characters in
> "/+@#" are escaped, so that a path name like
> "/root/directory/@abc+def#ghi.jkl" will appear in a file list with no
> escape sequences, whereas under ESCAPE '%', specifying a rigorous RFC3986
> encoding yields "%2Froot%2Fdirectory%2F%40abc%2Bdef%23ghi.jkl".
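The encoding described above can be reproduced with Python's standard urllib.parse.quote, which always leaves the RFC 3986 unreserved characters (a-zA-Z0-9-_.~) alone and takes additional characters to leave unescaped via its safe argument. This is a minimal sketch of the described behaviour, not the GPFS implementation; the helper name escape_path is made up, and how GPFS treats non-ASCII bytes is not covered here.

```python
from urllib.parse import quote, unquote


def escape_path(path: str, extra_unreserved: str = "") -> str:
    """Percent-encode everything except RFC 3986 unreserved characters
    and the characters listed in extra_unreserved."""
    return quote(path, safe=extra_unreserved)


path = "/root/directory/@abc+def#ghi.jkl"

# Analogous to ESCAPE '%': rigorous RFC 3986 encoding.
strict = escape_path(path)

# Analogous to ESCAPE '%/+@#': the characters /+@# are left as they are.
relaxed = escape_path(path, extra_unreserved="/+@#")

# A newline in a file name becomes %0A, so each record stays on one line.
with_newline = escape_path("bad\nname.txt")

print(strict)
print(relaxed)
print(with_newline)
```

A script that consumes such a file list can recover the original name with unquote(), for example unquote(with_newline) gives back the name containing the newline.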
>
>
>
> At least for us, using ESCAPE did the trick back then.
>
> Maybe it is useful for your case here as well.
>
>
>
> cheers
>
> laff
>
>
> ------------------------------
>
> *From:* gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of
> Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
> *Sent:* Thursday, July 6, 2023 00:20
> *To:* gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
> *Subject:* [EXTERNAL] [gpfsug-discuss] Special characters in filenames
>
>
>
>
> After another support incident that eventually transpired to be down to
> the user using what I will call stupid characters in their filenames (we
> include a section on not doing this in our mandatory training so no
> excuse) I have been musing on using the policy engine to periodically
> produce lists of files that have stupid characters in their filenames so
> we can proactively educate the users and get them to rename their files
> to something sensible :-)
>
> The issue is of course the stupid characters include all the regular
> expression wildcard characters in addition to \n, \r and backticks. I am
> coming up short on escaping them correctly in REGEX() for the policy
> engine.
>
> The documentation appears to be devoid of help on the subject, because
> of course only a fool would be including these characters in their
> filenames...
>
> Anyone any idea on how to do this?
>
>
> JAB.
>
> --
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
>