[gpfsug-discuss] Special characters in filenames

Wayne Sawdon wsawdon at us.ibm.com
Fri Jul 7 18:55:07 BST 2023


The policy code uses a more or less standard Linux regexp library, so regular expressions that work with grep should work here too. The catch is that the policy file is preprocessed with M4, which makes writing regexes a bit tricky. I grabbed a comment from the code:



  Note: The policy SQL parser normally does M4 macro preprocessing with [ ] set as the quote characters.

       SOOOO.... we highly recommend you add an extra set of [ ] around your REGEX pattern string.

       Like this:



       ... WHERE REGEX(name,['^[a-z]*$'])

       only accept lowercase alphabetic file names

Once you’ve gotten past M4, you can either match on any character outside a known-good set or match directly on the bad characters:

REGEX(FILENAME, ['[^a-zA-Z0-9\_\-\.]'])   ### match when you find a character not in the good set
REGEX(FILENAME, ['[\n\*\\]'])             ### match when you find a bad character

I am not sure which is more difficult to enumerate.
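
For anyone who wants to try both approaches on sample names before putting them in a policy file, here is a minimal sketch in Python. It assumes Python's re module is a close enough stand-in for the regex library the policy engine uses (it may differ in details), and it works on the pattern as it looks after M4 has stripped the outer [ ]. The function names are made up for illustration, and the \r and backtick in the bad list come from Jonathan's original question rather than from the rules above.

```python
import re

# Assumption: Python's re module stands in for the policy engine's regex
# library. These are the patterns as they look once M4 has removed the
# outer [ ] quoting.
NOT_GOOD = re.compile(r"[^a-zA-Z0-9_.-]")  # any character outside the good set
BAD = re.compile(r"[\n\r*\\`]")            # explicit, deliberately incomplete bad list


def has_not_good_char(name: str) -> bool:
    """True if the name contains a character outside the good set."""
    return NOT_GOOD.search(name) is not None


def has_bad_char(name: str) -> bool:
    """True if the name contains one of the listed bad characters."""
    return BAD.search(name) is not None


if __name__ == "__main__":
    samples = ["report_2023.txt", "my file.txt", "data*.csv",
               "line\nbreak", "back`tick"]
    for name in samples:
        print(repr(name), has_not_good_char(name), has_bad_char(name))
```

The two checks are not equivalent: "my file.txt" is flagged by the good-set test because of the space, but not by the bad list, so the good-set version is the stricter of the two.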

The ESCAPE clause described by Olaf is the trick we use to pass file names with bad characters through the surrounding scripts (like mmbackup, mmxcp, etc). There is code in samples/ilm that shows how to use it.


-Wayne




From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Olaf Weiser <olaf.weiser at de.ibm.com>
Date: Wednesday, July 5, 2023 at 7:06 PM
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Special characters in filenames
Hello Jonathan,

I haven't used it for a while, but I remember a customer where we masked "all" special characters with ESCAPE.
In fact, as far as I remember, this was an iterative process... 😉 😉

You're right, the docs are not really self-explanatory here.

In my personal notes I found a slightly better example:


In GPFS 3.5 we introduce an (optional) ESCAPE clause to the EXTERNAL LIST and EXTERNAL POOL rules, which allows the user-administrator to specify that path names and SHOW(strings) within the associated file lists are encoded using an encoding based on the RFC3986 URI-percent-encoding scheme. For example:

RULE 'xp' EXTERNAL POOL 'pool-name' EXEC 'script-name' ESCAPE '%'

RULE 'xl' EXTERNAL LIST 'list-name' EXEC 'script-name' ESCAPE '%/+@#'


ESCAPE '%' specifies that all characters except the "unreserved" characters in the set a-zA-Z0-9-_.~ are encoded as %XX where XX comprises 2 hexadecimal digits. The GPFS ESCAPE clause allows you to add to the set of "unreserved" characters.

For example, ESCAPE '%/+@#' specifies that none of the characters in "/+@#" are escaped, so that a path name like "/root/directory/@abc+def#ghi.jkl" will appear in a file list with no escape sequences, whereas ESCAPE '%', which specifies a rigorous RFC3986 encoding, yields "%2Froot%2Fdirectory%2F%40abc%2Bdef%23ghi.jkl".
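
The encoding described above can be approximated outside GPFS with Python's urllib.parse, which is handy when a script on the receiving end has to decode the file list. This is only a sketch: escape_path and unescape_path are hypothetical helper names, and it assumes the GPFS encoding matches plain RFC3986 percent-encoding of UTF-8 text, which may not hold for every byte sequence found in real file names.

```python
from urllib.parse import quote, unquote


def escape_path(path: str, extra_unreserved: str = "") -> str:
    """Percent-encode everything except a-zA-Z0-9-_.~ and extra_unreserved.

    Hypothetical helper mimicking ESCAPE '%<extra_unreserved>'.
    """
    return quote(path, safe=extra_unreserved)


def unescape_path(encoded: str) -> str:
    """Turn %XX sequences back into the original characters."""
    return unquote(encoded)


if __name__ == "__main__":
    path = "/root/directory/@abc+def#ghi.jkl"
    print(escape_path(path))           # like ESCAPE '%'
    print(escape_path(path, "/+@#"))   # like ESCAPE '%/+@#'
    # A newline and a backtick become %0A and %60, so the name stays on one line.
    print(escape_path("bad\nname`.txt", "/"))
```

With the encoding in place, a name containing a newline occupies a single line in the file list, and the consuming script can call unescape_path to recover the real name.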



At least for us, using ESCAPE did the trick (back then).

Maybe it is useful for your case here as well



cheers

laff

________________________________
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
Sent: Thursday, July 6, 2023 00:20
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] [gpfsug-discuss] Special characters in filenames


After another support incident that eventually transpired to be down to
the user using what I will call stupid characters in their filenames (we
include a section on not doing this in our mandatory training so no
excuse) I have been musing on using the policy engine to periodically
produce lists of files that have stupid characters in their filenames so
we can proactively educate the users and get them to rename their files
to something sensible :-)

The issue is of course the stupid characters include all the regular
expression wildcard characters in addition to \n, \r and backticks. I am
coming up short on escaping them correctly in REGEX() for the policy engine.

The documentation appears to be devoid of help on the subject, because
of course only a fool would be including these characters in their
filenames...

Anyone any idea on how to do this?


JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org