[gpfsug-discuss] Placement Policy Installation and RDM Considerations

Luke Raimbach Luke.Raimbach at crick.ac.uk
Thu Jun 18 13:30:40 BST 2015


Hi All,

Something I am thinking about doing is utilising the placement policy engine to insert custom metadata tags upon file creation, based on which fileset the creation occurs in. This might be to facilitate Research Data Management tasks that could happen later in the data lifecycle.

I am also thinking about allowing users to specify additional custom metadata tags (maybe through a fancy web interface) and also potentially giving users control over creating new filesets (e.g. for scientists running new experiments). So… pretend this is a placement policy on my GPFS-driven data-ingest platform:


RULE 'RDMTEST'
     SET POOL 'instruments'
     FOR FILESET ('%GPFSRDM%10.01013%RDM%0ab34906-5357-4ca0-9d19-a470943db30a%RDM%8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')
     WHERE SetXattr('user.rdm.parent','0ab34906-5357-4ca0-9d19-a470943db30a')
     AND SetXattr('user.rdm.ingestor','8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')

RULE 'DEFAULT' SET POOL 'data'

The fileset name can be meaningless (as far as the user is concerned), but would be linked somewhere nice that they recognise, say /gpfs/incoming/instrument1. The fileset, when it is created, would also be an AFM cache for its ‘home’ counterpart, which exists on a much larger (also GPFS-driven) pool of storage, so that my metadata tags are preserved, you see.
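
A minimal Python sketch of how rules like the one above could be generated per fileset, assuming the fileset name always follows the %GPFSRDM%<prefix>%RDM%<parent>%RDM%<ingestor> shape seen in the example. The function names and the `prefix` parameter are hypothetical; only the rendered rule text mirrors the policy above.

```python
def fileset_name(prefix: str, parent: str, ingestor: str) -> str:
    """Build a system-generated fileset name in the shape used in the example."""
    return f"%GPFSRDM%{prefix}%RDM%{parent}%RDM%{ingestor}"


def placement_rule(rule_name: str, pool: str, prefix: str,
                   parent: str, ingestor: str) -> str:
    """Render one placement rule that tags files created in the fileset."""
    name = fileset_name(prefix, parent, ingestor)
    return "\n".join([
        f"RULE '{rule_name}'",
        f"     SET POOL '{pool}'",
        f"     FOR FILESET ('{name}')",
        f"     WHERE SetXattr('user.rdm.parent','{parent}')",
        f"     AND SetXattr('user.rdm.ingestor','{ingestor}')",
    ])


if __name__ == "__main__":
    # Values copied from the example policy; the prefix is treated as opaque.
    print(placement_rule(
        "RDMTEST", "instruments", "10.01013",
        "0ab34906-5357-4ca0-9d19-a470943db30a",
        "8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0",
    ))
    print()
    print("RULE 'DEFAULT' SET POOL 'data'")
```

Generating the rule from the same two identifiers that make up the fileset name keeps the name and the tags from drifting apart.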

This potentially user-driven activity might look a bit like this:


- User logs in to web interface and creates new experiment

- Filesets (system-generated names) are created on ‘home’ and ‘ingest’ file systems and linked into the directory namespace wherever the user specifies

- AFM relationships are set up and established for the ingest (cache) fileset to write back to the AFM home fileset (probably Independent Writer mode)

- A set of ‘default’ policies is defined and installed on the cache file system to tag data for that experiment (the user can’t change these)

- The user now specifies additional metadata tags they want added to their experiment data (some of this might be captured through additional mandatory fields in the web form, for instance)

- A policy for later execution by mmapplypolicy on the AFM home file system is created, which looks for the tags generated at ingest-time and applies the extra user-defined tags
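
The steps above can be sketched as an ordered command plan. This is only a sketch in Python that builds argument lists and runs nothing: the file system names "ingest" and "home", the junction paths, the AFM target and the policy file names are all hypothetical, and the flags shown should be checked against the man pages for the installed GPFS release.

```python
from typing import List


def experiment_commands(fileset: str, cache_junction: str, home_junction: str,
                        afm_target: str, placement_file: str,
                        tagging_file: str, cache_fs: str = "ingest",
                        home_fs: str = "home") -> List[List[str]]:
    """Return the GPFS commands for one new experiment, in execution order."""
    return [
        # Create and link the home fileset (AFM home side).
        ["mmcrfileset", home_fs, fileset, "--inode-space", "new"],
        ["mmlinkfileset", home_fs, fileset, "-J", home_junction],
        # Create the cache fileset in Independent Writer mode and link it.
        ["mmcrfileset", cache_fs, fileset,
         "-p", f"afmtarget={afm_target},afmmode=iw",
         "--inode-space", "new"],
        ["mmlinkfileset", cache_fs, fileset, "-J", cache_junction],
        # Install the regenerated placement policy on the cache file system.
        ["mmchpolicy", cache_fs, placement_file],
        # Run later on the home side to apply the extra user-defined tags.
        ["mmapplypolicy", home_fs, "-P", tagging_file],
    ]


if __name__ == "__main__":
    plan = experiment_commands(
        fileset="example-fileset",                     # hypothetical name
        cache_junction="/gpfs/incoming/instrument1",
        home_junction="/gpfs/home/instrument1",        # hypothetical path
        afm_target="gpfs:///gpfs/home/instrument1",    # hypothetical target
        placement_file="placement.pol",                # hypothetical file
        tagging_file="tagging.pol",                    # hypothetical file
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Keeping the plan as data means the web interface could log or review the commands before anything touches the file systems.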

There’s much more that would go on later in the lifecycle to take care of automated HSM tiering, data publishing, movement and cataloguing of data onto external non-GPFS file systems, etc. but I won’t go into it here. My GPFS-related questions are:

When I install a placement policy into the file system, does the file system need to quiesce? My suspicion is yes, because the policy needs to be consistent on all nodes performing I/O, but I may be wrong.

What is the specific limitation behind requiring a placement policy file to be no larger than 1MB?
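
To show why the 1MB ceiling matters for this design: each experiment adds a rule of a few hundred bytes, so the limit bounds how many filesets one placement policy can cover. A rough Python sketch of that arithmetic follows, assuming "1MB" means 1 MiB (1,048,576 bytes) and that rules are separated by a single newline; the helper names are hypothetical.

```python
LIMIT_BYTES = 1024 * 1024  # assumption: "1MB" is 1 MiB; adjust if it is 10**6


def policy_size(text: str) -> int:
    """Size of the policy text in bytes when written out as UTF-8."""
    return len(text.encode("utf-8"))


def fits(text: str, limit: int = LIMIT_BYTES) -> bool:
    """True if the policy text is within the size limit."""
    return policy_size(text) <= limit


def max_rules(sample_rule: str, default_rule: str,
              limit: int = LIMIT_BYTES) -> int:
    """Estimate how many rules the size of sample_rule fit before the default rule."""
    per_rule = policy_size(sample_rule) + 1   # one newline after each rule
    budget = limit - policy_size(default_rule)
    return max(budget // per_rule, 0)


if __name__ == "__main__":
    # Rule text copied from the example policy in this message.
    sample = (
        "RULE 'RDMTEST' SET POOL 'instruments' FOR FILESET "
        "('%GPFSRDM%10.01013%RDM%0ab34906-5357-4ca0-9d19-a470943db30a"
        "%RDM%8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0') "
        "WHERE SetXattr('user.rdm.parent',"
        "'0ab34906-5357-4ca0-9d19-a470943db30a') "
        "AND SetXattr('user.rdm.ingestor',"
        "'8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')"
    )
    default = "RULE 'DEFAULT' SET POOL 'data'"
    print("bytes per rule:", policy_size(sample))
    print("approx. rules that fit:", max_rules(sample, default))
```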

Cheers,
Luke.

Luke Raimbach
Senior HPC Data and Storage Systems Engineer
The Francis Crick Institute
Gibbs Building
215 Euston Road
London NW1 2BE

E: luke.raimbach at crick.ac.uk
W: www.crick.ac.uk


The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.