[gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas?

Marc A Kaplan makaplan at us.ibm.com
Fri Aug 17 12:59:56 BST 2018


My idea, not completely thought out, is that before you hit the 1000-fileset 
limit, you start putting new customers or projects into dependent filesets, 
and define those new dependent filesets within either a smaller number of 
independent filesets expressly created to receive the new customers, OR 
perhaps even lump them in with already existing independent filesets that 
have matching backup requirements.

I would NOT try to create backups for each dependent fileset, but stick 
with the supported facilities to manage backup per independent fileset...

Having said that, if you'd still like to do backup per dependent fileset 
-- then have at it -- but test, test and retest... and measure 
performance...
My GUESS is that IF you can hack mmbackup or similar to use 
mmapplypolicy /path-to-dependent-fileset --scope fileset 
instead of 
mmapplypolicy /path-to-independent-fileset --scope inodespace 

you'll probably be okay, because the extra inodes you end up reading 
during the inodescan are likely a tiny fraction of all the other I/O 
you'll be doing! 
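For concreteness, the two invocation shapes being contrasted might look 
like this (the paths, fileset name, and policy file are placeholders I 
made up, and the commands are only echoed here rather than run, since 
mmapplypolicy needs a live GPFS cluster):

```shell
# Sketch of the two mmapplypolicy scopes discussed above.
# /gpfs/fs1/indep1 is a hypothetical independent fileset junction;
# projectA is a dependent fileset linked underneath it.

# Scan the whole inode space of the independent fileset (mmbackup's usual mode):
INDEP_SCAN="mmapplypolicy /gpfs/fs1/indep1 --scope inodespace -P policy.rules -I defer"

# Scan just the one dependent fileset (reads some extra shared inode blocks):
DEP_SCAN="mmapplypolicy /gpfs/fs1/indep1/projectA --scope fileset -P policy.rules -I defer"

echo "$INDEP_SCAN"
echo "$DEP_SCAN"
```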

BUT I don't think IBM is in a position to encourage you to hack mmbackup 
-- it's already very complicated!





From:   "Peinkofer, Stephan" <Stephan.Peinkofer at lrz.de>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   08/17/2018 07:40 AM
Subject:        Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs 
Quotas?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Dear Marc,

well, since I think I cannot simply "move" dependent filesets between 
independent ones, and our customers must have the opportunity to change 
the data protection policy for their containers at any given time, I 
cannot map them to a "backed up" or "not backed up" independent fileset.

So how much performance impact do, let's say, 1-10 exclude.dir directives 
per independent fileset have?
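(For reference, such an exclude list would typically be a few exclude.dir 
statements in the Spectrum Protect include-exclude file that mmbackup 
honors; a hypothetical fragment, with invented paths:)

```
* Hypothetical inclexcl fragment -- paths are placeholders, not real ones.
* Each exclude.dir prunes one dependent fileset's subtree from the backup.
exclude.dir /gpfs/fs1/indep1/project_nobackup
exclude.dir /gpfs/fs1/indep1/scratch_data
```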

Many thanks in advance.
Best Regards,
Stephan Peinkofer



From: gpfsug-discuss-bounces at spectrumscale.org 
<gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan 
<makaplan at us.ibm.com>
Sent: Tuesday, August 14, 2018 5:31 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? 
 
True, mmbackup is designed to work best backing up either a single 
independent fileset or the entire file system.  So if you know some 
filesets do not need to be backed up, map them to one or more independent 
filesets that will not be backed up.

mmapplypolicy is happy to scan a single dependent fileset: use the option 
--scope fileset and make the primary argument the path to the root of the 
fileset you wish to scan.   The overhead is not simple to characterize.   
The directory scan phase will explore or walk the (sub)tree in parallel 
with multiple threads on multiple nodes, reading just the directory blocks 
that need to be read.

The inodescan phase will read blocks of inodes from the given inodespace 
...  since the inodes of a dependent fileset may be "mixed" into the same 
blocks as those of other dependent filesets in the same independent 
fileset, mmapplypolicy will incur what you might consider "extra" 
overhead.
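As a back-of-the-envelope illustration of that "extra" overhead (all 
numbers are invented, not measured on GPFS): the wasted work in the 
inodescan phase is roughly the share of inodes in the inode space that 
belong to other filesets.

```python
# Illustrative arithmetic only -- the counts are made up, not from GPFS.
# Suppose a dependent fileset owns 100,000 of the 1,000,000 inodes
# allocated in its independent fileset's shared inode space.
total_inodes = 1_000_000       # inodes in the shared inode space
fileset_inodes = 100_000       # inodes belonging to the dependent fileset

# Fraction of scanned inodes that belong to other filesets:
extra_fraction = 1 - fileset_inodes / total_inodes
print(f"{extra_fraction:.0%} of scanned inodes belong to other filesets")
```

Per the point above, even a large "waste" fraction in the inode-scan phase 
may still be a small share of the total I/O once the directory walk and 
data movement are counted.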




From:        "Peinkofer, Stephan" <Stephan.Peinkofer at lrz.de>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        08/14/2018 12:50 AM
Subject:        Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs 
Quotas?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Dear Marc,


If you "must" exceed 1000 filesets because you are assigning each project 
to its own fileset, my suggestion is this:

Yes, there are scaling/performance/manageability benefits to using 
mmbackup over independent filesets.

But maybe you don't need 10,000 independent filesets -- 
maybe you can hash or otherwise randomly assign projects that each have 
their own (dependent) fileset name to a smaller number of independent 
filesets that will serve as management groups for (mm)backup...

OK, if that might be doable, what's then the performance impact of having 
to specify Include/Exclude lists for each independent fileset in order to 
specify which dependent filesets should be backed up and which not?
I don’t remember exactly, but I think I’ve heard at some point that 
Include/Exclude and mmbackup have to be used with caution. And the same 
question holds for running mmapplypolicy for a “job” on a single 
dependent fileset: is the scan runtime linear in the size of the 
underlying independent fileset, or are there some optimisations when I 
just want to scan a subfolder/dependent fileset of an independent one?

Like many things in life, sometimes compromises are necessary!

Hmm, can I reference this next time, when we negotiate Scale License 
pricing with the ISS sales people? ;)

Best Regards,
Stephan Peinkofer
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

