[gpfsug-discuss] Couple of questions related to storage pools and mmapplypolicy

Bryan Banister bbanister at jumptrading.com
Wed Dec 19 19:13:16 GMT 2018


Hadn’t seen a response, but here’s one thing that might make your decision easier on this question:
“But since ALL of the files in the capacity pool haven’t even been looked at in at least 90 days already, does it really matter?  I.e. should I just add the NSDs to the capacity pool and be done with it?”

Does the performance matter for accessing files in this capacity pool?

If not, then just add it in.

If it does, then you’ll need to concern yourself with the performance you’ll get from the NSDs that still have free space to store new data once the smaller NSDs become full.  If that’s enough then just add it in.  Old data will still be spread across the current storage in the capacity pool, so you’ll get current read performance rates for that data.
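If you do decide to just add them, the mechanics are a stanza file plus one command; a minimal sketch, assuming the filesystem device is gpfs0 and the three new LUNs get the hypothetical NSD names nsd31–nsd33 (adjust usage and failureGroup to match your actual layout):

```
# new-disks.stanza -- place the new NSDs in the existing capacity pool
%nsd: nsd=nsd31  usage=dataOnly  pool=capacity  failureGroup=1
%nsd: nsd=nsd32  usage=dataOnly  pool=capacity  failureGroup=1
%nsd: nsd=nsd33  usage=dataOnly  pool=capacity  failureGroup=1

# then add them to the filesystem:
#   mmadddisk gpfs0 -F new-disks.stanza
```

The same stanza file with pool=oc would instead create the new pool on first use, since a storage pool comes into existence when the first NSD naming it is added.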

By creating a new pool, oc, and then migrating data that hasn’t been accessed in over 1 year to it from the capacity pool, you’re freeing up new space to store new data on the capacity pool.  This seems to really only be a benefit if the performance of the capacity pool is a lot greater than the oc pool and your users need that performance to satisfy their application workloads.

Of course moving data around on a regular basis also has an impact on overall performance while those operations run, but maybe there are times when the system is idle and they won’t cause any performance heartburn.
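If you do schedule the migrations for quiet periods, a plain cron entry is enough; a sketch, assuming a hypothetical filesystem device gpfs0 and the rules saved at /root/policy.pol:

```
# /etc/cron.d/gpfs-migrate -- run the tiering policy at 02:00 on Sundays,
# when the system is (hopefully) idle
0 2 * * 0  root  /usr/lpp/mmfs/bin/mmapplypolicy gpfs0 -P /root/policy.pol -I yes
```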

I think Marc will have to answer your other question… ;o)

Hope that helps!
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Buterbaugh, Kevin L
Sent: Monday, December 17, 2018 4:02 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Couple of questions related to storage pools and mmapplypolicy

Hi All,

As those of you who suffered thru my talk at SC18 already know, we’re really short on space on one of our GPFS filesystems as the output of mmdf piped to grep pool shows:

Disks in storage pool: system (Maximum disk size allowed is 24 TB)
(pool total)           4.318T                                1.078T ( 25%)        79.47G ( 2%)
Disks in storage pool: data (Maximum disk size allowed is 262 TB)
(pool total)           494.7T                                38.15T (  8%)        4.136T ( 1%)
Disks in storage pool: capacity (Maximum disk size allowed is 519 TB)
(pool total)           640.2T                                14.56T (  2%)        716.4G ( 0%)

The system pool is metadata only.  The data pool is the default pool.  The capacity pool is where files with an atime (yes, atime) > 90 days get migrated.  The capacity pool is comprised of NSDs that are 8+2P RAID 6 LUNs of 8 TB drives, so roughly 58.2 TB usable space per NSD.

We have the new storage we purchased, but that’s still being tested and held in reserve for after the first of the year when we create a new GPFS 5 formatted filesystem and start migrating everything to the new filesystem.

In the meantime, we have also purchased a 60-bay JBOD and 30 x 12 TB drives and will be hooking it up to one of our existing storage arrays on Wednesday.  My plan is to create another 3 8+2P RAID 6 LUNs and present those to GPFS as NSDs.  They will be about 88 TB usable space each (because … beginning rant … a 12 TB drive is < 11 TB in size … and don’t get me started on so-called “4K” TV’s … end rant).
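The rant math checks out; a quick back-of-the-envelope in Python shows why a “12 TB” drive lands near 88 TB usable per 8+2P LUN, and why the existing “8 TB” drives give the 58.2 TB figure above:

```python
# Drive vendors sell decimal terabytes (10**12 bytes); filesystems report
# binary tebibytes (2**40 bytes). In 8+2P RAID 6, only the 8 data drives
# contribute usable capacity.
TB = 10**12
TiB = 2**40

def usable_tib(drive_tb, data_drives=8):
    """Usable space of one 8+2P RAID 6 LUN, in TiB."""
    return data_drives * drive_tb * TB / TiB

print(round(usable_tib(12), 1))  # "12 TB" drives -> ~87.3 TiB per LUN
print(round(usable_tib(8), 1))   # "8 TB" drives  -> ~58.2 TiB per LUN
```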

A very wise man who used to work at IBM but now hangs out with people in red polos (<grin>) once told me that it’s OK to mix NSDs of slightly different sizes in the same pool, but you don’t want to put NSDs of vastly different sizes in the same pool because the smaller ones will fill first and then the larger ones will have to take all the I/O.  I consider 58 TB and 88 TB to be pretty significantly different and am therefore planning on creating yet another pool called “oc” (over capacity if a user asks, old crap internally!) and migrating files with an atime greater than, say, 1 year to that pool.  But since ALL of the files in the capacity pool haven’t even been looked at in at least 90 days already, does it really matter?  I.e. should I just add the NSDs to the capacity pool and be done with it?

If it’s a good idea to create another pool, then I have a question about mmapplypolicy and migrations.  I believe I understand how things work, but after spending over an hour looking at the documentation I cannot find anything that explicitly confirms my understanding … so if I have another pool called oc that’s ~264 TB in size and I write a policy file that looks like:

define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))

RULE 'ReallyOldStuff'
  MIGRATE FROM POOL 'capacity'
  TO POOL 'oc'
  LIMIT(98)
  SIZE(KB_ALLOCATED/NLINK)
  WHERE ((access_age > 365) AND (KB_ALLOCATED > 3584))

RULE 'OldStuff'
  MIGRATE FROM POOL 'data'
  TO POOL 'capacity'
  LIMIT(98)
  SIZE(KB_ALLOCATED/NLINK)
  WHERE ((access_age > 90) AND (KB_ALLOCATED > 3584))

Keeping in mind that my capacity pool is already 98% full, is mmapplypolicy smart enough to calculate how much space the “ReallyOldStuff” rule will free up in the capacity pool, and therefore potentially also move a ton of stuff from the data pool to the capacity pool via the 2nd rule in just one invocation?  That’s what I expect it will do.  I’m hoping I don’t have to run mmapplypolicy twice … the first time to move stuff from capacity to oc, and then a second time for it to realize, oh, I’ve got a bunch of space free in the capacity pool now.
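One way to answer this empirically without committing any data movement: mmapplypolicy has a test mode that scans, evaluates the rules, and reports what each rule would select, including its predicted pool occupancy.  A sketch, assuming a hypothetical filesystem device gpfs0 and the rules above saved as policy.pol:

```
# Dry run: report what each rule WOULD migrate, without moving anything
mmapplypolicy gpfs0 -P policy.pol -I test

# If the plan looks right, run it for real
mmapplypolicy gpfs0 -P policy.pol -I yes
```

Whether the second rule’s candidate selection in a single pass accounts for the space the first rule frees should be visible in the test-mode summary before you risk a live run.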

Thanks in advance...

Kevin

P.S.  In case you’re scratching your head over the fact that we have files that people haven’t even looked at for months and months (more than a year in some cases) sitting out there … we sell quota in 1 TB increments … once they’ve bought the quota, it’s theirs.  As long as they’re paying us the monthly fee if they want to keep files relating to research they did during the George Bush Presidency out there … and I mean Bush 41, not Bush 43 ….then that’s their choice.  We do not purge files.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> - (615)875-9633



