[gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
Alex Chekholko
chekh at stanford.edu
Mon Apr 17 19:49:12 BST 2017
Hi Kevin,
IMHO, safe to just run it again.
You can also run it with '-I test -L 6' again and look through the
output. But I don't think you can "break" anything by having it scan
and/or move data.
Can you post the full command line that you use to run it?
The behavior you describe is odd: you say it prints the "files
migrated successfully" message, but the files didn't actually get
migrated? Turn up the debug level (e.g. '-L 3' or higher) and have it
print each file as it moves it.
Regards,
Alex
On 4/17/17 8:24 AM, Buterbaugh, Kevin L wrote:
> Hi Marc,
>
> I do understand what you’re saying about mmapplypolicy deciding it only
> needed to move ~1.8 million files to fill the capacity pool to ~98%
> full. However, it is now more than 24 hours since the mmapplypolicy
> finished “successfully” and:
>
> Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
> eon35Ansd       58.2T   35  No  Yes      29.66T ( 51%)    64.16G ( 0%)
> eon35Dnsd       58.2T   35  No  Yes      29.66T ( 51%)    64.61G ( 0%)
>              -------------            -------------------- -------------------
> (pool total)    116.4T                   59.33T ( 51%)    128.8G ( 0%)
>
> And yes, I did run the mmapplypolicy with “-I yes” … here’s the
> partially redacted command line:
>
> /usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g <some folder on
> another gpfs filesystem> -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy
> -N some,list,of,NSD,server,nodes
>
> And here’s that policy file:
>
> define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
> define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0))
>
> RULE 'OldStuff'
> MIGRATE FROM POOL 'gpfs23data'
> TO POOL 'gpfs23capacity'
> LIMIT(98)
> WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))
>
> RULE 'INeedThatAfterAll'
> MIGRATE FROM POOL 'gpfs23capacity'
> TO POOL 'gpfs23data'
> LIMIT(75)
> WHERE (access_age < 14)
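For readers less familiar with the policy SQL, the selection predicates in these two rules can be modeled in a few lines of Python (a simplified sketch of the WHERE clauses only, not of how mmapplypolicy itself is implemented):

```python
from datetime import date

ACCESS_AGE_DAYS = 14        # threshold used by both rules
MIN_KB_ALLOCATED = 3584     # 'OldStuff' skips files of 3.5MB or less

def access_age(access_time: date, today: date) -> int:
    """Model DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)."""
    return (today - access_time).days

def matches_oldstuff(access_time: date, kb_allocated: int, today: date) -> bool:
    """RULE 'OldStuff': cold, non-tiny files move gpfs23data -> gpfs23capacity."""
    return (access_age(access_time, today) > ACCESS_AGE_DAYS
            and kb_allocated > MIN_KB_ALLOCATED)

def matches_ineedthatafterall(access_time: date, today: date) -> bool:
    """RULE 'INeedThatAfterAll': recently accessed files move back."""
    return access_age(access_time, today) < ACCESS_AGE_DAYS

today = date(2017, 4, 15)
print(matches_oldstuff(date(2017, 3, 1), 4096, today))      # True: 45 days old, >3584KB
print(matches_ineedthatafterall(date(2017, 4, 10), today))  # True: 5 days old
```

Note that a file whose access age is exactly 14 days matches neither rule (one comparison is strictly greater, the other strictly less), so the two rules never contend for the same file.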
>
> The one thing that has changed is that formerly I only ran the migration
> in one direction at a time … i.e. I used to have those two rules in two
> separate files and would run an mmapplypolicy using the OldStuff rule
> the 1st weekend of the month and run the other rule the other weekends
> of the month. This is the 1st weekend that I attempted to run an
> mmapplypolicy that did both at the same time. Did I mess something up
> with that?
>
> I have not run it again yet because we also run migrations on the other
> filesystem that we are still in the process of migrating off of. So
> gpfs23 goes 1st and as soon as it’s done the other filesystem migration
> kicks off. I don’t like to run two migrations simultaneously if at all
> possible. The 2nd migration ran until this morning, when it was
> unfortunately terminated by a network switch crash that has also had me
> tied up all morning until now. :-(
>
> And yes, there is something else going on … well, was going on - the
> network switch crash killed this too … I have been running an rsync on
> one particular ~80TB directory tree from the old filesystem to gpfs23.
> I understand that the migration wouldn’t know about those files and
> that’s fine … I just don’t understand why mmapplypolicy said it was
> going to fill the capacity pool to 98% but didn’t do it … wait,
> mmapplypolicy hasn’t gone into politics, has it?!? ;-)
>
> Thanks - and again, if I should open a PMR for this please let me know...
>
> Kevin
>
>> On Apr 16, 2017, at 2:15 PM, Marc A Kaplan <makaplan at us.ibm.com> wrote:
>>
>> Let's look at how mmapplypolicy does the reckoning.
>> Before it starts, it see your pools as:
>>
>> [I] GPFS Current Data Pool Utilization in KB and %
>> Pool_Name       KB_Occupied     KB_Total        Percent_Occupied
>> gpfs23capacity  55365193728     124983549952    44.297984614%
>> gpfs23data      166747037696    343753326592    48.507759721%
>> system          0               0               0.000000000% (no user data)
>> [I] 75142046 of 209715200 inodes used: 35.830520%.
>>
>> Your rule says you want to migrate data to gpfs23capacity, up to 98% full:
>>
>> RULE 'OldStuff'
>> MIGRATE FROM POOL 'gpfs23data'
>> TO POOL 'gpfs23capacity'
>> LIMIT(98) WHERE ...
>>
>> We scan your files and find and reckon...
>> [I] Summary of Rule Applicability and File Choices:
>> Rule#  Hit_Cnt  KB_Hit        Chosen   KB_Chosen    KB_Ill  Rule
>> 0      5255960  237675081344  1868858  67355430720  0       RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
>>
>> So yes, 5.25 million files match the rule, but the utility chooses
>> 1.868 million files that add up to 67,355GB and figures that if it
>> migrates those to gpfs23capacity (also accounting for the migrations
>> chosen by your second rule), then gpfs23capacity will end up 97.9999%
>> full.
>> We show you that with our "predictions" message.
>>
>> Predicted Data Pool Utilization in KB and %:
>> Pool_Name       KB_Occupied     KB_Total        Percent_Occupied
>> gpfs23capacity  122483878944    124983549952    97.999999993%
>> gpfs23data      104742360032    343753326592    30.470209865%
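Marc's reckoning for the capacity pool can be verified directly from the figures in the reports above (plain arithmetic on the logged numbers, not GPFS code):

```python
# Figures taken from the utilization report and rule summary (all in KB)
capacity_occupied = 55_365_193_728    # gpfs23capacity before the run
capacity_total    = 124_983_549_952   # gpfs23capacity pool size
chosen_in  = 67_355_430_720           # KB_Chosen by 'OldStuff' (into the pool)
chosen_out = 236_745_504              # KB_Chosen by 'INeedThatAfterAll' (out of it)

predicted = capacity_occupied + chosen_in - chosen_out
print(predicted)                                    # 122483878944, as predicted
print(round(100 * predicted / capacity_total, 9))   # ~97.999999993 percent
```

So mmapplypolicy really did choose exactly enough data to land the pool at the LIMIT(98) ceiling; the open question in the thread is why the chosen files never arrived.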
>>
>> So that's why it chooses to migrate "only" 67,355GB (about 67TB)....
>>
>> See? Makes sense to me.
>>
>> Questions:
>> Did you run with -I yes or -I defer ?
>>
>> Were some of the files illreplicated or illplaced?
>>
>> Did you give the cluster-wide space reckoning protocols time to see
>> the changes? mmdf is usually "behind" by some non-negligible amount
>> of time.
>>
>> What else is going on?
>> If you're moving or deleting or creating data by other means while
>> mmapplypolicy is running -- it doesn't "know" about that!
>>
>> Run it again!
>>
>>
>>
>>
>> From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date: 04/16/2017 09:47 AM
>> Subject: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> ------------------------------------------------------------------------
>>
>>
>>
>> Hi All,
>>
>> First off, I can open a PMR for this if I need to. Second, I am far
>> from an mmapplypolicy guru. With that out of the way … I have an
>> mmapplypolicy job that didn’t migrate anywhere close to what it could
>> / should have. From the log file I have it create, here is the part
>> where it shows the policies I told it to invoke:
>>
>> [I] Qos 'maintenance' configured as inf
>> [I] GPFS Current Data Pool Utilization in KB and %
>> Pool_Name       KB_Occupied     KB_Total        Percent_Occupied
>> gpfs23capacity  55365193728     124983549952    44.297984614%
>> gpfs23data      166747037696    343753326592    48.507759721%
>> system          0               0               0.000000000% (no user data)
>> [I] 75142046 of 209715200 inodes used: 35.830520%.
>> [I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy.
>> Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15 at 01:13:02 UTC
>> Parsed 2 policy rules.
>>
>> RULE 'OldStuff'
>> MIGRATE FROM POOL 'gpfs23data'
>> TO POOL 'gpfs23capacity'
>> LIMIT(98)
>> WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND
>> (KB_ALLOCATED > 3584))
>>
>> RULE 'INeedThatAfterAll'
>> MIGRATE FROM POOL 'gpfs23capacity'
>> TO POOL 'gpfs23data'
>> LIMIT(75)
>> WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14)
>>
>> And then the log shows it scanning all the directories and then says,
>> "OK, here’s what I’m going to do":
>>
>> [I] Summary of Rule Applicability and File Choices:
>> Rule#  Hit_Cnt  KB_Hit        Chosen   KB_Chosen    KB_Ill  Rule
>> 0      5255960  237675081344  1868858  67355430720  0       RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
>> 1      611      236745504     611      236745504    0       RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.)
>>
>> [I] Filesystem objects with no applicable rules: 414911602.
>>
>> [I] GPFS Policy Decisions and File Choice Totals:
>> Chose to migrate 67592176224KB: 1869469 of 5256571 candidates;
>> Predicted Data Pool Utilization in KB and %:
>> Pool_Name       KB_Occupied     KB_Total        Percent_Occupied
>> gpfs23capacity  122483878944    124983549952    97.999999993%
>> gpfs23data      104742360032    343753326592    30.470209865%
>> system          0               0               0.000000000% (no user data)
>>
>> Notice that it says it’s only going to migrate less than 2 million of
>> the 5.25 million candidate files!! And sure enough, that’s all it did:
>>
>> [I] A total of 1869469 files have been migrated, deleted or processed
>> by an EXTERNAL EXEC/script;
>> 0 'skipped' files and/or errors.
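As a sanity check, the totals in the decision summary are exactly the sum of the two per-rule lines (again, just arithmetic on the logged numbers):

```python
# Per-rule figures from the "Summary of Rule Applicability" section
oldstuff_hit,  oldstuff_chosen,  oldstuff_kb  = 5_255_960, 1_868_858, 67_355_430_720
ineedthat_hit, ineedthat_chosen, ineedthat_kb = 611,       611,       236_745_504

# Totals from "GPFS Policy Decisions and File Choice Totals"
assert oldstuff_chosen + ineedthat_chosen == 1_869_469   # files chosen
assert oldstuff_hit + ineedthat_hit == 5_256_571         # total candidates
assert oldstuff_kb + ineedthat_kb == 67_592_176_224      # KB chosen
print("decision totals are self-consistent with the per-rule choices")
```

In other words, the log is internally consistent; the discrepancy is between what the log claims happened and what mmdf later shows.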
>>
>> And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere
>> near 98% full:
>>
>> Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
>> eon35Ansd       58.2T   35  No  Yes      29.54T ( 51%)    63.93G ( 0%)
>> eon35Dnsd       58.2T   35  No  Yes      29.54T ( 51%)    64.39G ( 0%)
>>              -------------            -------------------- -------------------
>> (pool total)    116.4T                   59.08T ( 51%)    128.3G ( 0%)
>>
>> I don’t understand why it only migrated a small subset of what it
>> could / should have.
>>
>> We are doing a migration from one filesystem (gpfs21) to gpfs23 and I
>> really need to stuff my gpfs23capacity pool as full of data as I can
>> to keep the migration going. Any ideas anyone? Thanks in advance…
>>
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and
>> Education
>> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633