[gpfsug-discuss] Blocksize - file size distribution
Edward Wahl
ewahl at osc.edu
Wed Sep 28 21:18:55 BST 2016
On Wed, 28 Sep 2016 10:34:05 -0400
Marc A Kaplan <makaplan at us.ibm.com> wrote:
> Consider using samples/ilm/mmfind (or mmapplypolicy with a LIST ...
> SHOW rule) to gather the stats much faster. Should be minutes, not
> hours.
>
I'll second the policy engine. It runs like a beast if you tune it a
little for nodes and threads. It only takes a couple of minutes to
collect info on over a hundred million files.

Want to see where the data sits now by pool, sorted by age, using
queries? Here's a quick hacked-up example. You could sort the output on
the front end fairly quickly. (Use fileset, pool, etc. as your storage
needs dictate.)
RULE '2yrold_files' LIST '2yrold_filelist.txt'
  SHOW (varchar(file_size) || ' ' || varchar(USER_ID) || ' ' || varchar(POOL_NAME))
  WHERE DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) >= 730
    AND DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) < 1095
Don't forget to run mmapplypolicy with -I defer for this kind of
LIST/SHOW policy.
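
For the "sort it on the front end" part, here is a minimal Python sketch
that totals file count and bytes per pool from the generated list file.
It assumes each record looks like "inode gen snapid  SHOW-text -- path",
with the SHOW text being the three space-separated fields from the rule
above (size, user ID, pool name). That layout, and the function names
parse_list_line and summarize_by_pool, are assumptions for illustration,
so check them against a few lines of your own output first.

```python
from collections import defaultdict


def parse_list_line(line):
    """Split one list record into (size_bytes, user_id, pool, path).

    Assumed record layout (not verified against every release):
        inode gen snapid  size uid pool -- /full/path
    """
    # Split on the first " -- " only, so a path that itself contains
    # " -- " is kept intact.
    head, sep, path = line.rstrip("\n").partition(" -- ")
    if not sep:
        raise ValueError("no ' -- ' separator in record: %r" % line)
    fields = head.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields before ' -- ': %r" % line)
    size, uid, pool = fields[3:6]
    return int(size), int(uid), pool, path


def summarize_by_pool(lines):
    """Return {pool_name: (file_count, total_bytes)} for the records."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        size, _uid, pool, _path = parse_list_line(line)
        totals[pool][0] += 1
        totals[pool][1] += size
    return {pool: (count, nbytes) for pool, (count, nbytes) in totals.items()}
```

To use it, open whatever list file your mmapplypolicy run wrote and pass
the file object to summarize_by_pool(); it reads one record per line.
Grouping by user ID instead of pool is the same loop with a different
key.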
Ed
--
Ed Wahl
Ohio Supercomputer Center
614-292-9302