[gpfsug-discuss] mmfind performance

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Wed Mar 7 15:18:24 GMT 2018


Hi Marc,

Thanks, I’m going to give this a try as the first mmfind finally finished overnight, but produced no output:

/root
root at gpfsmgrb# bash -x ~/bin/klb.sh
+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o -inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o -inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o -inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o -inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o -inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o -inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o -inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o -inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o -inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o -inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o -inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o -inum 138235320 -ls
/root
root at gpfsmgrb#

BTW, I had put that in a simple script because I had a list of those inodes, and it was easier to get them into the format I wanted by editing a script than by typing it all out on the command line.
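The gist of the script is nothing more than turning a file of inode numbers into the `-inum N -o -inum N ...` chain that mmfind wants; a minimal sketch (file names here are made up, list truncated to three inodes):

```shell
# Hypothetical sketch of the helper script: turn a file with one inode
# number per line into the "-inum N -o -inum N ..." chain mmfind expects.
printf '%s\n' 113769917 132539418 135584191 > /tmp/inode_list.txt

args=""
while read -r inum; do
    if [ -z "$args" ]; then
        args="-inum $inum"
    else
        args="$args -o -inum $inum"
    fi
done < /tmp/inode_list.txt

echo "$args"
# which would then be used as: ./mmfind /gpfs23 $args -ls
```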

However, according to the log file it was producing, it “hit” on 48 files:

[I] Inodes scan: 978275821 files, 99448202 directories, 37189547 other objects, 1967508 'skipped' files and/or errors.
[I] 2018-03-06@23:43:15.988 Policy evaluation. 1114913570 files scanned.
[I] 2018-03-06@23:43:16.016 Sorting 48 candidate file list records.
[I] 2018-03-06@23:43:16.040 Sorting 48 candidate file list records.
[I] 2018-03-06@23:43:16.065 Choosing candidate files. 0 records scanned.
[I] 2018-03-06@23:43:16.066 Choosing candidate files. 48 records scanned.
[I] Summary of Rule Applicability and File Choices:
 Rule#    Hit_Cnt     KB_Hit     Chosen  KB_Chosen     KB_Ill Rule
     0         48 1274453504         48 1274453504          0 RULE 'mmfind' LIST 'mmfindList' DIRECTORIES_PLUS SHOW(.) WHERE(.)

[I] Filesystem objects with no applicable rules: 1112946014.

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to list 1274453504KB: 48 of 48 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name              KB_Occupied       KB_Total Percent_Occupied
gpfs23capacity         564722407424   624917749760    90.367477583%
gpfs23data             304797672448   531203506176    57.378701177%
system                            0              0     0.000000000% (no user data)
[I] 2018-03-06@23:43:16.066 Policy execution. 0 files dispatched.
[I] 2018-03-06@23:43:16.102 Policy execution. 0 files dispatched.
[I] A total of 0 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
0 'skipped' files and/or errors.

While I’m going to follow your suggestion next, if you (or anyone else on the list) can explain why the “Hit_Cnt” is 48 but the “-ls” I passed to mmfind didn’t result in anything being listed, my curiosity is piqued.

And I’ll go ahead and say it before someone else does … I haven’t just chosen a special case, I AM a special case… ;-)

Kevin

On Mar 6, 2018, at 4:27 PM, Marc A Kaplan <makaplan at us.ibm.com> wrote:

Please try:

mmfind --polFlags '-N a_node_list  -g /gpfs23/tmp'  directory find-flags ...

Where a_node_list is a node list of your choice and /gpfs23/tmp is a temp directory of your choice...

And let us know how that goes.

Also, you have chosen a special case, just looking for some inode numbers -- so find can skip stat'ing the other inodes...
whereas mmfind is not smart enough to do that -- but with its parallelism, I'd guess mmapplypolicy might still beat find in elapsed time, even for this special case.
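(For comparison, the rule mmfind generated shows up in your log output. Written by hand for this special case, it would look something like the following -- a sketch only, with the inode list abbreviated and the policy-file name made up:)

```
/* Hand-written sketch of the kind of LIST rule mmfind generates,
   matching on the INODE attribute directly; inode list abbreviated.
   Run via something like:
   mmapplypolicy /gpfs23 -P byinode.pol -I defer -f /gpfs23/tmp/out */
RULE 'byinode' LIST 'matches' DIRECTORIES_PLUS
  WHERE INODE IN (113769917, 132539418, 135584191)
```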

-- Marc K of GPFS



From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        03/06/2018 01:52 PM
Subject:        [gpfsug-discuss] mmfind performance
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



Hi All,

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
   as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive operation
on each file (e.g. compute a checksum), using mmfind should yield a significant performance
improvement, even on a file system with relatively few inodes.

I have a list of just shy of 50 inode numbers that I need to figure out what file they correspond to, so I decided to give mmfind a try:

+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o -inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o -inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o -inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o -inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o -inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o -inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o -inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o -inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o -inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o -inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o -inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o -inum 138235320 -ls

I kicked that off last Friday and it is _still_ running.  By comparison, I have a Perl script that I have run in the past that simply traverses the entire filesystem tree, stats each file, and outputs the results to a log file.  That script would “only” take ~24 hours to run.
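That Perl script isn't doing anything clever -- a rough shell equivalent, demoed here on a throwaway directory rather than /gpfs23, would be:

```shell
# Rough shell equivalent of the serial Perl walker: stat every object
# under a root and log inode, size, and path, one line per entry.
# Demoed on a small throwaway tree; in practice the root would be /gpfs23.
root=$(mktemp -d)
mkdir -p "$root/sub"
: > "$root/a"
: > "$root/sub/b"

log=$(mktemp)
find "$root" -printf '%i %s %p\n' > "$log"   # GNU find: inode, size, path

count=$(wc -l < "$log")   # 2 directories + 2 files = 4 entries
echo "$count"
```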

Clearly mmfind as I invoked it is much slower than the corresponding Perl script, so what am I doing wrong?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



