[gpfsug-discuss] mmfind performance

Marc A Kaplan makaplan at us.ibm.com
Tue Mar 6 22:27:34 GMT 2018


Please try:

mmfind --polFlags '-N a_node_list -g /gpfs23/tmp' directory find-flags ...

Here a_node_list is a node list of your choice and /gpfs23/tmp is a 
temporary directory of your choice.
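
As a rough sketch of what that could look like, the snippet below only assembles the command line and prints it, so it can be reviewed before anything is run. The node names (nodeA,nodeB), the target directory and the find-flags in the example call are placeholders, not values recommended in this thread; the mmfind path is the samples directory from the original post. It is an assumption here that the temp directory needs to be reachable from every node in the list.

```shell
#!/bin/sh
# Dry-run sketch: build the suggested mmfind command line and print it.
# NODES, WORKDIR and TARGET are placeholders (assumptions), not site values.
MMFIND=/usr/lpp/mmfs/samples/ilm/mmfind
NODES="nodeA,nodeB"        # hypothetical node list passed to -N
WORKDIR=/gpfs23/tmp        # temp directory passed to -g
TARGET=/gpfs23             # directory to search

build_mmfind_cmd() {
    # First argument is the directory; the rest are passed as find-flags.
    dir=$1; shift
    printf "%s --polFlags '-N %s -g %s' %s %s\n" \
        "$MMFIND" "$NODES" "$WORKDIR" "$dir" "$*"
}

build_mmfind_cmd "$TARGET" -inum 113769917 -ls
```

Printing rather than executing keeps the sketch harmless on a machine without GPFS; once the line looks right it can be pasted into a shell on a cluster node.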

And let us know how that goes.

Also, you have chosen a special case, just looking for some inode numbers, 
so find can skip stat-ing the other inodes, whereas mmfind is not smart 
enough to do that. Still, with parallelism, I'd guess mmapplypolicy might 
beat find in elapsed time to complete, even for this special case.

-- Marc K of GPFS



From:   "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   03/06/2018 01:52 PM
Subject:        [gpfsug-discuss] mmfind performance
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi All, 

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
  as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive operation
on each file (e.g. compute a checksum), using mmfind should yield a significant performance
improvement, even on a file system with relatively few inodes.

I have a list of just shy of 50 inode numbers and need to figure out 
which files they correspond to, so I decided to give mmfind a try:

+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 
-o -inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 
137939675 -o -inum 137997971 -o -inum 138013736 -o -inum 138029061 -o 
-inum 138029065 -o -inum 138029076 -o -inum 138029086 -o -inum 138029093 
-o -inum 138029099 -o -inum 138029101 -o -inum 138029102 -o -inum 
138029106 -o -inum 138029112 -o -inum 138029113 -o -inum 138029114 -o 
-inum 138029119 -o -inum 138029120 -o -inum 138029121 -o -inum 138029130 
-o -inum 138029131 -o -inum 138029132 -o -inum 138029141 -o -inum 
138029146 -o -inum 138029147 -o -inum 138029152 -o -inum 138029153 -o 
-inum 138029154 -o -inum 138029163 -o -inum 138029164 -o -inum 138029165 
-o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o -inum 
138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o 
-inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 
-o -inum 138235320 -ls
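
A command line like the one above does not have to be typed by hand. The following is a minimal sketch, assuming the inode numbers are available one per line; the function name inum_expr is made up for illustration and is not part of mmfind.

```shell
#!/bin/sh
# Sketch: turn inode numbers (one per line on stdin) into
# "-inum N1 -o -inum N2 -o ..." for use with find or mmfind.
inum_expr() {
    expr_out=""
    while read -r inum; do
        [ -n "$inum" ] || continue          # skip blank lines
        if [ -z "$expr_out" ]; then
            expr_out="-inum $inum"
        else
            expr_out="$expr_out -o -inum $inum"
        fi
    done
    printf '%s\n' "$expr_out"
}

# Example with the first three inode numbers from the command above.
printf '%s\n' 113769917 132539418 135584191 | inum_expr
```

With a hypothetical file inodes.txt, the result could be spliced in as ./mmfind /gpfs23 $(inum_expr < inodes.txt) -ls, leaving the substitution unquoted so that it splits into separate arguments. With plain find, -o binds more loosely than the implicit "and" in front of -ls, so the alternatives would normally be wrapped in \( ... \); whether mmfind treats the expression the same way is not something this sketch verifies.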

I kicked that off last Friday and it is _still_ running.  By comparison, I 
have a Perl script that I have run in the past that simply traverses the 
entire filesystem tree, stat’s each file, and writes the results to a log 
file.  That script would “only” run for ~24 hours.
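
The Perl script itself is not shown here. As a rough shell analogue of the same idea (one pass over the tree, then filtering against the wanted inode numbers), a sketch might look like the following. The function name lookup_inodes and the one-number-per-line file format are assumptions made for illustration, and paths containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: one pass over the tree listing "<inode> <path>", filtered
# against a file of wanted inode numbers (one per line).
lookup_inodes() {
    root=$1
    wanted=$2
    # ls -id prints the inode number and path of each entry; "+" batches
    # many paths into each ls call. -xdev keeps find on one file system,
    # since inode numbers are only unique within a file system.
    find "$root" -xdev -exec ls -id {} + |
        awk 'NR == FNR { want[$1] = 1; next } ($1 in want)' "$wanted" -
}

# Small self-contained demonstration in a scratch directory.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b"
ls -i "$demo/a" | awk '{print $1}' > "$demo/wanted.txt"
lookup_inodes "$demo" "$demo/wanted.txt"    # lists only the entry for "$demo/a"
rm -rf "$demo"
```

This runs serially on a single node, so on a large file system it would be expected to take on the order of a full traversal, much like the Perl script.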

Clearly mmfind as I invoked it is much slower than the corresponding Perl 
script, so what am I doing wrong?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and 
Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

