[gpfsug-discuss] Policy scan against billion files for ILM/HSM

Marc A Kaplan makaplan at us.ibm.com
Tue Apr 11 16:36:47 BST 2017


As primary developer of mmapplypolicy, please allow me to comment:

1) Fast access to metadata in the system pool is most important, as several
have commented.  These days SSD is the favorite, but you can still go
with "spinning" media.
If you do go with disks, it is extremely important to spread your metadata
over independent disk "arms" -- so that many concurrent seeks can be in
progress at once.  IOW, if there is a virtualization/mapping layer, watch
out that your logical disks don't all get mapped to the same physical disk.
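As a quick sanity check (a sketch, assuming a hypothetical file system 
named gpfs0), mmlsdisk shows which NSDs hold metadata and which failure 
group and storage pool each belongs to, so you can verify the system pool 
really is spread across enough independent devices:

    # Hypothetical file system name "gpfs0".
    # The "holds metadata" column shows which NSDs carry metadata;
    # the failure group and storage pool columns help confirm that
    # metadata is spread across independent devices.
    mmlsdisk gpfs0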

2) It is crucial to use both -g and -N ::  -g 
/gpfs-not-necessarily-the-same-fs-as-Im-scanning/tempdir  and -N 
several-nodes-that-will-be-accessing-the-system-pool
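For illustration only -- a minimal sketch, with the file system, nodes, 
paths and policy file as placeholders -- an invocation following this 
advice might look like:

    # Hypothetical names throughout.  The -g directory lives in a GPFS
    # file system (not necessarily the one being scanned) and is shared
    # by all the helper nodes; -N lists nodes with fast access to the
    # system pool.  -I test does a trial run without moving any data.
    mmapplypolicy /gpfs/fs1 -P policy.rules \
        -g /gpfs/otherfs/tempdir \
        -N nsdnode1,nsdnode2,nsdnode3 \
        -I test -L 1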

3a) If at all possible, encourage your data and application designers to 
"pack" their directories with lots of files.  Keep in mind that 
mmapplypolicy will read every directory.  The more directories, the more 
seeks, and the more time spent waiting for IO.  OTOH, in typical 
Unix/Linux usage we tend to see a low average number of files per 
directory.

3b) As admin, you may not be able to change your data design to pack 
hundreds of files per directory, BUT you can make sure you are running a 
sufficiently modern release of Spectrum Scale that supports "data in 
inode".  "Data in inode" also means "directory entries in inode" -- which 
means practically any small directory, up to a few hundred files, will fit 
in an inode -- which means mmapplypolicy can read small directories with 
one seek instead of two.

(Someone will please remind us of the release number that first supported 
"directories in inode".)

4) Sorry, Fred, but the recommendation to use RAID mirroring of metadata 
on SSD is not necessarily important for metadata scanning; in fact it may 
work against you.  If you use GPFS replication of metadata, that can work 
for you, since GPFS can then direct read operations to either copy, 
preferring a locally attached copy, depending on how storage is attached 
to each node, and so on.  The choice of how to replicate metadata -- 
either with GPFS replication or in the RAID controller -- is probably 
best made based on reliability and recoverability requirements.
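If you want to check or change how metadata is replicated (a sketch, once 
more assuming a hypothetical file system gpfs0):

    # Hypothetical file system name "gpfs0".
    mmlsfs gpfs0 -m     # default number of metadata replicas
    mmlsfs gpfs0 -M     # maximum metadata replicas (fixed at creation time)

    # If -M allows it, GPFS metadata replication can be raised later, e.g.
    #   mmchfs gpfs0 -m 2
    # followed by a restripe to re-replicate existing metadata:
    #   mmrestripefs gpfs0 -R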

5) YMMV - We'd love to hear/see your performance results for 
mmapplypolicy, especially if they're good.  Even if they're bad, come back 
here for more tuning tips!

-- marc of Spectrum Scale (ne GPFS)
