[gpfsug-discuss] Blocksize
Sven Oehme
oehmes at us.ibm.com
Fri Sep 23 22:35:12 BST 2016
your metadata block size these days should be 1 MB and there are only very
few workloads for which you should run with a filesystem blocksize below 1
MB. so if you don't know exactly what to pick, 1 MB is a good starting
point.
the general rule still applies that your filesystem blocksize (metadata or
data pool) should match your raid controller (or GNR vdisk) stripe size of
the particular pool.
so if you use a 128k strip size (default in many midrange storage
controllers) in an 8+2p raid array, your stripe or track size is 1 MB and
therefore the blocksize of this pool should be 1 MB. i see many customers
in the field using 1 MB or even smaller blocksizes on RAID stripes of 2 MB or
above, and performance is significantly impacted by that.
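To make the arithmetic above concrete, here is a minimal sketch in Python. The function name is made up for illustration and is not part of any GPFS or controller tooling; it only multiplies the strip size by the number of data disks (parity disks excluded).

```python
def full_stripe_kib(strip_kib: int, data_disks: int) -> int:
    """Full-stripe (track) size in KiB: one strip per data disk, parity excluded."""
    return strip_kib * data_disks


# 8+2p array with a 128 KiB strip: only the 8 data disks count.
stripe = full_stripe_kib(128, 8)
print(f"{stripe} KiB full stripe -> pool blocksize of {stripe // 1024} MiB")
```

With a 256 KiB strip on the same 8+2p layout the result would be 2 MiB, which is the case where a 1 MB blocksize no longer matches the stripe.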
Sven
------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------
From: Stephen Ulmer <ulmer at ulmer.org>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 09/23/2016 12:16 PM
Subject: Re: [gpfsug-discuss] Blocksize
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Not to be too pedantic, but I believe the subblock size is 1/32 of the
block size (which strengthens Luis’s arguments below).
I thought the original question was NOT about inode size, but about
metadata block size. You can specify that the system pool have a different
block size from the rest of the filesystem, provided that it ONLY holds
metadata (--metadata-block-size option to mmcrfs).
So with 4K inodes (which should be used for all new filesystems absent
some contraindication), I would think that we’d want to use a metadata
block size of 4K*32=128K. This is independent of the regular block size,
which you can calculate based on the workload if you’re lucky.
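The reasoning in this paragraph is just a multiplication; a minimal Python sketch of it follows. The names are illustrative only, and the factor of 32 is the subblock ratio assumed in this message, not something the code queries from a filesystem.

```python
SUBBLOCKS_PER_BLOCK = 32  # assumption from this message: subblock = block size / 32


def metadata_block_size(inode_size: int,
                        subblocks_per_block: int = SUBBLOCKS_PER_BLOCK) -> int:
    """Block size in bytes whose subblock is exactly one inode."""
    return inode_size * subblocks_per_block


size = metadata_block_size(4096)
print(f"{size} bytes = {size // 1024} KiB")
```

This only shows where the 128K figure comes from; whether that size is a good choice for a given storage layout is a separate question.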
There could be a great reason NOT to use 128K metadata block size, but I
don’t know what it is. I’d be happy to be corrected about this if it’s out
of whack.
--
Stephen
On Sep 22, 2016, at 3:37 PM, Luis Bolinches <
luis.bolinches at fi.ibm.com> wrote:
Hi
My 2 cents.
Leave at least 4K inodes; then you get a massive improvement on small
files (smaller than 3.5K, minus whatever you use on xattrs).
About blocksize for data: unless you have actual data suggesting
that you will benefit from a block smaller than 1MB, leave it
there. GPFS uses subblocks, where 1/16th of the BS can be allocated to
different files, so the "waste" is much less than you think on 1MB,
and you get the throughput and fewer structures than you would with
many more, smaller data blocks.
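A rough way to see how small the "waste" is: a Python sketch under a simplified model in which a file's size is rounded up to whole subblocks. The function name and the example file size are made up for illustration, the model ignores data stored in the inode and any other allocation detail, and the subblock fraction is a parameter because this thread mentions both 1/16 and 1/32.

```python
def allocated_bytes(file_size: int, block_size: int, subblocks_per_block: int) -> int:
    """Space consumed when a file is rounded up to whole subblocks (simplified model)."""
    subblock = block_size // subblocks_per_block
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-file_size // subblock) * subblock


BLOCK = 1024 * 1024  # 1 MiB data block
EXAMPLE_FILE = 100_000  # hypothetical file size in bytes

for fraction in (16, 32):
    waste = allocated_bytes(EXAMPLE_FILE, BLOCK, fraction) - EXAMPLE_FILE
    print(f"1/{fraction} subblocks: {waste} bytes of slack, "
          f"versus {BLOCK - EXAMPLE_FILE} if a whole block were used")
```

Under this model the slack per file is always less than one subblock, not one full block.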
No warranty at all, but this is what I try to do when the BS talk comes up
(it might need some cleanup and may not be my latest notes, but you get the
idea):
POSIX
find . -type f -name '*' -exec ls -l {} \; > find_ls_files.out
GPFS
cd /usr/lpp/mmfs/samples/ilm
gcc mmfindUtil_processOutputFile.c -o mmfindUtil_processOutputFile
./mmfind /gpfs/shared -ls -type f > find_ls_files.out
CONVERT to CSV
POSIX
cat find_ls_files.out | awk '{print $5","}' > find_ls_files.out.csv
GPFS
cat find_ls_files.out | awk '{print $7","}' > find_ls_files.out.csv
LOAD in octave
FILESIZE = int32 (dlmread ("find_ls_files.out.csv", ","));
Clean the second column (OPTIONAL as the next clean up will do
the same)
FILESIZE(:,[2]) = [];
If we are on 4K alignment (4K inodes) we need to clean out the files that
go into inodes (WELL not exactly ... extended attributes! so maybe use a
lower number!)
FILESIZE(FILESIZE<=3584) =[];
If we are not, we need to clean out the 0-size files
FILESIZE(FILESIZE==0) =[];
Median
FILESIZEMEDIAN = int32 (median (FILESIZE))
Mean
FILESIZEMEAN = int32 (mean (FILESIZE))
Variance
int32 (var (FILESIZE))
IQR (interquartile range, i.e., the difference between the upper
and lower quartile of the input data)
int32 (iqr (FILESIZE))
Standard deviation
int32 (std (FILESIZE))
For some FS with lots of files you might need a rather powerful
machine to run the calculations in Octave. I never hit anything I could
not manage on a 64GB RAM Power box. Most of the time my laptop is
enough.
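For anyone without Octave at hand, the same steps can be sketched with the Python standard library. This is a rough port, not the procedure above verbatim: the function names are made up for illustration, the 3584-byte threshold is the one used above, and statistics.quantiles may compute quartiles slightly differently from Octave's iqr, so the IQR can differ a little.

```python
import statistics


def parse_sizes(lines):
    """Take the first comma-separated field of each line as a size in bytes."""
    sizes = []
    for line in lines:
        field = line.split(",")[0].strip()
        if field.isdigit():  # skip blank or malformed lines
            sizes.append(int(field))
    return sizes


def summarize(sizes, inode_threshold=3584):
    """Drop files small enough to fit in a 4K inode, then compute the statistics."""
    kept = [s for s in sizes if s > inode_threshold]
    q1, _, q3 = statistics.quantiles(kept, n=4)
    return {
        "count": len(kept),
        "median": statistics.median(kept),
        "mean": statistics.mean(kept),
        "variance": statistics.variance(kept),  # sample variance (N-1)
        "stdev": statistics.stdev(kept),
        "iqr": q3 - q1,
    }


# Tiny made-up sample; in practice: parse_sizes(open("find_ls_files.out.csv"))
demo = parse_sizes(["0,\n", "100,\n", "3584,\n", "4096,\n", "8192,\n", "1048576,\n"])
for name, value in summarize(demo).items():
    print(f"{name}: {value}")
```

Python integers do not overflow, so files larger than 2 GiB are counted at their real size here.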
--
Ystävällisin terveisin / Kind regards / Saludos cordiales /
Salutations
Luis Bolinches
Lab Services
http://www-03.ibm.com/systems/services/labservices/
IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland
Phone: +358 503112585
"If you continually give you will continually have." Anonymous
----- Original message -----
From: Stef Coene <stef.coene at docum.org>
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:
Subject: Re: [gpfsug-discuss] Blocksize
Date: Thu, Sep 22, 2016 10:30 PM
On 09/22/2016 09:07 PM, J. Eric Wonderley wrote:
> It defaults to 4k:
> mmlsfs testbs8M -i
> flag                value                    description
> ------------------- ------------------------ -----------------------------------
>  -i                 4096                     Inode size in bytes
>
> I think you can make it as small as 512b. GPFS will store very small
> files in the inode.
>
> Typically you want your average file size to be your blocksize, and your
> filesystem has one blocksize and one inode size.
The files are not small, but around 20 MB on average.
So I calculated with IBM that a 1 MB or 2 MB block size is best.
But I'm not sure if it's better to use a smaller block size for the
metadata.
The file system is not that large (400 TB) and will hold backup data
from CommVault.
Stef
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss