[gpfsug-discuss] Blocksize

Sven Oehme oehmes at us.ibm.com
Fri Sep 23 22:35:12 BST 2016


Your metadata block size these days should be 1 MB, and there are only very
few workloads for which you should run with a filesystem blocksize below 1
MB. So if you don't know exactly what to pick, 1 MB is a good starting
point.
The general rule still applies that your filesystem blocksize (metadata or
data pool) should match the RAID controller (or GNR vdisk) stripe size of
the particular pool.

So if you use a 128k strip size (default in many midrange storage
controllers) in an 8+2p RAID array, your stripe or track size is 1 MB and
therefore the blocksize of this pool should be 1 MB. I see many customers
in the field using 1 MB or even smaller blocksize on RAID stripes of 2 MB or
above, and your performance will be significantly impacted by that.
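
As a quick check of the arithmetic above, here is a minimal Python sketch.
The function name and the 256k case are made up for illustration, and it
assumes the stripe is simply the strip size times the number of data disks,
with parity disks not counted.

```python
def full_stripe_kib(strip_kib, data_disks):
    """Full-stripe size in KiB: strip size times the number of data disks.

    Parity disks are not counted, so an 8+2p array has 8 data disks.
    """
    if strip_kib <= 0 or data_disks <= 0:
        raise ValueError("strip size and data disk count must be positive")
    return strip_kib * data_disks


if __name__ == "__main__":
    # The example from this mail: 128k strips in an 8+2p array.
    print(full_stripe_kib(128, 8), "KiB")  # 1024 KiB, i.e. 1 MB
    # Hypothetical variation: 256k strips in the same 8+2p layout.
    print(full_stripe_kib(256, 8), "KiB")  # 2048 KiB, i.e. 2 MB
```

Under that assumption, the second case is the kind of array where a 1 MB
blocksize would be smaller than the RAID stripe.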

Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------



From:	Stephen Ulmer <ulmer at ulmer.org>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	09/23/2016 12:16 PM
Subject:	Re: [gpfsug-discuss] Blocksize
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



Not to be too pedantic, but I believe the subblock size is 1/32 of the
block size (which strengthens Luis’s arguments below).

I thought the original question was NOT about inode size, but about
metadata block size. You can specify that the system pool have a different
block size from the rest of the filesystem, provided that it ONLY holds
metadata (--metadata-block-size option to mmcrfs).

So with 4K inodes (which should be used for all new filesystems without
some counter-indication), I would think that we’d want to use a metadata
block size of 4K*32=128K. This is independent of the regular block size,
which you can calculate based on the workload if you’re lucky.

There could be a great reason NOT to use 128K metadata block size, but I
don’t know what it is. I’d be happy to be corrected about this if it’s out
of whack.

--
Stephen



      On Sep 22, 2016, at 3:37 PM, Luis Bolinches <
      luis.bolinches at fi.ibm.com> wrote:

      Hi

      My 2 cents.

      Keep inodes at 4K at least; you then get a massive improvement on
      small files (those under 3.5K, minus whatever you use on xattrs).

      About blocksize for data: unless you have actual data suggesting
      that you will benefit from a block smaller than 1 MB, leave it
      there. GPFS uses subblocks, where 1/16th of the BS can be allocated to
      different files, so the "waste" is much less than you think at 1 MB,
      and you get the throughput and fewer structures than you would with
      many more data blocks.
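
      To put a rough number on that "waste", here is a minimal Python
      sketch. It is a simplified model, not how GPFS accounts for space:
      every file is assumed to occupy a whole number of subblocks, files
      stored in the inode and all metadata are ignored, and the function
      names and sample sizes are made up. The subblock fraction is a
      parameter, since both 1/16 and 1/32 are mentioned in this thread.

```python
def allocated_bytes(file_size, block_size, subblocks_per_block):
    """Bytes allocated for one file, rounded up to whole subblocks."""
    if block_size <= 0 or subblocks_per_block <= 0:
        raise ValueError("block size and subblock count must be positive")
    subblock = block_size // subblocks_per_block
    # Integer ceiling division avoids float rounding on large files.
    return -(-file_size // subblock) * subblock


def wasted_fraction(file_sizes, block_size, subblocks_per_block):
    """Fraction of the allocated space that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(
        allocated_bytes(size, block_size, subblocks_per_block)
        for size in file_sizes
    )
    return 0.0 if allocated == 0 else (allocated - used) / allocated


if __name__ == "__main__":
    MIB = 1024 ** 2
    sample = [10_000, 200_000, 20 * MIB]  # made-up file sizes in bytes
    # 1 subblock per block models whole-block allocation, for comparison.
    for per_block in (1, 16, 32):
        frac = wasted_fraction(sample, MIB, per_block)
        print(f"{per_block:2d} subblocks per block: {frac:.2%} wasted")
```

      Feeding it real file sizes (see the steps below) instead of the
      sample list gives a first estimate for a candidate block size.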

      No warranty at all, but this is what I try to do when the BS talk
      comes up (it might need some cleanup and may not be my latest notes,
      but you get the idea):

      POSIX
      find . -type f -name '*' -exec ls -l {} \; > find_ls_files.out
      GPFS
      cd /usr/lpp/mmfs/samples/ilm
      gcc mmfindUtil_processOutputFile.c -o mmfindUtil_processOutputFile
      ./mmfind /gpfs/shared -ls -type f > find_ls_files.out

          CONVERT to CSV

      POSIX
      cat find_ls_files.out | awk '{print $5","}' > find_ls_files.out.csv
      GPFS
      cat find_ls_files.out | awk '{print $7","}' > find_ls_files.out.csv

          LOAD in Octave

      FILESIZE = int32 (dlmread ("find_ls_files.out.csv", ","));

          Clean the second column (OPTIONAL, as the next cleanup will do
          the same)

      FILESIZE(:,[2]) = [];

          If we are on 4K inodes, we need to drop the files that go into
          the inode (well, not exactly: extended attributes take space
          too, so maybe use a lower number!)

      FILESIZE(FILESIZE<=3584) =[];

          If we are not, we need to drop the 0-size files

      FILESIZE(FILESIZE==0) =[];

          Median

      FILESIZEMEDIAN = int32 (median (FILESIZE))

          Mean

      FILESIZEMEAN = int32 (mean (FILESIZE))

          Variance

      int32 (var (FILESIZE))

          IQR (interquartile range, i.e., the difference between the upper
          and lower quartile of the input data)

      int32 (iqr (FILESIZE))

          Standard deviation

      int32 (std (FILESIZE))


      For some FS with lots of files you might need a rather powerful
      machine to run the calculations in Octave; I never hit anything I
      could not manage on a 64 GB RAM Power box. Most of the time my
      laptop is enough.
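
      If Octave is not at hand, the same statistics can be sketched in
      Python with only the standard library. This is an illustrative
      equivalent, not the method used above: the function names are made
      up, the 3584-byte limit is the one from the Octave step, and
      statistics.quantiles may use a different quartile method than
      Octave's iqr, so the IQR can differ slightly.

```python
import statistics


def load_sizes(csv_path):
    """Read the first comma-separated field of each line as a size in bytes."""
    sizes = []
    with open(csv_path) as fh:
        for line in fh:
            field = line.split(",")[0].strip()
            if field.isdigit():  # skip blank or malformed lines
                sizes.append(int(field))
    return sizes


def summarize(sizes, in_inode_limit=3584):
    """Drop files at or below in_inode_limit bytes, then compute statistics.

    Use in_inode_limit=0 to drop only the 0-size files.
    """
    kept = [s for s in sizes if s > in_inode_limit]
    if len(kept) < 2:
        raise ValueError("need at least two files above the limit")
    q1, _, q3 = statistics.quantiles(kept, n=4)
    return {
        "count": len(kept),
        "median": statistics.median(kept),
        "mean": statistics.mean(kept),
        "variance": statistics.variance(kept),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(kept),
    }


if __name__ == "__main__":
    # Made-up sizes; in practice use load_sizes("find_ls_files.out.csv").
    demo = [0, 100, 3584, 4000, 8000, 12000, 16000]
    for name, value in summarize(demo).items():
        print(f"{name}: {value}")
```

      The whole list is held in memory, so the same caveat about very
      large file systems applies here as well.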



      --
      Ystävällisin terveisin / Kind regards / Saludos cordiales /
      Salutations

      Luis Bolinches
      Lab Services
      http://www-03.ibm.com/systems/services/labservices/

      IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland
      Phone: +358 503112585

      "If you continually give you will continually have." Anonymous


       ----- Original message -----
       From: Stef Coene <stef.coene at docum.org>
       Sent by: gpfsug-discuss-bounces at spectrumscale.org
       To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
       Cc:
       Subject: Re: [gpfsug-discuss] Blocksize
       Date: Thu, Sep 22, 2016 10:30 PM

       On 09/22/2016 09:07 PM, J. Eric Wonderley wrote:
       > It defaults to 4k:
       > mmlsfs testbs8M -i
       > flag                value                    description
       > ------------------- ------------------------ -----------------------------------
       >  -i                 4096                     Inode size in bytes
       >
       > I think you can make it as small as 512 bytes. GPFS will store
       > very small files in the inode.
       >
       > Typically you want your average file size to be your blocksize,
       > and your filesystem has one blocksize and one inode size.

       The files are not small, but around 20 MB on average.
       So I calculated with IBM that a 1 MB or 2 MB block size is best.

       But I'm not sure if it's better to use a smaller block size for the
       metadata.

       The file system is not that large (400 TB) and will hold backup data
       from CommVault.


       Stef
       _______________________________________________
       gpfsug-discuss mailing list
       gpfsug-discuss at spectrumscale.org
       http://gpfsug.org/mailman/listinfo/gpfsug-discuss





