[gpfsug-discuss] subblock sanity check in 5.0

Uwe Falke UWEFALKE at de.ibm.com
Mon Jul 2 21:17:26 BST 2018


Hi Carl,

Sven had mentioned the RMW (read-modify-write) penalty before, which can
make it beneficial to use smaller blocks.
If you have traditional RAIDs and go the usual route of making the track
size equal to the block size (stripe size = BS/n with n+p RAIDs), you may
run into problems if your I/Os are typically, or very often, smaller than
a block, because the controller needs to read the entire track, modify it
according to your I/O, and write it back together with the parity stripes.
Example: with a 4MiB BS and 8+2 RAIDs as NSDs, for each I/O smaller than
4MiB reaching an NSD, the controller needs to read 4MiB into a buffer,
modify it according to your I/O, calculate parity for the whole track, and
write back 5MiB (8 data stripes of 512KiB plus two parity stripes). In
those cases you might be better off with smaller block sizes.
In the above scenario it might, however, still be OK to leave the block
size at 4MiB and just reduce the track size of the RAIDs. One has to check
how that affects performance; YMMV, I'd say, here.
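
To put rough numbers on that penalty, here is a small Python sketch (not
GPFS or controller code, just the arithmetic from the example above; the
128KiB I/O size is a made-up illustration):

# Write amplification of a sub-track I/O on a traditional n+p RAID.
def rmw_cost(io_bytes, n_data, n_parity, strip_bytes):
    track = n_data * strip_bytes                  # full track (data portion)
    read = track if io_bytes < track else 0       # controller re-reads the track
    write = (n_data + n_parity) * strip_bytes     # data strips plus parity written
    return read, write

# 8+2 RAID with 512KiB strips -> 4MiB track, as in the example above:
r, w = rmw_cost(128 * 1024, 8, 2, 512 * 1024)
print(r // 2**20, "MiB read,", w // 2**20, "MiB written")
# -> 4 MiB read, 5 MiB written for a 128KiB I/O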

Mind that the ESS uses a clever way to mask these types of I/O from the
n+p RS-based vdisks, but even there one might need to think ...



 
Mit freundlichen Grüßen / Kind regards

 
Dr. Uwe Falke
 
IT Specialist
High Performance Computing Services / Integrated Technology Services / 
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: 
Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
HRB 17122 




From:   Carl <mutantllama at gmail.com>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   02/07/2018 11:57
Subject:        Re: [gpfsug-discuss] subblock sanity check in 5.0
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Thanks Olaf and Sven,

It looks like a lot of advice from the wiki (
https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata
) is no longer relevant for version 5. Any idea if it's likely to be
updated soon?

The new subblock changes appear to have removed a lot of the reasons for
using smaller block sizes. In broad terms, are there any situations where
you would recommend using less than the new default block size?

Cheers,

Carl.


On Mon, 2 Jul 2018 at 17:55, Sven Oehme <oehmes at gmail.com> wrote:
Olaf, he is talking about the indirect block size, not the subblock size.

Carl, 

Here is a screenshot of a 4MB filesystem:

[root@p8n15hyp ~]# mmlsfs all_local

File system attributes for /dev/fs2-4m-07:
==========================================
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 512                      Estimated number of nodes that will mount file system
 -B                 4194304                  Block size
 -Q                 none                     Quotas accounting enabled
                    none                     Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 19.01 (5.0.1.0)          File system version
 --create-time      Mon Jun 18 12:30:54 2018 File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      4000000000               Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 512              Number of subblocks per full block
 -P                 system                   Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 RG001VS001;RG002VS001;RG003VS002;RG004VS002  Disks in file system
 -A                 no                       Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs/fs2-4m-07          Default mount point
 --mount-priority   0                        Mount priority

As you can see, the indirect block size is 32k.

sven

On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
Hi Carl,
8k for a 4M block size.
Files < ~3.x KB fit into the inode; for "larger" files (> 3.x KB) at least
one subblock is allocated.

In R < 5.x the subblock size was fixed at 1/32 of the block size, so the
subblock size was derived from the block size.
Since R >= 5 (i.e. for newly created file systems), the new default block
size is 4MB and the fragment size is 8k (512 subblocks).
For even larger block sizes, more subblocks are available per block, so e.g.
8M .... 1024 subblocks (fragment size is 8k again)

@Sven: correct me if I'm wrong ...
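
A tiny Python sketch of that rule, using only the sizes quoted in this
thread (treating the fixed 8k fragment as an assumption for these block
sizes; the mmcrfs documentation has the authoritative table):

KIB = 1024

# Data points from this thread: 512K, 4M and 8M blocks all use 8k fragments.
FRAGMENT = 8 * KIB  # assumption: holds for the block sizes discussed here

def subblocks_per_block(block_size):
    return block_size // FRAGMENT

for bs in (512 * KIB, 4096 * KIB, 8192 * KIB):
    print(bs // KIB, "KiB block ->", subblocks_per_block(bs), "subblocks")
# 512 KiB block -> 64 subblocks
# 4096 KiB block -> 512 subblocks
# 8192 KiB block -> 1024 subblocks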






From:        Carl <mutantllama at gmail.com>

To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        07/02/2018 08:55 AM
Subject:        Re: [gpfsug-discuss] subblock sanity check in 5.0
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi Sven,

What is the resulting indirect-block size with a 4MB metadata block size?

Does the new sub-block magic mean that it will take up 32k, or will it
occupy 128k?

Cheers,

Carl.


On Mon, 2 Jul 2018 at 15:26, Sven Oehme <oehmes at gmail.com> wrote:
Hi,

Most traditional RAID controllers can't deal well with block sizes above
4M, which is why the new default is 4M, and I would leave it at that
unless you know for sure you get better performance with 8MB. That
typically requires your RAID controller volume's full block (track) size
to be 8MB, with maybe an 8+2p @ 1MB strip size (many people confuse strip
size with full track size).
If you don't have dedicated SSDs for metadata, I would recommend just
using a 4MB block size with mixed data and metadata disks. If you have a
reasonable number of SSDs, put them in a RAID 1 or RAID 10 and use them as
dedicated metadata disks, with the other disks as dataOnly; but I would
not use the --metadata-block-size parameter, as it prevents the data pool
from using a large number of subblocks.
As long as your SSDs are on RAID 1 or 10 there is no read/modify/write
penalty, so using them with the 4MB block size has no real negative
impact, at least on the controllers I have worked with.
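
A hypothetical sketch of that layout (device names, server names, and the
stanza file name are made up; the stanza keys follow the standard mmcrnsd
format, but check the docs before use):

%nsd: nsd=md01 device=/dev/sdb servers=nsd1 usage=metadataOnly failureGroup=1 pool=system
%nsd: nsd=md02 device=/dev/sdc servers=nsd2 usage=metadataOnly failureGroup=2 pool=system
%nsd: nsd=da01 device=/dev/sdd servers=nsd1 usage=dataOnly failureGroup=1 pool=data
%nsd: nsd=da02 device=/dev/sde servers=nsd2 usage=dataOnly failureGroup=2 pool=data

# create the NSDs, then the filesystem; note: no --metadata-block-size
mmcrnsd -F nsd.stanzas
mmcrfs fs1 -F nsd.stanzas -B 4M -T /gpfs/fs1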

Hope this helps.

On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <jam at ucar.edu> wrote:
Hi, it's for a traditional NSD setup.
--Joey

On 6/26/18 12:21 AM, Sven Oehme wrote:
Joseph, 

The subblock size will be derived from the smallest block size in the
filesystem. Given that you specified a metadata block size of 512k, that
is what will be used to calculate the number of subblocks, even though
your data pool is 8MB.
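
A quick sketch of that calculation in Python (helper name is made up; the
numbers are the ones from the mmlsfs output quoted below):

# The smallest block size (512K metadata) fixes the subblock count for the
# whole filesystem; every other pool inherits that count.
def derive(smallest_bs, data_bs, fragment=8192):
    count = smallest_bs // fragment        # 524288 // 8192 = 64
    return count, data_bs // count         # 8388608 // 64  = 131072

count, data_fragment = derive(524288, 8388608)
print(count, data_fragment)  # 64 subblocks, 128 KiB data-pool fragment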
Is this setup for a traditional NSD setup or for GNR? The recommendations
would be different.

sven

On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <jam at ucar.edu> wrote:
Quick question: does anyone know why GPFS wouldn't respect the default for
the subblocks-per-full-block parameter when creating a new filesystem?
I'd expect it to be set to 512 for an 8MB block size, but my guess is
that also specifying a metadata-block-size is interfering with it (by
being too small). This was a parameter recommended by the vendor for a
4.2 installation with metadata on dedicated SSDs in the system pool; any
best practices for 5.0? I'm guessing I'd have to bump it up to at least
4MB to get 512 subblocks for both pools.

fs1 created with:
# mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1

# mmlsfs fs1
<snipped>

flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes

 -B                 524288                   Block size (system pool)
                    8388608                  Block size (other pools)

 -V                 19.01 (5.0.1.0)          File system version

 --subblocks-per-full-block 64               Number of subblocks per full block
 -P                 system;DATA              Disk storage pools in file system


Thanks!
--Joey Mendoza
NCAR
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss