[gpfsug-discuss] subblock sanity check in 5.0
Uwe Falke
UWEFALKE at de.ibm.com
Mon Jul 2 21:17:26 BST 2018
Hi Carl,

Sven had mentioned the RMW (read-modify-write) penalty before, which could make it beneficial to use smaller blocks.

If you have traditional RAIDs and you go the usual route of making the track size equal to the block size (stripe size = BS/n with n+p RAIDs), you may run into problems if your I/Os are typically, or very often, smaller than a block: the controller needs to read the entire track, modify it according to your I/O, and write it back along with the parity stripes.

Example: with a 4MiB block size and 8+2 RAIDs as NSDs, for each I/O smaller than 4MiB reaching an NSD, the controller needs to read 4MiB into a buffer, modify it according to your I/O, calculate parity for the whole track, and write back 5MiB (8 data stripes of 512KiB plus two parity stripes). In those cases you might be better off with smaller block sizes.

In the above scenario it might, however, still be OK to leave the block size at 4MiB and just reduce the track size of the RAIDs. One has to check how that affects performance; YMMV, I'd say.

Mind that the ESS uses a clever way to mask these types of I/O from the n+p RS-based (Reed-Solomon) vdisks, but even there one might need to think ...
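As a back-of-the-envelope illustration (not GPFS code, just the arithmetic above in Python; the track/stripe model is the simplified one described in this mail):

```python
def rmw_traffic(io_size, block_size, n_data, n_parity):
    """Disk traffic a simple n+p RAID controller generates for one write,
    when the track size equals the file system block size."""
    stripe = block_size // n_data
    # partial-track write: the controller must read the whole track first
    # (read-modify-write); a full-track write can skip the read
    read = 0 if io_size >= block_size else block_size
    # the data track plus the parity stripes always go back to disk
    write = block_size + n_parity * stripe
    return read, write

MiB = 1024 * 1024
# Uwe's example: 4 MiB block size, 8+2 RAID (512 KiB stripes), 1 MiB I/O
read, write = rmw_traffic(1 * MiB, 4 * MiB, n_data=8, n_parity=2)
# -> reads 4 MiB and writes 5 MiB for a single 1 MiB application write
```

So a 1 MiB application write turns into 9 MiB of disk traffic on such a controller, which is the penalty that can make smaller block sizes (or smaller RAID tracks) attractive.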
Mit freundlichen Grüßen / Kind regards
Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 17122
From: Carl <mutantllama at gmail.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 02/07/2018 11:57
Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Thanks Olaf and Sven,

It looks like a lot of the advice from the wiki (
https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata
) is no longer relevant for version 5. Any idea if it's likely to be updated soon?

The new subblock changes appear to have removed a lot of the reasons for using smaller block sizes. In broad terms, are there any situations where you would recommend using less than the new default block size?
Cheers,
Carl.
On Mon, 2 Jul 2018 at 17:55, Sven Oehme <oehmes at gmail.com> wrote:
Olaf, he is talking about indirect size, not subblock size.
Carl,
here is a screenshot of a 4mb filesystem:
[root at p8n15hyp ~]# mmlsfs all_local
File system attributes for /dev/fs2-4m-07:
==========================================
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 512                      Estimated number of nodes that will mount file system
 -B                 4194304                  Block size
 -Q                 none                     Quotas accounting enabled
                    none                     Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 19.01 (5.0.1.0)          File system version
 --create-time      Mon Jun 18 12:30:54 2018 File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      4000000000               Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 512              Number of subblocks per full block
 -P                 system                   Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 RG001VS001;RG002VS001;RG003VS002;RG004VS002  Disks in file system
 -A                 no                       Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs/fs2-4m-07          Default mount point
 --mount-priority   0                        Mount priority
as you can see, the indirect block size is 32k
sven
On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
Hi Carl,

8k for a 4 MB block size.

Files smaller than ~3.x KB fit into the inode; for "larger" files (> ~3.x KB) at least one subblock is allocated.

In releases < 5.x the subblock size was fixed at 1/32 of the block size, i.e. derived directly from the block size.

Since release 5 (for newly created file systems), the new default block size is 4 MB and the fragment size is 8k (512 subblocks). For even larger block sizes, more subblocks are available per block, so e.g. 8M -> 1024 subblocks (fragment size is 8k again).

@Sven: correct me if I'm wrong ...
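Olaf's numbers can be sketched in a few lines of Python (a rough sketch, not GPFS source; the 8 KiB fragment floor and the per-block subblock cap are inferred from the figures quoted in this thread):

```python
def fragment_size(block_size, release=5):
    """Subblock (fragment) size for a newly created file system."""
    if release < 5:
        # pre-5.x: fixed at 1/32 of the block size
        return block_size // 32
    # 5.x: many more subblocks per block, with an (assumed) 8 KiB floor,
    # matching 512 subblocks at 4 MB and 1024 subblocks at 8 MB
    return max(8 * 1024, block_size // 1024)

KiB, MiB = 1024, 1024 * 1024
print(fragment_size(4 * MiB, release=4))     # 131072 -> 128k fragments on 4.x
print(fragment_size(4 * MiB))                # 8192   -> 512 subblocks per block
print((8 * MiB) // fragment_size(8 * MiB))   # 1024 subblocks, 8k fragment again
```

The practical upshot is the one Carl noted: the small-file space-waste argument for small block sizes largely disappears in 5.x, since a 4 MB block size now wastes at most ~8k per small file instead of 128k.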
From: Carl <mutantllama at gmail.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 07/02/2018 08:55 AM
Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Hi Sven,
What is the resulting indirect-block size with a 4mb metadata block size?
Does the new sub-block magic mean that it will take up 32k, or will it
occupy 128k?
Cheers,
Carl.
On Mon, 2 Jul 2018 at 15:26, Sven Oehme <oehmes at gmail.com> wrote:
Hi,

most traditional RAID controllers can't deal well with block sizes above 4m, which is why the new default is 4m, and I would leave it at that unless you know for sure you get better performance with 8mb. That typically requires your RAID controller volume's full block size to be 8mb, with maybe an 8+2p layout at a 1mb strip size (many people confuse strip size with full track size).

If you don't have dedicated SSDs for metadata, I would recommend just using a 4mb block size with mixed data and metadata disks. If you have a reasonable number of SSDs, put them in a RAID 1 or RAID 10 and use them as dedicated metadata disks and the other disks as data-only, but I would not use the --metadata-block-size parameter, as it prevents the data pool from using a large number of subblocks.

As long as your SSDs are in RAID 1 or 10 there is no read/modify/write penalty, so using them with the 4mb block size has no real negative impact, at least on the controllers I have worked with.

Hope this helps.
On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <jam at ucar.edu> wrote:
Hi, it's for a traditional NSD setup.
--Joey
On 6/26/18 12:21 AM, Sven Oehme wrote:
Joseph,

the subblock size will be derived from the smallest block size in the filesystem; given you specified a metadata block size of 512k, that's what will be used to calculate the number of subblocks, even though your data pool uses 8mb blocks.

Is this setup for a traditional NSD setup or for GNR? The recommendations would be different.

sven
On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <jam at ucar.edu> wrote:
Quick question: does anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size, but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool; any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools.
fs1 created with:
# mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j
cluster -n 9000 --metadata-block-size 512K --perfileset-quota
--filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T
/gpfs/fs1
# mmlsfs fs1
<snipped>
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -B                 524288                   Block size (system pool)
                    8388608                  Block size (other pools)
 -V                 19.01 (5.0.1.0)          File system version
 --subblocks-per-full-block 64               Number of subblocks per full block
 -P                 system;DATA              Disk storage pools in file system
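The numbers above line up with the smallest-block-size derivation Sven describes earlier in the thread. A quick Python sketch (the 8 KiB subblock floor and the file-system-wide subblock count are assumptions inferred from the outputs quoted here):

```python
def subblock_size(block_size):
    # assumed 8 KiB floor, up to ~1024 subblocks per block (per this thread)
    return max(8 * 1024, block_size // 1024)

KiB, MiB = 1024, 1024 * 1024
system_bs, data_bs = 512 * KiB, 8 * MiB  # fs1: --metadata-block-size 512K, -B 8M

# the smallest block size in the file system sets the subblock count ...
smallest = min(system_bs, data_bs)
n_subblocks = smallest // subblock_size(smallest)
print(n_subblocks)               # 64, as mmlsfs reports

# ... and every other pool inherits that count, hence the large data fragments
print(data_bs // n_subblocks)    # 131072-byte fragments in the 8M data pool
```

So the 512K metadata block size, not the 8M data block size, is what pins subblocks-per-full-block at 64 here.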
Thanks!
--Joey Mendoza
NCAR
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss