[gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sven Oehme oehmes at gmail.com
Wed Aug 1 22:01:28 BST 2018


The only way to get the maximum number of subblocks for a 5.0.x filesystem with
the released code is to have metadata and data use the same block size.
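
As a minimal sketch of that (untested here, and simply reusing the flags from
Kevin's mmcrfs command quoted below, with the metadata block size raised to
match the data pools), the create would look something like:

  mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter \
      -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 \
      -v yes --nofilesetdf --metadata-block-size 4M

(The blockSize=1M entry for the system pool in the stanza file would need to
become 4M as well; with every pool at 4M, all pools should then get the 8 KiB
subblock from the man page table.)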

sven

On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L <
Kevin.Buterbaugh at vanderbilt.edu> wrote:

> All,
>
> Sorry for the 2nd e-mail, but I realize that 4 MB is 4 times 1 MB … so does
> this go back to what Marc is saying, that there’s really only one
> subblocks-per-full-block parameter?  If so, is there any way to get what I
> want as described below?
>
> Thanks…
>
> Kevin
>
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>
>
> On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L <
> Kevin.Buterbaugh at Vanderbilt.Edu> wrote:
>
> Hi Sven,
>
> OK … but why?  I mean, that’s not what the man page says.  Where does that
> “4 x” come from?
>
> And, most importantly … that’s not what I want.  I want a smaller block
> size for the system pool since it’s metadata only and on RAID 1 mirrors
> (HDDs on the test cluster but SSDs on the production cluster).  So … side
> question … is 1 MB OK there?
>
> But I want a 4 MB block size for data with an 8 KB sub block … I want good
> performance for the sane people using our cluster without unduly punishing
> the … ahem … fine folks whose apps want to create a bazillion tiny files!
>
> So how do I do that?
>
> Thanks!
>
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
>
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>
>
> On Aug 1, 2018, at 1:41 PM, Sven Oehme <oehmes at gmail.com> wrote:
>
> The number of subblocks per full block is derived from the smallest block
> size in any pool of a given filesystem. So if you pick a metadata block size
> of 1M, the subblock will be 8K in the metadata pool, but 4 x that (32K) in
> the data pool if your data pool block size is 4M.
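>
> As a quick sanity check of that arithmetic (a minimal sketch in plain shell,
> using the 1M/4M block sizes and the 128 subblocks-per-full-block that mmlsfs
> reports further down):
>
>   # subblocks per full block is fixed per filesystem, from the smallest pool block size
>   echo $(( 1048576 / 8192 ))   # 1 MiB metadata block / 8 KiB subblock -> 128 subblocks per block
>   # all pools share those 128 subblocks per block, so the 4 MiB data pools end up with
>   echo $(( 4194304 / 128 ))    # -> 32768 bytes, i.e. the 32 KiB fragment size shown for "other pools"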
>
> sven
>
> On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop <knop at us.ibm.com> wrote:
>
> Marc, Kevin,
>>
>> We'll be looking into this issue, since at least at first glance it does
>> look odd. A 4MB block size should have resulted in an 8KB subblock size.
>> I suspect that, somehow, the --metadata-block-size 1M may have resulted in
>>
>>
>> 32768 Minimum fragment (subblock) size in bytes (other pools)
>>
>> but I do not yet understand how.
>>
>> The subblocks-per-full-block parameter is not supported with mmcrfs.
>>
>> Felipe
>>
>> ----
>> Felipe Knop knop at us.ibm.com
>> GPFS Development and Security
>> IBM Systems
>> IBM Building 008
>> 2455 South Rd, Poughkeepsie, NY 12601
>> (845) 433-9314 T/L 293-9314
>>
>>
>>
>>
>> From: "Marc A Kaplan" <makaplan at us.ibm.com>
>>
>>
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>
>> Date: 08/01/2018 01:21 PM
>> Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
>>
>>
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> ------------------------------
>>
>>
>>
>> I haven't looked into all the details but here's a clue -- notice there
>> is only one "subblocks-per-full-block" parameter.
>>
>> And it is the same for both metadata blocks and data blocks.
>>
>> So maybe (MAYBE) that is a constraint somewhere...
>>
>> Certainly, in the currently supported code, that's what you get.
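>>
>> To double-check what an existing filesystem ended up with, the relevant
>> attributes can be queried directly -- a hedged sketch, assuming mmlsfs
>> accepts the individual attribute flags shown in the listing further down:
>>
>>   mmlsfs gpfs5 -f                           # minimum fragment (subblock) size, per pool
>>   mmlsfs gpfs5 --subblocks-per-full-block   # subblocks per full block for the whole filesystem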
>>
>>
>>
>>
>> From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date: 08/01/2018 12:55 PM
>> Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> ------------------------------
>>
>>
>>
>> Hi All,
>>
>> Our production cluster is still on GPFS 4.2.3.x, but in preparation for
>> moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS
>> 5.0.1-1. I am setting up a new filesystem there using hardware that we
>> recently life-cycled out of our production environment.
>>
>> I “successfully” created a filesystem but I believe the sub-block size is
>> wrong. I’m using a 4 MB filesystem block size, so according to the mmcrfs
>> man page the sub-block size should be 8K:
>>
>> Table 1. Block sizes and subblock sizes
>>
>> +----------------------------------------+---------------+
>> | Block size                             | Subblock size |
>> +----------------------------------------+---------------+
>> | 64 KiB                                 | 2 KiB         |
>> +----------------------------------------+---------------+
>> | 128 KiB                                | 4 KiB         |
>> +----------------------------------------+---------------+
>> | 256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB  | 8 KiB         |
>> +----------------------------------------+---------------+
>> | 8 MiB, 16 MiB                          | 16 KiB        |
>> +----------------------------------------+---------------+
>>
>> However, it appears that it’s 8K for the system pool but 32K for the
>> other pools:
>>
>> flag value description
>> ------------------- ------------------------ -----------------------------------
>> -f 8192 Minimum fragment (subblock) size in bytes (system pool)
>> 32768 Minimum fragment (subblock) size in bytes (other pools)
>> -i 4096 Inode size in bytes
>> -I 32768 Indirect block size in bytes
>> -m 2 Default number of metadata replicas
>> -M 3 Maximum number of metadata replicas
>> -r 1 Default number of data replicas
>> -R 3 Maximum number of data replicas
>> -j scatter Block allocation type
>> -D nfs4 File locking semantics in effect
>> -k all ACL semantics in effect
>> -n 32 Estimated number of nodes that will mount file system
>> -B 1048576 Block size (system pool)
>> 4194304 Block size (other pools)
>> -Q user;group;fileset Quotas accounting enabled
>> user;group;fileset Quotas enforced
>> none Default quotas enabled
>> --perfileset-quota No Per-fileset quota enforcement
>> --filesetdf No Fileset df enabled?
>> -V 19.01 (5.0.1.0) File system version
>> --create-time Wed Aug 1 11:39:39 2018 File system creation time
>> -z No Is DMAPI enabled?
>> -L 33554432 Logfile size
>> -E Yes Exact mtime mount option
>> -S relatime Suppress atime mount option
>> -K whenpossible Strict replica allocation option
>> --fastea Yes Fast external attributes enabled?
>> --encryption No Encryption enabled?
>> --inode-limit 101095424 Maximum number of inodes
>> --log-replicas 0 Number of log replicas
>> --is4KAligned Yes is4KAligned?
>> --rapid-repair Yes rapidRepair enabled?
>> --write-cache-threshold 0 HAWC Threshold (max 65536)
>> --subblocks-per-full-block 128 Number of subblocks per full block
>> -P system;raid1;raid6 Disk storage pools in file system
>> --file-audit-log No File Audit Logging enabled?
>> --maintenance-mode No Maintenance Mode enabled?
>> -d
>> test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
>> Disks in file system
>> -A yes Automatic mount option
>> -o none Additional mount options
>> -T /gpfs5 Default mount point
>> --mount-priority 0 Mount priority
>>
>> Output of mmcrfs:
>>
>> mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j
>> scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5
>> -v yes --nofilesetdf --metadata-block-size 1M
>>
>> The following disks of gpfs5 will be formatted on node testnsd3:
>> test21A3nsd: size 953609 MB
>> test21A4nsd: size 953609 MB
>> test21B3nsd: size 953609 MB
>> test21B4nsd: size 953609 MB
>> test23Ansd: size 15259744 MB
>> test23Bnsd: size 15259744 MB
>> test23Cnsd: size 1907468 MB
>> test24Ansd: size 15259744 MB
>> test24Bnsd: size 15259744 MB
>> test24Cnsd: size 1907468 MB
>> test25Ansd: size 15259744 MB
>> test25Bnsd: size 15259744 MB
>> test25Cnsd: size 1907468 MB
>> Formatting file system ...
>> Disks up to size 8.29 TB can be added to storage pool system.
>> Disks up to size 16.60 TB can be added to storage pool raid1.
>> Disks up to size 132.62 TB can be added to storage pool raid6.
>> Creating Inode File
>> 8 % complete on Wed Aug 1 11:39:19 2018
>> 18 % complete on Wed Aug 1 11:39:24 2018
>> 27 % complete on Wed Aug 1 11:39:29 2018
>> 37 % complete on Wed Aug 1 11:39:34 2018
>> 48 % complete on Wed Aug 1 11:39:39 2018
>> 60 % complete on Wed Aug 1 11:39:44 2018
>> 72 % complete on Wed Aug 1 11:39:49 2018
>> 83 % complete on Wed Aug 1 11:39:54 2018
>> 95 % complete on Wed Aug 1 11:39:59 2018
>> 100 % complete on Wed Aug 1 11:40:01 2018
>> Creating Allocation Maps
>> Creating Log Files
>> 3 % complete on Wed Aug 1 11:40:07 2018
>> 28 % complete on Wed Aug 1 11:40:14 2018
>> 53 % complete on Wed Aug 1 11:40:19 2018
>> 78 % complete on Wed Aug 1 11:40:24 2018
>> 100 % complete on Wed Aug 1 11:40:25 2018
>> Clearing Inode Allocation Map
>> Clearing Block Allocation Map
>> Formatting Allocation Map for storage pool system
>> 85 % complete on Wed Aug 1 11:40:32 2018
>> 100 % complete on Wed Aug 1 11:40:33 2018
>> Formatting Allocation Map for storage pool raid1
>> 53 % complete on Wed Aug 1 11:40:38 2018
>> 100 % complete on Wed Aug 1 11:40:42 2018
>> Formatting Allocation Map for storage pool raid6
>> 20 % complete on Wed Aug 1 11:40:47 2018
>> 39 % complete on Wed Aug 1 11:40:52 2018
>> 60 % complete on Wed Aug 1 11:40:57 2018
>> 79 % complete on Wed Aug 1 11:41:02 2018
>> 100 % complete on Wed Aug 1 11:41:08 2018
>> Completed creation of file system /dev/gpfs5.
>> mmcrfs: Propagating the cluster configuration data to all
>> affected nodes. This is an asynchronous process.
>>
>> And contents of stanza file:
>>
>> %nsd:
>> nsd=test21A3nsd
>> usage=metadataOnly
>> failureGroup=210
>> pool=system
>> servers=testnsd3,testnsd1,testnsd2
>> device=dm-15
>>
>> %nsd:
>> nsd=test21A4nsd
>> usage=metadataOnly
>> failureGroup=210
>> pool=system
>> servers=testnsd1,testnsd2,testnsd3
>> device=dm-14
>>
>> %nsd:
>> nsd=test21B3nsd
>> usage=metadataOnly
>> failureGroup=211
>> pool=system
>> servers=testnsd1,testnsd2,testnsd3
>> device=dm-17
>>
>> %nsd:
>> nsd=test21B4nsd
>> usage=metadataOnly
>> failureGroup=211
>> pool=system
>> servers=testnsd2,testnsd3,testnsd1
>> device=dm-16
>>
>> %nsd:
>> nsd=test23Ansd
>> usage=dataOnly
>> failureGroup=23
>> pool=raid6
>> servers=testnsd2,testnsd3,testnsd1
>> device=dm-10
>>
>> %nsd:
>> nsd=test23Bnsd
>> usage=dataOnly
>> failureGroup=23
>> pool=raid6
>> servers=testnsd3,testnsd1,testnsd2
>> device=dm-9
>>
>> %nsd:
>> nsd=test23Cnsd
>> usage=dataOnly
>> failureGroup=23
>> pool=raid1
>> servers=testnsd1,testnsd2,testnsd3
>> device=dm-5
>>
>> %nsd:
>> nsd=test24Ansd
>> usage=dataOnly
>> failureGroup=24
>> pool=raid6
>> servers=testnsd3,testnsd1,testnsd2
>> device=dm-6
>>
>> %nsd:
>> nsd=test24Bnsd
>> usage=dataOnly
>> failureGroup=24
>> pool=raid6
>> servers=testnsd1,testnsd2,testnsd3
>> device=dm-0
>>
>> %nsd:
>> nsd=test24Cnsd
>> usage=dataOnly
>> failureGroup=24
>> pool=raid1
>> servers=testnsd2,testnsd3,testnsd1
>> device=dm-2
>>
>> %nsd:
>> nsd=test25Ansd
>> usage=dataOnly
>> failureGroup=25
>> pool=raid6
>> servers=testnsd1,testnsd2,testnsd3
>> device=dm-6
>>
>> %nsd:
>> nsd=test25Bnsd
>> usage=dataOnly
>> failureGroup=25
>> pool=raid6
>> servers=testnsd2,testnsd3,testnsd1
>> device=dm-6
>>
>> %nsd:
>> nsd=test25Cnsd
>> usage=dataOnly
>> failureGroup=25
>> pool=raid1
>> servers=testnsd3,testnsd1,testnsd2
>> device=dm-3
>>
>> %pool:
>> pool=system
>> blockSize=1M
>> usage=metadataOnly
>> layoutMap=scatter
>> allowWriteAffinity=no
>>
>> %pool:
>> pool=raid6
>> blockSize=4M
>> usage=dataOnly
>> layoutMap=scatter
>> allowWriteAffinity=no
>>
>> %pool:
>> pool=raid1
>> blockSize=4M
>> usage=dataOnly
>> layoutMap=scatter
>> allowWriteAffinity=no
>>
>> What am I missing or what have I done wrong? Thanks…
>>
>> Kevin
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and
>> Education
>> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>>
>>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>