[gpfsug-discuss] Inode size, and system pool subblock

Wahl, Edward ewahl at osc.edu
Wed Aug 2 17:29:38 BST 2023


>Someone mentioned encryption will bypass this feature, but it's actually encryption that perhaps requires larger inode sizes to store all the key meta info (you can have up to 8 keys per inode I believe).

I believe that is incorrect.  If encryption is used, the size of the inode makes no difference here, because only data, NOT metadata, is encrypted on the file system.  So storing file data in MD space (that is, in the inode) is out.
See the Scale documentation, and older GPFS documentation, for more information (such as Encryption - IBM Documentation: https://www.ibm.com/docs/en/storage-scale/5.1.3?topic=administering-encryption ).  Until such time as they start encrypting the metadata, it's pointless to size MD for small files when encryption is in use.

Ed Wahl
Ohio Supercomputer Center

From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> On Behalf Of Alec
Sent: Wednesday, August 2, 2023 12:07 PM
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Subject: Re: [gpfsug-discuss] Inode size, and system pool subblock

I think things are conflated here...

The inode size is really just a call on how much functionality you need in an inode.  I wouldn't even think about disk block size when setting this.  Essentially the smaller the inode the less space I need for metadata but also the less capacity I have in my inode.

The default is 4k and if you don't change it then GPFS will put up to a 3.8k file in the inode itself vs going to an indirect disk allocation.  Someone mentioned encryption will bypass this feature, but it's actually encryption that perhaps requires larger inode sizes to store all the key meta info (you can have up to 8 keys per inode I believe).

So essentially if you've got a smaller inode size, your directories will max out sooner, your ACLs could be constrained, long file names can exhaust the space, and you may not have enough room for encryption details.  But the upshot is you need to dedicate less space to metadata and can handle more file entries.  So if you've got billions of files and are managing replicas then you should consider fine tuning inode size down.

You can go from 3.5% of space going to inodes to 1% if you go from 4k to 512 bytes, but there is a reason GPFS defaults to 4k and doesn't expand on it too much.  If you've guessed wrong you're kind of hosed.
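
A rough sketch of the space side of that trade-off, in Python. It only multiplies inode count by inode size and replica count; the file count and replica setting are made-up illustration values, and directories, indirect blocks and other metadata are not counted, so it will not reproduce the percentages quoted above.

```python
def inode_space_bytes(num_inodes: int, inode_size: int, replicas: int = 1) -> int:
    """Bytes consumed by the inodes alone, across all metadata replicas."""
    return num_inodes * inode_size * replicas


if __name__ == "__main__":
    files = 1_000_000_000  # hypothetical: one billion inodes
    for size in (512, 1024, 2048, 4096):
        total = inode_space_bytes(files, size, replicas=2)  # hypothetical: 2 replicas
        print(f"{size:>5} byte inodes: {total / 2**40:.2f} TiB")
```

The point is only that inode space scales linearly with inode size, so the saving matters most when the file count is very large.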

None of this has to do with hardware block sizes, subblock allocation and fragment sizes.  Those are further complicated by the 4k native block size vs the emulated 512 byte block size that some disk hardware presents.

For GPFS you generally will have a very large block size (256KB or 1MB) and GPFS will divide those blocks into 32 fragments.  So your smallest unit may be an 8KB or 32KB fragment.  If you have a dedicated MD pool (highly recommended) you'd definitely specify a smaller block size than 1MB (128KB = 4KB fragments).
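
The fragment arithmetic above can be sketched in Python. The figure of 32 fragments per block is taken from this post as an assumption; it is not read from any real file system, and other file system format versions may use a different number.

```python
KIB = 1024
SUBBLOCKS_PER_BLOCK = 32  # assumption: the figure quoted in this post


def fragment_size(block_size: int, subblocks: int = SUBBLOCKS_PER_BLOCK) -> int:
    """Size in bytes of one fragment (subblock) for a given block size."""
    if block_size % subblocks != 0:
        raise ValueError("block size must be a multiple of the subblock count")
    return block_size // subblocks


if __name__ == "__main__":
    for block_kib in (128, 256, 1024):
        frag = fragment_size(block_kib * KIB)
        print(f"{block_kib:>5} KiB block -> {frag // KIB} KiB fragment")
```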

The balance you're trying to strike here is using the fewest commands to retrieve your data efficiently.  Think of the roundtrip on the bus as being the same for a 4KB read as for a 1MB read, so try to get as much as possible out of each one.

Generally the goal of the file system is to ensure that the excess data that is read when trying to pull fragments is as useful as possible.

I may also be confused, but I wouldn't worry so much about inode size relative to block size.  Just worry about getting large blocks working well for the regular storage pool if your data is huge, and use a smaller block size for MD if you have a dedicated pool, which is almost always recommended.

Be very careful about specifying a small inode size, because it's not just max filenames and max file counts in a directory; it is much more.  And if you have a lot of small files, don't underestimate the advantage of those files being stored directly in the inode.  A 512 byte inode could only store about a 380 byte file vs a 4k inode storing a 3800 byte file.  These files tend to be shell scripts and config files, which you really don't want to be waiting around for, occupying a huge 1MB read for, and wasting a potentially larger 64KB fragment allocation on.
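
To get a feel for how much this matters on a given file system, one could count how many files would fit in the inode at each inode size. A minimal Python sketch, assuming the approximate capacities quoted above (about 380 bytes for a 512 byte inode, about 3800 bytes for a 4k inode). The sample file sizes are invented for illustration, and the sketch ignores encryption and anything else that takes up inode space.

```python
# Approximate in-inode data capacity in bytes, keyed by inode size.
# These are the rough figures from this post, not values queried from GPFS.
APPROX_INLINE_CAPACITY = {512: 380, 4096: 3800}


def fits_in_inode(file_size: int, inode_size: int) -> bool:
    """True if a file of file_size bytes would fit in an inode of inode_size."""
    return file_size <= APPROX_INLINE_CAPACITY[inode_size]


def count_inline(file_sizes, inode_size: int) -> int:
    """Number of files in file_sizes that would be stored in the inode."""
    return sum(1 for size in file_sizes if fits_in_inode(size, inode_size))


if __name__ == "__main__":
    # Hypothetical sizes: small scripts and config files plus one large file.
    sample = [120, 350, 900, 2500, 3800, 5000, 1_000_000]
    for inode_size in (512, 4096):
        print(f"{inode_size} byte inode: {count_inline(sample, inode_size)} of {len(sample)} files inline")
```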

Alec



On Wed, Aug 2, 2023, 4:47 AM Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
Hello Peter,

[1] [...] having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so[...]
in short - yes 😉



[2]  [...]  I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.[...]
you may need to consider snapshots and directories, which all contribute to MD space

predicting the MD space requirements for directories is always hard, because the size of a directory depends on the length of the file names the users will create...


furthermore, using an inode size of less than 4k does not make much sense either, given that NVMe and other modern block storage devices come with a hardware block size of 4k (even though GPFS can still deal with 512 bytes per sector)


hope this helps ..




________________________________
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> On Behalf Of Peter Chase <peter.chase at metoffice.gov.uk>
Sent: Wednesday, August 2, 2023 11:09
To: gpfsug-discuss at gpfsug.org <gpfsug-discuss at gpfsug.org>
Subject: [EXTERNAL] [gpfsug-discuss] Inode size, and system pool subblock

Good Morning,

I have a question about inode size vs subblock size. Can anyone think of a reason that the chosen inode size of a scale filesystem should be smaller than the subblock size for the metadata pool?
I'm looking at an existing filesystem, the inode size is 2KiB, and the subblock is 4KiB.
It feels like I'm missing something. If I've understood the docs on blocks and subblocks correctly, it sounds like the subblock is the smallest atomic access size. Meaning with a 4K subblock, and a 2K inode, reading the inode would return its contents and 2K of empty subblock every time. So, in my head (and maybe only there), having a smaller inode size than the subblock size means there's a big wastage on disk usage, with no performance benefit to doing so.
I believe I'm correct in saying that inodes are not the only things to live on the metadata pool, so I assume that some other metadata might benefit from the larger block/subblock size. But looking at the number of inodes, the inode size, and the space consumed in the system pool, it really looks like the majority of space consumed is by inodes.
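
The arithmetic behind that reasoning can be written out as a small Python sketch. It assumes the premise of the question itself, namely that reading one inode fetches exactly one subblock and that the rest of the subblock is unused; whether that premise holds is what the question asks.

```python
def read_overhead(inode_size: int, subblock_size: int) -> tuple:
    """Return (bytes read beyond the inode, useful fraction of the read).

    Assumes one inode is read per subblock-sized access.
    """
    if inode_size > subblock_size:
        raise ValueError("this sketch only covers inode_size <= subblock_size")
    extra = subblock_size - inode_size
    return extra, inode_size / subblock_size


if __name__ == "__main__":
    extra, useful = read_overhead(2 * 1024, 4 * 1024)  # 2KiB inode, 4KiB subblock
    print(f"extra bytes per read: {extra}, useful fraction: {useful:.0%}")
```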

As I said, I feel like I'm missing something, so if anyone can tell me where I'm wrong it would be greatly appreciated!

Sincerely,


Pete Chase

UKMO
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org