<font size=2 face="sans-serif">Frankly, I just don't "get" what
it is you seem not to be "getting" - perhaps someone else
who does "get" it can rephrase: FORGET about Subblocks
when thinking about inodes being packed into the file of all inodes. </font><br><br><font size=2 face="sans-serif">Additional facts that may address some
of the other concerns:</font><br><br><font size=2 face="sans-serif">I started working on GPFS at version
3.1 or so. AFAIK GPFS always had and has one file of inodes, "packed",
with no wasted space between inodes. Period. Full Stop.</font><br><br><font size=2 face="sans-serif">RAID! Now we come to a mistake
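To put rough numbers on "packed", here is a minimal back-of-the-envelope sketch (plain Python, illustrative values only; this is not output from any GPFS tool or API):

    # The inode file is packed, so its size depends only on how many inodes
    # are allocated and how big each inode is.  The metadata block size never
    # enters the calculation; it only determines how many inodes share a block.
    num_inodes = 100_000_000      # example: 100 million allocated inodes
    inode_size = 4096             # bytes (-i 4096)

    print(f"inode file: {num_inodes * inode_size / 2**30:.1f} GiB")   # ~381.5 GiB

    for metadata_blocksize in (256 * 1024, 1024 * 1024):
        print(f"{metadata_blocksize // 1024} KiB metadata blocks -> "
              f"{metadata_blocksize // inode_size} inodes per block, zero padding")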
RAID! Now we come to a mistake that I've seen made by more than a handful of customers!

It is generally a mistake to use RAID with parity (such as classic RAID5) to store metadata.

Why? Because metadata is often updated with "small writes" - for example, suppose we have to update some fields in an inode, or an indirect block, or append a log record...

For RAID with parity and large stripe sizes, this means that updating just one disk sector can cost a full stripe read plus writing the changed data and parity sectors.
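A rough sketch of that cost, with assumed (hypothetical) array geometry, just to show the order of magnitude; a real controller may instead do a read-modify-write of only the affected data and parity strips, which is still several I/Os for one tiny logical write:

    # Worst-case small-write penalty on parity RAID; illustrative numbers only.
    sector = 4096                 # bytes actually changed (e.g. one inode)
    data_disks = 8                # hypothetical 8+1P RAID5 group
    strip_size = 256 * 1024       # assumed per-disk strip size

    full_stripe = data_disks * strip_size       # 2 MiB read back in
    bytes_moved = full_stripe + 2 * sector      # read stripe + write data + write parity
    print(f"{sector} B update -> ~{bytes_moved // 1024} KiB of disk I/O "
          f"(~{bytes_moved / sector:.0f}x amplification)")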
SO, if you want protection against storage failures for your metadata, use either RAID mirroring/replication and/or GPFS metadata replication. (Belt and/or suspenders.)

(Arguments against relying solely on RAID mirroring: single enclosure/box failure (fire!), single hardware design (bugs or defects), single firmware/microcode (bugs).)

Yes, GPFS is part of "the cyber." We're making it stronger every day. But it already is great.

--marc
</font><font size=1 face="sans-serif">"Buterbaugh, Kevin
L" <Kevin.Buterbaugh@Vanderbilt.Edu></font><br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">gpfsug main discussion
list <gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">09/29/2016 11:03 AM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">[gpfsug-discuss]
Fwd: Blocksize</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><font size=3>Resending from the right e-mail address...</font><br><br><font size=3>Begin forwarded message:</font><br><br><font size=3 face="sans-serif"><b>From: </b></font><a href="mailto:gpfsug-discuss-owner@spectrumscale.org"><font size=3 color=blue face="sans-serif"><u>gpfsug-discuss-owner@spectrumscale.org</u></font></a><br><font size=3 face="sans-serif"><b>Subject: Re: [gpfsug-discuss] Blocksize</b></font><br><font size=3 face="sans-serif"><b>Date: </b>September 29, 2016 at 10:00:36
AM CDT</font><br><font size=3 face="sans-serif"><b>To: </b></font><a href=mailto:klb@accre.vanderbilt.edu><font size=3 color=blue face="sans-serif"><u>klb@accre.vanderbilt.edu</u></font></a><br><br><font size=3>You are not allowed to post to this mailing list, and
your message has<br>been automatically rejected. If you think that your messages are<br>being rejected in error, contact the mailing list owner at</font><font size=3 color=blue><u><br></u></font><a href="mailto:gpfsug-discuss-owner@spectrumscale.org"><font size=3 color=blue><u>gpfsug-discuss-owner@spectrumscale.org</u></font></a><font size=3>.<br><br></font><br><font size=3 face="sans-serif"><b>From: </b>"Kevin L. Buterbaugh"
<</font><a href=mailto:klb@accre.vanderbilt.edu><font size=3 color=blue face="sans-serif"><u>klb@accre.vanderbilt.edu</u></font></a><font size=3 face="sans-serif">></font><br><font size=3 face="sans-serif"><b>Subject: Re: [gpfsug-discuss] Blocksize</b></font><br><font size=3 face="sans-serif"><b>Date: </b>September 29, 2016 at 10:00:29
AM CDT</font><br><font size=3 face="sans-serif"><b>To: </b>gpfsug main discussion list
<</font><a href="mailto:gpfsug-discuss@spectrumscale.org"><font size=3 color=blue face="sans-serif"><u>gpfsug-discuss@spectrumscale.org</u></font></a><font size=3 face="sans-serif">></font><br><font size=3><br></font><br><font size=3>Hi Marc and others, </font><br><br><font size=3>I understand … I guess I did a poor job of wording my
question, so I’ll try again. The IBM recommendation for metadata
block size seems to be somewhere between 256K - 1 MB, depending on who
responds to the question. If I were to hypothetically use a 256K
metadata block size, does the “1/32nd of a block” come into play like
it does for “not metadata”? I.e. 256 / 32 = 8K, so am I reading
/ writing *2* inodes (assuming 4K inode size) minimum?</font><br><br><font size=3>And here’s a really off the wall question … yesterday
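Just to spell out the arithmetic I’m asking about (a quick sketch, and only valid if the 1/32nd-of-a-block subblock rule even applied to the inode file, which Marc’s note quoted below says it does not):

    metadata_blocksize = 256 * 1024          # 256 KiB
    subblock = metadata_blocksize // 32      # classic 1/32 rule -> 8 KiB
    inode_size = 4096
    print(f"subblock = {subblock // 1024} KiB = "
          f"{subblock // inode_size} inodes of {inode_size} bytes")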
And here’s a really off the wall question … yesterday we were discussing the fact that there is now a single inode file. Historically, we have always used RAID 1 mirrors (first with spinning disk, as of last fall now on SSD) for metadata and then used GPFS replication on top of that. But given that there is a single inode file, is that “old way” of doing things still the right way? In other words, could we potentially be better off by using a couple of 8+2P RAID 6 LUNs?
One potential downside of that would be that we would then only have two NSD servers serving up metadata, so we discussed the idea of taking each RAID 6 LUN and splitting it up into multiple logical volumes (all that done on the storage array, of course) and then presenting those to GPFS as NSDs???

Or have I gone from merely asking stupid questions to Trump-level craziness???? ;-)

Kevin

On Sep 28, 2016, at 10:23 AM, Marc A Kaplan <makaplan@us.ibm.com> wrote:
OKAY, I'll say it again. Inodes are PACKED into a single inode file. So a 4KB inode takes 4KB, REGARDLESS of metadata blocksize. There is no wasted space.

(Of course, if you have metadata replication = 2, then yes, double that. And yes, there is overhead for indirect blocks (indices), allocation maps, etc., etc.)
And your choice is not just 512 or 4096. Maybe 1KB or 2KB is a good choice for your data distribution, to optimize packing of data and/or directories into inodes...

Hmmm... I don't know why the doc leaves out 2048, perhaps a typo...

mmcrfs x2K -i 2048

[root@n2 charts]# mmlsfs x2K -i
flag                value                    description
------------------- ------------------------ -----------------------------------
 -i                 2048                     Inode size in bytes

Works for me!
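One way to think about which inode size fits a given data distribution, as a hypothetical sketch; the per-inode overhead used here is an assumed, illustrative figure, not a published number (the real overhead depends on release level and extended-attribute usage):

    INODE_OVERHEAD = 128   # bytes -- assumption for illustration only
    for inode_size in (512, 1024, 2048, 4096):
        room = inode_size - INODE_OVERHEAD
        print(f"-i {inode_size}: roughly {room} bytes of small-file data "
              f"or directory entries could live in the inode itself")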