<font size=2 face="sans-serif">Frankly, I just don't "get" what

it is you seem not to be "getting"  - perhaps someone else

who does "get" it can rephrase:  FORGET about Subblocks

when thinking about inodes being packed into the file of all inodes.  </font><br><br><font size=2 face="sans-serif">Additional facts that may address some

of the other concerns:</font><br><br><font size=2 face="sans-serif">I started working on GPFS at version

3.1 or so.  AFAIK GPFS always had and has one file of inodes, "packed",

with no wasted space between inodes.  Period. Full Stop.</font><br><br><font size=2 face="sans-serif">RAID!  Now we come to a mistake

that I've seen made by more than a handful of customers!</font><br><br><font size=2 face="sans-serif">It is generally a mistake to use RAID

with parity (such as classic RAID5) to store metadata.</font><br><br><font size=2 face="sans-serif">Why?  Because metadata is often

updated with "small writes"  - for example suppose we have

to update some fields in an inode, or an indirect block, or append a log

record...</font><br><font size=2 face="sans-serif">For RAID with parity and large stripe

sizes, this means that updating just one disk sector can cost a full

stripe read + writing the changed data and parity sectors.</font><br><br><font size=2 face="sans-serif">SO, if you want protection against storage

failures for your metadata, use either RAID mirroring/replication and/or

GPFS metadata replication.  (belt and/or suspenders)</font><br><font size=2 face="sans-serif">(Arguments against relying solely on

RAID mirroring:  single enclosure/box failure (fire!), single hardware

design (bugs or defects), single firmware/microcode (bugs).)</font><br><br><font size=2 face="sans-serif">Yes, GPFS is part of "the cyber."

 We're making it stronger every day. But it already is great.  </font><br><br><font size=2 face="sans-serif">--marc</font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">"Buterbaugh, Kevin

L" <Kevin.Buterbaugh@Vanderbilt.Edu></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">gpfsug main discussion

list <gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">09/29/2016 11:03 AM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">[gpfsug-discuss]

Fwd:  Blocksize</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><font size=3>Resending from the right e-mail address...</font><br><br><font size=3>Begin forwarded message:</font><br><br><font size=3 face="sans-serif"><b>From: </b></font><a href="mailto:gpfsug-discuss-owner@spectrumscale.org"><font size=3 color=blue face="sans-serif"><u>gpfsug-discuss-owner@spectrumscale.org</u></font></a><br><font size=3 face="sans-serif"><b>Subject: Re: [gpfsug-discuss] Blocksize</b></font><br><font size=3 face="sans-serif"><b>Date: </b>September 29, 2016 at 10:00:36

AM CDT</font><br><font size=3 face="sans-serif"><b>To: </b></font><a href=mailto:klb@accre.vanderbilt.edu><font size=3 color=blue face="sans-serif"><u>klb@accre.vanderbilt.edu</u></font></a><br><br><font size=3>You are not allowed to post to this mailing list, and

your message has<br>been automatically rejected.  If you think that your messages are<br>being rejected in error, contact the mailing list owner at</font><font size=3 color=blue><u><br></u></font><a href="mailto:gpfsug-discuss-owner@spectrumscale.org"><font size=3 color=blue><u>gpfsug-discuss-owner@spectrumscale.org</u></font></a><font size=3>.<br><br></font><br><font size=3 face="sans-serif"><b>From: </b>"Kevin L. Buterbaugh"

<</font><a href=mailto:klb@accre.vanderbilt.edu><font size=3 color=blue face="sans-serif"><u>klb@accre.vanderbilt.edu</u></font></a><font size=3 face="sans-serif">></font><br><font size=3 face="sans-serif"><b>Subject: Re: [gpfsug-discuss] Blocksize</b></font><br><font size=3 face="sans-serif"><b>Date: </b>September 29, 2016 at 10:00:29

AM CDT</font><br><font size=3 face="sans-serif"><b>To: </b>gpfsug main discussion list

<</font><a href="mailto:gpfsug-discuss@spectrumscale.org"><font size=3 color=blue face="sans-serif"><u>gpfsug-discuss@spectrumscale.org</u></font></a><font size=3 face="sans-serif">></font><br><font size=3><br></font><br><font size=3>Hi Marc and others, </font><br><br><font size=3>I understand … I guess I did a poor job of wording my

question, so I’ll try again.  The IBM recommendation for metadata

block size seems to be somewhere between 256K - 1 MB, depending on who

responds to the question.  If I were to hypothetically use a 256K

metadata block size, does the “1/32nd of a block” come into play like

it does for “not metadata”?  I.e. 256 / 32 = 8K, so am I reading

/ writing *2* inodes (assuming 4K inode size) minimum?</font><br><br><font size=3>And here’s a really off the wall question … yesterday

we were discussing the fact that there is now a single inode file.  Historically,

we have always used RAID 1 mirrors (first with spinning disk, as of last

fall now on SSD) for metadata and then use GPFS replication on top of that.

 But given that there is a single inode file is that “old way” of

doing things still the right way?  In other words, could we potentially

be better off by using a couple of 8+2P RAID 6 LUNs?</font><br><br><font size=3>One potential downside of that would be that we would

then only have two NSD servers serving up metadata, so we discussed the

idea of taking each RAID 6 LUN and splitting it up into multiple logical

volumes (all that done on the storage array, of course) and then presenting

those to GPFS as NSDs???</font><br><br><font size=3>Or have I gone from merely asking stupid questions to

Trump-level craziness????  ;-)</font><br><br><font size=3>Kevin</font><br><br><font size=3>On Sep 28, 2016, at 10:23 AM, Marc A Kaplan <</font><a href=mailto:makaplan@us.ibm.com><font size=3 color=blue><u>makaplan@us.ibm.com</u></font></a><font size=3>>

wrote:</font><br><br><font size=2 face="sans-serif">OKAY, I'll say it again.  inodes

are PACKED into a single inode file.  So a 4KB inode takes 4KB, REGARDLESS

of metadata blocksize.  There is no wasted space.</font><font size=3><br></font><font size=2 face="sans-serif"><br>(Of course if you have metadata replication = 2, then yes, double that.

 And yes, there is overhead for indirect blocks (indices), allocation

maps, etc, etc.)<br><br>And your choice is not just 512 or 4096.  Maybe 1KB or 2KB is a good

choice for your data distribution, to optimize packing of data and/or directories

into inodes...</font><font size=3><br></font><font size=2 face="sans-serif"><br>Hmmm... I don't know why the doc leaves out 2048, perhaps a typo...</font><font size=3><br></font><font size=2 face="sans-serif"><br>mmcrfs x2K -i 2048</font><font size=3><br></font><font size=2 face="sans-serif"><br>[root@n2 charts]# mmlsfs x2K -i<br>flag                value  

                 description<br>------------------- ------------------------ -----------------------------------<br> -i                 2048  

                  Inode size

in bytes</font><font size=3><br></font><font size=2 face="sans-serif"><br>Works for me!</font><font size=3><br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at </font><a href=http://spectrumscale.org/><font size=3 color=blue><u>spectrumscale.org</u></font></a><font size=3 color=blue><u><br></u></font><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss"><font size=3 color=blue><u>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</u></font></a><br><br><font size=3><br></font><br><BR>
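<font size=2 face="sans-serif">To put rough numbers on the small-write point made at the top of this message, here is a minimal Python sketch that counts device I/Os for updating one small piece of metadata. It assumes the textbook cost models (read-modify-write and reconstruct-write), not the behavior of any particular storage controller or of GPFS itself. The names ParityRaid and mirror_ios are made up for illustration, and the 8+2P layout is the one from Kevin's question.</font><br>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParityRaid:
    """Hypothetical model of one parity-RAID stripe (illustration only)."""
    data_disks: int    # e.g. 8 in an 8+2P layout
    parity_disks: int  # 1 for classic RAID 5, 2 for RAID 6

    def read_modify_write_ios(self):
        # Read the old data strip and every old parity strip,
        # then write the new data strip and every new parity strip.
        reads = 1 + self.parity_disks
        writes = 1 + self.parity_disks
        return reads, writes

    def reconstruct_write_ios(self):
        # Read all the other data strips in the stripe, recompute parity,
        # then write the new data strip and every parity strip.
        reads = self.data_disks - 1
        writes = 1 + self.parity_disks
        return reads, writes


def mirror_ios(copies=2):
    # A mirror needs no reads before a small write: just write each copy.
    return 0, copies


if __name__ == "__main__":
    raid6 = ParityRaid(data_disks=8, parity_disks=2)
    print("8+2P read-modify-write (reads, writes):", raid6.read_modify_write_ios())
    print("8+2P reconstruct-write (reads, writes):", raid6.reconstruct_write_ios())
    print("2-way mirror           (reads, writes):", mirror_ios())
```

<font size=2 face="sans-serif">Under these assumptions a mirror never has to read before it writes, which is the core of the argument for keeping small, frequently updated metadata off parity RAID. A write-back cache on a real array may hide part of this cost.</font><br>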