[gpfsug-discuss] RAID type for system pool

Marc A Kaplan makaplan at us.ibm.com
Wed Sep 5 18:20:00 BST 2018


It's good to try to reason and think this out... But there's a good 
likelihood that we don't understand ALL the details, some of which may 
negatively impact performance - so no matter what scheme you come up with 
- test, test, and re-test before deploying and depending on it in 
production.

Having said that, I'm pretty sure that old "spinning" RAID 5 
implementations had horrible performance for the GPFS metadata/system pool.
Why? Among other things, the mismatch between the large stripe size and 
the almost random small writes directed to the system pool.

That random-small-writes pattern won't change when we go to SSD RAID 5 - 
so you'd have to see whether the SSD implementation is somehow smarter than 
an old-fashioned RAID 5 implementation, which I believe requires several 
physical reads and writes for each "small" logical write.
(The best google result I found quickly: 
http://rickardnobel.se/raid-5-write-penalty/ - but you will probably want 
to do more research!)
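
To make that penalty concrete, here is a minimal back-of-the-envelope 
sketch (Python, with a made-up write rate - an illustration of the textbook 
read-modify-write penalty, not a measurement of any particular array):

# Rough sketch of the classic RAID small-write penalty: RAID 1 turns one
# logical write into 2 back-end writes; RAID 5 read-modify-write turns it
# into 4 back-end I/Os (read old data, read old parity, write both back).
# The 20,000 writes/sec figure below is a made-up placeholder.

def backend_ios(logical_write_iops, raid_level):
    penalty = {"raid1": 2, "raid5": 4, "raid6": 6}
    return logical_write_iops * penalty[raid_level]

for level in ("raid1", "raid5"):
    print(level, backend_ios(20_000, level), "back-end I/Os per second")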

Consider GPFS small write performance for:  inode updates, log writes, 
small files (possibly in inode), directory updates, allocation map 
updates, index of indirect blocks.



From:   "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   09/05/2018 11:36 AM
Subject:        [gpfsug-discuss] RAID type for system pool
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi All, 

We are in the process of finalizing the purchase of some new storage 
arrays (so no sales people who might be monitoring this list need to 
contact me) to life-cycle some older hardware.  One of the things we are 
considering is the purchase of some new SSDs for our “/home” filesystem, 
and I have a question or two related to that.

Currently, the existing home filesystem has its metadata on SSDs … two 
RAID 1 mirrors and metadata replication set to two.  However, the 
filesystem itself is old enough that it uses 512-byte inodes.  We have 
analyzed our users’ files and know that if we create a new filesystem with 
4K inodes, a very significant portion of the files would have their _data_ 
stored in the inode as well, due to the files being 3.5K or smaller 
(currently all data is on spinning HD RAID 1 mirrors).

Of course, if we increase the size of the inodes by a factor of 8, then we 
also need 8 times as much space to store those inodes.  Given that 
enterprise-class SSDs are still very expensive and our budget is not 
unlimited, we’re trying to get the best bang for the buck.
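
Roughly, the sizing arithmetic we’re working from looks like this (the 
inode count below is a placeholder, not our actual number):

# Placeholder arithmetic, not our real numbers: inode space with 512-byte
# vs. 4 KiB inodes, metadata replication of two.
inode_count = 200_000_000     # hypothetical number of inodes
replicas    = 2

old_space_tib = inode_count * 512  * replicas / 2**40
new_space_tib = inode_count * 4096 * replicas / 2**40
print(f"inode space: {old_space_tib:.2f} TiB -> {new_space_tib:.2f} TiB (8x)")

# The payoff: files of roughly 3.5 KiB or less fit entirely in a 4 KiB inode,
# so their data would move off the spinning RAID 1 mirrors onto the SSDs.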

We have always - even back in the day when our metadata was on spinning 
disk and not SSD - used RAID 1 mirrors and metadata replication of two. 
However, we are wondering whether it might make sense to switch to RAID 5. 
Specifically, what we are considering is buying 8 new SSDs and creating 
two 3+1P RAID 5 LUNs (metadata replication would stay at two).  That would 
give us 50% more usable space than if we configured those same 8 drives as 
four RAID 1 mirrors.
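
Spelling out that 50% figure (the per-drive capacity below is just an 
example):

# Usable capacity from 8 identical SSDs (per-drive capacity is hypothetical).
drive_tb = 1.92

raid1_usable = 4 * drive_tb        # four RAID 1 mirrors  -> 4 drives' worth
raid5_usable = 2 * 3 * drive_tb    # two 3+1P RAID 5 LUNs -> 6 drives' worth
print(f"RAID 1: {raid1_usable:.2f} TB   RAID 5: {raid5_usable:.2f} TB   "
      f"gain: {raid5_usable / raid1_usable - 1:.0%}")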

Unfortunately, unless I’m misunderstanding something, that would mean that 
the RAID stripe size and the GPFS block size could not match.  Therefore, 
even though we don’t need the space, would we be much better off buying 10 
SSDs and creating two 4+1P RAID 5 LUNs?
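
To illustrate the mismatch I’m worried about (segment sizes below are 
hypothetical, “fit” just means a whole number of full stripes per GPFS 
block or vice versa, and how the array really maps blocks onto stripes 
depends on the controller):

# With 3 data disks per LUN the full-stripe width is 3 * segment size, which
# can never be a power of two, so it can't divide evenly into (or be a
# multiple of) a power-of-two GPFS block size; with 4 data disks it can.
def full_stripe_kib(data_disks, segment_kib):
    return data_disks * segment_kib

gpfs_block_kib = 1024                      # e.g. a 1 MiB block size
for data_disks in (3, 4):
    for segment_kib in (128, 256):
        stripe = full_stripe_kib(data_disks, segment_kib)
        aligned = gpfs_block_kib % stripe == 0 or stripe % gpfs_block_kib == 0
        print(f"{data_disks}+1P, {segment_kib} KiB segments: "
              f"{stripe} KiB full stripe, whole-stripe fit with "
              f"{gpfs_block_kib} KiB block: {aligned}")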

I’ve searched the mailing list archives and scanned the DeveloperWorks 
wiki and even glanced at the GPFS documentation and haven’t found anything 
that says “bad idea, Kevin”… ;-)

Expanding on this further … if we just present those two RAID 5 LUNs to 
GPFS as NSDs, then we can only have two NSD servers as primary for them. 
So another thing we’re considering is to take those RAID 5 LUNs and 
further sub-divide them into a total of 8 logical volumes, each of which 
could be a GPFS NSD, which would allow each of our 8 NSD servers to be 
primary for one of them (see the sketch below).  Even worse idea?!?  Good 
idea?
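
For the sub-dividing idea, what we have in mind is roughly the following - 
rotating the server list so each of our 8 NSD servers is listed first 
(i.e. preferred) for one NSD.  The device paths, NSD names, hostnames, and 
failure-group layout below are all made up:

# Hypothetical sketch: generate an NSD stanza file in which the server list
# is rotated so each of our 8 NSD servers appears first, i.e. is preferred,
# for exactly one of the 8 logical volumes.  All names are placeholders.
servers = [f"nsd{i:02d}.example.edu" for i in range(1, 9)]

with open("metadata_nsds.stanza", "w") as out:
    for i in range(8):
        rotated = servers[i:] + servers[:i]
        out.write(
            f"%nsd: device=/dev/mapper/ssd_lv{i + 1}\n"
            f"  nsd=ssd_md_nsd{i + 1}\n"
            f"  servers={','.join(rotated)}\n"
            f"  usage=metadataOnly\n"
            f"  failureGroup={1 if i < 4 else 2}\n"  # one group per RAID 5 LUN
            f"  pool=system\n\n"
        )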

Anybody have any better ideas???  ;-)

Oh, and currently we’re on GPFS 4.2.3-10, but are also planning on moving 
to GPFS 5.0.1-x before creating the new filesystem.

Thanks much…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and 
Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




