[gpfsug-discuss] gpfs native raid

Aaron Knister aaron.s.knister at nasa.gov
Fri Sep 30 21:56:29 BST 2016


Thanks, Yuri. Your replies are always quite enjoyable to read. I didn't
realize SES was such a loosely interpreted standard; I had assumed it
was fairly straightforward. We've got a number of JBODs here that we
manage via SES using the Linux enclosure module (e.g.
/sys/class/enclosure) and they seem to "just work", but we're not doing
anything terribly advanced, mostly just turning various status LEDs on
and off. I should clarify: the newer SAS enclosures I've encountered
seem quite good, while some of the older enclosures (in particular the
Xyratex enclosure used by DDN in its S2A units) were a real treat to
interact with and didn't seem to follow the SES standard in spirit.
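
For context, the extent of our SES usage is roughly the sketch below
(the enclosure and slot names are made up and vary by vendor; slot
directory names in particular differ between enclosures):

import glob
import os

# Rough sketch: drive a fault LED through the Linux enclosure module.
# Each SES enclosure appears under /sys/class/enclosure/<id>/ with one
# directory per slot; writing 1 or 0 to that slot's "fault" attribute
# asks the SES processor to light or clear the LED.

def set_fault_led(enclosure, slot, on=True):
    path = os.path.join("/sys/class/enclosure", enclosure, slot, "fault")
    with open(path, "w") as f:
        f.write("1" if on else "0")

if __name__ == "__main__":
    # List the enclosures and slots the kernel knows about.
    for enc in sorted(glob.glob("/sys/class/enclosure/*")):
        print(enc, sorted(os.listdir(enc)))
    # Hypothetical example: light the fault LED on one slot.
    # set_fault_led("0:0:12:0", "Slot01", on=True)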

I can certainly accept the complexity argument here. I think for our
purposes a "reasonable level" of support would be all we're after. I'm
not sure how ZFS would deal with a SCSI reset storm; I suspect the pool
would just offline itself if enough paths appeared to disappear or time
out.

If I could make GPFS work well with ZFS as the underlying storage
target I would be quite happy. So far I have struggled to make it
performant. GPFS seems to assume that once a block device accepts a
write, it's committed to stable storage. With ZFS zvols this isn't the
case by default. Making it the case (setting the sync=always property)
causes a *massive* degradation in performance. If GPFS were to issue
sync commands at appropriate intervals, then I think we could make this
work well. I'm not sure how to go about this, though, and with frequent
enough SCSI sync commands to a given LUN its performance would likely
degrade to the current state of ZFS with sync=always.
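
To make the idea concrete, here's a toy flusher sketching the behaviour
I'd want from GPFS (the zvol path and interval are placeholders, and
I'm assuming an fsync() on the zvol's device node reaches ZFS as a
cache flush, i.e. the moral equivalent of SCSI SYNCHRONIZE CACHE):

import os
import time

# Hypothetical zvol backing an NSD, and a made-up flush interval.
ZVOL = "/dev/zvol/tank/gpfs-nsd-01"
FLUSH_INTERVAL = 5.0  # seconds

# Periodically ask the zvol to commit everything it has accepted so
# far, so writes don't sit only in memory with sync=standard.
fd = os.open(ZVOL, os.O_RDWR)
try:
    while True:
        time.sleep(FLUSH_INTERVAL)
        os.fsync(fd)  # flush the device write cache
finally:
    os.close(fd)

Whether batching flushes like this would actually beat sync=always is
something we'd obviously need to test.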

At any rate, we'll see how things go. Thanks again.

-Aaron

On 9/30/16 1:43 AM, Yuri L Volobuev wrote:
> The issue of "GNR as software" is a pretty convoluted mixture of
> technical, business, and resource constraints issues. While some of the
> technical issues can be discussed here, obviously the other
> considerations cannot be discussed in a public forum. So you won't be
> able to get a complete understanding of the situation by discussing it here.
>
>> I understand the support concerns, but I naively thought that assuming
>> the hardware meets a basic set of requirements (e.g. redundant SAS
>> paths, x type of drives) it would be fairly supportable with GNR. The
>> DS3700 shelves are re-branded NetApp E-series shelves and pretty
>> vanilla, I thought.
>
> Setting business issues aside, this is more complicated on the technical
> level than one may think.  At present, GNR requires a set of twin-tailed
> external disk enclosures.  This is not a particularly exotic kind of
> hardware, but it turns out that this corner of the storage world is
> quite insular.  GNR has a very close relationship with physical disk
> devices, much more so than regular GPFS.  In an ideal world, SCSI and
> SES standards are supposed to provide a framework which would allow
> software like GNR to operate on an arbitrary disk enclosure.  In the
> real world, the actual SES implementations on various enclosures that
> we've been dealing with are, well, peculiar.  Apparently SES is one of
> those standards where vendors feel a lot of freedom in "re-interpreting"
> the standard, and since typically enclosures talk to a small set of RAID
> controllers, there aren't bad enough consequences to force vendors to be
> religious about SES standard compliance.  Furthermore, the SAS fabric
> topology in configurations with external disk enclosures is
> surprisingly complex, and that complexity predictably leads to complex
> failures which don't exist in simpler configurations.  Thus far, every
> single one of the five enclosures we've had a chance to run GNR on
> required some adjustments, workarounds, hacks, etc.  And the
> consequences of a misbehaving SAS fabric can be quite dire.  There are
> various approaches to dealing with those complications, from running a
> massive 3rd party hardware qualification program to basically declaring
> any complications from an unknown enclosure to be someone else's problem
> (how would ZFS deal with a SCSI reset storm due to a bad SAS expander?),
> but there's much debate on what is the right path to take.  Customer
> input/feedback is obviously very valuable in tilting such discussions in
> the right direction.
>
> yuri
>
> From: Aaron Knister <aaron.knister at gmail.com>
> To: gpfsug-discuss at spectrumscale.org,
> Date: 09/28/2016 06:44 PM
> Subject: Re: [gpfsug-discuss] gpfs native raid
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> Thanks Everyone for your replies! (Quick disclaimer, these opinions are
> my own, and not those of my employer or NASA).
>
> Not knowing what's coming at the NDA session, it seems to boil down to
> "it ain't gonna happen" because of:
>
> - Perceived difficulty in supporting whatever creative hardware
> solutions customers may throw at it.
>
> I understand the support concerns, but I naively thought that assuming
> the hardware meets a basic set of requirements (e.g. redundant SAS
> paths, x type of drives) it would be fairly supportable with GNR. The
> DS3700 shelves are re-branded NetApp E-series shelves and pretty
> vanilla, I thought.
>
> - IBM would like to monetize the product and compete with the likes of
> DDN/Seagate
>
> This is admittedly a little disappointing. GPFS, as long as I've known
> it, has been largely hardware-vendor agnostic. To see even a slight
> shift towards hardware vendor lock-in, with certain features only
> supported and available on IBM hardware, is concerning. It's not like
> the software itself is free. Perhaps GNR could be a paid add-on license
> for non-IBM hardware? Just thinking out loud.
>
> The big things I was looking to GNR for are:
>
> - end-to-end checksums
> - implementing a software RAID layer on (in my case enterprise class) JBODs
>
> I can find a way to do the latter, but not the former. Requiring IBM
> hardware to get end-to-end checksums is a huge red flag for me. That's
> something Lustre will do today with ZFS on any hardware ZFS runs on
> (and for free, I might add). I would think making GNR openly available
> to customers would be important for GPFS to compete with Lustre.
> Furthermore, I opened an RFE (#84523) a while back to implement
> checksumming of data for non-GNR environments. The RFE was declined,
> essentially because it would be too hard and the feature already
> exists in GNR. Well, considering I don't have a GNR environment, and
> hardware vendor lock-in is something many sites are not interested in,
> that's somewhat of a problem.
>
> I really hope IBM reconsiders their stance on opening up GNR. The
> current direction, while somewhat understandable, leaves a really bad
> taste in my mouth and is one of the (very few, in my opinion) features
> Lustre has over GPFS.
>
> -Aaron
>
>
> On 9/1/16 9:59 AM, Marc A Kaplan wrote:
>> I've been told that it is a big leap to go from supporting GSS and ESS
>> to allowing and supporting native raid for customers who may throw
>> together "any" combination of hardware they might choose.
>>
>> In particular the GNR "disk hospital" functions...
>> https://www.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs200.doc/bl1adv_introdiskhospital.htm
>> will be tricky to support on umpteen different vendor boxes -- and keep
>> in mind, those will be from IBM competitors!
>>
>> That said, ESS and GSS show that IBM has some good tech in this area and
>> IBM has shown with the Spectrum Scale product (sans GNR) it can support
>> just about any semi-reasonable hardware configuration and a good slew of
>> OS versions and architectures... Heck I have a demo/test version of GPFS
>> running on a 5 year old Thinkpad laptop.... And we have some GSSs in the
>> lab... Not to mention Power hardware and mainframe System Z (think 360,
>> 370, 390, Z)
>>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


