[gpfsug-discuss] Small cluster

Jan-Frode Myklebust janfrode at tanso.net
Sat Mar 5 13:16:54 GMT 2016


Regarding #1, the FAQ recommends not running CES nodes directly
attached to storage:

""" • NSD server functionality and storage attached to Protocol node. We
recommend that Protocol nodes do not take on these functions
"""

For small CES clusters we're now configuring 2x P822L with one partition on
each server owning the FC adapters and acting as NSD server/quorum/manager,
and the other partition acting as a CES node accessing disks over IP.

I would much rather have a plain SAN-model cluster where all nodes access
disk directly (probably still with a dedicated quorum/manager partition),
but this FAQ entry prevents that.


-jf

On Fri, Mar 4, 2016 at 19:04, Sven Oehme <oehmes at us.ibm.com> wrote:

> Hi,
>
> a couple of comments to the various infos in this thread.
>
> 1. The need to run CES on separate nodes is a recommendation, not a
> requirement. The recommendation comes from the fact that heavily loaded NAS
> traffic that brings the system to its knees can take your NSD service down
> with it if both run on the same box. So as long as you have reasonable
> performance expectations and size the system correctly, there is no issue.
>
> 2. Shared vs. FPO vs. shared nothing (just replication): the main issue
> people overlook in this scenario is the absence of read/write caches in FPO
> or shared-nothing configurations. Every physical disk drive can only do
> ~100 IOPS, and it is pretty much the same effort whether the I/O size is
> 1 byte or 1 megabyte. On metadata in particular this bites you really
> badly, as each of these tiny I/Os eats one of the ~100 IOPS a disk can do,
> and you quickly use up all the IOPS on your drives. If you have any form
> of RAID controller (SW or HW), it typically implements at minimum a read
> cache, and on most systems a read/write cache, which significantly increases
> the number of logical I/Os you can do against a disk. My best example is
> always a workload doing 4k sequential DIO writes to a single disk: with no
> RAID controller you can do about 400 KB/sec (~100 IOPS x 4 KB); with a
> reasonably good write cache in front of the disk you can do 50 times that.
> So especially if you use snapshots, CES services, or anything that is
> metadata intensive, you want some type of RAID protection with caching.
> BTW, replication in the filesystem makes this even worse, as each write now
> turns into 3 IOPS for the data plus additional IOPS for the log records, so
> you eat up your IOPS very quickly.
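>
> To make that arithmetic concrete, here is a small, purely illustrative
> Python sketch (the ~100 IOPS per spindle, the 3 data copies, and the ~50x
> cache benefit are the figures quoted above; everything else, such as the
> single log-record IOP, is a hypothetical assumption):
>
> # Back-of-the-envelope sketch of the IOPS math described above.
> SPINDLE_IOPS = 100        # ~100 IOPS per physical disk drive
> io_size_kb = 4            # 4k sequential DIO writes
>
> # Without a caching RAID controller, every small write costs a full disk IOP:
> raw_kb_per_sec = SPINDLE_IOPS * io_size_kb          # ~400 KB/s per disk
>
> # A write cache coalesces small writes; the figure quoted above is ~50x:
> cached_kb_per_sec = raw_kb_per_sec * 50             # ~20 MB/s per disk
>
> # With 3-way replication each logical write costs at least 3 data IOPS plus
> # log-record IOPS (assume 1 here, purely for illustration):
> data_copies, log_ios = 3, 1
> logical_writes_per_sec = SPINDLE_IOPS // (data_copies + log_ios)   # ~25/s
>
> print(raw_kb_per_sec, cached_kb_per_sec, logical_writes_per_sec)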
>
> 3. Instead of a shared SAN, a shared SAS device is significantly cheaper but
> only scales to 2-4 nodes. The benefit is that you only need 2 instead of 3
> nodes, as you can use the disks as tiebreaker disks. If you also add some
> SSDs for the metadata and make use of HAWC and LROC, you might get away
> without needing a RAID controller with cache, as HAWC will solve that issue
> for you.
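>
> A rough illustration of why the tiebreaker disks let you drop from 3 quorum
> nodes to 2 (the majority rule is standard Spectrum Scale node-quorum
> behavior; the little helper below is just a hypothetical sketch):
>
> # GPFS node quorum requires floor(n/2) + 1 quorum nodes to be up.
> def node_quorum(n_quorum_nodes: int) -> int:
>     return n_quorum_nodes // 2 + 1
>
> print(node_quorum(2))   # 2 -> a plain 2-node cluster cannot survive a failure
> print(node_quorum(3))   # 2 -> a 3-node cluster can lose one node
>
> # With tiebreaker disks (mmchconfig tiebreakerDisks=...), a single surviving
> # quorum node that still sees a majority of the tiebreaker disks keeps the
> # cluster up, so 2 servers sharing the SAS enclosure are enough.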
>
> just a few thoughts :-D
>
> sven
>
>
> ------------------------------------------
> Sven Oehme
> Scalable Storage Research
> email: oehmes at us.ibm.com
> Phone: +1 (408) 824-8904
> IBM Almaden Research Lab
> ------------------------------------------
>
>
> From: Zachary Giles <zgiles at gmail.com>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 03/04/2016 05:36 PM
> Subject: Re: [gpfsug-discuss] Small cluster
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> SMB too, eh? See, this is where it starts to get hard to scale down. You
> could do a 3-node GPFS cluster with replication at the remote sites, pulling
> data in via AFM over the net. If you want SMB too, you're probably going to
> need another pair of servers to act as the Protocol Servers on top of the 3
> GPFS servers. I think running them all together is not recommended, and I'd
> probably agree with that.
> Though, you could do it anyway. If it's read-only and updated daily, eh,
> who cares. Again, it depends on your GPFS experience and the balance
> between production, price, and performance :)
>
> On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com
> <Mark.Bush at siriuscom.com> wrote:
>
>    Yes.  Really the only other option we have (and not a bad one) is
>    getting a v7000 Unified in there (if we can get the price down far
>    enough).  That's not a bad option since all they really want is SMB
>    shares at the remote site.  I just keep thinking a set of servers would
>    do the trick and be cheaper.
>
>
>
>    From: Zachary Giles <zgiles at gmail.com>
>    Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>    Date: Friday, March 4, 2016 at 10:26 AM
>    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>    Subject: Re: [gpfsug-discuss] Small cluster
>
>    You can do FPO for non-Hadoop workloads. It just alters the disks
>    below the GPFS filesystem layer and looks like a normal GPFS system
>    (mostly).  I do think there were some restrictions on non-FPO nodes
>    mounting FPO filesystems via multi-cluster.. not sure if those are still
>    there.. any input on that from IBM?
>
>    If the data is small enough, and with 3-way replication, it might just be
>    wise to do internal storage and 3x rep. A 36TB 2U server is ~$10K (just
>    throwing out common numbers), and 3 of those per site would fit in your
>    budget.
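>
>    A quick, purely illustrative sketch of that math (the 36TB/$10K server
>    and the 3x replication factor are the throwaway numbers above; the
>    20TB/$50K targets come from elsewhere in this thread):
>
>    # Hypothetical sizing sketch: 3 servers per site with 3-way replication.
>    server_raw_tb = 36          # raw capacity of one 2U server
>    server_cost_usd = 10000     # ballpark price per server
>    servers_per_site = 3
>    replicas = 3                # GPFS data/metadata replication factor
>
>    raw_tb = servers_per_site * server_raw_tb         # 108 TB raw per site
>    usable_tb = raw_tb / replicas                     # ~36 TB usable
>    cost_usd = servers_per_site * server_cost_usd     # ~$30K per site
>
>    print(usable_tb >= 20, cost_usd <= 50000)         # True True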
>
>    Again, depending on your requirements, the stability balance between
>    'science experiment' and production, your GPFS knowledge level, etc.
>
>    This is actually an interesting and somewhat underserved space for small
>    enterprises. If you just want 10-20TB active-active online everywhere,
>    say, for VMware, or NFS, or something else, there aren't all that many
>    good solutions today that scale down far enough at a decent price. It's
>    easy with many, many PB, but small... I don't know. I think the above
>    sounds as good as anything without going SAN-crazy.
>
>
>
>    On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com
>    <Mark.Bush at siriuscom.com> wrote:
>    I guess this is really my question.  Budget is less than $50k per site
>    and they need around 20TB of storage.  Two nodes with an MD3 or something
>    may work.  But could it work (and be successful) with just servers and
>    internal drives?  Should I do FPO for non-Hadoop-like workloads?  I
>    didn't think I could get native RAID except in the ESS (GSS no longer
>    exists, if I remember correctly).  Do I just make replicas and call it
>    good?
>
>
>    Mark
>
>    From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A
>    Kaplan <makaplan at us.ibm.com>
>    Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>    Date: Friday, March 4, 2016 at 10:09 AM
>    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>    Subject: Re: [gpfsug-discuss] Small cluster
>
>    Jon, I don't doubt your experience, but it's not quite fair or even
>    sensible to make a decision today based on what was available in the GPFS
>    2.3 era.
>
>    We are now at GPFS 4.2 with support for 3-way replication and FPO.
>    We also have RAID controllers, IB, "Native RAID", and the ESS and GSS
>    solutions, and more.
>
>    So there are more choices and more options, which makes finding an
>    "optimal" solution more difficult.
>
>    To begin with, as with any provisioning problem, one should try to
>    state: requirements, goals, budgets, constraints, failure/tolerance
>    models/assumptions,
>    expected workloads, desired performance, etc, etc.
>
>
>
>
>
>    --
>    Zach Giles
>    zgiles at gmail.com
>
>
>
>
>
> --
> Zach Giles
> zgiles at gmail.com
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

