[gpfsug-discuss] Small cluster

Sven Oehme oehmes at us.ibm.com
Sat Mar 5 13:31:40 GMT 2016


As I stated in my previous post, it is a recommendation so that people don't
overload the NSD servers to the point where they become unresponsive or even
get force-rebooted (e.g. when you configure cNFS auto-reboot on the same
node); it doesn't mean it doesn't work or isn't supported.

If all you are using this cluster for is NAS services, the recommendation
makes even less sense. The whole reason it exists is that if NFS overloads a
node that also serves as an NSD server for other nodes, it impacts the other
nodes that use the NSD protocol. But if there are no NSD clients there is
nothing to protect: if NFS is down, no client can access data anyway, even
if your NSD servers are perfectly healthy.

If you have a fairly large system with many NSD servers, many NSD clients,
and NAS clients as well, the recommendation is correct, but not in the
scenario you described below.

I will work with the team to come up with better wording for this in the
FAQ.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------



From:	Jan-Frode Myklebust <janfrode at tanso.net>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:	Sven Oehme/Almaden/IBM at IBMUS
Date:	03/05/2016 02:17 PM
Subject:	Re: [gpfsug-discuss] Small cluster



Regarding #1, the FAQ has a recommendation not to run CES nodes directly
attached to storage:

""" • NSD server functionality and storage attached to Protocol node. We
recommend that Protocol nodes do not take on these functions
"""

For small CES clusters we're now configuring 2x P822L with one partition on
each server owning the FC adapters and acting as NSD server/quorum/manager,
and the other partition being a CES node accessing disk over IP.

I would much rather have a plain SAN-model cluster where all nodes access
disk directly (probably still with a dedicated quorum/manager partition),
but this FAQ entry is preventing that.


-jf

On Fri, 4 Mar 2016 at 19:04, Sven Oehme <oehmes at us.ibm.com> wrote:
  Hi,

  A couple of comments on the various points in this thread.

  1. The need to run CES on separate nodes is a recommendation, not a
  requirement. The recommendation comes from the fact that heavily loaded
  NAS traffic that brings the system to its knees can take your NSD service
  down with it if both run on the same box. So as long as you have
  reasonable performance expectations and size the system correctly, there
  is no issue.

  2. Shared vs. FPO vs. shared-nothing (just replication). The main issue
  people overlook in this scenario is the absence of read/write caches in
  FPO or shared-nothing configurations. A physical disk drive can only do
  ~100 IOPS, and that is largely independent of whether the I/O size is 1
  byte or 1 megabyte; it's pretty much the same effort for the drive.
  Particularly on metadata this bites you really badly, as every one of
  these tiny I/Os eats one of the ~100 IOPS a disk can do, and you quickly
  use up all the IOPS on the drives. If you have any form of RAID controller
  (software or hardware) it typically implements at minimum a read cache,
  and on most systems a read/write cache, which significantly increases the
  number of logical I/Os one can do against a disk. My favorite example: for
  a workload doing 4k sequential DIO writes to a single disk, with no RAID
  controller you get roughly 400 KB/sec (~100 x 4k writes); with a
  reasonably good write cache in front of the disk you can do 50 times that.
  So especially if you use snapshots, CES services, or anything that is
  metadata intensive, you want some type of RAID protection with caching.
  Btw, replication in the filesystem makes this even worse, as each write
  now turns into 3 I/Os for the data plus additional I/Os for the log
  records, so you eat up your IOPS very quickly.
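
  To make that arithmetic concrete, here is a rough back-of-the-envelope
  sketch in Python; the figures are the ballpark numbers above, not
  measurements, and the disk count and per-write log overhead are just
  assumptions for illustration:

  # Back-of-the-envelope model of small-block write capacity, using the
  # ballpark figures above (~100 IOPS per spinning disk, a write cache
  # worth roughly 50x on small sequential writes, 3-way replication plus
  # some log overhead). Illustrative only, not measured GPFS behaviour.

  DISK_IOPS = 100          # rough IOPS a single spinning disk can sustain
  IO_SIZE_KB = 4           # 4k sequential direct I/O writes
  CACHE_FACTOR = 50        # rough gain from a RAID controller write cache
  REPLICAS = 3             # data copies written with 3-way replication
  LOG_IOS_PER_WRITE = 1    # assumed extra I/O per write for log records

  def logical_write_iops(disks, cached=False, replicas=1, log_ios=0):
      """Rough application-visible write IOPS for a set of disks."""
      raw = disks * DISK_IOPS * (CACHE_FACTOR if cached else 1)
      # every logical write costs `replicas` data I/Os plus log I/Os
      return raw / (replicas + log_ios)

  for cached in (False, True):
      iops = logical_write_iops(disks=1, cached=cached)
      print(f"1 disk, write cache={cached}: ~{iops:.0f} IOPS "
            f"(~{iops * IO_SIZE_KB:.0f} KB/s at 4k)")

  # with 3x replication plus log records the same spindles serve far
  # fewer application writes
  iops = logical_write_iops(disks=12, replicas=REPLICAS,
                            log_ios=LOG_IOS_PER_WRITE)
  print(f"12 uncached disks, 3x replication: ~{iops:.0f} logical write IOPS")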

  3. Instead of a shared SAN, a shared SAS device is significantly cheaper,
  but it only scales to 2-4 nodes. The benefit is that you only need 2
  nodes instead of 3, as you can use the shared disks as tiebreaker disks.
  If you also add some SSDs for the metadata and make use of HAWC and LROC,
  you might get away without needing a RAID controller with cache, as HAWC
  will solve that issue for you.
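
  To illustrate why the shared disks let you drop from 3 quorum nodes to 2,
  here is a very simplified toy model of the quorum idea (a sketch of the
  concept only, not GPFS's actual quorum implementation):

  # Toy model: node-majority quorum vs. node quorum with tiebreaker disks.
  # With node majority, a 2-node cluster loses quorum when one node dies;
  # with tiebreaker disks, the surviving node can keep quorum as long as it
  # still reaches a majority of the shared (tiebreaker) disks.

  def node_majority_quorum(total_quorum_nodes, alive_nodes):
      """More than half of the quorum nodes must be up."""
      return alive_nodes > total_quorum_nodes / 2

  def tiebreaker_quorum(alive_nodes, tiebreaker_disks, reachable_disks):
      """At least one quorum node up, majority of tiebreaker disks reachable."""
      return alive_nodes >= 1 and reachable_disks > tiebreaker_disks / 2

  # 2-node cluster, one node fails:
  print(node_majority_quorum(2, 1))   # False -> cluster would lose quorum
  print(tiebreaker_quorum(1, 3, 3))   # True  -> survivor keeps quorum via disks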

  just a few thoughts :-D

  sven


  ------------------------------------------
  Sven Oehme
  Scalable Storage Research
  email: oehmes at us.ibm.com
  Phone: +1 (408) 824-8904
  IBM Almaden Research Lab
  ------------------------------------------

  From: Zachary Giles <zgiles at gmail.com>
  To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
  Date: 03/04/2016 05:36 PM
  Subject: Re: [gpfsug-discuss] Small cluster
  Sent by: gpfsug-discuss-bounces at spectrumscale.org

  SMB too, eh? See, this is where it starts to get hard to scale down. You
  could do a 3-node GPFS cluster with replication at the remote sites,
  pulling data in via AFM over the net. If you want SMB too, you're probably
  going to need another pair of servers to act as the protocol servers on
  top of the 3 GPFS servers. I think running them all together is not
  recommended, and I'd probably agree with that.

  Though, you could do it anyway. If it's read-only and updated daily, eh,
  who cares. Again, it depends on your GPFS experience and the balance
  between production, price, and performance :)

  On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com <
  Mark.Bush at siriuscom.com> wrote:
        Yes.  Really the only other option we have (and not a bad one) is
        getting a v7000 Unified in there (if we can get the price down far
        enough).  That's not a bad option since all they really want is SMB
        shares at the remote site.  I just keep thinking a set of servers
        would do the trick and be cheaper.



        From: Zachary Giles <zgiles at gmail.com>
        Reply-To: gpfsug main discussion list <
        gpfsug-discuss at spectrumscale.org>
        Date: Friday, March 4, 2016 at 10:26 AM

        To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
        Subject: Re: [gpfsug-discuss] Small cluster

        You can do FPO for non-Hadoop workloads. It just alters how the
        disks are handled below the GPFS filesystem layer and looks like a
        normal GPFS system (mostly). I do think there were some restrictions
        on non-FPO nodes mounting FPO filesystems via multi-cluster... not
        sure if those are still there... any input on that from IBM?

        If the data is small enough, and with 3-way replication, it might
        just be wise to do internal storage and 3x replication. A 36TB 2U
        server is ~$10K (just throwing out common numbers); three of those
        per site would fit in your budget (rough math below).
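
        A quick sanity check of those ballpark numbers against the ~$50k
        per-site budget and ~20TB requirement mentioned elsewhere in this
        thread (rough figures only, nothing measured or quoted):

        # Ballpark check: 3 servers/site at ~$10K each with 36TB raw apiece,
        # 3-way GPFS replication, against a ~$50K/site budget and ~20TB
        # usable requirement. All numbers are rough assumptions.

        servers_per_site = 3
        cost_per_server = 10_000          # USD, rough
        raw_tb_per_server = 36
        replication_factor = 3

        site_cost = servers_per_site * cost_per_server
        usable_tb = servers_per_site * raw_tb_per_server / replication_factor

        print(f"cost per site:   ${site_cost:,} (budget ~$50,000)")
        print(f"usable capacity: {usable_tb:.0f} TB (need ~20 TB)")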

        Again... depending on your requirements, the stability balance
        between 'science experiment' and production, GPFS knowledge level,
        etc.

        This is actually an interesting and somewhat underserved space for
        small enterprises. If you just want 10-20TB active-active online
        everywhere, say for VMware, or NFS, or something else, there aren't
        many good solutions today that scale down far enough and come at a
        decent price. It's easy with many, many PB, but small... I don't
        know. I think the above sounds as good as anything without going
        SAN-crazy.



        On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com <
        Mark.Bush at siriuscom.com> wrote:
        I guess this is really my question.  Budget is less than $50k per
        site and they need around 20TB of storage.  Two nodes with an MD3 or
        something may work.  But could it work (and be successful) with just
        servers and internal drives?  Should I do FPO for non-Hadoop-like
        workloads?  I didn't think I could get Native RAID except in the ESS
        (GSS no longer exists, if I remember correctly).  Do I just make
        replicas and call it good?


        Mark

        From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc
        A Kaplan <makaplan at us.ibm.com>
        Reply-To: gpfsug main discussion list <
        gpfsug-discuss at spectrumscale.org>
        Date: Friday, March 4, 2016 at 10:09 AM
        To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
        Subject: Re: [gpfsug-discuss] Small cluster

        Jon, I don't doubt your experience, but it's not quite fair or even
        sensible to make a decision today based on what was available in
        the GPFS 2.3 era.

        We are now at GPFS 4.2, with support for 3-way replication and FPO.

        We also have RAID controllers, IB, "Native RAID", ESS and GSS
        solutions, and more.

        So there are more choices and more options, which makes finding an
        "optimal" solution more difficult.

        To begin with, as with any provisioning problem, one should try to
        state the requirements, goals, budgets, constraints,
        failure/tolerance models and assumptions, expected workloads,
        desired performance, and so on.








        --
        Zach Giles
        zgiles at gmail.com






  --
  Zach Giles
  zgiles at gmail.com



