[gpfsug-discuss] Small cluster
Sven Oehme
oehmes at us.ibm.com
Sat Mar 5 13:31:40 GMT 2016
As I stated in my previous post, it is a recommendation so people don't
overload the NSD servers to the point where they become unresponsive or even
get force-rebooted (e.g. when you configure cNFS auto-reboot on the same node);
it doesn't mean it doesn't work or is not supported.
If all you are using this cluster for is NAS services, this
recommendation makes even less sense. The whole reason the recommendation
exists is that if NFS were to overload a node that also serves as an NSD
server for other nodes, it would impact the other nodes that use the NSD
protocol. But if there are no NSD clients there is nothing to protect,
because if NFS is down no client can access data anyway, even if your NSD
servers are perfectly healthy.
If you have a fairly large system with many NSD servers, many NSD clients, as
well as NAS clients, this recommendation is correct, but not in the scenario
you described below.
I will work with the team to come up with better wording for this in the
FAQ.
------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------
From: Jan-Frode Myklebust <janfrode at tanso.net>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc: Sven Oehme/Almaden/IBM at IBMUS
Date: 03/05/2016 02:17 PM
Subject: Re: [gpfsug-discuss] Small cluster
Regarding #1, the FAQ has a recommendation not to run CES nodes directly
attached to storage:
""" • NSD server functionality and storage attached to Protocol node. We
recommend that Protocol nodes do not take on these functions
"""
For small CES clusters we're now configuring 2x P822L, with one partition on
each server owning the FC adapters and acting as NSD server/quorum/manager,
and the other partition being a CES node accessing disk via IP.
I would much rather have a plain SAN-model cluster where all nodes access
disk directly (probably still with a dedicated quorum/manager partition),
but this FAQ entry is preventing that.
-jf
Fri, 4 Mar 2016 at 19:04, Sven Oehme <oehmes at us.ibm.com> wrote:
Hi,
A couple of comments on the various pieces of information in this thread.
1. The need to run CES on separate nodes is a recommendation, not a
requirement. The recommendation comes from the fact that heavily loaded NAS
traffic can bring the system to its knees, and if the NSD service is on the
same box it gets taken down along with it. So as long as you have reasonable
performance expectations and size the system correctly, there is no issue.
2. Shared vs. FPO vs. shared-nothing (just replication). The main issue
people overlook in this scenario is the absence of read/write caches in
FPO or shared-nothing configurations. Every physical disk drive can only
do ~100 IOPS, and that's largely independent of whether the I/O size is
1 byte or 1 megabyte; the effort is pretty much the same. Metadata in
particular bites you really badly here, as each of these tiny I/Os eats one
of the ~100 IOPS a disk can do, and you quickly use up all the IOPS on your
drives. If you have any form of RAID controller (SW or HW), it typically
implements at minimum a read cache, and on most systems a read/write cache,
which significantly increases the number of logical I/Os you can do against
a disk. My favorite example: with a workload doing 4 KB sequential direct
I/O writes to a single disk, with no RAID controller you get roughly
400 KB/sec; with a reasonably good write cache in front of the disk you can
do 50 times that. So especially if you use snapshots, CES services, or
anything metadata-intensive, you want some type of RAID protection with
caching. By the way, replication in the file system makes this even worse,
as each write now turns into 3 IOPS for the data plus additional IOPS for
the log records, so you eat up your IOPS very quickly.
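A rough back-of-the-envelope version of that arithmetic, in Python (a sketch
only; the ~100 IOPS per spindle, the 50x write-cache factor, and the single
log I/O per write are the ballpark figures from above, not measured values):

# Rough IOPS/throughput arithmetic for the scenario above (ballpark figures).
DISK_IOPS = 100            # what a single spinning disk can sustain, roughly
IO_SIZE_KB = 4             # 4 KB sequential direct I/O writes

# No RAID controller / no write cache: every 4 KB write costs one disk I/O.
no_cache_kb_per_sec = DISK_IOPS * IO_SIZE_KB                    # ~400 KB/s
print(f"no write cache  : ~{no_cache_kb_per_sec} KB/s")

# With a decent write cache in front of the disk (the "50 times" figure above).
cache_factor = 50
print(f"with write cache: ~{no_cache_kb_per_sec * cache_factor / 1024:.0f} MB/s")

# File-system replication: each logical write becomes 3 data I/Os plus
# log-record I/Os (assume one per write, purely to show the trend).
data_copies, log_ios_per_write = 3, 1
print(f"3-way replication: each write costs ~{data_copies + log_ios_per_write} back-end I/Os")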
3. Instead of a shared SAN, a shared SAS device is significantly cheaper but
only scales to 2-4 nodes. The benefit is that you only need 2 nodes instead
of 3, because you can use the disks as tiebreaker disks. If you also add
some SSDs for the metadata and make use of HAWC and LROC, you might get away
without a RAID controller with cache, as HAWC will solve that issue for
you.
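A minimal sketch of why that can work, assuming (as described for HAWC) that
small synchronous writes are hardened in a recovery log placed on SSD and
destaged to the data disks later in larger chunks; the SSD IOPS figure and
the write rate are illustrative assumptions, not numbers from this thread:

# Why an SSD-backed write log changes the small-write picture (illustrative).
HDD_IOPS = 100            # from the discussion above
SSD_IOPS = 20_000         # assumption: a modest SSD; real devices vary widely
small_write_rate = 2_000  # hypothetical small synchronous writes per second

# Without any cache, every small write lands directly on spinning disk.
hdds_needed_direct = small_write_rate / HDD_IOPS
# With small writes absorbed by an SSD-backed log (HAWC-style) and destaged
# later in larger, more efficient chunks, the SSD takes the hit instead.
ssds_needed_for_log = small_write_rate / SSD_IOPS

print(f"direct to HDD : ~{hdds_needed_direct:.0f} spindles' worth of IOPS for small writes")
print(f"SSD-backed log: ~{ssds_needed_for_log:.1f} SSDs' worth of IOPS")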
just a few thoughts :-D
sven
------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------
From: Zachary Giles <zgiles at gmail.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 03/04/2016 05:36 PM
Subject: Re: [gpfsug-discuss] Small cluster
Sent by: gpfsug-discuss-bounces at spectrumscale.org
SMB too, eh? See, this is where it starts to get hard to scale down. You
could do a 3-node GPFS cluster with replication at remote sites, pulling
in from AFM over the net. If you want SMB too, you're probably going to
need another pair of servers to act as the Protocol Servers on top of the
3 GPFS servers. I think running them all together is not recommended, and
I'd probably agree with that.
Though, you could do it anyway. If it's for read-only and updated daily,
eh, who cares. Again, depends on your GPFS experience and the balance
between production, price, and performance :)
On Fri, Mar 4, 2016 at 11:30 AM, Mark.Bush at siriuscom.com <
Mark.Bush at siriuscom.com> wrote:
Yes. Really the only other option we have (and not a bad one) is
getting a v7000 Unified in there (if we can get the price down far
enough). That's not a bad option since all they really want is SMB
shares at the remote site. I just keep thinking a set of servers would
do the trick and be cheaper.
From: Zachary Giles <zgiles at gmail.com>
Reply-To: gpfsug main discussion list <
gpfsug-discuss at spectrumscale.org>
Date: Friday, March 4, 2016 at 10:26 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Small cluster
You can do FPO for non-Hadoop workloads. It just alters the disks
below the GPFS filesystem layer and looks like a normal GPFS system
(mostly). I do think there were some restrictions on non-FPO nodes
mounting FPO filesystems via multi-cluster.. not sure if those are
still there.. any input on that from IBM?
If the data is small enough, it might just be wise to do internal storage
with 3-way replication. A 36TB 2U server is ~$10K (just throwing out
ballpark numbers); 3 of those per site would fit in your budget.
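A quick sanity check of that sizing against the ~20TB and <$50K-per-site
numbers from earlier in the thread (a sketch; the server price and raw
capacity are the ballpark figures quoted above):

# Back-of-the-envelope sizing for the 3-server / 3x-replication idea above.
servers_per_site = 3
raw_tb_per_server = 36        # the ~$10K 36TB 2U box from above
server_cost = 10_000
replication_factor = 3        # 3-way GPFS data/metadata replication

raw_tb = servers_per_site * raw_tb_per_server      # 108 TB raw per site
usable_tb = raw_tb / replication_factor            # ~36 TB usable
site_cost = servers_per_site * server_cost         # ~$30K per site

print(f"usable ~{usable_tb:.0f} TB (need ~20 TB), cost ~${site_cost:,} (budget < $50K)")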
Again, depending on your requirements, the balance between 'science
experiment' and production stability, GPFS knowledge level, etc.
This is actually an interesting and somewhat underserved space for
small enterprises. If you just want 10-20TB active-active online
everywhere, say, for VMware, or NFS, or something else, there aren't
all that many good solutions today that scale down far enough and
are a decent price. It's easy with many PB, but small... I don't know. I
think the above sounds as good as anything without going SAN-crazy.
On Fri, Mar 4, 2016 at 11:21 AM, Mark.Bush at siriuscom.com <
Mark.Bush at siriuscom.com> wrote:
I guess this is really my question. Budget is less than $50k per
site and they need around 20TB of storage. Two nodes with an MD3 or
something may work. But could it work (and be successful) with
just servers and internal drives? Should I do FPO for non-Hadoop-like
workloads? I didn't think I could get Native RAID except in
the ESS (GSS no longer exists, if I remember correctly). Do I just
make replicas and call it good?
Mark
From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc
A Kaplan <makaplan at us.ibm.com>
Reply-To: gpfsug main discussion list <
gpfsug-discuss at spectrumscale.org>
Date: Friday, March 4, 2016 at 10:09 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Small cluster
Jon, I don't doubt your experience, but it's not quite fair or even
sensible to make a decision today based on what was available in
the GPFS 2.3 era.
We are now at GPFS 4.2, with support for 3-way replication and FPO.
We also have RAID controllers, IB, "Native RAID", and the ESS and GSS
solutions, and more.
So there are more choices and more options, which makes finding an "optimal"
solution more difficult.
To begin with, as with any provisioning problem, one should try to
state: requirements, goals, budgets, constraints, failure/tolerance
models/assumptions,
expected workloads, desired performance, etc, etc.
--
Zach Giles
zgiles at gmail.com
--
Zach Giles
zgiles at gmail.com
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss