[gpfsug-discuss] Small cluster

Yuri L Volobuev volobuev at us.ibm.com
Mon Mar 7 20:58:37 GMT 2016


This use case is a good example of how it's hard to optimize across
multiple criteria.

If you want a pre-packaged solution that's proven and easy to manage,
StorWize V7000 Unified is the ticket.  Design-wise, it's as good a fit for
your requirements as such things get.  Price may be an issue though, as
usual.

If you're OK with rolling your own complex solution, my recommendation
would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external
disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via
the local block device interface.  This avoids the pitfalls of
data/metadata replication, and offers a decent blend of performance, fault
tolerance, and disk management.  You can use disk-based quorum if going
with 2 nodes, or traditional node majority quorum if using 3 nodes, either
way would work.  There's no need to do any separation of roles (CES,
quorum, managers, etc.), provided the nodes are adequately provisioned with
memory and aren't routinely overloaded; if they are, the answer is to add
more nodes, not to partition the ones you have.
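
To make that concrete, here's a rough sketch of what the 2-node shared-disk
variant could look like.  Node names, device paths and option values below
are made up for illustration, so treat it as an outline to check against
the docs for your Scale level, not a recipe:

    # Both nodes see the same twin-tailed LUNs, so both are listed as NSD
    # servers in the stanza file (disk.stanza):
    %nsd: device=/dev/mapper/lun0 nsd=nsd1 servers=node1,node2 usage=dataAndMetadata failureGroup=1
    %nsd: device=/dev/mapper/lun1 nsd=nsd2 servers=node1,node2 usage=dataAndMetadata failureGroup=1

    mmcrcluster -N node1:quorum-manager,node2:quorum-manager -C smallcl \
        -r /usr/bin/ssh -R /usr/bin/scp
    mmchlicense server --accept -N node1,node2
    mmstartup -a
    mmcrnsd -F disk.stanza
    # With only two nodes, use disk-based (tiebreaker) quorum:
    mmchconfig tiebreakerDisks="nsd1"
    mmcrfs fs1 -F disk.stanza -B 1M -T /gpfs/fs1

With a third quorum node you'd skip the tiebreakerDisks step and rely on
plain node majority quorum instead.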

Using internal disks and relying on GPFS data/metadata replication, with or
without FPO, would mean taking the hard road.  You may be able to spend the
least on hardware in such a config (although the 33% disk utilization rate
for triplication makes this less clear, if capacity is an issue), but the
operational challenges are going to be substantial.  This would be a viable
config, but there are unavoidable tradeoffs caused by replication:
(1) writes are very expensive, which limits the overall cluster capability
    for non-read-only workloads;
(2) node and disk failures require a round of re-replication, or
    "re-protection", which takes time and bandwidth, limiting the overall
    capability further;
(3) disk management can be a challenge, as there's no software/hardware
    component to assist with identifying failing/failed disks.
As far as staying on the beaten path goes, this is not it... Exporting
protocols from a small triplicated file system is not a typical mode of
deployment of Spectrum Scale; you'd be blazing some new trails.
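
Purely to illustrate where those costs come from, a triplicated
internal-disk layout would look roughly like this (hypothetical node and
device names; not a recommendation):

    # Each node's internal disks go into their own failure group, so the
    # three replicas land on three different nodes (disk.stanza):
    %nsd: device=/dev/sdb nsd=n1_d1 servers=node1 usage=dataAndMetadata failureGroup=1
    %nsd: device=/dev/sdb nsd=n2_d1 servers=node2 usage=dataAndMetadata failureGroup=2
    %nsd: device=/dev/sdb nsd=n3_d1 servers=node3 usage=dataAndMetadata failureGroup=3

    mmcrnsd -F disk.stanza
    # Triplication: default and maximum data/metadata replicas set to 3,
    # i.e. every write is done three times (tradeoff 1 above)
    mmcrfs fs1 -F disk.stanza -m 3 -M 3 -r 3 -R 3 -T /gpfs/fs1
    # After replacing a failed disk, a re-replication pass is needed to
    # restore protection (tradeoff 2 above):
    mmrestripefs fs1 -r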

As stated already in several responses, there's no hard requirement that
CES Protocol nodes must be entirely separate from any other roles in the
general Spectrum Scale deployment scenario.  IBM expressly disallows
co-locating Protocol nodes with ESS servers, due to resource consumption
complications, but for non-ESS cases it's merely a recommendation to run
Protocols on nodes that are not otherwise encumbered by having to provide
other services.  Of course, the config that's the best for performance is
not the cheapest.  CES doesn't reboot nodes to recover from NFS problems,
unlike cNFS (which has to, given its use of the kernel NFS stack).  Of
course, a complex software stack is a complex software stack, so there's
greater potential for things to go sideways, in particular when the nodes
are short on resources.
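
For concreteness, turning existing (non-ESS) NSD server nodes into CES
protocol nodes is just the regular CES setup; roughly something like the
following, with placeholder IPs and paths:

    # CES needs a small shared directory in an existing GPFS file system
    mmchconfig cesSharedRoot=/gpfs/fs1/ces
    mmchnode --ces-enable -N node1,node2
    # Floating protocol IPs that CES moves between nodes on failure
    mmces address add --ces-ip 10.0.0.100,10.0.0.101
    mmces service enable NFS
    mmces service enable SMB
    mmces service list -a

(The SMB and NFS Ganesha protocol packages have to be installed on those
nodes first, and the usual sizing caveats about memory and CPU apply.)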

FPO vs plain replication: this only matters if you have apps that are
capable of exploiting data locality.  FPO changes the way GPFS stripes data
across disks.  Without FPO, GPFS does traditional wide striping of blocks
across all disks in a given storage pool.  When FPO is in use, data in
large files is divided into large (e.g. 1G) chunks, and each chunk is held
in its entirety on one node's internal disks.  An application that knows
how to query the data block layout of a given file can then schedule the
job that needs to read from a given chunk on the node that holds a local
copy.  This makes a lot of sense for integrated data analytics workloads,
a la MapReduce with Hadoop, but doesn't make sense for generic apps like
Samba.
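
For reference, FPO is switched on per storage pool in the stanza file used
at file system creation time; a minimal sketch, with example values only:

    # layoutMap=cluster + allowWriteAffinity=yes is what makes a pool FPO;
    # blockSize x blockGroupFactor is the chunk a node keeps locally
    # (2M x 512 = 1G in this example)
    %pool:
      pool=datapool
      blockSize=2M
      layoutMap=cluster
      allowWriteAffinity=yes
      writeAffinityDepth=1
      blockGroupFactor=512

Without those pool attributes you get the normal wide striping described
above, which is the right default for protocol serving.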

I'm not sure what language in the FAQ creates the impression that the SAN
deployment model is somehow incompatible with running Protocol services.
This is perfectly fine.

yuri



From:	Jan-Frode Myklebust <janfrode at tanso.net>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
Date:	03/06/2016 10:12 PM
Subject:	Re: [gpfsug-discuss] Small cluster
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



I agree, but would also normally want to stay within whatever is
recommended.

What about quorum/manager functions? Also OK to run these on the CES nodes
in a 2-node cluster, or any reason to partition these out so that we then
have a 4-node cluster running on 2 physical machines?


-jf
On Sun, 6 Mar 2016 at 21:28, Marc A Kaplan <makaplan at us.ibm.com> wrote:
  As Sven wrote, the FAQ does not "prevent" anything.  It's just a
  recommendation someone came up with.  Which may or may not apply to your
  situation.

  Partitioning a server into two servers might be a good idea if you really
  need the protection/isolation.  But I expect you are limiting the
  potential performance of the overall system, compared to running a single
  Unix image with multiple processes that can share resources and
  communicate more freely.




  _______________________________________________
  gpfsug-discuss mailing list
  gpfsug-discuss at spectrumscale.org
  http://gpfsug.org/mailman/listinfo/gpfsug-discuss

