[gpfsug-discuss] Small cluster

Mark.Bush at siriuscom.com
Mon Mar 7 21:10:48 GMT 2016


Thanks Yuri, this solidifies some of the conclusions I’ve drawn from this conversation.  Thank you all for your responses.  This is a great forum filled with very knowledgeable folks.

Mark

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Yuri L Volobuev <volobuev at us.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Monday, March 7, 2016 at 2:58 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Small cluster


This use case is a good example of how it's hard to optimize across multiple criteria.

If you want a pre-packaged solution that's proven and easy to manage, StorWize V7000 Unified is the ticket. Design-wise, it's as good a fit for your requirements as such things get. Price may be an issue though, as usual.

If you're OK with rolling your own complex solution, my recommendation would be to use a low-end shared (twin-tailed, via SAS or FC SAN) external disk solution, with 2-3 GPFS nodes accessing the disks directly, i.e. via the local block device interface. This avoids the pitfalls of data/metadata replication, and offers a decent blend of performance, fault tolerance, and disk management. You can use disk-based quorum if going with 2 nodes, or traditional node majority quorum if using 3 nodes; either way would work. There's no need to do any separation of roles (CES, quorum, managers, etc.), provided the nodes are adequately provisioned with memory and aren't routinely overloaded; if they are, just add more nodes instead of partitioning what you have.
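
Very roughly, and purely as an illustration (node names, device paths, and parameter values below are made up, so check the mmcrcluster/mmcrnsd/mmcrfs documentation for your code level), a 2-node shared-disk setup along these lines could look like:

# Two nodes, both quorum-manager, both seeing the same twin-tailed LUNs
cat > /tmp/nodes <<EOF
node1:quorum-manager
node2:quorum-manager
EOF
mmcrcluster -N /tmp/nodes -C smallcluster -r /usr/bin/ssh -R /usr/bin/scp
mmchlicense server --accept -N node1,node2

# NSD stanzas: no servers= line is needed when every node sees the LUNs
# directly over SAS/FC (the SAN model)
cat > /tmp/nsd.stanza <<EOF
%nsd: nsd=nsd1 device=/dev/mapper/lun1 usage=dataAndMetadata failureGroup=1
%nsd: nsd=nsd2 device=/dev/mapper/lun2 usage=dataAndMetadata failureGroup=1
%nsd: nsd=nsd3 device=/dev/mapper/lun3 usage=dataAndMetadata failureGroup=1
EOF
mmcrnsd -F /tmp/nsd.stanza

# With only 2 nodes, use disk-based (tiebreaker) quorum; with 3 nodes,
# plain node majority quorum works and this setting isn't needed
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"

# Single copy of data and metadata: the shared storage provides the
# protection, so no GPFS replication is needed (-m 1 -r 1)
mmstartup -a
mmcrfs gpfs0 -F /tmp/nsd.stanza -m 1 -r 1 -T /gpfs/gpfs0 -A yes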

Using internal disks and relying on GPFS data/metadata replication, with or without FPO, would mean taking the hard road. You may be able to spend the least on hardware in such a config (although the 33% disk utilization rate for triplication makes this less clear, if capacity is an issue), but the operational challenges are going to be substantial. This would be a viable config, but there are unavoidable tradeoffs caused by replication:

(1) writes are very expensive, which limits the overall cluster capability for non-read-only workloads;
(2) node and disk failures require a round of re-replication, or "re-protection", which takes time and bandwidth, limiting the overall capability further;
(3) disk management can be a challenge, as there's no software/hardware component to assist with identifying failing/failed disks.

As far as staying on the beaten path goes, this is not it... Exporting protocols from a small triplicated file system is not a typical mode of deployment of Spectrum Scale; you'd be blazing some new trails.
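
For comparison, a rough sketch of the replicated internal-disk variant (again with hypothetical names and devices, and assuming the cluster itself is created as in the previous sketch):

# Each node serves its own internal disks; each node is its own failure
# group, so replicas land on different machines
cat > /tmp/nsd.stanza <<EOF
%nsd: nsd=n1d1 device=/dev/sdb servers=node1 usage=dataAndMetadata failureGroup=1
%nsd: nsd=n2d1 device=/dev/sdb servers=node2 usage=dataAndMetadata failureGroup=2
%nsd: nsd=n3d1 device=/dev/sdb servers=node3 usage=dataAndMetadata failureGroup=3
EOF
mmcrnsd -F /tmp/nsd.stanza

# Triplication: three copies of data and metadata, so usable capacity is
# roughly a third of raw, and every write goes to three nodes
mmcrfs gpfs0 -F /tmp/nsd.stanza -m 3 -r 3 -M 3 -R 3 -T /gpfs/gpfs0

# After a node or disk failure, replication has to be restored, e.g.
mmrestripefs gpfs0 -r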

As stated already in several responses, there's no hard requirement that CES Protocol nodes must be entirely separate from any other roles in the general Spectrum Scale deployment scenario. IBM expressly disallows co-locating Protocol nodes with ESS servers, due to resource consumption complications, but for non-ESS cases it's merely a recommendation to run Protocols on nodes that are not otherwise encumbered by having to provide other services. Of course, the config that's the best for performance is not the cheapest. CES doesn't reboot nodes to recover from NFS problems, unlike cNFS (which has to, given its use of kernel NFS stack). Of course, a complex software stack is a complex software stack, so there's greater potential for things to go sideways, in particular due to the lack of resources.
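
For what it's worth, co-locating the protocol role is just a matter of enabling CES on the existing nodes; a minimal sketch, with invented IP addresses and paths:

# CES needs a small shared directory in an existing GPFS file system
mmchconfig cesSharedRoot=/gpfs/gpfs0/ces

# Turn the existing NSD/quorum nodes into protocol nodes as well
mmchnode --ces-enable -N node1,node2

# Enable the protocol stacks and assign the floating protocol IPs
mmces service enable NFS
mmces service enable SMB
mmces address add --ces-ip 192.0.2.10,192.0.2.11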

FPO vs plain replication: this only matters if you have apps that are capable of exploiting data locality. FPO changes the way GPFS stripes data across disks. Without FPO, GPFS does traditional wide striping of blocks across all disks in a given storage pool. When FPO is in use, data in large files is divided into large (e.g. 1G) chunks, and each chunk is held in its entirety on one node's internal disks. An application that knows how to query the data block layout of a given file can then schedule the job that needs to read from a chunk on the node that holds a local copy. This makes a lot of sense for integrated data analytics workloads, à la MapReduce with Hadoop, but doesn't make sense for generic apps like Samba.
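
In case it helps, FPO is enabled per storage pool in the stanza file given to mmcrfs; an illustrative fragment (block size, chunk factor, and disks are made-up values):

cat > /tmp/fpo.stanza <<EOF
%pool:
  pool=datapool
  blockSize=8M
  layoutMap=cluster
  allowWriteAffinity=yes
  writeAffinityDepth=1
  blockGroupFactor=128
%nsd: nsd=n1meta device=/dev/sdc servers=node1 usage=metadataOnly failureGroup=1 pool=system
%nsd: nsd=n1d1 device=/dev/sdb servers=node1 usage=dataOnly failureGroup=1 pool=datapool
EOF
# blockGroupFactor x blockSize is the chunk that stays on one node's disks
# (128 x 8M = 1G here); without these settings GPFS wide-stripes blocks
# across all disks in the pool
mmcrfs gpfs0 -F /tmp/fpo.stanza -m 3 -r 3 -M 3 -R 3 -T /gpfs/gpfs0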

I'm not sure what language in the FAQ creates the impression that the SAN deployment model is somehow incompatible with running Protocol services. This is perfectly fine.

yuri


From: Jan-Frode Myklebust <janfrode at tanso.net>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 03/06/2016 10:12 PM
Subject: Re: [gpfsug-discuss] Small cluster
Sent by: gpfsug-discuss-bounces at spectrumscale.org

________________________________



I agree, but would also normally want to stay within whatever is recommended.

What about quorum/manager functions? Is it also OK to run these on the CES nodes in a 2-node cluster, or is there any reason to partition them out so that we end up with a 4-node cluster running on 2 physical machines?


-jf
Sun, 6 Mar 2016 at 21:28, Marc A Kaplan <makaplan at us.ibm.com> wrote:

As Sven wrote, the FAQ does not "prevent" anything.  It's just a recommendation someone came up with, which may or may not apply to your situation.

Partitioning a server into two servers might be a good idea if you really need the protection/isolation.  But I expect you would be limiting the potential performance of the overall system, compared to running a single Unix image with multiple processes that can share resources and communicate more freely.




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




