[gpfsug-discuss] IO performance of replicated GPFS filesystem

Sanchez, Paul Paul.Sanchez at deshaw.com
Tue Dec 1 17:29:48 GMT 2015


All of Marc’s points are good.  A few more things to be aware of with regard to replicated writes:


- Each client performs its own replication when it writes file data.  So if you have several clients, each writing files concurrently, the “bandwidth burden” of the replication is distributed among them.  It’s typical that your write throughput will be limited by disk in this case.

- Because clients perform their own write replication, the maximum write throughput of an NSD client is limited to <50% of its available network bandwidth for 2x replication, or <33% for 3x replication, since it must share its network interface (Ethernet, IB) to access the NSDs in each failure group.

- If your network topology is asymmetric (e.g. multiple datacenters with higher latency and limited bandwidth between them), you may also benefit from using “readReplicaPolicy=fastest” to keep read traffic “local” and avoid crossing congested or high-latency paths (a short example follows below).
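
For reference, a minimal sketch of how that setting might be applied cluster-wide; the attribute name and the -i flag should be verified against the mmchconfig documentation for your Spectrum Scale release:

    # show the current read replica policy
    mmlsconfig readReplicaPolicy

    # prefer the replica that responds fastest to reads;
    # -i makes the change take effect immediately and permanently
    mmchconfig readReplicaPolicy=fastest -i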


From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Marc A Kaplan
Sent: Tuesday, December 01, 2015 12:02 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] IO performance of replicated GPFS filesystem

Generally yes.  When reading, more disks is always better than fewer disks, both with replication and with striping over several or many disks.
When writing, more disks is good with striping.  But yes, replication costs you extra writes.  Those writes don't necessarily cost you extra time, provided they can be done concurrently.

When I write "disks" I mean storage devices that can be accessed concurrently.   Watch out for virtual LUNs.
With conventional controllers and drives, it does GPFS little or no good when multiple LUNs map to the same real disk device, since multiple operations to different LUNs will ultimately be serialized at one real disk arm/head!

For high performance, you should not be thinking about "two NSDs" ... you should be thinking about many NSDs, so data and metadata can be striped, and written and read concurrently.
But yes, for replication purposes you have to consider defining and properly configuring at least two "failure groups".
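
To make that concrete, here is a hedged sketch of what "many NSDs in two failure groups" might look like.  All device, server, and filesystem names are hypothetical, and the stanza and option syntax should be checked against the mmcrnsd and mmcrfs man pages for your release:

    # nsd.stanza -- four NSDs split across two failure groups,
    # e.g. one failure group per RAID controller
    %nsd: nsd=nsd_a1 device=/dev/dm-1 servers=nsdsrv1,nsdsrv2 usage=dataAndMetadata failureGroup=1 pool=system
    %nsd: nsd=nsd_a2 device=/dev/dm-2 servers=nsdsrv1,nsdsrv2 usage=dataAndMetadata failureGroup=1 pool=system
    %nsd: nsd=nsd_b1 device=/dev/dm-3 servers=nsdsrv3,nsdsrv4 usage=dataAndMetadata failureGroup=2 pool=system
    %nsd: nsd=nsd_b2 device=/dev/dm-4 servers=nsdsrv3,nsdsrv4 usage=dataAndMetadata failureGroup=2 pool=system

    # create the NSDs, then a filesystem keeping 2 copies of data and
    # metadata (-m/-r are the defaults, -M/-R the maximums)
    mmcrnsd -F nsd.stanza
    mmcrfs gpfs0 -F nsd.stanza -m 2 -M 2 -r 2 -R 2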



From:        "Tomasz.Wolski at ts.fujitsu.com<mailto:Tomasz.Wolski at ts.fujitsu.com>" <Tomasz.Wolski at ts.fujitsu.com<mailto:Tomasz.Wolski at ts.fujitsu.com>>
To:        "gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>" <gpfsug-discuss at spectrumscale.org<mailto:gpfsug-discuss at spectrumscale.org>>
Date:        11/30/2015 05:46 AM
Subject:        [gpfsug-discuss] IO performance of replicated GPFS filesystem
Sent by:        gpfsug-discuss-bounces at spectrumscale.org<mailto:gpfsug-discuss-bounces at spectrumscale.org>
________________________________



Hi All,

I could use some help from the experts here ☺ Please correct me if I’m wrong: I suspect that GPFS filesystem READ performance is better when the filesystem is replicated across, for example, two failure groups, where these failure groups are placed on separate RAID controllers. In this case WRITE performance should be worse, since the same data must go to two locations. What about the situation where a GPFS filesystem has two metadataOnly NSDs which are also replicated? Does metadata READ performance increase in this way as well (and WRITE performance decrease)?
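
For reference, the data and metadata replication settings of an existing filesystem can be inspected with something like the following sketch (filesystem name is illustrative; double-check the option letters against the mmlsfs man page for your release):

    # default and maximum metadata (-m/-M) and data (-r/-R) replicas
    mmlsfs gpfs0 -m -M -r -R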

Best regards,
Tomasz Wolski
