<div dir="ltr">Following on from this thread: is a bonded active-active 10GbE network enough bandwidth to run data and heartbeat/admin traffic on the same network? I assume it comes down to latency and congestion, but I would like to hear others' experiences.<div><br></div><div>Is anyone using QoS to make sure admin/heartbeat traffic is not delayed?<br><div><br></div><div>All of our current clusters use InfiniBand for data and management traffic, but we are building a cluster that has dual 10GbE to each compute node. The NSD servers have 40GbE connections to the core network, to which the 10GbE switches uplink.</div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 22, 2016 at 4:57 AM, Ashish Thandavan <span dir="ltr">&lt;<a href="mailto:ashish.thandavan@cs.ox.ac.uk" target="_blank">ashish.thandavan@cs.ox.ac.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Richard,<br>

<br>

Thank you, that is very good to know!<br>

<br>

Regards,<br>

Ash<div class="HOEnZb"><div class="h5"><br>

<br>

On 22/07/16 09:36, Sobey, Richard A wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Ash<br>

<br>

Our ifcfg files for the bonded interfaces (this applies to the GPFS, data and management networks) are set to mode 1 (active-backup):<br>

<br>

BONDING_OPTS="mode=1 miimon=200"<br>

<br>

If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it set up right, as we've never noticed an issue. Mode 1 specifically was requested by our networks team.<br>
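
<br>

A cable-pull test like the one described can be checked from the host side, since the Linux bonding driver reports its state as text under /proc/net/bonding/. The following is a minimal Python sketch, assuming a bond named bond0 with slaves eth0 and eth1 (hypothetical names). The SAMPLE text is illustrative and abbreviated, not captured from a real host:<br>

```python
# Sketch: parse Linux bonding driver status text and list slaves whose link is down.
# Assumptions: bond name "bond0" and slave names "eth0"/"eth1" are hypothetical;
# SAMPLE is an abbreviated, illustrative status text.

SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 200
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
"""


def read_status(path="/proc/net/bonding/bond0"):
    """Read the status text on a real host (not called in this sketch)."""
    with open(path) as handle:
        return handle.read()


def parse_bond_status(text):
    """Return (bond_fields, slaves): a dict of bond-level fields and one dict per slave."""
    bond, slaves, current = {}, [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Every field after this line belongs to this slave.
            current = {"name": value}
            slaves.append(current)
        elif current is None:
            bond[key] = value
        else:
            current[key] = value
    return bond, slaves


def degraded_slaves(slaves):
    """Names of slaves whose MII status is anything other than 'up'."""
    return [s["name"] for s in slaves if s.get("MII Status") != "up"]


bond, slaves = parse_bond_status(SAMPLE)
print("mode:", bond["Bonding Mode"])
print("active slave:", bond["Currently Active Slave"])
print("links down:", degraded_slaves(slaves))
```

On a real node, passing read_status() to parse_bond_status() instead of SAMPLE should show the standby interface taking over as "Currently Active Slave" while the cable is out.<br>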

<br>

Richard<br>

<br>

-----Original Message-----<br>

From: <a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" target="_blank">gpfsug-discuss-bounces@spectrumscale.org</a> [mailto:<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org" target="_blank">gpfsug-discuss-bounces@spectrumscale.org</a>] On Behalf Of Ashish Thandavan<br>

Sent: 21 July 2016 11:26<br>

To: <a href="mailto:gpfsug-discuss@spectrumscale.org" target="_blank">gpfsug-discuss@spectrumscale.org</a><br>

Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience<br>

<br>

Dear all,<br>

<br>

Could anyone please point me to the specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc. that one should be aware of?<br>

<br>

I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use?<br>

<br>

Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad, i.e. LACP) bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support it.<br>

Is there a recommended bonding mode?<br>
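
<br>

To make the two candidates concrete, here is a minimal Python sketch that builds the BONDING_OPTS line for each mode. The helper names (bonding_opts, ifcfg_line) are hypothetical. The Mode 1 values match those quoted elsewhere in this thread, while the Mode 4 values (miimon, lacp_rate, xmit_hash_policy) are illustrative assumptions rather than recommendations:<br>

```python
# Sketch: build BONDING_OPTS strings for the two bonding modes under discussion.
# Assumption: the mode 4 option values below are illustrative only; the option
# names (miimon, lacp_rate, xmit_hash_policy) are Linux bonding driver options.

def bonding_opts(mode, miimon_ms=200, **extra):
    """Return space-separated key=value pairs suitable for BONDING_OPTS."""
    pairs = {"mode": mode, "miimon": miimon_ms, **extra}
    return " ".join(f"{key}={value}" for key, value in pairs.items())


def ifcfg_line(opts):
    """Format the options as a line for an ifcfg file."""
    return f'BONDING_OPTS="{opts}"'


# Mode 1 (active-backup): one link carries traffic, the other stands by.
active_backup = bonding_opts(1)

# Mode 4 (802.3ad/LACP): both links active; switch ports must be configured to match.
lacp = bonding_opts(
    "802.3ad",
    miimon_ms=100,
    lacp_rate="fast",
    xmit_hash_policy="layer3+4",
)

print(ifcfg_line(active_backup))
print(ifcfg_line(lacp))
```

Note that with 802.3ad each flow is hashed onto one member link, so a single connection does not exceed the speed of one port even though both ports are in use.<br>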

<br>

If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond you have configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running when one network link goes down?<br>

<br>

Thank you,<br>

<br>

Regards,<br>

Ash<br>

<br>

<br>

<br>

--<br>

-------------------------<br>

Ashish Thandavan<br>

<br>

UNIX Support Computing Officer<br>

Department of Computer Science<br>

University of Oxford<br>

Wolfson Building<br>

Parks Road<br>

Oxford OX1 3QD<br>

<br>

Phone: 01865 610733<br>

Email: <a href="mailto:ashish.thandavan@cs.ox.ac.uk" target="_blank">ashish.thandavan@cs.ox.ac.uk</a><br>

<br>

_______________________________________________<br>

gpfsug-discuss mailing list<br>

gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a><br>

<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/listinfo/gpfsug-discuss</a><br>

</blockquote>

<br>


</div></div></blockquote></div><br></div>