[gpfsug-discuss] Tuning AFM for high throughput/high IO over _really_ long distances (Jan-Frode Myklebust)

Scott Fadden sfadden at us.ibm.com
Wed Nov 9 18:24:15 GMT 2016


So you are using the NSD protocol for data transfers over multi-cluster? 
If so, the TCP and thread tuning should help as well. 
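
For what it's worth, the "thread tuning" side of that usually means raising 
the worker thread counts and the expected per-node bandwidth on both 
clusters. A rough sketch only -- the values and the node classes 
(<afmNodes>, <nsdServers>) are placeholders, so check the defaults and 
valid ranges for your release before changing anything:

    mmchconfig maxMBpS=2400 -N <afmNodes>            # illustrative; hint for expected per-node bandwidth
    mmchconfig workerThreads=512 -N <afmNodes>       # illustrative; general worker thread pool
    mmchconfig nsdMaxWorkerThreads=1024 -N <nsdServers>   # illustrative; NSD server side

Note that not all of these can be changed on the fly; some typically need 
GPFS restarted on the affected nodes before they take effect.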


Scott Fadden
Spectrum Scale - Technical Marketing 
Phone: (503) 880-5833 
sfadden at us.ibm.com
http://www.ibm.com/systems/storage/spectrum/scale



From:   Jake Carroll <jake.carroll at uq.edu.au>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   11/09/2016 10:09 AM
Subject:        Re: [gpfsug-discuss] Tuning AFM for high throughput/high 
IO over _really_ long distances (Jan-Frode Myklebust)
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi jf…

 
>> Mostly curious, don't have experience in such environments, but ... Is this
>> AFM over NFS or NSD protocol? Might be interesting to try the other option
>> -- and also check how nsdperf performs over such distance/latency.
 
As it turns out, it seems, very few people do. 

I will test nsdperf over it and see how it performs. And yes, it is AFM → 
AFM. No NFS involved here!
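
(For the record, my rough plan for that nsdperf run -- from memory, so the 
exact command set may differ by release: the tool ships as source under 
/usr/lpp/mmfs/samples/net, you build it per the README in that directory, 
start it in server mode on a test node at each end, and then drive them 
from an interactive controller session, something like:

    ./nsdperf -s        # on a node in each cluster, acting as test endpoints
    ./nsdperf           # controller session from any node, then roughly:
                        #   client <local-node>
                        #   server <remote-node>
                        #   test

If that shows the raw NSD transport filling the pipe at 180 ms, the problem 
is more likely in AFM's queueing than in the network path.)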

-jc


 
    ------------------------------
 
    Message: 2
    Date: Wed, 9 Nov 2016 17:39:05 +0000
    From: Jake Carroll <jake.carroll at uq.edu.au>
    To: "gpfsug-discuss at spectrumscale.org"
                 <gpfsug-discuss at spectrumscale.org>
    Subject: [gpfsug-discuss] Tuning AFM for high throughput/high IO over
                 _really_ long distances
    Message-ID: <83652C3D-0802-4CC2-B636-9FAA31EF5AF0 at uq.edu.au>
    Content-Type: text/plain; charset="utf-8"
 
    Hi.
 
    I've got a GPFS to GPFS AFM cache/home (IW) relationship set up over a
    really long distance: about 180 ms of latency between the two clusters
    and around 13,000 km of optical path. Fortunately for me, I've actually
    got near theoretical maximum IO over the NICs between the clusters and
    I'm iPerf'ing at around 8.90 to 9.2 Gbit/sec over a 10GbE circuit.
    MTU 9000 all the way through.
 
    Anyway, I'm finding my AFM traffic to be dragging its feet and I don't
    really understand why that might be. I've verified the link's and the
    transport's ability, as I said above, with iPerf and CERN's FDT to near
    10 Gbit/sec.
 
    I also verified the clusters on both sides in terms of disk IO, and in
    IOZone and IOR tests they both seem easily capable of multiple GB/sec of
    throughput.
 
    So, my questions:
 
 
    1.  Are there very specific tunings AFM needs for high latency/long
        distance IO?
 
    2.  Are there very specific NIC/TCP-stack tunings (beyond the type of
        thing we already have in place) that benefit AFM over really long
        distances and high latency?
 
    3.  We are seeing on the 'cache' side really lazy/sticky 'ls -als' in
        the home mount. It sometimes takes 20 to 30 seconds before the
        command line will report back with a long listing of files. Any
        ideas why it'd take that long to get a response from 'home'?
 
    We've got our TCP stack set up fairly aggressively on all hosts that
    participate in these two clusters.
 
    ethtool -C enp2s0f0 adaptive-rx off
    ifconfig enp2s0f0 txqueuelen 10000
    sysctl -w net.core.rmem_max=536870912
    sysctl -w net.core.wmem_max=536870912
    sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
    sysctl -w net.core.netdev_max_backlog=250000
    sysctl -w net.ipv4.tcp_congestion_control=htcp
    sysctl -w net.ipv4.tcp_mtu_probing=1
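 
    (Sanity check on those buffer ceilings: the bandwidth-delay product for
    this path is roughly 10 Gbit/s x 0.18 s, i.e.
 
    echo $(( 10 * 1000**3 / 8 * 180 / 1000 ))   # ~225,000,000 bytes, ~225 MB in flight
 
    so the 268435456-byte (256 MiB) tcp_rmem/tcp_wmem maximums should let a
    single stream keep the pipe full, assuming the congestion window
    actually opens that far.)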
 
    I modified a couple of small things on the AFM 'cache' side to see if
    it'd make a difference, such as:
 
    mmchconfig afmNumWriteThreads=4
    mmchconfig afmNumReadThreads=4
 
    But no difference so far.
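 
    (If nothing moves, my next step -- based on what I can find in the AFM
    tuning documentation, so the parameter names and units should be
    double-checked against the docs for the release in use -- would be to
    confirm the changes actually took effect and then look at the gateway
    flush threads and parallel-transfer settings, e.g.:
 
    mmlsconfig | grep -i afm              # confirm the new values are active
    mmchconfig afmNumFlushThreads=8       # illustrative value; gateway queue flush threads
    mmchconfig afmParallelReadThreshold=1024
    mmchconfig afmParallelReadChunkSize=128
 
    All values above are illustrative only.)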
 
    Thoughts would be appreciated. I've done this before over much shorter
    distances (30 km) and I've flattened a 10GbE wire without really tuning
    anything. Are my large in-flight-packet numbers/long
    time-to-acknowledgement semantics going to hurt here? I really thought
    AFM might be well designed for exactly this kind of work at long
    distance *and* high throughput, so I must be missing something!
 
    -jc
 
 
 
 
    ------------------------------
 
    Message: 3
    Date: Wed, 09 Nov 2016 18:05:21 +0000
    From: Jan-Frode Myklebust <janfrode at tanso.net>
    To: "gpfsug-discuss at spectrumscale.org"
                 <gpfsug-discuss at spectrumscale.org>
    Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
                 over _really_ long distances
    Message-ID:
 <CAHwPathy=4z=jDXN5qa3ys+Z-_7n=tsJh7cZ3ZKzFwQMG34zwg at mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
 
    Mostly curious, don't have experience in such environments, but ... Is this
    AFM over NFS or NSD protocol? Might be interesting to try the other option
    -- and also check how nsdperf performs over such distance/latency.
 
 
 
    -jf
 
    ------------------------------
 
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
 
 
    End of gpfsug-discuss Digest, Vol 58, Issue 12
    **********************************************
 

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



