[gpfsug-discuss] Achieving high parallelism with AFM using NFS?

Mon Nov 14 09:59:26 GMT 2016

Hello Jake,

You will have to set the mapping to include all the gateways (GWs) that you 
want to involve in the transfer. Please refer to the example provided in the 
Knowledge Centre:
http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1ins_paralleldatatransfersafm.htm
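
For illustration only, here is a minimal sketch of what a single mapping that 
includes more than one gateway might look like. It assumes the 
`mmafmconfig add MapName --export-map server1/gateway1,server2/gateway2` form 
described on the Knowledge Centre page above. The map name `omnipath-all` and 
the placeholders HOME_NFS_1 and HOME_NFS_2 are hypothetical, and the script 
only prints the command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: build one export map that pairs each home NFS server
# with a cache gateway, then print (not run) the mmafmconfig command.
# The map name and home server names are placeholders, not values from
# this thread.

# Join "homeServer/gatewayNode" pairs with commas.
build_export_map() {
    map=""
    for pair in "$@"; do
        if [ -z "$map" ]; then
            map="$pair"
        else
            map="$map,$pair"
        fi
    done
    printf '%s\n' "$map"
}

EXPORT_MAP=$(build_export_map "HOME_NFS_1/mc-6" "HOME_NFS_2/mc-7")
echo "mmafmconfig add omnipath-all --export-map $EXPORT_MAP"
```

Note that in the quoted output below, the fileset target names the map 
omnipath2, and that map pairs a home address with mc-7 only, which would be 
consistent with mc-7 being the only gateway that carries traffic.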

Thanks and Regards
Radhika

Message: 1
Date: Sun, 13 Nov 2016 14:18:38 +0000
From: Jake Carroll <jake.carroll at uq.edu.au>
To: "gpfsug-discuss at spectrumscale.org"
                 <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Achieving high parallelism with AFM using
                 NFS?
Message-ID: <025F8914-F7A0-465F-9B99-961F70DA2B03 at uq.edu.au>
Content-Type: text/plain; charset="utf-8"

Hi all.

After some help from IBM, we've concluded (and been told) that AFM over 
the NSD protocol when latency is greater than around 50ms on the RTT is 
effectively unusable. We've proven that now, so it is time to move on from 
the NSD protocol being an effective option in those conditions (unless IBM 
can consider it something worthy of an RFE and can fix it!).
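
As a rough sense of scale (an illustration, not IBM's stated explanation for 
the NSD behaviour), the bandwidth-delay product shows how much data has to be 
in flight to keep a link full at a given round-trip time. The sketch below 
uses the 10GbE and 50ms figures from this message.

```shell
#!/bin/sh
# Bandwidth-delay product: bytes that must be in flight to keep a link full.
# 10 Gbit/s and 50 ms are the figures mentioned in this message.
LINK_BITS_PER_SEC=10000000000
RTT_MS=50

# bits/s -> bytes/s, then multiply by the RTT expressed in seconds.
BDP_BYTES=$(( LINK_BITS_PER_SEC / 8 * RTT_MS / 1000 ))
echo "$BDP_BYTES bytes in flight (~$(( BDP_BYTES / 1000000 )) MB)"
```

That works out to roughly 62.5 MB outstanding at any moment, which a single 
stream with modest buffers is unlikely to sustain.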

The problem we face now is one of parallelism: filling that 
10GbE/40GbE/100GbE pipe efficiently when using NFS as the transport 
provider for AFM.

On my test cluster at the "cache" side I've got two or three gateways:

[root@mc-5 ~]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         sdx-gpfs.xxxxxxxxxxxxxxxx
  GPFS cluster id:           12880500218013865782
  GPFS UID domain:           sdx-gpfs.xxxxxxxxxxxxxxxx
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name           IP address           Admin node name  Designation
----------------------------------------------------------------------------------------------
   1   mc-5.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-5.hidden.net  quorum-manager
   2   mc-6.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-6.hidden.net  quorum-manager-gateway
   3   mc-7.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-7.hidden.net  quorum-manager-gateway
   4   mc-8.xxxxxxxxxxxxxxxx.net  ip.addresses.hidden  mc-8.hidden.net  quorum-manager-gateway

The bit I really don't get is:

1. Why no traffic ever seems to go through mc-6 or mc-8 back to my "home" 
   directly, and

2. Why it only ever lists my AFM-cache fileset being associated with one 
   gateway (mc-7).

I can see traffic flowing through mc-6 sometimes, but when it does, it all 
seems to channel back through mc-7 THEN back to the AFM-home. Am I missing 
something?

This is where I see one of the gateways listed (but never the others):

[root@mc-5 ~]# mmafmctl afmcachefs getstate
Fileset Name    Fileset Target                                Cache State    Gateway Node    Queue Length   Queue numExec
------------    --------------                                -------------  ------------    ------------   -------------
afm-home        nfs://omnipath2/gpfs-flash/afm-home           Active         mc-7            0              746636

I got told I needed to set up "explicit maps" back to my home cluster to 
achieve parallelism:

[root@mc-5 ~]# mmafmconfig show
Map name:             omnipath1
Export server map:    address.is.hidden.100/mc-6.ip.address.hidden

Map name:             omnipath2
Export server map:    address.is.hidden.101/mc-7.ip.address.hidden

But I have never seen any traffic come back from mc-6 to omnipath1.

What am I missing, and how do I actually achieve significant enough 
parallelism over an NFS transport to fill my 10GbE pipe?

I've seen maybe a couple of gigabits per second from the mc-7 host writing 
back to the omnipath2 host, and that was really trying my level best to 
put as many files onto the afm-cache at this side and hoping that enough 
threads pick up enough different files to start transferring files down 
the AFM simultaneously. What I'd really like is for those large files (or 
small, up to the thresholds set) to break into parallel chunks and ALL 
transfer as fast as possible, utilising as much of the 10GbE as they can.
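
On splitting large files into parallel chunks: the AFM documentation 
describes per-fileset parallel I/O tunables that work together with the 
gateway mapping. The following is a dry-run sketch only. It assumes the 
attribute names afmParallelWriteThreshold (taken to be in MB) and 
afmParallelWriteChunkSize (taken to be in bytes) and the 
`mmchfileset Device Fileset -p Attribute=Value` form. The values are 
illustrative rather than recommendations, and names and units should be 
checked against the documentation for the installed release.

```shell
#!/bin/sh
# Dry run: print (do not execute) commands that would set the size at which
# AFM splits a file write into chunks. All values are illustrative.

FS=afmcachefs        # file system name, as used with mmafmctl in this message
FILESET=afm-home     # fileset name from the getstate output
THRESHOLD_MB=1024    # assumed unit: MB; files at or above this would be split
CHUNK_MIB=128        # hypothetical chunk size, converted to bytes below

# Convert MiB to bytes for attributes that are specified in bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

CHUNK_BYTES=$(mib_to_bytes "$CHUNK_MIB")
echo "mmchfileset $FS $FILESET -p afmParallelWriteThreshold=$THRESHOLD_MB"
echo "mmchfileset $FS $FILESET -p afmParallelWriteChunkSize=$CHUNK_BYTES"
```

Printing the commands first makes it easy to review them before applying 
anything to a live fileset.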

Maybe I am missing fundamental principles in the way AFM works?

Thanks.

-jc
