<font size=3>>The plan is to load the new cache from the old GPFS then

dump once the cache is full.</font><br><br><font size=3>>We've already increased afmNumFlushThreads from 4 to

8 and seen only marginal improvements, we could attempt to increase this

further.</font><br><br><br><font size=3>AFM has replication performance issues with small files

on high-latency networks. There is a plan to fix these issues.</font><br><br><font size=3>>I'm also wondering if it's worth increasing the Refresh

Intervals to speed up reads of already cached files. At this stage we want

to fill the cache and don't care about write back until we switch the target

to the new NFS/GPFS from our old GPFS storage to a new box back at

our off-site location, (otherwise known as the office)</font><br><br><font size=2 face="sans-serif">Increasing the refresh intervals will

improve the application performance at the cache site. It is better to set

large refresh intervals if the cache is the only writer.</font><br><br><font size=2 face="sans-serif">~Venkat (vpuvvada@in.ibm.com)</font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Peter Childs <p.childs@qmul.ac.uk></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">"gpfsug-discuss@spectrumscale.org"

<gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">01/04/2018 04:47 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [gpfsug-discuss]

Use AFM for migration of many small files</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><font size=3>We are doing something very similar using 4.2.3-4 and

GPFS 4.2.1-1 on the NFS backend. Did you have any success?</font><br><br><font size=3>The plan is to load the new cache from the old GPFS then

dump once the cache is full.</font><br><br><font size=3>We've already increased afmNumFlushThreads from 4 to 8 and

seen only marginal improvements, we could attempt to increase this further.</font><br><br><font size=3>I'm also wondering if it's worth increasing the Refresh

Intervals to speed up reads of already cached files. At this stage we want

to fill the cache and don't care about write back until we switch the target

to the new NFS/GPFS from our old GPFS storage to a new box back at our

off-site location, (otherwise known as the office)</font><br><br><font size=3>[root@afmgateway1 scratch]# mmlsfileset home home -L --afm</font><br><font size=3>Filesets in file system 'home':</font><br><br><font size=3>Attributes for fileset home:</font><br><font size=3>=============================</font><br>
<font size=3>Status                            Linked</font><br>
<font size=3>Path                              /data2/home</font><br>
<font size=3>Id                                42</font><br>
<font size=3>Root inode                        1343225859</font><br>
<font size=3>Parent Id                         0</font><br>
<font size=3>Created                           Wed Jan  3 12:32:33 2018</font><br>
<font size=3>Comment                           </font><br>
<font size=3>Inode space                       41</font><br>
<font size=3>Maximum number of inodes          100000000</font><br>
<font size=3>Allocated inodes                  15468544</font><br>
<font size=3>Permission change flag            chmodAndSetacl</font><br>
<font size=3>afm-associated                    Yes</font><br>
<font size=3>Target                            nfs://afm1/gpfs/data1/afm/home</font><br>
<font size=3>Mode                              single-writer</font><br>
<font size=3>File Lookup Refresh Interval      30 (default)</font><br>
<font size=3>File Open Refresh Interval        30 (default)</font><br>
<font size=3>Dir Lookup Refresh Interval       60 (default)</font><br>
<font size=3>Dir Open Refresh Interval         60 (default)</font><br>
<font size=3>Async Delay                       15 (default)</font><br>
<font size=3>Last pSnapId                      0</font><br>
<font size=3>Display Home Snapshots            no</font><br>
<font size=3>Number of Gateway Flush Threads   8</font><br>
<font size=3>Prefetch Threshold                0 (default)</font><br>
<font size=3>Eviction Enabled                  no</font><br><br><font size=3>Thanks in advance.</font><br><br><font size=3>Peter Childs</font><br><br><br><br><font size=3>On Tue, 2017-09-05 at 19:57 +0530, Venkateswara R Puvvada

wrote:</font><br><tt><font size=2>Which version of Spectrum Scale? What is the fileset

mode?</font></tt><font size=3><br></font><tt><font size=2><br>>We use AFM prefetch to migrate data between two clusters (using NFS).

This works fine with large files, say 1+GB. But we have millions of smaller

files, about 1MB each. Here I see just ~150MB/s – compare this

to the 1000+MB/s we get for larger files.<br><br>How was the performance measured? If parallel IO is enabled, AFM uses

multiple gateway nodes to prefetch the large files (if the file size is more

than 1GB). The performance difference between small and large files is huge

(1000MB/s - 150MB/s = 850MB/s) here, which is generally not the case. How many

files were present in the list file for prefetch? Could you also share a full

internaldump from the gateway node? <br><br>>I assume that we would need more parallelism, does prefetch pull just

one file at a time? So each file needs  some or many metadata operations

plus a single or just a few reads and writes. Doing this sequentially

adds up all the latencies of NFS+GPFS. This is my explanation. With larger

files gpfs prefetch on home will help.</font></tt><font size=3><br></font><tt><font size=2><br>AFM prefetches the files on multiple threads. The default number of flush threads for

prefetch is 36 (fileset.afmNumFlushThreads (default 4) + afmNumIOFlushThreads

(default 32)). <br><br>>Please can anybody comment: Is this right, does AFM prefetch handle

one file at a time in a sequential manner? And is there any way to change

this behavior? Or am I wrong and I need to look elsewhere to get better

performance for prefetch of many smaller files?</font></tt><font size=3><br></font><font size=2 face="sans-serif"><br>See above, AFM reads files on multiple threads in parallel.  Try increasing

the afmNumFlushThreads on the fileset and verify whether it improves the performance.</font><font size=3><br></font><font size=2 face="sans-serif"><br>~Venkat (vpuvvada@in.ibm.com)</font><font size=3><br><br><br></font><font size=1 color=#5f5f5f face="sans-serif"><br>From:        </font><font size=1 face="sans-serif">"Billich

Heinrich Rainer (PSI)" <heiner.billich@psi.ch></font><font size=1 color=#5f5f5f face="sans-serif"><br>To:        </font><font size=1 face="sans-serif">"gpfsug-discuss@spectrumscale.org"

<gpfsug-discuss@spectrumscale.org></font><font size=1 color=#5f5f5f face="sans-serif"><br>Date:        </font><font size=1 face="sans-serif">09/04/2017

10:18 PM</font><font size=1 color=#5f5f5f face="sans-serif"><br>Subject:        </font><font size=1 face="sans-serif">[gpfsug-discuss]

Use AFM for migration of many small files</font><font size=1 color=#5f5f5f face="sans-serif"><br>Sent by:        </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><font size=3><br><br><br></font><tt><font size=2><br>Hello,<br><br><br><br>We use AFM prefetch to migrate data between two clusters (using NFS). This

works fine with large files, say 1+GB. But we have millions of smaller

files,  about 1MB each. Here I see just ~150MB/s – compare this to

the 1000+MB/s we get for larger files.<br><br><br><br>I assume that we would need more parallelism, does prefetch pull just one

file at a time? So each file needs  some or many metadata operations

plus a single or just a few reads and writes. Doing this sequentially

adds up all the latencies of NFS+GPFS. This is my explanation. With larger

files gpfs prefetch on home will help.<br><br><br><br>Please can anybody comment: Is this right, does AFM prefetch handle one

file at a time in a sequential manner? And is there any way to change this

behavior? Or am I wrong and I need to look elsewhere to get better performance

for prefetch of many smaller files?<br><br><br><br>We will migrate several filesets in parallel, but still with individual

filesets up to 350TB in size, 150MB/s isn’t fun. Also, just about 150 files/s

looks poor.<br><br><br><br>The setup is quite new, hence there may be other places to look at. <br><br>It’s all RHEL7 and Spectrum Scale 4.2.2-3 on the AFM cache.<br><br><br><br>Thank you,<br><br><br><br>Heiner<br><br>--,<br><br>Paul Scherrer Institut<br><br>Science IT<br><br>Heiner Billich<br><br>WHGA 106<br><br>CH 5232  Villigen PSI<br><br>056 310 36 02<br></font></tt><font size=3 color=blue><u><br></u></font><a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__www.psi.ch&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=eHcVdovN10-m-Qk0Ln2qvol3pkKNFwrzz2wgf1zXVXE&e="><tt><font size=2 color=blue><u>https://urldefense.proofpoint.com/v2/url?u=https-3A__www.psi.ch&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=eHcVdovN10-m-Qk0Ln2qvol3pkKNFwrzz2wgf1zXVXE&e=</u></font></tt></a><tt><font size=2><br><br><br><br>   <br><br><br><br>_______________________________________________<br>gpfsug-discuss mailing list<br>gpfsug-discuss at spectrumscale.org</font></tt><font size=3 color=blue><u><br></u></font><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=LbRyuSM_djs0FDXr27hPottQHAn3OGcivpyRcIDBN3U&e="><tt><font size=2 color=blue><u>https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=LbRyuSM_djs0FDXr27hPottQHAn3OGcivpyRcIDBN3U&e=</u></font></tt></a><tt><font size=2><br></font></tt><font size=3><br><br><br></font><br>
<tt><font size=3>-- </font></tt><br><font size=3>Peter Childs</font><br><font size=3>ITS Research Storage</font><br><font size=3>Queen Mary, University of London</font><br><br><br><BR>
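<font size=3>To put the figures quoted in this thread into perspective, here is a rough back-of-envelope sketch in Python. It only uses numbers from the messages above (350TB filesets, ~150MB/s versus 1000+MB/s, ~150 files/s, 36 default flush threads). Two things are assumptions rather than facts from the thread: decimal units (1TB = 1,000,000MB), and that all 36 threads are kept busy the whole time. The function names are made up for illustration.</font><br><br>

```python
# Back-of-envelope estimate based on figures quoted in the thread.
# Assumptions (NOT from the thread): decimal units (1 TB = 1,000,000 MB)
# and every flush thread being busy for the whole transfer.

SECONDS_PER_DAY = 86_400


def migration_days(fileset_tb, rate_mb_per_s):
    """Days needed to copy fileset_tb terabytes at rate_mb_per_s megabytes/s."""
    total_mb = fileset_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal units)
    return total_mb / rate_mb_per_s / SECONDS_PER_DAY


def per_file_latency_s(threads, files_per_s):
    """Average time one thread spends on one file.

    Little's law: concurrency = throughput * latency,
    so latency = concurrency / throughput.
    """
    return threads / files_per_s


if __name__ == "__main__":
    print(f"350 TB at  150 MB/s: {migration_days(350, 150):.1f} days")
    print(f"350 TB at 1000 MB/s: {migration_days(350, 1000):.1f} days")
    latency_ms = per_file_latency_s(36, 150) * 1000
    print(f"36 threads at 150 files/s: {latency_ms:.0f} ms per file")
```

<font size=3>Under these assumptions a single 350TB fileset takes about 27 days at 150MB/s and about 4 days at 1000MB/s. If all 36 threads really were busy, 150 files/s would correspond to roughly 240 ms spent per 1MB file, which is consistent with per-file latency, rather than bandwidth, being the limit for small files.</font><br>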