<tt><font size=2>Which version of Spectrum Scale? What is the fileset
mode?</font></tt><br><br><tt><font size=2>>We use AFM prefetch to migrate data between two


clusters (using NFS). This works fine with large files, say 1+GB. But we


have millions of smaller files, about 1MB each. Here I see just


~150MB/s – compare this to the 1000+MB/s we get for larger files.<br><br>How was the performance measured? If parallel I/O is enabled, AFM uses
multiple gateway nodes to prefetch large files (if the file size is more
than 1GB). The performance difference between small and large files is huge
(1000MB/s - 150MB/s = 850MB/s) here, which is generally not the case. How many


files were present in the list file for prefetch? Could you also share the full
internaldump from the gateway node?<br><br>>I assume that we would need more parallelism, does prefetch pull just


one file at a time? So each file needs some or many metadata operations
plus a single or just a few reads and writes. Doing this sequentially


adds up all the latencies of NFS+GPFS. This is my explanation. With larger


files gpfs prefetch on home will help.<br></font></tt><br><tt><font size=2>AFM prefetches files on multiple threads. By default,
prefetch uses 36 flush threads: the fileset's afmNumFlushThreads (default
4) plus afmNumIOFlushThreads (default 32). <br><br>>Please can anybody comment: Is this right, does AFM prefetch handle


one file at a time in a sequential manner? And is there any way to change


this behavior? Or am I wrong and I need to look elsewhere to get better


performance for prefetch of many smaller files?<br></font></tt><br><font size=2 face="sans-serif">See above, AFM reads files on multiple


threads in parallel. Try increasing afmNumFlushThreads on the fileset


and verify if it improves the performance.</font><br><br><font size=2 face="sans-serif">~Venkat (vpuvvada@in.ibm.com)</font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      


 </font><font size=1 face="sans-serif">"Billich Heinrich


Rainer (PSI)" <heiner.billich@psi.ch></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      


 </font><font size=1 face="sans-serif">"gpfsug-discuss@spectrumscale.org"


<gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      


 </font><font size=1 face="sans-serif">09/04/2017 10:18 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    


   </font><font size=1 face="sans-serif">[gpfsug-discuss]


Use AFM for migration of many small files</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    


   </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><tt><font size=2>Hello,<br><br><br><br>We use AFM prefetch to migrate data between two clusters (using NFS). This


works fine with large files, say 1+GB. But we have millions of smaller


files,  about 1MB each. Here I see just ~150MB/s – compare this to


the 1000+MB/s we get for larger files.<br><br><br><br>I assume that we would need more parallelism, does prefetch pull just one


file at a time? So each file needs  some or many metadata operations


plus a single  or just a few read and writes. Doing this sequentially


adds up all the latencies of NFS+GPFS. This is my explanation. With larger


files gpfs prefetch on home will help.<br><br><br><br>Please can anybody comment: Is this right, does AFM prefetch handle one


file at a time in a sequential manner? And is there any way to change this


behavior? Or am I wrong and I need to look elsewhere to get better performance


for prefetch of many smaller files?<br><br><br><br>We will migrate several filesets in parallel, but still with individual


filesets up to 350TB in size, 150MB/s isn’t fun. Also, just about 150 files/s
looks poor.<br><br><br><br>The setup is quite new, hence there may be other places to look at. <br><br>It’s all RHEL7 and Spectrum Scale 4.2.2-3 on the AFM cache.<br><br><br><br>Thank you,<br><br><br><br>Heiner<br><br>--<br><br>Paul Scherrer Institut<br><br>Science IT<br><br>Heiner Billich<br><br>WHGA 106<br><br>CH 5232  Villigen PSI<br><br>056 310 36 02<br><br></font></tt><a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__www.psi.ch&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=eHcVdovN10-m-Qk0Ln2qvol3pkKNFwrzz2wgf1zXVXE&e="><tt><font size=2>https://urldefense.proofpoint.com/v2/url?u=https-3A__www.psi.ch&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=eHcVdovN10-m-Qk0Ln2qvol3pkKNFwrzz2wgf1zXVXE&e=</font></tt></a><br>