[gpfsug-discuss] Use AFM for migration of many small files
Venkateswara R Puvvada
vpuvvada at in.ibm.com
Tue Sep 5 15:27:21 BST 2017
Which version of Spectrum Scale? What is the fileset mode?
>We use AFM prefetch to migrate data between two clusters (using NFS).
>This works fine with large files, say 1+GB. But we have millions of
>smaller files, about 1MB each. Here I see just ~150MB/s – compare this
>to the 1000+MB/s we get for larger files.
How was the performance measured? If parallel IO is enabled, AFM uses
multiple gateway nodes to prefetch large files (if the file size is more
than 1GB). The performance difference between small and large files here
is huge (1000+MB/s vs. ~150MB/s), and generally that is not the case. How
many files were present in the list file for prefetch? Could you also
share a full internaldump from the gateway node?
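For reference, the prefetch run and the dump collection look roughly like
the sketch below. fs1 and fileset1 are placeholder names, and mmfsadm is a
low-level service command, so please verify the exact syntax for your
release first:

  # run prefetch against a list of files to migrate
  mmafmctl fs1 prefetch -j fileset1 --list-file /tmp/migrate.list
  # collect an internaldump on the gateway node
  mmfsadm dump all > /tmp/gateway.internaldump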
>I assume that we would need more parallelism; does prefetch pull just one
>file at a time? So each file needs some or many metadata operations plus
>a single or just a few reads and writes. Doing this sequentially adds up
>all the latencies of NFS+GPFS. This is my explanation. With larger files
>gpfs prefetch on home will help.
AFM prefetches files on multiple threads. The default number of flush
threads for prefetch is 36: the per-fileset afmNumFlushThreads (default 4)
plus afmNumIOFlushThreads (default 32).
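A rough back-of-envelope (assuming the default 36 threads above) suggests
your workload is latency-bound rather than bandwidth-bound:

  # ~150 files/s spread across ~36 threads:
  # 150 / 36 ~= 4.2 files/s per thread => ~240 ms per 1MB file
  # most of that time is per-file NFS+GPFS round trips, not data transfer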
>Please can anybody comment: Is this right, does AFM prefetch handle one
>file at a time in a sequential manner? And is there any way to change this
>behavior? Or am I wrong and I need to look elsewhere to get better
>performance for prefetch of many smaller files?
See above: AFM reads files on multiple threads in parallel. Try increasing
afmNumFlushThreads on the fileset and verify whether it improves
performance.
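A minimal sketch of that tuning step. The fileset names and the thread
value are illustrative, and on some releases AFM attributes can only be
changed while the fileset is unlinked, so check the mmchfileset
documentation for your level:

  # inspect the current AFM attributes of the cache fileset
  mmlsfileset fs1 fileset1 --afm -L
  # raise the per-fileset flush thread count (example value)
  mmchfileset fs1 fileset1 -p afmNumFlushThreads=16
  # then re-run the prefetch and compare files/s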
~Venkat (vpuvvada at in.ibm.com)
From: "Billich Heinrich Rainer (PSI)" <heiner.billich at psi.ch>
To: "gpfsug-discuss at spectrumscale.org"
<gpfsug-discuss at spectrumscale.org>
Date: 09/04/2017 10:18 PM
Subject: [gpfsug-discuss] Use AFM for migration of many small files
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Hello,
We use AFM prefetch to migrate data between two clusters (using NFS). This
works fine with large files, say 1+GB. But we have millions of smaller
files, about 1MB each. Here I see just ~150MB/s – compare this to the
1000+MB/s we get for larger files.
I assume that we would need more parallelism; does prefetch pull just one
file at a time? So each file needs some or many metadata operations plus
a single or just a few reads and writes. Doing this sequentially adds up
all the latencies of NFS+GPFS. This is my explanation. With larger files
gpfs prefetch on home will help.
Please can anybody comment: Is this right, does AFM prefetch handle one
file at a time in a sequential manner? And is there any way to change this
behavior? Or am I wrong and I need to look elsewhere to get better
performance for prefetch of many smaller files?
We will migrate several filesets in parallel, but with individual filesets
up to 350TB in size, 150MB/s isn’t fun. Also, just about 150 files/s looks
poor.
The setup is quite new, hence there may be other places to look at.
It’s all RHEL7 and Spectrum Scale 4.2.2-3 on the AFM cache.
Thank you,
Heiner
--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232 Villigen PSI
056 310 36 02
https://www.psi.ch
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss