[gpfsug-discuss] Online data migration tool

Alex Chekholko alex at calicolabs.com
Fri Dec 15 20:48:16 GMT 2017


Hey Aaron,

Can you define your sizes for "large blocks" and "small files"? If you
dial one up and the other down, your performance will suffer. In any
case it's a pathological corner case, so it shouldn't matter much for
your workflow unless you've designed your system with the wrong values.

For example, for bioinformatics workloads, I prefer a 256KB filesystem
block size, and I'd consider 4MB+ to be a "large block size", which
would make the filesystem obviously unsuitable for processing millions
of 8KB files.

You can make a histogram of file sizes in your existing filesystems and
then choose your subblock size (1/32 of the block size) to sit at the
smaller end of that distribution. Also definitely use the "small file
in inode" feature and put your metadata on SSD.
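
A quick way to get that histogram is a tree walk that buckets sizes by
powers of two; a minimal sketch (for a big filesystem you'd more likely
drive this from an mmapplypolicy LIST rule, since a full walk can take
days):

    #!/usr/bin/env python3
    # Usage: histo.py /path/to/tree
    # Histogram of file sizes bucketed by powers of two, to guide the
    # choice of block size (and hence subblock size).
    import collections, os, sys

    buckets = collections.Counter()
    for root, dirs, files in os.walk(sys.argv[1]):
        for name in files:
            try:
                size = os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or unreadable; skip it
            buckets[max(size, 1).bit_length()] += 1

    for b in sorted(buckets):
        print(f"< {2 ** b:>13} bytes: {buckets[b]}")

On the inode side: with 4K inodes, data for files up to a few KB
(roughly 3.5KB, less any extended attributes) is stored in the inode
itself, so those files consume no data subblocks at all.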

Regards,
Alex

On Fri, Dec 15, 2017 at 11:49 AM, Aaron Knister <aaron.s.knister at nasa.gov>
wrote:

> Thanks, Bill.
>
> I still don't feel like I've got a clear answer from IBM, and frankly
> the core issue of a lack of migration tool was totally dodged.
>
> Again, in Sven's presentation from SSUG @ SC17
> (http://files.gpfsug.org/presentations/2017/SC17/SC17-UG-CORAL_V3.pdf)
> he mentions "It has a significant performance penalty for small files in
> large block size filesystems" and then demonstrates that with several
> mdtest runs (which show the effect with and without the >32 subblocks
> code):
>
>
> 4.2.1 base code - SUMMARY: (of 3 iterations)
> File creation : Mean = 2237.644
>
> zero-end-of-file-padding (4.2.2 + ifdef for zero padding):  SUMMARY: (of
> 3 iterations)
> File creation : Mean = 12866.842
>
> more sub blocks per block (4.2.2 + morethan32subblock code):
> File creation : Mean = 40316.721
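>
> (Relative to the 4.2.1 base, that is roughly a 5.7x improvement from
> zero-end-of-file-padding alone and about an 18x improvement with the
> >32 subblocks code.)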
>
> Can someone (ideally Sven) give me a straight answer as to whether or
> not the > 32 subblock code actually makes a performance difference for
> small files in large block filesystems? And if not, help me understand
> why his slides and provided benchmark data have consistently indicated
> it does?
>
> -Aaron
>
> On 12/1/17 11:44 AM, Bill Hartner wrote:
> > ESS GL4 4u106 w/ 10 TB drives - the same HW on which Sven reported
> > some of the results @ the user group meeting.
> >
> > -Bill
> >
> > Bill Hartner
> > IBM Systems
> > Scalable I/O Development
> > Austin, Texas
> > bhartner at us.ibm.com
> > home office 512-784-0980
> >
> >
> >
> > From: Jan-Frode Myklebust <janfrode at tanso.net>
> > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > Date: 12/01/2017 06:53 AM
> > Subject: Re: [gpfsug-discuss] Online data migration tool
> > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >
> > ------------------------------------------------------------------------
> >
> >
> >
> > Bill, could you say something about what the metadata-storage here was?
> > ESS/NL-SAS/3way replication?
> >
> > I just asked about this in the internal slack channel #scale-help today.
> >
> >
> >
> > -jf
> >
> > On Fri, Dec 1, 2017 at 13:44, Bill Hartner <bhartner at us.ibm.com> wrote:
> >
> >     > "It has a significant performance penalty for small files in large
> >     > block size filesystems"
> >
> >     Aaron,
> >
> >     Below are mdtest results for a test we ran for CORAL - file size was
> >     32k.
> >
> >     We have not gone back and rerun the test on a file system formatted
> >     without >32 subblocks. We'll do that at some point...
> >
> >     -Bill
> >
> >     -- started at 10/28/2017 17:51:38 --
> >
> >     mdtest-1.9.3 was launched with 228 total task(s) on 12 node(s)
> >     Command line used: /tmp/mdtest-binary-dir/mdtest -d
> >     /ibm/fs2-16m-10/mdtest-60000 -i 3 -n 294912 -w 32768 -C -F -r -p 360
> >     -u -y
> >     Path: /ibm/fs2-16m-10
> >     FS: 128.1 TiB Used FS: 0.3% Inodes: 476.8 Mi Used Inodes: 0.0%
> >
> >     228 tasks, 67239936 files
> >
> >     SUMMARY: (of 3 iterations)
> >     Operation       Max        Min        Mean       Std Dev
> >     ---------       ---        ---        ----       -------
> >     File creation : 51953.498  50558.517  51423.221  616.643
> >     File stat     : 0.000      0.000      0.000      0.000
> >     File read     : 0.000      0.000      0.000      0.000
> >     File removal  : 96746.376  92149.535  94658.774  1900.187
> >     Tree creation : 1.588      0.070      0.599      0.700
> >     Tree removal  : 0.213      0.034      0.097      0.082
> >
> >     -- finished at 10/28/2017 19:51:54 --
> >
> >     Bill Hartner
> >     IBM Systems
> >     Scalable I/O Development
> >     Austin, Texas
> >     bhartner at us.ibm.com
> >     home office 512-784-0980
> >
> >     gpfsug-discuss-bounces at spectrumscale.org wrote on
> >     11/29/2017 04:41:48 PM:
> >
> >     > From: Aaron Knister <aaron.knister at gmail.com>
> >     > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> >     > Date: 11/29/2017 04:42 PM
> >     > Subject: Re: [gpfsug-discuss] Online data migration tool
> >     > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >
> >     >
> >
> >     > Thanks, Nikhil. Most of that was consistent with my understanding;
> >     > however, I was under the impression that the >32 subblocks code is
> >     > required to achieve the touted 50k file creates/second that Sven has
> >     > talked about a bunch of times:
> >     >
> >     > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf
> >     > http://files.gpfsug.org/presentations/2017/Ehningen/31_-_SSUG17DE_-_Sven_Oehme_-_News_from_Research.pdf
> >     > http://files.gpfsug.org/presentations/2016/SC16/12_-_Sven_Oehme_Dean_Hildebrand_-_News_from_IBM_Research.pdf
> >
> >
> >     > from those presentations regarding 32 subblocks:
> >     >
> >     > "It has a significant performance penalty for small files in large
> >     > block size filesystems"
> >
> >     > although I'm not clear on the specific definition of "large". Many
> >     > filesystems I encounter only have a 1M block size, so it may not
> >     > matter there, although that same presentation clearly shows the
> >     > benefit of larger block sizes, which is yet *another* thing for
> >     > which a migration tool would be helpful.
> >
> >     > -Aaron
> >     >
> >     > On Wed, Nov 29, 2017 at 2:08 PM, Nikhil Khandelwal
> >     > <nikhilk at us.ibm.com> wrote:
> >
> >     > Hi,
> >     >
> >     > I would like to clarify the migration path to 5.0.0 from 4.X.X
> >     > clusters.
> >     > For all Spectrum Scale clusters that are currently at 4.X.X, it is
> >     > possible to migrate to 5.0.0 with no offline data migration and no
> >     > need to move data. Once these clusters are at 5.0.0, they will
> >     > benefit from the performance improvements, new features (such as
> >     > file audit logging), and various enhancements that are included in
> >     5.0.0.
> >     >
> >     > That being said, there is one enhancement that will not be applied
> >     > to these clusters, and that is the increased number of sub-blocks
> >     > per block for small file allocation. This means that for file
> >     > systems with a large block size and a lot of small files, the
> >     > overall space utilization will be the same as it currently is in
> >     > 4.X.X.
> >     > Since file systems created at 4.X.X and earlier used a block size
> >     > that kept this allocation in mind, there should be very little
> >     > impact on existing file systems.
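> >     >
> >     > As a concrete illustration (assuming the subblock sizes documented
> >     > for 5.0.0): on a 4 MiB block size file system, the 4.X.X layout
> >     > allocates in 128 KiB subblocks (1/32 of a block), so a 32 KiB file
> >     > consumes a full 128 KiB. A file system created at 5.0.0 with the
> >     > same block size allocates in 8 KiB subblocks (512 per block), so
> >     > the same file consumes exactly 32 KiB.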
> >     >
> >     > Outside of that one particular function, the remainder of the
> >     > performance improvements, metadata improvements, updated
> >     > compatibility, new functionality, and all of the other enhancements
> >     > will be immediately available to you once you complete the upgrade
> >     > to 5.0.0 -- with no need to reformat, move data, or take your data
> >     offline.
> >     >
> >     > I hope that clarifies things a little and makes the upgrade path
> >     > more accessible.
> >     >
> >     > Please let me know if there are any other questions or concerns.
> >     >
> >     > Thank you,
> >     > Nikhil Khandelwal
> >     > Spectrum Scale Development
> >     > Client Adoption
> >     >
> >     > _______________________________________________
> >     > gpfsug-discuss mailing list
> >     > gpfsug-discuss at spectrumscale.org
> >     > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
> >
> >     _______________________________________________
> >     gpfsug-discuss mailing list
> >     gpfsug-discuss at spectrumscale.org
> >     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
> >
> >
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>