[gpfsug-discuss] Online data migration tool

Aaron Knister aaron.s.knister at nasa.gov
Fri Dec 15 19:49:59 GMT 2017


Thanks, Bill.

I still don't feel like I've got a clear answer from IBM, and frankly
the core issue, the lack of a migration tool, was totally dodged.

Again, in Sven's presentation from SSUG @ SC17
(http://files.gpfsug.org/presentations/2017/SC17/SC17-UG-CORAL_V3.pdf)
he mentions "It has a significant performance penalty for small files in
large block size filesystems" and then demonstrates that with several
mdtest runs (which show the effect with and without the >32 subblocks code):


4.2.1 base code - SUMMARY: (of 3 iterations)
File creation : Mean = 2237.644

zero-end-of-file-padding (4.2.2 + ifdef for zero padding): SUMMARY: (of 3 iterations)
File creation : Mean = 12866.842

more sub blocks per block (4.2.2 + morethan32subblock code):
File creation : Mean = 40316.721

Can someone (ideally Sven) give me a straight answer as to whether or
not the >32 subblock code actually makes a performance difference for
small files in large block filesystems? And if it doesn't, help me
understand why his slides and provided benchmark data have consistently
indicated it does?
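
For reference, here's the back-of-the-envelope arithmetic as I
understand it (my numbers; I'm assuming a 16 MiB block size and 1024
subblocks per block for the new code, and the real geometry may differ):

  # Space a 32 KiB file (the file size from Bill's runs below) occupies
  # when files are allocated in whole subblocks. Illustrative sketch only.
  BLOCK = 16 * 1024 * 1024
  FILE_SIZE = 32 * 1024

  for subblocks_per_block in (32, 1024):
      subblock = BLOCK // subblocks_per_block
      allocated = -(-FILE_SIZE // subblock) * subblock  # round up to whole subblocks
      print(f"{subblocks_per_block:4d} subblocks of {subblock // 1024} KiB: "
            f"a 32 KiB file occupies {allocated // 1024} KiB")

  #   32 subblocks of 512 KiB: a 32 KiB file occupies 512 KiB
  # 1024 subblocks of 16 KiB: a 32 KiB file occupies 32 KiB

If that arithmetic holds, a 16x difference in allocated (and padded)
space per small file would be consistent with the large file-create
speedups in the numbers above.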

-Aaron

On 12/1/17 11:44 AM, Bill Hartner wrote:
> ESS GL4 4u106 w/ 10 TB drives - the same HW on which Sven reported some
> of the results @ the user group meeting.
> 
> -Bill
> 
> Bill Hartner
> IBM Systems
> Scalable I/O Development
> Austin, Texas
> bhartner at us.ibm.com
> home office 512-784-0980
> 
> 
> 
> From: Jan-Frode Myklebust <janfrode at tanso.net>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 12/01/2017 06:53 AM
> Subject: Re: [gpfsug-discuss] Online data migration tool
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> 
> ------------------------------------------------------------------------
> 
> 
> 
> Bill, could you say something about what the metadata-storage here was?
> ESS/NL-SAS/3way replication?
> 
> I just asked about this in the internal slack channel #scale-help today..
> 
> 
> 
> -jf
> 
> On Fri, Dec 1, 2017 at 13:44, Bill Hartner <bhartner at us.ibm.com> wrote:
> 
>     > "It has a significant performance penalty for small files in large
>     > block size filesystems"
> 
>     Aaron,
> 
>     Below are mdtest results for a test we ran for CORAL - file size was
>     32k.
> 
>     We have not gone back and run the test on a file system formatted
>     without >32 subblocks. We'll do that at some point...
> 
>     -Bill
> 
>     -- started at 10/28/2017 17:51:38 --
> 
>     mdtest-1.9.3 was launched with 228 total task(s) on 12 node(s)
>     Command line used: /tmp/mdtest-binary-dir/mdtest -d /ibm/fs2-16m-10/mdtest-60000 -i 3 -n 294912 -w 32768 -C -F -r -p 360 -u -y
>     Path: /ibm/fs2-16m-10
>     FS: 128.1 TiB Used FS: 0.3% Inodes: 476.8 Mi Used Inodes: 0.0%
> 
>     228 tasks, 67239936 files
> 
>     SUMMARY: (of 3 iterations)
>     Operation           Max        Min       Mean    Std Dev
>     ---------           ---        ---       ----    -------
>     File creation : 51953.498  50558.517  51423.221    616.643
>     File stat     :     0.000      0.000      0.000      0.000
>     File read     :     0.000      0.000      0.000      0.000
>     File removal  : 96746.376  92149.535  94658.774   1900.187
>     Tree creation :     1.588      0.070      0.599      0.700
>     Tree removal  :     0.213      0.034      0.097      0.082
> 
>     -- finished at 10/28/2017 19:51:54 --
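> 
>     A quick sanity check on the scale of that run (my arithmetic, just
>     restating the command line and summary above):
> 
>     tasks = 228                  # total mdtest tasks
>     files_per_task = 294912      # mdtest -n
>     write_bytes = 32768          # mdtest -w, 32 KiB per file
> 
>     total_files = tasks * files_per_task
>     print(f"{total_files:,} files")   # 67,239,936 (matches the line above)
>     print(f"{total_files * write_bytes / 2**40:.1f} TiB written")  # ~2.0 TiB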
> 
>     Bill Hartner
>     IBM Systems
>     Scalable I/O Development
>     Austin, Texas
>     bhartner at us.ibm.com
>     home office 512-784-0980
> 
>     gpfsug-discuss-bounces at spectrumscale.org wrote on 11/29/2017 04:41:48 PM:
> 
>     > From: Aaron Knister <aaron.knister at gmail.com>
>     > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>     > Date: 11/29/2017 04:42 PM
>     > Subject: Re: [gpfsug-discuss] Online data migration tool
>     > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> 
>     >
> 
>     > Thanks, Nikhil. Most of that was consistent with my understanding,
>     > however I was under the impression that the >32 subblocks code is
>     > required to achieve the touted 50k file creates/second that Sven has
>     > talked about a bunch of times:
>     >
>     > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf
>     > http://files.gpfsug.org/presentations/2017/Ehningen/31_-_SSUG17DE_-_Sven_Oehme_-_News_from_Research.pdf
>     > http://files.gpfsug.org/presentations/2016/SC16/12_-_Sven_Oehme_Dean_Hildebrand_-_News_from_IBM_Research.pdf
> 
> 
>     > from those presentations regarding 32 subblocks:
>     >
>     > "It has a significant performance penalty for small files in large
>     > block size filesystems"
> 
>     > although I'm not clear on the specific definition of "large". Many
>     > filesystems I encounter only have a 1M block size, so it may not
>     > matter there, although that same presentation clearly shows the
>     > benefit of larger block sizes, which is yet *another* thing for which
>     > a migration tool would be helpful.
> 
>     > -Aaron
>     >
>     > On Wed, Nov 29, 2017 at 2:08 PM, Nikhil Khandelwal
>     > <nikhilk at us.ibm.com> wrote:
> 
>     > Hi,
>     >
>     > I would like to clarify the migration path to 5.0.0 from 4.X.X clusters.
>     > For all Spectrum Scale clusters that are currently at 4.X.X, it is
>     > possible to migrate to 5.0.0 with no offline data migration and no
>     > need to move data. Once these clusters are at 5.0.0, they will
>     > benefit from the performance improvements, new features (such as
>     > file audit logging), and various enhancements that are included in
>     5.0.0.
>     >
>     > That being said, there is one enhancement that will not be applied
>     > to these clusters, and that is the increased number of sub-blocks
>     > per block for small file allocation. This means that for file
>     > systems with a large block size and a lot of small files, the
>     > overall space utilization will be the same as it currently is in 4.X.X.
>     > Since file systems created at 4.X.X and earlier used a block size
>     > that kept this allocation in mind, there should be very little
>     > impact on existing file systems.
>     >
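>     > To gauge what that padding costs on an existing file system, one
>     > rough check (a sketch, not an official tool; it assumes st_blocks
>     > reflects actual allocation) is to compare each file's apparent size
>     > with its allocated size:
>     >
>     > import os, sys
>     >
>     > apparent = allocated = 0
>     > for root, dirs, files in os.walk(sys.argv[1]):
>     >     for name in files:
>     >         st = os.lstat(os.path.join(root, name))
>     >         apparent += st.st_size
>     >         allocated += st.st_blocks * 512  # st_blocks counts 512-byte units
>     >
>     > print(f"apparent:  {apparent / 2**30:.2f} GiB")
>     > print(f"allocated: {allocated / 2**30:.2f} GiB")
>     >
>     > A large gap on a small-file workload shows how much space the current
>     > sub-block allocation costs today.
>     >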
>     > Outside of that one particular function, the remainder of the
>     > performance improvements, metadata improvements, updated
>     > compatibility, new functionality, and all of the other enhancements
>     > will be immediately available to you once you complete the upgrade
>     > to 5.0.0 -- with no need to reformat, move data, or take your data
>     offline.
>     >
>     > I hope that clarifies things a little and makes the upgrade path
>     > more accessible.
>     >
>     > Please let me know if there are any other questions or concerns.
>     >
>     > Thank you,
>     > Nikhil Khandelwal
>     > Spectrum Scale Development
>     > Client Adoption
>     >
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776



More information about the gpfsug-discuss mailing list