[gpfsug-discuss] Follow-up: migrating billions of files

Stephen Ulmer ulmer at ulmer.org
Wed Mar 6 15:49:32 GMT 2019


In the case where tar -C doesn’t work, you can always use a subshell (I do this regularly):

	tar -cf - . | ssh someguy@otherhost "(cd targetdir && tar -xvf -)"

Only use -v on one end. :)

Also, for parallelizing work that wasn’t designed to run in parallel, don't underestimate the -P option to GNU and BSD xargs! With this much data to copy, making sure a sub-job doesn’t finish right after you go home and leave a slot idle for several hours is a medium-sized deal.
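
For what it’s worth, here is a rough sketch of that pattern (the paths, the host name and the job count are made up for illustration -- adjust for your environment and try it on a small tree first):

	# Run up to 8 tar-over-ssh pipes at once, one sub-job per top-level
	# directory. /gpfs/src, /gpfs/tgt and targethost are placeholders.
	# (Directory names with quote characters would break this -- it's a sketch.)
	cd /gpfs/src && ls -d */ | \
	  xargs -P 8 -I{} sh -c \
	    'tar -cf - "{}" | ssh targethost "cd /gpfs/tgt && tar -xf -"'

This only parallelizes across top-level directories, so it works best when those are of roughly comparable size -- which is also why Uwe suggests starting with the largest ones.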

In Bob’s case, however, treating it like a DR exercise where users "restore" their own files simply by accessing them (using AFM instead of HSM) is probably the most convenient approach.
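
If you go the AFM route, the mechanics are roughly as below. This is only a sketch: the file system name (essfs1), fileset name (projects), AFM mode and target path are all invented for illustration, and the exact options should be checked against the Scale level on the ESS:

	# Create an AFM cache fileset on the new ESS file system that points
	# back at the old cluster (here using the GPFS protocol backend).
	mmcrfileset essfs1 projects --inode-space new \
	    -p afmTarget=gpfs:///oldfs/projects -p afmMode=ro
	mmlinkfileset essfs1 projects -J /gpfs/essfs1/projects

	# Optionally pre-populate known-hot data in the background so the
	# first user access doesn't pay the full fetch cost; the list file
	# contains one path per line.
	mmafmctl essfs1 prefetch -j projects --list-file /tmp/hot-files.list

Read-only mode is just the simplest thing to show here; for an actual cut-over you'd pick the AFM mode that matches how the old file systems will be retired.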

-- 
Stephen



> On Mar 6, 2019, at 8:13 AM, Uwe Falke <UWEFALKE at de.ibm.com> wrote:
> 
> Hi, in that case I'd open several tar pipes in parallel, maybe on 
> carefully selected directories, like 
> 
>  tar -cf - <source_dir> | ssh <target_host> "tar -xf -"
> 
> I am not quite sure whether "-C /" for tar works here ("tar -C / -x"), but 
> something along these lines should be a good, efficient method. The target 
> hosts should be all nodes having the target file system mounted, and you 
> should start those pipes on the nodes with the source file system. 
> It is best to start with the largest directories, and to use some master 
> script that starts the tar pipes controlled by semaphores so as not to 
> overload anything. 
> 
> 
> 
> Mit freundlichen Grüßen / Kind regards
> 
> 
> Dr. Uwe Falke
> 
> IT Specialist
> High Performance Computing Services / Integrated Technology Services / 
> Data Center Services
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefalke at de.ibm.com
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: 
> Thomas Wolter, Sven Schooß
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
> HRB 17122 
> 
> 
> 
> 
> From:   "Oesterlin, Robert" <Robert.Oesterlin at nuance.com>
> To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:   06/03/2019 13:44
> Subject:        [gpfsug-discuss] Follow-up: migrating billions of files
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> 
> 
> 
> Some of you had questions to my original post. More information:
> 
> Source:
> - Files are straight GPFS/POSIX - no extended NFSv4 ACLs
> - A solution that requires $'s to be spent on software (i.e., Aspera) isn't 
> a very viable option
> - Both source and target clusters are in the same DC
> - Source is stand-alone NSD servers (bonded 10 GbE) and 8 Gb FC SAN storage
> - Approx 40 file systems, a few large ones with 300M-400M files each, 
> others smaller
> - no independent file sets
> - migration must pose minimal disruption to existing users
> 
> Target architecture is a small number of file systems (2-3) on ESS with 
> independent filesets
> - Target (ESS) will have multiple 40 GbE links on each NSD server (GS4)
> 
> My current thinking is AFM with a pre-populate of the file space, then 
> switching the clients over to have them pull the data they need (most of 
> the data is older and less active), and letting AFM populate the rest in 
> the background.
> 
> 
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss