[gpfsug-discuss] Follow-up: migrating billions of files
Uwe Falke
UWEFALKE at de.ibm.com
Wed Mar 6 13:13:16 GMT 2019
Hi, in that case I'd open several tar pipes in parallel, perhaps on
carefully selected directories, like
tar -c <source_dir> | ssh <target_host> "tar -x"
I am not quite sure whether "-C /" for tar works here ("tar -C / -x"), but
something along these lines might be a good, efficient method. The
target_hosts should be all nodes having the target file system mounted, and
you should start those pipes on the nodes with the source file system.
It is best to start with the largest directories, and to use some master
script that starts the tar pipes under the control of semaphores, so as not
to overload anything.
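A minimal sketch of such a master script, in Python, might look like the
following. It is only an illustration under several assumptions: the
function names (build_pipe, run_shell, run_all) are made up for this
example, the directory sizes are supplied by the caller (for instance from
du or a policy scan), passwordless ssh to the target hosts is available, and
each pipe is written with explicit "-f -" and "-C <dir>" so that a source
directory lands under a chosen target root. Target hosts are used
round-robin, and a semaphore caps the number of pipes running at once.

```python
import itertools
import os
import shlex
import subprocess
import threading


def build_pipe(source_dir, target_host, target_root):
    """Build one 'tar | ssh tar' pipeline as a shell command string."""
    parent, name = os.path.split(source_dir.rstrip("/"))
    # Pack relative to the parent directory so no absolute paths are archived.
    pack = "tar -C {} -cf - {}".format(shlex.quote(parent or "."), shlex.quote(name))
    # Unpack below target_root on the remote node.
    unpack = "tar -C {} -xf -".format(shlex.quote(target_root))
    return "{} | ssh {} {}".format(pack, shlex.quote(target_host), shlex.quote(unpack))


def run_shell(cmd):
    """Run a pipeline in bash; pipefail makes a failing tar visible in the exit code."""
    return subprocess.run(["bash", "-o", "pipefail", "-c", cmd]).returncode


def run_all(dirs_with_size, target_hosts, target_root, max_parallel=4, runner=run_shell):
    """Start one pipe per directory, largest first, at most max_parallel at a time.

    dirs_with_size is a list of (source_dir, size) pairs; returns {source_dir: exit_code}.
    """
    slots = threading.BoundedSemaphore(max_parallel)
    hosts = itertools.cycle(target_hosts)  # spread pipes over the target nodes
    results = {}

    def worker(src, cmd):
        try:
            results[src] = runner(cmd)
        finally:
            slots.release()  # free the slot even if the runner raised

    threads = []
    for src, _size in sorted(dirs_with_size, key=lambda d: d[1], reverse=True):
        slots.acquire()  # blocks while max_parallel pipes are still running
        cmd = build_pipe(src, next(hosts), target_root)
        t = threading.Thread(target=worker, args=(src, cmd))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

It could be called as run_all([("/gpfs/src/projA", 400), ("/gpfs/src/projB",
90)], ["node1", "node2"], "/gpfs/dst", max_parallel=2), where the paths and
host names are placeholders. Any non-zero exit code in the returned
dictionary marks a directory that should be retried or checked.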
Kind regards
Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Management:
Thomas Wolter, Sven Schooß
Registered office: Ehningen / Court of registration: Amtsgericht Stuttgart,
HRB 17122
From: "Oesterlin, Robert" <Robert.Oesterlin at nuance.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 06/03/2019 13:44
Subject: [gpfsug-discuss] Follow-up: migrating billions of files
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Some of you had questions about my original post. More information:
Source:
- Files are straight GPFS/POSIX - no extended NFSv4 ACLs
- A solution that requires $'s to be spent on software (i.e., Aspera) isn't
a very viable option
- Both source and target clusters are in the same DC
- Source is stand-alone NSD servers (bonded 10GbE) and 8 Gb FC SAN storage
- Approx 40 file systems, a few large ones with 300M-400M files each,
others smaller
- no independent file sets
- migration must pose minimal disruption to existing users
Target architecture is a small number of file systems (2-3) on ESS with
independent filesets
- Target (ESS) will have multiple 40GbE links on each NSD server (GS4)
My current thinking is AFM with a pre-populate of the file space, then
switch the clients over to have them pull the data they need (most of the
data is older and less active) and then let AFM populate the rest in the
background.
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss