[gpfsug-discuss] Best way to migrate data

Christopher Black cblack at nygenome.org
Thu Oct 18 20:13:29 BST 2018


Other tools and approaches that we've found helpful:
msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files
Globus/GridFTP: set up one or more endpoints on each side; GridFTP will parallelize automatically and recover from disruptions

msrsync is easier to get going but is limited to one parent dir per node. We've sometimes added another level of parallelization by running msrsync on different top-level directories on different HPC nodes simultaneously.
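A rough sketch of that per-node split, in Python. It only builds the command lines and does not run anything. The node names, paths, and the `plan_msrsync_jobs` helper are hypothetical; `-p` is msrsync's process-count option, and the trailing-slash path convention is assumed to follow rsync's, so check both against the msrsync README before use.

```python
import shlex
from itertools import cycle


def plan_msrsync_jobs(top_level_dirs, nodes, src_root, dst_root, processes=8):
    """Assign each top-level directory to a node (round-robin) and
    build one msrsync command line per directory.

    Returns a dict mapping node name -> list of argv lists.
    """
    plan = {node: [] for node in nodes}
    for subdir, node in zip(sorted(top_level_dirs), cycle(nodes)):
        cmd = [
            "msrsync",
            "-p", str(processes),        # rsync processes msrsync may start
            f"{src_root}/{subdir}/",     # assumed rsync-style trailing slash
            f"{dst_root}/{subdir}/",
        ]
        plan[node].append(cmd)
    return plan


if __name__ == "__main__":
    # Hypothetical directories, nodes, and mount points.
    plan = plan_msrsync_jobs(
        ["home", "projects", "scratch"],
        ["node01", "node02"],
        "/gpfs/old",
        "/gpfs/new",
    )
    for node, cmds in plan.items():
        for cmd in cmds:
            print(node, shlex.join(cmd))
```

Each node's list could then be handed to whatever runs jobs on that node (a batch scheduler, ssh, and so on). Round-robin assignment ignores directory size, so very uneven trees may need a smarter split.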

Best,
Chris

Refs:
https://github.com/jbd/msrsync
https://www.globus.org/

On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" <gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul.Sanchez at deshaw.com> wrote:

    Sharding can also work, if you have a storage-connected compute grid in your environment:  If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously.  It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines.

    -Paul
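The sharding idea above could be sketched roughly as follows, in Python. This is an illustration under assumptions, not a tested migration tool: the helper names are made up, `os.walk` stands in for whatever directory enumeration is practical on a large file system, and the rsync options `-dlptgoD` are simply `-a` without `-r`, plus `-d` (`--dirs`) so each job copies one directory level without recursing.

```python
import os


def nonrecursive_rsync_commands(src_root, dst_root):
    """Yield one non-recursive rsync command (argv list) per directory
    found under src_root, parents before children."""
    for dirpath, dirnames, _filenames in os.walk(src_root):
        dirnames.sort()  # deterministic traversal order
        rel = os.path.relpath(dirpath, src_root)
        dst = dst_root if rel == "." else os.path.join(dst_root, rel)
        # -d copies directory entries without recursing into them;
        # -lptgoD keeps links, perms, times, group, owner, devices.
        yield ["rsync", "-dlptgoD",
               dirpath.rstrip("/") + "/", dst.rstrip("/") + "/"]


def shard(commands, n_clients):
    """Split the commands round-robin into n_clients work lists."""
    shards = [[] for _ in range(n_clients)]
    for i, cmd in enumerate(commands):
        shards[i % n_clients].append(cmd)
    return shards
```

One caveat with this sketch: a job for a child directory can start on one client before the job that creates its parent has run on another, so it assumes the directory tree already exists on the target (for example from an earlier directories-only pass) or that failed jobs are retried.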
    -----Original Message-----
    From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Buterbaugh, Kevin L
    Sent: Thursday, October 18, 2018 2:26 PM
    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    Subject: Re: [gpfsug-discuss] Best way to migrate data

    Hi Dwayne,

    I’m assuming you can’t just let an rsync run, possibly throttled in some way?  If not, and if you’re just tapping out your network, then would it be possible to go old school?  We have parts of the Medical Center here where their network connections are … um, less than robust.  So they tar stuff up to a portable HD, sneaker net it to us, and we untar it from an NSD server.

    HTH, and I really hope that someone has a better idea than that!

    Kevin

    > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote:
    >
    > Hi,
    >
    > Just wondering what the best recipe is for migrating a user’s home directory content from one GPFS file system to another which hosts a larger research GPFS file system? I’m currently using rsync and it has maxed out the client system’s IB interface.
    >
    > Best,
    > Dwayne
    > —
    > Dwayne Hart | Systems Administrator IV
    >
    > CHIA, Faculty of Medicine
    > Memorial University of Newfoundland
    > 300 Prince Philip Drive
    > St. John’s, Newfoundland | A1B 3V6
    > Craig L Dobbin Building | 4M409
    > T 709 864 6631
    > _______________________________________________
    > gpfsug-discuss mailing list
    > gpfsug-discuss at spectrumscale.org
    > http://gpfsug.org/mailman/listinfo/gpfsug-discuss




