[gpfsug-discuss] gpfsug-discuss Digest, Vol 81, Issue 44

Chris Schlipalius chris.schlipalius at pawsey.org.au
Tue Oct 23 01:01:41 BST 2018


Hi,

When we migrated 1.6 PB of data from one GPFS file system to another (over IB), we used dcp from GitHub (with mmdsh). It can just be problematic to compile.

In my previous job I used rsync with attributes and ACLs preserved, i.e.
rsync -aAvz

But dcp parallelises better and checksums files and directories, which is what we used to ensure nothing was lost.
Worth a go!
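For illustration, the two approaches look roughly like this. The paths and process count are placeholders, and dcp option names vary between versions (check dcp --help before relying on them); rsync's -A/-X cover POSIX ACLs and xattrs but not GPFS NFSv4 ACLs, which need a separate pass (see further down the thread).

  # rsync with attributes, POSIX ACLs and xattrs preserved
  rsync -aAXvz /gpfs/oldfs/project/ /gpfs/newfs/project/

  # dcp: MPI-based parallel copy; the -p/--preserve flag is an assumption,
  # confirm it against your dcp build
  mpirun -np 16 dcp -p /gpfs/oldfs/project /gpfs/newfs/project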

Regards,
Chris Schlipalius
 
Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO)
13 Burvill Court
Kensington  WA  6151
Australia
 
Tel  +61 8 6436 8815  
Email  chris.schlipalius at pawsey.org.au
Web  www.pawsey.org.au

On 23/10/18, 4:08 am, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote:

    Send gpfsug-discuss mailing list submissions to
    	gpfsug-discuss at spectrumscale.org
    
    To subscribe or unsubscribe via the World Wide Web, visit
    	http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    or, via email, send a message with subject or body 'help' to
    	gpfsug-discuss-request at spectrumscale.org
    
    You can reach the person managing the list at
    	gpfsug-discuss-owner at spectrumscale.org
    
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of gpfsug-discuss digest..."
    
    
    Today's Topics:
    
       1. Re: Best way to migrate data (Ryan Novosielski)
       2. Re: Best way to migrate data (Sven Oehme)
       3. Re: Best way to migrate data : mmfind ... mmxcp (Marc A Kaplan)
    
    
    ----------------------------------------------------------------------
    
    Message: 1
    Date: Mon, 22 Oct 2018 15:21:06 +0000
    From: Ryan Novosielski <novosirj at rutgers.edu>
    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    Subject: Re: [gpfsug-discuss] Best way to migrate data
    Message-ID: <3023B88F-D115-4C0B-90DC-6EF711D858E6 at rutgers.edu>
    Content-Type: text/plain; charset="utf-8"
    
    It seems like the primary way that this helps us is that we transfer user home directories, and many of them have VERY large numbers of small files (in the millions), so running multiple simultaneous rsyncs allows the transfer to continue past that one slow area. I guess it balances the bandwidth constraint and the I/O constraints on generating a file list. There are unfortunately one or two known bugs that slow it down: it keeps track of its rsync PIDs, but sometimes a former rsync PID is reused by the system, which it counts against the number of running rsyncs. It can also think rsync is still running at the end when it's really something else now using the PID. I know the author is looking at that. For shorter transfers, you likely won't run into this.
    
    I'm not sure I have the time or the programming ability to make this happen, but it seems to me that one could make some major gains by replacing fpart with mmfind in a GPFS environment. Generating lists of files takes a significant amount of time and mmfind can probably do it faster than anything else that does not have direct access to GPFS metadata.
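    A minimal sketch of that idea (assuming mmfind from
    /usr/lpp/mmfs/samples/ilm and GNU split; the paths and chunk count are
    placeholders, and directory metadata and ACLs would still need a
    follow-up pass):

        # build a file list with mmfind, made relative to the source root
        /usr/lpp/mmfs/samples/ilm/mmfind /gpfs/oldfs/home -type f | \
            sed 's|^/gpfs/oldfs/home/||' > /tmp/all_files

        # split the list into 8 chunks and run one rsync per chunk
        split -n l/8 /tmp/all_files /tmp/chunk.
        for c in /tmp/chunk.*; do
            rsync -aAv --files-from="$c" /gpfs/oldfs/home/ /gpfs/newfs/home/ &
        done
        wait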
    
    > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote:
    > 
    > Thank you Ryan. I'll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer, copying them from one GPFS file system to another.
    > 
    > Best,
    > Dwayne
    > --
    > Dwayne Hart | Systems Administrator IV
    > 
    > CHIA, Faculty of Medicine 
    > Memorial University of Newfoundland 
    > 300 Prince Philip Drive
    > St. John's, Newfoundland | A1B 3V6
    > Craig L Dobbin Building | 4M409
    > T 709 864 6631
    > 
    >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski <novosirj at rutgers.edu> wrote:
    >> 
    >> -----BEGIN PGP SIGNED MESSAGE-----
    >> Hash: SHA1
    >> 
    >> We use parsyncfp. Our target is not GPFS, though. I was really hoping
    >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably
    >> tell you that HSM is the way to go (we asked something similar for a
    >> replacement for our current setup or for distributed storage).
    >> 
    >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote:
    >>> Hi,
    >>> 
    >>> Just wondering what the best recipe is for migrating a user's home
    >>> directory content from one GPFS file system to another which hosts
    >>> a larger research GPFS file system? I'm currently using rsync and
    >>> it has maxed out the client system's IB interface.
    >>> 
    >>> Best, Dwayne -- Dwayne Hart | Systems Administrator IV
    >>> 
    >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300
    >>> Prince Philip Drive St. John's, Newfoundland | A1B 3V6 Craig L
    >>> Dobbin Building | 4M409 T 709 864 6631 
    >>> _______________________________________________ gpfsug-discuss
    >>> mailing list gpfsug-discuss at spectrumscale.org 
    >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    >>> 
    >> 
    >> - -- 
    >> ____
    >> || \\UTGERS,     |----------------------*O*------------------------
    >> ||_// the State  |    Ryan Novosielski - novosirj at rutgers.edu
    >> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
    >> ||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
    >>     `'
    >> -----BEGIN PGP SIGNATURE-----
    >> 
    >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG
    >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk
    >> =dMDg
    >> -----END PGP SIGNATURE-----
    >> _______________________________________________
    >> gpfsug-discuss mailing list
    >> gpfsug-discuss at spectrumscale.org
    >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    > _______________________________________________
    > gpfsug-discuss mailing list
    > gpfsug-discuss at spectrumscale.org
    > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    
    
    ------------------------------
    
    Message: 2
    Date: Mon, 22 Oct 2018 11:11:06 -0700
    From: Sven Oehme <oehmes at gmail.com>
    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    Subject: Re: [gpfsug-discuss] Best way to migrate data
    Message-ID:
    	<CALssuR35D=AEchSzezNzqpB=Tx2+CQx-OMJL8XwarDhKO5rttQ at mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    I am not sure if this was mentioned already, but in some versions of
    V5.0.X, based on my suggestion, a tool was added by Marc on an AS-IS
    basis (thanks Marc) that does what you want, with one exception:
    
    /usr/lpp/mmfs/samples/ilm/mmxcp -h
    Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count source_pathname1 source_pathname2 ...
    
     Run "cp" in a  mmfind ... -xarg ... pipeline, e.g.
    
      mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight DIRECTORY_HASH -xargs mmxcp -t /target -p 2
    
     Options:
      -t target_path : Copy files to this path.
      -p strip_count : Remove this many directory names from the pathnames of the source files.
      -a  : pass -a to cp
      -v  : pass -v to cp
    
    This is essentially a parallel copy tool driven by the policy engine,
    with all its goodies. The one critical part that's missing is that it
    doesn't copy any GPFS-specific metadata, which unfortunately includes
    NFSv4 ACLs. The reason is that GPFS doesn't expose the NFSv4 ACLs via
    xattrs, nor do any of the regular Linux tools use the proprietary
    interface into GPFS to extract and apply them (this is what allows this
    magic unsupported version of rsync
    https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync to transfer
    the ACLs and other attributes). So a worthwhile RFE would be to either
    expose all the special GPFS bits as xattrs, or provide at least a
    maintained version of rsync, cp, or whatever that allows the transfer of
    this data.
    
    Sven
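
    (For completeness: the NFSv4 ACLs themselves can be replayed per file
    after the data copy with the standard mmgetacl/mmputacl commands. This
    is a slow, illustrative sketch only; the paths are placeholders and
    file names with unusual characters need more care:)

        # after the data copy (mmxcp, rsync, ...), replay NFSv4 ACLs
        cd /gpfs/source || exit 1
        find . -print | while IFS= read -r f; do
            mmgetacl -o /tmp/acl.$$ "/gpfs/source/$f" && \
            mmputacl -i /tmp/acl.$$ "/gpfs/target/$f"
        done
        rm -f /tmp/acl.$$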
    
    On Mon, Oct 22, 2018 at 10:52 AM Ryan Novosielski <novosirj at rutgers.edu>
    wrote:
    
    > [quoted text trimmed]
    >
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20181022/5eda4214/attachment-0001.html>
    
    ------------------------------
    
    Message: 3
    Date: Mon, 22 Oct 2018 16:08:49 -0400
    From: "Marc A Kaplan" <makaplan at us.ibm.com>
    To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
    Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ...
    	mmxcp
    Message-ID:
    	<OFF1CD736D.7F0A9AEA-ON8525832E.006E23D5-8525832E.006EAC92 at notes.na.collabserv.com>
    	
    Content-Type: text/plain; charset="us-ascii"
    
    Rather than hack rsync or cp, I proposed a smallish utility that would
    copy those extended attributes and ACLs that cp -a just skips over.
    This can be done using the documented GPFS APIs that were designed for
    backup and restore of files.
    SMOP, and then add it as an option to samples/ilm/mmxcp.
    
    Sorry I haven't gotten around to doing this... It seems like a
    modest-sized project, and it avoids boiling the ocean and reinventing
    or hacking rsync.
    
    -- marc K
    
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20181022/35e81523/attachment.html>
    
    ------------------------------
    
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    
    
    End of gpfsug-discuss Digest, Vol 81, Issue 44
    **********************************************
    




