[gpfsug-discuss] searching for mmcp or mmcopy - optimized bulk copy for spectrum scale?

Stephen Ulmer ulmer at ulmer.org
Mon Dec 5 15:51:14 GMT 2016


This isn't the answer that saves you from writing it yourself:

However, be aware that GNU xargs has the -P x option, which tries to keep x processes running at once. It’s a good way to tune the degree of parallelism for anything you’re multiprocessing in the shell. So you can build a file list and have xargs fork x copies of rsync or cp at a time (with -n y items in each batch). Not having to wait for one batch to finish before the next one starts adds up to a lot of MB*s very quickly.
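
For example, something along these lines (just a sketch -- the paths, the -P/-n values, and the assumption that the data splits nicely into top-level directories are all invented here to illustrate the idea; tune them for your tree and watch out for odd characters in file names):

    # one rsync per top-level directory, up to 8 running at once
    cd /gpfs/fs_old/project
    find . -mindepth 1 -maxdepth 1 -type d | \
        xargs -P 8 -I{} rsync -a {}/ /gpfs/fs_new/project/{}/

    # or: feed plain files to cp in batches of 100, 8 batches at a time
    find . -maxdepth 1 -type f -print0 | \
        xargs -0 -P 8 -n 100 cp -t /gpfs/fs_new/project/

(GNU xargs ignores -n when -I is in play, so the first form runs one rsync per directory; the second just batches flat files and won't recreate subdirectories.)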


This is not the answer to anything, and is probably a waste of your time:

I started to comment that if GPFS did provide such a “data path shortcut”, I think I’d want it to work between any two allocation areas -- even two independent filesets in the same file system. Then I started working through the possibilities for actually doing that… and it devolved into the realization that we’ve got to copy the data most of the time (unless it’s in the same filesystem *and* the same storage pool, and maybe even then, depending on how the allocator works). Realizing that, I decided that sometimes it just sucks to have data in the wrong (old) place. :)

Maybe what we want is the ability to split an independent fileset (if it maps 1:1 to a storage pool) out of one filesystem and graft it onto another -- that’s probably easier, and it roughly mirrors vgsplit et al.

I should go do actual work...

Liberty,


> On Dec 5, 2016, at 9:09 AM, Brian Marshall <mimarsh2 at vt.edu> wrote:
> 
> All,
> 
> I am in the same boat.  I'd like to copy ~500 TB from one filesystem to another.  Both are being served by the same NSD servers.
> 
> We've done the multiple-rsync script method in the past (and yes, it's a bit of a pain).  Would love to have an easier utility.
> 
> Best,
> Brian Marshall
> 
> On Mon, Dec 5, 2016 at 5:26 AM, Heiner Billich <Heiner.Billich at psi.ch> wrote:
> Hello,
> 
> I heard about some GPFS-optimized bulk(?) copy command named 'mmcp' or 'mmcopy' but couldn't find it in /usr/lpp/mmfs/samples/ or by asking Google. Can somebody please point me to the source? I wonder whether it allows incremental copies as rsync does.
> 
> We need to copy a few hundred TB of data, and plain rsync gives us only about 100 MB/s. I know about the possible workarounds - write a wrapper script, run several rsyncs in parallel, distribute the rsync jobs across several nodes, use a special rsync version that knows about GPFS ACLs, ... or try mmfind, which requires me to write a custom wrapper for cp ...
> 
> I really would prefer some ready-to-use script or program.
> 
> Thank you and kind regards,
> Heiner Billich
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-- 
Stephen

