[gpfsug-discuss] suggestions for copying one GPFS file system into another

Jez Tucker jtucker at pixitmedia.com
Fri Mar 8 16:08:14 GMT 2019


Hi

   In the spirit of 'other products do exist', I should also mention Ngenea 
and APSync, which could meet the technical requirements of these use cases.

Ngenea allows you to bring data in from 'cloud' and, of particular interest 
in this specific use case, from POSIX filesystems or filer islands.  You can 
present the remote data as locally available and then inflate it either on 
demand or via an enacted process.  Massively parallel, multi-node and highly 
threaded, with extremely granular rules-based control.  You can also migrate 
data back to your filer, re-utilising such islands as tiers.  You can even 
use it to 'virtually tier' within GPFS/Scale filesystems, akin to a 
'hardlink across independent filesets', or even across global WANs for true 
24x7 follow-the-sun working practices.

APSync also provides a differently patched version of rsync and builds on 
top of the 'SnapDiff' technology previously presented at the UG, whereby 
you don't need to re-scan your entire filesystem for each sync and can 
instead apply incremental changes for created, modified and deleted files 
and _track moved files_.  Handy, and an enormous time saving compared with 
regular full runs.  Massively parallel, multi-node, highly threaded (a 
common theme with our tools...).

As I don't do sales: if anyone wants to talk tech nuts-and-bolts with me 
about these, or you have challenges (and I love a challenge...), by all 
means hit me up directly.  I like solving people's blockers :-)

Happy Friday ppl,

Jez


On 05/03/2019 21:38, Simon Thompson wrote:
> DDN also have a paid-for product for moving data (DataFlow). We found out about it after we did a massive data migration...
>
> I can't comment on it beyond being aware of it, but I'm sure your local DDN salesperson can help.
>
> But if only IBM supported some sort of restripe to a new block size, we wouldn't have to do this mass migration :-P
>
> Simon
> ________________________________________
> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Simon Thompson [S.J.Thompson at bham.ac.uk]
> Sent: 05 March 2019 16:38
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] suggestions for copying one GPFS file system into another
>
> I wrote a patch for mpifileutils which will copy GPFS attributes, but when we compared its copies against rsync's, something was clearly still different about the attrs from each, so use it with care.
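>
> As a rough sketch of how a multi-node mpifileutils copy might be driven (node count, hostfile and paths are illustrative, and a patched build may expose its own option for GPFS attributes):
>
>     # hypothetical dcp run across several nodes; --preserve keeps
>     # permissions, ownership, timestamps and xattrs where supported
>     mpirun -np 32 --hostfile ./copy-nodes dcp --preserve /gpfs/oldfs/data /gpfs/newfs/data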
>
> Simon
> ________________________________________
> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Ratliff, John [jdratlif at iu.edu]
> Sent: 05 March 2019 16:21
> To: gpfsug-discuss at spectrumscale.org
> Subject: [gpfsug-discuss] suggestions for copying one GPFS file system into another
>
> We use a GPFS file system for our computing clusters and we’re working on moving to a new SAN.
>
> We originally tried AFM, but it didn't seem to work very well. We tried a prefetch driven by a test policy scan of 100 million files, and after 24 hours it hadn't prefetched anything, and it wasn't clear what was happening. Some smaller tests succeeded, but the NFSv4 ACLs did not seem to be transferred.
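>
> For anyone retrying the AFM route, a minimal sketch of a list-file prefetch is below (filesystem, fileset and list path are placeholders, and behaviour varies by Scale release, so check mmafmctl's documentation for yours):
>
>     # queue the files named in the list for prefetch into the cache fileset;
>     # queue state can then be checked with: mmafmctl fs0 getstate -j cachefileset
>     mmafmctl fs0 prefetch -j cachefileset --list-file /tmp/prefetch.list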
>
> Since then we started using rsync with the GPFS attrs patch. We have over 600 million files and 700 TB. I split up the rsync tasks with lists of files generated by the policy engine, and we transferred the original data in about 2 weeks. Now we're working on final synchronization. I'd like to use one of the delete options to remove files that were synced earlier and then deleted, but that can't be combined with the files-from option, so it's harder to break up the rsync tasks. Some of the directories I'm running this against have 30-150 million files each, which can take quite some time with a single rsync process.
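>
> For what it's worth, a rough sketch of the policy-list-plus-parallel-rsync split is below (rule name, paths and chunk size are illustrative, and the exact list-file name and format depend on your rule and Scale release, so the path extraction may need adjusting):
>
>     # policy.rules: list every file (and directory) in the source filesystem
>     RULE 'allfiles' LIST 'migrate' DIRECTORIES_PLUS
>
>     # generate the lists only (-I defer), strip the leading inode/gen/snap fields,
>     # make the paths relative to the source root, then feed chunks to parallel rsyncs
>     mmapplypolicy /gpfs/oldfs -P policy.rules -f /tmp/migrate -I defer
>     sed -e 's/^.* -- //' -e 's|^/gpfs/oldfs/||' /tmp/migrate.list.migrate \
>         | split -l 1000000 - /tmp/chunk.
>     for f in /tmp/chunk.*; do
>         rsync -avHS --numeric-ids --files-from="$f" /gpfs/oldfs/ /gpfs/newfs/ &
>     done; wait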
>
> I’m also wondering if any of my rsync options are unnecessary. I was using avHAXS and numeric-ids. I’m thinking the A (acls) and X (xatttrs) might be unnecessary with GPFS->GPFS. We’re only using NFSv4 GPFS ACLs. I don’t know if GPFS uses any xattrs that rsync would sync or not. Removing those two options removed several system calls, which should make it much faster, but I want to make sure I’m syncing correctly. Also, it seems there is a problem with the GPFS patch on rsync where it will always give an error trying to get GPFS attributes on a symlink, which means it doesn’t sync any symlinks when using that option. So you can rsync symlinks or GPFS attrs, but not both at the same time. This has lead to me running two rsyncs, one to get all files and one to get all attributes.
>
> Thanks for any ideas or suggestions.
>
> John Ratliff | Pervasive Technology Institute | UITS | Research Storage – Indiana University | http://pti.iu.edu
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-- 
*Jez Tucker*
Head of Research and Development, Pixit Media
07764193820 | jtucker at pixitmedia.com <mailto:jtucker at pixitmedia.com>
www.pixitmedia.com <http://www.pixitmedia.com> | Tw: @PixitMedia 
<https://twitter.com/PixitMedia>

-- 
This email is confidential in that it is intended 
for the exclusive attention of the addressee(s) indicated. If you are not 
the intended recipient, this email should not be read or disclosed to any 
other person. Please notify the sender immediately and delete this email 
from your computer system. Any opinions expressed are not necessarily those 
of the company from which this email was sent and, whilst to the best of 
our knowledge no viruses or defects exist, no responsibility can be 
accepted for any loss or damage arising from its receipt or subsequent use 
of this email.

