[gpfsug-discuss] suggestions for copying one GPFS file system into another

Simon Thompson S.J.Thompson at bham.ac.uk
Tue Mar 5 16:38:31 GMT 2019


I wrote a patch to mpifileutils which will copy GPFS attributes, but when we compared its results against rsync's, something was obviously still different about the attrs from each, so use with care.

Simon
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Ratliff, John [jdratlif at iu.edu]
Sent: 05 March 2019 16:21
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] suggestions for copying one GPFS file system into another

We use a GPFS file system for our computing clusters and we’re working on moving to a new SAN.

We originally tried AFM, but it didn’t seem to work very well. We tried to do a prefetch on a test policy scan of 100 million files, and after 24 hours it hadn’t pre-fetched anything. It wasn’t clear what was happening. Some smaller tests succeeded, but the NFSv4 ACLs did not seem to be transferred.

Since then we started using rsync with the GPFS attrs patch. We have over 600 million files and 700 TB. I split up the rsync tasks with lists of files generated by the policy engine and we transferred the original data in about 2 weeks. Now we’re working on final synchronization. I’d like to use one of rsync’s delete options to remove files that were synced earlier and have since been deleted from the source. This can’t be combined with the --files-from option, so it’s harder to break up the rsync tasks. Some of the directories I’m running this against have 30-150 million files each. This can take quite some time with a single rsync process.
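
One possible way around that restriction is to work out the stale paths outside rsync. The Python sketch below assumes two inputs that are not part of the workflow described above: one-path-per-line listings of the source and destination trees (for instance, derived from policy engine scans), both sorted in the same order. It streams the two listings, yields paths that exist only on the destination, and groups them into batches that separate workers could delete. The function names are illustrative only.

```python
from typing import Iterable, Iterator, List


def stale_paths(src_sorted: Iterable[str], dst_sorted: Iterable[str]) -> Iterator[str]:
    """Yield paths that appear in dst_sorted but not in src_sorted.

    Assumption: both inputs are sorted in the same order (Python string
    order here), so neither listing has to be held in memory.
    """
    src_iter = iter(src_sorted)
    src = next(src_iter, None)
    for dst in dst_sorted:
        # Advance the source cursor until it reaches or passes dst.
        while src is not None and src < dst:
            src = next(src_iter, None)
        if src is None or src != dst:
            yield dst


def chunked(paths: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group paths into lists of at most `size` entries."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Tiny in-memory example; real listings would be read line by line from files.
    source = ["a/1", "a/2", "b/1"]
    dest = ["a/1", "a/3", "b/1", "c/9"]
    for batch in chunked(stale_paths(source, dest), 2):
        print(batch)
```

Because the comparison is a single merge pass, memory use stays flat regardless of listing size, but the result is only as current as the two listings, so anything created on the source after the scan needs care before deleting.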

I’m also wondering if any of my rsync options are unnecessary. I was using -avHAXS and --numeric-ids. I’m thinking the A (ACLs) and X (xattrs) might be unnecessary with GPFS->GPFS. We’re only using NFSv4 GPFS ACLs. I don’t know if GPFS uses any xattrs that rsync would sync or not. Removing those two options removed several system calls, which should make it much faster, but I want to make sure I’m syncing correctly. Also, it seems there is a problem with the GPFS patch on rsync where it will always give an error trying to get GPFS attributes on a symlink, which means it doesn’t sync any symlinks when using that option. So you can rsync symlinks or GPFS attrs, but not both at the same time. This has led to me running two rsyncs, one to get all files and one to get all attributes.
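
A possible alternative to two full passes, sketched in Python and untested against the patched rsync: split each file list into symlinks and everything else, so that symlinks can go through a run without the GPFS attribute option and the remaining entries through a run with it. The helper name is hypothetical, and the `is_link` predicate defaults to `os.path.islink`, which costs one lstat per path; if a listing already records the file type, a predicate that reads it from there would avoid those calls.

```python
import os
from typing import Callable, Iterable, List, Tuple


def partition_symlinks(
    paths: Iterable[str],
    is_link: Callable[[str], bool] = os.path.islink,
) -> Tuple[List[str], List[str]]:
    """Split paths into (symlinks, everything_else), preserving input order."""
    links: List[str] = []
    others: List[str] = []
    for path in paths:
        (links if is_link(path) else others).append(path)
    return links, others
```

Each of the two resulting lists could then be written out and fed to its own rsync run.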

Thanks for any ideas or suggestions.

John Ratliff | Pervasive Technology Institute | UITS | Research Storage – Indiana University | http://pti.iu.edu



