[gpfsug-discuss] naive question about rsync: run it on a client or on NSD server?

Fri Feb 14 21:09:14 GMT 2020

On 14/02/2020 16:24, Sanchez, Paul wrote:
> Some (perhaps obvious) points to consider:
> 
> - There are some corner cases (e.g. preserving hard-linked files or
> sparseness) which require special options.
> 
> - Depending on your level of churn, it may be helpful to pre-stage
> the sync before your cutover so that there is less data movement
> required, and you're primarily comparing metadata.
> 
> - Files on the source filesystem might change (and become internally
> inconsistent) during your rsync, so you should generally sync from a
> snapshot on the source.
In my experience a file changing mid-transfer causes rsync to exit with
a non-zero exit code; see later as to why this is useful. The file will
also likely have a different mtime, which will cause it to be resynced
on a subsequent run. The final run will be with the file system in a
"read only" state: not necessarily mounted read only, but without
anything running that might change stuff.

[SNIP]

> 
> - If you decide to do a final "offline" sync, you want it to be fast
> so users can get back to work sooner, so parallelism is usually a
> must.  If you have lots of filesets, then that's a convenient way to
> split the work.

This final "offline" sync is an absolute must in my experience, unless
you are able to be rather woolly about preserving data.

> 
> - If you have any filesets with many more inodes than the others,
> keep in mind that those will likely take the longest to complete.
> 

Indeed. The last time we did an rsync, which was for an HPC system
moving from the pit of woe that is Lustre to GPFS, we found there was
huge mileage to be had from telling users that they would get on the new
system once their data was synced. It would be done on a "per user"
basis, with priority given to the users with a combination of the
smallest amount of data and the smallest number of files. It did
unbelievable wonders for getting users to clean up their files. One user
went from over 17 million files to under 50 thousand! The amount of data
needing syncing nearly halved, shrinking to ~60% of the pre-announcement
size.
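
As a rough illustration of that kind of ordering, here is a minimal
Python sketch. The "combination" used here (adding each user's rank by
data volume to their rank by file count) is purely an assumption for
illustration, as are the function name and the input format; it is not
necessarily how the priority was actually worked out.

```python
def sync_order(usage):
    """Return users ordered so that small, few-file users come first.

    usage maps user name -> (bytes_used, file_count).
    Assumption: priority is the sum of two ranks, smallest first.
    """
    by_bytes = sorted(usage, key=lambda u: usage[u][0])
    by_files = sorted(usage, key=lambda u: usage[u][1])
    score = {u: by_bytes.index(u) + by_files.index(u) for u in usage}
    # Ties are broken by user name so the order is deterministic.
    return sorted(usage, key=lambda u: (score[u], u))
```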

> - Test, test, test.  You usually won't get this right on the first go
> or know how long a full sync takes without practice.  Remember that
> you'll need to employ options to delete extraneous files on the
> target when you're syncing over the top of a previous attempt, since
> files intentionally deleted on the source aren't usually welcome if
> they reappear after a migration.
> 

rsync has a --delete option for that.
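
For illustration, a minimal Python sketch of putting together an rsync
command line that covers the points raised above: -a for archive mode,
-H for hard links, -S for sparse files, and --delete for removing
extraneous files on the target. The function name and the paths in the
usage line are hypothetical, and a real migration will likely want
further options.

```python
def rsync_command(source, destination, delete=True):
    """Build an rsync argument list for one directory tree."""
    cmd = ["rsync", "-a", "-H", "-S"]  # archive, hard links, sparse files
    if delete:
        # Remove files on the target that no longer exist on the source.
        cmd.append("--delete")
    # A trailing slash on the source means "copy the directory contents".
    cmd += [source.rstrip("/") + "/", destination]
    return cmd
```

Something like subprocess.run(rsync_command("/old/home", "/new/home"))
would then run it and give you the exit code.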

I am going to add that if you do any sort of ILM/HSM then an rsync is
going to destroy your ability to identify old files that have not been
accessed, as the rsync will update the atime of everything (don't ask
how I know).

If you have a backup (of course you do) I would strongly recommend
considering getting your first "pass" from a restore. Firstly it won't
impact the source file system while it is still in use, and secondly it
allows you to check your backup actually works :-)

Finally, when rsyncing systems like this I use a Perl script with an
sqlite DB. Basically it holds a list of directories to sync (you can
have both source and destination to make wonderful things happen if
wanted), along with a flag field. The way I use that is: -1 means not
synced, -2 means the folder in question is currently being synced, and
anything else is the exit code of rsync.
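
A minimal sketch of what such a table could look like, written in
Python rather than Perl. The table name, column names and function name
are assumptions made up for this example; only the meaning of the flag
values comes from the description above.

```python
import sqlite3

NOT_SYNCED = -1   # directory has not been attempted yet
IN_PROGRESS = -2  # a worker is currently syncing this directory
# Any other flag value is the exit code of rsync (0 means success).

def create_queue(db_path, pairs):
    """Create the queue table and load (source, destination) pairs."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_queue ("
        " source TEXT PRIMARY KEY,"
        " destination TEXT NOT NULL,"
        " flag INTEGER NOT NULL DEFAULT -1)"
    )
    # OR IGNORE keeps existing rows, so reloading the list is harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO sync_queue (source, destination) VALUES (?, ?)",
        pairs,
    )
    conn.commit()
    return conn
```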

If you write the Perl script correctly you can start it on any number of
nodes; just dump the sqlite DB on a shared folder somewhere (either the
source or destination file system works well here). If you are doing it
in parallel, record the node which did the rsync as well; in my
experience it can be useful in finding any issues.
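
One way the "correctly" part could be handled, sketched in Python: each
worker claims a directory inside a single write transaction, so two
nodes cannot pick the same one, and stores its hostname alongside. The
table layout and function names are assumptions for this example, and
it assumes file locking on the shared file system is reliable enough
for SQLite.

```python
import socket
import sqlite3

def open_queue(db_path):
    """Open the queue in autocommit mode so BEGIN/COMMIT are explicit."""
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_queue ("
        " source TEXT PRIMARY KEY, destination TEXT NOT NULL,"
        " flag INTEGER NOT NULL DEFAULT -1, node TEXT)"
    )
    return conn

def claim_next(conn):
    """Mark one unsynced directory as in progress and return it, or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = conn.execute(
        "SELECT source, destination FROM sync_queue WHERE flag = -1 LIMIT 1"
    ).fetchone()
    if row is not None:
        # -2 means "currently being synced"; also note which node took it.
        conn.execute(
            "UPDATE sync_queue SET flag = -2, node = ? WHERE source = ?",
            (socket.gethostname(), row[0]),
        )
    conn.execute("COMMIT")
    return row
```

Each node would then loop on claim_next() until it returns None.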

Once everything is done you can quickly check the sqlite DB for non-zero
flag fields to find out what, if anything, has failed, which gives you
the confidence that your sync has completed accurately. Also, any flag
fields less than zero show you it's not finished.
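
That check amounts to a couple of queries. A minimal Python sketch,
again assuming a hypothetical sync_queue table with source and flag
columns:

```python
import sqlite3

def sync_report(conn: sqlite3.Connection):
    """Return (unfinished, failed) lists of (source, flag) rows."""
    # Negative flags: never started (-1) or still in progress (-2).
    unfinished = conn.execute(
        "SELECT source, flag FROM sync_queue WHERE flag < 0 ORDER BY source"
    ).fetchall()
    # Positive flags: rsync finished but returned a non-zero exit code.
    failed = conn.execute(
        "SELECT source, flag FROM sync_queue WHERE flag > 0 ORDER BY source"
    ).fetchall()
    return unfinished, failed
```

If both lists come back empty, every directory finished with exit
code 0.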

Lastly, you might want to record the time each individual rsync took;
it's handy for working out that ordering I mentioned :-)
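
A minimal Python sketch of recording the exit code and elapsed time per
directory. The table layout, the seconds column and the function names
are assumptions for this example; the runner argument is only there so
the rsync call can be swapped out when trying the code.

```python
import sqlite3
import subprocess
import time

def run_rsync(source, destination):
    """Run rsync on one directory and return its exit code."""
    cmd = ["rsync", "-a", "--delete", source.rstrip("/") + "/", destination]
    return subprocess.run(cmd).returncode

def sync_and_record(conn: sqlite3.Connection, source, destination,
                    runner=run_rsync):
    """Sync one directory, then store the exit code and wall-clock time."""
    start = time.monotonic()
    code = runner(source, destination)
    elapsed = time.monotonic() - start
    # The flag becomes the rsync exit code, replacing the -2 marker.
    conn.execute(
        "UPDATE sync_queue SET flag = ?, seconds = ? WHERE source = ?",
        (code, elapsed, source),
    )
    conn.commit()
    return code, elapsed
```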

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG