[gpfsug-discuss] Backing up GPFS with Rsync

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Wed Mar 10 15:15:58 GMT 2021


On 10/03/2021 02:59, Alec wrote:
> You would definitely be able to search by inode creation date and find 
> the files you want... our 1.25m file filesystem takes about 47 seconds 
> to query...  One thing I would worry about though is inode deletion and 
> inter-fileset file moves.   The SQL based engine wouldn't be able to 
> identify those changes and so you'd not be able to replicate deletes and 
> such.
> 

This is the problem with rsync "backups": you need to run it with 
--delete, otherwise any restore will "upset" your users as they find 
large numbers of files they had deleted unhelpfully "restored".

> Alternatively....
> I have a script that runs in about 4 minutes and it pulls all the data 
> out of the backup indexes, and compares the pre-built hourly file index 
> on our system and identifies files that don't exist in the backup, so I 
> have a daily backup validation...  I filter the file list using 
> ksh's printf date manipulation to filter out files that are less than 2 
> days old, to reduce the noise.  A modification to this could simply 
> compare a daily file index with the previous day's index, and send rsync 
> a list of files (existing or deleted) based on just a delta of the two 
> indexes (sort|diff), then you could properly account for all the 
> changes.  If you don't care about file modifications just produce both 
> lists based on creation time instead of modification time.  The mmfind 
> command or GPFS policy engine should be able to produce a full file 
> list/index very rapidly.
> 

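For illustration, the index comparison suggested above could be sketched 
roughly as follows. This is only a minimal sketch: the "mtime<TAB>path" 
index format, the function names and the example paths are all 
assumptions made for the example, not the output format of mmfind or the 
policy engine. Because the comparison is purely path based, a file moved 
between filesets simply shows up as one delete plus one new file.

```python
def parse_index(lines):
    """Parse hypothetical 'mtime<TAB>path' index lines into {path: mtime}."""
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        mtime, path = line.split("\t", 1)
        index[path] = int(mtime)
    return index


def index_delta(previous, current):
    """Return (to_copy, to_delete) between two {path: mtime} indexes.

    to_copy   -- paths that are new, or whose mtime differs from yesterday
    to_delete -- paths present yesterday but missing today
    """
    to_copy = sorted(p for p, m in current.items() if previous.get(p) != m)
    to_delete = sorted(p for p in previous if p not in current)
    return to_copy, to_delete


if __name__ == "__main__":
    # Made-up example data standing in for two daily index files.
    yesterday = parse_index(["100\t/gpfs/a", "100\t/gpfs/b", "100\t/gpfs/c"])
    today = parse_index(["100\t/gpfs/a", "250\t/gpfs/b", "300\t/gpfs/d"])
    to_copy, to_delete = index_delta(yesterday, today)
    print("copy:", to_copy)
    print("delete:", to_delete)
```

The two lists would then have to be handed to whatever does the copying 
and the deleting on the backup side; that part is not shown here.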
My view is that this is a lot of work. If you have the space to rsync 
your GPFS file system to, presumably with a server attached to said 
storage, then for under 500 PVU of Spectrum Protect licensing you can 
have a fully supported client/server Spectrum Protect/TSM backup 
solution and just use mmbackup.

You need to play the game and use older hardware ;-) I use an ancient 
pimped out Dell PowerEdge R300 as my TSM client node. Why this old? Well, 
it has a dual core Xeon E3113 for only 100 PVU. Anything newer would be 
quad core at 70 PVU per core, which would cost an additional ~$1000 in 
licensing.
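
As a rough worked example of the PVU arithmetic: the 50 PVU per core for 
the old box is inferred from the 100 PVU total for a dual core part, and 
the function name is purely illustrative. The ~$1000 figure is the one 
quoted above and is not computed here.

```python
def total_pvu(cores, pvu_per_core):
    """Total PVU for a server: number of cores times the per-core rating."""
    return cores * pvu_per_core


# Old dual core box: 100 PVU in total, so 50 PVU per core (inferred).
old_node = total_pvu(cores=2, pvu_per_core=50)

# Newer quad core box at 70 PVU per core.
new_node = total_pvu(cores=4, pvu_per_core=70)

# Extra PVU that would need licensing on the newer hardware.
extra = new_node - old_node

print(old_node, new_node, extra)
```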

If it breaks down they are under $100 on eBay. It's never skipped a beat 
and I have just finished a complete planned restore of our DSS-G using it.


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


