[gpfsug-discuss] Backing up GPFS with Rsync

Simon Thompson S.J.Thompson at bham.ac.uk
Wed Mar 10 19:09:13 GMT 2021


I was looking for the original source for this, but it was on developerWorks, which is now dead.

But you can use something like:

tsbuhelper clustermigdiff \
$migratePath/.mmmigrateCfg/mmmigrate.list.v${prevFileCount}.filelist \
$migratePath/.mmmigrateCfg/mmmigrate.list.latest.filelist \
$migratePath/.mmmigrateCfg/mmmigrate.list.changed.v${fileCount}.filelist \
$migratePath/.mmmigrateCfg/mmmigrate.list.deleted.v${fileCount}.filelist

"mmmigrate.list.latest.filelist" is the output of today's policy scan of your files
"mmmigrate.list.v${prevFileCount}.filelist" is yesterday's policy scan

This then generates the lists of changed and deleted files for you. tsbuhelper is what mmbackup uses internally, though it is not well documented...
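
To illustrate the idea of diffing two policy scans (this is not what tsbuhelper does internally, which I can't vouch for), here is a minimal Python sketch. It assumes each list record looks like "<inode> <gen> <snapid> <SHOW text> -- <path>", and the function names are made up for the example. It holds both scans in memory, so treat it as a toy for small lists, not something for hundreds of millions of files.

```python
def parse_record(line):
    """Split one list record into (path, metadata).

    Assumed layout: '<inode> <gen> <snapid> <SHOW text> -- <path>'.
    The split is on the first ' -- ', so metadata containing that
    sequence would be parsed incorrectly.
    """
    meta, sep, path = line.rstrip("\n").partition(" -- ")
    if not sep:
        raise ValueError(f"no ' -- ' separator in record: {line!r}")
    return path, meta


def diff_filelists(prev_lines, latest_lines):
    """Return (changed, deleted) as two sorted lists of paths.

    changed: paths that are new in the latest scan or whose metadata
             (inode, generation, SHOW text) differs from the previous scan.
    deleted: paths present in the previous scan but not in the latest.
    """
    prev = dict(parse_record(line) for line in prev_lines)
    latest = dict(parse_record(line) for line in latest_lines)
    changed = sorted(p for p, meta in latest.items() if prev.get(p) != meta)
    deleted = sorted(p for p in prev if p not in latest)
    return changed, deleted
```

Because the metadata includes the inode and generation numbers as well as the SHOW fields, a file that was deleted and recreated under the same name shows up as changed.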

We actually used something along these lines to support migrating between file-systems (generate daily diffs and sync those). The policy scan uses:

RULE EXTERNAL LIST 'latest.filelist' EXEC '' \
 RULE 'FilesToMigrate' LIST 'latest.filelist' DIRECTORIES_PLUS \
 SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || \
 VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || \
 ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' \
 WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' \
 ELSE 'resdnt' END )) \
 WHERE \
 ( \
 NOT \
 ( (PATH_NAME LIKE '/%/.mmbackup%') OR \
 (PATH_NAME LIKE '/%/.mmmigrate%') OR \
 (PATH_NAME LIKE '/%/.afm%') OR \
 (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR \
 (PATH_NAME LIKE '/%/.mmLockDir/%') OR \
 (MODE LIKE 's%') \
 ) \
 ) \
 AND \
 (MISC_ATTRIBUTES LIKE '%u%') \
 AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) \
 AND (NOT (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.SpaceMan/%'))

On our file-system, both the scan and the diff took a long time (hours), but that was for hundreds of millions of files.
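
For the "sync those" step with rsync, one possible approach is to turn the list of changed paths into input for rsync's --files-from option. The sketch below is only an illustration under assumptions: the function name is hypothetical, the input is a plain list of absolute paths (already extracted from the changed list), and the paths are written NUL-separated so that names containing newlines survive when rsync is run with --from0.

```python
def to_rsync_files_from(paths, root):
    """Return NUL-separated paths relative to root, as bytes.

    Intended as input for 'rsync --from0 --files-from=FILE'.
    Paths outside root, and root itself, are skipped.
    """
    root = root.rstrip("/")
    rel = []
    for p in paths:
        if not p.startswith(root + "/"):
            continue  # the root itself, or something outside it
        rel.append(p[len(root) + 1:])
    if not rel:
        return b""
    # surrogateescape keeps undecodable filename bytes intact
    return ("\0".join(rel) + "\0").encode("utf-8", "surrogateescape")
```

The result could be written to a file and used along the lines of "rsync -a --from0 --files-from=changed.rsync /gpfs/fs1/ desthost:/dest/" (host and paths are placeholders). Note that with --files-from, -a does not imply recursion, which suits a list that already names every changed file. The deleted list would need separate handling on the destination.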

This comes with no warranty ...

We don't use this for backup; Spectrum Protect and mmbackup are our friends ...

Simon

On 10/03/2021, 02:22, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski" <novosirj at rutgers.edu> wrote:

    Yup, you want to use the policy engine:

    https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm

    Something in here ought to help. We do something like this (but I’m reluctant to provide examples as I’m actually suspicious that we don’t have it quite right and are passing far too much stuff to rsync).

    --
    #BlackLivesMatter
    ____
    || \\UTGERS,  	 |---------------------------*O*---------------------------
    ||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
    || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
    ||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
         `'

    > On Mar 9, 2021, at 9:19 PM, William Burke <bill.burke.860 at gmail.com> wrote:
    > 
    >  I would like to know what files were modified/created/deleted (only for the current day) on the GPFS's file system so that I could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9
    > 
    > Is there a way to access the GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If I use the rsync tool it will scan the file system, which is 400+ million files. Obviously it will be problematic to complete a scan in a day, if it would ever complete single-threaded. There are tools or scripts that run multithreaded rsync, but it's still a brute-force attempt, and it would be nice to know the delta of files that have changed.
    > 
    > I began looking at Spectrum Scale Data Management (DM) API but I am not sure if this is the best approach to looking at the GPFS metadata - inodes, modify times, creation times, etc.
    > 
    > 
    > 
    > -- 
    > 
    > Best Regards,
    > 
    > William Burke (he/him)
    > Lead HPC Engineer
    > Advance Research Computing
    > 860.255.8832 m | LinkedIn
    > _______________________________________________
    > gpfsug-discuss mailing list
    > gpfsug-discuss at spectrumscale.org
    > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
