[gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync

Stephen Ulmer ulmer at ulmer.org
Thu Mar 11 13:17:30 GMT 2021


I’m going to ask what may be a dumb question:

Given that you have GPFS on both ends, what made you decide to NOT use AFM?

 -- 
Stephen


> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico <enrico.tagliavini at fmi.ch> wrote:
> 
> Hello William,
> 
> I got your email forwarded by another user and decided to subscribe to give you my two cents.
> 
> I would like to warn you about the risk of doing what you have in mind. Using the GPFS policy engine to get a list of files to rsync
> can easily leave you with missing data in the backup. The problem is that there are cases it does not cover. For example, if you mv a
> folder with a lot of nested subfolders and files, none of the subfolders would show up in your list of files to be updated.
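> 
> To make the pitfall concrete, here is a minimal sketch of such a policy (rule, list and path names are only placeholders):
> 
>     /* external list target; with "mmapplypolicy -P <rules> -f <prefix> -I defer"
>        the matching paths typically end up in <prefix>.list.daily */
>     RULE EXTERNAL LIST 'daily' EXEC ''
>     /* every file whose data was modified within roughly the last day */
>     RULE 'recently-modified' LIST 'daily'
>         WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1
> 
> Now do something like "mv /gpfs/fs0/projA /gpfs/fs0/projA-old": only the renamed directory and its parent get new timestamps, the mtime
> and ctime of everything underneath stay untouched, so none of those files match the rule, while on the backup they still sit under the
> old path.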
> 
> The DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss any event, which scares me
> enough not to go that route.
> 
> What I ended up doing instead: we run GPFS on both sides, main and backup storage, so I use the policy engine on both sides and just
> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that I add all the files whose
> ctime changed in the last couple of days (to update the metadata info).
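> 
> In rough outline it looks like this (a simplified sketch, not our actual scripts: host names, paths and rule names are invented, and
> the real thing needs more care with the list format and error handling):
> 
>     /* inventory policy, applied on both the main and the backup cluster:
>        one list entry per file, annotated with size and mtime */
>     RULE EXTERNAL LIST 'allfiles' EXEC ''
>     RULE 'inventory' LIST 'allfiles'
>         SHOW(VARCHAR(FILE_SIZE) || ' ' || VARCHAR(MODIFICATION_TIME))
> 
>     # generate the inventory (same command on the backup cluster with its
>     # own file system name and prefix)
>     mmapplypolicy gpfs0 -P inventory.pol -f /tmp/primary -I defer
> 
>     # after stripping the per-cluster inode/generation prefix from both
>     # lists and sorting them, keep the entries that only exist on the
>     # primary side and reduce each one to its path
>     comm -23 primary.sorted backup.sorted | sed 's/.* -- //' > delta.paths
> 
>     # rsync just those paths; with --files-from, leading slashes are
>     # stripped and the paths are taken relative to the source argument
>     rsync -a --files-from=delta.paths / backuphost:/backup/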
> 
> Good luck.
> Kind regards.
> 
> -- 
> 
> Enrico Tagliavini
> Systems / Software Engineer
> 
> enrico.tagliavini at fmi.ch
> 
> Friedrich Miescher Institute for Biomedical Research
> Informatics
> 
> Maulbeerstrasse 66
> 4058 Basel
> Switzerland
> 
> 
> 
> 
> -------- Forwarded Message --------
>> 
>> -----Original Message-----
>> From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ryan Novosielski
>> Sent: Wednesday, March 10, 2021 3:22 AM
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync
>> 
>> Yup, you want to use the policy engine:
>> 
>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm
>> 
>> Something in here ought to help. We do something like this (but I’m reluctant to provide examples as I’m actually suspicious that we
>> don’t have it quite right and are passing far too much stuff to rsync).
>> 
>> --
>> #BlackLivesMatter
>> ____
>> || \\UTGERS,     |---------------------------*O*---------------------------
>> ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>>      `'
>> 
>>>> On Mar 9, 2021, at 9:19 PM, William Burke <bill.burke.860 at gmail.com> wrote:
>>> 
>>> I would like to know which files were modified/created/deleted (only for the current day) on the GPFS file system so that I
>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9.
>>> 
>>> Is there a way to access GPFS's metadata directly so that I do not have to traverse the filesystem looking for these files? If
>>> I use the rsync tool it will scan the file system, which holds 400+ million files. Obviously it will be problematic to complete a
>>> scan in a day, if it would ever complete single-threaded. There are tools and scripts that run multithreaded rsync, but that is
>>> still a brute-force approach, and it would be nice to know what the delta of changed files is.
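>>> 
>>> (The parallel variant I have seen just splits a pre-built path list into chunks and runs one rsync per chunk, roughly like this --
>>> the paths and degree of parallelism are placeholders:
>>> 
>>>     # split the path list into 8 chunks and run up to 8 rsyncs at once
>>>     split -n l/8 filelist.txt /tmp/chunk.
>>>     ls /tmp/chunk.* | xargs -P 8 -I{} \
>>>         rsync -a --files-from={} / backuphost:/backup/
>>> 
>>> but without a cheap way to build that list in the first place, it is still bounded by the full scan.)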
>>> 
>>> I began looking at the Spectrum Scale Data Management (DM) API, but I am not sure whether this is the best approach for looking at
>>> the GPFS metadata - inodes, modification times, creation times, etc.
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Best Regards,
>>> 
>>> William Burke (he/him)
>>> Lead HPC Engineer
>>> Advanced Research Computing
>>> 860.255.8832 m | LinkedIn
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss