[gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync

Stephen Ulmer ulmer at ulmer.org
Thu Mar 11 13:47:44 GMT 2021


Thank you! Would you mind letting me know in what era you made your evaluation?

I’m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product.

Different clients have different requirements, so every implementation could be different. When I add someone else’s judgement to my own, I just like getting as close to their actual evaluation scenario as possible.

Your original post was very thoughtful, and I appreciate your time.

 -- 
Stephen


> On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico <enrico.tagliavini at fmi.ch> wrote:
> 
> 
> Hello Stephen,
> 
> actually not a dumb question at all. We evaluated AFM quite a bit before turning it down.
> 
> The horror stories about it and about massive data loss are too scary, and we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe; we don't need active/active DR or anything like that. While AFM can technically do what we need, the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worse, data loss. We are a very small institute with a small IT team, so investing the time to get it right was also not really worth it due to the high TCO.
> 
> Kind regards.
> 
>  -- 
> Enrico Tagliavini
> Systems / Software Engineer
> 
> enrico.tagliavini at fmi.ch
> 
> Friedrich Miescher Institute for Biomedical Research
> Informatics
> 
> Maulbeerstrasse 66
> 4058 Basel
> Switzerland
> 
> 
> 
> 
>> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote:
>> I’m going to ask what may be a dumb question:
>> 
>> Given that you have GPFS on both ends, what made you decide to NOT use AFM?
>> 
>>  -- 
>> Stephen
>> 
>> 
>>> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico <enrico.tagliavini at fmi.ch> wrote:
>>> 
>>> Hello William,
>>> 
>>> Your email was forwarded to me by another user, and I decided to subscribe to give you my two cents.
>>> 
>>> I would like to warn you about the risk of doing what you have in mind. Using the GPFS policy engine to get a list of files to rsync can
>>> easily leave you with missing data in the backup. The problem is that there are cases it does not cover. For example,
>>> if you mv a folder with a lot of nested subfolders and files, none of the subfolders will show up in your list of files to be updated.
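>>>
>>> To make the failure mode concrete (hypothetical paths, not a real incident):
>>>
>>> mv /gpfs/fs0/projects/projA /gpfs/fs0/archive/projA
>>> # Only projA itself and the two parent directories get fresh ctimes. The
>>> # files and subfolders below projA keep their old mtime/ctime, so a rule like
>>> #   WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
>>> # never lists them, and an rsync driven by that list never recreates them
>>> # under the new path on the backup side.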
>>> 
>>> The DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss a single event, which scares me
>>> enough not to go down that route.
>>> 
>>> What I ended up doing instead: we run GPFS on both sides, main and backup storage, so I use the policy engine on both sides and just
>>> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that I add all the files whose
>>> ctime changed in the last couple of days (to refresh the metadata).
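>>>
>>> Stripped down, the idea is something like this (untested sketch with placeholder file system names and paths, not our actual scripts):
>>>
>>> # same policy on both clusters: list every file together with its path
>>> cat > /tmp/all.pol <<'EOF'
>>> RULE 'ext' EXTERNAL LIST 'all' EXEC ''
>>> RULE 'every' LIST 'all'   /* add SHOW(...) if you also want size/mtime in the diff */
>>> EOF
>>> mmapplypolicy gpfs_main   -P /tmp/all.pol -f /tmp/main   -I defer    # on the main cluster
>>> mmapplypolicy gpfs_backup -P /tmp/all.pol -f /tmp/backup -I defer    # on the backup cluster
>>>
>>> # reduce each list ("inode gen snapid -- /path" per line) to sorted paths and diff them
>>> sed 's/^.* -- //' /tmp/main.list.all   | sort > /tmp/main.paths
>>> sed 's/^.* -- //' /tmp/backup.list.all | sort > /tmp/backup.paths
>>> comm -23 /tmp/main.paths /tmp/backup.paths > /tmp/copy-to-backup     # missing or moved on the backup
>>> comm -13 /tmp/main.paths /tmp/backup.paths > /tmp/delete-on-backup   # no longer on the main side
>>>
>>> # plus a second LIST rule such as
>>> #   WHERE (CURRENT_TIMESTAMP - CHANGE_TIME) < INTERVAL '2' DAYS
>>> # whose output gets rsync'ed unconditionally to pick up content and metadata changes.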
>>> 
>>> Good luck.
>>> Kind regards.
>>> 
>>> -- 
>>> 
>>> Enrico Tagliavini
>>> Systems / Software Engineer
>>> 
>>> enrico.tagliavini at fmi.ch
>>> 
>>> Friedrich Miescher Institute for Biomedical Research
>>> Informatics
>>> 
>>> Maulbeerstrasse 66
>>> 4058 Basel
>>> Switzerland
>>> 
>>> 
>>> 
>>> 
>>> -------- Forwarded Message --------
>>>> 
>>>> -----Original Message-----
>>>> From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ryan Novosielski
>>>> Sent: Wednesday, March 10, 2021 3:22 AM
>>>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>>> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync
>>>> 
>>>> Yup, you want to use the policy engine:
>>>> 
>>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm
>>>> 
>>>> Something in here ought to help. We do something like this (but I’m reluctant to provide examples as I’m actually suspicious that we
>>>> don’t have it quite right and are passing far too much stuff to rsync).
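>>>>
>>>> The general shape from that page is roughly the following (heavily simplified, placeholder device name and paths, definitely not our production rules):
>>>>
>>>> cat > /tmp/changed.pol <<'EOF'
>>>> /* list files modified within the last day */
>>>> RULE 'ext' EXTERNAL LIST 'changed' EXEC ''
>>>> RULE 'daily' LIST 'changed'
>>>>   WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
>>>> EOF
>>>> mmapplypolicy gpfs0 -P /tmp/changed.pol -f /tmp/daily -I defer
>>>> # -I defer only writes the candidate list (/tmp/daily.list.changed, one
>>>> # "inode gen snapid -- /path" record per line) instead of executing anything;
>>>> # strip everything up to " -- " to get plain paths for rsync (naive, beware
>>>> # filenames with unusual characters).
>>>> sed 's/^.* -- //' /tmp/daily.list.changed > /tmp/changed-today.txt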
>>>> 
>>>> --
>>>> #BlackLivesMatter
>>>> ____
>>>> || \\UTGERS,    |---------------------------*O*---------------------------
>>>> ||_// the State |         Ryan Novosielski - novosirj at rutgers.edu
>>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>>>> ||  \\    of NJ | Office of Advanced Research Computing - MSB C630, Newark
>>>>      `'
>>>> 
>>>>>> On Mar 9, 2021, at 9:19 PM, William Burke <bill.burke.860 at gmail.com> wrote:
>>>>> 
>>>>> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS file system so that I
>>>>> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9.
>>>>> 
>>>>> Is there a way to access the GPFS metadata directly so that I do not have to traverse the filesystem looking for these files? If
>>>>> I use the rsync tool it will scan the file system, which is 400+ million files. Obviously it will be problematic to complete such a
>>>>> scan in a day, if it would ever complete single-threaded. There are tools and scripts that run multithreaded rsync, but that is still a
>>>>> brute-force approach, and it would be nice to know the delta of files that have changed.
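>>>>>
>>>>> In other words, if I could pull the day's changes out of the metadata, I would just feed that list to rsync, something like this (hypothetical; assumes the list exists with one absolute path per line):
>>>>>
>>>>> rsync -a --files-from=/tmp/changed-today.txt / backuphost:/gpfs-backup/
>>>>> # --files-from turns on --relative, so the listed paths are recreated under
>>>>> # the destination; producing /tmp/changed-today.txt is the part I'm missing.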
>>>>> 
>>>>> I began looking at the Spectrum Scale Data Management (DM) API, but I am not sure whether it is the best way to get at the GPFS
>>>>> metadata: inodes, modify times, creation times, etc.
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best Regards,
>>>>> 
>>>>> William Burke (he/him)
>>>>> Lead HPC Engineer
>>>>> Advance Research Computing
>>>>> 860.255.8832 m | LinkedIn
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss