[gpfsug-discuss] Backing up GPFS with Rsync

Ryan Novosielski novosirj at rutgers.edu
Thu Mar 11 16:28:57 GMT 2021


Agreed. Since 5.0.4.1 on the client side (we do rely on AFM for geographically distributed home directories), we have effectively not had any more problems. Our server side is all 5.0.3.2-3.

--
#BlackLivesMatter
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Mar 11, 2021, at 11:08 AM, Steven Daniels <sadaniel at us.ibm.com> wrote:
> 
> Also, be aware that there have been massive improvements in AFM in terms of usability, reliability, and performance.
> 
> I just completed a project where we moved about 3/4 PB during 7x24 operations to retire a very old storage system (1st-gen IBM GSS) onto a new ESS. We were able to get considerable performance, though not without effort, and it allowed the client to continue operations and migrate to the new hardware seamlessly.
> 
> The new v5.1 AFM feature supports filesystem-level AFM, which would have greatly simplified the effort and which I believe will make AFM vastly easier to implement in the general case. 
> 
> I'll leave it to Venkat and others on the development team to share more details about improvements. 
> 
> 
> Steven A. Daniels
> Cross-brand Client Architect
> Senior Certified IT Specialist
> National Programs
> Fax and Voice: 3038101229
> sadaniel at us.ibm.com
> http://www.ibm.com
> 
> From:  Stephen Ulmer <ulmer at ulmer.org>
> To:  gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc:  bill.burke.860 at gmail.com
> Date:  03/11/2021 06:47 AM
> Subject:  [EXTERNAL] Re: [gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync
> Sent by:  gpfsug-discuss-bounces at spectrumscale.org
> 
> 
> 
> 
> Thank you! Would you mind letting me know in what era you made your evaluation?
> 
> I’m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product.
> 
> Different clients have different requirements, so every implementation could be different. When I add someone else’s judgement to my own, I just like getting as close to their actual evaluation scenario as possible.
> 
> Your original post was very thoughtful, and I appreciate your time.
> 
> -- 
> Stephen
> 
> On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico <enrico.tagliavini at fmi.ch> wrote:
> 
>  
> Hello Stephen,
> 
> actually not a dumb question at all. We evaluated AFM quite a bit before turning it down.
> 
> The horror stories about it and about massive data loss are too scary, and we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data safe; we don't need active/active DR or anything like that. While AFM can technically do what we need, the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worse, data loss. We are a very small institute with a small IT team, so investing the time to get it right was not really worth the high TCO.
> 
> Kind regards.
> 
> -- 
> Enrico Tagliavini
> Systems / Software Engineer
> 
> enrico.tagliavini at fmi.ch
> 
> Friedrich Miescher Institute for Biomedical Research
> Informatics
> 
> Maulbeerstrasse 66
> 4058 Basel
> Switzerland
> 
> 
> 
> 
> 
> On Thu, 2021-03-11 at 08:17 -0500, Stephen Ulmer wrote:
> I’m going to ask what may be a dumb question:
> 
> Given that you have GPFS on both ends, what made you decide to NOT use AFM?
> 
> --  
> Stephen
> 
> 
> On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico <enrico.tagliavini at fmi.ch> wrote:
> 
> Hello William,
> 
> I got your email forwarded by another user and decided to subscribe to give you my two cents.
> 
> I would like to warn you about the risk of doing what you have in mind. Using the GPFS policy engine to get a list of files to rsync can
> easily leave you with missing data in the backup. The problem is that there are cases it does not cover. For example,
> if you mv a folder with a lot of nested subfolders and files, none of those subfolders will show up in your list of files to be updated.
> 
> The DM API would be the way to go, as you could replicate the mv on the backup side, but you must not miss a single event, which scares me
> enough not to go that route.
> 
> What I ended up doing instead: we run GPFS on both sides, main and backup storage, so I use the policy engine on both sides and just
> build up the differences. We have about 250 million files and this is surprisingly fast. On top of that I add all the files whose
> ctime changed in the last couple of days (to pick up metadata updates).
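> 
> A rough sketch of that workflow (filesystem paths, file names and rule names here are placeholders, not our actual setup) could look something like this:
> 
> /* backup.pol: list every path, plus anything whose ctime changed in the last two days */
> RULE EXTERNAL LIST 'allfiles' EXEC ''
> RULE EXTERNAL LIST 'recent'   EXEC ''
> RULE 'all'    LIST 'allfiles'
> RULE 'recent' LIST 'recent'
>      WHERE ( DAYS(CURRENT_TIMESTAMP) - DAYS(CHANGE_TIME) ) <= 2
> 
> # -I defer only writes the list files (<prefix>.list.<listname>), it takes no action.
> # Run the same policy on the backup cluster and copy its /tmp/bkp.list.allfiles back here.
> mmapplypolicy /gpfs/main -P backup.pol -I defer -f /tmp/main
> 
> # Each list record ends with " -- /full/path" (check the exact format on your release).
> # Keep just the paths, sort, take what the backup side is missing, add the recently
> # changed files, and hand the result straight to rsync so it never walks the tree.
> sed 's/.* -- //' /tmp/main.list.allfiles | sort > /tmp/main.paths
> sed 's/.* -- //' /tmp/bkp.list.allfiles  | sort > /tmp/bkp.paths
> comm -23 /tmp/main.paths /tmp/bkp.paths          > /tmp/to_sync
> sed 's/.* -- //' /tmp/main.list.recent          >> /tmp/to_sync
> sort -u /tmp/to_sync -o /tmp/to_sync
> 
> # Absolute paths in the list are taken relative to the "/" source argument.
> rsync -aH --files-from=/tmp/to_sync / backupserver:/backup/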
> 
> Good luck.
> Kind regards.
> 
> -- 
> 
> Enrico Tagliavini
> Systems / Software Engineer
> 
> enrico.tagliavini at fmi.ch
> 
> Friedrich Miescher Institute for Biomedical Research
> Informatics
> 
> Maulbeerstrasse 66
> 4058 Basel
> Switzerland
> 
> 
> 
> 
> -------- Forwarded Message --------
> 
> -----Original Message-----
> From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Ryan Novosielski
> Sent: Wednesday, March 10, 2021 3:22 AM
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync
> 
> Yup, you want to use the policy engine:
> 
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm
> 
> Something in here ought to help. We do something like this (but I’m reluctant to provide examples as I’m actually suspicious that we
> don’t have it quite right and are passing far too much stuff to rsync).
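> 
> As a purely generic illustration of the kind of rule that page describes (the filesystem name and paths are placeholders, not our production policy), something like this lists everything modified within roughly the last day:
> 
> /* daily.pol: files modified within roughly the last day */
> RULE EXTERNAL LIST 'changed' EXEC ''
> RULE 'daily' LIST 'changed'
>      WHERE ( DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME) ) <= 1
> 
> # -I defer writes the matches to /tmp/daily.list.changed without acting on them;
> # that list (the path is the last field of each record) is what gets handed to rsync.
> mmapplypolicy /gpfs/fs0 -P daily.pol -I defer -f /tmp/daily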
> 
> --
> #BlackLivesMatter
> ____
> || \\UTGERS,     |---------------------------*O*---------------------------
> ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>      `'
> 
> On Mar 9, 2021, at 9:19 PM, William Burke <bill.burke.860 at gmail.com> wrote:
> 
> I would like to know what files were modified/created/deleted (only for the current day) on the GPFS file system so that I
> could rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9.
> 
> Is there a way to access the GPFS metadata directly so that I do not have to traverse the filesystem looking for these files? If
> I use the rsync tool it will scan the file system, which holds 400+ million files. Obviously it will be problematic to complete a
> scan in a day, if a single-threaded scan would ever complete at all. There are tools and scripts that run rsync multithreaded, but that is
> still a brute-force approach, and it would be nice to know just the delta of files that have changed.
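> 
> Ideally I would end up with a plain list of changed paths that I could hand straight to rsync, something along these lines (host names and paths are made up):
> 
> # /tmp/changed_today: one absolute path per line; paths are taken relative to the "/" source
> rsync -aH --files-from=/tmp/changed_today / backuphost:/backup/gpfs/
> 
> # or split the list into chunks and run a few rsyncs in parallel
> split -n l/8 /tmp/changed_today /tmp/chunk.
> for c in /tmp/chunk.*; do
>     rsync -aH --files-from="$c" / backuphost:/backup/gpfs/ &
> done; wait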
> 
> I began looking at the Spectrum Scale Data Management (DM) API, but I am not sure it is the best approach for looking at the GPFS
> metadata: inodes, modify times, creation times, etc.
> 
> 
> 
> --
> 
> Best Regards,
> 
> William Burke (he/him)
> Lead HPC Engineer
> Advance Research Computing
> 860.255.8832 m | LinkedIn
> 
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


