I'm going to ask what may be a dumb question:

Given that you have GPFS on both ends, what made you decide NOT to use AFM?

--
Stephen


On Mar 11, 2021, at 3:56 AM, Tagliavini, Enrico <enrico.tagliavini@fmi.ch> wrote:

Hello William,

Your email was forwarded to me by another user, and I decided to subscribe to give you my two cents.

I would like to warn you about the risk of doing what you have in mind. Using the GPFS policy engine to build a list of files to rsync can easily leave you with missing data in the backup, because there are cases it does not cover. For example, if you mv a folder with many nested subfolders and files, none of those subfolders will show up in your list of files to be updated.

The DM API would be the way to go, since you could replicate the mv on the backup side, but you must not miss a single event, which scares me enough not to go that route.

What I ended up doing instead: we run GPFS on both sides, main and backup storage, so I run the policy engine on both sides and simply compute the differences between the two listings. We have about 250 million files and this is surprisingly fast. On top of that, I add every file whose ctime changed in the last couple of days, to pick up metadata updates.
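Very roughly, the comparison step looks like this (a sketch only: the list file names are made up, and each side's list is assumed to come from an mmapplypolicy LIST rule that prints full paths):

# Path listings produced by the policy engine on each cluster
sort main.list   > main.sorted
sort backup.list > backup.sorted

# Present on main but not on backup: needs to be copied
comm -23 main.sorted backup.sorted > to_copy.list

# Present on backup but not on main: candidate for deletion
comm -13 main.sorted backup.sorted > to_delete.list

# Plus everything whose ctime changed recently, to refresh metadata
# (ctime_recent.list would come from a separate policy listing)
sort -u to_copy.list ctime_recent.list -o to_copy.list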
Good luck.
Kind regards.

--

Enrico Tagliavini
Systems / Software Engineer

enrico.tagliavini@fmi.ch

Friedrich Miescher Institute for Biomedical Research
Informatics

Maulbeerstrasse 66
4058 Basel
Switzerland


-------- Forwarded Message --------

-----Original Message-----
From: gpfsug-discuss-bounces@spectrumscale.org <gpfsug-discuss-bounces@spectrumscale.org> On Behalf Of Ryan Novosielski
Sent: Wednesday, March 10, 2021 3:22 AM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Backing up GPFS with Rsync

Yup, you want to use the policy engine:

https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm

Something in here ought to help. We do something like this (but I'm reluctant to share our examples, as I actually suspect we don't have it quite right and are passing far too much to rsync).
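For the archives, the general shape of such a rule, sketched purely from the documented attributes (not our policy; the file system name, policy path, and list prefix are all invented):

/tmp/daily.pol:

RULE EXTERNAL LIST 'daily' EXEC ''
RULE 'recent' LIST 'daily'
  WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
     OR (CURRENT_TIMESTAMP - CHANGE_TIME) < INTERVAL '1' DAYS

Then run something like:

# With EXEC '' and -I defer, mmapplypolicy writes the candidate list to
# <prefix>.list.<listname> (here /tmp/gpfsbackup.list.daily) instead of
# invoking an external script for each match.
mmapplypolicy gpfs0 -P /tmp/daily.pol -I defer -f /tmp/gpfsbackup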
--
#BlackLivesMatter
____
\\UTGERS,     |---------------------------*O*---------------------------
_// the State |         Ryan Novosielski - novosirj@rutgers.edu
\\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
 \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

On Mar 9, 2021, at 9:19 PM, William Burke <bill.burke.860@gmail.com> wrote:

I would like to know which files were modified, created, or deleted (for the current day only) on the GPFS file system, so that I can rsync ONLY those files to a predetermined external location. I am running GPFS 4.2.3.9.

Is there a way to access GPFS metadata directly so that I do not have to traverse the filesystem looking for these files? If I use the rsync tool it will scan the whole file system, which holds 400+ million files. Obviously it is problematic to complete such a scan within a day, if it would ever finish single-threaded at all. There are tools and scripts that run rsync multithreaded, but that is still a brute-force approach, and it would be nice to know the delta of files that have changed instead.
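Roughly, what I am hoping to end up with is something along these lines (file and host names invented, just to show the shape of it):

# changed-today.txt: a list of paths changed today, derived from GPFS
# metadata rather than from walking all 400+ million files.
# Paths given to --files-from are taken relative to the source directory.
rsync -a --files-from=/var/tmp/changed-today.txt /gpfs/fs0/ backuphost:/backup/fs0/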
I began looking at the Spectrum Scale Data Management (DM) API, but I am not sure whether that is the best way to get at the GPFS metadata (inodes, modification times, creation times, and so on).


--

Best Regards,

William Burke (he/him)
Lead HPC Engineer
Advanced Research Computing
860.255.8832 m | LinkedIn

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss