[gpfsug-discuss] Fwd: FW: Backing up GPFS with Rsync

Tagliavini, Enrico enrico.tagliavini at fmi.ch
Thu Mar 11 14:24:43 GMT 2021


We evaluated AFM multiple times. The first time was in 2017 with Spectrum Scale 4.2. When we switched to Spectrum Scale 5 not long ago we also re-evaluated AFM.

The horror stories about data loss are becoming rarer with modern setups, especially outside the DR scenario. However, AFM is still a very complicated tool, way too complicated if what you are looking for is a "simple" rsync-style backup (but faster). The 3000+ pages of GPFS documentation do not help our small team, and a good chunk of those pages is dedicated to AFM alone.

The performance problem is also still a real issue with modern versions, from what I was told. Data turnover in our setup can be quite erratic, tied to very big scientific instruments capable of generating many TB of data per hour, so good performance is important. I used the same tool we use for backups to migrate the data from the old storage to the new one (and from GPFS 4 to GPFS 5), and I managed to reach transfer speeds of 17-19 GB/s (when hitting big files, that is) using only two servers equipped with InfiniBand EDR. I wrote a simple script to parallelize rsync to make it faster: https://github.com/fmi-basel/splitrsync . I combined it with another program that uses the policy engine to generate the file list, avoiding the painful filesystem crawl.
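
To give an idea of the approach (this is not the actual splitrsync code, just a minimal sketch in Python): split a pre-generated file list into chunks and run one rsync per chunk with --files-from. The source/destination paths, chunk-file location and process count below are made-up assumptions; in practice the file list would come from a policy engine LIST rule rather than a filesystem crawl.

#!/usr/bin/env python3
# Minimal sketch: parallelize rsync over a pre-generated file list.
# Not the real splitrsync code; paths and NPROC are illustrative assumptions.
import subprocess
import sys

SRC = "/gpfs/old/"   # assumed source path (illustrative)
DST = "/gpfs/new/"   # assumed destination path (illustrative)
NPROC = 8            # number of parallel rsync processes (tunable)

def split_list(path, n):
    """Split the file list (one relative path per line) into n chunk files."""
    with open(path) as f:
        lines = f.readlines()
    size = (len(lines) + n - 1) // n
    names = []
    for i in range(n):
        name = "/tmp/rsync.chunk.%d" % i
        with open(name, "w") as out:
            out.writelines(lines[i * size:(i + 1) * size])
        names.append(name)
    return names

def main(filelist):
    procs = []
    for chunk in split_list(filelist, NPROC):
        # -a preserves attributes; --files-from reads paths relative to SRC
        cmd = ["rsync", "-a", "--files-from=" + chunk, SRC, DST]
        procs.append(subprocess.Popen(cmd))
    # exit with the worst return code of all rsync processes
    sys.exit(max(p.wait() for p in procs))

if __name__ == "__main__":
    main(sys.argv[1])

The real tool does more (null-delimited lists, deletions, retries), but the core idea is simply fanning the file list out to several rsync processes.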

As I said we are a small team, so we have to be efficient. Developing that tool cost me time, but the ROI is there since I can use the same tool with non-GPFS storage systems, and we had many occasions where this was the case, for example when moving data from old systems being decommissioned onto the GPFS storage.

And I would like to finally mention another hot topic: who says we will be on GPFS forever? The recent licensing change would probably destroy our small IT budget and we would not be able to afford Spectrum Scale any longer; we might be forced to switch to a cheaper solution. At least this way we can carry some of the code we wrote with us, whereas with AFM we would have to start from scratch. Originally we were not really planning to move, as we didn't expect this change in licensing and the associated increase in cost, but now this turns out to save us some time if we indeed have to switch.

Kind regards.


--


Enrico Tagliavini
Systems / Software Engineer

enrico.tagliavini at fmi.ch

Friedrich Miescher Institute for Biomedical Research
Informatics

Maulbeerstrasse 66
4058 Basel
Switzerland





On Thu, 2021-03-11 at 08:47 -0500, Stephen Ulmer wrote:
Thank you! Would you mind letting me know in what era you made your evaluation?

I’m not suggesting you should change anything at all, but when I make recommendations for my own customers I like to be able to associate the level of GPFS with the anecdotes. I view the software as more of a stream of features and capabilities than as a set product.

Different clients have different requirements, so every implementation could be different. When I add someone else’s judgement to my own, I just like getting as close to their actual evaluation scenario as possible.

Your original post was very thoughtful, and I appreciate your time.

--
Stephen


On Mar 11, 2021, at 7:58 AM, Tagliavini, Enrico <enrico.tagliavini at fmi.ch> wrote:


Hello Stephen,

actually not a dumb question at all. We evaluated AFM quite a bit before turning it down.

The horror stories about it and massive data loss are too scary, plus we had actual reports of very bad performance. Personally I think AFM is very complicated, overcomplicated for what we need. We need the data to be safe; we don't need active/active DR or anything like that. While AFM can technically do what we need, the complexity of its design makes it too easy to make a mistake and cause a service disruption or, even worse, data loss. We are a very small institute with a small IT team, so investing the time to get it right was not really worth it given the high TCO.

Kind regards.


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss