[gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

Lukas Hejtmanek xhejtman at ics.muni.cz
Tue Aug 30 23:02:50 BST 2016


Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
> 
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

Thank you for the paper; I appreciate it.

However, I wonder whether it could be extended a little. Since it is titled
Petascale Data Protection, I think that at petabyte scale you have to deal
with millions (or rather hundreds of millions) of stored files, and that is
something where TSM does not scale well.

Could you give some hints:

On the backup side:
mmbackup takes ages for:
a) the scan phase (try scanning 500M files, even in parallel)
b) the backup phase - if, say, 10 % of the files change, the backup process
can be blocked for several days, because mmbackup cannot run as several
instances on the same file system, so you have to wait until one mmbackup
run finishes. How long could that take at petascale? (The only workaround I
can think of is sketched below.)
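
The only partial workaround I can think of here - purely my assumption, not
anything the paper suggests - is to split the file system into independent
filesets and run one mmbackup per fileset, roughly like this (untested
sketch; the fileset paths are made up, and I am not sure --scope inodespace
behaves well at this scale):

  # assumed layout: fset1..fsetN are junctions of independent filesets
  # that partition the namespace
  for fset in /gpfs/fs0/fset1 /gpfs/fs0/fset2 /gpfs/fs0/fset3; do
      # each instance scans and sends only its own inode space,
      # so the scans run in parallel instead of serializing
      mmbackup "$fset" --scope inodespace -t incremental &
  done
  wait

But that only helps if the data is already organized into filesets, and the
metadata scan per fileset is still huge.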

On the restore side:
how can I efficiently restore, e.g., 40 million files? dsmc restore '/path/*'
runs into serious trouble after, say, 20M files (perhaps the wrong internal
data structures are used): at that point, scanning 1000 more files takes
several minutes, so the dsmc restore never reaches the 40M files.
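
One direction I have been considering - just a sketch, assuming the
top-level directories under /path still exist locally (otherwise the names
would have to come from dsmc query backup) - is to partition the restore by
subtree and run several dsmc instances in parallel:

  # assumption: the subdirectories of /path split the 40M files
  # into reasonably even chunks
  for d in /path/*/; do
      # one dsmc session per subtree; -subdir=yes descends into it
      dsmc restore "${d}*" -subdir=yes &
  done
  wait

Whether the server side copes with that many parallel sessions is, of
course, another question.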

Using filelists, the situation is even worse. I ran dsmc restore -filelist
with a filelist of 2.4M files: it ran for *two* days without restoring even
a single file, with dsmc consuming 100 % CPU.
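
The obvious next thing to try - again only a sketch, assuming that smaller
lists avoid whatever blows up at 2.4M entries; the 100000-line chunk size
is a pure guess - would be to split the list and restore the chunks in
parallel:

  # split the 2.4M-entry list into fixed-size chunks
  split -l 100000 filelist.txt chunk.
  for f in chunk.*; do
      dsmc restore -filelist="$f" &
  done
  wait

But with dsmc already spinning at 100 % CPU on a single list, I am not
confident this helps at all.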

So any hints on addressing these issues with a really large number of files
would be even more appreciated.

-- 
Lukáš Hejtmánek


