<div dir="ltr">So let's start with some simple questions. <div><br></div><div>When you say mmbackup takes ages, what version of the GPFS code are you running?</div><div>How do you execute the mmbackup command? The exact parameters would be useful.</div><div>What hardware are you using for the metadata disks?</div><div>How much capacity (df -h) and how many inodes (df -i) do you have in the file system you are trying to back up?</div><div><br></div><div>sven</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek <span dir="ltr">&lt;<a href="mailto:xhejtman@ics.muni.cz" target="_blank">xhejtman@ics.muni.cz</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>

<br>

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:<br>

> Find the paper here:<br>

><br>

> <a href="https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection" rel="noreferrer" target="_blank">https://www.ibm.com/<wbr>developerworks/community/<wbr>wikis/home?lang=en#!/wiki/<wbr>Tivoli%20Storage%20Manager/<wbr>page/Petascale%20Data%<wbr>20Protection</a><br>

<br>

Thank you for the paper; I appreciate it.<br>

<br>

However, I wonder whether it could be extended a little. Given its title,<br>

Petascale Data Protection, I think it matters that at petascale you have to deal<br>

with millions (or rather hundreds of millions) of stored files, and this is<br>

exactly where TSM does not scale well.<br>

<br>

Could you give some hints:<br>

<br>

On the backup side:<br>

mmbackup takes ages for:<br>

a) the scan (try scanning 500M files, even in parallel);<br>

b) the backup itself: if 10% of the files change, the backup process can be<br>

blocked for several days, because mmbackup cannot run as several instances on<br>

the same file system, so you have to wait until one mmbackup run finishes.<br>

How long could this take at petascale?<br>
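<br>
A rough way to frame the "how long at petascale" question: total time is approximately the scan time plus the time to back up the changed files. The Python sketch below assumes fixed, hypothetical scan and backup rates in files per second; these are placeholders, not measurements, and real mmbackup throughput depends on the GPFS version, the metadata hardware and the TSM server setup.<br>

```python
def estimate_hours(total_files: int, changed_fraction: float,
                   scan_rate: float, backup_rate: float) -> float:
    """Back-of-envelope mmbackup duration in hours.

    scan_rate and backup_rate are hypothetical sustained rates in
    files/second; substitute values measured on your own system.
    """
    scan_seconds = total_files / scan_rate          # policy scan over all files
    changed_files = total_files * changed_fraction  # files that must go to TSM
    backup_seconds = changed_files / backup_rate
    return (scan_seconds + backup_seconds) / 3600.0


# 500M files, 10% changed; assumed 100,000 files/s scan, 500 files/s backup.
print(round(estimate_hours(500_000_000, 0.10, 100_000, 500), 1))  # prints 29.2
```

With these assumed rates the backup phase dominates the total, so the sustained per-file backup rate is the number worth measuring first.<br>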

<br>

On the restore side:<br>

How can I efficiently restore, e.g., 40 million files? dsmc restore '/path/*'<br>

runs into serious trouble after, say, 20M files (maybe unsuitable internal data<br>

structures are used); from then on, scanning another 1000 files takes several<br>

minutes, so dsmc restore never reaches the 40M files.<br>

<br>

Using file lists, the situation is even worse. I am running dsmc restore -filelist<br>

with a file list of 2.4M files; it has been running for *two* days without<br>

restoring a single file, and dsmc is consuming 100% CPU.<br>
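<br>
One approach worth testing for very large restores is to split the file list into smaller chunks and run several dsmc restore processes, each with its own list; whether this avoids the 100% CPU behaviour described above is not established here. The Python sketch below only splits a one-path-per-line list and builds (does not run) the commands. The chunk file names are hypothetical, and it assumes the client's -filelist=FILE option form.<br>

```python
from pathlib import Path


def _write_chunk(out_dir: Path, index: int, lines: list[str]) -> Path:
    # Hypothetical naming scheme: filelist.0000, filelist.0001, ...
    path = out_dir / f"filelist.{index:04d}"
    path.write_text("".join(lines))
    return path


def split_filelist(src: Path, out_dir: Path, chunk_size: int) -> list[Path]:
    """Split a one-path-per-line list into files of at most chunk_size paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    buf: list[str] = []
    with src.open() as f:
        for line in f:
            if not line.strip():  # skip blank lines
                continue
            buf.append(line if line.endswith("\n") else line + "\n")
            if len(buf) == chunk_size:
                chunks.append(_write_chunk(out_dir, len(chunks), buf))
                buf = []
    if buf:  # remainder smaller than chunk_size
        chunks.append(_write_chunk(out_dir, len(chunks), buf))
    return chunks


def restore_commands(chunks: list[Path]) -> list[list[str]]:
    # One dsmc invocation per chunk; these could be started a few at a
    # time, e.g. with subprocess.Popen.
    return [["dsmc", "restore", f"-filelist={p}"] for p in chunks]
```

For example, splitting the 2.4M-entry list into chunks of 100,000 paths would give 24 lists.<br>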

<br>

So any hints addressing these issues with really large numbers of files would<br>

be even more appreciated.<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Lukáš Hejtmánek<br>

______________________________<wbr>_________________<br>

gpfsug-discuss mailing list<br>

gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a><br>

<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/<wbr>listinfo/gpfsug-discuss</a><br>

</font></span></blockquote></div><br></div>