<font size=2 face="sans-serif">Diffing file lists can be fast - IF you

keep the file lists sorted by a unique key, e.g. the inode number. <font size=2 face="sans-serif">I believe that's how mmbackup does it.
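<br><br>To make the sorted-list idea concrete, here is a minimal Python sketch of a one-pass set difference. It is only an illustration, not mmbackup's actual code. The (inode, path) record layout, the function name deleted_entries and the sample data are all made up for the example. It also assumes inode numbers are unique within each list; if inodes can be reused between scans, the key would need something extra (for example a generation number).<br>
<pre>
```python
from typing import Iterable, Iterator, Tuple

Entry = Tuple[int, str]  # (inode number, path); hypothetical record layout


def deleted_entries(old: Iterable[Entry], new: Iterable[Entry]) -> Iterator[Entry]:
    """Yield entries whose inode appears in `old` but not in `new`.

    Both inputs must already be sorted by inode number, with unique inodes.
    """
    new_iter = iter(new)
    current = next(new_iter, None)
    for entry in old:
        # Skip past new-list entries with smaller inode numbers.
        while current is not None and current[0] < entry[0]:
            current = next(new_iter, None)
        # No match in the new list means this inode has gone away.
        if current is None or current[0] != entry[0]:
            yield entry


# Made-up sample lists, both sorted by inode number.
yesterday = [(101, "/fs/a"), (205, "/fs/b"), (310, "/fs/c"), (412, "/fs/d")]
today = [(101, "/fs/a"), (310, "/fs/c"), (500, "/fs/e")]
print(list(deleted_entries(yesterday, today)))
```
</pre>
Each list is read exactly once, front to back, so the cost grows linearly with the number of entries and nothing needs to be held in memory beyond the current record from each side.<br><br>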

 Use the classic set difference algorithm.<br></font><br><font size=2 face="sans-serif">Standard diff is designed to do something

else and is terribly slow on large file lists.</font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Edward Wahl <ewahl@osc.edu></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">"Simon Thompson

(Research Computing - IT Services)" <S.J.Thompson@bham.ac.uk></font><br><font size=1 color=#5f5f5f face="sans-serif">Cc:      

 </font><font size=1 face="sans-serif">gpfsug main discussion

list <gpfsug-discuss@spectrumscale.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">02/27/2017 03:51 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [gpfsug-discuss]

Tracking deleted files</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">gpfsug-discuss-bounces@spectrumscale.org</font><br><hr noshade><br><br><br><tt><font size=2>I can think of a couple of ways to do this. Using snapshots seems heavy, but so does using mmbackup unless you are already running it every day.

<br><br>Diff the shadow files? Haha, that could be a _terrible_ idea if you have a couple hundred million files. But it IS possible. <br><br><br>Next, I'm NOT a TSM expert, but I know a bit about it (and I probably stayed at a Holiday Inn Express at least once in my heavy travel days):<br><br>- Query objects using '-ina=yes' and yesterday's date? Might be a touch slow. But it probably uses the next one as its backend:<br><br>- DB2 query inside TSM to see a similar thing. This ought to be the fastest, and I'm sure with a little googling you can work this out. Tivoli MUST know exact dates of deletion, as it uses that and the retention time to know when to purge/reclaim deleted objects from its storage pools<br>(retain extra version or RETEXTRA or retain only version). <br><br>Ed<br><br>On Mon, 27 Feb 2017 13:32:42 +0000<br>"Simon Thompson (Research Computing - IT Services)" <S.J.Thompson@bham.ac.uk><br>wrote:<br><br>> >It has been discussed in the past, but the way to track stuff

is to<br>> >enable HSM and then hook into the DSMAPI. That way you can see

all the<br>> >file creates and deletes "live".  <br>> <br>> Won't work, I already have a "real" HSM client attached

to DMAPI<br>> (dsmrecalld).<br>> <br>> I'm not actually wanting to back up for this use case; we already have<br>> mmbackup running to do those things, but it was a list of deleted

files<br>> that I was after (I just thought it might be easy given mmbackup is<br>> tracking it already).<br>> <br>> Simon<br>> <br>> _______________________________________________<br>> gpfsug-discuss mailing list<br>> gpfsug-discuss at spectrumscale.org<br>> </font></tt><a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss"><tt><font size=2>http://gpfsug.org/mailman/listinfo/gpfsug-discuss</font></tt></a><tt><font size=2><br><br><br><br>-- <br><br>Ed Wahl<br>Ohio Supercomputer Center<br>614-292-9302<br></font></tt><br><BR>