[gpfsug-discuss] TSM and re-linked filesets

Marc A Kaplan makaplan at us.ibm.com
Wed May 4 17:14:44 BST 2016


I think you found your answer:  TSM tracks files by pathname. 

So... if a file had the path /w/x/y/z on Monday but was moved to /w/x/q/p 
on Tuesday, how would TSM "know" it was the same file...?
It wouldn't! To TSM it looks as if you deleted the first file and created 
the second.
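
A minimal sketch of why pathname tracking behaves this way. This is not 
TSM's actual logic; the diff_by_path helper and the two catalog 
dictionaries are hypothetical and only illustrate comparing two snapshots 
keyed by pathname.

```python
def diff_by_path(previous, current):
    """Compare two catalogs keyed by pathname (hypothetical helper).

    Each catalog maps pathname -> per-file metadata. A file that moved
    shows up as one path removed and another path added.
    """
    removed = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    return removed, added


monday = {"/w/x/y/z": {"size": 4096}}
tuesday = {"/w/x/q/p": {"size": 4096}}  # same file, moved

removed, added = diff_by_path(monday, tuesday)
print("treated as deleted:", removed)
print("treated as new:    ", added)
```

Nothing in a pathname-keyed catalog links the old entry to the new one, 
so the move cannot be distinguished from a delete followed by a create.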

Technically there are some other possibilities, and some backup systems 
may use them, but NOT TSM:

1) Record the inode number and generation number and/or creation 
timestamp.   Within a given Posix-ish file system, that uniquely 
identifies the file.
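
A rough sketch of option 1, assuming a Posix-style file system and a move 
within that one file system. The file_identity helper is hypothetical. 
Python's os.stat() does not expose a generation number, so this sketch 
uses only the device and inode numbers.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode) for path.

    Assumption: the generation number is not available from os.stat(),
    so it is omitted here; a real tool would need a file-system-specific
    call to obtain it.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as top:
    old_path = os.path.join(top, "y", "z")
    new_path = os.path.join(top, "q", "p")
    os.makedirs(os.path.dirname(old_path))
    os.makedirs(os.path.dirname(new_path))
    with open(old_path, "w") as f:
        f.write("payload\n")

    before = file_identity(old_path)
    os.rename(old_path, new_path)  # a move within one file system
    after = file_identity(new_path)
    same_file = before == after

print("same identity after move:", same_file)
```

Inode numbers can be reused after a file is deleted, which is why the 
generation number and/or creation timestamp is part of the identity in 
option 1.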

2) Record a strong (cryptographic quality) checksum (hash) of the contents 
of the file.  If two files have the same checksum (hash) then the odds are 
we can use the same backup data for both and don't have to make an extra 
copy in the backup system.  To make the odds really, really "long" you 
want to take into account the "birthday paradox" and use lots and lots of 
bits.  Long odds can also be compared to the probability of losing a file 
due to a bug or an IO error or accident or disaster...
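
To put rough numbers on the "birthday paradox", here is a small sketch 
using the standard birthday-bound approximation. It assumes hash outputs 
are uniformly distributed and that nobody is deliberately constructing 
collisions; the figure of 10^12 files is only an illustrative choice.

```python
import math


def collision_probability(num_files, hash_bits):
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


for bits in (32, 64, 128, 256):
    p = collision_probability(10**12, bits)
    print(f"{bits:3d}-bit hash, 10^12 files: p ~= {p:.3g}")
```

The probability grows with the square of the number of files, so each 
doubling of the file count costs about two bits of safety margin.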

For example, SHA-256 might be strong and long enough for you to believe 
in.
Backup is not generally a cryptographic game, so perhaps you should not 
worry much about some evildoer purposely trying to confound your backup 
system.
OTOH, if you have users who are adversaries, all backing up into the same 
system... In theory one might "destroy" another's backup.

This saves transmission and storage of duplicates, but of course the 
backup system has to read the contents of each suspected new file and 
compute the hash...
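
A minimal sketch of that trade-off, not how any particular backup product 
works. The backup function, the in-memory store and the catalog are all 
hypothetical stand-ins; only hashlib.sha256 is a real library call.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(paths, store, catalog):
    """Hypothetical backup pass: keep each distinct content only once.

    store   maps digest -> file contents (stand-in for the backup server)
    catalog maps pathname -> digest
    """
    for path in paths:
        key = sha256_of_file(path)  # the file is always read in full
        if key not in store:        # only unseen content is "sent"
            with open(path, "rb") as f:
                store[key] = f.read()
        catalog[path] = key


with tempfile.TemporaryDirectory() as top:
    a = os.path.join(top, "a")
    b = os.path.join(top, "b")
    for name in (a, b):
        with open(name, "wb") as f:
            f.write(b"identical contents\n")
    store, catalog = {}, {}
    backup([a, b], store, catalog)

print("paths cataloged:", len(catalog), "copies stored:", len(store))
```

Both paths end up in the catalog but the content is stored once. The cost 
is visible in sha256_of_file: every candidate file is read end to end 
before the system can decide whether anything needs to be sent.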


