[gpfsug-discuss] TSM and re-linked filesets
Marc A Kaplan
makaplan at us.ibm.com
Wed May 4 17:14:44 BST 2016
I think you found your answer: TSM tracks files by pathname.
So... if a file had path /w/x/y/z on Monday, but was moved to /w/x/q/p on
Tuesday, how would TSM "know" it was the same file...?
It wouldn't! To TSM it seems you've deleted the first and created the
second.
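To make that concrete, here is a toy sketch of what any catalog keyed purely on pathname sees after a move. It is an illustration only, not TSM's actual implementation; the function name diff_by_path and the two sets are hypothetical.

```python
def diff_by_path(previous, current):
    """Compare two sets of pathnames the way a path-keyed catalog would.

    Hypothetical helper for illustration; not a TSM API.
    """
    expired = sorted(previous - current)  # paths that vanished since last run
    new = sorted(current - previous)      # paths never seen before
    return expired, new


# Monday's and Tuesday's views of the same (moved) file.
monday = {"/w/x/y/z"}
tuesday = {"/w/x/q/p"}

expired, new = diff_by_path(monday, tuesday)
print("expired:", expired)
print("new:", new)
```

With nothing but the pathname as the key, the old path shows up as gone and the new path as a brand-new file that needs a full backup.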
Technically there are some other possibilities, and some backup systems
may use them, but NOT TSM:
1) Record the inode number and generation number and/or creation
timestamp. Within a given Posix-ish file system, that uniquely
identifies the file.
2) Record a strong (cryptographic quality) checksum (hash) of the contents
of the file. If two files have the same checksum (hash) then the odds are
we can use the same backup data for both and don't have to make an extra
copy in the backup system. To make the odds really, really "long" you
want to take into account the "birthday paradox" and use lots and lots of
bits. Long odds can also be compared to the probability of losing a file
due to a bug or an IO error or accident or disaster...
For example, SHA-256 might be strong and long enough for you to believe in.
Backup is not generally a cryptographic game, so perhaps you should not
worry much about some evil doer purposely trying to confound your backup
system.
OTOH - if you have users who are adversaries, all backing up into the same
system... In theory one might "destroy" another's backup.
Matching on checksums saves transmission and storage of duplicates, but of
course the backup system has to read the contents of each suspected new
file and compute the hash...
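The two alternatives above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how any particular backup product works: the helper names (inode_identity, content_hash, collision_odds) are made up for this example, and since os.stat does not portably expose the generation number, the sketch uses only device plus inode number, which is weaker than option 1 as described because inode numbers can be reused after a delete.

```python
import hashlib
import os
import tempfile


def inode_identity(path):
    """Option 1 (simplified): identify a file by (device, inode number).

    Assumption: the generation number is omitted because os.stat does not
    expose it portably.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def content_hash(path, chunk_size=1 << 20):
    """Option 2: SHA-256 of the file contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def collision_odds(n_files, hash_bits=256):
    """Birthday-paradox approximation: p is roughly n^2 / 2^(bits + 1)."""
    return n_files * n_files / 2.0 ** (hash_bits + 1)


with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "z")
    new = os.path.join(d, "p")
    with open(old, "wb") as f:
        f.write(b"same contents")

    id_before, hash_before = inode_identity(old), content_hash(old)
    os.rename(old, new)  # the "move" from the example above
    id_after, hash_after = inode_identity(new), content_hash(new)

    same_inode = id_before == id_after
    same_hash = hash_before == hash_after

print("same inode after rename:", same_inode)
print("same hash after rename:", same_hash)
print("approx. collision odds for 10**12 files:", collision_odds(10**12))
```

Either key survives the rename, which is exactly what the pathname does not do. Note the cost difference: the inode check needs only a stat call, while the hash requires reading every byte of the file.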