[gpfsug-discuss] Is TSM/HSM 7.1 compatible with GPFS 3.5.0.12 ?
Sven Oehme
oehmes at us.ibm.com
Fri Mar 21 21:39:57 GMT 2014
Hi,
there are various ways to speed up backing up data from GPFS to TSM (which, by the way, is the same problem for most other backup solutions as well), but first one needs to find out what the problem actually is. It can be several completely independent issues, and each of them needs a different solution, so let me explain the different issues and what you can do about them.
The very first challenge is to find out what data has changed. The way TSM does this is by crawling through your filesystem and looking at the mtime of each file to find out which files have changed; think of an ls -Rl on your filesystem root. Depending on how many files you have, this can take days in a large-scale environment (think hundreds of millions of files). There is very little that can be sped up about this process the way it is done; all you can do is put the metadata on faster disks (e.g. SSDs), which will improve the speed of this 'scan phase'. An alternative is to not do this scan with the TSM client at all, but instead let GPFS find out for TSM which files have changed and then share this information with TSM. The function/command in GPFS to do so is called mmbackup:
https://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.gpfs100.doc%2Fbl1adm_backupusingmmbackup.htm
It essentially traverses the GPFS metadata sequentially and in parallel across all nodes and filters out the files that need to be backed up.
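For illustration, an invocation could look roughly like this (a sketch only; the filesystem path, node names and work directory are made up, and the exact options should be checked against the mmbackup documentation for your GPFS level):

  # run an incremental backup of the filesystem mounted at /gpfs/gpfs0,
  # driving the metadata scan and the TSM client sessions in parallel
  # from the listed nodes; -s points temporary work files at local disk
  mmbackup /gpfs/gpfs0 -t incremental -N node1,node2 -s /tmp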
In several customer environments where I was called in to assist with issues like this, this change alone sped up the backup process by multiple orders of magnitude. We had a few customers where this change reduced the scan time from days down to minutes. It's not always that big, but it's usually the largest chunk of the issue.
The second challenge is if you have to back up a very large number (millions) of very small (<32k) files.
The main issue here is that TSM issues a random I/O to GPFS for each file, one at a time, so your throughput directly correlates with the size of the files and the latency of a single file read operation. If you are not on 3.5 TL3 and/or your files don't fit into the inode, it is actually even two random I/Os that are issued per file, as you need to read the metadata followed by the data block for the file.
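(As a quick check, you can list the inode size of an existing filesystem; gpfs0 here is just an example device name:)

  # show the configured inode size in bytes; with 3.5 TL3 and a large
  # enough inode, the data of very small files can live in the inode
  # itself, which avoids the second random read
  mmlsfs gpfs0 -i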
In this scenario you can only do two things:
1. Parallelism - mmbackup again starts multiple processes in parallel to speed up this phase of the backup.
2. Use a 'helper' process to prefetch data for a single TSM client, so all data comes out of cache and the latency of the random reads is eliminated, which increases throughput (see the sketch right after this list).
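A minimal sketch of such a prefetch helper, assuming GNU xargs and a hypothetical newline-separated file list (filelist.txt) of the paths the TSM client is about to back up:

  # warm the GPFS cache ahead of the TSM client by reading the listed
  # files, up to 16 readers in parallel with 32 files each, and
  # throwing the data away; the client then finds the data in cache
  # instead of waiting on random disk reads
  xargs -a filelist.txt -d '\n' -P 16 -n 32 cat > /dev/null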
Without any of this, seeing only a few MB/sec is not uncommon for customers, but with the changes above you are able to back up very large quantities of data.
Hope this helps. Sven
------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
IBM Almaden Research Lab
------------------------------------------
From: Sabuj Pattanayek <sabujp at gmail.com>
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Date: 03/20/2014 06:39 PM
Subject: Re: [gpfsug-discuss] Is TSM/HSM 7.1 compatible with GPFS
3.5.0.12 ?
Sent by: gpfsug-discuss-bounces at gpfsug.org
We're using TSM 7.1 with GPFS 3.5.0.11. At some point we do want to enable the HSM features but haven't had time to properly configure/set them up yet. I had DMAPI enabled on GPFS but was never able to bring the filesystem up with DMAPI enabled; things weren't properly configured at the time and we were missing some pieces (not my post, but same issue):
https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014622591
I'd say that we are having less than optimal performance with TSM, however. We're only able to pull about 4TB a day. It took us 20 days to back up 80TB for our initial full; using rsync/tar piped to tar would probably have taken less than a week. We tried various methods, e.g. using a fast intermediate disk pool, going simultaneously to our 6 LTO6 tape drives, etc., but each "cp" (TSM client) process that TSM would use seemed to be very slow. We tweaked just about every setting to optimize performance, but to no real avail. When going to the disk pool, this is what should have happened:

GPFS => relatively fast random I/O (on par with rsync/tar piped to tar) => TSM disk pool cache => large sequential I/Os for each disk pool volume => tape

This is what really happened:

GPFS => slow random I/O => TSM disk pool cache => slow random I/O => tape

So instead we did:

GPFS => slow random I/O (TSM) => tape
...but it was the same speed as going through the TSM disk pool cache. We closely monitored the network, disk, memory, and CPU on the TSM server, and none of the hardware or capabilities of the server were the bottleneck; it was all in TSM.
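(By 'every setting' I mean mostly the TSM client options in dsm.sys / dsm.opt. One illustrative fragment, value made up and not a recommendation: RESOURCEUTILIZATION controls how many sessions the client may open in parallel to the server.)

  RESOURCEUTILIZATION  10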
If anyone has seen this sort of behavior and has some pointers/hints for improving performance, I'd be glad to hear them.
Thanks,
Sabuj
On Thu, Mar 20, 2014 at 5:21 PM, Grace Tsai <gtsai at slac.stanford.edu>
wrote:
Hi,
Is TSM/HSM 7.1 compatible with GPFS 3.5.0.12 ?
Thanks.
Grace
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss