[gpfsug-discuss] Is TSM/HSM 7.1 compatible with GPFS 3.5.0.12 ?
Sven Oehme
oehmes at us.ibm.com
Fri Mar 21 21:39:57 GMT 2014
Hi,
there are various ways to speed up backing up data from GPFS to TSM (which, by the way, is the same problem for most other backup solutions as well), but first one needs to find out what the problem actually is. It can be several completely independent issues, and each of them needs a different solution, so let me explain the different issues and what you can do about them.
The very first challenge is to find out what data has changed. The way TSM does this is by crawling through your filesystem and looking at the mtime of each file to find out which files have changed; think of an ls -Rl on your filesystem root. Depending on how many files you have, this can take days in a large-scale environment (think hundreds of millions of files). There is very little that can be sped up about this process the way it is done; all you can do is put the metadata on faster disks (e.g. SSDs), which will improve the speed of this 'scan phase'. An alternative is to not do this scan with the TSM client at all, but instead let GPFS find out for TSM which files have changed and then share this information with TSM. The function/command in GPFS to do so is called mmbackup:
https://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.gpfs100.doc%2Fbl1adm_backupusingmmbackup.htm
It essentially traverses the GPFS metadata sequentially and in parallel across all nodes and filters out the files that need to be backed up.
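For illustration, an invocation could look roughly like this (a sketch only; the filesystem path, node names and work directory are made up, and the exact options should be checked against the mmbackup documentation for your GPFS level):

  # run an incremental backup of the filesystem mounted at /gpfs/gpfs0,
  # driving the metadata scan and the TSM client sessions in parallel
  # from the listed nodes; -s points temporary work files at local disk
  mmbackup /gpfs/gpfs0 -t incremental -N node1,node2 -s /tmp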
In several customer environments where I was called in to assist with issues like this, this change alone sped up the backup process by multiple orders of magnitude. We had a few customers where this change reduced the scan time from days down to minutes. It's not always that big, but it's usually the largest chunk of the issue.
The second challenge is if you have to back up a very large number (millions) of very small (<32k) files.
The main issue here is that TSM issues a random I/O to GPFS for each file, one at a time, so your throughput directly correlates with the size of the files and the latency of a single file read operation. If you are not on 3.5 TL3 and/or your files don't fit into the inode, it is actually even two random I/Os that are issued per file, as you need to read the metadata followed by the data block for the file.
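(As a quick check, you can list the inode size of an existing filesystem; gpfs0 here is just an example device name:)

  # show the configured inode size in bytes; with 3.5 TL3 and a large
  # enough inode, the data of very small files can live in the inode
  # itself, which avoids the second random read
  mmlsfs gpfs0 -i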
In this scenario you can only do two things:
1. Parallelism - mmbackup again starts multiple processes in parallel to speed up this phase of the backup.
2. Use a 'helper' process to prefetch data for a single TSM client, so all data comes out of cache and the latency of the random reads is eliminated, which increases throughput (see the sketch right after this list).
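A minimal sketch of such a prefetch helper, assuming GNU xargs and a hypothetical newline-separated file list (filelist.txt) of the paths the TSM client is about to back up:

  # warm the GPFS cache ahead of the TSM client by reading the listed
  # files, up to 16 readers in parallel with 32 files each, and
  # throwing the data away; the client then finds the data in cache
  # instead of waiting on random disk reads
  xargs -a filelist.txt -d '\n' -P 16 -n 32 cat > /dev/null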
Without any of this, seeing only a few MB/sec is not uncommon for customers, but with the changes above you are able to back up very large quantities of data.
Hope this helps. Sven
------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
IBM Almaden Research Lab
------------------------------------------
From: Sabuj Pattanayek <sabujp at gmail.com>
To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
Date: 03/20/2014 06:39 PM
Subject: Re: [gpfsug-discuss] Is TSM/HSM 7.1 compatible with GPFS
3.5.0.12 ?
Sent by: gpfsug-discuss-bounces at gpfsug.org
We're using TSM 7.1 with GPFS 3.5.0.11. At some point we do want to enable the HSM features but haven't had time to properly configure/set them up yet. I had DMAPI enabled on GPFS but was never able to bring the filesystem up with DMAPI enabled; things weren't properly configured at the time and we were missing some pieces (not my post, but same issue):
https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014622591
I'd say that we are having less than optimal performance with TSM, however. We're only able to pull about 4TB a day. It took us 20 days to back up 80TB for our initial full; using rsync/tar piped to tar would probably have taken less than a week. We tried various methods, e.g. using a fast intermediate disk pool, going simultaneously to our 6 LTO6 tape drives, etc., but each "cp" (TSM client) process that TSM would use seemed to be very slow. We tweaked just about every setting to optimize performance, but to no real avail. When going to the disk pool, this is what should have happened:

GPFS => relatively fast random I/O (on par with rsync/tar piped to tar) => TSM disk pool cache => large sequential I/Os for each disk pool volume => tape

This is what really happened:

GPFS => slow random I/O => TSM disk pool cache => slow random I/O => tape

So instead we did:

GPFS => slow random I/O (TSM) => tape
...but it was the same speed as going through the TSM disk pool cache. We closely monitored the network, disk, memory, and CPU on the TSM server, and none of the hardware or capabilities of the server were the bottleneck; it was all in TSM.
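(By 'every setting' I mean mostly the TSM client options in dsm.sys / dsm.opt. One illustrative fragment, value made up and not a recommendation: RESOURCEUTILIZATION controls how many sessions the client may open in parallel to the server.)

  RESOURCEUTILIZATION  10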
If anyone has seen this sort of behavior and has some pointers/hints for improving performance, I'd be glad to hear them.
Thanks,
Sabuj
On Thu, Mar 20, 2014 at 5:21 PM, Grace Tsai <gtsai at slac.stanford.edu>
wrote:
Hi,
Is TSM/HSM 7.1 compatible with GPFS 3.5.0.12 ?
Thanks.
Grace
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss