[gpfsug-discuss] mmbackup questions

Simon Thompson S.J.Thompson at bham.ac.uk
Thu Oct 17 19:37:28 BST 2019


Mmbackup uses tsbuhelper internally. This is effectively a diff of the previous and current policy scan. Objects inspected is the count of the files that have changed since the last run, and these are the candidates sent to the TSM server.
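For illustration, the summary counters can be pulled out of mmbackup's output along these lines (the counter values here are made up, and the line format is only an approximation of what mmbackup prints):

```python
import re

# Sample of the summary lines mmbackup prints at the end of a run
# (the counts are invented for illustration).
summary = """\
Total number of objects inspected:      474630
Total number of objects backed up:      474000
Total number of objects expired:           630
"""

def parse_counts(text):
    """Return a dict of counter name -> integer from mmbackup-style summary lines."""
    counts = {}
    for m in re.finditer(r"Total number of objects ([\w ]+?):\s+(\d+)", text):
        counts[m.group(1)] = int(m.group(2))
    return counts

counts = parse_counts(summary)
# "inspected" is the changed/new/deleted candidate count, not the total
# number of files on the file system.
print(counts["inspected"])  # 474630
```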

You mention not being able to upgrade a DSS-G; I thought this has been available for some time as a special bid process. We did something very complicated with ours at one point. I also thought the "no-upgrade" was related to a support position from IBM on creating additional DAs. You can't add new storage to a DA, but I believe it's possible and now supported (I think) to add expansion shelves into a new DA. (I think ESS also supports this.) Note that you don't necessarily get the same performance from doing this as if you'd purchased a fully stacked system in the first place. For example, if you initially had 166 drives as a two-expansion system and then add 84 drives in a new expansion, you now have two DAs, one smaller than the other, and neither the same as if you'd originally created it with 250 drives... I don't actually have any benchmarks to prove this, but it was my understanding from various discussions over time.

There are also now DSS (and ESS) configs with both spinning and SSD enclosures. I assume these aren't special-bid-only products any more.

Simon

On 17/10/2019, 19:05, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" <gpfsug-discuss-bounces at spectrumscale.org on behalf of jonathan.buzzard at strath.ac.uk> wrote:

    On 17/10/2019 15:26, Skylar Thompson wrote:
    > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote:
    >> I have been looking to give mmbackup another go (a very long history
    >> with it being a pile of steaming dinosaur droppings last time I tried,
    >> but that was seven years ago).
    >>
    >> Anyway having done a backup last night I am curious about something
    >> that does not appear to be explained in the documentation.
    >>
    >> Basically the output has a line like the following
    >>
    >>          Total number of objects inspected:      474630
    >>
    >> What is this number? Is it the number of files that have changed since
    >> the last backup or something else as it is not the number of files on
    >> the file system by any stretch of the imagination. One would hope that
    >> it inspected everything on the file system...
    > 
    > I believe this is the number of paths that matched some include rule (or
    > didn't match some exclude rule) for mmbackup. I would assume it would
    > differ from the "total number of objects backed up" line if there were
    > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to
    > decide whether to process.
    >   
    
    After digging through dsminstr.log it would appear to be the sum of the 
    new, changed and deleted files that mmbackup is going to process. There 
    is some weird sh*t going on with mmbackup on the face of it, though, 
    where it sends one file to the TSM server.
    
    A line with the total number of files in the file system (aka potential 
    backup candidates) would be nice I think.
    
    >> Also it appears that the shadow database is held on the GPFS file system
    >> that is being backed up. Is there any way to change the location of that?
    >> I am only using one node for backup (because I am cheap and don't like
    >> paying for more PVU's than I need to) and would like to hold it on the
    >> node doing the backup where I can put it on SSD. Which does two things:
    >> firstly it hopefully goes a lot faster, and secondly it reduces the
    >> impact on the file system of the backup.
    > 
    > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment
    > variable noted in the mmbackup man path:
    > 
    >                    Specifies an alternative directory name for
    >                    storing all temporary and permanent records for
    >                    the backup. The directory name specified must
    >                    be an existing directory and it cannot contain
    >                    special characters (for example, a colon,
    >                    semicolon, blank, tab, or comma).
    > 
    > Which seems like it might provide a mechanism to store the shadow database
    > elsewhere. For us, though, we provide storage via a cost center, so we
    > would want our customers to eat the full cost of their excessive file counts.
    >   
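    For what it's worth, using that variable would presumably just be a
    matter of something like the following (the paths and device name here
    are made up; check the man page for your release):

```shell
# Keep the shadow database and temporary files on local SSD on the backup
# node (the directory must already exist; /ssd/mmbackup-records is a
# made-up path).
mkdir -p /ssd/mmbackup-records
export MMBACKUP_RECORD_ROOT=/ssd/mmbackup-records

# Then run mmbackup as usual, e.g. against file system gpfs0 from the
# backup node (device name and options are illustrative).
mmbackup /gpfs0 -t incremental -N backupnode
```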
    
    We have set a file quota of one million for all our users. So far only 
    one user has actually needed it raised. It does however make users 
    come and have a conversation with us about what they are doing. With the 
    one exception, they have found ways to do their work without abusing the 
    file system as a database.
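    For anyone wanting to do the same, on recent Scale releases setting such
    a quota is roughly the following (file system name, user name and limits
    are illustrative; check the mmsetquota man page on your release):

```shell
# Per-user file (inode) quota of 1M soft / 1.1M hard on file system gpfs0
# (device and user names are made up for illustration).
mmsetquota gpfs0 --user alice --files 1000000:1100000

# Check usage afterwards.
mmrepquota -u gpfs0
```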
    
    We don't have an SSD storage pool on the file system, so moving it to 
    the backup node, for which we can add SSD cheaply (I mean really really 
    cheap these days), is more realistic than adding some SSD for a storage 
    pool on the file system. Once I am a bit more familiar with it I will 
    try changing it to the system disks. They are not SSD at the moment, but 
    if it works I can easily justify getting some and replacing the existing 
    drives (it would just be two RAID rebuilds away).
    
    Last time it was brought up you could not add extra shelves to an 
    existing DSS-G system, you had to buy a whole new one. This is despite 
    the servers shipping with a full complement of SAS cards and a large box 
    full of 12Gbps SAS cables (well over £1000 worth at list I reckon) that 
    are completely useless. Ok they work and I could use them elsewhere but 
    frankly why ship them if I can't expand!!!
    
    >> Anyway a significant speed up (assuming it worked) was achieved but I
    >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load
    >> average never went above one) and we didn't touch the swap despite only
    >> having 24GB of RAM. The 10GbE networking did get busy during the
    >> transfer of data to the TSM server bit of the backup, but during the
    >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes were
    >> not busy either. What options are there for tuning things, because I feel
    >> it should be able to go a lot faster.
    > 
    > We have some TSM nodes (corresponding to GPFS filesets) that stress out our
    > mmbackup cluster at the sort step of mmbackup. UNIX sort is not
    > RAM-friendly, as it happens.
    > 
    
    I have configured more monitoring of the system and will watch it over 
    the coming days, but nothing was stressed on our system at all as far as 
    I can tell, yet it was going slower than I had hoped. It was still way 
    faster than a traditional dsmc incr, but I was hoping for more, though I 
    am not sure why, as the backup now takes place well inside my backup 
    window. Perhaps I am being greedy.
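    The knobs I am aware of are the thread-count options on mmbackup itself,
    along the lines of the following (values and device name are purely
    illustrative; verify the exact options against the mmbackup man page for
    your release before relying on any of them):

```shell
# Experiment with more parallelism in the backup and expire phases
# (flag values here are guesses, not recommendations).
mmbackup /gpfs0 -t incremental \
    -N backupnode \
    --backup-threads 4 \
    --expire-threads 4 \
    --max-backup-count 4096
```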
    
    
    JAB.
    
    -- 
    Jonathan A. Buzzard                         Tel: +44141-5483420
    HPC System Administrator, ARCHIE-WeSt.
    University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    


