From stuartb at 4gh.net Wed Feb 6 18:38:56 2013 From: stuartb at 4gh.net (Stuart Barkley) Date: Wed, 6 Feb 2013 13:38:56 -0500 (EST) Subject: [gpfsug-discuss] GPFS snapshot cron job Message-ID: I'm new on this list. It looks like it can be useful for exchanging GPFS experiences. We have been running GPFS for a couple of years now on one cluster and are in process of bringing it up on a couple of other clusters. One thing we would like, but have not had time to do is automatic snapshots similar to what NetApp does. For our purposes a cron job that ran every 4 hours that creates a new snapshot and removes older snapshots would be sufficient. The slightly hard task is correctly removing the older snapshots. Does anyone have such a cron script they can share? Or did I miss something in GPFS that handles automatic snapshots? Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From pete at realisestudio.com Wed Feb 6 19:28:40 2013 From: pete at realisestudio.com (Pete Smith) Date: Wed, 6 Feb 2013 19:28:40 +0000 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: References: Message-ID: Hi rsnapshot is probably what you're looking for. :-) On 6 Feb 2013 18:39, "Stuart Barkley" wrote: > I'm new on this list. It looks like it can be useful for exchanging > GPFS experiences. > > We have been running GPFS for a couple of years now on one cluster and > are in process of bringing it up on a couple of other clusters. > > One thing we would like, but have not had time to do is automatic > snapshots similar to what NetApp does. For our purposes a cron job > that ran every 4 hours that creates a new snapshot and removes older > snapshots would be sufficient. The slightly hard task is correctly > removing the older snapshots. > > Does anyone have such a cron script they can share? > > Or did I miss something in GPFS that handles automatic snapshots? > > Thanks, > Stuart Barkley > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Feb 6 19:40:49 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 06 Feb 2013 19:40:49 +0000 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: References: Message-ID: <5112B1C1.3080403@buzzard.me.uk> On 06/02/13 18:38, Stuart Barkley wrote: > I'm new on this list. It looks like it can be useful for exchanging > GPFS experiences. > > We have been running GPFS for a couple of years now on one cluster and > are in process of bringing it up on a couple of other clusters. > > One thing we would like, but have not had time to do is automatic > snapshots similar to what NetApp does. For our purposes a cron job > that ran every 4 hours that creates a new snapshot and removes older > snapshots would be sufficient. The slightly hard task is correctly > removing the older snapshots. > > Does anyone have such a cron script they can share? > Find attached a Perl script that does just what you want with a range of configurable parameters. It is intended to create snapshots that work with the Samba VFS module shadow_copy2 so that you can have a previous versions facility on your Windows boxes. 
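For the bare-bones case Stuart describes -- one cron job that takes a snapshot and prunes the oldest -- something as small as the following sketch would do. It is only a sketch (the filesystem name and retention count are placeholders, and the GPFS commands are assumed to live in /usr/lpp/mmfs/bin), but it uses the @GMT-... snapshot naming that shadow_copy2 matches by default:

    #!/bin/bash
    PATH=$PATH:/usr/lpp/mmfs/bin
    FS=gpfs0        # GPFS device name (placeholder)
    KEEP=12         # number of snapshots to retain

    # create a snapshot named after the current UTC time
    snap="@GMT-$(date -u +%Y.%m.%d-%H.%M.%S)"
    mmcrsnapshot $FS "$snap" || exit 1

    # the @GMT names sort chronologically, so delete everything
    # older than the newest $KEEP snapshots
    mmlssnapshot $FS | awk '/Valid/ {print $1}' | grep '^@GMT-' | sort |
        head --lines=-$KEEP |
    while read old; do
        mmdelsnapshot $FS "$old"
    done

The attached script does considerably more than that, with more care over locking and error handling.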
Note it creates a "quiescent" lock that interacted with another script that was called to do a policy based tiering from fast disks to slow disks. That gets called based on a trigger for a percentage of the fast disk pool being full, and consequently can get called at any time. If the tiering is running then trying to take a snapshot at the same time will lead to race conditions and the file system will deadlock. Note that if you are creating snapshots in the background then a whole range of GPFS commands if run at the moment the snapshot is being created or deleted will lead to deadlocks. > Or did I miss something in GPFS that handles automatic snapshots? Yeah what you missed is that it will randomly lock your file system up. So while the script I have attached is all singing and all dancing. It has never stayed in production for very long. On a test file system that has little activity it runs for months without a hitch. When rolled out on busy file systems with in a few days we would a deadlock waiting for some file system quiescent state and everything would grind to a shuddering halt. Sometimes on creating the snapshot and sometimes on deleting them. Unless there has been a radical change in GPFS in the last few months, you cannot realistically do what you want. IBM's response was that you should not be taking snapshots or deleting old ones while the file system is "busy". Not that I would have thought the file system would have been that "busy" at 07:00 on a Saturday morning, but hey. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. -------------- next part -------------- A non-text attachment was scrubbed... Name: shadowcopy.pl Type: application/x-perl Size: 5787 bytes Desc: not available URL: From erich at uw.edu Wed Feb 6 19:45:05 2013 From: erich at uw.edu (Eric Horst) Date: Wed, 6 Feb 2013 11:45:05 -0800 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: References: Message-ID: It's easy if you use a chronologically sortable naming scheme. We use YYYY-MM-DD-hhmmss. This is a modified excerpt from the bash script I use. The prune function takes an arg of the number of snapshots to keep. SNAPROOT=/grfs/ud00/.snapshots function prune () { PCPY=$1 for s in $(/bin/ls -d "$SNAPROOT"/????-??-??-?????? | head --lines=-$PCPY); do mmdelsnapshot $FSNAME $s if [ $? != 0 ]; then echo ERROR: there was a mmdelsnapshot problem $? exit else echo Success fi done } echo Pruning snapshots prune 12 -Eric On Wed, Feb 6, 2013 at 10:38 AM, Stuart Barkley wrote: > I'm new on this list. It looks like it can be useful for exchanging > GPFS experiences. > > We have been running GPFS for a couple of years now on one cluster and > are in process of bringing it up on a couple of other clusters. > > One thing we would like, but have not had time to do is automatic > snapshots similar to what NetApp does. For our purposes a cron job > that ran every 4 hours that creates a new snapshot and removes older > snapshots would be sufficient. The slightly hard task is correctly > removing the older snapshots. > > Does anyone have such a cron script they can share? > > Or did I miss something in GPFS that handles automatic snapshots? > > Thanks, > Stuart Barkley > -- > I've never been lost; I was once bewildered for three days, but never lost! 
> -- Daniel Boone > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bergman at panix.com Wed Feb 6 21:28:30 2013 From: bergman at panix.com (bergman at panix.com) Date: Wed, 06 Feb 2013 16:28:30 -0500 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: Your message of "Wed, 06 Feb 2013 13:38:56 EST." References: Message-ID: <20647.1360186110@localhost> In the message dated: Wed, 06 Feb 2013 13:38:56 -0500, The pithy ruminations from Stuart Barkley on <[gpfsug-discuss] GPFS snapshot cron job> were: => I'm new on this list. It looks like it can be useful for exchanging => GPFS experiences. => => We have been running GPFS for a couple of years now on one cluster and => are in process of bringing it up on a couple of other clusters. => => One thing we would like, but have not had time to do is automatic => snapshots similar to what NetApp does. For our purposes a cron job => that ran every 4 hours that creates a new snapshot and removes older => snapshots would be sufficient. The slightly hard task is correctly => removing the older snapshots. => => Does anyone have such a cron script they can share? Yes. I've attached the script that we run from cron. Our goal was to keep a decaying set of snapshots over a fairly long time period, so that users would be able to recover from "rm", while not using excessive space. Snapshots are named with timestamp, making it slightly easier to understand what data they contain and the remove the older ones. The cron job runs every 15 minutes on every GPFS server node, but checks if it is executing on the node that is the manager for the specified filesystem to avoid concurrency issues. The script will avoid making a snapshot if there isn't sufficient disk space. Our config file to manage snapshots is: ------------ CUT HERE -- CUT HERE -------------- case $1 in home) intervals=(1 4 24 48) # hour number of each interval counts=(4 4 4 4) # max number of snapshots to keep per each interval MINFREE=5 # minimum free disk space, in percent ;; shared) intervals=(1 4 48) # hour number of each interval counts=(4 2 2) # max number of snapshots to keep per each interval MINFREE=20 # minimum free disk space, in percent ;; esac ------------ CUT HERE -- CUT HERE -------------- For the "home" filesystem, this says: keep 4 snapshots in the most recent hourly interval (every 15 minutes) keep 4 snapshots made in the most recent 4 hr interval (1 for each hour) keep 4 snapshots made in the most recent 24 hr interval (1 each 6hrs) keep 4 snapshots made in the most recent 48 hr interval (1 each 12 hrs) For the "shared" filesystem, the configuration says: keep 4 snapshots in the most recent hourly interval (every 15 minutes) keep 2 snapshots made in the most recent 4 hr interval (1 each 2 hours) keep 2 snapshots made in the most recent 48 hr interval (1 each 24 hrs) Those intervals "overlap", so there are a lot of recent snapshots, and fewer older ones. Each time a snapshot is made, older snapshots may be removed. 
So, at 5:01 PM on Thursday, there may be snapshots of the "home" filesystem from: 17:00 Thursday ---+-- 4 in the last hour 16:45 Thursday | 16:30 Thursday | 16:15 Thursday -- + 16:00 Thursday ---+-- 4 in the last 4 hours, including 15:00 Thursday | the 5:00PM Thursday snapshot 14:00 Thursday ---+ 11:00 Thursday ---+-- 4 in the last 24 hours, including 05:00 Thursday | 17:00 Thursday 23:00 Wednesday ---+ 17:00 Wednesday ---+-- 4 @ 12-hr intervals in the last 48 hours, 05:00 Wednesday ---+ including 17:00 & 05:00 Thursday Suggestions and patches are welcome. => => Or did I miss something in GPFS that handles automatic snapshots? We have seen periodic slowdowns when snapshots are running, but nothing to the extent described by Jonathan Buzzard. Mark => => Thanks, => Stuart Barkley => -- => I've never been lost; I was once bewildered for three days, but never lost! => -- Daniel Boone -------------- next part -------------- #! /bin/bash #$Id: snapshotter 858 2012-01-31 19:24:11Z$ # Manage snapshots of GPFS volumes # # Desgined to be called from cron at :15 intervals # ################################################################## # Defaults, may be overridden by /usr/local/etc/snappshotter.conf # or file specified by "-c" CONF=/usr/local/etc/snappshotter.conf # config file, supersceded by "-c" option MINFREE=10 # minimum free space, in percent. # # Series of intervals and counts. Intervals expressed as the end-point in hours. # count = number of snapshots to keep per-interval ############## # time # ==== # :00-59 keep snapshots at 15 minute intervals; ceiling of interval = 1hr # 01-03:59 keep snapshots at 1hr interval; ceiling of interval = 4hr # 04-23:59 keep snapshots at 6hr intervals; ceiling of interval = 24hr # 24-47:59 keep snapshots at 12hr intervals; ceiling of interval = 48hr intervals=(1 4 24 48) # hour number of each interval counts=(4 4 4 4) # max number of snapshots to keep per each interval # Note that the snapshots in interval (N+1) must be on a time interval # that corresponds to the snapshots kept in interval N. # # :00-59 keep snapshots divisible by 1/4hr: 00:00, 00:15, 00:30, 00:45, 01:00, 01:15 ... # 01-04:59 keep snapshots divisible by 4/4hr: 00:00, 01:00, 02:00, 03:00 ... # 05-23:59 keep snapshots divisible by 24/4hr: 00:00, 06:00, 12:00, 18:00 # 24-48:59 keep snapshots divisible by 48/4hr: 00:00, 12:00 # # ################################################################## TESTING="no" MMDF=/usr/lpp/mmfs/bin/mmdf MMCRSNAPSHOT=/usr/lpp/mmfs/bin/mmcrsnapshot MMLSSNAPSHOT=/usr/lpp/mmfs/bin/mmlssnapshot MMDELSNAPSHOT=/usr/lpp/mmfs/bin/mmdelsnapshot LOGGER="logger -p user.alert -t snapshotter" PATH="${PATH}:/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/local/sbin" # for access to 'ip' command, GPFS commands now=`date '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` secsINhr=$((60 * 60)) ##################################################################### usage() { cat - << E-O-USAGE 1>&2 $0 -- manage GPFS snapshots Create new GPFS snapshots and remove old snapshots. Options: -f filesystem required -- name of filesystem to snapshot -t testing test mode, report what would be done but perform no action -d "datestamp" test mode only; used supplied date stamp as if it was the current time. -c configfile use supplied configuration file in place of default: $CONF -L show license statement In test mode, the input data, in the same format as produced by "mmlssnap" must be supplied. 
This can be done on STDIN, as: $0 -t -f home -d "\`date --date "Dec 7 23:45"\`" < mmlssnap.data or $0 -t -f home -d "\`date --date "now +4hours"\`" < mmlssnap.data E-O-USAGE echo 1>&2 echo $1 1>&2 exit 1 } ##################################################################### license() { cat - << E-O-LICENSE Section of Biomedical Image Analysis Department of Radiology University of Pennsylvania 3600 Market Street, Suite 380 Philadelphia, PA 19104 Web: http://www.rad.upenn.edu/sbia/ Email: sbia-software at uphs.upenn.edu SBIA Contribution and Software License Agreement ("Agreement") ============================================================== Version 1.0 (June 9, 2011) This Agreement covers contributions to and downloads from Software maintained by the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania ("SBIA"). Part A of this Agreement applies to contributions of software and/or data to the Software (including making revisions of or additions to code and/or data already in this Software). Part B of this Agreement applies to downloads of software and/or data from SBIA. Part C of this Agreement applies to all transactions with SBIA. If you distribute Software (as defined below) downloaded from SBIA, all of the paragraphs of Part B of this Agreement must be included with and apply to such Software. Your contribution of software and/or data to SBIA (including prior to the date of the first publication of this Agreement, each a "Contribution") and/or downloading, copying, modifying, displaying, distributing or use of any software and/or data from SBIA (collectively, the "Software") constitutes acceptance of all of the terms and conditions of this Agreement. If you do not agree to such terms and conditions, you have no right to contribute your Contribution, or to download, copy, modify, display, distribute or use the Software. PART A. CONTRIBUTION AGREEMENT - LICENSE TO SBIA WITH RIGHT TO SUBLICENSE ("CONTRIBUTION AGREEMENT"). ----------------------------------------------------------------------------------------------------- 1. As used in this Contribution Agreement, "you" means the individual contributing the Contribution to the Software maintained by SBIA and the institution or entity which employs or is otherwise affiliated with such individual in connection with such Contribution. 2. This Contribution Agreement applies to all Contributions made to the Software maintained by SBIA, including without limitation Contributions made prior to the date of first publication of this Agreement. If at any time you make a Contribution to the Software, you represent that (i) you are legally authorized and entitled to make such Contribution and to grant all licenses granted in this Contribution Agreement with respect to such Contribution; (ii) if your Contribution includes any patient data, all such data is de-identified in accordance with U.S. confidentiality and security laws and requirements, including but not limited to the Health Insurance Portability and Accountability Act (HIPAA) and its regulations, and your disclosure of such data for the purposes contemplated by this Agreement is properly authorized and in compliance with all applicable laws and regulations; and (iii) you have preserved in the Contribution all applicable attributions, copyright notices and licenses for any third party software or data included in the Contribution. 3. Except for the licenses granted in this Agreement, you reserve all right, title and interest in your Contribution. 4. 
You hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no charge, royalty-free, irrevocable license to use, reproduce, make derivative works of, display and distribute the Contribution. If your Contribution is protected by patent, you hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable license under your interest in patent rights covering the Contribution, to make, have made, use, sell and otherwise transfer your Contribution, alone or in combination with any other code. 5. You acknowledge and agree that SBIA may incorporate your Contribution into the Software and may make the Software available to members of the public on an open source basis under terms substantially in accordance with the Software License set forth in Part B of this Agreement. You further acknowledge and agree that SBIA shall have no liability arising in connection with claims resulting from your breach of any of the terms of this Agreement. 6. YOU WARRANT THAT TO THE BEST OF YOUR KNOWLEDGE YOUR CONTRIBUTION DOES NOT CONTAIN ANY CODE THAT REQUIRES OR PRESCRIBES AN "OPEN SOURCE LICENSE" FOR DERIVATIVE WORKS (by way of non-limiting example, the GNU General Public License or other so-called "reciprocal" license that requires any derived work to be licensed under the GNU General Public License or other "open source license"). PART B. DOWNLOADING AGREEMENT - LICENSE FROM SBIA WITH RIGHT TO SUBLICENSE ("SOFTWARE LICENSE"). ------------------------------------------------------------------------------------------------ 1. As used in this Software License, "you" means the individual downloading and/or using, reproducing, modifying, displaying and/or distributing the Software and the institution or entity which employs or is otherwise affiliated with such individual in connection therewith. The Section of Biomedical Image Analysis, Department of Radiology at the Universiy of Pennsylvania ("SBIA") hereby grants you, with right to sublicense, with respect to SBIA's rights in the software, and data, if any, which is the subject of this Software License (collectively, the "Software"), a royalty-free, non-exclusive license to use, reproduce, make derivative works of, display and distribute the Software, provided that: (a) you accept and adhere to all of the terms and conditions of this Software License; (b) in connection with any copy of or sublicense of all or any portion of the Software, all of the terms and conditions in this Software License shall appear in and shall apply to such copy and such sublicense, including without limitation all source and executable forms and on any user documentation, prefaced with the following words: "All or portions of this licensed product (such portions are the "Software") have been obtained under license from the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania and are subject to the following terms and conditions:" (c) you preserve and maintain all applicable attributions, copyright notices and licenses included in or applicable to the Software; (d) modified versions of the Software must be clearly identified and marked as such, and must not be misrepresented as being the original Software; and (e) you consider making, but are under no obligation to make, the source code of any of your modifications to the Software freely available to others on an open source basis. 2. 
The license granted in this Software License includes without limitation the right to (i) incorporate the Software into proprietary programs (subject to any restrictions applicable to such programs), (ii) add your own copyright statement to your modifications of the Software, and (iii) provide additional or different license terms and conditions in your sublicenses of modifications of the Software; provided that in each case your use, reproduction or distribution of such modifications otherwise complies with the conditions stated in this Software License. 3. This Software License does not grant any rights with respect to third party software, except those rights that SBIA has been authorized by a third party to grant to you, and accordingly you are solely responsible for (i) obtaining any permissions from third parties that you need to use, reproduce, make derivative works of, display and distribute the Software, and (ii) informing your sublicensees, including without limitation your end-users, of their obligations to secure any such required permissions. 4. The Software has been designed for research purposes only and has not been reviewed or approved by the Food and Drug Administration or by any other agency. YOU ACKNOWLEDGE AND AGREE THAT CLINICAL APPLICATIONS ARE NEITHER RECOMMENDED NOR ADVISED. Any commercialization of the Software is at the sole risk of the party or parties engaged in such commercialization. You further agree to use, reproduce, make derivative works of, display and distribute the Software in compliance with all applicable governmental laws, regulations and orders, including without limitation those relating to export and import control. 5. The Software is provided "AS IS" and neither SBIA nor any contributor to the software (each a "Contributor") shall have any obligation to provide maintenance, support, updates, enhancements or modifications thereto. SBIA AND ALL CONTRIBUTORS SPECIFICALLY DISCLAIM ALL EXPRESS AND IMPLIED WARRANTIES OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL SBIA OR ANY CONTRIBUTOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY ARISING IN ANY WAY RELATED TO THE SOFTWARE, EVEN IF SBIA OR ANY CONTRIBUTOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. TO THE MAXIMUM EXTENT NOT PROHIBITED BY LAW OR REGULATION, YOU FURTHER ASSUME ALL LIABILITY FOR YOUR USE, REPRODUCTION, MAKING OF DERIVATIVE WORKS, DISPLAY, LICENSE OR DISTRIBUTION OF THE SOFTWARE AND AGREE TO INDEMNIFY AND HOLD HARMLESS SBIA AND ALL CONTRIBUTORS FROM AND AGAINST ANY AND ALL CLAIMS, SUITS, ACTIONS, DEMANDS AND JUDGMENTS ARISING THEREFROM. 6. None of the names, logos or trademarks of SBIA or any of SBIA's affiliates or any of the Contributors, or any funding agency, may be used to endorse or promote products produced in whole or in part by operation of the Software or derived from or based on the Software without specific prior written permission from the applicable party. 7. Any use, reproduction or distribution of the Software which is not in accordance with this Software License shall automatically revoke all rights granted to you under this Software License and render Paragraphs 1 and 2 of this Software License null and void. 8. 
This Software License does not grant any rights in or to any intellectual property owned by SBIA or any Contributor except those rights expressly granted hereunder. PART C. MISCELLANEOUS --------------------- This Agreement shall be governed by and construed in accordance with the laws of The Commonwealth of Pennsylvania without regard to principles of conflicts of law. This Agreement shall supercede and replace any license terms that you may have agreed to previously with respect to Software from SBIA. E-O-LICENSE exit } ##################################################################### # Parse the command-line while [ "X$1" != "X" ] do case $1 in -L) license ;; -t) TESTING="yes" shift ;; -d) # Date stamp given...only valid in testing mode shift # Convert the user-supplied date to the YYYY_Mo_DD_HH:MM form, # throwing away the seconds UserDATE="$1" now=`date --date "$1" '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` shift ;; -c) shift CONF=$1 if [ ! -f $CONF ] ; then usage "Specified configuration file ($CONF) not found" fi shift ;; -f) shift filesys=$1 shift ;; *) usage "Unrecognized option: \"$1\"" ;; esac done ############## End of command line parsing LOCKFILE=/var/run/snapshotter.$filesys if [ -f $LOCKFILE ] ; then PIDs=`cat $LOCKFILE | tr "\012" " "` echo "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." 1>&2 $LOGGER "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." exit 1 else echo $$ > $LOCKFILE if [ $? != 0 ] ; then echo "Could not create lockfile $LOCKFILE for process $$. Exiting." 1>&2 $LOGGER "Could not create lockfile $LOCKFILE for process $$" exit 2 fi fi ######## Check sanity of user-supplied values if [ "X$filesys" = "X" ] ; then $LOGGER "Filesystem must be specified" usage "Filesystem must be specified" fi if [ $TESTING = "yes" ] ; then # testing mode: # accept faux filesystem argument # accept faux datestamp as arguments # read faux results from mmlssnapshot on STDIN # MMDF # # Do not really use mmdf executable, so that the testing can be # done outside a GPFS cluster Use a 2-digit random number 00 .. 99 # from $RANDOM, but fill the variable with dummy fields so the # the random number corresponds to field5, where it would be in # the mmdf output. MMDF="eval echo \(total\) f1 f2 f3 f4 \(\${RANDOM: -2:2}%\) " MMCRSNAPSHOT="echo mmcrsnapshot" MMDELSNAPSHOT="echo mmdelsnapshot" MMLSSNAPDATA=`cat - | tr "\012" "%"` MMLSSNAPSHOT="eval echo \$MMLSSNAPDATA|tr '%' '\012'" LOGGER="echo Log message: " else if [ "X$UserDATE" != "X" ] ; then $LOGGER "Option \"-d\" only valid in testing mode" usage "Option \"-d\" only valid in testing mode" fi /usr/lpp/mmfs/bin/mmlsfs $filesys -T 1> /dev/null 2>&1 if [ $? != 0 ] ; then $LOGGER "Error accessing GPFS filesystem: $filesys" echo "Error accessing GPFS filesystem: $filesys" 1>&2 rm -f $LOCKFILE exit 1 fi # Check if the node where this script is running is the GPFS manager node for the # specified filesystem manager=`/usr/lpp/mmfs/bin/mmlsmgr $filesys | grep -w "^$filesys" |awk '{print $2}'` ip addr list | grep -qw "$manager" if [ $? != 0 ] ; then # This node is not the manager...exit rm -f $LOCKFILE exit fi MMLSSNAPSHOT="$MMLSSNAPSHOT $filesys" fi # It is valid for the default config file not to exist, so check if # is there before sourcing it if [ -f $CONF ] ; then . 
$CONF $filesys # load variables found in $CONF, based on $filesys fi # Get current free space freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` # Produce list of valid snapshot names (w/o header lines) snapnames=`$MMLSSNAPSHOT |grep Valid |sed -e '$d' -e 's/ .*//'` # get the number of existing snapshots snapcount=($snapnames) ; snapcount=${#snapcount[*]} ########################################################### # given a list of old snapshot names, in the form: # YYYY_Mo_DD_HH:MM # fill the buckets by time. A snapshot can only go # into one bucket! ########################################################### for oldsnap in $snapnames do oldstamp=`echo $oldsnap|sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` oldsecs=`date --date "$oldstamp" "+%s"` diff=$((nowsecs - oldsecs)) # difference in seconds between 'now' and old snapshot if [ $diff -lt 0 ] ; then # this can happen during testing...we have got a faux # snapshot date in the future...skip it continue fi index=0 prevbucket=0 filled=No while [ $index -lt ${#intervals[*]} -a $filled != "Yes" ] do bucket=${intervals[$index]} # ceiling for number of hours for this bucket (1 more than the number of # actual hours, ie., "7" means that the bucket can contain snapshots that are # at least 6:59 (hh:mm) old. count=${counts[$index]} # max number of items in this bucket bucketinterval=$(( bucket * ( secsINhr / count ) )) # Number of hours (in seconds) between snapshots that should be retained # for this bucket...convert from hrs (bucket/count) to seconds in order to deal with :15 minute intervals # Force the mathematical precedence to do (secsINhr / count) so that cases where count>bucket (like the first 1hr # that may have a count of 4 retained snapshots) doesn't result in the shell throwing away the fraction if [ $diff -ge $((prevbucket * secsINhr)) -a $diff -lt $((bucket * secsINhr)) ] ; then # We found the correct bucket filled=Yes ## printf "Checking if $oldsnap should be retained if it is multiple of $bucketinterval [ ($oldsecs %% $bucketinterval) = 0]" # Does the snapshot being examined fall on the interval determined above for the snapshots that should be retained? 
if [ $(( oldsecs % bucketinterval )) = 0 ] ; then # The hour of the old snapshot is evenly divisible by the number of snapshots that should be # retained in this interval...keep it tokeep="$tokeep $oldsnap" ## printf "...yes\n" else todelete="$todelete $oldsnap" ## printf "...no\n" fi prevbucket=$bucket fi index=$((index + 1)) done if [ $diff -ge $((bucket * secsINhr )) ] ; then filled=Yes # This is too old...schedule it for deletion $LOGGER "Scheduling old snapshot $oldsnap from $filesys for deletion" todelete="$todelete $oldsnap" fi # We should not get here if [ $filled != Yes ] ; then $LOGGER "Snapshot \"$oldsnap\" on $filesys does not match any intervals" fi done # Sort the lists to make reading the testing output easier todelete=`echo $todelete | tr " " "\012" | sort -bdfu` tokeep=`echo $tokeep | tr " " "\012" | sort -bdfu` ############################################################# for oldsnap in $todelete do if [ $TESTING = "yes" ] ; then # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode $MMDELSNAPSHOT $filesys $oldsnap # remove the entry for the snapshot scheduled for deletion # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate ## echo "Removing entry for \"$oldsnap\" from \$MMLSSNAPDATA" MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$oldsnap [^%]*%/%/"` else # Run mmdelsnapshot, and capture the output to prevent verbose messages from being # sent as the result of each cron job. Only display the messages in case of error. output=`$MMDELSNAPSHOT $filesys $oldsnap 2>&1` fi if [ $? != 0 ] ; then printf "Error from \"$MMDELSNAPSHOT $filesys $oldsnap\": $output" 1>&2 $LOGGER "Error removing snapshot of $filesys with label \"$oldsnap\": $output" rm -f $LOCKFILE exit 1 else $LOGGER "successfully removed snapshot of $filesys with label \"$oldsnap\"" fi done ############# Now check for free space ####################################### # Get current free space freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` # get the number of existing snapshots snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l` while [ $freenow -le $MINFREE -a $snapcount -gt 0 ] do # must delete some snapshots, from the oldest first... todelete=`$MMLSSNAPSHOT|grep Valid |sed -n -e 's/ .*//' -e '1p'` if [ $TESTING = "yes" ] ; then # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode $MMDELSNAPSHOT $filesys $todelete # remove the entry for the snapshot scheduled for deletion # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate and from $tokeep ## echo "Removing entry for \"$todelete\" from \$MMLSSNAPDATA" MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$todelete [^%]*%/%/"` tokeep=`echo $tokeep | sed -e "s/^$todelete //" -e "s/ $todelete / /" -e "s/ $todelete$//" -e "s/^$todelete$//"` else # Run mmdelsnapshot, and capture the output to prevent verbose messages from being # sent as the result of each cron job. Only display the messages in case of error. output=`$MMDELSNAPSHOT $filesys $todelete 2>&1` fi if [ $? 
!= 0 ] ; then printf "Error from \"$MMDELSNAPSHOT $filesys $todelete\": $output" 1>&2 $LOGGER "Low disk space (${freenow}%) triggered attempt to remove snapshot of $filesys with label \"$todelete\" -- Error: $output" rm -f $LOCKFILE exit 1 else $LOGGER "removed snapshot \"$todelete\" from $filesys because ${freenow}% free disk is less than ${MINFREE}%" fi # update the number of existing snapshots snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l` freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` done if [ $snapcount = 0 -a $freenow -ge $MINFREE ] ; then echo "All existing snapshots removed on $filesys, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" 1>&2 $LOGGER "All existing snapshots on $filesys removed, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" rm -f $LOCKFILE exit 1 fi $LOGGER "Free disk space on $filesys (${freenow}%) above minimum required (${MINFREE}%) to create new snapshot" ############################################################## if [ $TESTING = "yes" ] ; then # List snapshots being kept for oldsnap in $tokeep do echo "Keeping snapshot $oldsnap" done fi ############################################################# # Now create the current snapshot...do this after deleting snaps in order to reduce the chance of running # out of disk space results=`$MMCRSNAPSHOT $filesys $now 2>&1 | tr "\012" "%"` if [ $? != 0 ] ; then printf "Error from \"$MMCRSNAPSHOT $filesys $now\":\n\t" 1>&2 echo $results | tr '%' '\012' 1>&2 results=`echo $results | tr '%' '\012'` $LOGGER "Error creating snapshot of $filesys with label $now: \"$results\"" rm -f $LOCKFILE exit 1 else $LOGGER "successfully created snapshot of $filesys with label $now" fi rm -f $LOCKFILE From Jez.Tucker at rushes.co.uk Thu Feb 7 12:28:16 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Thu, 7 Feb 2013 12:28:16 +0000 Subject: [gpfsug-discuss] SOBAR Message-ID: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Hey all Is anyone using the SOBAR method of backing up the metadata and NSD configs? If so, how is your experience? >From reading the docs, it seems a bit odd that on restoration you have to re-init the FS and recall all the data. If so, what's the point of SOBAR? --- Jez Tucker Senior Sysadmin Rushes http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Thu Feb 7 12:47:30 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 12:47:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <5113A262.8080206@ed.ac.uk> On 07/02/13 12:28, Jez Tucker wrote: > Hey all > > Is anyone using the SOBAR method of backing up the metadata and NSD > configs? > > If so, how is your experience? > > From reading the docs, it seems a bit odd that on restoration you have > to re-init the FS and recall all the data. > > If so, what?s the point of SOBAR? Ooh - this is new. From first glance, it looks to be a DR solution? 
We're actually in the process of engineering our own DR solution based on a not-dissimilar concept: - build a second GPFS file system off-site, with HSM enabled (called "dr-fs" here) - each night, rsync the changed data from "prod-fs" to "dr-fs" - each day, migrate data from the disk pool in "dr-fs" to the tape pool to free up sufficient capacity for the next night's rsync You have a complete copy of the filesystem metadata from "prod-fs" on "dr-fs", so it looks (to a user) identical, but on "dr-fs" some of the ("older") data is on tape (ratios dependent on sizing of disk vs tape pools, of course). In the event of a disaster, you just flip over to "dr-fs". From the quick glance at SOBAR, it looks to me like the concept is that you don't have a separate file system, but you hold a secondary copy in TSM via the premigrate function, and store the filesystem metadata as a flat file dump backed up "in the normal way". In DR, you rebuild the FS from the metadata backup, and re-attach the HSM pool to this newly-restored filesystem, (and then start pushing the data back out of the HSM pool into the GPFS disk pool). As soon as the HSM pool is re-attached, users can start getting their data (as fast as TSM can give it to them), and the filesystem will look "normal" to them (albeit slow, if recalling from tape). Nice - good to see this kind of thing coming from IBM - restore of huge filesystems from traditional backup really doesn't make much sense nowadays - it'd just take too long. This kind of approach doesn't necessarily accelerate the overall time to restore, but it allows for a usable filesystem to be made available while the restore happens in the background. I'd look for clarity about the state of the filesystem on restore - particularly around what happens to data which arrives after the migration has happened but before the metadata snapshot is taken. I think it'd be lost, but the metadata would still point to it existing? Might get confusing... Just my 2 cents from a quick skim read mind - plus a whole bunch of thinking we've done on this subject recently :) -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Thu Feb 7 13:40:30 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 07 Feb 2013 13:40:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113A262.8080206@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> Message-ID: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: [SNIP] > Nice - good to see this kind of thing coming from IBM - restore of huge > filesystems from traditional backup really doesn't make much sense > nowadays - it'd just take too long. Define too long? It's perfectly doable, and the speed of the restore will depend on what resources you have to throw at the problem. The main issue is having lots of tape drives for the restore. Having a plan to buy more ASAP is a good idea. The second is don't let yourself get sidetracked doing "high priority" restores for individuals, it will radically delay the restore. Beyond that you need some way to recreate all your storage pools, filesets, junction points and quotas etc. Looks like the mmbackupconfig and mmrestoreconfig now take care of all that for you. 
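For reference the basic shape is roughly this -- flags quoted from memory, so check the man pages on your release; the device name and output path are placeholders:

    # on the live cluster, capture the filesystem configuration somewhere safe
    mmbackupconfig gpfs0 -o /safe/place/gpfs0.config

    # at restore time, replay it against the recreated filesystem
    mmrestoreconfig gpfs0 -i /safe/place/gpfs0.config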
That is a big time saver right there. > This kind of approach doesn't > necessarily accelerate the overall time to restore, but it allows for a > usable filesystem to be made available while the restore happens in the > background. > The problem is that your tape drives will go crazy with HSM activity. So while in theory it is usable it practice it won't be. Worse with the tape drives going crazy with the HSM they won't be available for restore. I would predict much much long times to recovery where recovery is defined as being back to where you where before the disaster occurred. > > I'd look for clarity about the state of the filesystem on restore - > particularly around what happens to data which arrives after the > migration has happened but before the metadata snapshot is taken. I > think it'd be lost, but the metadata would still point to it existing? I would imagine that you just do a standard HSM reconciliation to fix that. Should be really fast with the new policy based reconciliation after you spend several months backing all your HSM'ed files up again :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From orlando.richards at ed.ac.uk Thu Feb 7 13:51:25 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:51:25 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> Message-ID: <5113B15D.4080805@ed.ac.uk> On 07/02/13 13:40, Jonathan Buzzard wrote: > On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > > [SNIP] > >> Nice - good to see this kind of thing coming from IBM - restore of huge >> filesystems from traditional backup really doesn't make much sense >> nowadays - it'd just take too long. > > Define too long? It's perfectly doable, and the speed of the restore > will depend on what resources you have to throw at the problem. The main > issue is having lots of tape drives for the restore. I can tell you speak from (bitter?) experience :) I've always been "disappointed" with the speed of restores - but I've never tried a "restore everything", which presumably will run quicker. One problem I can see us having is that we have lots of small files, which tends to make everything go really slowly - but getting the thread count up would, I'm sure, help a lot. > Having a plan to > buy more ASAP is a good idea. The second is don't let yourself get > sidetracked doing "high priority" restores for individuals, it will > radically delay the restore. Quite. > Beyond that you need some way to recreate all your storage pools, > filesets, junction points and quotas etc. Looks like the mmbackupconfig > and mmrestoreconfig now take care of all that for you. That is a big > time saver right there. > >> This kind of approach doesn't >> necessarily accelerate the overall time to restore, but it allows for a >> usable filesystem to be made available while the restore happens in the >> background. >> > > The problem is that your tape drives will go crazy with HSM activity. So > while in theory it is usable it practice it won't be. Worse with the > tape drives going crazy with the HSM they won't be available for > restore. I would predict much much long times to recovery where recovery > is defined as being back to where you where before the disaster > occurred. Yup - I can see that too. 
I think a large disk pool would help there, along with some kind of logic around "what data is old?" to sensibly place stuff "likely to be accessed" on disk, and the "old" stuff on tape where it can be recalled at a more leisurely pace. >> >> I'd look for clarity about the state of the filesystem on restore - >> particularly around what happens to data which arrives after the >> migration has happened but before the metadata snapshot is taken. I >> think it'd be lost, but the metadata would still point to it existing? > > I would imagine that you just do a standard HSM reconciliation to fix > that. Should be really fast with the new policy based reconciliation > after you spend several months backing all your HSM'ed files up > again :-) > Ahh - but once you've got them in TSM, you can just do a storage pool backup, presumably to a third site, and always have lots of copies everywhere! Of course - you still need to keep generational history somewhere... -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Thu Feb 7 13:56:05 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:56:05 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B15D.4080805@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> Message-ID: <5113B275.7030401@ed.ac.uk> On 07/02/13 13:51, Orlando Richards wrote: > On 07/02/13 13:40, Jonathan Buzzard wrote: >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: >> >> [SNIP] >> >>> Nice - good to see this kind of thing coming from IBM - restore of huge >>> filesystems from traditional backup really doesn't make much sense >>> nowadays - it'd just take too long. >> >> Define too long? Oh - for us, this is rapidly approaching "anything more than a day, and can you do it faster than that please". Not much appetite for the costs of full replication though. :/ -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Fri Feb 8 09:40:27 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 08 Feb 2013 09:40:27 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B275.7030401@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> <5113B275.7030401@ed.ac.uk> Message-ID: <1360316427.16393.23.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 13:56 +0000, Orlando Richards wrote: > On 07/02/13 13:51, Orlando Richards wrote: > > On 07/02/13 13:40, Jonathan Buzzard wrote: > >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > >> > >> [SNIP] > >> > >>> Nice - good to see this kind of thing coming from IBM - restore of huge > >>> filesystems from traditional backup really doesn't make much sense > >>> nowadays - it'd just take too long. > >> > >> Define too long? > > I can tell you speak from (bitter?) experience :) Done two large GPFS restores. The first was to migrate a HSM file system to completely new hardware, new TSM version and new GPFS version. 
IBM would not warrant an upgrade procedure so we "restored" from tape onto the new hardware and then did rsync's to get it "identical". Big problem was the TSM server hardware at the time (a p630) just gave up the ghost about 5TB into the restore repeatedly. Had do it a user at a time which made it take *much* longer as I was repeatedly going over the same tapes. The second was from bitter experience. Someone else in a moment of complete and utter stupidity wiped some ~30 NSD's of their descriptors. Two file systems an instant and complete loss. Well not strictly true it was several days before it manifested itself when one of the NSD servers was rebooted. A day was then wasted working out what the hell had happened to the file system that could have gone to the restore. Took about three weeks to get back completely. Could have been done a lot lot faster if I had had more tape drives on day one and/or made a better job of getting more in, had not messed about prioritizing restores of particular individuals, and not had capacity issues on the TSM server to boot (it was scheduled for upgrade anyway and a CPU failed mid restore). I think TSM 6.x would have been faster as well as it has faster DB performance, and the restore consisted of some 50 million files in about 30TB and it was the number of files that was the killer for speed. It would be nice in a disaster scenario if TSM would also use the tapes in the copy pools for restore, especially when they are in a different library. Not sure if the automatic failover procedure in 6.3 does that. For large file systems I would seriously consider using virtual mount points in TSM and then collocating the file systems. I would also look to match my virtual mount points to file sets. The basic problem is that most people don't have the spare hardware to even try disaster recovery, and even then you are not going to be doing it under the same pressure, hindsight is always 20/20. > Oh - for us, this is rapidly approaching "anything more than a day, and > can you do it faster than that please". Not much appetite for the costs > of full replication though. > Remember you can have any two of cheap, fast and reliable. If you want it back in a day or less then that almost certainly requires a full mirror and is going to be expensive. Noting of course if it ain't offline it ain't backed up. See above if some numpty can wipe the NSD descriptors on your file systems then can do it to your replicated file system at the same time. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jez.Tucker at rushes.co.uk Fri Feb 8 13:17:03 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 8 Feb 2013 13:17:03 +0000 Subject: [gpfsug-discuss] Maximum number of files in a TSM dsmc archive filelist Message-ID: <39571EA9316BE44899D59C7A640C13F5306E9570@WARVWEXC1.uk.deluxe-eu.com> Allo I'm doing an archive with 1954846 files in a filelist. SEGV every time. (BA 6.4.0-0) Am I being optimistic with that number of files? Has anyone successfully done that many in a single archive? --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... 
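One workaround, if a single filelist of that size really is the problem, would be to split the list and archive it in batches -- a sketch only, with placeholder paths and chunk size:

    # break the ~1.95M-entry list into chunks of 100,000 entries
    split -l 100000 /tmp/archive.filelist /tmp/archive.chunk.
    for f in /tmp/archive.chunk.*; do
        dsmc archive -filelist="$f" -description="archive run"
    done

Whether that dodges the SEGV in 6.4.0-0 is another question, but it keeps each dsmc invocation to a more modest size.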
URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:29:12 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:29:12 -0600 Subject: [gpfsug-discuss] File system recovery question Message-ID: <20130213162912.GA22701@logos.ncsa.illinois.edu> I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. Executing od shows it is zero's. (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0040000 120070 156006 120070 156006 120070 156006 120070 156006 Dumping the header of one of the other disks shows read data for the other NSD's in that file system. (! 25)-> mmlsnsd -m | grep dh1_vd05_005 Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node (! 27)-> mmnsddiscover -d dh1_vd05_005 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... myhost: Rediscovery failed for dh1_vd05_005. mmnsddiscover: Finished. Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic Is there any way to repair the header on the NSD? Thanks for any ideas! Chad From Jez.Tucker at rushes.co.uk Wed Feb 13 16:43:50 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Wed, 13 Feb 2013 16:43:50 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> So, er. Fun. I checked our disks. 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0001000 Looks like you lost a fair bit. Presumably you don't have replication of 2? If so, I think you could just lose the NSD. Failing that: 1) Check your other disks and see if there's anything that you can figure out. Though TBH, this may take forever. 2) Restore 3) Call IBM and log a SEV 1. 3) then 2) is probably the best course of action Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Chad Kerner > Sent: 13 February 2013 16:29 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] File system recovery question > > I have a file system, and it appears that someone dd'd over the first > part of one of the NSD's with zero's. I see the device in multipath. I > can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for > the other NSD's in that file system. > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a > while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > mmcommon preunmount invoked. File system: desarchive Reason: > SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From craigawilson at gmail.com Wed Feb 13 16:48:32 2013 From: craigawilson at gmail.com (Craig Wilson) Date: Wed, 13 Feb 2013 16:48:32 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> Message-ID: Dealt with a similar issue a couple of months ago. In that case the data was fine but two of the descriptors were over written. You can use "mmfsadm test readdescraw /dev/$drive" to see the descriptors, we managed to recover the disk but only after logging it to IBM and manually rebuilding the descriptor. -CW On 13 February 2013 16:43, Jez Tucker wrote: > So, er. Fun. > > I checked our disks. > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0001000 > > Looks like you lost a fair bit. > > > Presumably you don't have replication of 2? > If so, I think you could just lose the NSD. > > Failing that: > > 1) Check your other disks and see if there's anything that you can figure > out. Though TBH, this may take forever. > 2) Restore > 3) Call IBM and log a SEV 1. > > 3) then 2) is probably the best course of action > > Jez > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Chad Kerner > > Sent: 13 February 2013 16:29 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] File system recovery question > > > > I have a file system, and it appears that someone dd'd over the first > > part of one of the NSD's with zero's. I see the device in multipath. I > > can fdisk and dd the device out. > > > > Executing od shows it is zero's. > > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > > * > > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > > > Dumping the header of one of the other disks shows read data for > > the other NSD's in that file system. > > > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > > Disk name NSD volume ID Device Node name > > Remarks > > > --------------------------------------------------------------------------------------- > > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > > node > > > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > > mmnsddiscover: Attempting to rediscover the disks. This may take a > > while ... > > myhost: Rediscovery failed for dh1_vd05_005. > > mmnsddiscover: Finished. > > > > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > > volume dh1_vd05_005. > > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > > desarchive. > > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > > mmcommon preunmount invoked. File system: desarchive Reason: > > SGPanic > > > > Is there any way to repair the header on the NSD? > > > > Thanks for any ideas! > > Chad > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Wed Feb 13 16:48:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:48:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> So what do you get if you run: mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 ? Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:29, Chad Kerner wrote: > I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. 
> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:52:30 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:52:30 -0600 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> Message-ID: <20130213165230.GA23294@logos.ncsa.illinois.edu> (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: > So what do you get if you run: > > mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > > ? > > > > > Vic Cornell > viccornell at gmail.com > > > On 13 Feb 2013, at 16:29, Chad Kerner wrote: > > > I have a file system, and it appears that someone dd'd over the first part > of one of the NSD's with zero's. I see the device in multipath. I can > fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other > NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. > Physical volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the > system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: > desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! 
> Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > From viccornell at gmail.com Wed Feb 13 16:57:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:57:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213165230.GA23294@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> <20130213165230.GA23294@logos.ncsa.illinois.edu> Message-ID: <4D043736-06A7-44A0-830E-63D66438595F@gmail.com> Thats not pretty - but you can push the NSD descriptor on with something like: tspreparedisk -F -n /dev/mapper/dh1_vd05_005 -p 8D8EEA98506C69CE That leaves you with the FS and Disk descriptors to recover . . . . Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:52, Chad Kerner wrote: > > > (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 > No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 > No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 > > > > On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: >> So what do you get if you run: >> >> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 >> >> ? >> >> >> >> >> Vic Cornell >> viccornell at gmail.com >> >> >> On 13 Feb 2013, at 16:29, Chad Kerner wrote: >> >> >> I have a file system, and it appears that someone dd'd over the first part >> of one of the NSD's with zero's. I see the device in multipath. I can >> fdisk and dd the device out. >> >> Executing od shows it is zero's. >> (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 >> 0000000 000000 000000 000000 000000 000000 000000 000000 000000 >> * >> 0040000 120070 156006 120070 156006 120070 156006 120070 156006 >> >> Dumping the header of one of the other disks shows read data for the other >> NSD's in that file system. >> >> (! 25)-> mmlsnsd -m | grep dh1_vd05_005 >> Disk name NSD volume ID Device Node name >> Remarks >> --------------------------------------------------------------------------------------- >> dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server >> node >> >> (! 27)-> mmnsddiscover -d dh1_vd05_005 >> mmnsddiscover: Attempting to rediscover the disks. This may take a while >> ... >> myhost: Rediscovery failed for dh1_vd05_005. >> mmnsddiscover: Finished. >> >> >> Wed Feb 13 09:14:03.694 2013: Command: mount desarchive >> Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. >> Physical volume dh1_vd05_005. >> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the >> system with return code 5 reason code 0 >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive >> Wed Feb 13 09:14:07.104 2013: Input/output error >> Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: >> desarchive Reason: SGPanic >> >> Is there any way to repair the header on the NSD? >> >> Thanks for any ideas! 
>> Chad
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>

From jonathan at buzzard.me.uk  Wed Feb 13 17:00:31 2013
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 13 Feb 2013 17:00:31 +0000
Subject: [gpfsug-discuss] File system recovery question
In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu>
References: <20130213162912.GA22701@logos.ncsa.illinois.edu>
Message-ID: <1360774831.23342.9.camel@buzzard.phy.strath.ac.uk>

On Wed, 2013-02-13 at 10:29 -0600, Chad Kerner wrote:
> I have a file system, and it appears that someone dd'd over the first
> part of one of the NSD's with zero's. I see the device in multipath.
> I can fdisk and dd the device out.

Log a SEV1 call with IBM. If it is only one NSD that is stuffed they
might be able to get it back for you. However it is a custom procedure
that requires developer time from Poughkeepsie. It will take some time.

In the meantime I would strongly encourage you to start preparing for a
total restore, which will include recreating the file system from
scratch. Certainly if all the NSD headers are stuffed then the file
system is a total loss. However even with only one lost it is not, as I
understand it, certain you can get it back.

It is probably a good idea to store the NSD headers somewhere off the
file system in case some numpty wipes them. The most likely reason for
this is that they ran a distro install on a system that has direct
access to the disk.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From Tobias.Kuebler at sva.de  Wed Feb 13 17:00:37 2013
From: Tobias.Kuebler at sva.de (Tobias.Kuebler at sva.de)
Date: Wed, 13 Feb 2013 18:00:37 +0100
Subject: [gpfsug-discuss] AUTO: Tobias Kuebler is out of the office (returning 02/18/2013)
Message-ID: 

I am out of the office until 02/18/2013.

Thank you for your message. Incoming e-mails will not be forwarded while
I am away, but I will try to answer them as soon as possible after my
return. In urgent cases, please contact your responsible sales
representative.

Note: This is an automated reply to your message "Re: [gpfsug-discuss]
File system recovery question" sent on 13.02.2013 17:43:50. It is the
only notification you will receive while this person is away.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Jez.Tucker at rushes.co.uk  Thu Feb 28 17:25:26 2013
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Thu, 28 Feb 2013 17:25:26 +0000
Subject: [gpfsug-discuss] Who uses TSM to archive HSMd data (inline)?
Message-ID: <39571EA9316BE44899D59C7A640C13F5306EED70@WARVWEXC1.uk.deluxe-eu.com>

Hello all,

I have to ask: does anyone else do this? We have a problem and I'm told
that "it's so rare that anyone would archive data which is HSMd".

I.e. to create an archive whereby a project is entirely or partially
HSMd to LTO:
- online data is archived to tape
- offline data is copied from HSM tape to archive tape 'inline'

Surely nobody pulls back all their data to disk before re-archiving back
to tape?

---
Jez Tucker
Senior Sysadmin
Rushes
GPFSUG Chairman (chair at gpfsug.org)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bergman at panix.com  Wed Feb 6 21:28:30 2013
From: bergman at panix.com (bergman at panix.com)
Date: Wed, 06 Feb 2013 16:28:30 -0500
Subject: [gpfsug-discuss] GPFS snapshot cron job
In-Reply-To: Your message of "Wed, 06 Feb 2013 13:38:56 EST."
References: 
Message-ID: <20647.1360186110@localhost>

In the message dated: Wed, 06 Feb 2013 13:38:56 -0500,
The pithy ruminations from Stuart Barkley on
<[gpfsug-discuss] GPFS snapshot cron job> were:

=> I'm new on this list. It looks like it can be useful for exchanging
=> GPFS experiences.
=>
=> We have been running GPFS for a couple of years now on one cluster and
=> are in process of bringing it up on a couple of other clusters.
=>
=> One thing we would like, but have not had time to do is automatic
=> snapshots similar to what NetApp does. For our purposes a cron job
=> that ran every 4 hours that creates a new snapshot and removes older
=> snapshots would be sufficient. The slightly hard task is correctly
=> removing the older snapshots.
=>
=> Does anyone have such a cron script they can share?

Yes. I've attached the script that we run from cron.

Our goal was to keep a decaying set of snapshots over a fairly long time
period, so that users would be able to recover from "rm", while not using
excessive space. Snapshots are named with a timestamp, making it slightly
easier to understand what data they contain and to remove the older ones.

The cron job runs every 15 minutes on every GPFS server node, but checks
if it is executing on the node that is the manager for the specified
filesystem to avoid concurrency issues. The script will avoid making a
snapshot if there isn't sufficient disk space.

Our config file to manage snapshots is:

------------ CUT HERE -- CUT HERE --------------
case $1 in
    home)
        intervals=(1 4 24 48)   # hour number of each interval
        counts=(4 4 4 4)        # max number of snapshots to keep per each interval
        MINFREE=5               # minimum free disk space, in percent
        ;;
    shared)
        intervals=(1 4 48)      # hour number of each interval
        counts=(4 2 2)          # max number of snapshots to keep per each interval
        MINFREE=20              # minimum free disk space, in percent
        ;;
esac
------------ CUT HERE -- CUT HERE --------------

For the "home" filesystem, this says:

    keep 4 snapshots in the most recent hourly interval (every 15 minutes)
    keep 4 snapshots made in the most recent 4 hr interval (1 for each hour)
    keep 4 snapshots made in the most recent 24 hr interval (1 each 6 hrs)
    keep 4 snapshots made in the most recent 48 hr interval (1 each 12 hrs)

For the "shared" filesystem, the configuration says:

    keep 4 snapshots in the most recent hourly interval (every 15 minutes)
    keep 2 snapshots made in the most recent 4 hr interval (1 each 2 hours)
    keep 2 snapshots made in the most recent 48 hr interval (1 each 24 hrs)

Those intervals "overlap", so there are a lot of recent snapshots, and
fewer older ones. Each time a snapshot is made, older snapshots may be
removed.
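For anyone who just wants the gist of the decay logic without wading
through the attached script: a stripped-down, hypothetical rendering of
the keep-or-delete test looks roughly like this (bash and GNU date
assumed; locking, free-space checks and error handling all omitted --
see the real script below for those):

    #!/bin/bash
    # decide whether snapshot $1 (named YYYY_MM_DD_HH:MM) is kept or deleted
    intervals=(1 4 24 48)              # bucket ceilings, in hours
    counts=(4 4 4 4)                   # snapshots retained per bucket
    snap=$1                            # e.g. 2013_02_06_21:15
    stamp=$(echo "$snap" | sed -e 's/_\([^_]*\)$/ \1/' -e 's/_/\//g')
    snapsecs=$(date --date "$stamp" +%s)
    age=$(( $(date +%s) - snapsecs ))  # age of the snapshot, in seconds
    prev=0
    for i in "${!intervals[@]}"; do
        ceiling=$(( intervals[i] * 3600 ))
        step=$(( ceiling / counts[i] ))   # spacing of retained snapshots in this bucket
        if [ "$age" -ge "$prev" ] && [ "$age" -lt "$ceiling" ]; then
            # keep it only if its timestamp lands on this bucket's retention interval
            if [ $(( snapsecs % step )) -eq 0 ]; then echo keep; else echo delete; fi
            exit 0
        fi
        prev=$ceiling
    done
    echo delete                        # older than the last bucket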
So, at 5:01 PM on Thursday, there may be snapshots of the "home" filesystem from: 17:00 Thursday ---+-- 4 in the last hour 16:45 Thursday | 16:30 Thursday | 16:15 Thursday -- + 16:00 Thursday ---+-- 4 in the last 4 hours, including 15:00 Thursday | the 5:00PM Thursday snapshot 14:00 Thursday ---+ 11:00 Thursday ---+-- 4 in the last 24 hours, including 05:00 Thursday | 17:00 Thursday 23:00 Wednesday ---+ 17:00 Wednesday ---+-- 4 @ 12-hr intervals in the last 48 hours, 05:00 Wednesday ---+ including 17:00 & 05:00 Thursday Suggestions and patches are welcome. => => Or did I miss something in GPFS that handles automatic snapshots? We have seen periodic slowdowns when snapshots are running, but nothing to the extent described by Jonathan Buzzard. Mark => => Thanks, => Stuart Barkley => -- => I've never been lost; I was once bewildered for three days, but never lost! => -- Daniel Boone -------------- next part -------------- #! /bin/bash #$Id: snapshotter 858 2012-01-31 19:24:11Z$ # Manage snapshots of GPFS volumes # # Desgined to be called from cron at :15 intervals # ################################################################## # Defaults, may be overridden by /usr/local/etc/snappshotter.conf # or file specified by "-c" CONF=/usr/local/etc/snappshotter.conf # config file, supersceded by "-c" option MINFREE=10 # minimum free space, in percent. # # Series of intervals and counts. Intervals expressed as the end-point in hours. # count = number of snapshots to keep per-interval ############## # time # ==== # :00-59 keep snapshots at 15 minute intervals; ceiling of interval = 1hr # 01-03:59 keep snapshots at 1hr interval; ceiling of interval = 4hr # 04-23:59 keep snapshots at 6hr intervals; ceiling of interval = 24hr # 24-47:59 keep snapshots at 12hr intervals; ceiling of interval = 48hr intervals=(1 4 24 48) # hour number of each interval counts=(4 4 4 4) # max number of snapshots to keep per each interval # Note that the snapshots in interval (N+1) must be on a time interval # that corresponds to the snapshots kept in interval N. # # :00-59 keep snapshots divisible by 1/4hr: 00:00, 00:15, 00:30, 00:45, 01:00, 01:15 ... # 01-04:59 keep snapshots divisible by 4/4hr: 00:00, 01:00, 02:00, 03:00 ... # 05-23:59 keep snapshots divisible by 24/4hr: 00:00, 06:00, 12:00, 18:00 # 24-48:59 keep snapshots divisible by 48/4hr: 00:00, 12:00 # # ################################################################## TESTING="no" MMDF=/usr/lpp/mmfs/bin/mmdf MMCRSNAPSHOT=/usr/lpp/mmfs/bin/mmcrsnapshot MMLSSNAPSHOT=/usr/lpp/mmfs/bin/mmlssnapshot MMDELSNAPSHOT=/usr/lpp/mmfs/bin/mmdelsnapshot LOGGER="logger -p user.alert -t snapshotter" PATH="${PATH}:/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/local/sbin" # for access to 'ip' command, GPFS commands now=`date '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` secsINhr=$((60 * 60)) ##################################################################### usage() { cat - << E-O-USAGE 1>&2 $0 -- manage GPFS snapshots Create new GPFS snapshots and remove old snapshots. Options: -f filesystem required -- name of filesystem to snapshot -t testing test mode, report what would be done but perform no action -d "datestamp" test mode only; used supplied date stamp as if it was the current time. -c configfile use supplied configuration file in place of default: $CONF -L show license statement In test mode, the input data, in the same format as produced by "mmlssnap" must be supplied. 
This can be done on STDIN, as: $0 -t -f home -d "\`date --date "Dec 7 23:45"\`" < mmlssnap.data or $0 -t -f home -d "\`date --date "now +4hours"\`" < mmlssnap.data E-O-USAGE echo 1>&2 echo $1 1>&2 exit 1 } ##################################################################### license() { cat - << E-O-LICENSE Section of Biomedical Image Analysis Department of Radiology University of Pennsylvania 3600 Market Street, Suite 380 Philadelphia, PA 19104 Web: http://www.rad.upenn.edu/sbia/ Email: sbia-software at uphs.upenn.edu SBIA Contribution and Software License Agreement ("Agreement") ============================================================== Version 1.0 (June 9, 2011) This Agreement covers contributions to and downloads from Software maintained by the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania ("SBIA"). Part A of this Agreement applies to contributions of software and/or data to the Software (including making revisions of or additions to code and/or data already in this Software). Part B of this Agreement applies to downloads of software and/or data from SBIA. Part C of this Agreement applies to all transactions with SBIA. If you distribute Software (as defined below) downloaded from SBIA, all of the paragraphs of Part B of this Agreement must be included with and apply to such Software. Your contribution of software and/or data to SBIA (including prior to the date of the first publication of this Agreement, each a "Contribution") and/or downloading, copying, modifying, displaying, distributing or use of any software and/or data from SBIA (collectively, the "Software") constitutes acceptance of all of the terms and conditions of this Agreement. If you do not agree to such terms and conditions, you have no right to contribute your Contribution, or to download, copy, modify, display, distribute or use the Software. PART A. CONTRIBUTION AGREEMENT - LICENSE TO SBIA WITH RIGHT TO SUBLICENSE ("CONTRIBUTION AGREEMENT"). ----------------------------------------------------------------------------------------------------- 1. As used in this Contribution Agreement, "you" means the individual contributing the Contribution to the Software maintained by SBIA and the institution or entity which employs or is otherwise affiliated with such individual in connection with such Contribution. 2. This Contribution Agreement applies to all Contributions made to the Software maintained by SBIA, including without limitation Contributions made prior to the date of first publication of this Agreement. If at any time you make a Contribution to the Software, you represent that (i) you are legally authorized and entitled to make such Contribution and to grant all licenses granted in this Contribution Agreement with respect to such Contribution; (ii) if your Contribution includes any patient data, all such data is de-identified in accordance with U.S. confidentiality and security laws and requirements, including but not limited to the Health Insurance Portability and Accountability Act (HIPAA) and its regulations, and your disclosure of such data for the purposes contemplated by this Agreement is properly authorized and in compliance with all applicable laws and regulations; and (iii) you have preserved in the Contribution all applicable attributions, copyright notices and licenses for any third party software or data included in the Contribution. 3. Except for the licenses granted in this Agreement, you reserve all right, title and interest in your Contribution. 4. 
You hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no charge, royalty-free, irrevocable license to use, reproduce, make derivative works of, display and distribute the Contribution. If your Contribution is protected by patent, you hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable license under your interest in patent rights covering the Contribution, to make, have made, use, sell and otherwise transfer your Contribution, alone or in combination with any other code. 5. You acknowledge and agree that SBIA may incorporate your Contribution into the Software and may make the Software available to members of the public on an open source basis under terms substantially in accordance with the Software License set forth in Part B of this Agreement. You further acknowledge and agree that SBIA shall have no liability arising in connection with claims resulting from your breach of any of the terms of this Agreement. 6. YOU WARRANT THAT TO THE BEST OF YOUR KNOWLEDGE YOUR CONTRIBUTION DOES NOT CONTAIN ANY CODE THAT REQUIRES OR PRESCRIBES AN "OPEN SOURCE LICENSE" FOR DERIVATIVE WORKS (by way of non-limiting example, the GNU General Public License or other so-called "reciprocal" license that requires any derived work to be licensed under the GNU General Public License or other "open source license"). PART B. DOWNLOADING AGREEMENT - LICENSE FROM SBIA WITH RIGHT TO SUBLICENSE ("SOFTWARE LICENSE"). ------------------------------------------------------------------------------------------------ 1. As used in this Software License, "you" means the individual downloading and/or using, reproducing, modifying, displaying and/or distributing the Software and the institution or entity which employs or is otherwise affiliated with such individual in connection therewith. The Section of Biomedical Image Analysis, Department of Radiology at the Universiy of Pennsylvania ("SBIA") hereby grants you, with right to sublicense, with respect to SBIA's rights in the software, and data, if any, which is the subject of this Software License (collectively, the "Software"), a royalty-free, non-exclusive license to use, reproduce, make derivative works of, display and distribute the Software, provided that: (a) you accept and adhere to all of the terms and conditions of this Software License; (b) in connection with any copy of or sublicense of all or any portion of the Software, all of the terms and conditions in this Software License shall appear in and shall apply to such copy and such sublicense, including without limitation all source and executable forms and on any user documentation, prefaced with the following words: "All or portions of this licensed product (such portions are the "Software") have been obtained under license from the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania and are subject to the following terms and conditions:" (c) you preserve and maintain all applicable attributions, copyright notices and licenses included in or applicable to the Software; (d) modified versions of the Software must be clearly identified and marked as such, and must not be misrepresented as being the original Software; and (e) you consider making, but are under no obligation to make, the source code of any of your modifications to the Software freely available to others on an open source basis. 2. 
The license granted in this Software License includes without limitation the right to (i) incorporate the Software into proprietary programs (subject to any restrictions applicable to such programs), (ii) add your own copyright statement to your modifications of the Software, and (iii) provide additional or different license terms and conditions in your sublicenses of modifications of the Software; provided that in each case your use, reproduction or distribution of such modifications otherwise complies with the conditions stated in this Software License. 3. This Software License does not grant any rights with respect to third party software, except those rights that SBIA has been authorized by a third party to grant to you, and accordingly you are solely responsible for (i) obtaining any permissions from third parties that you need to use, reproduce, make derivative works of, display and distribute the Software, and (ii) informing your sublicensees, including without limitation your end-users, of their obligations to secure any such required permissions. 4. The Software has been designed for research purposes only and has not been reviewed or approved by the Food and Drug Administration or by any other agency. YOU ACKNOWLEDGE AND AGREE THAT CLINICAL APPLICATIONS ARE NEITHER RECOMMENDED NOR ADVISED. Any commercialization of the Software is at the sole risk of the party or parties engaged in such commercialization. You further agree to use, reproduce, make derivative works of, display and distribute the Software in compliance with all applicable governmental laws, regulations and orders, including without limitation those relating to export and import control. 5. The Software is provided "AS IS" and neither SBIA nor any contributor to the software (each a "Contributor") shall have any obligation to provide maintenance, support, updates, enhancements or modifications thereto. SBIA AND ALL CONTRIBUTORS SPECIFICALLY DISCLAIM ALL EXPRESS AND IMPLIED WARRANTIES OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL SBIA OR ANY CONTRIBUTOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY ARISING IN ANY WAY RELATED TO THE SOFTWARE, EVEN IF SBIA OR ANY CONTRIBUTOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. TO THE MAXIMUM EXTENT NOT PROHIBITED BY LAW OR REGULATION, YOU FURTHER ASSUME ALL LIABILITY FOR YOUR USE, REPRODUCTION, MAKING OF DERIVATIVE WORKS, DISPLAY, LICENSE OR DISTRIBUTION OF THE SOFTWARE AND AGREE TO INDEMNIFY AND HOLD HARMLESS SBIA AND ALL CONTRIBUTORS FROM AND AGAINST ANY AND ALL CLAIMS, SUITS, ACTIONS, DEMANDS AND JUDGMENTS ARISING THEREFROM. 6. None of the names, logos or trademarks of SBIA or any of SBIA's affiliates or any of the Contributors, or any funding agency, may be used to endorse or promote products produced in whole or in part by operation of the Software or derived from or based on the Software without specific prior written permission from the applicable party. 7. Any use, reproduction or distribution of the Software which is not in accordance with this Software License shall automatically revoke all rights granted to you under this Software License and render Paragraphs 1 and 2 of this Software License null and void. 8. 
This Software License does not grant any rights in or to any intellectual property owned by SBIA or any Contributor except those rights expressly granted hereunder. PART C. MISCELLANEOUS --------------------- This Agreement shall be governed by and construed in accordance with the laws of The Commonwealth of Pennsylvania without regard to principles of conflicts of law. This Agreement shall supercede and replace any license terms that you may have agreed to previously with respect to Software from SBIA. E-O-LICENSE exit } ##################################################################### # Parse the command-line while [ "X$1" != "X" ] do case $1 in -L) license ;; -t) TESTING="yes" shift ;; -d) # Date stamp given...only valid in testing mode shift # Convert the user-supplied date to the YYYY_Mo_DD_HH:MM form, # throwing away the seconds UserDATE="$1" now=`date --date "$1" '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` shift ;; -c) shift CONF=$1 if [ ! -f $CONF ] ; then usage "Specified configuration file ($CONF) not found" fi shift ;; -f) shift filesys=$1 shift ;; *) usage "Unrecognized option: \"$1\"" ;; esac done ############## End of command line parsing LOCKFILE=/var/run/snapshotter.$filesys if [ -f $LOCKFILE ] ; then PIDs=`cat $LOCKFILE | tr "\012" " "` echo "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." 1>&2 $LOGGER "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." exit 1 else echo $$ > $LOCKFILE if [ $? != 0 ] ; then echo "Could not create lockfile $LOCKFILE for process $$. Exiting." 1>&2 $LOGGER "Could not create lockfile $LOCKFILE for process $$" exit 2 fi fi ######## Check sanity of user-supplied values if [ "X$filesys" = "X" ] ; then $LOGGER "Filesystem must be specified" usage "Filesystem must be specified" fi if [ $TESTING = "yes" ] ; then # testing mode: # accept faux filesystem argument # accept faux datestamp as arguments # read faux results from mmlssnapshot on STDIN # MMDF # # Do not really use mmdf executable, so that the testing can be # done outside a GPFS cluster Use a 2-digit random number 00 .. 99 # from $RANDOM, but fill the variable with dummy fields so the # the random number corresponds to field5, where it would be in # the mmdf output. MMDF="eval echo \(total\) f1 f2 f3 f4 \(\${RANDOM: -2:2}%\) " MMCRSNAPSHOT="echo mmcrsnapshot" MMDELSNAPSHOT="echo mmdelsnapshot" MMLSSNAPDATA=`cat - | tr "\012" "%"` MMLSSNAPSHOT="eval echo \$MMLSSNAPDATA|tr '%' '\012'" LOGGER="echo Log message: " else if [ "X$UserDATE" != "X" ] ; then $LOGGER "Option \"-d\" only valid in testing mode" usage "Option \"-d\" only valid in testing mode" fi /usr/lpp/mmfs/bin/mmlsfs $filesys -T 1> /dev/null 2>&1 if [ $? != 0 ] ; then $LOGGER "Error accessing GPFS filesystem: $filesys" echo "Error accessing GPFS filesystem: $filesys" 1>&2 rm -f $LOCKFILE exit 1 fi # Check if the node where this script is running is the GPFS manager node for the # specified filesystem manager=`/usr/lpp/mmfs/bin/mmlsmgr $filesys | grep -w "^$filesys" |awk '{print $2}'` ip addr list | grep -qw "$manager" if [ $? != 0 ] ; then # This node is not the manager...exit rm -f $LOCKFILE exit fi MMLSSNAPSHOT="$MMLSSNAPSHOT $filesys" fi # It is valid for the default config file not to exist, so check if # is there before sourcing it if [ -f $CONF ] ; then . 
$CONF $filesys # load variables found in $CONF, based on $filesys fi # Get current free space freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` # Produce list of valid snapshot names (w/o header lines) snapnames=`$MMLSSNAPSHOT |grep Valid |sed -e '$d' -e 's/ .*//'` # get the number of existing snapshots snapcount=($snapnames) ; snapcount=${#snapcount[*]} ########################################################### # given a list of old snapshot names, in the form: # YYYY_Mo_DD_HH:MM # fill the buckets by time. A snapshot can only go # into one bucket! ########################################################### for oldsnap in $snapnames do oldstamp=`echo $oldsnap|sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` oldsecs=`date --date "$oldstamp" "+%s"` diff=$((nowsecs - oldsecs)) # difference in seconds between 'now' and old snapshot if [ $diff -lt 0 ] ; then # this can happen during testing...we have got a faux # snapshot date in the future...skip it continue fi index=0 prevbucket=0 filled=No while [ $index -lt ${#intervals[*]} -a $filled != "Yes" ] do bucket=${intervals[$index]} # ceiling for number of hours for this bucket (1 more than the number of # actual hours, ie., "7" means that the bucket can contain snapshots that are # at least 6:59 (hh:mm) old. count=${counts[$index]} # max number of items in this bucket bucketinterval=$(( bucket * ( secsINhr / count ) )) # Number of hours (in seconds) between snapshots that should be retained # for this bucket...convert from hrs (bucket/count) to seconds in order to deal with :15 minute intervals # Force the mathematical precedence to do (secsINhr / count) so that cases where count>bucket (like the first 1hr # that may have a count of 4 retained snapshots) doesn't result in the shell throwing away the fraction if [ $diff -ge $((prevbucket * secsINhr)) -a $diff -lt $((bucket * secsINhr)) ] ; then # We found the correct bucket filled=Yes ## printf "Checking if $oldsnap should be retained if it is multiple of $bucketinterval [ ($oldsecs %% $bucketinterval) = 0]" # Does the snapshot being examined fall on the interval determined above for the snapshots that should be retained? 
if [ $(( oldsecs % bucketinterval )) = 0 ] ; then # The hour of the old snapshot is evenly divisible by the number of snapshots that should be # retained in this interval...keep it tokeep="$tokeep $oldsnap" ## printf "...yes\n" else todelete="$todelete $oldsnap" ## printf "...no\n" fi prevbucket=$bucket fi index=$((index + 1)) done if [ $diff -ge $((bucket * secsINhr )) ] ; then filled=Yes # This is too old...schedule it for deletion $LOGGER "Scheduling old snapshot $oldsnap from $filesys for deletion" todelete="$todelete $oldsnap" fi # We should not get here if [ $filled != Yes ] ; then $LOGGER "Snapshot \"$oldsnap\" on $filesys does not match any intervals" fi done # Sort the lists to make reading the testing output easier todelete=`echo $todelete | tr " " "\012" | sort -bdfu` tokeep=`echo $tokeep | tr " " "\012" | sort -bdfu` ############################################################# for oldsnap in $todelete do if [ $TESTING = "yes" ] ; then # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode $MMDELSNAPSHOT $filesys $oldsnap # remove the entry for the snapshot scheduled for deletion # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate ## echo "Removing entry for \"$oldsnap\" from \$MMLSSNAPDATA" MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$oldsnap [^%]*%/%/"` else # Run mmdelsnapshot, and capture the output to prevent verbose messages from being # sent as the result of each cron job. Only display the messages in case of error. output=`$MMDELSNAPSHOT $filesys $oldsnap 2>&1` fi if [ $? != 0 ] ; then printf "Error from \"$MMDELSNAPSHOT $filesys $oldsnap\": $output" 1>&2 $LOGGER "Error removing snapshot of $filesys with label \"$oldsnap\": $output" rm -f $LOCKFILE exit 1 else $LOGGER "successfully removed snapshot of $filesys with label \"$oldsnap\"" fi done ############# Now check for free space ####################################### # Get current free space freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` # get the number of existing snapshots snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l` while [ $freenow -le $MINFREE -a $snapcount -gt 0 ] do # must delete some snapshots, from the oldest first... todelete=`$MMLSSNAPSHOT|grep Valid |sed -n -e 's/ .*//' -e '1p'` if [ $TESTING = "yes" ] ; then # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode $MMDELSNAPSHOT $filesys $todelete # remove the entry for the snapshot scheduled for deletion # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate and from $tokeep ## echo "Removing entry for \"$todelete\" from \$MMLSSNAPDATA" MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$todelete [^%]*%/%/"` tokeep=`echo $tokeep | sed -e "s/^$todelete //" -e "s/ $todelete / /" -e "s/ $todelete$//" -e "s/^$todelete$//"` else # Run mmdelsnapshot, and capture the output to prevent verbose messages from being # sent as the result of each cron job. Only display the messages in case of error. output=`$MMDELSNAPSHOT $filesys $todelete 2>&1` fi if [ $? 
!= 0 ] ; then printf "Error from \"$MMDELSNAPSHOT $filesys $todelete\": $output" 1>&2 $LOGGER "Low disk space (${freenow}%) triggered attempt to remove snapshot of $filesys with label \"$todelete\" -- Error: $output" rm -f $LOCKFILE exit 1 else $LOGGER "removed snapshot \"$todelete\" from $filesys because ${freenow}% free disk is less than ${MINFREE}%" fi # update the number of existing snapshots snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l` freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` done if [ $snapcount = 0 -a $freenow -ge $MINFREE ] ; then echo "All existing snapshots removed on $filesys, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" 1>&2 $LOGGER "All existing snapshots on $filesys removed, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" rm -f $LOCKFILE exit 1 fi $LOGGER "Free disk space on $filesys (${freenow}%) above minimum required (${MINFREE}%) to create new snapshot" ############################################################## if [ $TESTING = "yes" ] ; then # List snapshots being kept for oldsnap in $tokeep do echo "Keeping snapshot $oldsnap" done fi ############################################################# # Now create the current snapshot...do this after deleting snaps in order to reduce the chance of running # out of disk space results=`$MMCRSNAPSHOT $filesys $now 2>&1 | tr "\012" "%"` if [ $? != 0 ] ; then printf "Error from \"$MMCRSNAPSHOT $filesys $now\":\n\t" 1>&2 echo $results | tr '%' '\012' 1>&2 results=`echo $results | tr '%' '\012'` $LOGGER "Error creating snapshot of $filesys with label $now: \"$results\"" rm -f $LOCKFILE exit 1 else $LOGGER "successfully created snapshot of $filesys with label $now" fi rm -f $LOCKFILE From Jez.Tucker at rushes.co.uk Thu Feb 7 12:28:16 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Thu, 7 Feb 2013 12:28:16 +0000 Subject: [gpfsug-discuss] SOBAR Message-ID: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Hey all Is anyone using the SOBAR method of backing up the metadata and NSD configs? If so, how is your experience? >From reading the docs, it seems a bit odd that on restoration you have to re-init the FS and recall all the data. If so, what's the point of SOBAR? --- Jez Tucker Senior Sysadmin Rushes http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Thu Feb 7 12:47:30 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 12:47:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <5113A262.8080206@ed.ac.uk> On 07/02/13 12:28, Jez Tucker wrote: > Hey all > > Is anyone using the SOBAR method of backing up the metadata and NSD > configs? > > If so, how is your experience? > > From reading the docs, it seems a bit odd that on restoration you have > to re-init the FS and recall all the data. > > If so, what?s the point of SOBAR? Ooh - this is new. From first glance, it looks to be a DR solution? 
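The shape of the procedure, as far as the documentation goes, seems to
be: premigrate everything into TSM HSM, capture the configuration and a
metadata image, then on restore rebuild an empty file system from those
and let the data flow back from tape. A rough sketch only -- the file
system name and paths here are invented and the options trimmed, so
check the SOBAR chapter before trusting any of it:

    # backup side: all file data already premigrated/migrated into TSM HSM
    mmbackupconfig gpfs1 -o /safe/gpfs1.config   # pools, filesets, quotas, NSD layout
    mmimgbackup gpfs1                            # metadata image of the file system
    # keep gpfs1.config and the image files somewhere OFF this file system

    # restore side: recreate the NSDs on replacement hardware, then
    mmrestoreconfig gpfs1 -i /safe/gpfs1.config
    mmimgrestore gpfs1                           # image location arguments omitted
    mmmount gpfs1 -a
    # files reappear as HSM stubs and are recalled from TSM on access (or in bulk)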
We're actually in the process of engineering our own DR solution based on a not-dissimilar concept: - build a second GPFS file system off-site, with HSM enabled (called "dr-fs" here) - each night, rsync the changed data from "prod-fs" to "dr-fs" - each day, migrate data from the disk pool in "dr-fs" to the tape pool to free up sufficient capacity for the next night's rsync You have a complete copy of the filesystem metadata from "prod-fs" on "dr-fs", so it looks (to a user) identical, but on "dr-fs" some of the ("older") data is on tape (ratios dependent on sizing of disk vs tape pools, of course). In the event of a disaster, you just flip over to "dr-fs". From the quick glance at SOBAR, it looks to me like the concept is that you don't have a separate file system, but you hold a secondary copy in TSM via the premigrate function, and store the filesystem metadata as a flat file dump backed up "in the normal way". In DR, you rebuild the FS from the metadata backup, and re-attach the HSM pool to this newly-restored filesystem, (and then start pushing the data back out of the HSM pool into the GPFS disk pool). As soon as the HSM pool is re-attached, users can start getting their data (as fast as TSM can give it to them), and the filesystem will look "normal" to them (albeit slow, if recalling from tape). Nice - good to see this kind of thing coming from IBM - restore of huge filesystems from traditional backup really doesn't make much sense nowadays - it'd just take too long. This kind of approach doesn't necessarily accelerate the overall time to restore, but it allows for a usable filesystem to be made available while the restore happens in the background. I'd look for clarity about the state of the filesystem on restore - particularly around what happens to data which arrives after the migration has happened but before the metadata snapshot is taken. I think it'd be lost, but the metadata would still point to it existing? Might get confusing... Just my 2 cents from a quick skim read mind - plus a whole bunch of thinking we've done on this subject recently :) -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Thu Feb 7 13:40:30 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 07 Feb 2013 13:40:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113A262.8080206@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> Message-ID: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: [SNIP] > Nice - good to see this kind of thing coming from IBM - restore of huge > filesystems from traditional backup really doesn't make much sense > nowadays - it'd just take too long. Define too long? It's perfectly doable, and the speed of the restore will depend on what resources you have to throw at the problem. The main issue is having lots of tape drives for the restore. Having a plan to buy more ASAP is a good idea. The second is don't let yourself get sidetracked doing "high priority" restores for individuals, it will radically delay the restore. Beyond that you need some way to recreate all your storage pools, filesets, junction points and quotas etc. Looks like the mmbackupconfig and mmrestoreconfig now take care of all that for you. 
That is a big time saver right there. > This kind of approach doesn't > necessarily accelerate the overall time to restore, but it allows for a > usable filesystem to be made available while the restore happens in the > background. > The problem is that your tape drives will go crazy with HSM activity. So while in theory it is usable it practice it won't be. Worse with the tape drives going crazy with the HSM they won't be available for restore. I would predict much much long times to recovery where recovery is defined as being back to where you where before the disaster occurred. > > I'd look for clarity about the state of the filesystem on restore - > particularly around what happens to data which arrives after the > migration has happened but before the metadata snapshot is taken. I > think it'd be lost, but the metadata would still point to it existing? I would imagine that you just do a standard HSM reconciliation to fix that. Should be really fast with the new policy based reconciliation after you spend several months backing all your HSM'ed files up again :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From orlando.richards at ed.ac.uk Thu Feb 7 13:51:25 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:51:25 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> Message-ID: <5113B15D.4080805@ed.ac.uk> On 07/02/13 13:40, Jonathan Buzzard wrote: > On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > > [SNIP] > >> Nice - good to see this kind of thing coming from IBM - restore of huge >> filesystems from traditional backup really doesn't make much sense >> nowadays - it'd just take too long. > > Define too long? It's perfectly doable, and the speed of the restore > will depend on what resources you have to throw at the problem. The main > issue is having lots of tape drives for the restore. I can tell you speak from (bitter?) experience :) I've always been "disappointed" with the speed of restores - but I've never tried a "restore everything", which presumably will run quicker. One problem I can see us having is that we have lots of small files, which tends to make everything go really slowly - but getting the thread count up would, I'm sure, help a lot. > Having a plan to > buy more ASAP is a good idea. The second is don't let yourself get > sidetracked doing "high priority" restores for individuals, it will > radically delay the restore. Quite. > Beyond that you need some way to recreate all your storage pools, > filesets, junction points and quotas etc. Looks like the mmbackupconfig > and mmrestoreconfig now take care of all that for you. That is a big > time saver right there. > >> This kind of approach doesn't >> necessarily accelerate the overall time to restore, but it allows for a >> usable filesystem to be made available while the restore happens in the >> background. >> > > The problem is that your tape drives will go crazy with HSM activity. So > while in theory it is usable it practice it won't be. Worse with the > tape drives going crazy with the HSM they won't be available for > restore. I would predict much much long times to recovery where recovery > is defined as being back to where you where before the disaster > occurred. Yup - I can see that too. 
I think a large disk pool would help there, along with some kind of logic around "what data is old?" to sensibly place stuff "likely to be accessed" on disk, and the "old" stuff on tape where it can be recalled at a more leisurely pace. >> >> I'd look for clarity about the state of the filesystem on restore - >> particularly around what happens to data which arrives after the >> migration has happened but before the metadata snapshot is taken. I >> think it'd be lost, but the metadata would still point to it existing? > > I would imagine that you just do a standard HSM reconciliation to fix > that. Should be really fast with the new policy based reconciliation > after you spend several months backing all your HSM'ed files up > again :-) > Ahh - but once you've got them in TSM, you can just do a storage pool backup, presumably to a third site, and always have lots of copies everywhere! Of course - you still need to keep generational history somewhere... -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Thu Feb 7 13:56:05 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:56:05 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B15D.4080805@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> Message-ID: <5113B275.7030401@ed.ac.uk> On 07/02/13 13:51, Orlando Richards wrote: > On 07/02/13 13:40, Jonathan Buzzard wrote: >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: >> >> [SNIP] >> >>> Nice - good to see this kind of thing coming from IBM - restore of huge >>> filesystems from traditional backup really doesn't make much sense >>> nowadays - it'd just take too long. >> >> Define too long? Oh - for us, this is rapidly approaching "anything more than a day, and can you do it faster than that please". Not much appetite for the costs of full replication though. :/ -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Fri Feb 8 09:40:27 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 08 Feb 2013 09:40:27 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B275.7030401@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> <5113B275.7030401@ed.ac.uk> Message-ID: <1360316427.16393.23.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 13:56 +0000, Orlando Richards wrote: > On 07/02/13 13:51, Orlando Richards wrote: > > On 07/02/13 13:40, Jonathan Buzzard wrote: > >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > >> > >> [SNIP] > >> > >>> Nice - good to see this kind of thing coming from IBM - restore of huge > >>> filesystems from traditional backup really doesn't make much sense > >>> nowadays - it'd just take too long. > >> > >> Define too long? > > I can tell you speak from (bitter?) experience :) Done two large GPFS restores. The first was to migrate a HSM file system to completely new hardware, new TSM version and new GPFS version. 
IBM would not warrant an upgrade procedure so we "restored" from tape onto the new hardware and then did rsync's to get it "identical". Big problem was the TSM server hardware at the time (a p630) just gave up the ghost about 5TB into the restore repeatedly. Had do it a user at a time which made it take *much* longer as I was repeatedly going over the same tapes. The second was from bitter experience. Someone else in a moment of complete and utter stupidity wiped some ~30 NSD's of their descriptors. Two file systems an instant and complete loss. Well not strictly true it was several days before it manifested itself when one of the NSD servers was rebooted. A day was then wasted working out what the hell had happened to the file system that could have gone to the restore. Took about three weeks to get back completely. Could have been done a lot lot faster if I had had more tape drives on day one and/or made a better job of getting more in, had not messed about prioritizing restores of particular individuals, and not had capacity issues on the TSM server to boot (it was scheduled for upgrade anyway and a CPU failed mid restore). I think TSM 6.x would have been faster as well as it has faster DB performance, and the restore consisted of some 50 million files in about 30TB and it was the number of files that was the killer for speed. It would be nice in a disaster scenario if TSM would also use the tapes in the copy pools for restore, especially when they are in a different library. Not sure if the automatic failover procedure in 6.3 does that. For large file systems I would seriously consider using virtual mount points in TSM and then collocating the file systems. I would also look to match my virtual mount points to file sets. The basic problem is that most people don't have the spare hardware to even try disaster recovery, and even then you are not going to be doing it under the same pressure, hindsight is always 20/20. > Oh - for us, this is rapidly approaching "anything more than a day, and > can you do it faster than that please". Not much appetite for the costs > of full replication though. > Remember you can have any two of cheap, fast and reliable. If you want it back in a day or less then that almost certainly requires a full mirror and is going to be expensive. Noting of course if it ain't offline it ain't backed up. See above if some numpty can wipe the NSD descriptors on your file systems then can do it to your replicated file system at the same time. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jez.Tucker at rushes.co.uk Fri Feb 8 13:17:03 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 8 Feb 2013 13:17:03 +0000 Subject: [gpfsug-discuss] Maximum number of files in a TSM dsmc archive filelist Message-ID: <39571EA9316BE44899D59C7A640C13F5306E9570@WARVWEXC1.uk.deluxe-eu.com> Allo I'm doing an archive with 1954846 files in a filelist. SEGV every time. (BA 6.4.0-0) Am I being optimistic with that number of files? Has anyone successfully done that many in a single archive? --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:29:12 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:29:12 -0600 Subject: [gpfsug-discuss] File system recovery question Message-ID: <20130213162912.GA22701@logos.ncsa.illinois.edu> I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. Executing od shows it is zero's. (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0040000 120070 156006 120070 156006 120070 156006 120070 156006 Dumping the header of one of the other disks shows read data for the other NSD's in that file system. (! 25)-> mmlsnsd -m | grep dh1_vd05_005 Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node (! 27)-> mmnsddiscover -d dh1_vd05_005 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... myhost: Rediscovery failed for dh1_vd05_005. mmnsddiscover: Finished. Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic Is there any way to repair the header on the NSD? Thanks for any ideas! Chad From Jez.Tucker at rushes.co.uk Wed Feb 13 16:43:50 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Wed, 13 Feb 2013 16:43:50 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> So, er. Fun. I checked our disks. 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0001000 Looks like you lost a fair bit. Presumably you don't have replication of 2? If so, I think you could just lose the NSD. Failing that: 1) Check your other disks and see if there's anything that you can figure out. Though TBH, this may take forever. 2) Restore 3) Call IBM and log a SEV 1. 3) then 2) is probably the best course of action Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Chad Kerner > Sent: 13 February 2013 16:29 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] File system recovery question > > I have a file system, and it appears that someone dd'd over the first > part of one of the NSD's with zero's. I see the device in multipath. I > can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for > the other NSD's in that file system. > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a > while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > mmcommon preunmount invoked. File system: desarchive Reason: > SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From craigawilson at gmail.com Wed Feb 13 16:48:32 2013 From: craigawilson at gmail.com (Craig Wilson) Date: Wed, 13 Feb 2013 16:48:32 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> Message-ID: Dealt with a similar issue a couple of months ago. In that case the data was fine but two of the descriptors were over written. You can use "mmfsadm test readdescraw /dev/$drive" to see the descriptors, we managed to recover the disk but only after logging it to IBM and manually rebuilding the descriptor. -CW On 13 February 2013 16:43, Jez Tucker wrote: > So, er. Fun. > > I checked our disks. > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0001000 > > Looks like you lost a fair bit. > > > Presumably you don't have replication of 2? > If so, I think you could just lose the NSD. > > Failing that: > > 1) Check your other disks and see if there's anything that you can figure > out. Though TBH, this may take forever. > 2) Restore > 3) Call IBM and log a SEV 1. > > 3) then 2) is probably the best course of action > > Jez > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Chad Kerner > > Sent: 13 February 2013 16:29 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] File system recovery question > > > > I have a file system, and it appears that someone dd'd over the first > > part of one of the NSD's with zero's. I see the device in multipath. I > > can fdisk and dd the device out. > > > > Executing od shows it is zero's. > > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > > * > > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > > > Dumping the header of one of the other disks shows read data for > > the other NSD's in that file system. > > > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > > Disk name NSD volume ID Device Node name > > Remarks > > > --------------------------------------------------------------------------------------- > > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > > node > > > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > > mmnsddiscover: Attempting to rediscover the disks. This may take a > > while ... > > myhost: Rediscovery failed for dh1_vd05_005. > > mmnsddiscover: Finished. > > > > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > > volume dh1_vd05_005. > > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > > desarchive. > > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > > mmcommon preunmount invoked. File system: desarchive Reason: > > SGPanic > > > > Is there any way to repair the header on the NSD? > > > > Thanks for any ideas! > > Chad > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Wed Feb 13 16:48:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:48:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> So what do you get if you run: mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 ? Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:29, Chad Kerner wrote: > I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. 
> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:52:30 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:52:30 -0600 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> Message-ID: <20130213165230.GA23294@logos.ncsa.illinois.edu> (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: > So what do you get if you run: > > mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > > ? > > > > > Vic Cornell > viccornell at gmail.com > > > On 13 Feb 2013, at 16:29, Chad Kerner wrote: > > > I have a file system, and it appears that someone dd'd over the first part > of one of the NSD's with zero's. I see the device in multipath. I can > fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other > NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. > Physical volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the > system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: > desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! 
> Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > From viccornell at gmail.com Wed Feb 13 16:57:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:57:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213165230.GA23294@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> <20130213165230.GA23294@logos.ncsa.illinois.edu> Message-ID: <4D043736-06A7-44A0-830E-63D66438595F@gmail.com> Thats not pretty - but you can push the NSD descriptor on with something like: tspreparedisk -F -n /dev/mapper/dh1_vd05_005 -p 8D8EEA98506C69CE That leaves you with the FS and Disk descriptors to recover . . . . Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:52, Chad Kerner wrote: > > > (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 > No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 > No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 > > > > On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: >> So what do you get if you run: >> >> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 >> >> ? >> >> >> >> >> Vic Cornell >> viccornell at gmail.com >> >> >> On 13 Feb 2013, at 16:29, Chad Kerner wrote: >> >> >> I have a file system, and it appears that someone dd'd over the first part >> of one of the NSD's with zero's. I see the device in multipath. I can >> fdisk and dd the device out. >> >> Executing od shows it is zero's. >> (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 >> 0000000 000000 000000 000000 000000 000000 000000 000000 000000 >> * >> 0040000 120070 156006 120070 156006 120070 156006 120070 156006 >> >> Dumping the header of one of the other disks shows read data for the other >> NSD's in that file system. >> >> (! 25)-> mmlsnsd -m | grep dh1_vd05_005 >> Disk name NSD volume ID Device Node name >> Remarks >> --------------------------------------------------------------------------------------- >> dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server >> node >> >> (! 27)-> mmnsddiscover -d dh1_vd05_005 >> mmnsddiscover: Attempting to rediscover the disks. This may take a while >> ... >> myhost: Rediscovery failed for dh1_vd05_005. >> mmnsddiscover: Finished. >> >> >> Wed Feb 13 09:14:03.694 2013: Command: mount desarchive >> Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. >> Physical volume dh1_vd05_005. >> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the >> system with return code 5 reason code 0 >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive >> Wed Feb 13 09:14:07.104 2013: Input/output error >> Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: >> desarchive Reason: SGPanic >> >> Is there any way to repair the header on the NSD? >> >> Thanks for any ideas! 
>> Chad >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk Wed Feb 13 17:00:31 2013
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 13 Feb 2013 17:00:31 +0000
Subject: [gpfsug-discuss] File system recovery question
In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu>
References: <20130213162912.GA22701@logos.ncsa.illinois.edu>
Message-ID: <1360774831.23342.9.camel@buzzard.phy.strath.ac.uk>

On Wed, 2013-02-13 at 10:29 -0600, Chad Kerner wrote:
> I have a file system, and it appears that someone dd'd over the first
> part of one of the NSD's with zero's. I see the device in multipath.
> I can fdisk and dd the device out.

Log a SEV1 call with IBM. If it is only one NSD that is stuffed they might be able to get it back for you. However it is a custom procedure that requires developer time from Poughkeepsie. It will take some time.

In the meantime I would strongly encourage you to start preparing for a total restore, which will include recreating the file system from scratch. Certainly if all the NSD headers are stuffed then the file system is a total loss. However, even with only one lost it is not, as I understand it, certain you can get it back.

It is probably a good idea to store the NSD headers somewhere off the file system in case some numpty wipes them. The most likely reason for this is that they ran a distro install on a system that has direct access to the disk.

JAB.

-- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.

From Tobias.Kuebler at sva.de Wed Feb 13 17:00:37 2013
From: Tobias.Kuebler at sva.de (Tobias.Kuebler at sva.de)
Date: Wed, 13 Feb 2013 18:00:37 +0100
Subject: [gpfsug-discuss] AUTO: Tobias Kuebler is out of the office (returning 02/18/2013)
Message-ID:

I am out of the office until 02/18/2013. Thank you for your message. Incoming e-mails will not be forwarded during my absence, but I will try to answer them as quickly as possible after my return. In urgent cases, please contact your responsible sales representative.

Note: This is an automatic reply to your message "Re: [gpfsug-discuss] File system recovery question" sent on 13.02.2013 17:43:50. This is the only notification you will receive while this person is away.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Jez.Tucker at rushes.co.uk Thu Feb 28 17:25:26 2013
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Thu, 28 Feb 2013 17:25:26 +0000
Subject: [gpfsug-discuss] Who uses TSM to archive HSMd data (inline) ?
Message-ID: <39571EA9316BE44899D59C7A640C13F5306EED70@WARVWEXC1.uk.deluxe-eu.com>

Hello all,

I have to ask: does anyone else do this? We have a problem and I'm told that "it's so rare that anyone would archive data which is HSMd". I.e. to create an archive whereby a project is entirely or partially HSMd to LTO:
- online data is archived to tape
- offline data is copied from HSM tape to archive tape 'inline'

Surely nobody pulls back all their data to disk before re-archiving back to tape?

---
Jez Tucker
Senior Sysadmin
Rushes
GPFSUG Chairman (chair at gpfsug.org)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
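For what it's worth, a hedged sketch of a pre-flight check before archiving a partially HSMd project: classify each file's state first, so it is clear how much can be archived straight from disk and how much would have to come off HSM tape. The project path is hypothetical, dsmls ships with the TSM HSM client, and the exact column layout of dsmls output differs between client versions, so the field used for the state letter is an assumption to verify first.

#!/bin/bash
# Sketch: count resident (r), premigrated (p) and migrated (m) files under a
# project before archiving it. PROJECT is hypothetical; check the dsmls output
# format on your client before trusting the awk field below.
PROJECT=/gpfs/projects/foo

find "$PROJECT" -type f -print0 |
  xargs -0 dsmls 2>/dev/null |
  awk '$1 ~ /^[0-9]/ {              # keep per-file lines (they start with a size)
         state = $(NF-1)            # assumed: state letter sits just before the name
         if      (state == "m") mig++
         else if (state == "p") pre++
         else                   res++
       }
       END { printf "resident=%d premigrated=%d migrated=%d\n", res, pre, mig }'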
From bergman at panix.com Wed Feb 6 21:28:30 2013
From: bergman at panix.com (bergman at panix.com)
Date: Wed, 06 Feb 2013 16:28:30 -0500
Subject: [gpfsug-discuss] GPFS snapshot cron job
In-Reply-To: Your message of "Wed, 06 Feb 2013 13:38:56 EST."
References:
Message-ID: <20647.1360186110@localhost>

In the message dated: Wed, 06 Feb 2013 13:38:56 -0500, The pithy ruminations from Stuart Barkley on <[gpfsug-discuss] GPFS snapshot cron job> were:

=> I'm new on this list. It looks like it can be useful for exchanging
=> GPFS experiences.
=>
=> We have been running GPFS for a couple of years now on one cluster and
=> are in process of bringing it up on a couple of other clusters.
=>
=> One thing we would like, but have not had time to do is automatic
=> snapshots similar to what NetApp does. For our purposes a cron job
=> that ran every 4 hours that creates a new snapshot and removes older
=> snapshots would be sufficient. The slightly hard task is correctly
=> removing the older snapshots.
=>
=> Does anyone have such a cron script they can share?

Yes. I've attached the script that we run from cron.

Our goal was to keep a decaying set of snapshots over a fairly long time period, so that users would be able to recover from "rm", while not using excessive space. Snapshots are named with a timestamp, making it slightly easier to understand what data they contain and to remove the older ones.

The cron job runs every 15 minutes on every GPFS server node, but checks if it is executing on the node that is the manager for the specified filesystem to avoid concurrency issues. The script will avoid making a snapshot if there isn't sufficient disk space.

Our config file to manage snapshots is:

------------ CUT HERE -- CUT HERE --------------
case $1 in
home)
    intervals=(1 4 24 48)   # hour number of each interval
    counts=(4 4 4 4)        # max number of snapshots to keep per each interval
    MINFREE=5               # minimum free disk space, in percent
    ;;
shared)
    intervals=(1 4 48)      # hour number of each interval
    counts=(4 2 2)          # max number of snapshots to keep per each interval
    MINFREE=20              # minimum free disk space, in percent
    ;;
esac
------------ CUT HERE -- CUT HERE --------------

For the "home" filesystem, this says:
keep 4 snapshots in the most recent hourly interval (every 15 minutes)
keep 4 snapshots made in the most recent 4 hr interval (1 for each hour)
keep 4 snapshots made in the most recent 24 hr interval (1 each 6hrs)
keep 4 snapshots made in the most recent 48 hr interval (1 each 12 hrs)

For the "shared" filesystem, the configuration says:
keep 4 snapshots in the most recent hourly interval (every 15 minutes)
keep 2 snapshots made in the most recent 4 hr interval (1 each 2 hours)
keep 2 snapshots made in the most recent 48 hr interval (1 each 24 hrs)

Those intervals "overlap", so there are a lot of recent snapshots, and fewer older ones. Each time a snapshot is made, older snapshots may be removed.
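For reference, a hypothetical crontab entry matching that description (the install path and filesystem names are assumptions; because the script checks whether it is running on the manager node for the filesystem, the same entry can be installed on every GPFS server node):

# /etc/cron.d/gpfs-snapshotter -- hypothetical path and filesystem names
*/15 * * * * root /usr/local/sbin/snapshotter -f home
*/15 * * * * root /usr/local/sbin/snapshotter -f shared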
So, at 5:01 PM on Thursday, there may be snapshots of the "home" filesystem from: 17:00 Thursday ---+-- 4 in the last hour 16:45 Thursday | 16:30 Thursday | 16:15 Thursday -- + 16:00 Thursday ---+-- 4 in the last 4 hours, including 15:00 Thursday | the 5:00PM Thursday snapshot 14:00 Thursday ---+ 11:00 Thursday ---+-- 4 in the last 24 hours, including 05:00 Thursday | 17:00 Thursday 23:00 Wednesday ---+ 17:00 Wednesday ---+-- 4 @ 12-hr intervals in the last 48 hours, 05:00 Wednesday ---+ including 17:00 & 05:00 Thursday Suggestions and patches are welcome. => => Or did I miss something in GPFS that handles automatic snapshots? We have seen periodic slowdowns when snapshots are running, but nothing to the extent described by Jonathan Buzzard. Mark => => Thanks, => Stuart Barkley => -- => I've never been lost; I was once bewildered for three days, but never lost! => -- Daniel Boone -------------- next part -------------- #! /bin/bash #$Id: snapshotter 858 2012-01-31 19:24:11Z$ # Manage snapshots of GPFS volumes # # Desgined to be called from cron at :15 intervals # ################################################################## # Defaults, may be overridden by /usr/local/etc/snappshotter.conf # or file specified by "-c" CONF=/usr/local/etc/snappshotter.conf # config file, supersceded by "-c" option MINFREE=10 # minimum free space, in percent. # # Series of intervals and counts. Intervals expressed as the end-point in hours. # count = number of snapshots to keep per-interval ############## # time # ==== # :00-59 keep snapshots at 15 minute intervals; ceiling of interval = 1hr # 01-03:59 keep snapshots at 1hr interval; ceiling of interval = 4hr # 04-23:59 keep snapshots at 6hr intervals; ceiling of interval = 24hr # 24-47:59 keep snapshots at 12hr intervals; ceiling of interval = 48hr intervals=(1 4 24 48) # hour number of each interval counts=(4 4 4 4) # max number of snapshots to keep per each interval # Note that the snapshots in interval (N+1) must be on a time interval # that corresponds to the snapshots kept in interval N. # # :00-59 keep snapshots divisible by 1/4hr: 00:00, 00:15, 00:30, 00:45, 01:00, 01:15 ... # 01-04:59 keep snapshots divisible by 4/4hr: 00:00, 01:00, 02:00, 03:00 ... # 05-23:59 keep snapshots divisible by 24/4hr: 00:00, 06:00, 12:00, 18:00 # 24-48:59 keep snapshots divisible by 48/4hr: 00:00, 12:00 # # ################################################################## TESTING="no" MMDF=/usr/lpp/mmfs/bin/mmdf MMCRSNAPSHOT=/usr/lpp/mmfs/bin/mmcrsnapshot MMLSSNAPSHOT=/usr/lpp/mmfs/bin/mmlssnapshot MMDELSNAPSHOT=/usr/lpp/mmfs/bin/mmdelsnapshot LOGGER="logger -p user.alert -t snapshotter" PATH="${PATH}:/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/local/sbin" # for access to 'ip' command, GPFS commands now=`date '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` secsINhr=$((60 * 60)) ##################################################################### usage() { cat - << E-O-USAGE 1>&2 $0 -- manage GPFS snapshots Create new GPFS snapshots and remove old snapshots. Options: -f filesystem required -- name of filesystem to snapshot -t testing test mode, report what would be done but perform no action -d "datestamp" test mode only; used supplied date stamp as if it was the current time. -c configfile use supplied configuration file in place of default: $CONF -L show license statement In test mode, the input data, in the same format as produced by "mmlssnap" must be supplied. 
This can be done on STDIN, as: $0 -t -f home -d "\`date --date "Dec 7 23:45"\`" < mmlssnap.data or $0 -t -f home -d "\`date --date "now +4hours"\`" < mmlssnap.data E-O-USAGE echo 1>&2 echo $1 1>&2 exit 1 } ##################################################################### license() { cat - << E-O-LICENSE Section of Biomedical Image Analysis Department of Radiology University of Pennsylvania 3600 Market Street, Suite 380 Philadelphia, PA 19104 Web: http://www.rad.upenn.edu/sbia/ Email: sbia-software at uphs.upenn.edu SBIA Contribution and Software License Agreement ("Agreement") ============================================================== Version 1.0 (June 9, 2011) This Agreement covers contributions to and downloads from Software maintained by the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania ("SBIA"). Part A of this Agreement applies to contributions of software and/or data to the Software (including making revisions of or additions to code and/or data already in this Software). Part B of this Agreement applies to downloads of software and/or data from SBIA. Part C of this Agreement applies to all transactions with SBIA. If you distribute Software (as defined below) downloaded from SBIA, all of the paragraphs of Part B of this Agreement must be included with and apply to such Software. Your contribution of software and/or data to SBIA (including prior to the date of the first publication of this Agreement, each a "Contribution") and/or downloading, copying, modifying, displaying, distributing or use of any software and/or data from SBIA (collectively, the "Software") constitutes acceptance of all of the terms and conditions of this Agreement. If you do not agree to such terms and conditions, you have no right to contribute your Contribution, or to download, copy, modify, display, distribute or use the Software. PART A. CONTRIBUTION AGREEMENT - LICENSE TO SBIA WITH RIGHT TO SUBLICENSE ("CONTRIBUTION AGREEMENT"). ----------------------------------------------------------------------------------------------------- 1. As used in this Contribution Agreement, "you" means the individual contributing the Contribution to the Software maintained by SBIA and the institution or entity which employs or is otherwise affiliated with such individual in connection with such Contribution. 2. This Contribution Agreement applies to all Contributions made to the Software maintained by SBIA, including without limitation Contributions made prior to the date of first publication of this Agreement. If at any time you make a Contribution to the Software, you represent that (i) you are legally authorized and entitled to make such Contribution and to grant all licenses granted in this Contribution Agreement with respect to such Contribution; (ii) if your Contribution includes any patient data, all such data is de-identified in accordance with U.S. confidentiality and security laws and requirements, including but not limited to the Health Insurance Portability and Accountability Act (HIPAA) and its regulations, and your disclosure of such data for the purposes contemplated by this Agreement is properly authorized and in compliance with all applicable laws and regulations; and (iii) you have preserved in the Contribution all applicable attributions, copyright notices and licenses for any third party software or data included in the Contribution. 3. Except for the licenses granted in this Agreement, you reserve all right, title and interest in your Contribution. 4. 
You hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no charge, royalty-free, irrevocable license to use, reproduce, make derivative works of, display and distribute the Contribution. If your Contribution is protected by patent, you hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable license under your interest in patent rights covering the Contribution, to make, have made, use, sell and otherwise transfer your Contribution, alone or in combination with any other code. 5. You acknowledge and agree that SBIA may incorporate your Contribution into the Software and may make the Software available to members of the public on an open source basis under terms substantially in accordance with the Software License set forth in Part B of this Agreement. You further acknowledge and agree that SBIA shall have no liability arising in connection with claims resulting from your breach of any of the terms of this Agreement. 6. YOU WARRANT THAT TO THE BEST OF YOUR KNOWLEDGE YOUR CONTRIBUTION DOES NOT CONTAIN ANY CODE THAT REQUIRES OR PRESCRIBES AN "OPEN SOURCE LICENSE" FOR DERIVATIVE WORKS (by way of non-limiting example, the GNU General Public License or other so-called "reciprocal" license that requires any derived work to be licensed under the GNU General Public License or other "open source license"). PART B. DOWNLOADING AGREEMENT - LICENSE FROM SBIA WITH RIGHT TO SUBLICENSE ("SOFTWARE LICENSE"). ------------------------------------------------------------------------------------------------ 1. As used in this Software License, "you" means the individual downloading and/or using, reproducing, modifying, displaying and/or distributing the Software and the institution or entity which employs or is otherwise affiliated with such individual in connection therewith. The Section of Biomedical Image Analysis, Department of Radiology at the Universiy of Pennsylvania ("SBIA") hereby grants you, with right to sublicense, with respect to SBIA's rights in the software, and data, if any, which is the subject of this Software License (collectively, the "Software"), a royalty-free, non-exclusive license to use, reproduce, make derivative works of, display and distribute the Software, provided that: (a) you accept and adhere to all of the terms and conditions of this Software License; (b) in connection with any copy of or sublicense of all or any portion of the Software, all of the terms and conditions in this Software License shall appear in and shall apply to such copy and such sublicense, including without limitation all source and executable forms and on any user documentation, prefaced with the following words: "All or portions of this licensed product (such portions are the "Software") have been obtained under license from the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania and are subject to the following terms and conditions:" (c) you preserve and maintain all applicable attributions, copyright notices and licenses included in or applicable to the Software; (d) modified versions of the Software must be clearly identified and marked as such, and must not be misrepresented as being the original Software; and (e) you consider making, but are under no obligation to make, the source code of any of your modifications to the Software freely available to others on an open source basis. 2. 
The license granted in this Software License includes without limitation the right to (i) incorporate the Software into proprietary programs (subject to any restrictions applicable to such programs), (ii) add your own copyright statement to your modifications of the Software, and (iii) provide additional or different license terms and conditions in your sublicenses of modifications of the Software; provided that in each case your use, reproduction or distribution of such modifications otherwise complies with the conditions stated in this Software License. 3. This Software License does not grant any rights with respect to third party software, except those rights that SBIA has been authorized by a third party to grant to you, and accordingly you are solely responsible for (i) obtaining any permissions from third parties that you need to use, reproduce, make derivative works of, display and distribute the Software, and (ii) informing your sublicensees, including without limitation your end-users, of their obligations to secure any such required permissions. 4. The Software has been designed for research purposes only and has not been reviewed or approved by the Food and Drug Administration or by any other agency. YOU ACKNOWLEDGE AND AGREE THAT CLINICAL APPLICATIONS ARE NEITHER RECOMMENDED NOR ADVISED. Any commercialization of the Software is at the sole risk of the party or parties engaged in such commercialization. You further agree to use, reproduce, make derivative works of, display and distribute the Software in compliance with all applicable governmental laws, regulations and orders, including without limitation those relating to export and import control. 5. The Software is provided "AS IS" and neither SBIA nor any contributor to the software (each a "Contributor") shall have any obligation to provide maintenance, support, updates, enhancements or modifications thereto. SBIA AND ALL CONTRIBUTORS SPECIFICALLY DISCLAIM ALL EXPRESS AND IMPLIED WARRANTIES OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL SBIA OR ANY CONTRIBUTOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY ARISING IN ANY WAY RELATED TO THE SOFTWARE, EVEN IF SBIA OR ANY CONTRIBUTOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. TO THE MAXIMUM EXTENT NOT PROHIBITED BY LAW OR REGULATION, YOU FURTHER ASSUME ALL LIABILITY FOR YOUR USE, REPRODUCTION, MAKING OF DERIVATIVE WORKS, DISPLAY, LICENSE OR DISTRIBUTION OF THE SOFTWARE AND AGREE TO INDEMNIFY AND HOLD HARMLESS SBIA AND ALL CONTRIBUTORS FROM AND AGAINST ANY AND ALL CLAIMS, SUITS, ACTIONS, DEMANDS AND JUDGMENTS ARISING THEREFROM. 6. None of the names, logos or trademarks of SBIA or any of SBIA's affiliates or any of the Contributors, or any funding agency, may be used to endorse or promote products produced in whole or in part by operation of the Software or derived from or based on the Software without specific prior written permission from the applicable party. 7. Any use, reproduction or distribution of the Software which is not in accordance with this Software License shall automatically revoke all rights granted to you under this Software License and render Paragraphs 1 and 2 of this Software License null and void. 8. 
This Software License does not grant any rights in or to any intellectual property owned by SBIA or any Contributor except those rights expressly granted hereunder. PART C. MISCELLANEOUS --------------------- This Agreement shall be governed by and construed in accordance with the laws of The Commonwealth of Pennsylvania without regard to principles of conflicts of law. This Agreement shall supercede and replace any license terms that you may have agreed to previously with respect to Software from SBIA. E-O-LICENSE exit } ##################################################################### # Parse the command-line while [ "X$1" != "X" ] do case $1 in -L) license ;; -t) TESTING="yes" shift ;; -d) # Date stamp given...only valid in testing mode shift # Convert the user-supplied date to the YYYY_Mo_DD_HH:MM form, # throwing away the seconds UserDATE="$1" now=`date --date "$1" '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` shift ;; -c) shift CONF=$1 if [ ! -f $CONF ] ; then usage "Specified configuration file ($CONF) not found" fi shift ;; -f) shift filesys=$1 shift ;; *) usage "Unrecognized option: \"$1\"" ;; esac done ############## End of command line parsing LOCKFILE=/var/run/snapshotter.$filesys if [ -f $LOCKFILE ] ; then PIDs=`cat $LOCKFILE | tr "\012" " "` echo "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." 1>&2 $LOGGER "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." exit 1 else echo $$ > $LOCKFILE if [ $? != 0 ] ; then echo "Could not create lockfile $LOCKFILE for process $$. Exiting." 1>&2 $LOGGER "Could not create lockfile $LOCKFILE for process $$" exit 2 fi fi ######## Check sanity of user-supplied values if [ "X$filesys" = "X" ] ; then $LOGGER "Filesystem must be specified" usage "Filesystem must be specified" fi if [ $TESTING = "yes" ] ; then # testing mode: # accept faux filesystem argument # accept faux datestamp as arguments # read faux results from mmlssnapshot on STDIN # MMDF # # Do not really use mmdf executable, so that the testing can be # done outside a GPFS cluster Use a 2-digit random number 00 .. 99 # from $RANDOM, but fill the variable with dummy fields so the # the random number corresponds to field5, where it would be in # the mmdf output. MMDF="eval echo \(total\) f1 f2 f3 f4 \(\${RANDOM: -2:2}%\) " MMCRSNAPSHOT="echo mmcrsnapshot" MMDELSNAPSHOT="echo mmdelsnapshot" MMLSSNAPDATA=`cat - | tr "\012" "%"` MMLSSNAPSHOT="eval echo \$MMLSSNAPDATA|tr '%' '\012'" LOGGER="echo Log message: " else if [ "X$UserDATE" != "X" ] ; then $LOGGER "Option \"-d\" only valid in testing mode" usage "Option \"-d\" only valid in testing mode" fi /usr/lpp/mmfs/bin/mmlsfs $filesys -T 1> /dev/null 2>&1 if [ $? != 0 ] ; then $LOGGER "Error accessing GPFS filesystem: $filesys" echo "Error accessing GPFS filesystem: $filesys" 1>&2 rm -f $LOCKFILE exit 1 fi # Check if the node where this script is running is the GPFS manager node for the # specified filesystem manager=`/usr/lpp/mmfs/bin/mmlsmgr $filesys | grep -w "^$filesys" |awk '{print $2}'` ip addr list | grep -qw "$manager" if [ $? != 0 ] ; then # This node is not the manager...exit rm -f $LOCKFILE exit fi MMLSSNAPSHOT="$MMLSSNAPSHOT $filesys" fi # It is valid for the default config file not to exist, so check if # is there before sourcing it if [ -f $CONF ] ; then . 
$CONF $filesys # load variables found in $CONF, based on $filesys fi # Get current free space freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` # Produce list of valid snapshot names (w/o header lines) snapnames=`$MMLSSNAPSHOT |grep Valid |sed -e '$d' -e 's/ .*//'` # get the number of existing snapshots snapcount=($snapnames) ; snapcount=${#snapcount[*]} ########################################################### # given a list of old snapshot names, in the form: # YYYY_Mo_DD_HH:MM # fill the buckets by time. A snapshot can only go # into one bucket! ########################################################### for oldsnap in $snapnames do oldstamp=`echo $oldsnap|sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` oldsecs=`date --date "$oldstamp" "+%s"` diff=$((nowsecs - oldsecs)) # difference in seconds between 'now' and old snapshot if [ $diff -lt 0 ] ; then # this can happen during testing...we have got a faux # snapshot date in the future...skip it continue fi index=0 prevbucket=0 filled=No while [ $index -lt ${#intervals[*]} -a $filled != "Yes" ] do bucket=${intervals[$index]} # ceiling for number of hours for this bucket (1 more than the number of # actual hours, ie., "7" means that the bucket can contain snapshots that are # at least 6:59 (hh:mm) old. count=${counts[$index]} # max number of items in this bucket bucketinterval=$(( bucket * ( secsINhr / count ) )) # Number of hours (in seconds) between snapshots that should be retained # for this bucket...convert from hrs (bucket/count) to seconds in order to deal with :15 minute intervals # Force the mathematical precedence to do (secsINhr / count) so that cases where count>bucket (like the first 1hr # that may have a count of 4 retained snapshots) doesn't result in the shell throwing away the fraction if [ $diff -ge $((prevbucket * secsINhr)) -a $diff -lt $((bucket * secsINhr)) ] ; then # We found the correct bucket filled=Yes ## printf "Checking if $oldsnap should be retained if it is multiple of $bucketinterval [ ($oldsecs %% $bucketinterval) = 0]" # Does the snapshot being examined fall on the interval determined above for the snapshots that should be retained? 
if [ $(( oldsecs % bucketinterval )) = 0 ] ; then # The hour of the old snapshot is evenly divisible by the number of snapshots that should be # retained in this interval...keep it tokeep="$tokeep $oldsnap" ## printf "...yes\n" else todelete="$todelete $oldsnap" ## printf "...no\n" fi prevbucket=$bucket fi index=$((index + 1)) done if [ $diff -ge $((bucket * secsINhr )) ] ; then filled=Yes # This is too old...schedule it for deletion $LOGGER "Scheduling old snapshot $oldsnap from $filesys for deletion" todelete="$todelete $oldsnap" fi # We should not get here if [ $filled != Yes ] ; then $LOGGER "Snapshot \"$oldsnap\" on $filesys does not match any intervals" fi done # Sort the lists to make reading the testing output easier todelete=`echo $todelete | tr " " "\012" | sort -bdfu` tokeep=`echo $tokeep | tr " " "\012" | sort -bdfu` ############################################################# for oldsnap in $todelete do if [ $TESTING = "yes" ] ; then # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode $MMDELSNAPSHOT $filesys $oldsnap # remove the entry for the snapshot scheduled for deletion # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate ## echo "Removing entry for \"$oldsnap\" from \$MMLSSNAPDATA" MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$oldsnap [^%]*%/%/"` else # Run mmdelsnapshot, and capture the output to prevent verbose messages from being # sent as the result of each cron job. Only display the messages in case of error. output=`$MMDELSNAPSHOT $filesys $oldsnap 2>&1` fi if [ $? != 0 ] ; then printf "Error from \"$MMDELSNAPSHOT $filesys $oldsnap\": $output" 1>&2 $LOGGER "Error removing snapshot of $filesys with label \"$oldsnap\": $output" rm -f $LOCKFILE exit 1 else $LOGGER "successfully removed snapshot of $filesys with label \"$oldsnap\"" fi done ############# Now check for free space ####################################### # Get current free space freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` # get the number of existing snapshots snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l` while [ $freenow -le $MINFREE -a $snapcount -gt 0 ] do # must delete some snapshots, from the oldest first... todelete=`$MMLSSNAPSHOT|grep Valid |sed -n -e 's/ .*//' -e '1p'` if [ $TESTING = "yes" ] ; then # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode $MMDELSNAPSHOT $filesys $todelete # remove the entry for the snapshot scheduled for deletion # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate and from $tokeep ## echo "Removing entry for \"$todelete\" from \$MMLSSNAPDATA" MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$todelete [^%]*%/%/"` tokeep=`echo $tokeep | sed -e "s/^$todelete //" -e "s/ $todelete / /" -e "s/ $todelete$//" -e "s/^$todelete$//"` else # Run mmdelsnapshot, and capture the output to prevent verbose messages from being # sent as the result of each cron job. Only display the messages in case of error. output=`$MMDELSNAPSHOT $filesys $todelete 2>&1` fi if [ $? 
!= 0 ] ; then printf "Error from \"$MMDELSNAPSHOT $filesys $todelete\": $output" 1>&2 $LOGGER "Low disk space (${freenow}%) triggered attempt to remove snapshot of $filesys with label \"$todelete\" -- Error: $output" rm -f $LOCKFILE exit 1 else $LOGGER "removed snapshot \"$todelete\" from $filesys because ${freenow}% free disk is less than ${MINFREE}%" fi # update the number of existing snapshots snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l` freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"` done if [ $snapcount = 0 -a $freenow -ge $MINFREE ] ; then echo "All existing snapshots removed on $filesys, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" 1>&2 $LOGGER "All existing snapshots on $filesys removed, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" rm -f $LOCKFILE exit 1 fi $LOGGER "Free disk space on $filesys (${freenow}%) above minimum required (${MINFREE}%) to create new snapshot" ############################################################## if [ $TESTING = "yes" ] ; then # List snapshots being kept for oldsnap in $tokeep do echo "Keeping snapshot $oldsnap" done fi ############################################################# # Now create the current snapshot...do this after deleting snaps in order to reduce the chance of running # out of disk space results=`$MMCRSNAPSHOT $filesys $now 2>&1 | tr "\012" "%"` if [ $? != 0 ] ; then printf "Error from \"$MMCRSNAPSHOT $filesys $now\":\n\t" 1>&2 echo $results | tr '%' '\012' 1>&2 results=`echo $results | tr '%' '\012'` $LOGGER "Error creating snapshot of $filesys with label $now: \"$results\"" rm -f $LOCKFILE exit 1 else $LOGGER "successfully created snapshot of $filesys with label $now" fi rm -f $LOCKFILE From Jez.Tucker at rushes.co.uk Thu Feb 7 12:28:16 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Thu, 7 Feb 2013 12:28:16 +0000 Subject: [gpfsug-discuss] SOBAR Message-ID: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Hey all Is anyone using the SOBAR method of backing up the metadata and NSD configs? If so, how is your experience? >From reading the docs, it seems a bit odd that on restoration you have to re-init the FS and recall all the data. If so, what's the point of SOBAR? --- Jez Tucker Senior Sysadmin Rushes http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From orlando.richards at ed.ac.uk Thu Feb 7 12:47:30 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 12:47:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <5113A262.8080206@ed.ac.uk> On 07/02/13 12:28, Jez Tucker wrote: > Hey all > > Is anyone using the SOBAR method of backing up the metadata and NSD > configs? > > If so, how is your experience? > > From reading the docs, it seems a bit odd that on restoration you have > to re-init the FS and recall all the data. > > If so, what?s the point of SOBAR? Ooh - this is new. From first glance, it looks to be a DR solution? 
We're actually in the process of engineering our own DR solution based on a not-dissimilar concept: - build a second GPFS file system off-site, with HSM enabled (called "dr-fs" here) - each night, rsync the changed data from "prod-fs" to "dr-fs" - each day, migrate data from the disk pool in "dr-fs" to the tape pool to free up sufficient capacity for the next night's rsync You have a complete copy of the filesystem metadata from "prod-fs" on "dr-fs", so it looks (to a user) identical, but on "dr-fs" some of the ("older") data is on tape (ratios dependent on sizing of disk vs tape pools, of course). In the event of a disaster, you just flip over to "dr-fs". From the quick glance at SOBAR, it looks to me like the concept is that you don't have a separate file system, but you hold a secondary copy in TSM via the premigrate function, and store the filesystem metadata as a flat file dump backed up "in the normal way". In DR, you rebuild the FS from the metadata backup, and re-attach the HSM pool to this newly-restored filesystem, (and then start pushing the data back out of the HSM pool into the GPFS disk pool). As soon as the HSM pool is re-attached, users can start getting their data (as fast as TSM can give it to them), and the filesystem will look "normal" to them (albeit slow, if recalling from tape). Nice - good to see this kind of thing coming from IBM - restore of huge filesystems from traditional backup really doesn't make much sense nowadays - it'd just take too long. This kind of approach doesn't necessarily accelerate the overall time to restore, but it allows for a usable filesystem to be made available while the restore happens in the background. I'd look for clarity about the state of the filesystem on restore - particularly around what happens to data which arrives after the migration has happened but before the metadata snapshot is taken. I think it'd be lost, but the metadata would still point to it existing? Might get confusing... Just my 2 cents from a quick skim read mind - plus a whole bunch of thinking we've done on this subject recently :) -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Thu Feb 7 13:40:30 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 07 Feb 2013 13:40:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113A262.8080206@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> Message-ID: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: [SNIP] > Nice - good to see this kind of thing coming from IBM - restore of huge > filesystems from traditional backup really doesn't make much sense > nowadays - it'd just take too long. Define too long? It's perfectly doable, and the speed of the restore will depend on what resources you have to throw at the problem. The main issue is having lots of tape drives for the restore. Having a plan to buy more ASAP is a good idea. The second is don't let yourself get sidetracked doing "high priority" restores for individuals, it will radically delay the restore. Beyond that you need some way to recreate all your storage pools, filesets, junction points and quotas etc. Looks like the mmbackupconfig and mmrestoreconfig now take care of all that for you. 
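As a hedged illustration only (the device name and output path are assumptions, not anything from the thread), the configuration half of that is roughly:

# Sketch: regularly dump the file system configuration (storage pools,
# filesets, junction points, quotas) to a flat file kept alongside the backups.
FS=gpfs01                                        # hypothetical device name
mmbackupconfig $FS -o /var/backups/${FS}.config.$(date +%Y%m%d)

# At disaster-recovery time, run against the re-created file system:
#   mmrestoreconfig $FS -i /var/backups/gpfs01.config.YYYYMMDD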
That is a big time saver right there.

> This kind of approach doesn't
> necessarily accelerate the overall time to restore, but it allows for a
> usable filesystem to be made available while the restore happens in the
> background.
>

The problem is that your tape drives will go crazy with HSM activity. So
while in theory it is usable, in practice it won't be. Worse, with the
tape drives going crazy with the HSM they won't be available for
restore. I would predict much, much longer times to recovery, where
recovery is defined as being back to where you were before the disaster
occurred.

>
> I'd look for clarity about the state of the filesystem on restore -
> particularly around what happens to data which arrives after the
> migration has happened but before the metadata snapshot is taken. I
> think it'd be lost, but the metadata would still point to it existing?

I would imagine that you just do a standard HSM reconciliation to fix
that. Should be really fast with the new policy based reconciliation
after you spend several months backing all your HSM'ed files up
again :-)

JAB.

--
Jonathan A. Buzzard
Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From orlando.richards at ed.ac.uk  Thu Feb  7 13:51:25 2013
From: orlando.richards at ed.ac.uk (Orlando Richards)
Date: Thu, 07 Feb 2013 13:51:25 +0000
Subject: [gpfsug-discuss] SOBAR
In-Reply-To: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk>
References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk>
Message-ID: <5113B15D.4080805@ed.ac.uk>

On 07/02/13 13:40, Jonathan Buzzard wrote:
> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote:
>
> [SNIP]
>
>> Nice - good to see this kind of thing coming from IBM - restore of huge
>> filesystems from traditional backup really doesn't make much sense
>> nowadays - it'd just take too long.
>
> Define too long? It's perfectly doable, and the speed of the restore
> will depend on what resources you have to throw at the problem. The main
> issue is having lots of tape drives for the restore.

I can tell you speak from (bitter?) experience :)

I've always been "disappointed" with the speed of restores - but I've
never tried a "restore everything", which presumably will run quicker.
One problem I can see us having is that we have lots of small files,
which tends to make everything go really slowly - but getting the
thread count up would, I'm sure, help a lot.

> Having a plan to
> buy more ASAP is a good idea. The second is don't let yourself get
> sidetracked doing "high priority" restores for individuals, it will
> radically delay the restore.

Quite.

> Beyond that you need some way to recreate all your storage pools,
> filesets, junction points and quotas etc. Looks like the mmbackupconfig
> and mmrestoreconfig now take care of all that for you. That is a big
> time saver right there.
>> This kind of approach doesn't
>> necessarily accelerate the overall time to restore, but it allows for a
>> usable filesystem to be made available while the restore happens in the
>> background.
>>
>
> The problem is that your tape drives will go crazy with HSM activity. So
> while in theory it is usable, in practice it won't be. Worse, with the
> tape drives going crazy with the HSM they won't be available for
> restore. I would predict much, much longer times to recovery, where
> recovery is defined as being back to where you were before the disaster
> occurred.

Yup - I can see that too.
I think a large disk pool would help there, along with some kind of logic around "what data is old?" to sensibly place stuff "likely to be accessed" on disk, and the "old" stuff on tape where it can be recalled at a more leisurely pace. >> >> I'd look for clarity about the state of the filesystem on restore - >> particularly around what happens to data which arrives after the >> migration has happened but before the metadata snapshot is taken. I >> think it'd be lost, but the metadata would still point to it existing? > > I would imagine that you just do a standard HSM reconciliation to fix > that. Should be really fast with the new policy based reconciliation > after you spend several months backing all your HSM'ed files up > again :-) > Ahh - but once you've got them in TSM, you can just do a storage pool backup, presumably to a third site, and always have lots of copies everywhere! Of course - you still need to keep generational history somewhere... -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Thu Feb 7 13:56:05 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:56:05 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B15D.4080805@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> Message-ID: <5113B275.7030401@ed.ac.uk> On 07/02/13 13:51, Orlando Richards wrote: > On 07/02/13 13:40, Jonathan Buzzard wrote: >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: >> >> [SNIP] >> >>> Nice - good to see this kind of thing coming from IBM - restore of huge >>> filesystems from traditional backup really doesn't make much sense >>> nowadays - it'd just take too long. >> >> Define too long? Oh - for us, this is rapidly approaching "anything more than a day, and can you do it faster than that please". Not much appetite for the costs of full replication though. :/ -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Fri Feb 8 09:40:27 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 08 Feb 2013 09:40:27 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B275.7030401@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> <5113B275.7030401@ed.ac.uk> Message-ID: <1360316427.16393.23.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 13:56 +0000, Orlando Richards wrote: > On 07/02/13 13:51, Orlando Richards wrote: > > On 07/02/13 13:40, Jonathan Buzzard wrote: > >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > >> > >> [SNIP] > >> > >>> Nice - good to see this kind of thing coming from IBM - restore of huge > >>> filesystems from traditional backup really doesn't make much sense > >>> nowadays - it'd just take too long. > >> > >> Define too long? > > I can tell you speak from (bitter?) experience :) Done two large GPFS restores. The first was to migrate a HSM file system to completely new hardware, new TSM version and new GPFS version. 
IBM would not warrant an upgrade procedure so we "restored" from tape
onto the new hardware and then did rsync's to get it "identical". Big
problem was the TSM server hardware at the time (a p630) just gave up
the ghost about 5TB into the restore repeatedly. Had to do it a user at
a time, which made it take *much* longer as I was repeatedly going over
the same tapes.

The second was from bitter experience. Someone else in a moment of
complete and utter stupidity wiped some ~30 NSD's of their descriptors.
Two file systems, an instant and complete loss. Well, not strictly true:
it was several days before it manifested itself when one of the NSD
servers was rebooted. A day was then wasted working out what the hell
had happened to the file system that could have gone to the restore.

Took about three weeks to get back completely. Could have been done a
lot lot faster if I had had more tape drives on day one and/or made a
better job of getting more in, had not messed about prioritizing
restores of particular individuals, and not had capacity issues on the
TSM server to boot (it was scheduled for upgrade anyway and a CPU
failed mid restore).

I think TSM 6.x would have been faster as well as it has faster DB
performance, and the restore consisted of some 50 million files in
about 30TB and it was the number of files that was the killer for
speed.

It would be nice in a disaster scenario if TSM would also use the tapes
in the copy pools for restore, especially when they are in a different
library. Not sure if the automatic failover procedure in 6.3 does that.

For large file systems I would seriously consider using virtual mount
points in TSM and then collocating the file systems. I would also look
to match my virtual mount points to file sets.

The basic problem is that most people don't have the spare hardware to
even try disaster recovery, and even then you are not going to be doing
it under the same pressure; hindsight is always 20/20.

> Oh - for us, this is rapidly approaching "anything more than a day, and
> can you do it faster than that please". Not much appetite for the costs
> of full replication though.
>

Remember you can have any two of cheap, fast and reliable. If you want
it back in a day or less then that almost certainly requires a full
mirror and is going to be expensive. Noting of course if it ain't
offline it ain't backed up. See above: if some numpty can wipe the NSD
descriptors on your file systems then they can do it to your replicated
file system at the same time.

JAB.

--
Jonathan A. Buzzard
Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From Jez.Tucker at rushes.co.uk  Fri Feb  8 13:17:03 2013
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Fri, 8 Feb 2013 13:17:03 +0000
Subject: [gpfsug-discuss] Maximum number of files in a TSM dsmc archive filelist
Message-ID: <39571EA9316BE44899D59C7A640C13F5306E9570@WARVWEXC1.uk.deluxe-eu.com>

Allo

I'm doing an archive with 1954846 files in a filelist. SEGV every time.
(BA 6.4.0-0)

Am I being optimistic with that number of files?
Has anyone successfully done that many in a single archive?

---
Jez Tucker
Senior Sysadmin
Rushes

DDI: +44 (0) 207 851 6276
http://www.rushes.co.uk

-------------- next part --------------
An HTML attachment was scrubbed...
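(On the filelist question above: one common workaround when a single huge -filelist upsets the BA client is to feed dsmc smaller chunks. A rough, untested sketch follows; the list path, chunk size and description string are arbitrary placeholders.)

#!/bin/bash
# Sketch only: archive a very large filelist in chunks rather than in one
# dsmc invocation. Paths, chunk size and description are placeholders.
LIST=/tmp/project.filelist
WORK=/tmp/project.filelist.d
mkdir -p "$WORK"

# 100000 entries per chunk is an arbitrary choice.
split -l 100000 "$LIST" "$WORK/chunk."

for f in "$WORK"/chunk.*; do
    # -filelist= and -description= are standard dsmc archive options; the
    # shared description ties the chunks of one run together.
    dsmc archive -filelist="$f" -description="project_2013-02" \
        || { echo "dsmc archive failed on $f" >&2; exit 1; }
done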
URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:29:12 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:29:12 -0600 Subject: [gpfsug-discuss] File system recovery question Message-ID: <20130213162912.GA22701@logos.ncsa.illinois.edu> I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. Executing od shows it is zero's. (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0040000 120070 156006 120070 156006 120070 156006 120070 156006 Dumping the header of one of the other disks shows read data for the other NSD's in that file system. (! 25)-> mmlsnsd -m | grep dh1_vd05_005 Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node (! 27)-> mmnsddiscover -d dh1_vd05_005 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... myhost: Rediscovery failed for dh1_vd05_005. mmnsddiscover: Finished. Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic Is there any way to repair the header on the NSD? Thanks for any ideas! Chad From Jez.Tucker at rushes.co.uk Wed Feb 13 16:43:50 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Wed, 13 Feb 2013 16:43:50 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> So, er. Fun. I checked our disks. 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0001000 Looks like you lost a fair bit. Presumably you don't have replication of 2? If so, I think you could just lose the NSD. Failing that: 1) Check your other disks and see if there's anything that you can figure out. Though TBH, this may take forever. 2) Restore 3) Call IBM and log a SEV 1. 3) then 2) is probably the best course of action Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Chad Kerner > Sent: 13 February 2013 16:29 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] File system recovery question > > I have a file system, and it appears that someone dd'd over the first > part of one of the NSD's with zero's. I see the device in multipath. I > can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for > the other NSD's in that file system. > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a > while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > mmcommon preunmount invoked. File system: desarchive Reason: > SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From craigawilson at gmail.com Wed Feb 13 16:48:32 2013 From: craigawilson at gmail.com (Craig Wilson) Date: Wed, 13 Feb 2013 16:48:32 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> Message-ID: Dealt with a similar issue a couple of months ago. In that case the data was fine but two of the descriptors were over written. You can use "mmfsadm test readdescraw /dev/$drive" to see the descriptors, we managed to recover the disk but only after logging it to IBM and manually rebuilding the descriptor. -CW On 13 February 2013 16:43, Jez Tucker wrote: > So, er. Fun. > > I checked our disks. > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0001000 > > Looks like you lost a fair bit. > > > Presumably you don't have replication of 2? > If so, I think you could just lose the NSD. > > Failing that: > > 1) Check your other disks and see if there's anything that you can figure > out. Though TBH, this may take forever. > 2) Restore > 3) Call IBM and log a SEV 1. > > 3) then 2) is probably the best course of action > > Jez > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Chad Kerner > > Sent: 13 February 2013 16:29 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] File system recovery question > > > > I have a file system, and it appears that someone dd'd over the first > > part of one of the NSD's with zero's. I see the device in multipath. I > > can fdisk and dd the device out. > > > > Executing od shows it is zero's. > > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > > * > > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > > > Dumping the header of one of the other disks shows read data for > > the other NSD's in that file system. > > > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > > Disk name NSD volume ID Device Node name > > Remarks > > > --------------------------------------------------------------------------------------- > > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > > node > > > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > > mmnsddiscover: Attempting to rediscover the disks. This may take a > > while ... > > myhost: Rediscovery failed for dh1_vd05_005. > > mmnsddiscover: Finished. > > > > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > > volume dh1_vd05_005. > > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > > desarchive. > > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > > mmcommon preunmount invoked. File system: desarchive Reason: > > SGPanic > > > > Is there any way to repair the header on the NSD? > > > > Thanks for any ideas! > > Chad > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Wed Feb 13 16:48:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:48:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> So what do you get if you run: mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 ? Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:29, Chad Kerner wrote: > I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. 
> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:52:30 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:52:30 -0600 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> Message-ID: <20130213165230.GA23294@logos.ncsa.illinois.edu> (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: > So what do you get if you run: > > mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > > ? > > > > > Vic Cornell > viccornell at gmail.com > > > On 13 Feb 2013, at 16:29, Chad Kerner wrote: > > > I have a file system, and it appears that someone dd'd over the first part > of one of the NSD's with zero's. I see the device in multipath. I can > fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other > NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. > Physical volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the > system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: > desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! 
> Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > From viccornell at gmail.com Wed Feb 13 16:57:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:57:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213165230.GA23294@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> <20130213165230.GA23294@logos.ncsa.illinois.edu> Message-ID: <4D043736-06A7-44A0-830E-63D66438595F@gmail.com> Thats not pretty - but you can push the NSD descriptor on with something like: tspreparedisk -F -n /dev/mapper/dh1_vd05_005 -p 8D8EEA98506C69CE That leaves you with the FS and Disk descriptors to recover . . . . Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:52, Chad Kerner wrote: > > > (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 > No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 > No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 > > > > On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: >> So what do you get if you run: >> >> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 >> >> ? >> >> >> >> >> Vic Cornell >> viccornell at gmail.com >> >> >> On 13 Feb 2013, at 16:29, Chad Kerner wrote: >> >> >> I have a file system, and it appears that someone dd'd over the first part >> of one of the NSD's with zero's. I see the device in multipath. I can >> fdisk and dd the device out. >> >> Executing od shows it is zero's. >> (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 >> 0000000 000000 000000 000000 000000 000000 000000 000000 000000 >> * >> 0040000 120070 156006 120070 156006 120070 156006 120070 156006 >> >> Dumping the header of one of the other disks shows read data for the other >> NSD's in that file system. >> >> (! 25)-> mmlsnsd -m | grep dh1_vd05_005 >> Disk name NSD volume ID Device Node name >> Remarks >> --------------------------------------------------------------------------------------- >> dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server >> node >> >> (! 27)-> mmnsddiscover -d dh1_vd05_005 >> mmnsddiscover: Attempting to rediscover the disks. This may take a while >> ... >> myhost: Rediscovery failed for dh1_vd05_005. >> mmnsddiscover: Finished. >> >> >> Wed Feb 13 09:14:03.694 2013: Command: mount desarchive >> Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. >> Physical volume dh1_vd05_005. >> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the >> system with return code 5 reason code 0 >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive >> Wed Feb 13 09:14:07.104 2013: Input/output error >> Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: >> desarchive Reason: SGPanic >> >> Is there any way to repair the header on the NSD? >> >> Thanks for any ideas! 
>> Chad
>>
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>

From jonathan at buzzard.me.uk  Wed Feb 13 17:00:31 2013
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 13 Feb 2013 17:00:31 +0000
Subject: [gpfsug-discuss] File system recovery question
In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu>
References: <20130213162912.GA22701@logos.ncsa.illinois.edu>
Message-ID: <1360774831.23342.9.camel@buzzard.phy.strath.ac.uk>

On Wed, 2013-02-13 at 10:29 -0600, Chad Kerner wrote:
> I have a file system, and it appears that someone dd'd over the first
> part of one of the NSD's with zero's. I see the device in multipath.
> I can fdisk and dd the device out.

Log a SEV1 call with IBM. If it is only one NSD that is stuffed they
might be able to get it back for you. However it is a custom procedure
that requires developer time from Poughkeepsie. It will take some time.

In the meantime I would strongly encourage you to start preparing for a
total restore, which will include recreating the file system from
scratch. Certainly if all the NSD headers are stuffed then the file
system is a total loss. However, even with only one lost it is not, as
I understand it, certain that you can get it back.

It is probably a good idea to store the NSD headers somewhere off the
file system in case some numpty wipes them. The most likely reason for
this is that they ran a distro install on a system that has direct
access to the disk.

JAB.

--
Jonathan A. Buzzard
Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From Tobias.Kuebler at sva.de  Wed Feb 13 17:00:37 2013
From: Tobias.Kuebler at sva.de (Tobias.Kuebler at sva.de)
Date: Wed, 13 Feb 2013 18:00:37 +0100
Subject: [gpfsug-discuss] AUTO: Tobias Kuebler is out of the office (returning 02/18/2013)
Message-ID: 

I am out of the office until 02/18/2013.

Thank you for your message. Incoming e-mails will not be forwarded
during my absence, but I will try to answer them as promptly as
possible after my return. In urgent cases, please contact your
responsible sales representative.

Note: This is an automated reply to your message "Re: [gpfsug-discuss]
File system recovery question" sent on 13.02.2013 17:43:50. This is the
only notification you will receive while this person is away.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Jez.Tucker at rushes.co.uk  Thu Feb 28 17:25:26 2013
From: Jez.Tucker at rushes.co.uk (Jez Tucker)
Date: Thu, 28 Feb 2013 17:25:26 +0000
Subject: [gpfsug-discuss] Who uses TSM to archive HSMd data (inline) ?
Message-ID: <39571EA9316BE44899D59C7A640C13F5306EED70@WARVWEXC1.uk.deluxe-eu.com>

Hello all,

I have to ask: does anyone else do this?

We have a problem and I'm told that "it's so rare that anyone would
archive data which is HSMd", i.e. to create an archive whereby a
project is entirely or partially HSMd to LTO:

- online data is archived to tape
- offline data is copied from HSM tape to archive tape 'inline'

Surely nobody pulls back all their data to disk before re-archiving
back to tape?

---
Jez Tucker
Senior Sysadmin
Rushes
GPFSUG Chairman (chair at gpfsug.org)

-------------- next part --------------
An HTML attachment was scrubbed...
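(Going back to the wiped-descriptor thread above: a rough, untested sketch of what Jonathan suggests about keeping the NSD headers somewhere off the file system. The device glob and destination are placeholders; the dd only reads from the disks, and actually writing a descriptor back remains an IBM support procedure.)

#!/bin/bash
# Sketch only: keep an off-filesystem copy of the descriptor area of each
# NSD device, plus what readdescraw reports, for later comparison and
# diagnosis. The glob and DEST are placeholders; nothing here writes to disk.
DEST=/root/nsd-headers/$(date '+%Y-%m-%d')
mkdir -p "$DEST"

for dev in /dev/mapper/dh1_vd05_*; do
    name=$(basename "$dev")
    # The first 16 sectors cover the areas readdescraw complained about
    # earlier in the thread (disk descriptor in sector 1, NSD descriptor
    # in sector 2, FS descriptor in sector 8).
    dd if="$dev" of="$DEST/${name}.hdr" bs=512 count=16 2>/dev/null
    /usr/lpp/mmfs/bin/mmfsadm test readdescraw "$dev" > "$DEST/${name}.desc" 2>&1
done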
URL: From stuartb at 4gh.net Wed Feb 6 18:38:56 2013 From: stuartb at 4gh.net (Stuart Barkley) Date: Wed, 6 Feb 2013 13:38:56 -0500 (EST) Subject: [gpfsug-discuss] GPFS snapshot cron job Message-ID: I'm new on this list. It looks like it can be useful for exchanging GPFS experiences. We have been running GPFS for a couple of years now on one cluster and are in process of bringing it up on a couple of other clusters. One thing we would like, but have not had time to do is automatic snapshots similar to what NetApp does. For our purposes a cron job that ran every 4 hours that creates a new snapshot and removes older snapshots would be sufficient. The slightly hard task is correctly removing the older snapshots. Does anyone have such a cron script they can share? Or did I miss something in GPFS that handles automatic snapshots? Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From pete at realisestudio.com Wed Feb 6 19:28:40 2013 From: pete at realisestudio.com (Pete Smith) Date: Wed, 6 Feb 2013 19:28:40 +0000 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: References: Message-ID: Hi rsnapshot is probably what you're looking for. :-) On 6 Feb 2013 18:39, "Stuart Barkley" wrote: > I'm new on this list. It looks like it can be useful for exchanging > GPFS experiences. > > We have been running GPFS for a couple of years now on one cluster and > are in process of bringing it up on a couple of other clusters. > > One thing we would like, but have not had time to do is automatic > snapshots similar to what NetApp does. For our purposes a cron job > that ran every 4 hours that creates a new snapshot and removes older > snapshots would be sufficient. The slightly hard task is correctly > removing the older snapshots. > > Does anyone have such a cron script they can share? > > Or did I miss something in GPFS that handles automatic snapshots? > > Thanks, > Stuart Barkley > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Feb 6 19:40:49 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 06 Feb 2013 19:40:49 +0000 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: References: Message-ID: <5112B1C1.3080403@buzzard.me.uk> On 06/02/13 18:38, Stuart Barkley wrote: > I'm new on this list. It looks like it can be useful for exchanging > GPFS experiences. > > We have been running GPFS for a couple of years now on one cluster and > are in process of bringing it up on a couple of other clusters. > > One thing we would like, but have not had time to do is automatic > snapshots similar to what NetApp does. For our purposes a cron job > that ran every 4 hours that creates a new snapshot and removes older > snapshots would be sufficient. The slightly hard task is correctly > removing the older snapshots. > > Does anyone have such a cron script they can share? > Find attached a Perl script that does just what you want with a range of configurable parameters. It is intended to create snapshots that work with the Samba VFS module shadow_copy2 so that you can have a previous versions facility on your Windows boxes. 
Note it creates a "quiescent" lock that interacted with another script that was called to do a policy based tiering from fast disks to slow disks. That gets called based on a trigger for a percentage of the fast disk pool being full, and consequently can get called at any time. If the tiering is running then trying to take a snapshot at the same time will lead to race conditions and the file system will deadlock. Note that if you are creating snapshots in the background then a whole range of GPFS commands if run at the moment the snapshot is being created or deleted will lead to deadlocks. > Or did I miss something in GPFS that handles automatic snapshots? Yeah what you missed is that it will randomly lock your file system up. So while the script I have attached is all singing and all dancing. It has never stayed in production for very long. On a test file system that has little activity it runs for months without a hitch. When rolled out on busy file systems with in a few days we would a deadlock waiting for some file system quiescent state and everything would grind to a shuddering halt. Sometimes on creating the snapshot and sometimes on deleting them. Unless there has been a radical change in GPFS in the last few months, you cannot realistically do what you want. IBM's response was that you should not be taking snapshots or deleting old ones while the file system is "busy". Not that I would have thought the file system would have been that "busy" at 07:00 on a Saturday morning, but hey. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. -------------- next part -------------- A non-text attachment was scrubbed... Name: shadowcopy.pl Type: application/x-perl Size: 5787 bytes Desc: not available URL: From erich at uw.edu Wed Feb 6 19:45:05 2013 From: erich at uw.edu (Eric Horst) Date: Wed, 6 Feb 2013 11:45:05 -0800 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: References: Message-ID: It's easy if you use a chronologically sortable naming scheme. We use YYYY-MM-DD-hhmmss. This is a modified excerpt from the bash script I use. The prune function takes an arg of the number of snapshots to keep. SNAPROOT=/grfs/ud00/.snapshots function prune () { PCPY=$1 for s in $(/bin/ls -d "$SNAPROOT"/????-??-??-?????? | head --lines=-$PCPY); do mmdelsnapshot $FSNAME $s if [ $? != 0 ]; then echo ERROR: there was a mmdelsnapshot problem $? exit else echo Success fi done } echo Pruning snapshots prune 12 -Eric On Wed, Feb 6, 2013 at 10:38 AM, Stuart Barkley wrote: > I'm new on this list. It looks like it can be useful for exchanging > GPFS experiences. > > We have been running GPFS for a couple of years now on one cluster and > are in process of bringing it up on a couple of other clusters. > > One thing we would like, but have not had time to do is automatic > snapshots similar to what NetApp does. For our purposes a cron job > that ran every 4 hours that creates a new snapshot and removes older > snapshots would be sufficient. The slightly hard task is correctly > removing the older snapshots. > > Does anyone have such a cron script they can share? > > Or did I miss something in GPFS that handles automatic snapshots? > > Thanks, > Stuart Barkley > -- > I've never been lost; I was once bewildered for three days, but never lost! 
> -- Daniel Boone > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bergman at panix.com Wed Feb 6 21:28:30 2013 From: bergman at panix.com (bergman at panix.com) Date: Wed, 06 Feb 2013 16:28:30 -0500 Subject: [gpfsug-discuss] GPFS snapshot cron job In-Reply-To: Your message of "Wed, 06 Feb 2013 13:38:56 EST." References: Message-ID: <20647.1360186110@localhost> In the message dated: Wed, 06 Feb 2013 13:38:56 -0500, The pithy ruminations from Stuart Barkley on <[gpfsug-discuss] GPFS snapshot cron job> were: => I'm new on this list. It looks like it can be useful for exchanging => GPFS experiences. => => We have been running GPFS for a couple of years now on one cluster and => are in process of bringing it up on a couple of other clusters. => => One thing we would like, but have not had time to do is automatic => snapshots similar to what NetApp does. For our purposes a cron job => that ran every 4 hours that creates a new snapshot and removes older => snapshots would be sufficient. The slightly hard task is correctly => removing the older snapshots. => => Does anyone have such a cron script they can share? Yes. I've attached the script that we run from cron. Our goal was to keep a decaying set of snapshots over a fairly long time period, so that users would be able to recover from "rm", while not using excessive space. Snapshots are named with timestamp, making it slightly easier to understand what data they contain and the remove the older ones. The cron job runs every 15 minutes on every GPFS server node, but checks if it is executing on the node that is the manager for the specified filesystem to avoid concurrency issues. The script will avoid making a snapshot if there isn't sufficient disk space. Our config file to manage snapshots is: ------------ CUT HERE -- CUT HERE -------------- case $1 in home) intervals=(1 4 24 48) # hour number of each interval counts=(4 4 4 4) # max number of snapshots to keep per each interval MINFREE=5 # minimum free disk space, in percent ;; shared) intervals=(1 4 48) # hour number of each interval counts=(4 2 2) # max number of snapshots to keep per each interval MINFREE=20 # minimum free disk space, in percent ;; esac ------------ CUT HERE -- CUT HERE -------------- For the "home" filesystem, this says: keep 4 snapshots in the most recent hourly interval (every 15 minutes) keep 4 snapshots made in the most recent 4 hr interval (1 for each hour) keep 4 snapshots made in the most recent 24 hr interval (1 each 6hrs) keep 4 snapshots made in the most recent 48 hr interval (1 each 12 hrs) For the "shared" filesystem, the configuration says: keep 4 snapshots in the most recent hourly interval (every 15 minutes) keep 2 snapshots made in the most recent 4 hr interval (1 each 2 hours) keep 2 snapshots made in the most recent 48 hr interval (1 each 24 hrs) Those intervals "overlap", so there are a lot of recent snapshots, and fewer older ones. Each time a snapshot is made, older snapshots may be removed. 
So, at 5:01 PM on Thursday, there may be snapshots of the "home" filesystem from: 17:00 Thursday ---+-- 4 in the last hour 16:45 Thursday | 16:30 Thursday | 16:15 Thursday -- + 16:00 Thursday ---+-- 4 in the last 4 hours, including 15:00 Thursday | the 5:00PM Thursday snapshot 14:00 Thursday ---+ 11:00 Thursday ---+-- 4 in the last 24 hours, including 05:00 Thursday | 17:00 Thursday 23:00 Wednesday ---+ 17:00 Wednesday ---+-- 4 @ 12-hr intervals in the last 48 hours, 05:00 Wednesday ---+ including 17:00 & 05:00 Thursday Suggestions and patches are welcome. => => Or did I miss something in GPFS that handles automatic snapshots? We have seen periodic slowdowns when snapshots are running, but nothing to the extent described by Jonathan Buzzard. Mark => => Thanks, => Stuart Barkley => -- => I've never been lost; I was once bewildered for three days, but never lost! => -- Daniel Boone -------------- next part -------------- #! /bin/bash #$Id: snapshotter 858 2012-01-31 19:24:11Z$ # Manage snapshots of GPFS volumes # # Desgined to be called from cron at :15 intervals # ################################################################## # Defaults, may be overridden by /usr/local/etc/snappshotter.conf # or file specified by "-c" CONF=/usr/local/etc/snappshotter.conf # config file, supersceded by "-c" option MINFREE=10 # minimum free space, in percent. # # Series of intervals and counts. Intervals expressed as the end-point in hours. # count = number of snapshots to keep per-interval ############## # time # ==== # :00-59 keep snapshots at 15 minute intervals; ceiling of interval = 1hr # 01-03:59 keep snapshots at 1hr interval; ceiling of interval = 4hr # 04-23:59 keep snapshots at 6hr intervals; ceiling of interval = 24hr # 24-47:59 keep snapshots at 12hr intervals; ceiling of interval = 48hr intervals=(1 4 24 48) # hour number of each interval counts=(4 4 4 4) # max number of snapshots to keep per each interval # Note that the snapshots in interval (N+1) must be on a time interval # that corresponds to the snapshots kept in interval N. # # :00-59 keep snapshots divisible by 1/4hr: 00:00, 00:15, 00:30, 00:45, 01:00, 01:15 ... # 01-04:59 keep snapshots divisible by 4/4hr: 00:00, 01:00, 02:00, 03:00 ... # 05-23:59 keep snapshots divisible by 24/4hr: 00:00, 06:00, 12:00, 18:00 # 24-48:59 keep snapshots divisible by 48/4hr: 00:00, 12:00 # # ################################################################## TESTING="no" MMDF=/usr/lpp/mmfs/bin/mmdf MMCRSNAPSHOT=/usr/lpp/mmfs/bin/mmcrsnapshot MMLSSNAPSHOT=/usr/lpp/mmfs/bin/mmlssnapshot MMDELSNAPSHOT=/usr/lpp/mmfs/bin/mmdelsnapshot LOGGER="logger -p user.alert -t snapshotter" PATH="${PATH}:/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/local/sbin" # for access to 'ip' command, GPFS commands now=`date '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` secsINhr=$((60 * 60)) ##################################################################### usage() { cat - << E-O-USAGE 1>&2 $0 -- manage GPFS snapshots Create new GPFS snapshots and remove old snapshots. Options: -f filesystem required -- name of filesystem to snapshot -t testing test mode, report what would be done but perform no action -d "datestamp" test mode only; used supplied date stamp as if it was the current time. -c configfile use supplied configuration file in place of default: $CONF -L show license statement In test mode, the input data, in the same format as produced by "mmlssnap" must be supplied. 
This can be done on STDIN, as: $0 -t -f home -d "\`date --date "Dec 7 23:45"\`" < mmlssnap.data or $0 -t -f home -d "\`date --date "now +4hours"\`" < mmlssnap.data E-O-USAGE echo 1>&2 echo $1 1>&2 exit 1 } ##################################################################### license() { cat - << E-O-LICENSE Section of Biomedical Image Analysis Department of Radiology University of Pennsylvania 3600 Market Street, Suite 380 Philadelphia, PA 19104 Web: http://www.rad.upenn.edu/sbia/ Email: sbia-software at uphs.upenn.edu SBIA Contribution and Software License Agreement ("Agreement") ============================================================== Version 1.0 (June 9, 2011) This Agreement covers contributions to and downloads from Software maintained by the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania ("SBIA"). Part A of this Agreement applies to contributions of software and/or data to the Software (including making revisions of or additions to code and/or data already in this Software). Part B of this Agreement applies to downloads of software and/or data from SBIA. Part C of this Agreement applies to all transactions with SBIA. If you distribute Software (as defined below) downloaded from SBIA, all of the paragraphs of Part B of this Agreement must be included with and apply to such Software. Your contribution of software and/or data to SBIA (including prior to the date of the first publication of this Agreement, each a "Contribution") and/or downloading, copying, modifying, displaying, distributing or use of any software and/or data from SBIA (collectively, the "Software") constitutes acceptance of all of the terms and conditions of this Agreement. If you do not agree to such terms and conditions, you have no right to contribute your Contribution, or to download, copy, modify, display, distribute or use the Software. PART A. CONTRIBUTION AGREEMENT - LICENSE TO SBIA WITH RIGHT TO SUBLICENSE ("CONTRIBUTION AGREEMENT"). ----------------------------------------------------------------------------------------------------- 1. As used in this Contribution Agreement, "you" means the individual contributing the Contribution to the Software maintained by SBIA and the institution or entity which employs or is otherwise affiliated with such individual in connection with such Contribution. 2. This Contribution Agreement applies to all Contributions made to the Software maintained by SBIA, including without limitation Contributions made prior to the date of first publication of this Agreement. If at any time you make a Contribution to the Software, you represent that (i) you are legally authorized and entitled to make such Contribution and to grant all licenses granted in this Contribution Agreement with respect to such Contribution; (ii) if your Contribution includes any patient data, all such data is de-identified in accordance with U.S. confidentiality and security laws and requirements, including but not limited to the Health Insurance Portability and Accountability Act (HIPAA) and its regulations, and your disclosure of such data for the purposes contemplated by this Agreement is properly authorized and in compliance with all applicable laws and regulations; and (iii) you have preserved in the Contribution all applicable attributions, copyright notices and licenses for any third party software or data included in the Contribution. 3. Except for the licenses granted in this Agreement, you reserve all right, title and interest in your Contribution. 4. 
You hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no charge, royalty-free, irrevocable license to use, reproduce, make derivative works of, display and distribute the Contribution. If your Contribution is protected by patent, you hereby grant to SBIA, with the right to sublicense, a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable license under your interest in patent rights covering the Contribution, to make, have made, use, sell and otherwise transfer your Contribution, alone or in combination with any other code. 5. You acknowledge and agree that SBIA may incorporate your Contribution into the Software and may make the Software available to members of the public on an open source basis under terms substantially in accordance with the Software License set forth in Part B of this Agreement. You further acknowledge and agree that SBIA shall have no liability arising in connection with claims resulting from your breach of any of the terms of this Agreement. 6. YOU WARRANT THAT TO THE BEST OF YOUR KNOWLEDGE YOUR CONTRIBUTION DOES NOT CONTAIN ANY CODE THAT REQUIRES OR PRESCRIBES AN "OPEN SOURCE LICENSE" FOR DERIVATIVE WORKS (by way of non-limiting example, the GNU General Public License or other so-called "reciprocal" license that requires any derived work to be licensed under the GNU General Public License or other "open source license"). PART B. DOWNLOADING AGREEMENT - LICENSE FROM SBIA WITH RIGHT TO SUBLICENSE ("SOFTWARE LICENSE"). ------------------------------------------------------------------------------------------------ 1. As used in this Software License, "you" means the individual downloading and/or using, reproducing, modifying, displaying and/or distributing the Software and the institution or entity which employs or is otherwise affiliated with such individual in connection therewith. The Section of Biomedical Image Analysis, Department of Radiology at the Universiy of Pennsylvania ("SBIA") hereby grants you, with right to sublicense, with respect to SBIA's rights in the software, and data, if any, which is the subject of this Software License (collectively, the "Software"), a royalty-free, non-exclusive license to use, reproduce, make derivative works of, display and distribute the Software, provided that: (a) you accept and adhere to all of the terms and conditions of this Software License; (b) in connection with any copy of or sublicense of all or any portion of the Software, all of the terms and conditions in this Software License shall appear in and shall apply to such copy and such sublicense, including without limitation all source and executable forms and on any user documentation, prefaced with the following words: "All or portions of this licensed product (such portions are the "Software") have been obtained under license from the Section of Biomedical Image Analysis, Department of Radiology at the University of Pennsylvania and are subject to the following terms and conditions:" (c) you preserve and maintain all applicable attributions, copyright notices and licenses included in or applicable to the Software; (d) modified versions of the Software must be clearly identified and marked as such, and must not be misrepresented as being the original Software; and (e) you consider making, but are under no obligation to make, the source code of any of your modifications to the Software freely available to others on an open source basis. 2. 
The license granted in this Software License includes without limitation the right to (i) incorporate the Software into proprietary programs (subject to any restrictions applicable to such programs), (ii) add your own copyright statement to your modifications of the Software, and (iii) provide additional or different license terms and conditions in your sublicenses of modifications of the Software; provided that in each case your use, reproduction or distribution of such modifications otherwise complies with the conditions stated in this Software License. 3. This Software License does not grant any rights with respect to third party software, except those rights that SBIA has been authorized by a third party to grant to you, and accordingly you are solely responsible for (i) obtaining any permissions from third parties that you need to use, reproduce, make derivative works of, display and distribute the Software, and (ii) informing your sublicensees, including without limitation your end-users, of their obligations to secure any such required permissions. 4. The Software has been designed for research purposes only and has not been reviewed or approved by the Food and Drug Administration or by any other agency. YOU ACKNOWLEDGE AND AGREE THAT CLINICAL APPLICATIONS ARE NEITHER RECOMMENDED NOR ADVISED. Any commercialization of the Software is at the sole risk of the party or parties engaged in such commercialization. You further agree to use, reproduce, make derivative works of, display and distribute the Software in compliance with all applicable governmental laws, regulations and orders, including without limitation those relating to export and import control. 5. The Software is provided "AS IS" and neither SBIA nor any contributor to the software (each a "Contributor") shall have any obligation to provide maintenance, support, updates, enhancements or modifications thereto. SBIA AND ALL CONTRIBUTORS SPECIFICALLY DISCLAIM ALL EXPRESS AND IMPLIED WARRANTIES OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL SBIA OR ANY CONTRIBUTOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY ARISING IN ANY WAY RELATED TO THE SOFTWARE, EVEN IF SBIA OR ANY CONTRIBUTOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. TO THE MAXIMUM EXTENT NOT PROHIBITED BY LAW OR REGULATION, YOU FURTHER ASSUME ALL LIABILITY FOR YOUR USE, REPRODUCTION, MAKING OF DERIVATIVE WORKS, DISPLAY, LICENSE OR DISTRIBUTION OF THE SOFTWARE AND AGREE TO INDEMNIFY AND HOLD HARMLESS SBIA AND ALL CONTRIBUTORS FROM AND AGAINST ANY AND ALL CLAIMS, SUITS, ACTIONS, DEMANDS AND JUDGMENTS ARISING THEREFROM. 6. None of the names, logos or trademarks of SBIA or any of SBIA's affiliates or any of the Contributors, or any funding agency, may be used to endorse or promote products produced in whole or in part by operation of the Software or derived from or based on the Software without specific prior written permission from the applicable party. 7. Any use, reproduction or distribution of the Software which is not in accordance with this Software License shall automatically revoke all rights granted to you under this Software License and render Paragraphs 1 and 2 of this Software License null and void. 8. 
This Software License does not grant any rights in or to any intellectual property owned by SBIA or any Contributor except those rights expressly granted hereunder. PART C. MISCELLANEOUS --------------------- This Agreement shall be governed by and construed in accordance with the laws of The Commonwealth of Pennsylvania without regard to principles of conflicts of law. This Agreement shall supercede and replace any license terms that you may have agreed to previously with respect to Software from SBIA. E-O-LICENSE exit } ##################################################################### # Parse the command-line while [ "X$1" != "X" ] do case $1 in -L) license ;; -t) TESTING="yes" shift ;; -d) # Date stamp given...only valid in testing mode shift # Convert the user-supplied date to the YYYY_Mo_DD_HH:MM form, # throwing away the seconds UserDATE="$1" now=`date --date "$1" '+%Y_%m_%d_%H:%M'` nowsecs=`echo $now | sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"` nowsecs=`date --date "$nowsecs" "+%s"` shift ;; -c) shift CONF=$1 if [ ! -f $CONF ] ; then usage "Specified configuration file ($CONF) not found" fi shift ;; -f) shift filesys=$1 shift ;; *) usage "Unrecognized option: \"$1\"" ;; esac done ############## End of command line parsing LOCKFILE=/var/run/snapshotter.$filesys if [ -f $LOCKFILE ] ; then PIDs=`cat $LOCKFILE | tr "\012" " "` echo "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." 1>&2 $LOGGER "Lockfile $LOCKFILE from snapshotter process $PID exists. Will not continue." exit 1 else echo $$ > $LOCKFILE if [ $? != 0 ] ; then echo "Could not create lockfile $LOCKFILE for process $$. Exiting." 1>&2 $LOGGER "Could not create lockfile $LOCKFILE for process $$" exit 2 fi fi ######## Check sanity of user-supplied values if [ "X$filesys" = "X" ] ; then $LOGGER "Filesystem must be specified" usage "Filesystem must be specified" fi if [ $TESTING = "yes" ] ; then # testing mode: # accept faux filesystem argument # accept faux datestamp as arguments # read faux results from mmlssnapshot on STDIN # MMDF # # Do not really use mmdf executable, so that the testing can be # done outside a GPFS cluster Use a 2-digit random number 00 .. 99 # from $RANDOM, but fill the variable with dummy fields so the # the random number corresponds to field5, where it would be in # the mmdf output. MMDF="eval echo \(total\) f1 f2 f3 f4 \(\${RANDOM: -2:2}%\) " MMCRSNAPSHOT="echo mmcrsnapshot" MMDELSNAPSHOT="echo mmdelsnapshot" MMLSSNAPDATA=`cat - | tr "\012" "%"` MMLSSNAPSHOT="eval echo \$MMLSSNAPDATA|tr '%' '\012'" LOGGER="echo Log message: " else if [ "X$UserDATE" != "X" ] ; then $LOGGER "Option \"-d\" only valid in testing mode" usage "Option \"-d\" only valid in testing mode" fi /usr/lpp/mmfs/bin/mmlsfs $filesys -T 1> /dev/null 2>&1 if [ $? != 0 ] ; then $LOGGER "Error accessing GPFS filesystem: $filesys" echo "Error accessing GPFS filesystem: $filesys" 1>&2 rm -f $LOCKFILE exit 1 fi # Check if the node where this script is running is the GPFS manager node for the # specified filesystem manager=`/usr/lpp/mmfs/bin/mmlsmgr $filesys | grep -w "^$filesys" |awk '{print $2}'` ip addr list | grep -qw "$manager" if [ $? != 0 ] ; then # This node is not the manager...exit rm -f $LOCKFILE exit fi MMLSSNAPSHOT="$MMLSSNAPSHOT $filesys" fi # It is valid for the default config file not to exist, so check if # is there before sourcing it if [ -f $CONF ] ; then . 
# It is valid for the default config file not to exist, so check if
# is there before sourcing it
if [ -f $CONF ] ; then
    . $CONF $filesys        # load variables found in $CONF, based on $filesys
fi

# Get current free space
freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"`

# Produce list of valid snapshot names (w/o header lines)
snapnames=`$MMLSSNAPSHOT |grep Valid |sed -e '$d' -e 's/ .*//'`

# get the number of existing snapshots
snapcount=($snapnames) ; snapcount=${#snapcount[*]}

###########################################################
# given a list of old snapshot names, in the form:
#       YYYY_Mo_DD_HH:MM
# fill the buckets by time.  A snapshot can only go
# into one bucket!
###########################################################
for oldsnap in $snapnames
do
    oldstamp=`echo $oldsnap|sed -e "s/_\([^_]*\)$/ \1/" -e "s/_/\//g"`
    oldsecs=`date --date "$oldstamp" "+%s"`
    diff=$((nowsecs - oldsecs))
        # difference in seconds between 'now' and old snapshot
    if [ $diff -lt 0 ] ; then
        # this can happen during testing...we have got a faux
        # snapshot date in the future...skip it
        continue
    fi
    index=0
    prevbucket=0
    filled=No
    while [ $index -lt ${#intervals[*]} -a $filled != "Yes" ]
    do
        bucket=${intervals[$index]}
            # ceiling for number of hours for this bucket (1 more than the number of
            # actual hours, ie., "7" means that the bucket can contain snapshots that are
            # at least 6:59 (hh:mm) old.
        count=${counts[$index]}
            # max number of items in this bucket
        bucketinterval=$(( bucket * ( secsINhr / count ) ))
            # Number of hours (in seconds) between snapshots that should be retained
            # for this bucket...convert from hrs (bucket/count) to seconds in order
            # to deal with :15 minute intervals
            # Force the mathematical precedence to do (secsINhr / count) so that cases
            # where count>bucket (like the first 1hr that may have a count of 4 retained
            # snapshots) doesn't result in the shell throwing away the fraction
        if [ $diff -ge $((prevbucket * secsINhr)) -a $diff -lt $((bucket * secsINhr)) ] ; then
            # We found the correct bucket
            filled=Yes
            ## printf "Checking if $oldsnap should be retained if it is multiple of $bucketinterval [ ($oldsecs %% $bucketinterval) = 0]"
            # Does the snapshot being examined fall on the interval determined above
            # for the snapshots that should be retained?
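            # (Illustrative numbers only; this assumes secsINhr=3600 and values
            # for intervals/counts that are not shown in the posting.)  If, say,
            # bucket=25 and count=6, then bucketinterval = 25 * (3600 / 6) = 15000
            # seconds, so within that age band a snapshot is kept only when its
            # epoch timestamp is an exact multiple of 15000s (roughly every
            # 4h10m); everything else in the band falls through to the deletion
            # list built below.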
            if [ $(( oldsecs % bucketinterval )) = 0 ] ; then
                # The hour of the old snapshot is evenly divisible by the number of snapshots that should be
                # retained in this interval...keep it
                tokeep="$tokeep $oldsnap"
                ## printf "...yes\n"
            else
                todelete="$todelete $oldsnap"
                ## printf "...no\n"
            fi
            prevbucket=$bucket
        fi
        index=$((index + 1))
    done
    if [ $diff -ge $((bucket * secsINhr )) ] ; then
        filled=Yes
        # This is too old...schedule it for deletion
        $LOGGER "Scheduling old snapshot $oldsnap from $filesys for deletion"
        todelete="$todelete $oldsnap"
    fi
    # We should not get here
    if [ $filled != Yes ] ; then
        $LOGGER "Snapshot \"$oldsnap\" on $filesys does not match any intervals"
    fi
done

# Sort the lists to make reading the testing output easier
todelete=`echo $todelete | tr " " "\012" | sort -bdfu`
tokeep=`echo $tokeep | tr " " "\012" | sort -bdfu`

#############################################################
for oldsnap in $todelete
do
    if [ $TESTING = "yes" ] ; then
        # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode
        $MMDELSNAPSHOT $filesys $oldsnap
        # remove the entry for the snapshot scheduled for deletion
        # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate
        ## echo "Removing entry for \"$oldsnap\" from \$MMLSSNAPDATA"
        MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$oldsnap [^%]*%/%/"`
    else
        # Run mmdelsnapshot, and capture the output to prevent verbose messages from being
        # sent as the result of each cron job.  Only display the messages in case of error.
        output=`$MMDELSNAPSHOT $filesys $oldsnap 2>&1`
    fi
    if [ $? != 0 ] ; then
        printf "Error from \"$MMDELSNAPSHOT $filesys $oldsnap\": $output" 1>&2
        $LOGGER "Error removing snapshot of $filesys with label \"$oldsnap\": $output"
        rm -f $LOCKFILE
        exit 1
    else
        $LOGGER "successfully removed snapshot of $filesys with label \"$oldsnap\""
    fi
done

############# Now check for free space #######################################

# Get current free space
freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"`

# get the number of existing snapshots
snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l`

while [ $freenow -le $MINFREE -a $snapcount -gt 0 ]
do
    # must delete some snapshots, from the oldest first...
    todelete=`$MMLSSNAPSHOT|grep Valid |sed -n -e 's/ .*//' -e '1p'`
    if [ $TESTING = "yes" ] ; then
        # "run" $MMDELSNAPSHOT without capturing results in order to produce STDOUT in testing mode
        $MMDELSNAPSHOT $filesys $todelete
        # remove the entry for the snapshot scheduled for deletion
        # from MMLSSNAPDATA so that the next call to MMLSSNAPSHOT is accurate and from $tokeep
        ## echo "Removing entry for \"$todelete\" from \$MMLSSNAPDATA"
        MMLSSNAPDATA=`echo $MMLSSNAPDATA | sed -e "s/%$todelete [^%]*%/%/"`
        tokeep=`echo $tokeep | sed -e "s/^$todelete //" -e "s/ $todelete / /" -e "s/ $todelete$//" -e "s/^$todelete$//"`
    else
        # Run mmdelsnapshot, and capture the output to prevent verbose messages from being
        # sent as the result of each cron job.  Only display the messages in case of error.
        output=`$MMDELSNAPSHOT $filesys $todelete 2>&1`
    fi
    if [ $? != 0 ] ; then
        printf "Error from \"$MMDELSNAPSHOT $filesys $todelete\": $output" 1>&2
        $LOGGER "Low disk space (${freenow}%) triggered attempt to remove snapshot of $filesys with label \"$todelete\" -- Error: $output"
        rm -f $LOCKFILE
        exit 1
    else
        $LOGGER "removed snapshot \"$todelete\" from $filesys because ${freenow}% free disk is less than ${MINFREE}%"
    fi
    # update the number of existing snapshots
    snapcount=`$MMLSSNAPSHOT |grep Valid |wc -l`
    freenow=`$MMDF $filesys|grep '(total)' | sed -e "s/%.*//" -e "s/.*( *//"`
done

if [ $snapcount = 0 -a $freenow -le $MINFREE ] ; then
    echo "All existing snapshots removed on $filesys, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%" 1>&2
    $LOGGER "All existing snapshots on $filesys removed, but insufficient disk space to create a new snapshot: ${freenow}% free is less than ${MINFREE}%"
    rm -f $LOCKFILE
    exit 1
fi
$LOGGER "Free disk space on $filesys (${freenow}%) above minimum required (${MINFREE}%) to create new snapshot"

##############################################################
if [ $TESTING = "yes" ] ; then
    # List snapshots being kept
    for oldsnap in $tokeep
    do
        echo "Keeping snapshot $oldsnap"
    done
fi

#############################################################
# Now create the current snapshot...do this after deleting snaps in order to reduce the chance of running
# out of disk space
results=`$MMCRSNAPSHOT $filesys $now 2>&1 | tr "\012" "%"`
if [ $? != 0 ] ; then
    printf "Error from \"$MMCRSNAPSHOT $filesys $now\":\n\t" 1>&2
    echo $results | tr '%' '\012' 1>&2
    results=`echo $results | tr '%' '\012'`
    $LOGGER "Error creating snapshot of $filesys with label $now: \"$results\""
    rm -f $LOCKFILE
    exit 1
else
    $LOGGER "successfully created snapshot of $filesys with label $now"
fi

rm -f $LOCKFILE
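
A script along these lines is normally driven from cron rather than run by hand; it already exits quietly on nodes that are not the filesystem manager, so the same entry can be installed on every node that could take over that role. A minimal sketch of a root crontab entry, assuming the script is saved as /usr/local/sbin/gpfs_snapshotter and the filesystem is called gpfs0 (both names are made up for illustration; stdin is only read in testing mode with -t):

    # take and prune snapshots of gpfs0 every 4 hours
    0 */4 * * * /usr/local/sbin/gpfs_snapshotter -f gpfs0 -c /etc/gpfs_snapshotter.conf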

From Jez.Tucker at rushes.co.uk Thu Feb 7 12:28:16 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Thu, 7 Feb 2013 12:28:16 +0000 Subject: [gpfsug-discuss] SOBAR Message-ID: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Hey all Is anyone using the SOBAR method of backing up the metadata and NSD configs? If so, how is your experience? From reading the docs, it seems a bit odd that on restoration you have to re-init the FS and recall all the data. If so, what's the point of SOBAR? --- Jez Tucker Senior Sysadmin Rushes http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL:
From orlando.richards at ed.ac.uk Thu Feb 7 12:47:30 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 12:47:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> Message-ID: <5113A262.8080206@ed.ac.uk> On 07/02/13 12:28, Jez Tucker wrote: > Hey all > > Is anyone using the SOBAR method of backing up the metadata and NSD > configs? > > If so, how is your experience? > > From reading the docs, it seems a bit odd that on restoration you have > to re-init the FS and recall all the data. > > If so, what's the point of SOBAR? Ooh - this is new. From first glance, it looks to be a DR solution?
We're actually in the process of engineering our own DR solution based on a not-dissimilar concept: - build a second GPFS file system off-site, with HSM enabled (called "dr-fs" here) - each night, rsync the changed data from "prod-fs" to "dr-fs" - each day, migrate data from the disk pool in "dr-fs" to the tape pool to free up sufficient capacity for the next night's rsync You have a complete copy of the filesystem metadata from "prod-fs" on "dr-fs", so it looks (to a user) identical, but on "dr-fs" some of the ("older") data is on tape (ratios dependent on sizing of disk vs tape pools, of course). In the event of a disaster, you just flip over to "dr-fs". From the quick glance at SOBAR, it looks to me like the concept is that you don't have a separate file system, but you hold a secondary copy in TSM via the premigrate function, and store the filesystem metadata as a flat file dump backed up "in the normal way". In DR, you rebuild the FS from the metadata backup, and re-attach the HSM pool to this newly-restored filesystem, (and then start pushing the data back out of the HSM pool into the GPFS disk pool). As soon as the HSM pool is re-attached, users can start getting their data (as fast as TSM can give it to them), and the filesystem will look "normal" to them (albeit slow, if recalling from tape). Nice - good to see this kind of thing coming from IBM - restore of huge filesystems from traditional backup really doesn't make much sense nowadays - it'd just take too long. This kind of approach doesn't necessarily accelerate the overall time to restore, but it allows for a usable filesystem to be made available while the restore happens in the background. I'd look for clarity about the state of the filesystem on restore - particularly around what happens to data which arrives after the migration has happened but before the metadata snapshot is taken. I think it'd be lost, but the metadata would still point to it existing? Might get confusing... Just my 2 cents from a quick skim read mind - plus a whole bunch of thinking we've done on this subject recently :) -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Thu Feb 7 13:40:30 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 07 Feb 2013 13:40:30 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113A262.8080206@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> Message-ID: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: [SNIP] > Nice - good to see this kind of thing coming from IBM - restore of huge > filesystems from traditional backup really doesn't make much sense > nowadays - it'd just take too long. Define too long? It's perfectly doable, and the speed of the restore will depend on what resources you have to throw at the problem. The main issue is having lots of tape drives for the restore. Having a plan to buy more ASAP is a good idea. The second is don't let yourself get sidetracked doing "high priority" restores for individuals, it will radically delay the restore. Beyond that you need some way to recreate all your storage pools, filesets, junction points and quotas etc. Looks like the mmbackupconfig and mmrestoreconfig now take care of all that for you. 
That is a big time saver right there. > This kind of approach doesn't > necessarily accelerate the overall time to restore, but it allows for a > usable filesystem to be made available while the restore happens in the > background. > The problem is that your tape drives will go crazy with HSM activity. So while in theory it is usable it practice it won't be. Worse with the tape drives going crazy with the HSM they won't be available for restore. I would predict much much long times to recovery where recovery is defined as being back to where you where before the disaster occurred. > > I'd look for clarity about the state of the filesystem on restore - > particularly around what happens to data which arrives after the > migration has happened but before the metadata snapshot is taken. I > think it'd be lost, but the metadata would still point to it existing? I would imagine that you just do a standard HSM reconciliation to fix that. Should be really fast with the new policy based reconciliation after you spend several months backing all your HSM'ed files up again :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From orlando.richards at ed.ac.uk Thu Feb 7 13:51:25 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:51:25 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> Message-ID: <5113B15D.4080805@ed.ac.uk> On 07/02/13 13:40, Jonathan Buzzard wrote: > On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > > [SNIP] > >> Nice - good to see this kind of thing coming from IBM - restore of huge >> filesystems from traditional backup really doesn't make much sense >> nowadays - it'd just take too long. > > Define too long? It's perfectly doable, and the speed of the restore > will depend on what resources you have to throw at the problem. The main > issue is having lots of tape drives for the restore. I can tell you speak from (bitter?) experience :) I've always been "disappointed" with the speed of restores - but I've never tried a "restore everything", which presumably will run quicker. One problem I can see us having is that we have lots of small files, which tends to make everything go really slowly - but getting the thread count up would, I'm sure, help a lot. > Having a plan to > buy more ASAP is a good idea. The second is don't let yourself get > sidetracked doing "high priority" restores for individuals, it will > radically delay the restore. Quite. > Beyond that you need some way to recreate all your storage pools, > filesets, junction points and quotas etc. Looks like the mmbackupconfig > and mmrestoreconfig now take care of all that for you. That is a big > time saver right there. > >> This kind of approach doesn't >> necessarily accelerate the overall time to restore, but it allows for a >> usable filesystem to be made available while the restore happens in the >> background. >> > > The problem is that your tape drives will go crazy with HSM activity. So > while in theory it is usable it practice it won't be. Worse with the > tape drives going crazy with the HSM they won't be available for > restore. I would predict much much long times to recovery where recovery > is defined as being back to where you where before the disaster > occurred. Yup - I can see that too. 
I think a large disk pool would help there, along with some kind of logic around "what data is old?" to sensibly place stuff "likely to be accessed" on disk, and the "old" stuff on tape where it can be recalled at a more leisurely pace. >> >> I'd look for clarity about the state of the filesystem on restore - >> particularly around what happens to data which arrives after the >> migration has happened but before the metadata snapshot is taken. I >> think it'd be lost, but the metadata would still point to it existing? > > I would imagine that you just do a standard HSM reconciliation to fix > that. Should be really fast with the new policy based reconciliation > after you spend several months backing all your HSM'ed files up > again :-) > Ahh - but once you've got them in TSM, you can just do a storage pool backup, presumably to a third site, and always have lots of copies everywhere! Of course - you still need to keep generational history somewhere... -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From orlando.richards at ed.ac.uk Thu Feb 7 13:56:05 2013 From: orlando.richards at ed.ac.uk (Orlando Richards) Date: Thu, 07 Feb 2013 13:56:05 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B15D.4080805@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> Message-ID: <5113B275.7030401@ed.ac.uk> On 07/02/13 13:51, Orlando Richards wrote: > On 07/02/13 13:40, Jonathan Buzzard wrote: >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: >> >> [SNIP] >> >>> Nice - good to see this kind of thing coming from IBM - restore of huge >>> filesystems from traditional backup really doesn't make much sense >>> nowadays - it'd just take too long. >> >> Define too long? Oh - for us, this is rapidly approaching "anything more than a day, and can you do it faster than that please". Not much appetite for the costs of full replication though. :/ -- -- Dr Orlando Richards Information Services IT Infrastructure Division Unix Section Tel: 0131 650 4994 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jonathan at buzzard.me.uk Fri Feb 8 09:40:27 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Fri, 08 Feb 2013 09:40:27 +0000 Subject: [gpfsug-discuss] SOBAR In-Reply-To: <5113B275.7030401@ed.ac.uk> References: <39571EA9316BE44899D59C7A640C13F5306E8EC1@WARVWEXC1.uk.deluxe-eu.com> <5113A262.8080206@ed.ac.uk> <1360244430.31600.133.camel@buzzard.phy.strath.ac.uk> <5113B15D.4080805@ed.ac.uk> <5113B275.7030401@ed.ac.uk> Message-ID: <1360316427.16393.23.camel@buzzard.phy.strath.ac.uk> On Thu, 2013-02-07 at 13:56 +0000, Orlando Richards wrote: > On 07/02/13 13:51, Orlando Richards wrote: > > On 07/02/13 13:40, Jonathan Buzzard wrote: > >> On Thu, 2013-02-07 at 12:47 +0000, Orlando Richards wrote: > >> > >> [SNIP] > >> > >>> Nice - good to see this kind of thing coming from IBM - restore of huge > >>> filesystems from traditional backup really doesn't make much sense > >>> nowadays - it'd just take too long. > >> > >> Define too long? > > I can tell you speak from (bitter?) experience :) Done two large GPFS restores. The first was to migrate a HSM file system to completely new hardware, new TSM version and new GPFS version. 
IBM would not warrant an upgrade procedure so we "restored" from tape onto the new hardware and then did rsync's to get it "identical". Big problem was the TSM server hardware at the time (a p630) just gave up the ghost about 5TB into the restore repeatedly. Had do it a user at a time which made it take *much* longer as I was repeatedly going over the same tapes. The second was from bitter experience. Someone else in a moment of complete and utter stupidity wiped some ~30 NSD's of their descriptors. Two file systems an instant and complete loss. Well not strictly true it was several days before it manifested itself when one of the NSD servers was rebooted. A day was then wasted working out what the hell had happened to the file system that could have gone to the restore. Took about three weeks to get back completely. Could have been done a lot lot faster if I had had more tape drives on day one and/or made a better job of getting more in, had not messed about prioritizing restores of particular individuals, and not had capacity issues on the TSM server to boot (it was scheduled for upgrade anyway and a CPU failed mid restore). I think TSM 6.x would have been faster as well as it has faster DB performance, and the restore consisted of some 50 million files in about 30TB and it was the number of files that was the killer for speed. It would be nice in a disaster scenario if TSM would also use the tapes in the copy pools for restore, especially when they are in a different library. Not sure if the automatic failover procedure in 6.3 does that. For large file systems I would seriously consider using virtual mount points in TSM and then collocating the file systems. I would also look to match my virtual mount points to file sets. The basic problem is that most people don't have the spare hardware to even try disaster recovery, and even then you are not going to be doing it under the same pressure, hindsight is always 20/20. > Oh - for us, this is rapidly approaching "anything more than a day, and > can you do it faster than that please". Not much appetite for the costs > of full replication though. > Remember you can have any two of cheap, fast and reliable. If you want it back in a day or less then that almost certainly requires a full mirror and is going to be expensive. Noting of course if it ain't offline it ain't backed up. See above if some numpty can wipe the NSD descriptors on your file systems then can do it to your replicated file system at the same time. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jez.Tucker at rushes.co.uk Fri Feb 8 13:17:03 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Fri, 8 Feb 2013 13:17:03 +0000 Subject: [gpfsug-discuss] Maximum number of files in a TSM dsmc archive filelist Message-ID: <39571EA9316BE44899D59C7A640C13F5306E9570@WARVWEXC1.uk.deluxe-eu.com> Allo I'm doing an archive with 1954846 files in a filelist. SEGV every time. (BA 6.4.0-0) Am I being optimistic with that number of files? Has anyone successfully done that many in a single archive? --- Jez Tucker Senior Sysadmin Rushes DDI: +44 (0) 207 851 6276 http://www.rushes.co.uk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:29:12 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:29:12 -0600 Subject: [gpfsug-discuss] File system recovery question Message-ID: <20130213162912.GA22701@logos.ncsa.illinois.edu> I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. Executing od shows it is zero's. (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0040000 120070 156006 120070 156006 120070 156006 120070 156006 Dumping the header of one of the other disks shows read data for the other NSD's in that file system. (! 25)-> mmlsnsd -m | grep dh1_vd05_005 Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node (! 27)-> mmnsddiscover -d dh1_vd05_005 mmnsddiscover: Attempting to rediscover the disks. This may take a while ... myhost: Rediscovery failed for dh1_vd05_005. mmnsddiscover: Finished. Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic Is there any way to repair the header on the NSD? Thanks for any ideas! Chad From Jez.Tucker at rushes.co.uk Wed Feb 13 16:43:50 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Wed, 13 Feb 2013 16:43:50 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> So, er. Fun. I checked our disks. 0000000 000000 000000 000000 000000 000000 000000 000000 000000 * 0001000 Looks like you lost a fair bit. Presumably you don't have replication of 2? If so, I think you could just lose the NSD. Failing that: 1) Check your other disks and see if there's anything that you can figure out. Though TBH, this may take forever. 2) Restore 3) Call IBM and log a SEV 1. 3) then 2) is probably the best course of action Jez > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > bounces at gpfsug.org] On Behalf Of Chad Kerner > Sent: 13 February 2013 16:29 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] File system recovery question > > I have a file system, and it appears that someone dd'd over the first > part of one of the NSD's with zero's. I see the device in multipath. I > can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for > the other NSD's in that file system. > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a > while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > mmcommon preunmount invoked. File system: desarchive Reason: > SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From craigawilson at gmail.com Wed Feb 13 16:48:32 2013 From: craigawilson at gmail.com (Craig Wilson) Date: Wed, 13 Feb 2013 16:48:32 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <39571EA9316BE44899D59C7A640C13F5306EA7E6@WARVWEXC1.uk.deluxe-eu.com> Message-ID: Dealt with a similar issue a couple of months ago. In that case the data was fine but two of the descriptors were over written. You can use "mmfsadm test readdescraw /dev/$drive" to see the descriptors, we managed to recover the disk but only after logging it to IBM and manually rebuilding the descriptor. -CW On 13 February 2013 16:43, Jez Tucker wrote: > So, er. Fun. > > I checked our disks. > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0001000 > > Looks like you lost a fair bit. > > > Presumably you don't have replication of 2? > If so, I think you could just lose the NSD. > > Failing that: > > 1) Check your other disks and see if there's anything that you can figure > out. Though TBH, this may take forever. > 2) Restore > 3) Call IBM and log a SEV 1. > > 3) then 2) is probably the best course of action > > Jez > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss- > > bounces at gpfsug.org] On Behalf Of Chad Kerner > > Sent: 13 February 2013 16:29 > > To: gpfsug-discuss at gpfsug.org > > Subject: [gpfsug-discuss] File system recovery question > > > > I have a file system, and it appears that someone dd'd over the first > > part of one of the NSD's with zero's. I see the device in multipath. I > > can fdisk and dd the device out. > > > > Executing od shows it is zero's. > > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > > * > > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > > > Dumping the header of one of the other disks shows read data for > > the other NSD's in that file system. > > > > (! 
25)-> mmlsnsd -m | grep dh1_vd05_005 > > Disk name NSD volume ID Device Node name > > Remarks > > > --------------------------------------------------------------------------------------- > > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > > node > > > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > > mmnsddiscover: Attempting to rediscover the disks. This may take a > > while ... > > myhost: Rediscovery failed for dh1_vd05_005. > > mmnsddiscover: Finished. > > > > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive Wed Feb > > 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical > > volume dh1_vd05_005. > > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by > > the system with return code 5 reason code 0 Wed Feb 13 09:14:07.103 > > 2013: Input/output error Wed Feb 13 09:14:07.102 2013: Failed to open > > desarchive. > > Wed Feb 13 09:14:07.103 2013: Input/output error Wed Feb 13 > > 09:14:07.102 2013: Command: err 666: mount desarchive Wed Feb 13 > > 09:14:07.104 2013: Input/output error Wed Feb 13 09:14:07 CST 2013: > > mmcommon preunmount invoked. File system: desarchive Reason: > > SGPanic > > > > Is there any way to repair the header on the NSD? > > > > Thanks for any ideas! > > Chad > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at gpfsug.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viccornell at gmail.com Wed Feb 13 16:48:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:48:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> So what do you get if you run: mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 ? Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:29, Chad Kerner wrote: > I have a file system, and it appears that someone dd'd over the first part of one of the NSD's with zero's. I see the device in multipath. I can fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. Physical volume dh1_vd05_005. 
> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! > Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckerner at ncsa.uiuc.edu Wed Feb 13 16:52:30 2013 From: ckerner at ncsa.uiuc.edu (Chad Kerner) Date: Wed, 13 Feb 2013 10:52:30 -0600 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> Message-ID: <20130213165230.GA23294@logos.ncsa.illinois.edu> (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: > So what do you get if you run: > > mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > > ? > > > > > Vic Cornell > viccornell at gmail.com > > > On 13 Feb 2013, at 16:29, Chad Kerner wrote: > > > I have a file system, and it appears that someone dd'd over the first part > of one of the NSD's with zero's. I see the device in multipath. I can > fdisk and dd the device out. > > Executing od shows it is zero's. > (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 > 0000000 000000 000000 000000 000000 000000 000000 000000 000000 > * > 0040000 120070 156006 120070 156006 120070 156006 120070 156006 > > Dumping the header of one of the other disks shows read data for the other > NSD's in that file system. > > (! 25)-> mmlsnsd -m | grep dh1_vd05_005 > Disk name NSD volume ID Device Node name > Remarks > --------------------------------------------------------------------------------------- > dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server > node > > (! 27)-> mmnsddiscover -d dh1_vd05_005 > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > myhost: Rediscovery failed for dh1_vd05_005. > mmnsddiscover: Finished. > > > Wed Feb 13 09:14:03.694 2013: Command: mount desarchive > Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. > Physical volume dh1_vd05_005. > Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the > system with return code 5 reason code 0 > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. > Wed Feb 13 09:14:07.103 2013: Input/output error > Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive > Wed Feb 13 09:14:07.104 2013: Input/output error > Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: > desarchive Reason: SGPanic > > Is there any way to repair the header on the NSD? > > Thanks for any ideas! 
> Chad > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > From viccornell at gmail.com Wed Feb 13 16:57:55 2013 From: viccornell at gmail.com (Vic Cornell) Date: Wed, 13 Feb 2013 16:57:55 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213165230.GA23294@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> <56EBDAB4-AFE8-4DF3-AEE9-5FD517863715@gmail.com> <20130213165230.GA23294@logos.ncsa.illinois.edu> Message-ID: <4D043736-06A7-44A0-830E-63D66438595F@gmail.com> Thats not pretty - but you can push the NSD descriptor on with something like: tspreparedisk -F -n /dev/mapper/dh1_vd05_005 -p 8D8EEA98506C69CE That leaves you with the FS and Disk descriptors to recover . . . . Vic Cornell viccornell at gmail.com On 13 Feb 2013, at 16:52, Chad Kerner wrote: > > > (! 41)-> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 > No NSD descriptor in sector 2 of /dev/mapper/dh1_vd05_005 > No Disk descriptor in sector 1 of /dev/mapper/dh1_vd05_005 > No FS descriptor in sector 8 of /dev/mapper/dh1_vd05_005 > > > > On Wed, Feb 13, 2013 at 04:48:55PM +0000, Vic Cornell wrote: >> So what do you get if you run: >> >> mmfsadm test readdescraw /dev/mapper/dh1_vd05_005 >> >> ? >> >> >> >> >> Vic Cornell >> viccornell at gmail.com >> >> >> On 13 Feb 2013, at 16:29, Chad Kerner wrote: >> >> >> I have a file system, and it appears that someone dd'd over the first part >> of one of the NSD's with zero's. I see the device in multipath. I can >> fdisk and dd the device out. >> >> Executing od shows it is zero's. >> (! 21)-> od /dev/mapper/dh1_vd05_005 | head -n 5 >> 0000000 000000 000000 000000 000000 000000 000000 000000 000000 >> * >> 0040000 120070 156006 120070 156006 120070 156006 120070 156006 >> >> Dumping the header of one of the other disks shows read data for the other >> NSD's in that file system. >> >> (! 25)-> mmlsnsd -m | grep dh1_vd05_005 >> Disk name NSD volume ID Device Node name >> Remarks >> --------------------------------------------------------------------------------------- >> dh1_vd05_005 8D8EEA98506C69CE - myhost (not found) server >> node >> >> (! 27)-> mmnsddiscover -d dh1_vd05_005 >> mmnsddiscover: Attempting to rediscover the disks. This may take a while >> ... >> myhost: Rediscovery failed for dh1_vd05_005. >> mmnsddiscover: Finished. >> >> >> Wed Feb 13 09:14:03.694 2013: Command: mount desarchive >> Wed Feb 13 09:14:07.101 2013: Disk failure. Volume desarchive. rc = 19. >> Physical volume dh1_vd05_005. >> Wed Feb 13 09:14:07.102 2013: File System desarchive unmounted by the >> system with return code 5 reason code 0 >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Failed to open desarchive. >> Wed Feb 13 09:14:07.103 2013: Input/output error >> Wed Feb 13 09:14:07.102 2013: Command: err 666: mount desarchive >> Wed Feb 13 09:14:07.104 2013: Input/output error >> Wed Feb 13 09:14:07 CST 2013: mmcommon preunmount invoked. File system: >> desarchive Reason: SGPanic >> >> Is there any way to repair the header on the NSD? >> >> Thanks for any ideas! 
>> Chad >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >>
From jonathan at buzzard.me.uk Wed Feb 13 17:00:31 2013 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 13 Feb 2013 17:00:31 +0000 Subject: [gpfsug-discuss] File system recovery question In-Reply-To: <20130213162912.GA22701@logos.ncsa.illinois.edu> References: <20130213162912.GA22701@logos.ncsa.illinois.edu> Message-ID: <1360774831.23342.9.camel@buzzard.phy.strath.ac.uk> On Wed, 2013-02-13 at 10:29 -0600, Chad Kerner wrote: > I have a file system, and it appears that someone dd'd over the first > part of one of the NSD's with zero's. I see the device in multipath. > I can fdisk and dd the device out. Log a SEV1 call with IBM. If it is only one NSD that is stuffed they might be able to get it back for you. However it is a custom procedure that requires developer time from Poughkeepsie. It will take some time. In the meantime I would strongly encourage you to start preparing for a total restore, which will include recreating the file system from scratch. Certainly if all the NSD headers are stuffed then the file system is a total loss. However even with only one lost it is not, as I understand it, certain you can get it back. It is probably a good idea to store the NSD headers somewhere off the file system in case some numpty wipes them. The most likely reason for this is that they ran a distro install on a system that has direct access to the disk. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.
From Tobias.Kuebler at sva.de Wed Feb 13 17:00:37 2013 From: Tobias.Kuebler at sva.de (Tobias.Kuebler at sva.de) Date: Wed, 13 Feb 2013 18:00:37 +0100 Subject: [gpfsug-discuss] AUTO: Tobias Kuebler is out of the office (returning 02/18/2013) Message-ID: I am out of the office until 02/18/2013. Thank you for your message. Incoming e-mails will not be forwarded during my absence, but I will try to answer them as promptly as possible after my return. In urgent cases, please contact your responsible sales representative. Note: This is an automated reply to your message "Re: [gpfsug-discuss] File system recovery question" sent on 13.02.2013 17:43:50. This is the only notification you will receive while this person is away. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Jez.Tucker at rushes.co.uk Thu Feb 28 17:25:26 2013 From: Jez.Tucker at rushes.co.uk (Jez Tucker) Date: Thu, 28 Feb 2013 17:25:26 +0000 Subject: [gpfsug-discuss] Who uses TSM to archive HSMd data (inline) ? Message-ID: <39571EA9316BE44899D59C7A640C13F5306EED70@WARVWEXC1.uk.deluxe-eu.com> Hello all, I have to ask: does anyone else do this? We have a problem and I'm told that "it's so rare that anyone would archive data which is HSMd". I.e. to create an archive whereby a project is entirely or partially HSMd to LTO:
- online data is archived to tape
- offline data is copied from HSM tape to archive tape 'inline'
Surely nobody pulls back all their data to disk before re-archiving back to tape? --- Jez Tucker Senior Sysadmin Rushes GPFSUG Chairman (chair at gpfsug.org) -------------- next part -------------- An HTML attachment was scrubbed... URL: