<font size=2 face="sans-serif">You can change the cluster configuration

so that the file system is unmounted only when there is an error accessing metadata.

  To do this, run the following command:</font><br><font size=2 face="sans-serif">   mmchconfig unmountOnDiskFail=meta

-i </font><br><br><font size=2 face="sans-serif">After this configuration change, you

should be able to stop all 5 NSDs with the mmchdisk stop command.    While

these NSDs are in the down state, any user I/O to files residing on these disks

will fail, but your file system should stay mounted and usable.</font><br><br><font size=2 face="sans-serif">Steve Y. Xiao</font><br><tt><font size=2><br>> Date: Thu, 8 Feb 2018 15:59:44 +0000<br>> From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh@Vanderbilt.Edu><br>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org><br>> Subject: [gpfsug-discuss] mmchdisk suspend / stop<br>> Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109@vanderbilt.edu><br>> Content-Type: text/plain; charset="utf-8"<br>> <br>> Hi All,<br>> <br>> We are in a bit of a difficult situation right now with one of our

<br>> non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware!<br>> <grin>) and are looking for some advice on how to deal with

this <br>> unfortunate situation.<br>> <br>> We have a non-IBM FC storage array with dual-"redundant" <br>> controllers.  One of those controllers is dead and the vendor

is <br>> sending us a replacement.  However, the replacement controller

will <br>> have mis-matched firmware with the surviving controller and - long

<br>> story short - the vendor says there is no way to resolve that <br>> without taking the storage array down for firmware upgrades.  <br>> Needless to say there?s more to that story than what I?ve included

<br>> here, but I won?t bore everyone with unnecessary details.<br>> <br>> The storage array has 5 NSDs on it, but fortunately enough they are

<br>> part of our ?capacity? pool ? i.e. the only way a file lands here

is<br>> if an mmapplypolicy scan moved it there because the *access* time

is<br>> greater than 90 days.  Filesystem data replication is set to

one.<br>> <br>> So - what I was wondering if I could do is to use mmchdisk to either<br>> suspend or (preferably) stop those NSDs, do the firmware upgrade,

<br>> and resume the NSDs?  The problem I see is that suspend doesn't

stop<br>> I/O, it only prevents the allocation of new blocks - so, in theory,

<br>> if a user suddenly decided to start using a file they hadn't needed

<br>> for 3 months then I've got a problem.  Stopping all I/O to the

disks<br>> is what I really want to do.  However, according to the mmchdisk

man<br>> page stop cannot be used on a filesystem with replication set to one.<br>> <br>> There's over 250 TB of data on those 5 NSDs, so restriping off of

<br>> them or setting replication to two are not options.<br>> <br>> It is very unlikely that anyone would try to access a file on those

<br>> NSDs during the hour or so I'd need to do the firmware upgrades, but<br>> how would GPFS itself react to those (suspended) disks going away

<br>> for a while?  I'm thinking I could be OK if there was just a

way to <br>> actually stop them rather than suspend them.  Any undocumented

<br>> options to mmchdisk that I'm not aware of???<br>> <br>> Are there other options - besides buying IBM hardware - that I am

<br>> overlooking?  Thanks...<br>> <br>> --<br>> Kevin Buterbaugh - Senior System Administrator<br>> Vanderbilt University - Advanced Computing Center for Research and

Education<br>> Kevin.Buterbaugh@vanderbilt.edu<</font></tt><a href=mailto:Kevin.Buterbaugh@vanderbilt.edu><tt><font size=2>mailto:Kevin.Buterbaugh@vanderbilt.edu</font></tt></a><tt><font size=2><br>> > - (615)875-9633<br>> <br>> <br>> <br><br></font></tt><BR>
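<font size=2 face="sans-serif">For reference, the sequence suggested in the reply at the top of this message could be sketched roughly as below. This is only an illustration under stated assumptions: the file system name (gpfs0) and the NSD names (nsd1 ... nsd5) are hypothetical placeholders, the DRY_RUN wrapper is added here so that the commands are printed rather than executed, and the mmlsdisk status check and the final mmchdisk start step are not part of the original reply. Check the mmchconfig and mmchdisk man pages for your release before running anything.</font><br>
<pre>
```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own file system and NSD names.
FS="${FS:-gpfs0}"
DISKS="${DISKS:-nsd1;nsd2;nsd3;nsd4;nsd5}"
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form drops the quotes around the disk list, so it is
# meant for review, not for copy and paste.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Before the firmware upgrade.
before_maintenance() {
    # From the reply: unmount only when metadata cannot be accessed.
    run mmchconfig unmountOnDiskFail=meta -i
    # Stop the NSDs that live on the affected array.
    run mmchdisk "$FS" stop -d "$DISKS"
    # Assumption (not in the reply): list disk states to confirm they are down.
    run mmlsdisk "$FS"
}

# After the firmware upgrade.
# Assumption (not in the reply): "start" is used to bring stopped disks back.
after_maintenance() {
    run mmchdisk "$FS" start -d "$DISKS"
    run mmlsdisk "$FS"
}
```
</pre>
<font size=2 face="sans-serif">With DRY_RUN=1 the two functions only print the commands for review. Setting DRY_RUN=0 on a node with GPFS administrative rights would execute them instead.</font><br>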