[gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L)

Bryan Banister bbanister at jumptrading.com
Thu Feb 8 19:38:33 GMT 2018


I don't know or care who the hardware vendor is, but they can DEFINITELY ship you a controller with the right firmware! Just demand it, which is what I do, and they have basically always complied with the request.

There is the risk associated with running even longer with a single point of failure, only using the surviving controller, but if this storage system has been in production a long time (e.g. a year or so) and is generally reliable, then they should be able to get you a new, factory tested controller with the right FW versions in a couple of days.

The choice is yours of course,
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Steve Xiao
Sent: Thursday, February 08, 2018 11:18 AM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L)

You can change the cluster configuration so that the file system is unmounted only when there is an error accessing metadata. This can be done by running the following command:
   mmchconfig unmountOnDiskFail=meta -i

After this configuration change, you should be able to stop all 5 NSDs with the mmchdisk stop command. While these NSDs are in the down state, any user I/O to files residing on these disks will fail, but your file system should stay mounted and usable.
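
The sequence described above could be sketched roughly as follows. This is a dry run that only prints commands and executes nothing. The file system name gpfs1 and the NSD names nsd1 through nsd5 are hypothetical placeholders, and the final step assumes the previous value of unmountOnDiskFail was the default (no); check it with mmlsconfig first. Treat this as an outline to adapt, not a tested procedure.

```shell
#!/bin/sh
# Dry-run sketch: prints the command sequence, runs nothing.
# The file system and NSD names passed in are hypothetical placeholders.

# Join the arguments with ';', the separator used in mmchdisk's -d list.
join_disks() {
    ( IFS=';'; printf '%s\n' "$*" )
}

# Print the steps for taking the given NSDs down and bringing them back.
print_plan() {
    fs=$1
    shift
    disks=$(join_disks "$@")
    echo "mmchconfig unmountOnDiskFail=meta -i"
    echo "mmchdisk $fs stop -d \"$disks\""
    echo "mmlsdisk $fs"
    echo "# ... controller replacement / firmware upgrade happens here ..."
    echo "mmchdisk $fs start -d \"$disks\""
    # Assumption: the setting was at its default (no) before the change.
    echo "mmchconfig unmountOnDiskFail=no -i"
}

print_plan gpfs1 nsd1 nsd2 nsd3 nsd4 nsd5
```

Printing the plan first makes it easy to review the exact disk list before anything touches the cluster. The mmlsdisk step is there to confirm the NSDs really show as down before the array is taken offline.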

Steve Y. Xiao

> Date: Thu, 8 Feb 2018 15:59:44 +0000
> From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: [gpfsug-discuss] mmchdisk suspend / stop
> Message-ID: <8DCA682D-9850-4C03-8930-EA6C68B41109 at vanderbilt.edu>
> Content-Type: text/plain; charset="utf-8"
>
> Hi All,
>
> We are in a bit of a difficult situation right now with one of our
> non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware!
> <grin>) and are looking for some advice on how to deal with this
> unfortunate situation.
>
> We have a non-IBM FC storage array with dual-"redundant"
> controllers.  One of those controllers is dead and the vendor is
> sending us a replacement.  However, the replacement controller will
> have mis-matched firmware with the surviving controller and - long
> story short - the vendor says there is no way to resolve that
> without taking the storage array down for firmware upgrades.
> Needless to say there's more to that story than what I've included
> here, but I won't bore everyone with unnecessary details.
>
> The storage array has 5 NSDs on it, but fortunately enough they are
> part of our "capacity" pool - i.e. the only way a file lands here is
> if an mmapplypolicy scan moved it there because the *access* time is
> greater than 90 days.  Filesystem data replication is set to one.
>
> So - what I was wondering if I could do is to use mmchdisk to either
> suspend or (preferably) stop those NSDs, do the firmware upgrade,
> and resume the NSDs?  The problem I see is that suspend doesn't stop
> I/O, it only prevents the allocation of new blocks - so, in theory,
> if a user suddenly decided to start using a file they hadn't needed
> for 3 months then I've got a problem.  Stopping all I/O to the disks
> is what I really want to do.  However, according to the mmchdisk man
> page stop cannot be used on a filesystem with replication set to one.
>
> There's over 250 TB of data on those 5 NSDs, so restriping off of
> them or setting replication to two are not options.
>
> It is very unlikely that anyone would try to access a file on those
> NSDs during the hour or so I'd need to do the firmware upgrades, but
> how would GPFS itself react to those (suspended) disks going away
> for a while?  I'm thinking I could be OK if there was just a way to
> actually stop them rather than suspend them.  Any undocumented
> options to mmchdisk that I'm not aware of???
>
> Are there other options - besides buying IBM hardware - that I am
> overlooking?  Thanks...
>
> --
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

