[gpfsug-discuss] proper gpfs shutdown when node disappears

J. Eric Wonderley eric.wonderley at vt.edu
Fri Feb 3 13:46:49 GMT 2017


Well, we got it into the down state by using mmsdrrestore -p to recover the
configuration files into /var/mmfs/gen on cl004.

Anyhow, cl004 ended up in the unknown state when it powered off.  Short of
removing the node, unknown is the state you get.

Unknown seems stable for a hopefully short outage of cl004.
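For reference, a minimal dry-run sketch of the recovery step described above. The node name cl001 is a hypothetical healthy node to pull the configuration from, and DRY_RUN just prints the commands, since mmsdrrestore only exists on a Spectrum Scale node:

```shell
#!/bin/sh
# Sketch of restoring the GPFS configuration into /var/mmfs/gen on a
# rebuilt node. Node name cl001 is a hypothetical example; set
# DRY_RUN=0 only on a real cluster node.
DRY_RUN=1
run() { [ "$DRY_RUN" = 1 ] && echo "$@" || "$@"; }

# Pull the cluster configuration (mmsdrfs) from a healthy node:
run mmsdrrestore -p cl001
```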


Thanks

On Thu, Feb 2, 2017 at 4:28 PM, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:

> many ways lead to Rome .. and I agree, mmexpelnode is a nice command ..
> another approach:
> power it off (so it's not reachable by ping) .. mmdelnode ... power on/boot
> ... mmaddnode ..
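The delete/re-add sequence above might be sketched as follows; cl004 is a hypothetical node name, and DRY_RUN keeps the GPFS commands from actually running:

```shell
#!/bin/sh
# Sketch of the delete/re-add approach. cl004 is a hypothetical node
# name; DRY_RUN=1 only prints the commands instead of running them.
DRY_RUN=1
run() { [ "$DRY_RUN" = 1 ] && echo "$@" || "$@"; }

# With the node powered off and unreachable by ping:
run mmdelnode -N cl004          # remove it from the cluster
# ... power the node back on and let it boot ...
run mmaddnode -N cl004          # re-add it to the cluster
```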
>
>
>
> From:        Aaron Knister <aaron.s.knister at nasa.gov>
> To:        <gpfsug-discuss at spectrumscale.org>
> Date:        02/02/2017 08:37 PM
> Subject:        Re: [gpfsug-discuss] proper gpfs shutdown when node
> disappears
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> You could forcibly expel the node (one of my favorite GPFS commands):
>
> mmexpelnode -N $nodename
>
> and then power it off after the expulsion is complete and then do
>
> mmexpelnode -r -N $nodename
>
> which will allow it to rejoin the cluster the next time you try to start
> GPFS on it. You'll still likely have to go through recovery, but you'll
> skip the part where GPFS wonders where the node went prior to
> expelling it.
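The expel/readmit sequence above might be sketched like this; cl004 is a hypothetical node name, and DRY_RUN keeps the GPFS commands from actually running:

```shell
#!/bin/sh
# Sketch of the expel/readmit sequence. cl004 is a hypothetical node
# name; DRY_RUN=1 only prints the commands instead of running them.
DRY_RUN=1
run() { [ "$DRY_RUN" = 1 ] && echo "$@" || "$@"; }

run mmexpelnode -N cl004        # forcibly expel the node
# ... power the node off once the expulsion completes ...
run mmexpelnode -r -N cl004     # clear the expel so it can rejoin
```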
>
> -Aaron
>
> On 2/2/17 2:28 PM, valdis.kletnieks at vt.edu wrote:
> > On Thu, 02 Feb 2017 18:28:22 +0100, "Olaf Weiser" said:
> >
> >> but the /var/mmfs dir is obviously damaged/empty, whatever the cause;
> >> that's why you see a message like this.
> >> Have you reinstalled that node, or restored it from backup?
> >
> > The internal RAID controller died a horrid death and basically took
> > all the OS partitions with it.  So the node was just sort of limping
> along,
> > where the mmfsd process was still coping because it wasn't doing any
> > I/O to the OS partitions - but 'ssh bad-node mmshutdown' wouldn't work
> > because that requires accessing stuff in /var.
> >
> > At that point, it starts getting tempting to just use ipmitool from
> > another node to power the comatose one down - but that often causes
> > a cascade of other issues while things are stuck waiting for timeouts.
> >
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776

