[gpfsug-discuss] proper gpfs shutdown when node disappears

valdis.kletnieks at vt.edu
Thu Feb 2 19:28:05 GMT 2017


On Thu, 02 Feb 2017 18:28:22 +0100, "Olaf Weiser" said:

> but the /var/mmfs DIR is obviously damaged/empty .. what ever.. that's why you
> see a message like this..
> have you reinstalled that node / any backup/restore thing ?

The internal RAID controller died a horrid death and basically took
all the OS partitions with it.  So the node was just sort of limping along,
where the mmfsd process was still coping because it wasn't doing any
I/O to the OS partitions - but 'ssh bad-node mmshutdown' wouldn't work
because that requires accessing stuff in /var.

At that point, it starts getting tempting to just use ipmitool from
another node to power the comatose one down - but that often causes
a cascade of other issues while things are stuck waiting for timeouts.
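
A minimal sketch of that sequence, for illustration only: try the clean
'ssh bad-node mmshutdown' with a timeout, and fall back to an ipmitool
power-off only if that fails or hangs. The function name, the BMC hostname,
the IPMI user, the 60-second timeout, and passing the password through the
IPMI_PASSWORD environment variable (ipmitool's -E flag) are all assumptions,
not anything from the setup described above.

```python
import subprocess


def shutdown_or_power_off(node, bmc_host, ipmi_user, run=subprocess.run, timeout=60):
    """Try a clean GPFS shutdown over ssh, then fall back to IPMI power-off.

    Hypothetical helper. Returns "clean" if mmshutdown exited 0,
    "power-off" if the BMC had to be used instead.
    """
    try:
        result = run(["ssh", node, "mmshutdown"], timeout=timeout)
        if result.returncode == 0:
            return "clean"
    except subprocess.TimeoutExpired:
        # ssh hung, e.g. because the node can no longer read its OS partitions.
        pass

    # Hard power-off through the BMC. This does not avoid the problem described
    # above: other nodes may still sit waiting on timeouts afterwards.
    run(
        ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", ipmi_user, "-E",
         "chassis", "power", "off"],
        check=True,
    )
    return "power-off"
```

Called as shutdown_or_power_off("bad-node", "bad-node-bmc", "admin"), this
only encodes the order of attempts. It does nothing about the cascade of
timeouts that follows a hard power-off.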




