[gpfsug-discuss] Best of spectrum scale
Jonathan Buzzard
jonathan.buzzard at strath.ac.uk
Tue Sep 8 17:10:59 BST 2020
On 08/09/2020 14:04, IBM Spectrum Scale wrote:
> I think a better metaphor is that the bridge we just crossed has
> collapsed and as long as we do not need to cross it again our journey
> should reach its intended destination :-) As I understand it, the intent
> of this message is to alert the user (and our support teams) that the
> directory from which a command was executed no longer exists. Should
> that be of consequence to the execution of the command then failure is
> not unexpected, however, many commands do not make use of the current
> directory so they likely will succeed. If you consider the view point
> of a command failing because the working directory was removed, but not
> knowing that was the root cause, I think you can see why this message
> was added into the administration infrastructure. It allows this odd
> failure scenario to be quickly recognized saving time for both the user
> and IBM support, in tracking down the root cause.
>
I think the issue being taken is that you get an error message of

    The command may fail in an unexpected way. Processing continues ..

Now to my mind that is an instant WTF, and if your description is
correct the command should IMHO have exited saying something like

    Working directory vanished, exiting command

If there is any chance of the command failing then it should not be
executed IMHO. I would rather issue it again from a directory that exists.
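
For illustration, the fail-fast behaviour argued for above could look
something like the sketch below. It is written in Python purely as an
example and is not how the GPFS administration commands are implemented;
the function names are hypothetical, and it assumes a POSIX system where
os.getcwd() raises FileNotFoundError once the working directory has been
removed.

```python
import os
import sys


def working_directory_exists() -> bool:
    """Return True if the process's current working directory still exists."""
    try:
        # On POSIX systems getcwd() fails with ENOENT when the directory
        # the process was started in has since been removed.
        os.getcwd()
    except FileNotFoundError:
        return False
    return True


def run_if_cwd_exists(action) -> int:
    """Run action() only if the working directory exists.

    Returns 0 if the action was run, 1 if it was refused.
    'action' is a hypothetical stand-in for the real command body.
    """
    if not working_directory_exists():
        print("Working directory vanished, exiting command", file=sys.stderr)
        return 1
    action()
    return 0
```

The point of the design is that the check happens before anything that
could touch file system state, so the worst outcome is having to re-run
the command from a directory that exists.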
The way I look at it is that file systems have "state", that is if
something goes wrong then you could be looking at extended downtime as
you break the backup out and start restoring. GPFS file systems have a
tendency to be large, so even if you have a backup it is not a pleasant
process and could easily take weeks to get things back to rights.
Consequently, most system admins would prefer that the command does not
continue if there is any possibility of it failing and messing up the
"state" of their file system.
That's unlike, say, the configuration on a network switch, which can
quickly be put back with minimal interruption.
JAB.
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG