[gpfsug-discuss] Best of spectrum scale

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Tue Sep 8 17:10:59 BST 2020


On 08/09/2020 14:04, IBM Spectrum Scale wrote:
> I think a better metaphor is that the bridge we just crossed has 
> collapsed and as long as we do not need to cross it again our journey 
> should reach its intended destination :-)  As I understand the intent of 
> this message is to alert the user (and our support teams) that the 
> directory from which a command was executed no longer exist.  Should 
> that be of consequence to the execution of the command then failure is 
> not unexpected, however, many commands do not make use of the current 
> directory so they likely will succeed.  If you consider the view point 
> of a command failing because the working directory was removed, but not 
> knowing that was the root cause, I think you can see why this message 
> was added into the administration infrastructure.  It allows this odd 
> failure scenario to be quickly recognized saving time for both the user 
> and IBM support, in tracking down the root cause.
> 

I think the issue being taken is that you get an error message of

     The command may fail in an unexpected way.  Processing continues ..

Now to my mind that is an instant WTF, and if your description is 
correct the command should IMHO have exited, saying something like

     Working directory vanished, exiting command

If there is any chance of the command failing then it should not be 
executed IMHO. I would rather issue it again from a directory that exists.
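
As a rough sketch of the behaviour I have in mind (Python purely for 
illustration; I have no idea how the mm commands are implemented 
internally, and the function names here are made up), the command would 
check its working directory up front and refuse to run if it has gone. 
This assumes a Linux-like system, where os.getcwd() raises 
FileNotFoundError once the current directory has been removed:

```python
import os
import sys


def working_directory_exists() -> bool:
    """Return False if the current working directory has been removed."""
    try:
        # On Linux, getcwd() fails with ENOENT when the cwd was deleted.
        os.getcwd()
    except FileNotFoundError:
        return False
    return True


def run_guarded(action):
    """Run action() only if the working directory still exists.

    'action' is a hypothetical stand-in for whatever the command does.
    """
    if not working_directory_exists():
        # Fail fast rather than continue and risk a half-applied change.
        sys.exit("Working directory vanished, exiting command")
    return action()
```

The point of the guard is that nothing touching the file system gets 
started unless the precondition holds, so the admin can simply rerun the 
command from a directory that exists.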

The way I look at it is that file systems have "state", that is if 
something goes wrong then you could be looking at extended downtime as 
you break the backup out and start restoring. GPFS file systems have a 
tendency to be large, so even if you have a backup it is not a pleasant 
process and could easily take weeks to get things back to rights.

Consequently most system admins would prefer that the command does not 
continue if there is any possibility of it failing and messing up the 
"state" of their file system.

That's unlike, say, the configuration on a network switch, which can 
quickly be put back with minimal interruption.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


