[gpfsug-discuss] Best of spectrum scale

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Fri Sep 11 20:53:45 BST 2020


On 11/09/2020 15:25, Stephen Ulmer wrote:
> 
>> On Sep 9, 2020, at 10:04 AM, Skylar Thompson <skylar2 at uw.edu> wrote:
>>
>> On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote:
>>> On 08/09/2020 18:37, IBM Spectrum Scale wrote:
>>>> I think it is incorrect to assume that a command that continues
>>>> after detecting the working directory has been removed is going to
>>>> cause damage to the file system.
>>>
>>> No I am not assuming it will cause damage. I am making the fairly reasonable
>>> assumption that any command which fails has an increased probability of
>>> causing damage to the file system over one that completes successfully.
>>
>> I think there is another angle here, which is that this command's output
>> has the possibility of triggering an "oh ----" (fill in your preferred
>> colorful metaphor here) moment, followed up by a panicked Ctrl-C. That
>> reaction has the possibility of causing its own problems (i.e. not sure if
>> mmafmctl touches CCR, but aborting it midway could leave CCR inconsistent).
>> I'm with Jonathan here: the command should fail with an informative
>> message, and the admin can correct the problem (just cd somewhere else).
>>
> 
> I’m now (genuinely) curious as to what Spectrum Scale commands 
> *actually* depend on the working directory existing and why. They 
> shouldn’t depend on anything but existing well-known directories (logs, 
> SDR, /tmp, et cetera) and any file or directories passed as arguments to 
> the command. This is the Unix way.
> 
> It seems like the *right* solution is to armor commands against doing 
> something “bad” if they lose a resource required to complete their task. 
> If $PWD goes away because an admin’s home goes away in the middle of a 
> long restripe, it’s better to complete the work and let them look in the 
> logs. It's not Scale’s problem if something not affecting its work happens.
> 
> Maybe I’ve got a blind spot here...
> 

This jogged my memory: best practice is to call chdir() to set the 
working directory to "/" very early on, before anything critical is 
started.

I am 99.999% sure that it's covered in Stevens (can't check as I am away 
for the weekend), so really there is no excuse. If / goes away then 
really, really bad things have happened and it all becomes moot anyway.
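
For what it's worth, a rough sketch of the idea, written in Python purely 
for illustration. The function name pin_working_directory is made up here 
and nothing below is taken from the actual Spectrum Scale commands; it 
only shows the "chdir to / before doing anything critical" pattern.

```python
import os
import sys


def pin_working_directory(target="/"):
    """Hypothetical helper: move to a directory that should outlive the process."""
    try:
        os.chdir(target)
    except OSError as exc:
        # Fail early with a clear message, before any critical work has started.
        sys.exit(f"cannot chdir to {target}: {exc}")
    return os.getcwd()


def main():
    # Very first step: stop depending on whatever directory we were started in.
    pin_working_directory()
    # ... the long-running, critical work would start here ...
    print("working directory is now", os.getcwd())


if __name__ == "__main__":
    main()
```

The point of doing it first is that if the chdir is going to fail, it 
fails before anything has been touched, and after that the directory the 
admin launched the command from can vanish without the process noticing.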


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


