[gpfsug-discuss] Lost disks

IBM Spectrum Scale scale at us.ibm.com
Mon Jul 31 05:57:44 BST 2017


Jonathan,

Regarding 

>> Thing is GPFS does not look at the NSD descriptors that much. So in my
>> case it was several days before it was noticed, and only then because I
>> rebooted the last NSD server as part of a rolling upgrade of GPFS. I
>> could have cruised for weeks/months with no NSD descriptors if I had not
>> restarted all the NSD servers. The moral of this is the overwrite could
>> have taken place quite some time ago.

While GPFS does not normally read the NSD descriptors in the course of 
performing file system operations, as of 4.1.1 a periodic check is done on 
the content of various descriptors, and a message like

[E] On-disk NSD descriptor of <disk> is valid but has a different ID. ID in cache is <NSD ID> and ID on-disk is<NSD ID>

should get issued if the content of the descriptor on disk changes.
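The post does not say how to watch for this message. Here is a minimal sketch that scans log text for it and reports the disk plus both IDs. It assumes the message appears on a single log line with the wording quoted above, which may differ between releases. You choose which log file to feed it (on Linux the GPFS logs are commonly under /var/adm/ras). The function and script names are hypothetical and not part of Spectrum Scale.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Pattern built from the message text quoted above (an assumption: the real
# wording may vary by release). "\s*" after the last "is" tolerates the
# missing space shown in the quoted template.
NSD_ID_MISMATCH = re.compile(
    r"On-disk NSD descriptor of\s+(?P<disk>\S+)\s+is valid but has a different ID\."
    r"\s+ID in cache is\s+(?P<cached>\S+)\s+and ID on-disk is\s*(?P<ondisk>\S+)"
)


def find_nsd_id_mismatches(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (disk, cached_id, on_disk_id) for every matching log line."""
    hits = []
    for line in lines:
        m = NSD_ID_MISMATCH.search(line)
        if m:
            hits.append((m.group("disk"), m.group("cached"), m.group("ondisk")))
    return hits


def main(argv: List[str]) -> int:
    # Read the log files named on the command line, or stdin if none are given.
    if len(argv) > 1:
        lines: List[str] = []
        for path in argv[1:]:
            with open(path, encoding="utf-8", errors="replace") as fh:
                lines.extend(fh)
        hits = find_nsd_id_mismatches(lines)
    else:
        hits = find_nsd_id_mismatches(sys.stdin)
    for disk, cached, on_disk in hits:
        print(f"{disk}: cached ID {cached}, on-disk ID {on_disk}")
    # A non-zero exit status lets a monitoring wrapper flag the condition.
    return 1 if hits else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it from cron or a monitoring agent against each NSD server's log. A non-zero exit means at least one descriptor changed on disk, which is the early warning Jonathan's rolling-restart story lacked.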


Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:   Jonathan Buzzard <jonathan at buzzard.me.uk>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   07/27/2017 06:58 AM
Subject:        Re: [gpfsug-discuss] Lost disks
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



On Wed, 2017-07-26 at 17:45 +0000, Oesterlin, Robert wrote:
> One way this could possibly happen would be a system being
> installed (I'm assuming this is Linux) while the FC adapter is active;
> the OS install will then see the disks and wipe out the NSD descriptors on
> them. (This is why the NSD V2 format was invented: to prevent
> this from happening.) If you don't lose all of the descriptors, it's
> sometimes possible to manually reconstruct the missing header
> information; I'm assuming that since you opened a PMR, IBM has looked at
> this. This is a scenario I've had to recover from, twice. A back-end
> array issue seems unlikely to me; I'd keep looking at the systems with
> access to those LUNs and see what commands/operations could have been
> run.

I would concur that this is the most likely scenario: an install where,
for whatever reason, the machine could see the disks and wiped them. I
know that RHEL6 and its derivatives will do that for you. It happened
to me at a previous place of work, where another admin forgot to de-zone a
server, went to install CentOS6 as part of a cluster upgrade from
CentOS5, and overwrote all the NSD descriptors.

Thing is GPFS does not look at the NSD descriptors that much. So in my
case it was several days before it was noticed, and only then because I
rebooted the last NSD server as part of a rolling upgrade of GPFS. I
could have cruised for weeks/months with no NSD descriptors if I had not
restarted all the NSD servers. The moral of this is the overwrite could
have taken place quite some time ago.

Basically, if the disks are all missing, then the NSD descriptors have been
overwritten, and the protestations of the client are irrelevant. The
chances of the disk array doing it to *ALL* the disks are somewhere
around ħ IMHO.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


