[gpfsug-discuss] Checking a file-system for errors

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Wed Oct 11 15:13:03 BST 2017


So with the help of IBM support and Venkat (thanks guys!), we think it's a
problem with DMAPI. As we initially saw this as an issue with AFM
replication, we had traces from there, which had entries like:

gpfsWrite exit: failed err 688


Apparently err 688 relates to "DMAPI disposition". Once we had this, we
were able to get someone to take a look at the HSM dsmrecalld. It was
running, but had failed over to a node that wasn't able to service
requests properly. (We have multiple NSD servers with different
file-systems, each running dsmrecalld, and I don't think you can scope
nodes XYZ to filesystem ABC but not DEF.)
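
For anyone wanting to check their own traces for the same symptom, here is a
minimal sketch that counts "exit: failed err N" entries per call and error
code. It assumes the trace has already been formatted to plain text and that
the entries look like the line quoted above; the function name
count_failed_exits and the regular expression are illustrative only and are
not part of any GPFS tooling.

```python
import re
from collections import Counter

# Pattern modelled on the single trace entry quoted above
# ("gpfsWrite exit: failed err 688"). Other trace formats are not covered.
FAILED_EXIT_RE = re.compile(r"(\w+) exit: failed err (\d+)")


def count_failed_exits(lines):
    """Count (call name, error code) pairs in an iterable of trace lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_EXIT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Passing an open text file works, since it only needs an iterable of lines,
e.g. count_failed_exits(open(path, errors="replace")) with path pointing at
your own formatted trace. A large count for one code, such as 688 here,
suggests where to look first.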

Anyway, once we got that fixed, a bunch of stuff in the AFM cache popped
out (with a little poke needed for some stuff that probably hadn't updated
its metadata cache).

So hopefully it's now also solved for our other users.

What is complicated here is that a DMAPI issue was giving intermittent IO
errors: people could write into new folders but not to existing files,
though I could (some sort of Schrödinger's cat IO issue??).

So hopefully we are fixed...

Simon

On 11/10/2017, 15:01, "gpfsug-discuss-bounces at spectrumscale.org on behalf
of UWEFALKE at de.ibm.com" <gpfsug-discuss-bounces at spectrumscale.org on
behalf of UWEFALKE at de.ibm.com> wrote:

>Usually, IO errors point to some basic problem reading/writing data.
>If there are reproducible errors, it's IMHO always a nice thing to trace
>GPFS for such an access. Often that already reveals the area where the
>cause lies and maybe even the details of it.
>
>Kind regards
>
>Dr. Uwe Falke
> 
>IT Specialist
>High Performance Computing Services / Integrated Technology Services /
>Data Center Services
>
>From:   "Simon Thompson (IT Research Support)" <S.J.Thompson at bham.ac.uk>
>To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>Date:   10/11/2017 01:22 PM
>Subject:        Re: [gpfsug-discuss] Checking a file-system for errors
>Sent by:        gpfsug-discuss-bounces at spectrumscale.org
>
>
>
>Yes, I get that we should only be doing this if we think we have a problem.
>
>And the answer is, right now, we're not entirely clear.
>
>We have a couple of issues our users are reporting to us, and it's not
>clear to us if they are related, an FS problem, or ACLs getting in the way.
>
>We do have users who are getting IO errors when trying to work on files,
>and we have an AFM sync issue. The disks are all online. I poked the FS
>with tsdbfs and the files look OK (small files, but the content of the
>block matches).
>
>Maybe we have a problem with DMAPI and TSM/HSM (could that cause an IO
>error to be reported to a user when they access a file, even if it's not
>an offline file?)
>
>We have a PMR open with IBM on this already.
>
>But we want to be sure in our own minds that we don't have an underlying
>FS problem, i.e. I want the confidence to tell my users: yes, I know you
>are seeing weird stuff, but we have run checks and are not introducing
>data corruption.
>
>Simon
>
>On 11/10/2017, 11:58, "gpfsug-discuss-bounces at spectrumscale.org on behalf
>of UWEFALKE at de.ibm.com" <gpfsug-discuss-bounces at spectrumscale.org on
>behalf of UWEFALKE at de.ibm.com> wrote:
>
>>Mostly, however, filesystem checks are only done if fs issues are
>>indicated by errors in the logs. Do you have reason to assume your fs has
>>problems?
>
>_______________________________________________
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss



