[gpfsug-discuss] Bad disk but not failed in DSS-G
Jonathan Buzzard
jonathan.buzzard at strath.ac.uk
Thu Jun 20 21:14:09 BST 2024
This came to light because I was checking the mmbackup logs and found that
we had not had a successful backup for several days, and I was
seeing lots of errors like this:
Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Error on gpfs_iopen([/gpfs/users/xxxyyyyy/.swr],68050746): Stale file handle
Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Summary of errors:: _dirscan failures:3, _serious unclassified errors:3.
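For anyone wanting to check their own logs for the same thing, here is a rough sketch in Python that pulls the gpfs_iopen failures out of an mmbackup log and counts them by reason. It assumes the errors have the same shape as the one quoted above; the function name and the regular expression are made up for illustration, and the meaning of the number after the path is not assumed.

```python
import re
from collections import Counter

# Matches e.g.:
#   Error on gpfs_iopen([/gpfs/users/xxxyyyyy/.swr],68050746): Stale file handle
# The pattern is an assumption based on the single log line quoted above.
IOPEN_RE = re.compile(r"Error on gpfs_iopen\(\[(.+?)\],(\d+)\): (.+)")


def summarise_iopen_errors(log_text):
    """Return (Counter of reasons, list of (path, number)) for gpfs_iopen errors."""
    reasons = Counter()
    paths = []
    for match in IOPEN_RE.finditer(log_text):
        path = match.group(1)
        number = int(match.group(2))  # second argument as logged; meaning not assumed
        reason = match.group(3).strip()
        reasons[reason] += 1
        paths.append((path, number))
    return reasons, paths
```

Running it over several nights of logs would show whether the stale file handles are confined to a few directories or spread across the file system.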
After some digging around, wondering what was going on, I came across
these being logged on one of the DSS-G nodes:
[Wed Jun 12 22:22:05 2024] blk_update_request: I/O error, dev sdbv, sector 9144672512 op 0x1:(WRITE) flags 0x700 phys_seg 17 prio class 0
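To see how widespread these kernel errors are, the lines can be tallied per block device. A minimal sketch in Python, assuming the log text has the same shape as the line above; the function name count_io_errors and the regular expression are illustrative and not part of any GPFS tooling.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   blk_update_request: I/O error, dev sdbv, sector 9144672512 op 0x1:(WRITE) ...
# \s+ also tolerates the line having been wrapped after the device name.
IO_ERROR_RE = re.compile(r"blk_update_request: I/O error, dev (\w+),\s+sector \d+")


def count_io_errors(log_text):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for match in IO_ERROR_RE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts
```

Feeding it the output of `dmesg -T`, which prints timestamps in the bracketed form shown above, should give a quick per-device tally to compare against what mmvdisk reports.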
Yikes, looks like I have a failed disk. However if I do
[root@gpfs2 ~]# mmvdisk pdisk list --recovery-group all --not-ok
mmvdisk: All pdisks are ok.
Clearly that's a load of rubbish.
After a lot more prodding
[root@gpfs2 ~]# mmvdisk pdisk list --recovery-group dssg2 --pdisk e1d2s25 -L
pdisk:
replacementPriority = 1000
name = "e1d2s25"
device = "//gpfs1/dev/sdft(notEnabled),//gpfs1/dev/sdfu(notEnabled),//gpfs2/dev/sdfb,//gpfs2/dev/sdbv"
recoveryGroup = "dssg2"
declusteredArray = "DA1"
state = "ok"
IOErrors = 444
IOTimeouts = 8958
mediaErrors = 15
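Since --not-ok says nothing here, one way to catch a disk like this is to read the error counters directly. The following is only a sketch in Python: it assumes the -L output is a series of "pdisk:" stanzas of key = value lines as shown above, the function names are hypothetical, and the threshold of zero is an arbitrary choice of mine, not whatever criterion GNR itself uses to fail a disk.

```python
ERROR_COUNTERS = ("IOErrors", "IOTimeouts", "mediaErrors")


def parse_pdisk_stanzas(text):
    """Parse 'key = value' stanzas, each introduced by a 'pdisk:' line."""
    pdisks, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "pdisk:":
            current = {}
            pdisks.append(current)
            last_key = None
        elif current is None or not line:
            continue
        elif "=" in line and not line.startswith('"'):
            key, _, value = line.partition("=")
            last_key = key.strip()
            current[last_key] = value.strip().strip('"')
        elif last_key is not None and current[last_key] == "":
            # Value was wrapped onto the following line (e.g. a long device list).
            current[last_key] = line.strip('"')
    return pdisks


def suspicious_pdisks(pdisks, threshold=0):
    """Return (name, counters) for pdisks in state 'ok' with any counter above threshold."""
    flagged = []
    for pdisk in pdisks:
        counters = {key: int(pdisk.get(key, 0)) for key in ERROR_COUNTERS}
        if pdisk.get("state") == "ok" and any(v > threshold for v in counters.values()):
            flagged.append((pdisk.get("name"), counters))
    return flagged
```

Run against the stanza above, this would flag e1d2s25 on all three counters even though its state is "ok".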
What on earth gives? Why has the disk not been failed? It's not great
that a clearly bad disk is allowed to stick around in the file system
and cause problems IMHO.
When I try to prepare the disk for removal I get
[root@gpfs2 ~]# mmvdisk pdisk replace --prepare --rg dssg2 --pdisk e1d2s25
mmvdisk: Pdisk e1d2s25 of recovery group dssg2 is not currently scheduled for replacement.
mmvdisk:
mmvdisk:
mmvdisk: Command failed. Examine previous error messages to determine cause.
Do I have to use the --force option? I would like to get this disk out
of the file system ASAP.
JAB.
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG