[gpfsug-discuss] GSS GPFS Storage Server show one path for one Disk

atmane khiredine a.khiredine at meteo.dz
Tue Oct 24 10:20:25 BST 2017


Dear all,

We have a GSS (GPFS Storage Server, GPFS Native RAID) solution for our HPC cluster.
Three days ago I noticed that one disk shows only a single path.
My configuration is as follows:

GSS configuration: 4 enclosures, 6 SSDs, 2 empty slots, 238 total disks, 0 NVRAM partitions

If I search with fdisk, I get the following result:
476 disks on GSS0 and GSS1

An older saved listing shows two paths for the disk:
cat mmlspdisk.old
#####
replacementPriority = 1000
name = "e3d5s05"
device = "/dev/sdkt,/dev/sdob"   <<---
recoveryGroup = "BB1RGL"
declusteredArray = "DA2"
state = "ok"
userLocation = "Enclosure 2021-20E-SV25262728 Drawer 5 Slot 5"
userCondition = "normal"
nPaths = 2 active 4 total   <<--- at that time the disk had 2 paths
#####
ls /dev/sdob
/dev/sdob
ls /dev/sdkt
/dev/sdkt
mmlspdisk all >> mmlspdisk.log
vi mmlspdisk.log
replacementPriority = 1000
name = "e3d5s05"
device = "/dev/sdkt"   <<--- the disk now has only 1 path
recoveryGroup = "BB1RGL"
declusteredArray = "DA2"
state = "ok"
userLocation = "Enclosure 2021-20E-SV25262728 Drawer 5 Slot 5"
userCondition = "normal"
nPaths = 1 active 3 total
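
The comparison of the old and new listings above could be automated. The following is a minimal sketch, not an official tool: it assumes both files contain plain `key = value` stanzas as shown (without the hand-added `<<---` notes), and the function names `parse_pdisks`, `devices`, and `lost_paths` are hypothetical.

```python
import re

# Matches mmlspdisk-style "key = value" lines; other lines are ignored.
KEY_VALUE = re.compile(r'^\s*(\w+)\s*=\s*(.*?)\s*$')


def parse_pdisks(text):
    """Return {pdisk name: {key: value}} parsed from 'key = value' stanzas."""
    pdisks, current = {}, {}

    def flush():
        if "name" in current:
            pdisks[current["name"]] = dict(current)
        current.clear()

    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip('"')
        if key in current:  # a repeated key starts the next stanza
            flush()
        current[key] = value
    flush()
    return pdisks


def devices(stanza):
    """Split the comma-separated device field into a set of paths."""
    return {d for d in stanza.get("device", "").split(",") if d}


def lost_paths(old_text, new_text):
    """List (name, lost devices, old nPaths, new nPaths) for pdisks that lost a path."""
    old, new = parse_pdisks(old_text), parse_pdisks(new_text)
    report = []
    for name in sorted(old.keys() & new.keys()):
        lost = sorted(devices(old[name]) - devices(new[name]))
        if lost:
            report.append((name, lost, old[name].get("nPaths"), new[name].get("nPaths")))
    return report


# Sample data copied from the two listings in this post.
OLD = '''
replacementPriority = 1000
name = "e3d5s05"
device = "/dev/sdkt,/dev/sdob"
nPaths = 2 active 4 total
'''
NEW = '''
replacementPriority = 1000
name = "e3d5s05"
device = "/dev/sdkt"
nPaths = 1 active 3 total
'''

for row in lost_paths(OLD, NEW):
    print(row)
```

In practice the two strings would be read from mmlspdisk.old and mmlspdisk.log, so that every pdisk in the recovery groups is checked, not only e3d5s05.
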
Here is the result from the log file on GSS1:

grep e3d5s05 /var/adm/ras/mmfs.log.latest

################## START LOG GSS1 #####################
0 results
################## END LOG GSS1 #####################

Here is the result from the log file on GSS0:

grep e3d5s05 /var/adm/ras/mmfs.log.latest

################## START LOG GSS0 #####################
Thu Sep 14 16:35:01.619 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 4673959648 length 4112 err 5.
Thu Sep 14 16:35:01.620 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Thu Sep 14 16:35:01.787 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Thu Sep 14 16:35:01.788 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Thu Sep 14 16:35:03.709 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Thu Sep 14 17:53:13.209 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 3658399408 length 4112 err 5.
Thu Sep 14 17:53:13.210 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Thu Sep 14 17:53:15.685 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Thu Sep 14 17:56:10.410 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 796658640 length 4112 err 5.
Thu Sep 14 17:56:10.411 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Thu Sep 14 17:56:10.593 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on write: sector 738304 length 512 err 5.
Thu Sep 14 17:56:11.236 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Thu Sep 14 17:56:11.237 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Thu Sep 14 17:56:13.127 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Thu Sep 14 17:59:14.322 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Thu Sep 14 18:02:16.580 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:08:01.464 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 682228176 length 4112 err 5.
Fri Sep 15 00:08:01.465 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:08:03.391 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:21:41.785 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 4063038688 length 4112 err 5.
Fri Sep 15 00:21:41.786 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:21:42.559 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:21:42.560 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 00:21:44.336 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:36:11.899 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 2503485424 length 4112 err 5.
Fri Sep 15 00:36:11.900 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:36:12.676 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:36:12.677 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 00:36:14.458 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:40:16.038 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 4113538928 length 4112 err 5.
Fri Sep 15 00:40:16.039 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:40:16.801 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:40:16.802 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 00:40:18.307 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:47:11.468 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 4185195728 length 4112 err 5.
Fri Sep 15 00:47:11.469 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:47:12.238 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:47:12.239 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 00:47:13.995 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:51:01.323 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 1637135520 length 4112 err 5.
Fri Sep 15 00:51:01.324 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:51:01.486 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:51:01.487 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 00:51:03.437 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:55:27.595 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 3646618336 length 4112 err 5.
Fri Sep 15 00:55:27.596 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 00:55:27.749 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 00:55:27.750 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 00:55:29.675 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 00:58:29.900 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 02:15:44.428 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 768931040 length 4112 err 5.
Fri Sep 15 02:15:44.429 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Fri Sep 15 02:15:44.596 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Fri Sep 15 02:15:44.597 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 02:15:46.486 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Fri Sep 15 02:18:46.826 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Fri Sep 15 02:21:47.317 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 02:24:47.723 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Fri Sep 15 02:27:48.152 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Fri Sep 15 02:30:48.392 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Sun Sep 24 15:40:18.434 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 2733386136 length 264 err 5.
Sun Sep 24 15:40:18.435 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Sun Sep 24 15:40:19.326 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Sun Sep 24 15:40:41.619 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on write: sector 3021316920 length 520 err 5.
Sun Sep 24 15:40:41.620 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Sun Sep 24 15:40:42.446 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Sun Sep 24 15:40:57.977 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 4939800712 length 264 err 5.
Sun Sep 24 15:40:57.978 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Sun Sep 24 15:40:58.133 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Sun Sep 24 15:40:58.134 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Sun Sep 24 15:40:58.984 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
Sun Sep 24 15:44:00.932 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Sun Sep 24 15:47:02.352 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Sun Sep 24 15:50:03.149 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Mon Sep 25 08:31:07.906 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on write: sector 942033152 length 264 err 5.
Mon Sep 25 08:31:07.907 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from ok to diagnosing.
Mon Sep 25 08:31:07.908 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x00 Test Unit Ready: Ioctl or RPC Failed: err=19.
Mon Sep 25 08:31:07.909 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to noDevice.
Mon Sep 25 08:31:07.910 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge failed; location 'SV25262728-5-5'.
Mon Sep 25 08:31:08.770 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed from diagnosing to ok.
################## END LOG GSS0 #####################
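
Because the GSS0 log is long and repetitive, a short summary makes the pattern easier to see. The following is a minimal sketch, assuming log lines in exactly the format shown above; the function name `summarize` and the regular expressions are hypothetical and only cover the three message types that appear in this excerpt.

```python
import re
from collections import Counter

IO_ERROR = re.compile(r'path (\S+): I/O error on (read|write)')
SENSE = re.compile(r'asc/ascq=(0x[0-9a-fA-F]+) \(([^)]*)\)')
PATH_STATUS = re.compile(r'path (\S+) status changed from (\w+) to (\w+)')


def summarize(log_text, pdisk):
    """Count I/O errors and SCSI sense codes for one pdisk; keep the last path status."""
    io_errors, sense_codes, last_status = Counter(), Counter(), {}
    for line in log_text.splitlines():
        if f"Pdisk {pdisk} " not in line:
            continue
        m = IO_ERROR.search(line)
        if m:
            io_errors[m.groups()] += 1          # key: (path, "read" or "write")
        m = SENSE.search(line)
        if m:
            sense_codes[f"{m.group(1)} ({m.group(2)})"] += 1
        m = PATH_STATUS.search(line)
        if m:
            last_status[m.group(1)] = m.group(3)  # later lines overwrite earlier ones
    return io_errors, sense_codes, last_status


# A few lines copied from the GSS0 log above.
SAMPLE = """\
Thu Sep 14 16:35:01.619 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on read: sector 4673959648 length 4112 err 5.
Thu Sep 14 16:35:01.787 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b04 (NAK received).
Thu Sep 14 16:35:01.788 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to error.
Fri Sep 15 02:15:44.596 2017: [D] Pdisk e3d5s05 of RG BB1RGL path //gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b (aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
Mon Sep 25 08:31:07.906 2017: [E] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge: I/O error on write: sector 942033152 length 264 err 5.
Mon Sep 25 08:31:07.909 2017: [D] Pdisk e3d5s05 of RG BB1RGL path /dev/sdge status changed from active to noDevice.
"""

io_errors, sense_codes, last_status = summarize(SAMPLE, "e3d5s05")
print(io_errors, sense_codes, last_status, sep="\n")
```

In the excerpt above, every error is reported against /dev/sdge, the sense data are asc/ascq 0x4b04 (NAK received) and 0x4b03 (ACK/NAK timeout), and the last path status is noDevice. On Linux, err 5 and err=19 are the errno values EIO and ENODEV.
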

Is this a hardware or a software problem?

Thank you,

Atmane Khiredine
HPC System Administrator | Office National de la Météorologie
Tél : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz

