[gpfsug-discuss] GSS GPFS Storage Server show one path for one Disk

Sandeep Naik1 sannaik2 at in.ibm.com
Fri Oct 27 08:06:50 BST 2017


Hi Atmane,

The missing path in the old mmlspdisk output (/dev/sdob) and the path in the log 
file (/dev/sdge) do not match. This may be because the server was rebooted after 
the old mmlspdisk output was taken. Path names are not guaranteed to persist 
across reboots. 
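
Because sdX names can change between boots, a stable identifier helps when comparing outputs taken at different times. A minimal sketch, assuming a Linux host where udev populates /dev/disk/by-id with symlinks to the kernel names; the function name and the by_id_dir parameter are illustrative and not part of GPFS:

```python
import os

def kernel_name_to_ids(by_id_dir="/dev/disk/by-id"):
    """Map each kernel device node (e.g. /dev/sdge) to its persistent by-id names."""
    mapping = {}
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        # Only symlinks are of interest; each one points at a kernel device node.
        if not os.path.islink(link):
            continue
        target = os.path.realpath(link)
        mapping.setdefault(target, []).append(entry)
    return mapping
```

Calling kernel_name_to_ids() on each server and saving the result alongside the mmlspdisk output would let a later comparison be made on identifiers that survive a reboot.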

The log is reporting a problem with /dev/sdge. You should check whether the OS 
can see the path /dev/sdge (use lsscsi). If the disk is accessible through the 
other path, then I don't believe it is a problem with the disk. 
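
The lsscsi check can be scripted roughly as follows. This is only a sketch: it assumes the default lsscsi layout, where the device node (or "-" when there is none) is the last column, and the sample lines are hypothetical, not taken from the GSS servers.

```python
def devices_in_lsscsi(output):
    """Return the set of device nodes found in the last column of lsscsi output."""
    nodes = set()
    for line in output.splitlines():
        fields = line.split()
        # Entries without a block device show "-" in the last column; skip them.
        if fields and fields[-1].startswith("/dev/"):
            nodes.add(fields[-1])
    return nodes

# Hypothetical sample lines, for illustration only.
sample = """\
[0:0:0:0]    disk    IBM      EXAMPLE          0000  /dev/sda
[1:0:5:0]    disk    IBM      EXAMPLE          0000  /dev/sdkt
[1:0:9:0]    enclosu IBM      EXAMPLE          0000  -
"""

present = devices_in_lsscsi(sample)
print("/dev/sdkt" in present)  # True
print("/dev/sdge" in present)  # False
```

On a real server the text would come from something like subprocess.run(["lsscsi"], capture_output=True, text=True).stdout instead of the sample string.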
 
Thanks,

Sandeep Naik
Elastic Storage server / GPFS Test 
ETZ-B, Hinjewadi Pune India
(+91) 8600994314



From:   atmane khiredine <a.khiredine at meteo.dz>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   24/10/2017 02:50 PM
Subject:        [gpfsug-discuss] GSS GPFS Storage Server show one path for 
one Disk
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Dear All





We have a GSS (GPFS Storage Server, GPFS Native RAID) solution for our HPC system.

Three days ago I noticed that one disk shows only a single path.

My configuration is as follows:



GSS configuration: 4 enclosures, 6 SSDs, 2 empty slots, 238 total disks, 0 
NVRAM partitions



If I search with fdisk, I get the following result:

476 disks on GSS0 and GSS1

From an old file:

cat mmlspdisk.old

#####

replacementPriority = 1000

name = "e3d5s05"

device = "/dev/sdkt,/dev/sdob" << -

recoveryGroup = "BB1RGL"

declusteredArray = "DA2"

state = "ok"

userLocation = "Enclosure 2021-20E-SV25262728 Drawer 5 Slot 5"

userCondition = "normal"

nPaths = 2 active 4 total << - here the disk has 2 paths

#####

ls /dev/sdob

/dev/sdob

ls /dev/sdkt

/dev/sdkt

mmlspdisk all >> mmlspdisk.log

vi mmlspdisk.log

replacementPriority = 1000

name = "e3d5s05"

device = "/dev/sdkt" << --- now the disk has only 1 path

recoveryGroup = "BB1RGL"

declusteredArray = "DA2"

state = "ok"

userLocation = "Enclosure 2021-20E-SV25262728 Drawer 5 Slot 5"

userCondition = "normal"

nPaths = 1 active 3 total
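
The comparison between the two mmlspdisk snapshots above can be automated. The following is a rough sketch, not an official tool: it assumes each stanza is a run of `key = value` lines beginning with replacementPriority, as in the excerpts, and the function names are made up for illustration. The diff is only meaningful if the server was not rebooted between the two snapshots, since sdX names are not stable across reboots.

```python
import re

FIELD = re.compile(r'^\s*(\w+)\s*=\s*"?([^"\n]*)"?')

def parse_pdisks(text):
    """Parse 'key = value' stanzas into {pdisk name: {key: value}}."""
    pdisks, current = {}, None
    for line in text.splitlines():
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip()
        if key == "replacementPriority":
            current = {}  # assumed to be the first field of every stanza
        if current is None:
            continue
        current[key] = value
        if key == "name":
            pdisks[value] = current
    return pdisks

def lost_paths(old, new, name):
    """Device nodes listed for a pdisk in the old snapshot but not in the new one."""
    old_dev = set(old[name]["device"].split(","))
    new_dev = set(new[name]["device"].split(","))
    return sorted(old_dev - new_dev)

OLD = '''
replacementPriority = 1000
name = "e3d5s05"
device = "/dev/sdkt,/dev/sdob"
nPaths = 2 active 4 total
'''
NEW = '''
replacementPriority = 1000
name = "e3d5s05"
device = "/dev/sdkt"
nPaths = 1 active 3 total
'''

old, new = parse_pdisks(OLD), parse_pdisks(NEW)
print(lost_paths(old, new, "e3d5s05"))  # ['/dev/sdob']
```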

Here is the result from the log file on GSS1:



grep e3d5s05 /var/adm/ras/mmfs.log.latest



################## START LOG GSS1 #####################

0 results

################# END LOG GSS 1 #####################



Here is the result from the log file on GSS0:



grep e3d5s05 /var/adm/ras/mmfs.log.latest



################# START LOG GSS 0 #####################

Thu Sep 14 16:35:01.619 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 4673959648 length 4112 err 5.

Thu Sep 14 16:35:01.620 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Thu Sep 14 16:35:01.787 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Thu Sep 14 16:35:01.788 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Thu Sep 14 16:35:03.709 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Thu Sep 14 17:53:13.209 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 3658399408 length 4112 err 5.

Thu Sep 14 17:53:13.210 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Thu Sep 14 17:53:15.685 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Thu Sep 14 17:56:10.410 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 796658640 length 4112 err 5.

Thu Sep 14 17:56:10.411 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Thu Sep 14 17:56:10.593 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on write: sector 738304 length 512 err 5.

Thu Sep 14 17:56:11.236 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Thu Sep 14 17:56:11.237 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Thu Sep 14 17:56:13.127 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Thu Sep 14 17:59:14.322 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Thu Sep 14 18:02:16.580 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:08:01.464 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 682228176 length 4112 err 5.

Fri Sep 15 00:08:01.465 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:08:03.391 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:21:41.785 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 4063038688 length 4112 err 5.

Fri Sep 15 00:21:41.786 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:21:42.559 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:21:42.560 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 00:21:44.336 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:36:11.899 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 2503485424 length 4112 err 5.

Fri Sep 15 00:36:11.900 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:36:12.676 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:36:12.677 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 00:36:14.458 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:40:16.038 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 4113538928 length 4112 err 5.

Fri Sep 15 00:40:16.039 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:40:16.801 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:40:16.802 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 00:40:18.307 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:47:11.468 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 4185195728 length 4112 err 5.

Fri Sep 15 00:47:11.469 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:47:12.238 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:47:12.239 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 00:47:13.995 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:51:01.323 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 1637135520 length 4112 err 5.

Fri Sep 15 00:51:01.324 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:51:01.486 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:51:01.487 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 00:51:03.437 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:55:27.595 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 3646618336 length 4112 err 5.

Fri Sep 15 00:55:27.596 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 00:55:27.749 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 00:55:27.750 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 00:55:29.675 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 00:58:29.900 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 02:15:44.428 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 768931040 length 4112 err 5.

Fri Sep 15 02:15:44.429 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Fri Sep 15 02:15:44.596 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).

Fri Sep 15 02:15:44.597 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Fri Sep 15 02:15:46.486 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Fri Sep 15 02:18:46.826 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).

Fri Sep 15 02:21:47.317 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 02:24:47.723 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).

Fri Sep 15 02:27:48.152 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Fri Sep 15 02:30:48.392 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).

Sun Sep 24 15:40:18.434 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 2733386136 length 264 err 5.

Sun Sep 24 15:40:18.435 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Sun Sep 24 15:40:19.326 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Sun Sep 24 15:40:41.619 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on write: sector 3021316920 length 520 err 5.

Sun Sep 24 15:40:41.620 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Sun Sep 24 15:40:42.446 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Sun Sep 24 15:40:57.977 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on read: sector 4939800712 length 264 err 5.

Sun Sep 24 15:40:57.978 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Sun Sep 24 15:40:58.133 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).

Sun Sep 24 15:40:58.134 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to error.

Sun Sep 24 15:40:58.984 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

Sun Sep 24 15:44:00.932 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).

Sun Sep 24 15:47:02.352 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Sun Sep 24 15:50:03.149 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b 
(aborted command) asc/ascq=0x4b04 (NAK received).

Mon Sep 25 08:31:07.906 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge: I/O error on write: sector 942033152 length 264 err 5.

Mon Sep 25 08:31:07.907 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from ok to diagnosing.

Mon Sep 25 08:31:07.908 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
//gss0-ib0/dev/sdge: SCSI op=0x00 Test Unit Ready: Ioctl or RPC Failed: 
err=19.

Mon Sep 25 08:31:07.909 2017: [D] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge status changed from active to noDevice.

Mon Sep 25 08:31:07.910 2017: [E] Pdisk e3d5s05 of RG BB1RGL path 
/dev/sdge failed; location 'SV25262728-5-5'.

Mon Sep 25 08:31:08.770 2017: [D] Pdisk e3d5s05 of RG BB1RGL state changed 
from diagnosing to ok.

################## END LOG #####################
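
A long log like the one above is easier to read as a tally. The sketch below is an illustration only: it assumes records start with a timestamp of the form shown in the excerpt and may be wrapped over several lines, as they are in this mail. The helper names are hypothetical, and the sample consists of a few records copied from the log above.

```python
import re
from collections import Counter

# A record starts with e.g. "Thu Sep 14 16:35:01.619 2017: ".
STAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d\.\d+ \d{4}: ")

def join_wrapped(text):
    """Re-join records that the mail client wrapped over several lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if STAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records

def summarize(text):
    """Count I/O errors per (path, operation) and SCSI sense codes per (code, text)."""
    io_errors, scsi_codes = Counter(), Counter()
    for rec in join_wrapped(text):
        m = re.search(r"path (\S+): I/O error on (read|write)", rec)
        if m:
            io_errors[(m.group(1), m.group(2))] += 1
        m = re.search(r"asc/ascq=(0x[0-9a-f]+) \(([^)]*)\)", rec)
        if m:
            scsi_codes[(m.group(1), m.group(2))] += 1
    return io_errors, scsi_codes

sample = """\
Thu Sep 14 16:35:01.619 2017: [E] Pdisk e3d5s05 of RG BB1RGL path
/dev/sdge: I/O error on read: sector 4673959648 length 4112 err 5.
Thu Sep 14 16:35:01.787 2017: [D] Pdisk e3d5s05 of RG BB1RGL path
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b
(aborted command) asc/ascq=0x4b04 (NAK received).
Thu Sep 14 17:56:10.593 2017: [E] Pdisk e3d5s05 of RG BB1RGL path
/dev/sdge: I/O error on write: sector 738304 length 512 err 5.
Fri Sep 15 02:15:44.596 2017: [D] Pdisk e3d5s05 of RG BB1RGL path
//gss0-ib0/dev/sdge: SCSI op=0x28 Read (10): Check Condition: skey=0x0b
(aborted command) asc/ascq=0x4b03 (ACK/NAK timeout).
"""

io_errors, scsi_codes = summarize(sample)
print(dict(io_errors))
print(dict(scsi_codes))
```

Run against the full grep output, this would show whether every error is reported against the same path (/dev/sdge in the excerpt) and how often each sense code occurs.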



Is this a hardware or a software problem?



Thank you,



Atmane Khiredine

HPC System Administrator | Office National de la Météorologie

Tél : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : 
a.khiredine at meteo.dz

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss





