[gpfsug-discuss] waiters and files causing waiters
IBM Spectrum Scale
scale at us.ibm.com
Fri Oct 18 09:34:01 BST 2019
Right, for the example from Ryan (and, given the thread name, you already know
that it is writing to a file or directory). For other cases it may
take more steps to figure out which access to which file is causing the
long waiters (e.g., when mmap is in use on some nodes, or a token revoke is
pending from some node, etc.).
Regards, The Spectrum Scale (GPFS) team
------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
.
If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.
The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.
From: Ryan Novosielski <novosirj at rutgers.edu>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 2019/10/18 09:18 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] waiters and files causing waiters
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Found my notes on this; very similar to what Behrooz was saying.
The output below is from "mmfsadm dump waiters,selected_files"; as you can
see, we're looking at thread 29168. In the "dump selected_files" section
below, "inodeFlushHolder" corresponds to that same thread in the case I was
looking at.
You can then look up the inode with "tsfindinode -i <inode> <fsname>",
so for the example below: "tsfindinode -i 41538053 /gpfs/cache" on our
system.
===== dump waiters ====
Current time 2019-05-01_13:48:26-0400
Waiting 0.1669 sec since 13:48:25, monitored, thread 29168
FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8
(MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node
192.168.33.7 <c1n1>
===== dump selected_files =====
Current time 2019-05-01_13:48:36-0400
...
OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @
0x1806AC5EAC8
cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8
Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823
lock state [ wf: 1 ] x [] flags [ ]
Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823
DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821
SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D:
) seq 170823
lock state [ M(2) D: ] x [] flags [ ]
SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30
(pfro+pfxw) seq 170822
BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823
treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW:
BLK [0,INF] mode XW node <403>
Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823
treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW:
BLK [0,INF] mode XW node <403>
inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600:
-rw-------
tmmgr node <c1n1> (other)
metanode <c1n403> (me) fail+panic count -1 flags 0x0, remoteStart 0
remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0
locks held in mode xw:
0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0
BRL nXLocksOrRelinquishes 285
vfsReference 1
dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000
hasWriterInstance 1
inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1
metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1
bufferListCount 1 bufferListChangeCount 3
dirty status: flushed dirtiedSyncNum 1477623
SMB oplock state: nWriters 1
indBlockDeallocLock:
sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0
inodeValid 1
objectVersion 240
flushVersion 8086700 mnodeChangeCount 1
block size code 5 (32 subblocksPerFileBlock)
dataBytesPerFileBlock 4194304
fileSize 0 synchedFileSize 0 indirectionLevel 1
atime 1556732911.496160000
mtime 1556732911.496479000
ctime 1556732911.496479000
crtime 1556732911.496160000
owner uid 169589 gid 169589
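To illustrate the matching step described above, here is a small parsing sketch (my own, not a GPFS tool; the function names are hypothetical). It assumes only the line shapes visible in the dump above: a "Waiting N sec ... thread T" line in the waiters section, and "inode N" plus "inodeFlushHolder T" lines inside each OpenFile block of the selected_files section.

```python
import re

def parse_waiters(text):
    """Extract (seconds_waiting, thread_id) pairs from 'dump waiters' lines."""
    return [(float(m.group(1)), int(m.group(2)))
            for m in re.finditer(r"Waiting ([0-9.]+) sec .*thread (\d+)", text)]

def parse_selected_files(text):
    """Map inodeFlushHolder thread id -> inode number, per OpenFile block."""
    holders = {}
    for block in text.split("OpenFile:")[1:]:
        inode = re.search(r"\binode (\d+)", block)
        holder = re.search(r"inodeFlushHolder (\d+)", block)
        if inode and holder:
            holders[int(holder.group(1))] = int(inode.group(1))
    return holders

def inodes_behind_waiters(dump_text):
    """Given combined 'dump waiters,selected_files' output, pair each waiting
    thread with the inode whose flush it holds (None if no match)."""
    holders = parse_selected_files(dump_text)
    return [(secs, tid, holders.get(tid)) for secs, tid in parse_waiters(dump_text)]
```

Each inode this yields would then be resolved to a path the same way as above, with "tsfindinode -i <inode> <fsname>".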
> On Oct 10, 2019, at 4:43 PM, Damir Krstic <damir.krstic at gmail.com>
wrote:
>
> is it possible, via some combination of mmdiag --waiters or mmfsadm dump, to
figure out which files or directories (and whether the access is a read or a
write) are causing the longer waiters?
>
> in all my looking i have not been able to get that information out of
various diagnostic commands.
>
> thanks,
> damir
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss