[gpfsug-discuss] waiters and files causing waiters

IBM Spectrum Scale scale at us.ibm.com
Fri Oct 18 09:34:01 BST 2019


That is right for the example from Ryan (and from the thread name you know
that it is writing to a file or directory), but for other cases it may
take more steps to figure out which file access is causing the
long waiters (e.g., when mmap is in use on some nodes, or a token revoke
is pending from some node, etc.).

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:   Ryan Novosielski <novosirj at rutgers.edu>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   2019/10/18 09:18 AM
Subject:        [EXTERNAL] Re: [gpfsug-discuss] waiters and files causing 
waiters
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Found my notes on this; very similar to what Behrooz was saying. 

This is from “mmfsadm dump waiters,selected_files”; as you can see, 
we’re looking at thread 29168. Below, “inodeFlushHolder” 
corresponds to that same thread in the case I was looking at.

You could then look up the inode with “tsfindinode -i <inode> <fsname>”; 
for the output below, that would be "tsfindinode -i 41538053 /gpfs/cache” on our 
system.

===== dump waiters =====
Current time 2019-05-01_13:48:26-0400
Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 
FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 
(MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 
192.168.33.7 <c1n1>

===== dump selected_files =====
Current time 2019-05-01_13:48:36-0400

...

OpenFile:  4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 
0x1806AC5EAC8
 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8
 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823
   lock state [ wf: 1 ] x [] flags [ ]
 Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823
 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821
 SMBOpen: valid eff token (A:RMA D:   ) @ 0x1806AC5EB50, ctMode (A:RMA D:  
) seq 170823
   lock state [ M(2) D: ] x [] flags [ ]
 SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 
(pfro+pfxw) seq 170822
 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823
   treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW:
   BLK [0,INF] mode XW node <403>
 Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823
   treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW:
   BLK [0,INF] mode XW node <403>
 inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: 
-rw-------
 tmmgr node <c1n1> (other)
 metanode <c1n403> (me) fail+panic count -1 flags 0x0, remoteStart 0 
remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0
 locks held in mode xw:
   0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0
 BRL nXLocksOrRelinquishes 285
 vfsReference 1
 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000
 hasWriterInstance 1
 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1
 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1
 bufferListCount 1 bufferListChangeCount 3
 dirty status: flushed dirtiedSyncNum 1477623
 SMB oplock state: nWriters 1
 indBlockDeallocLock:
   sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0
 inodeValid 1
 objectVersion 240
 flushVersion 8086700 mnodeChangeCount 1
 block size code 5 (32 subblocksPerFileBlock)
 dataBytesPerFileBlock 4194304
 fileSize 0 synchedFileSize 0 indirectionLevel 1
 atime 1556732911.496160000
 mtime 1556732911.496479000
 ctime 1556732911.496479000
 crtime 1556732911.496160000
 owner uid 169589 gid 169589
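The waiter-to-file correlation described above can be sketched as a small script. This is a hedged sketch, not an official tool: the field positions are assumed from the dump format shown in this thread and may differ between Scale releases, and the sample data below is the (abridged) dump from this message. On a live system you would feed it real output from "mmfsadm dump waiters,selected_files" (run as root) and finish with tsfindinode.

```shell
#!/bin/sh
# Sketch: map a long waiter's thread ID to the inode it is flushing, by
# correlating the "dump waiters" section with the "dump selected_files"
# section.  Sample input is the abridged dump quoted in this thread; on a
# real cluster, capture it with (as root):
#   mmfsadm dump waiters,selected_files > /tmp/gpfs_dump_sample.txt
cat > /tmp/gpfs_dump_sample.txt <<'EOF'
Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 <c1n1>
OpenFile:  4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8
 inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw-------
 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1
EOF

# 1. Pull the thread ID out of the waiter line.
tid=$(awk '$1=="Waiting" { for (i=1; i<NF; i++) if ($i=="thread") print $(i+1) }' \
      /tmp/gpfs_dump_sample.txt | head -1)

# 2. Find the OpenFile stanza whose inodeFlushHolder matches that thread;
#    the "inode NNN ..." line precedes it, so remember the last one seen.
inode=$(awk -v t="$tid" '
  $1=="inode" { cur=$2 }
  $1=="inodeFlushFlag" {
    for (i=1; i<NF; i++) if ($i=="inodeFlushHolder" && $(i+1)==t) print cur
  }' /tmp/gpfs_dump_sample.txt | head -1)

echo "thread=$tid inode=$inode"

# 3. Translate the inode number to a path (root, fs mounted), e.g.:
#   tsfindinode -i "$inode" /gpfs/cache
```

With the sample dump this prints "thread=29168 inode=41538053", matching the correlation Ryan walked through by hand.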

> On Oct 10, 2019, at 4:43 PM, Damir Krstic <damir.krstic at gmail.com> 
wrote:
> 
> is it possible, via some set of mmdiag --waiters or mmfsadm dump options, to 
figure out which file or directory access (whether read or write) 
is causing the longer waiters?
> 
> in all my looking i have not been able to get that information out of 
various diagnostic commands.
> 
> thanks,
> damir
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> 
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



