[gpfsug-discuss] strange waiters + filesystem deadlock
Aaron Knister
aaron.s.knister at nasa.gov
Fri Mar 24 17:53:02 GMT 2017
Thanks Bob, Jonathan.
We're running GPFS 4.1.1.10 and no HSM/LTFSEE.
I'm currently gathering, as requested, a snap from all nodes (with
traces). With 3500 nodes this ought to be entertaining.
-Aaron
On 3/24/17 12:50 PM, Oesterlin, Robert wrote:
> Hi Aaron
>
> Yes, I have seen this several times over the last 6 months. I opened at least one PMR on it, and they never could track it down. I did some snap dumps, but without traces they did not have enough to go on. I ended up getting out of it by selectively rebooting some of my NSD servers. My suspicion is that one of them had a deadlocked thread that was holding up IO to the rest of the filesystem.
>
> I haven’t seen it since I updated to 4.2.2-2, but I’m not convinced (yet) that it’s not lurking in the background.
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
>
>
>
> On 3/24/17, 11:43 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" <gpfsug-discuss-bounces at spectrumscale.org on behalf of aaron.s.knister at nasa.gov> wrote:
>
> Since yesterday morning we've noticed some deadlocks on one of our
> filesystems that seem to be triggered by writing to it. The waiters on
> the clients look like this:
>
> 0x19450B0 ( 6730) waiting 2063.294589599 seconds, SyncHandlerThread:
> on ThCond 0x1802585CB10 (0xFFFFC9002585CB10) (InodeFlushCondVar), reason
> 'waiting for the flush flag to commit metadata'
> 0x7FFFDA65E200 ( 22850) waiting 0.000246257 seconds,
> AllocReduceHelperThread: on ThCond 0x7FFFDAC7FE28 (0x7FFFDAC7FE28)
> (MsgRecordCondvar), reason 'RPC wait' for allocMsgTypeRelinquishRegion
> on node 10.1.52.33 <c0n3271>
> 0x197EE70 ( 6776) waiting 0.000198354 seconds,
> FileBlockWriteFetchHandlerThread: on ThCond 0x7FFFF00CD598
> (0x7FFFF00CD598) (MsgRecordCondvar), reason 'RPC wait' for
> allocMsgTypeRequestRegion on node 10.1.52.33 <c0n3271>
>
> (10.1.52.33/c0n3271 is the fs manager for the filesystem in question)
>
> There's a single process running on this node writing to the filesystem
> in question (well, trying to write; it's been blocked, doing nothing, for
> half an hour now). There are ~10 other client nodes in this situation
> right now. We had many more last night, before the problem seemed to
> disappear in the early hours of the morning; now it's back.
>
> Waiters on the fs manager look like this. While each individual wait is
> short, they form a near-constant stream:
>
> 0x7FFF60003540 ( 8269) waiting 0.001151588 seconds, Msg handler
> allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> (AllocManagerMutex)
> 0x7FFF601C8860 ( 20606) waiting 0.001115712 seconds, Msg handler
> allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
> (0xFFFFC9002163A2E0) (AllocManagerMutex)
> 0x7FFF91C10080 ( 14723) waiting 0.000959649 seconds, Msg handler
> allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> (AllocManagerMutex)
> 0x7FFFB03C2910 ( 12636) waiting 0.000769611 seconds, Msg handler
> allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> (AllocManagerMutex)
> 0x7FFF8C092850 ( 18215) waiting 0.000682275 seconds, Msg handler
> allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
> (0xFFFFC9002163A2E0) (AllocManagerMutex)
> 0x7FFF9423F730 ( 12652) waiting 0.000641915 seconds, Msg handler
> allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> (AllocManagerMutex)
> 0x7FFF9422D770 ( 12625) waiting 0.000494256 seconds, Msg handler
> allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> (AllocManagerMutex)
> 0x7FFF9423E310 ( 12651) waiting 0.000437760 seconds, Msg handler
> allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
> (0xFFFFC9002163A2E0) (AllocManagerMutex)
>
> I don't know if this data point is useful, but both yesterday and today
> the metadata NSDs for this filesystem have seen a constant aggregate
> stream of reads, about 25 MB/s at ~4k ops/s, during each episode (with
> very low latency, though, so I don't believe the storage is the
> bottleneck here). Writes are only a few hundred ops and didn't strike me
> as odd.
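
As a back-of-the-envelope check (assuming MB means 10^6 bytes and "4k ops/s" means 4000 reads per second), those figures imply an average read size of only a few KiB, which is consistent with small metadata-block traffic rather than large data reads:

```python
# Average read size implied by the observed metadata NSD traffic.
# Assumptions: MB = 10^6 bytes, "4k ops/s" = 4000 read operations/second.
throughput_bytes = 25e6   # 25 MB/s aggregate read throughput
ops_per_second = 4000     # ~4k read ops/s
avg_read_bytes = throughput_bytes / ops_per_second
print(avg_read_bytes)     # 6250.0 bytes, i.e. ~6 KiB per read
```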
>
> I have a PMR open for this but I'm curious if folks have seen this in
> the wild and what it might mean.
>
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776