[gpfsug-discuss] strange waiters + filesystem deadlock
Aaron Knister
aaron.s.knister at nasa.gov
Fri Mar 24 17:34:30 GMT 2017
Here's the screenshot from the other node with the high cpu utilization.
On 3/24/17 1:32 PM, Aaron Knister wrote:
> heh, yep we're on sles :)
>
> here's a screenshot of the fs manager from the deadlocked filesystem. I
> don't think there's an nsd server or manager node that's running full
> throttle across all cpus. There is one that's got relatively high CPU
> utilization though (300-400%). I'll send a screenshot of it in a sec.
>
> no zimon yet but we do have other tools to see cpu utilization.
>
> -Aaron
>
> On 3/24/17 1:22 PM, Sven Oehme wrote:
>> you must be on sles as this segfaults only on sles to my knowledge :-)
>>
>> i am looking for a NSD or manager node in your cluster that runs at 100%
>> cpu usage.
>>
>> do you have zimon deployed to look at cpu utilization across your nodes?
>>
>> sven
>>
>>
>>
>> On Fri, Mar 24, 2017 at 10:08 AM Aaron Knister <aaron.s.knister at nasa.gov> wrote:
>>
>> Hi Sven,
>>
>> Which NSD server should I run top on, the fs manager? If so the
>> CPU load
>> is about 155%. I'm working on perf top but not off to a great
>> start...
>>
>> # perf top
>>    PerfTop:    1095 irqs/sec  kernel:61.9%  exact: 0.0% [1000Hz cycles], (all, 28 CPUs)
>> -------------------------------------------------------------------------------
>>
>> Segmentation fault
>>
>> -Aaron
>>
>> On 3/24/17 1:04 PM, Sven Oehme wrote:
>> > while this is happening run top and see if there is very high cpu
>> > utilization at this time on the NSD server.
>> >
>> > if there is, run perf top (you might need to install the perf command)
>> > and see if the top cpu contender is a spinlock. if so, send a screenshot
>> > of perf top as i may know what that is and how to fix it.
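If perf itself crashes (as it did above), a cruder way to spot a pegged node is to sample /proc/stat directly. A minimal sketch, assuming a Linux node; `cpu_busy_pct` is a hypothetical helper, not a GPFS or zimon tool:

```python
# Hypothetical fallback (not a GPFS/zimon tool): sample /proc/stat twice
# and report per-CPU busy percentage, to spot a node running at ~100%.
import time

def cpu_busy_pct(interval=0.5):
    def snapshot():
        stats = {}
        with open("/proc/stat") as f:
            for line in f:
                if line.startswith("cpu"):
                    name, *fields = line.split()
                    vals = [int(v) for v in fields]
                    # idle + iowait count as "not busy"
                    idle = vals[3] + (vals[4] if len(vals) > 4 else 0)
                    stats[name] = (sum(vals), idle)
        return stats

    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    pct = {}
    for cpu in before:
        total = after[cpu][0] - before[cpu][0]
        idle = after[cpu][1] - before[cpu][1]
        pct[cpu] = 100.0 * (total - idle) / total if total else 0.0
    return pct

if __name__ == "__main__":
    # "cpu" is the all-CPUs aggregate; cpu0..cpuN are individual cores
    for name, busy in sorted(cpu_busy_pct().items()):
        print(f"{name}: {busy:5.1f}% busy")
```

Run during an episode on each NSD/manager node; one node near 100% aggregate, or a single cpuN entry pinned at 100%, matches the pattern described here.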
>> >
>> > sven
>> >
>> >
>> > On Fri, Mar 24, 2017 at 9:43 AM Aaron Knister <aaron.s.knister at nasa.gov> wrote:
>> >
>> > Since yesterday morning we've noticed some deadlocks on one of our
>> > filesystems that seem to be triggered by writing to it. The waiters on
>> > the clients look like this:
>> >
>> > 0x19450B0 ( 6730) waiting 2063.294589599 seconds, SyncHandlerThread: on ThCond 0x1802585CB10 (0xFFFFC9002585CB10) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata'
>> > 0x7FFFDA65E200 ( 22850) waiting 0.000246257 seconds, AllocReduceHelperThread: on ThCond 0x7FFFDAC7FE28 (0x7FFFDAC7FE28) (MsgRecordCondvar), reason 'RPC wait' for allocMsgTypeRelinquishRegion on node 10.1.52.33 <c0n3271>
>> > 0x197EE70 ( 6776) waiting 0.000198354 seconds, FileBlockWriteFetchHandlerThread: on ThCond 0x7FFFF00CD598 (0x7FFFF00CD598) (MsgRecordCondvar), reason 'RPC wait' for allocMsgTypeRequestRegion on node 10.1.52.33 <c0n3271>
>> >
>> > (10.1.52.33/c0n3271 is the fs manager for the filesystem in question)
>> >
>> > there's a single process running on this node writing to the filesystem
>> > in question (well, trying to write; it's been blocked doing nothing for
>> > half an hour now). There are ~10 other client nodes in this situation
>> > right now. We had many more last night before the problem seemed to
>> > disappear in the early hours of the morning, and now it's back.
>> >
>> > Waiters on the fs manager look like this. While each individual waiter
>> > is short-lived, it's a near-constant stream:
>> >
>> > 0x7FFF60003540 ( 8269) waiting 0.001151588 seconds, Msg handler allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFF601C8860 ( 20606) waiting 0.001115712 seconds, Msg handler allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFF91C10080 ( 14723) waiting 0.000959649 seconds, Msg handler allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFFB03C2910 ( 12636) waiting 0.000769611 seconds, Msg handler allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFF8C092850 ( 18215) waiting 0.000682275 seconds, Msg handler allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFF9423F730 ( 12652) waiting 0.000641915 seconds, Msg handler allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFF9422D770 ( 12625) waiting 0.000494256 seconds, Msg handler allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
>> > 0x7FFF9423E310 ( 12651) waiting 0.000437760 seconds, Msg handler allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0) (AllocManagerMutex)
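When the stream moves too fast to eyeball, waiter dumps like the one above can be tallied mechanically by the lock/condvar name in the trailing parentheses. A minimal sketch, assuming waiter lines in the single-line format shown; `tally_waiters` is a hypothetical helper, not part of GPFS:

```python
# Hypothetical helper (not part of GPFS): group waiter lines by the
# mutex/condvar they block on and track the longest wait per object,
# to spot a single hot lock such as AllocManagerMutex.
import re
from collections import Counter

# matches: "waiting <secs> seconds ... (<ObjectName>)" where the object
# name is the first parenthesized token that starts with a letter
WAITER_RE = re.compile(
    r"waiting\s+(?P<secs>[\d.]+)\s+seconds.*?\((?P<obj>[A-Za-z]\w*)\)"
)

def tally_waiters(lines):
    counts = Counter()   # object name -> number of waiters
    worst = {}           # object name -> longest observed wait (seconds)
    for line in lines:
        m = WAITER_RE.search(line)
        if not m:
            continue
        obj = m.group("obj")
        secs = float(m.group("secs"))
        counts[obj] += 1
        worst[obj] = max(worst.get(obj, 0.0), secs)
    return counts, worst
```

Fed the dump above, this would show every waiter queued on AllocManagerMutex; run against successive snapshots it shows whether the same lock dominates over time.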
>> >
>> > I don't know if this data point is useful, but both yesterday and today
>> > the metadata NSDs for this filesystem have had a constant aggregate
>> > stream of 25MB/s / ~4k op/s of reads during each episode (very low
>> > latency, though, so I don't believe the storage is a bottleneck here).
>> > Writes are only a few hundred ops and didn't strike me as odd.
>> >
>> > I have a PMR open for this, but I'm curious whether folks have seen
>> > this in the wild and what it might mean.
>> >
>> > -Aaron
>> >
>> > --
>> > Aaron Knister
>> > NASA Center for Climate Simulation (Code 606.2)
>> > Goddard Space Flight Center
>> > (301) 286-2776
>> > _______________________________________________
>> > gpfsug-discuss mailing list
>> > gpfsug-discuss at spectrumscale.org
>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>> >
>>
>>
>>
>>
>
>
>
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nsd35.png
Type: image/png
Size: 243009 bytes
Desc: not available
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20170324/4fe3c787/attachment-0002.png>