[gpfsug-discuss] strange waiters + filesystem deadlock
Sven Oehme
oehmes at gmail.com
Fri Mar 24 18:05:58 GMT 2017
was this filesystem created with -n 5000 ? or was that changed later with
mmchfs ?
please send the mmlsconfig/mmlscluster output to me at oehmes at us.ibm.com
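
if you're not sure, mmlsfs will show the current -n value -- something along
these lines (fs0 is just a placeholder for your filesystem name):

  # '-n' is the estimated number of nodes that will mount the filesystem
  mmlsfs fs0 | grep -- ' -n '
  # cluster and config overview to go with it
  mmlscluster
  mmlsconfig
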
On Fri, Mar 24, 2017 at 10:58 AM Aaron Knister <aaron.s.knister at nasa.gov>
wrote:
> I feel a little awkward about posting lists of IPs and hostnames on
> the mailing list (even though they're all internal) but I'm happy to
> send to you directly. I've attached both an lsfs and an mmdf output of
> the fs in question here since that may be useful for others to see. Just
> a note about disk d23_02_021-- it's been evacuated for several weeks now
> due to a hardware issue in the disk enclosure.
>
> The fs is rather full percentage wise (93%) but in terms of capacity
> there's a good amount free. 93% full of a 7PB filesystem still leaves
> 551T. Metadata, as you'll see, is 31% free (roughly 800GB).
>
> The fs has 40M inodes allocated and 12M free.
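>
> (For reference, the capacity figures are from the attached mmdf output; the
> inode counts can be pulled with something like the following -- the -F form
> may depend on your GPFS level, and the names/paths are just examples:)
>
>   mmdf <fsname>          # per-NSD capacity plus totals
>   mmdf <fsname> -F       # inode totals and free count
>   df -i /gpfs/<mount>    # same idea from any node with the fs mounted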
>
> -Aaron
>
> On 3/24/17 1:41 PM, Sven Oehme wrote:
> > ok, that seems like a different problem than i was thinking.
> > can you send output of mmlscluster, mmlsconfig, mmlsfs all ?
> > also, are you getting close to full on inodes or capacity on any of
> > the filesystems ?
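> >
> > something like this captures everything in one go (output paths are just
> > examples):
> >
> >   mmlscluster > /tmp/mmlscluster.out
> >   mmlsconfig  > /tmp/mmlsconfig.out
> >   mmlsfs all  > /tmp/mmlsfs_all.out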
> >
> > sven
> >
> >
> > On Fri, Mar 24, 2017 at 10:34 AM Aaron Knister <aaron.s.knister at nasa.gov> wrote:
> >
> > Here's the screenshot from the other node with the high cpu utilization.
> >
> > On 3/24/17 1:32 PM, Aaron Knister wrote:
> > > heh, yep we're on sles :)
> > >
> > > here's a screenshot of the fs manager from the deadlocked filesystem. I
> > > don't think there's an nsd server or manager node that's running full
> > > throttle across all cpus. There is one that's got relatively high CPU
> > > utilization though (300-400%). I'll send a screenshot of it in a sec.
> > >
> > > no zimon yet but we do have other tools to see cpu utilization.
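> > >
> > > (plain top works for that in a pinch -- a rough one-liner, assuming mmfsd
> > > is the GPFS daemon process:)
> > >
> > >   top -b -n 1 -H -p $(pidof mmfsd) | head -20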
> > >
> > > -Aaron
> > >
> > > On 3/24/17 1:22 PM, Sven Oehme wrote:
> > >> you must be on sles as this segfaults only on sles to my knowledge :-)
> > >>
> > >> i am looking for an NSD or manager node in your cluster that runs at 100%
> > >> cpu usage.
> > >>
> > >> do you have zimon deployed to look at cpu utilization across your nodes ?
> > >>
> > >> sven
> > >>
> > >>
> > >>
> > >> On Fri, Mar 24, 2017 at 10:08 AM Aaron Knister <aaron.s.knister at nasa.gov> wrote:
> > >>
> > >> Hi Sven,
> > >>
> > >> Which NSD server should I run top on, the fs manager? If so the CPU load
> > >> is about 155%. I'm working on perf top but not off to a great start...
> > >>
> > >> # perf top
> > >> PerfTop: 1095 irqs/sec kernel:61.9% exact: 0.0% [1000Hz cycles], (all, 28 CPUs)
> > >> --------------------------------------------------------------------------------
> > >>
> > >> Segmentation fault
> > >>
> > >> -Aaron
> > >>
> > >> On 3/24/17 1:04 PM, Sven Oehme wrote:
> > >> > while this is happening run top and see if there is very high cpu
> > >> > utilization at this time on the NSD server.
> > >> >
> > >> > if there is, run perf top (you might need to install the perf command) and
> > >> > see if the top cpu contender is a spinlock. if so, send a screenshot of
> > >> > perf top as i may know what that is and how to fix it.
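> > >> >
> > >> > something like this should do it (assuming perf is installed and mmfsd
> > >> > is the only GPFS daemon process on the node):
> > >> >
> > >> >   # profile just the GPFS daemon
> > >> >   perf top -p $(pidof mmfsd)
> > >> >   # or record system-wide for ~10 seconds and look at the report offline
> > >> >   perf record -a -g -- sleep 10 && perf report | head -40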
> > >> >
> > >> > sven
> > >> >
> > >> >
> > >> > On Fri, Mar 24, 2017 at 9:43 AM Aaron Knister <aaron.s.knister at nasa.gov> wrote:
> > >> >
> > >> > Since yesterday morning we've noticed some deadlocks on one of our
> > >> > filesystems that seem to be triggered by writing to it. The waiters on
> > >> > the clients look like this:
> > >> >
> > >> > 0x19450B0 ( 6730) waiting 2063.294589599 seconds, SyncHandlerThread:
> > >> > on ThCond 0x1802585CB10 (0xFFFFC9002585CB10) (InodeFlushCondVar), reason
> > >> > 'waiting for the flush flag to commit metadata'
> > >> > 0x7FFFDA65E200 ( 22850) waiting 0.000246257 seconds,
> > >> > AllocReduceHelperThread: on ThCond 0x7FFFDAC7FE28 (0x7FFFDAC7FE28)
> > >> > (MsgRecordCondvar), reason 'RPC wait' for allocMsgTypeRelinquishRegion
> > >> > on node 10.1.52.33 <c0n3271>
> > >> > 0x197EE70 ( 6776) waiting 0.000198354 seconds,
> > >> > FileBlockWriteFetchHandlerThread: on ThCond 0x7FFFF00CD598
> > >> > (0x7FFFF00CD598) (MsgRecordCondvar), reason 'RPC wait' for
> > >> > allocMsgTypeRequestRegion on node 10.1.52.33 <c0n3271>
> > >> >
> > >> > (10.1.52.33/c0n3271 is the fs manager for the filesystem in question)
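> > >> >
> > >> > (for anyone wanting the same view on their own cluster, waiter dumps of
> > >> > this sort can be pulled with something along these lines -- the node
> > >> > spec and full path are just examples:)
> > >> >
> > >> >   # waiters on the local node
> > >> >   mmdiag --waiters
> > >> >   # rough cluster-wide sweep
> > >> >   mmdsh -N all "/usr/lpp/mmfs/bin/mmdiag --waiters"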
> > >> >
> > >> > there's a single process running on this node writing to the filesystem
> > >> > in question (well, trying to write, it's been blocked doing nothing for
> > >> > half an hour now). There are ~10 other client nodes in this situation
> > >> > right now. We had many more last night before the problem seemed to
> > >> > disappear in the early hours of the morning and now it's back.
> > >> >
> > >> > Waiters on the fs manager look like this. While the individual waiters
> > >> > are short, it's a near-constant stream:
> > >> >
> > >> > 0x7FFF60003540 ( 8269) waiting 0.001151588 seconds, Msg handler
> > >> > allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> > >> > (AllocManagerMutex)
> > >> > 0x7FFF601C8860 ( 20606) waiting 0.001115712 seconds, Msg handler
> > >> > allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
> > >> > (0xFFFFC9002163A2E0) (AllocManagerMutex)
> > >> > 0x7FFF91C10080 ( 14723) waiting 0.000959649 seconds, Msg handler
> > >> > allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> > >> > (AllocManagerMutex)
> > >> > 0x7FFFB03C2910 ( 12636) waiting 0.000769611 seconds, Msg handler
> > >> > allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> > >> > (AllocManagerMutex)
> > >> > 0x7FFF8C092850 ( 18215) waiting 0.000682275 seconds, Msg handler
> > >> > allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
> > >> > (0xFFFFC9002163A2E0) (AllocManagerMutex)
> > >> > 0x7FFF9423F730 ( 12652) waiting 0.000641915 seconds, Msg handler
> > >> > allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> > >> > (AllocManagerMutex)
> > >> > 0x7FFF9422D770 ( 12625) waiting 0.000494256 seconds, Msg handler
> > >> > allocMsgTypeRequestRegion: on ThMutex 0x1802163A2E0 (0xFFFFC9002163A2E0)
> > >> > (AllocManagerMutex)
> > >> > 0x7FFF9423E310 ( 12651) waiting 0.000437760 seconds, Msg handler
> > >> > allocMsgTypeRelinquishRegion: on ThMutex 0x1802163A2E0
> > >> > (0xFFFFC9002163A2E0) (AllocManagerMutex)
> > >> >
> > >> > I don't know if this data point is useful but both yesterday and today
> > >> > the metadata NSDs for this filesystem have had a constant aggregate
> > >> > stream of 25MB/s 4kop/s reads during each episode (very low latency
> > >> > though so I don't believe the storage is a bottleneck here). Writes are
> > >> > only a few hundred ops and didn't strike me as odd.
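> > >> >
> > >> > (that read stream is visible with something like the following on the
> > >> > NSD servers -- the exact output format varies a bit by release:)
> > >> >
> > >> >   # recent individual I/Os with size and latency
> > >> >   mmdiag --iohist
> > >> >   # or a one-shot aggregate sample via mmpmon
> > >> >   echo io_s | /usr/lpp/mmfs/bin/mmpmon -p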
> > >> >
> > >> > I have a PMR open for this but I'm curious if folks have seen this in
> > >> > the wild and what it might mean.
> > >> >
> > >> > -Aaron
> > >> >
> > >> > --
> > >> > Aaron Knister
> > >> > NASA Center for Climate Simulation (Code 606.2)
> > >> > Goddard Space Flight Center
> > >> > (301) 286-2776
> > >>
> > >> --
> > >> Aaron Knister
> > >> NASA Center for Climate Simulation (Code 606.2)
> > >> Goddard Space Flight Center
> > >> (301) 286-2776
> >
> > --
> > Aaron Knister
> > NASA Center for Climate Simulation (Code 606.2)
> > Goddard Space Flight Center
> > (301) 286-2776
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>