[gpfsug-discuss] SGExceptionLogBufferFullThread waiter

Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] aaron.s.knister at nasa.gov
Sat Oct 15 18:07:42 BST 2016


Understood. Thank you for your help.

By the way, by poking at mmpmon's gfis counters I was able to figure out that the job is performing 20k per second each of inode creations, updates, and deletions across 64 nodes. There's my 60k IOPS on the backend. While I'm impressed (and not surprised) that GPFS can keep up with this... that's a pretty hefty workload.
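
For anyone curious, this is roughly how I sampled it. A minimal sketch, assuming mmpmon is in its usual location; the gfis request is the one I mention above, but the exact counter names vary by release, so treat the arithmetic in the comments as illustrative:

    # Ask mmpmon for global file system I/O statistics, two samples 10s
    # apart, in parseable (-p) form:
    echo gfis > /tmp/mmpmon.cmd
    /usr/lpp/mmfs/bin/mmpmon -i /tmp/mmpmon.cmd -p -r 2 -d 10000

    # Diff the inode create/update/delete counters between the two samples,
    # divide by 10 for per-second rates, and sum across the 64 nodes:
    #   ~20k creates/s + ~20k updates/s + ~20k deletes/s = ~60k metadata IOPS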



From: Olaf Weiser
Sent: 10/15/16, 12:47 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
Well, hard to say. 60K IOPS may or may not be a problem; it depends on your storage backend.

Check the response times to the physical disks on the NSD servers. Concerning the output you provided, look particularly at 10.1.53.5 <c0n20> and 10.1.53.7 <c0n22>.

If those response times are in the same (poor) range, then your storage back end is in trouble, or perhaps just too heavily utilized. If the response times to the physical disks on the NSD servers are OK, then the network between the clients and the NSD servers is probably where the trouble is.
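
For example (a rough sketch, run on the NSD servers that serve the NSDs behind those two addresses; it assumes iostat is installed there):

    # Recent I/O history as seen by GPFS, including the completion time
    # in ms for each physical I/O:
    /usr/lpp/mmfs/bin/mmdiag --iohist

    # Cross-check with the OS view of the underlying block devices
    # (watch the await / service-time columns, also in ms):
    iostat -x 5 3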




From:        Aaron Knister <aaron.s.knister at nasa.gov>
To:        <gpfsug-discuss at spectrumscale.org>
Date:        10/15/2016 08:28 AM
Subject:        Re: [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



It absolutely does, thanks Olaf!

The tasks running on this node are also running on 63 other nodes, and
together they're generating ~60K IOPS of metadata writes and, I *think*,
about the same in reads. Do you think that could be contributing to the
higher waiter times? I'm not quite sure what the job is up to. It's
seemingly doing very little data movement; the CPU %used is very low,
but the load is rather high.
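
(As a quick sanity check, sketched below with standard tools: low CPU with high load usually means the run queue is full of tasks stuck in uninterruptible I/O wait rather than anything compute-bound, which would fit a storage or network bottleneck.)

    # Count tasks in uninterruptible sleep (state D); these inflate the
    # load average without using any CPU:
    ps -eo state,pid,comm | awk '$1 == "D"' | wc -l
    cat /proc/loadavg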

-Aaron

On 10/15/16 11:23 AM, Olaf Weiser wrote:
> From your file system configuration (mmlsfs <dev> -L) you'll find the
> size of the log. Since release 4.x you can change it, but you need to
> re-mount the file system on every client to make the change effective.
>
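
(For reference, a minimal sketch of the commands being described here; the 32M value is purely hypothetical and the exact option letters may differ by release, so check the man pages for your level:)

    # Show the current internal log file size:
    mmlsfs <dev> -L

    # Raise it (hypothetical 32M value shown); since 4.x this can be
    # changed, but it only takes effect after every client re-mounts:
    mmchfs <dev> -L 32M
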
> When a client initiates writes/changes in GPFS, it needs to record those
> changes in the log. If the log reaches a certain fill level, GPFS
> triggers so-called logWrapThreads to write content out to disk and so
> free up space.
>
> With your given numbers (double-digit [ms] waiter times), your file
> system probably gets slowed down, and there is something suspect with
> the storage, because log I/Os are rather small and should not take that
> long.
>
> To give you an example from a healthy environment: the I/O times are so
> small that you usually don't see waiters for this.
>
> I/O start time   RW  Buf type   disk:sectorNum  nSec  time ms  tag1    tag2  Disk UID           typ  NSD node       context  thread
> ---------------  --  ---------  --------------  ----  -------  ----  ------  -----------------  ---  -------------  -------  ------------------------------
> 06:23:32.358851   W  logData    2:524306424        8    0.439     0       0  C0A70D08:57CF40D1  cli  192.167.20.17  LogData  SGExceptionLogBufferFullThread
> 06:23:33.576367   W  logData    1:524257280        8    0.646     0       0  C0A70D08:57CF40D0  cli  192.167.20.16  LogData  SGExceptionLogBufferFullThread
> 06:23:32.212426   W  iallocSeg  1:524490048       64    0.733     2     245  C0A70D08:57CF40D0  cli  192.167.20.16  Logwrap  LogWrapHelperThread
> 06:23:32.212412   W  logWrap    2:524552192        8    0.755     0  179200  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212432   W  logWrap    2:525162760        8    0.737     0  125473  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212416   W  iallocSeg  2:524488384       64    0.763     2     347  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212414   W  logWrap    2:525266944        8    2.160     0  177664  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
>
>
> hope this helps ..
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage
> Platform,
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> IBM Allee 1
> 71139 Ehningen
> Phone: +49-170-579-44-66
> E-Mail: olaf.weiser at de.ibm.com
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland GmbH / Chairman of the Supervisory Board: Martin Jetter
> Managing Directors: Martina Koederitz (Chair), Susanne Peter,
> Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
> Registered office: Ehningen / Registration court: Amtsgericht
> Stuttgart, HRB 14562 / WEEE-Reg.-No. DE 99369940
>
>
>
> From:        Aaron Knister <aaron.s.knister at nasa.gov>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        10/15/2016 07:23 AM
> Subject:        [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> I've got a node that's got some curious waiters on it (see below). Could
> someone explain what the "SGExceptionLogBufferFullThread" waiter means?
>
> Thanks!
>
> -Aaron
>
> === mmdiag: waiters ===
> 0x7FFFF040D600 waiting 0.038822715 seconds,
> SGExceptionLogBufferFullThread: on ThCond 0x7FFFDBB07628
> (0x7FFFDBB07628) (parallelWaitCond), reason 'wait for parallel write'
> for NSD I/O completion on node 10.1.53.5 <c0n20>
> 0x7FFFE83F3D60 waiting 0.039629116 seconds, CleanBufferThread: on ThCond
> 0x17B1488 (0x17B1488) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O
> completion on node 10.1.53.7 <c0n22>
> 0x7FFFE8373A90 waiting 0.038921480 seconds, CleanBufferThread: on ThCond
> 0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
> 'force wait on force active buffer write'
> 0x42CD9B0 waiting 0.028227004 seconds, CleanBufferThread: on ThCond
> 0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
> 'force wait for buffer write to complete'
> 0x7FFFE0F0EAD0 waiting 0.027864343 seconds, CleanBufferThread: on ThCond
> 0x7FFFDC0EEA88 (0x7FFFDC0EEA88) (MsgRecordCondvar), reason 'RPC wait'
> for NSD I/O completion on node 10.1.53.7 <c0n22>
> 0x1575560 waiting 0.028045975 seconds, RemoveHandlerThread: on ThCond
> 0x18020CE4E08 (0xFFFFC90020CE4E08) (LkObjCondvar), reason 'waiting for
> LX lock'
> 0x1570560 waiting 0.038724949 seconds, CreateHandlerThread: on ThCond
> 0x18020CE50A0 (0xFFFFC90020CE50A0) (LkObjCondvar), reason 'waiting for
> LX lock'
> 0x1563D60 waiting 0.073919918 seconds, RemoveHandlerThread: on ThCond
> 0x180235F6440 (0xFFFFC900235F6440) (LkObjCondvar), reason 'waiting for
> LX lock'
> 0x1561560 waiting 0.054854513 seconds, RemoveHandlerThread: on ThCond
> 0x1802292D200 (0xFFFFC9002292D200) (LkObjCondvar), reason 'waiting for
> LX lock'
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



