[gpfsug-discuss] SGExceptionLogBufferFullThread waiter

Aaron Knister aaron.s.knister at nasa.gov
Sat Oct 15 16:27:42 BST 2016


It absolutely does, thanks Olaf!

The tasks running on these nodes are also running on 63 other nodes and 
generating ~60K IOPS of metadata writes and, I *think*, about the same in 
reads. Do you think that could be contributing to the higher waiter 
times? I'm not sure quite what the job is up to. It's seemingly doing 
very little data movement; the CPU %used is very low, but the load is 
rather high.
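
For what it's worth, a minimal sketch of how one might sanity-check those 
rates per node with mmpmon (assuming it reads requests from stdin when -i 
isn't given; the repeat count and delay are just illustrative):

    # Per-file-system cumulative I/O counters, 10 samples at 1-second intervals.
    echo "fs_io_s" | mmpmon -r 10 -d 1000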

-Aaron

On 10/15/16 11:23 AM, Olaf Weiser wrote:
> From your file system configuration (mmlsfs <dev> -L) you'll find the
> size of the log.
> Since release 4.x you can change it, but you need to re-mount the FS on
> every client to make the change effective ...
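>
> A minimal sketch of the two commands involved (the file system name
> gpfs1 and the 32M size are just placeholders; check the mmchfs man page
> for the valid range on your release):
>
>     mmlsfs gpfs1 -L       # show the current internal log file size
>     mmchfs gpfs1 -L 32M   # change it; effective after re-mount on each client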
>
> When a client initiates writes/changes to GPFS, it needs to record those
> changes in its log. If the log reaches a certain fill level, GPFS
> triggers so-called LogWrapThreads to write content out to disk and so
> free up space.
>
> With your given numbers (double-digit [ms] waiter times), your file
> system probably gets slowed down, and there's something suspect with the
> storage, because log I/Os are rather small and should not take that long.
>
> To give you an example from a healthy environment: the I/O times are so
> small that you usually don't see waiters for this at all.
>
> I/O start time   RW  Buf type   disk:sectorNum  nSec  time ms  tag1    tag2  Disk UID           typ  NSD node       context  thread
> ---------------  --  ---------  --------------  ----  -------  ----  ------  -----------------  ---  -------------  -------  ------------------------------
> 06:23:32.358851  W   logData    2:524306424        8    0.439     0       0  C0A70D08:57CF40D1  cli  192.167.20.17  LogData  SGExceptionLogBufferFullThread
> 06:23:33.576367  W   logData    1:524257280        8    0.646     0       0  C0A70D08:57CF40D0  cli  192.167.20.16  LogData  SGExceptionLogBufferFullThread
> 06:23:32.212426  W   iallocSeg  1:524490048       64    0.733     2     245  C0A70D08:57CF40D0  cli  192.167.20.16  Logwrap  LogWrapHelperThread
> 06:23:32.212412  W   logWrap    2:524552192        8    0.755     0  179200  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212432  W   logWrap    2:525162760        8    0.737     0  125473  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212416  W   iallocSeg  2:524488384       64    0.763     2     347  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212414  W   logWrap    2:525266944        8    2.160     0  177664  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
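>
> A trace like the one above can be dumped on any node with the I/O
> history facility; a minimal sketch (the egrep filter just picks out the
> log-related buffer types from the columns shown above):
>
>     # Show recent per-I/O latencies on this node; watch the "time ms" column.
>     mmdiag --iohist | egrep 'logData|logWrap|iallocSeg'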
>
>
> hope this helps ..
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage
> Platform
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> IBM Allee 1
> 71139 Ehningen
> Phone: +49-170-579-44-66
> E-Mail: olaf.weiser at de.ibm.com
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter,
> Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
> Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
>
>
>
> From:        Aaron Knister <aaron.s.knister at nasa.gov>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        10/15/2016 07:23 AM
> Subject:        [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> I've got a node that's got some curious waiters on it (see below). Could
> someone explain what the "SGExceptionLogBufferFullThread" waiter means?
>
> Thanks!
>
> -Aaron
>
> === mmdiag: waiters ===
> 0x7FFFF040D600 waiting 0.038822715 seconds,
> SGExceptionLogBufferFullThread: on ThCond 0x7FFFDBB07628
> (0x7FFFDBB07628) (parallelWaitCond), reason 'wait for parallel write'
> for NSD I/O completion on node 10.1.53.5 <c0n20>
> 0x7FFFE83F3D60 waiting 0.039629116 seconds, CleanBufferThread: on ThCond
> 0x17B1488 (0x17B1488) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O
> completion on node 10.1.53.7 <c0n22>
> 0x7FFFE8373A90 waiting 0.038921480 seconds, CleanBufferThread: on ThCond
> 0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
> 'force wait on force active buffer write'
> 0x42CD9B0 waiting 0.028227004 seconds, CleanBufferThread: on ThCond
> 0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
> 'force wait for buffer write to complete'
> 0x7FFFE0F0EAD0 waiting 0.027864343 seconds, CleanBufferThread: on ThCond
> 0x7FFFDC0EEA88 (0x7FFFDC0EEA88) (MsgRecordCondvar), reason 'RPC wait'
> for NSD I/O completion on node 10.1.53.7 <c0n22>
> 0x1575560 waiting 0.028045975 seconds, RemoveHandlerThread: on ThCond
> 0x18020CE4E08 (0xFFFFC90020CE4E08) (LkObjCondvar), reason 'waiting for
> LX lock'
> 0x1570560 waiting 0.038724949 seconds, CreateHandlerThread: on ThCond
> 0x18020CE50A0 (0xFFFFC90020CE50A0) (LkObjCondvar), reason 'waiting for
> LX lock'
> 0x1563D60 waiting 0.073919918 seconds, RemoveHandlerThread: on ThCond
> 0x180235F6440 (0xFFFFC900235F6440) (LkObjCondvar), reason 'waiting for
> LX lock'
> 0x1561560 waiting 0.054854513 seconds, RemoveHandlerThread: on ThCond
> 0x1802292D200 (0xFFFFC9002292D200) (LkObjCondvar), reason 'waiting for
> LX lock'
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss



More information about the gpfsug-discuss mailing list