[gpfsug-discuss] Waiter identification help - Quota related
Aaron Knister
aaron.s.knister at nasa.gov
Fri Jan 27 01:26:49 GMT 2017
This might be a stretch, but do you happen to have a user/fileset/group
over its hard quota, or over its soft quota with the grace period expired?
We've had this really upset our cluster before. At least with 3.5, each op
that's done against an over-quota user/group/fileset results in at least
one RPC from the fs manager to every node in the cluster.
Are those waiters from an fs manager node? If so, perhaps briefly fire up
tracing (/usr/lpp/mmfs/bin/mmtrace start), let it run for ~10 seconds,
then stop it (/usr/lpp/mmfs/bin/mmtrace stop) and grep for
"TRACE_QUOTA" in the resulting trcrpt file. If you see a bunch of
lines that contain:
TRACE_QUOTA: qu.server revoke reply type
that might be what's going on. You can also see the behavior if you look
at the output of mmdiag --network on your fs manager nodes and see a
bunch of RPCs with all of your cluster nodes listed as the recipients.
I can't recall the name of the RPC you're looking for, though.
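
If grepping by hand gets tedious, the check above could be sketched roughly
like this in Python. This is only an illustration: the function name is made
up, and it assumes the trcrpt file is plain text with one trace record per
line. It counts how many lines mention TRACE_QUOTA and how many of those are
the revoke replies.

```python
from collections import Counter

# Marker string quoted from the trace output described above.
REVOKE_MARKER = "TRACE_QUOTA: qu.server revoke reply type"


def summarize_quota_trace(lines):
    """Return (quota_lines, revoke_reply_lines) for an iterable of trace lines.

    Hypothetical helper: assumes one trace record per line of plain text.
    """
    counts = Counter()
    for line in lines:
        if "TRACE_QUOTA" in line:
            counts["quota"] += 1
            if REVOKE_MARKER in line:
                counts["revoke_reply"] += 1
    return counts["quota"], counts["revoke_reply"]
```

You would pass it an open file handle for whatever trcrpt file mmtrace
produced on the fs manager. A large share of revoke replies in a ~10 second
trace would be consistent with the over-quota behavior, though it isn't
proof on its own.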
Hope that helps!
-Aaron
On 1/26/17 7:57 PM, Oesterlin, Robert wrote:
> OK, I have a sick cluster, and it seems to be tied up with quota related
> RPCs like this. Any help in narrowing down what the issue is?
>
>
>
> Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler
> quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler
> quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
>
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> 507-269-0413
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776