[gpfsug-discuss] waiting for conn rdmas < conn maxrdmas

Sven Oehme oehmes at gmail.com
Fri Feb 24 19:39:30 GMT 2017


It's more likely that you're running out of verbsRdmasPerNode, which is the
upper limit across all connections for a given node.
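
One way to check whether a node already raised this limit on its own is to look for the adjustment message in the mmfs log. Below is a minimal Python sketch, assuming the log wording matches the line quoted further down in this thread (other releases may phrase it differently). The function name and the sample string are illustrative only and are not part of any GPFS tooling.

```python
import re

# Pattern modelled on the log line quoted later in this thread. The exact
# wording on other releases is an assumption.
ADJUST_RE = re.compile(
    r"VERBS RDMA verbsRdmasPerNode increased\s+from\s+(\d+)\s+to\s+(\d+)"
)


def find_rdmas_per_node_adjustment(log_text):
    """Return (old, new) from the last matching log line, or None."""
    result = None
    for match in ADJUST_RE.finditer(log_text):
        # Keep overwriting so the most recent daemon start wins.
        result = (int(match.group(1)), int(match.group(2)))
    return result


# Sample line copied from the thread (unwrapped onto one line).
sample = (
    "2017-02-23_06:19:50.078-0800: [I] VERBS RDMA verbsRdmasPerNode increased "
    "from 3072 to 3740 because verbsRdmasPerNodeOptimize is set to yes."
)
print(find_rdmas_per_node_adjustment(sample))
```

If the function returns None for a node's log, that node is probably still running with whatever verbsRdmasPerNode value is configured, which would be the first thing to compare against the load it sees.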

Sven


On Fri, Feb 24, 2017 at 11:31 AM Aaron Knister <aaron.s.knister at nasa.gov>
wrote:

Interesting, thanks Sven!

Could the "resources" I'm running out of include NSD server queues?

On 2/23/17 12:12 PM, Sven Oehme wrote:
> All this waiter shows is that you have more in flight than the node or
> connection can currently serve. The cause can be a misconfiguration, or
> you have simply run out of resources on the node rather than on the
> connection. With the latest code you shouldn't see this anymore for node
> limits, as the system automatically adjusts the maximum number of RDMAs
> according to the node's capabilities.
>
> You should see messages in your mmfslog like:
>
> 2017-02-23_06:19:50.056-0800: [I] VERBS RDMA starting with
> verbsRdmaCm=no verbsRdmaSend=yes verbsRdmaUseMultiCqThreads=yes
> verbsRdmaUseCompVectors=yes
> 2017-02-23_06:19:50.078-0800: [I] VERBS RDMA library libibverbs.so
> (version >= 1.1) loaded and initialized.
> 2017-02-23_06:19:50.078-0800: [I] VERBS RDMA verbsRdmasPerNode increased
> from 3072 to 3740 because verbsRdmasPerNodeOptimize is set to yes.
> 2017-02-23_06:19:50.121-0800: [I] VERBS RDMA discover mlx5_5 port 1
> transport IB link  IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000013 id 0xE41D2D0300FDB9CD state ACTIVE
> 2017-02-23_06:19:50.137-0800: [I] VERBS RDMA discover mlx5_4 port 1
> transport IB link  IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000015 id 0xE41D2D0300FDB9CC state ACTIVE
> 2017-02-23_06:19:50.153-0800: [I] VERBS RDMA discover mlx5_3 port 1
> transport IB link  IB NUMA node  1 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000013 id 0xE41D2D0300FDB751 state ACTIVE
> 2017-02-23_06:19:50.169-0800: [I] VERBS RDMA discover mlx5_2 port 1
> transport IB link  IB NUMA node  1 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000015 id 0xE41D2D0300FDB750 state ACTIVE
> 2017-02-23_06:19:50.185-0800: [I] VERBS RDMA discover mlx5_1 port 1
> transport IB link  IB NUMA node  0 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000013 id 0xE41D2D0300FDB78D state ACTIVE
> 2017-02-23_06:19:50.201-0800: [I] VERBS RDMA discover mlx5_0 port 1
> transport IB link  IB NUMA node  0 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000015 id 0xE41D2D0300FDB78C state ACTIVE
>
> We want to eliminate all these configurable limits eventually. That
> takes time, but as you can see above, we make progress with each
> release :-)
>
> Sven
>
>
>
>
> On Thu, Feb 23, 2017 at 9:05 AM Aaron Knister <aaron.s.knister at nasa.gov>
> wrote:
>
>     On a particularly heavily loaded NSD server I'm seeing a lot of these
>     messages:
>
>     0x7FFFF08B63E0 (  15539) waiting 0.004139456 seconds, NSDThread: on
>     ThCond 0x7FFFA80772C8 (0x7FFFA80772C8) (VERBSEventWaitCondvar), reason
>     'waiting for conn rdmas < conn maxrdmas'
>     0x7FFFF08EED80 (  15584) waiting 0.004075718 seconds, NSDThread: on
>     ThCond 0x7FFF680008F8 (0x7FFF680008F8) (VERBSEventWaitCondvar), reason
>     'waiting for conn rdmas < conn maxrdmas'
>     0x7FFFF08FDF00 (  15596) waiting 0.003965504 seconds, NSDThread: on
>     ThCond 0x7FFF8C00E288 (0x7FFF8C00E288) (VERBSEventWaitCondvar), reason
>     'waiting for conn rdmas < conn maxrdmas'
>     0x7FFFF09185A0 (  15617) waiting 0.003916346 seconds, NSDThread: on
>     ThCond 0x7FFF9000CB18 (0x7FFF9000CB18) (VERBSEventWaitCondvar), reason
>     'waiting for conn rdmas < conn maxrdmas'
>     0x7FFFF092B380 (  15632) waiting 0.003659610 seconds, NSDThread: on
>     ThCond 0x1DB04B8 (0x1DB04B8) (VERBSEventWaitCondvar), reason 'waiting
>     for conn rdmas < conn maxrdmas'
>
>     I've tried tweaking verbsRdmasPerConnection but the issue seems to
>     persist. Has anyone encountered this, and if so, how did you fix it?
>
>     -Aaron
>
>     --
>     Aaron Knister
>     NASA Center for Climate Simulation (Code 606.2)
>     Goddard Space Flight Center
>     (301) 286-2776
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
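
To see how much of the waiting is really down to this one reason, the waiter lines quoted above can be tallied by reason string. The following is a rough Python sketch, assuming each waiter sits on a single line in the format shown in this thread (for example in output saved from `mmdiag --waiters`). The regular expression and the helper name are illustrative assumptions, not an official parser.

```python
import re
from collections import defaultdict

# Matches one waiter per line in the format quoted in this thread, e.g.
#   "... waiting 0.004139456 seconds, NSDThread: on ThCond ..., reason '...'"
WAITER_RE = re.compile(
    r"waiting\s+(?P<secs>\d+\.\d+)\s+seconds,\s+(?P<thread>\w+):"
    r".*?reason\s+'(?P<reason>[^']*)'"
)


def summarize_waiters(lines):
    """Map each waiter reason to (count, longest wait in seconds)."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = WAITER_RE.search(line)
        if match is None:
            continue  # not a waiter line, or an unexpected format
        entry = stats[match.group("reason")]
        entry[0] += 1
        entry[1] = max(entry[1], float(match.group("secs")))
    return {reason: (count, longest) for reason, (count, longest) in stats.items()}


# Two of the waiters from the thread, unwrapped onto single lines.
sample_lines = [
    "0x7FFFF08B63E0 (  15539) waiting 0.004139456 seconds, NSDThread: on "
    "ThCond 0x7FFFA80772C8 (0x7FFFA80772C8) (VERBSEventWaitCondvar), reason "
    "'waiting for conn rdmas < conn maxrdmas'",
    "0x7FFFF08EED80 (  15584) waiting 0.004075718 seconds, NSDThread: on "
    "ThCond 0x7FFF680008F8 (0x7FFF680008F8) (VERBSEventWaitCondvar), reason "
    "'waiting for conn rdmas < conn maxrdmas'",
]
for reason, (count, longest) in summarize_waiters(sample_lines).items():
    print(f"{count:5d}  longest {longest:.6f}s  {reason}")
```

Running this periodically on a busy NSD server would show whether the RDMA waiters dominate or whether other reasons (for example NSD queue waits) show up alongside them.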

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776