[gpfsug-discuss] gpfsug-discuss Digest, Vol 102, Issue 12

Prasad Surampudi prasad.surampudi at theatsgroup.com
Thu Jul 23 14:33:13 BST 2020


Hi Yaron,

Please see the outputs of mmlsconfig and ibstat below:

sudo /usr/lpp/mmfs/bin/mmlsconfig |grep -i verbs

verbsRdmasPerNode 192
verbsRdma enable
verbsRdmaSend yes
verbsRdmasPerConnection 48
verbsRdmasPerConnection 16
verbsPorts mlx5_4/1/1 mlx5_5/1/2
verbsPorts mlx4_0/1/0 mlx4_0/2/0
verbsPorts mlx5_0/1/1 mlx5_1/1/2
verbsPorts mlx5_0/1/1 mlx5_2/1/2
verbsPorts mlx5_2/1/1 mlx5_3/1/2
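To see which of the devices named in these verbsPorts entries (device/port/fabric triplets) actually have usable ports, the standard Linux RDMA sysfs tree can be walked directly. This is a minimal sketch; show_rdma_ports is a hypothetical helper, and the optional root argument exists only so it can be exercised against a fake tree:

```shell
# Sketch: print each RDMA device port's state and link layer from the
# standard sysfs layout, to cross-check against the device/port pairs
# named in verbsPorts. Ports reporting "Ethernet" are RoCE ports and
# will not match an InfiniBand fabric; Down ports are unusable.
show_rdma_ports() {
    root="${1:-/sys/class/infiniband}"
    for ca in "$root"/*/; do
        [ -d "$ca" ] || continue
        dev=$(basename "$ca")
        for port in "$ca"ports/*/; do
            [ -d "$port" ] || continue
            printf '%s port %s: %s, %s\n' "$dev" "$(basename "$port")" \
                "$(cat "${port}state")" "$(cat "${port}link_layer")"
        done
    done
}

show_rdma_ports
```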


ibstat output on NSD server:


CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0x506b4b03000fdb74
System image GUID: 0x506b4b03000fdb74
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0x526b4bfffe0fdb74
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0x506b4b03000fdb75
System image GUID: 0x506b4b03000fdb74
Port 1:
State: Down
Physical state: Disabled
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0x526b4bfffe0fdb75
Link layer: Ethernet
CA 'mlx5_2'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xec0d9a0300a7e928
System image GUID: 0xec0d9a0300a7e928
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0x526b4bfffe0fdb74
Link layer: Ethernet
CA 'mlx5_3'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xec0d9a0300a7e929
System image GUID: 0xec0d9a0300a7e928
Port 1:
State: Down
Physical state: Disabled
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0xee0d9afffea7e929
Link layer: Ethernet
CA 'mlx5_4'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xec0d9a0300da5f92
System image GUID: 0xec0d9a0300da5f92
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 13
LMC: 0
SM lid: 1
Capability mask: 0x2651e848
Port GUID: 0xec0d9a0300da5f92
Link layer: InfiniBand
CA 'mlx5_5'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xec0d9a0300da5f93
System image GUID: 0xec0d9a0300da5f92
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 6
LMC: 0
SM lid: 1
Capability mask: 0x2651e848
Port GUID: 0xec0d9a0300da5f93
Link layer: InfiniBand


ibstat output on CES server:


CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.22.4030
Hardware version: 0
Node GUID: 0xb88303ffff5ec6ec
System image GUID: 0xb88303ffff5ec6ec
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 9
LMC: 0
SM lid: 1
Capability mask: 0x2651e848
Port GUID: 0xb88303ffff5ec6ec
Link layer: InfiniBand
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.22.4030
Hardware version: 0
Node GUID: 0xb88303ffff5ec6ed
System image GUID: 0xb88303ffff5ec6ec
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 12
LMC: 0
SM lid: 1
Capability mask: 0x2651e848
Port GUID: 0xb88303ffff5ec6ed
Link layer: InfiniBand
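As a quick cross-check of the dumps above, the CAs worth naming in verbsPorts on an InfiniBand fabric are the ones whose port is Active with an InfiniBand link layer. A sketch (active_ib_cas is a hypothetical helper, assuming ibstat's usual "CA 'name'" / "State:" / "Link layer:" layout):

```shell
# Sketch: filter ibstat output down to CAs with an Active port whose
# link layer is InfiniBand, skipping Down ports and RoCE/Ethernet ports.
active_ib_cas() {
    awk -F"'" '
        /^CA /                    { ca = $2; active = 0 }
        /State: *Active/          { active = 1 }
        /State: *Down/            { active = 0 }
        /Link layer: *InfiniBand/ { if (active) print ca }
    '
}

# Usage: ibstat | active_ib_cas
```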



Prasad Surampudi | Sr. Systems Engineer | ATS Group, LLC <http://www.theatsgroup.com/>


________________________________
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> on behalf of gpfsug-discuss-request at spectrumscale.org <gpfsug-discuss-request at spectrumscale.org>
Sent: Thursday, July 23, 2020 3:09 AM
To: gpfsug-discuss at spectrumscale.org <gpfsug-discuss at spectrumscale.org>
Subject: gpfsug-discuss Digest, Vol 102, Issue 12

Send gpfsug-discuss mailing list submissions to
        gpfsug-discuss at spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
        gpfsug-discuss-request at spectrumscale.org

You can reach the person managing the list at
        gpfsug-discuss-owner at spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

   1. Spectrum Scale pagepool size with RDMA (Prasad Surampudi)
   2. Re: Spectrum Scale pagepool size with RDMA (Yaron Daniel)


----------------------------------------------------------------------

Message: 1
Date: Thu, 23 Jul 2020 00:34:02 +0000
From: Prasad Surampudi <prasad.surampudi at theatsgroup.com>
To: "gpfsug-discuss at spectrumscale.org"
        <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Spectrum Scale pagepool size with RDMA
Message-ID:
        <MN2PR13MB2976E6B29CDE6EC77B9B2B529E790 at MN2PR13MB2976.namprd13.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

Hi,

We have an ESS cluster with two CES nodes. The pagepool is set to 128 GB (real memory is 256 GB) on both the ESS NSD servers and the CES nodes. Occasionally mmfsd process memory usage reaches 90% on the NSD servers and CES nodes and stays there until GPFS is recycled. I have a couple of questions about this scenario:


  1.  What are the general recommendations for pagepool size on nodes with RDMA enabled? The IBM Knowledge Center page on RDMA tuning says "If the GPFS pagepool is set to 32 GB, then the mapping of the RDMA for this pagepool must be at least 64 GB." Does this mean the pagepool can't be more than half of real memory with RDMA enabled? And is this why mmfsd memory usage exceeds the pagepool size and spikes to almost 90%?
  2.  If we don't want to see high mmfsd process memory usage on the NSD/CES nodes, should we decrease the pagepool size?
  3.  Can we tune the log_num_mtt parameter to limit memory usage? Currently it's set to 0 on both the NSD (ppc64le) and CES (x86_64) nodes.
  4.  We also see messages like "Verbs RDMA disabled for xx.xx.xx.xx due to no matching port found". Any idea what this message indicates? I don't see any "Verbs RDMA enabled" message after these warnings. Does RDMA get re-enabled automatically?
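On question 3, one point of reference: log_num_mtt is a module parameter of the mlx4_core driver (ConnectX-3 generation); mlx5 devices such as the MT4115 size their MTT tables on demand, so the knob only matters where mlx4 HCAs are in play. A back-of-the-envelope sketch of the bound it controls (mtt_limit_gib is a hypothetical helper):

```shell
# Sketch (mlx4_core only): registrable memory is bounded by
#   2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE.
mtt_limit_gib() {
    # $1 = log_num_mtt, $2 = log_mtts_per_seg, $3 = page size in bytes
    awk -v n="$1" -v s="$2" -v p="$3" \
        'BEGIN { printf "%d\n", 2^n * 2^s * p / 2^30 }'
}

mtt_limit_gib 24 3 4096   # -> 512 (GiB registrable with these settings)
```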


Prasad Surampudi | Sr. Systems Engineer | ATS Group, LLC <http://www.theatsgroup.com/>



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200723/e2eee8fe/attachment-0001.html>

------------------------------

Message: 2
Date: Thu, 23 Jul 2020 10:09:17 +0300
From: "Yaron Daniel" <YARD at il.ibm.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale pagepool size with RDMA
Message-ID:
        <OFBF335AE3.531C9498-ONC22585AE.00273AFB-C22585AE.00274DB2 at notes.na.collabserv.com>

Content-Type: text/plain; charset="iso-8859-1"

Hi


What is the output for:
#mmlsconfig |grep -i verbs
#ibstat


Regards





Yaron Daniel
Storage Architect - IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672
Fax: +972-3-916-5672
Mobile: +972-52-8395593
E-mail: yard at il.ibm.com
Webex: https://ibm.webex.com/meet/yard










From:   Prasad Surampudi <prasad.surampudi at theatsgroup.com>
To:     "gpfsug-discuss at spectrumscale.org"
<gpfsug-discuss at spectrumscale.org>
Date:   07/23/2020 03:34 AM
Subject:        [EXTERNAL] [gpfsug-discuss] Spectrum Scale pagepool size
with RDMA
Sent by:        gpfsug-discuss-bounces at spectrumscale.org





_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss





-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20200723/ddd59a6a/attachment.html>

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 102, Issue 12
***********************************************

