<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Aptos;}
@font-face
        {font-family:"Cascadia Code";
        panose-1:2 11 6 9 2 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        font-size:12.0pt;
        font-family:"Aptos",sans-serif;
        mso-ligatures:standardcontextual;
        mso-fareast-language:EN-US;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        font-size:12.0pt;
        font-family:"Aptos",sans-serif;
        mso-ligatures:standardcontextual;
        mso-fareast-language:EN-US;}
span.EmailStyle19
        {mso-style-type:personal-compose;
        font-family:"Aptos",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:939216039;
        mso-list-type:hybrid;
        mso-list-template-ids:-1044494532 134807553 134807555 134807557 134807553 134807555 134807557 134807553 134807555 134807557;}
@list l0:level1
        {mso-level-number-format:bullet;
        mso-level-text:\F0B7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level2
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level3
        {mso-level-number-format:bullet;
        mso-level-text:\F0A7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:Wingdings;}
@list l0:level4
        {mso-level-number-format:bullet;
        mso-level-text:\F0B7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level5
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level6
        {mso-level-number-format:bullet;
        mso-level-text:\F0A7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:Wingdings;}
@list l0:level7
        {mso-level-number-format:bullet;
        mso-level-text:\F0B7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level8
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level9
        {mso-level-number-format:bullet;
        mso-level-text:\F0A7;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-18.0pt;
        font-family:Wingdings;}
ol
        {margin-bottom:0cm;}
ul
        {margin-bottom:0cm;}
--></style>
</head>
<body lang="EN-GB" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Is anyone using RoCE with good results? We are planning on it, but initial tests are not great &#8211; we get much better performance using plain Ethernet over the exact same links.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">It&#8217;s up and working, I can see RDMA connections and counters, no errors, but performance is unstable. And worse than Ethernet, which was just meant to be a sanity check!<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Things I&#8217;ve looked at based on Lenovo and IBM guides, which I think are all configured correctly:<o:p></o:p></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">RoCE interfaces all on the same subnet<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">They all have IPv6 enabled with &nbsp;addresses using eui64 addr-gen-mode<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">DSCP trust mode on NICs<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">PFC flow control on NICs<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">Global Pause disabled on NICs<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">ToS configured for RDMA_CM<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">Source based routing for multiple interfaces on the same subnet.<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">Switches (nvidia cumulus) all enabled for RoCE QOS
<o:p></o:p></span></li></ul>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Iperf and GPFS over plain Ethernet get nearly 3GB/s, which is near the line speed of the NIC in question &#8211; 25Gbps. Testing basic RDMA connections with ib_send_bw gets about the same. But GPFS over RoCE gets
 from 0.7GB/s to 1.9GB/s.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The servers have 4x 200G Mellanox cards. The client has 1x 25G card. What&#8217;s frustrating and confusing is that we get better performance when we just enable 1 card at the server end, and also get better performance
 if we have 1 fabric ID per NIC on the server (with all 4 fabric ID on the same NIC at the client end).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I can go into more details if anyone has experience! Does this sound familiar to anyone? I am planning to open a call with Lenovo and/or IBM as I&#8217;m not quite sure where to look next.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Luke<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">--
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Luke Sudbery<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Principal Engineer (HPC and Storage).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Architecture, Infrastructure and Systems<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Advanced Research Computing, IT Services<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Room 132, Computer Centre G5, Elms Road<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Please note I don&#8217;t work on Monday.<o:p></o:p></span></b></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
</body>
</html>