<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;}
@font-face
{font-family:"Cascadia Code";
panose-1:2 11 6 9 2 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:12.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;
mso-fareast-language:EN-US;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
font-size:12.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;
mso-fareast-language:EN-US;}
span.EmailStyle19
{mso-style-type:personal-compose;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:939216039;
mso-list-type:hybrid;
mso-list-template-ids:-1044494532 134807553 134807555 134807557 134807553 134807555 134807557 134807553 134807555 134807557;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style>
</head>
<body lang="EN-GB" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Is anyone using RoCE with good results? We are planning on it, but initial tests are not great – we get much better performance using plain Ethernet over the exact same links.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">It’s up and working, I can see RDMA connections and counters, no errors, but performance is unstable. And worse than Ethernet, which was just meant to be a sanity check!<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Things I’ve looked at based on Lenovo and IBM guides, which I think are all configured correctly:<o:p></o:p></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">RoCE interfaces all on the same subnet<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">They all have IPv6 enabled with addresses using eui64 addr-gen-mode<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">DSCP trust mode on NICs<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">PFC flow control on NICs<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">Global Pause disabled on NICs<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">ToS configured for RDMA_CM<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">Source based routing for multiple interfaces on the same subnet.<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1"><span style="font-size:11.0pt">Switches (nvidia cumulus) all enabled for RoCE QOS
<o:p></o:p></span></li></ul>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Iperf and GPFS over plain Ethernet get nearly 3GB/s, which is near the line speed of the NIC in question – 25Gbps. Testing basic RDMA connections with ib_send_bw gets about the same. But GPFS over RoCE gets
from 0.7GB/s to 1.9GB/s.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The servers have 4x 200G Mellanox cards. The client has 1x 25G card. What’s frustrating and confusing is that we get better performance when we just enable 1 card at the server end, and also get better performance
if we have 1 fabric ID per NIC on the server (with all 4 fabric ID on the same NIC at the client end).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I can go into more details if anyone has experience! Does this sound familiar to anyone? I am planning to open a call with Lenovo and/or IBM as I’m not quite sure where to look next.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Luke<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">--
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Luke Sudbery<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Principal Engineer (HPC and Storage).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Architecture, Infrastructure and Systems<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Advanced Research Computing, IT Services<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Room 132, Computer Centre G5, Elms Road<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:9.0pt;color:#1F497D;mso-ligatures:none;mso-fareast-language:EN-GB">Please note I don’t work on Monday.<o:p></o:p></span></b></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>