<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
p.Code, li.Code, div.Code
{mso-style-name:Code;
mso-style-link:"Code Char";
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
background:#FFF2CC;
font-size:11.0pt;
font-family:"Courier New";
mso-fareast-language:EN-US;}
span.CodeChar
{mso-style-name:"Code Char";
mso-style-link:Code;
font-family:"Courier New";
background:#FFF2CC;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">On further investigation the command does eventually complete, after 11 minutes rather than a couple of seconds.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="Code"><span style="color:black;mso-fareast-language:EN-GB">[root@rds-pg-dssg01 ~]# time mmvdisk pdisk list --rg rds_er_dssg02 --not-ok</span><span style="mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="Code"><span style="color:black;mso-fareast-language:EN-GB">mmvdisk: All pdisks of recovery group 'rds_er_dssg02' are ok.</span><span style="mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="Code"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="Code"><span style="color:black;mso-fareast-language:EN-GB">real 11m14.106s</span><span style="mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="Code"><span style="color:black;mso-fareast-language:EN-GB">user 0m1.430s</span><span style="mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="Code"><span style="color:black;mso-fareast-language:EN-GB">sys 0m0.555s</span><span style="mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="Code"><span style="color:black;mso-fareast-language:EN-GB">[root@rds-pg-dssg01 ~]#</span><span style="mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Looking at the process tree, the bits that hang are:<o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]# time tslsrecgroup rds_er_dssg02 -Y --v2 --failure-domain</span><o:p></o:p></p>
<p class="Code"><span style="color:black">Failed to connect to file system daemon: Connection timed out</span><o:p></o:p></p>
<p class="Code"><o:p> </o:p></p>
<p class="Code"><span style="color:black">real 5m30.181s</span><o:p></o:p></p>
<p class="Code"><span style="color:black">user 0m0.001s</span><o:p></o:p></p>
<p class="Code"><span style="color:black">sys 0m0.003s</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">and then<o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]# time tslspdisk --recovery-group rds_er_dssg02 --notOK</span><o:p></o:p></p>
<p class="Code"><span style="color:black">Failed to connect to file system daemon: Connection timed out</span><o:p></o:p></p>
<p class="Code"><o:p> </o:p></p>
<p class="Code"><span style="color:black">real 5m30.247s</span><o:p></o:p></p>
<p class="Code"><span style="color:black">user 0m0.003s</span><o:p></o:p></p>
<p class="Code"><span style="color:black">sys 0m0.002s</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]#</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Which adds up to the 11 minutes.... then it does something else and just works. Or maybe it <i>doesn't work</i> and just wouldn't report any failed disks if there were any….<o:p></o:p></p>
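<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">As a rough check on that arithmetic, here is a minimal sketch that adds the two timeouts from the timings above and compares them with the total. The helper name to_ms is made up for illustration, and it assumes the usual XmY.ZZZs format printed by time.</p>
<pre>
```shell
# Convert a bash time value such as 5m30.181s into whole milliseconds.
# to_ms is a made-up helper name, not part of any GPFS tooling.
to_ms() {
  echo "$1" | awk -F'[ms]' '{ printf "%d\n", ($1 * 60 + $2) * 1000 + 0.5 }'
}

recgroup_ms=$(to_ms 5m30.181s)   # tslsrecgroup timeout
pdisk_ms=$(to_ms 5m30.247s)      # tslspdisk timeout
total_ms=$(to_ms 11m14.106s)     # whole mmvdisk run

timeouts_ms=$(( recgroup_ms + pdisk_ms ))
other_ms=$(( total_ms - timeouts_ms ))
echo "two timeouts: ${timeouts_ms} ms, everything else: ${other_ms} ms"
# prints: two timeouts: 660428 ms, everything else: 13678 ms
```
</pre>
<p class="MsoNormal">So the two timeouts account for about 11m00s, and whatever runs afterwards takes roughly 14 seconds.</p>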
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">While hanging, the ts commands appear to be LISTENing, not attempting to make connections:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]# pidof tslspdisk</span><o:p></o:p></p>
<p class="Code"><span style="color:black">2156809</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]# netstat -apt | grep 2156809</span><o:p></o:p></p>
<p class="Code"><span style="color:black">tcp 0 0 0.0.0.0:60000 0.0.0.0:* LISTEN 2156809/tslspdisk</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]#</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Port 60000 is the lowest of our <span class="CodeChar"><span style="color:black">tscCmdPortRange</span></span>.<o:p></o:p></p>
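<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">To double-check the "listening, not connecting" reading, a small sketch like the one below could tally every TCP socket state owned by the hanging process. The function name socket_states is hypothetical, and it assumes the usual netstat column layout (state in column 6, PID/program in column 7).</p>
<pre>
```shell
# Tally TCP socket states for one PID from netstat-style output on stdin.
# socket_states is a made-up helper name for this sketch.
socket_states() {
  awk -v pid="$1" '
    $1 ~ /^tcp/ {
      split($7, owner, "/")             # column 7 looks like PID/program
      if (owner[1] == pid) seen[$6]++   # column 6 is the socket state
    }
    END { for (state in seen) print state, seen[state] }
  '
}

# Intended use while the command is hanging (needs root for -p):
#   netstat -antp | socket_states "$(pidof tslspdisk)"
```
</pre>
<p class="MsoNormal">If only LISTEN shows up, with no SYN_SENT or ESTABLISHED entries, that would back up the observation above.</p>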
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Don’t know if that helps anyone….<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Cheers,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Luke<o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">--
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">Luke Sudbery<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">Principal Engineer (HPC and Storage).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">Architecture, Infrastructure and Systems<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">Advanced Research Computing, IT Services<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">Room 132, Computer Centre G5, Elms Road<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:9.0pt;color:#1F497D;mso-fareast-language:EN-GB">Please note I don’t work on Monday.<o:p></o:p></span></b></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:EN-GB">From:</span></b><span lang="EN-US" style="mso-fareast-language:EN-GB"> gpfsug-discuss &lt;gpfsug-discuss-bounces@gpfsug.org&gt;
<b>On Behalf Of </b>Luke Sudbery<br>
<b>Sent:</b> 17 March 2023 15:11<br>
<b>To:</b> gpfsug-discuss@gpfsug.org<br>
<b>Subject:</b> [gpfsug-discuss] mmvdisk version/communication issues?<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hello,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We have 3 Lenovo DSSG “Building Blocks” as they call them – 2x GNR server pairs.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We’ve just upgraded the 1st of them from 3.2a (GPFS 5.1.1.0) to 4.3a (5.1.5.1 efix 20).<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Now the older systems can’t communicate with the newer ones in certain circumstances, specifically when querying recovery groups hosted on other servers.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">It works old->old, new->old and new->new but not old->new.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m fairly sure it is not a TCP comms problem. I can ssh between the nodes as root and as the GPFS
<span class="CodeChar"><span style="color:black">sudoUser</span></span>. Port 1191 and the
<span class="CodeChar"><span style="color:black">tscCmdPortRange</span></span> are open and accessible in both directions between the nodes. There are connections present between the nodes in netstat and in
<span class="CodeChar"><span style="color:black">mmfsd.latest.log</span></span>. There are no pending messages (to that node) in
<span class="CodeChar"><span style="color:black">mmdiag --network</span></span>.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In these examples <span class="CodeChar"><span style="color:black">rds-er-dssg01/2</span></span> are upgraded,
<span class="CodeChar"><span style="color:black">rds-pg-dssg01/2</span></span> are downlevel:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="Code"><span style="color:black">[root@rds-er-dssg01 ~]# mmvdisk pdisk list --rg rds_er_dssg02 --not-ok # New to new</span><o:p></o:p></p>
<p class="Code"><span style="color:black">mmvdisk: All pdisks of recovery group 'rds_er_dssg02' are ok.</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-er-dssg01 ~]# mmvdisk pdisk list --rg rds_pg_dssg02 --not-ok # New to old</span><o:p></o:p></p>
<p class="Code"><span style="color:black">mmvdisk: All pdisks of recovery group 'rds_pg_dssg02' are ok.</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-er-dssg01 ~]#</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]# mmvdisk pdisk list --rg rds_pg_dssg02 --not-ok # Old to old</span><o:p></o:p></p>
<p class="Code"><span style="color:black">mmvdisk: All pdisks of recovery group 'rds_pg_dssg02' are ok.</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]# mmvdisk pdisk list --rg rds_er_dssg02 --not-ok # Old to new [HANGS]</span><o:p></o:p></p>
<p class="Code"><span style="color:black">^Cmmvdisk: Command failed. Examine previous error messages to determine cause.</span><o:p></o:p></p>
<p class="Code"><span style="color:black">[root@rds-pg-dssg01 ~]#</span><o:p></o:p></p>
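<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">To repeat these four checks without sitting on a hung terminal, something like the sketch below could wrap each command in the coreutils timeout utility. The function name classify and the 30-second limit are assumptions for illustration only, not anything mmvdisk provides.</p>
<pre>
```shell
# Run a command with a time limit and report OK, HANG or FAILED.
# classify is a made-up helper name; timeout exits with 124 when the limit is hit.
classify() {
  limit=$1
  shift
  timeout "$limit" "$@" >/dev/null 2>/dev/null
  rc=$?
  case $rc in
    0)   echo "OK" ;;
    124) echo "HANG (no result within ${limit}s)" ;;
    *)   echo "FAILED (exit $rc)" ;;
  esac
}

# Intended use on each node, once per recovery group:
#   classify 30 mmvdisk pdisk list --rg rds_er_dssg02 --not-ok
```
</pre>
<p class="MsoNormal">With a 30-second limit the old-to-new case should come back as HANG, since it takes around 11 minutes to finish.</p>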
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Has anyone come across this? mmvdisk should work across slightly different versions of 5.1, right? No recovery group, cluster or filesystem versions have been changed yet.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We will also log a ticket with snaps and more info but wondered if anyone had seen this.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">And while this particular command is not a major issue, we don’t know what else it may affect before we proceed with the rest of the cluster.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Many thanks,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Luke<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>