<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri",sans-serif;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">I was wondering why that 0 was left alone on that line... hahaha.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">-B<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> gpfsug-discuss-bounces@spectrumscale.org [mailto:gpfsug-discuss-bounces@spectrumscale.org]
<b>On Behalf Of </b>Simon Thompson (IT Research Support)<br>
<b>Sent:</b> Thursday, May 11, 2017 1:05 PM<br>
<b>To:</b> gpfsug main discussion list &lt;gpfsug-discuss@spectrumscale.org&gt;<br>
<b>Subject:</b> Re: [gpfsug-discuss] Edge case failure mode<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Cheers Bryan ...<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><a href="http://goo.gl/YXitIF">http://goo.gl/YXitIF</a><o:p></o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Points to: (Outlook/mailing list is line breaking and cutting the trailing 0)<o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><a href="https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030">https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030</a><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black">Simon<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
</div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="color:black">From: </span></b><span style="color:black">&lt;<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a>&gt; on behalf of "<a href="mailto:bbanister@jumptrading.com">bbanister@jumptrading.com</a>"
&lt;<a href="mailto:bbanister@jumptrading.com">bbanister@jumptrading.com</a>&gt;<br>
<b>Reply-To: </b>"<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>" &lt;<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>&gt;<br>
<b>Date: </b>Thursday, 11 May 2017 at 18:58<br>
<b>To: </b>"<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>" &lt;<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>&gt;<br>
<b>Subject: </b>Re: [gpfsug-discuss] Edge case failure mode<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<div>
<p class="MsoPlainText"><span style="color:black">Hey Simon,<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">I clicked your link but I think it went to a page that is not about this RFE:<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"><img border="0" width="824" height="447" id="Picture_x0020_1" src="cid:image001.png@01D2CA59.65CF7300"><o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Cheers,<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">-Bryan<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">-----Original Message-----<br>
From: <a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a> [<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">mailto:gpfsug-discuss-bounces@spectrumscale.org</a>] On Behalf Of Simon Thompson (IT Research
Support)<br>
Sent: Thursday, May 11, 2017 12:49 PM<br>
To: <a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a><br>
Subject: [gpfsug-discuss] Edge case failure mode<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Just following up on some discussions we had at the UG this week. I<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">mentioned a few weeks back that we were having issues with failover of<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">NFS, and we figured out a workaround for our clients so that failover<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">works great now (plus there are some code fixes coming down the line as<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">well to help).<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Here's my story of fun with protocol nodes ...<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Since then we've occasionally been seeing the load average of 1 CES node<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">rise to over 400 and then it's SOOO SLOW to respond to NFS and SMB clients.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">A lot of digging and we found that CTDB was reporting &gt; 80% memory used,<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">so we tweaked the page pool down to solve this.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Great we thought ... But alas that wasn't the cause.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Just to be clear, 95% of the time the CES node is fine: I can do an ls in<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">the mounted file-systems and all is good. When the load rises to 400, an<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">ls takes 20-30 seconds, so they are related, but what is the initial<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">cause? Other CES nodes are 100% fine and if we do mmces node suspend, and<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">then resume all is well on the node (and no other CES node assumes the<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">problem as the IP moves). It's not always the same CES IP, node or even<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">data centre, and most of the time it looks fine.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">I logged a ticket with OCF today, and one thing they suggested was to<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">disable NFSv3 as they've seen similar behaviour at another site. As far as<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">I know, all my NFS clients are v4, but sure, we disabled v3 anyway as it's<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">not actually needed. (We did this at the ganesha layer, changed the default for<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">exports, and reconfigured all existing exports to v4 only for good measure.)<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">That didn't help, but certainly worth a try!<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Note that my CES cluster is multi-cluster mounting the file-systems and<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">from the POSIX side, it's fine most of the time.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">We've used the mmnetverify command to check that all is well as well. Of<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">course this only checks the local cluster, not remote nodes, but as we<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">aren't seeing expels and can access the FS, we assume that the GPFS layer<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">is working fine.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">So we finally log a PMR with IBM, I catch a node in a broken state and<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">pull a trace from it and upload that, and ask what other traces they might<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">want (apparently there is no protocol trace for NFS in 4.2.2-3).<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Now, when we run this, I note that it's doing things like mmlsfileset to<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">the remote storage, coming from two clusters and some of this is timing<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">out. We've already had issues with rp_filter on remote nodes causing<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">expels, but the storage backend here has only 1 nic, and we can mount and<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">access it all fine.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">So why doesn't mmlsfileset work to this node? I can ping it (ICMP, not<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">GPFS ping of course), but not make "admin" calls to it. Ssh appears to<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">work fine to it as well, BTW.<o:p></o:p></span></p>
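<p class="MsoPlainText"><span style="color:black">A rough sketch of the kind of "can I open a TCP connection to that remote node" check described here, as opposed to an ICMP ping. This is not mmnetverify and not an IBM tool. The function name, the default port 1191 (assumed to be the GPFS daemon port) and the FQDN and address in the usage comment are illustrative assumptions. The optional source address pins the outgoing interface, which may matter on a multi-homed node.<o:p></o:p></span></p>
<pre>
```python
import socket


def can_connect(host, port=1191, source_ip=None, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    port 1191 is an assumption (commonly the GPFS daemon port); adjust to suit.
    source_ip optionally pins the local address used for the connection,
    which can help when testing a multi-homed node.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Hypothetical usage (the FQDN and source address are placeholders):
# can_connect("storage01.example.org", source_ip="10.0.0.5")
```
</pre>
<p class="MsoPlainText"><span style="color:black">A successful connect only shows that the TCP path works from that source address. It says nothing about whether GPFS admin commands will succeed.<o:p></o:p></span></p>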
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">So I check on my CES and this is multi-homed and rp_filter is enabled.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Setting it to a value of 2 seems to make mmlsfileset work, so yes, I'm<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">sure I'm an edge case, but it would be REALLY REALLY helpful to get<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">mmnetverify to work across a cluster (e.g. I say this is a remote node and<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">here's its FQDN, can you talk to it) which would have helped with<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">diagnosis here. I'm not entirely sure why ssh etc would work and pass<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">rp_filter, but not GPFS traffic (in some cases apparently), but I guess<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">it's something to do with how GPFS is binding and then the kernel routing<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">layer.<o:p></o:p></span></p>
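<p class="MsoPlainText"><span style="color:black">For checking the rp_filter setting on a multi-homed node, a minimal sketch that reads the per-interface values from /proc. The helper names are made up for illustration and are not part of any GPFS tooling. Per the Linux kernel documentation, 0 means no source validation, 1 is strict mode and 2 is loose mode, and the kernel applies the larger of the "all" value and the per-interface value.<o:p></o:p></span></p>
<pre>
```python
from pathlib import Path

# Meanings of the rp_filter values, per the Linux kernel ip-sysctl documentation.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def read_rp_filter(conf_root="/proc/sys/net/ipv4/conf"):
    """Return {interface: rp_filter value} for every entry under conf_root."""
    values = {}
    for entry in sorted(Path(conf_root).glob("*/rp_filter")):
        values[entry.parent.name] = int(entry.read_text().strip())
    return values


def effective_mode(values, interface):
    """Return the larger of the 'all' value and the per-interface value."""
    return max(values.get("all", 0), values.get(interface, 0))


if __name__ == "__main__":
    settings = read_rp_filter()
    for name in settings:
        if name in ("all", "default"):
            continue
        mode = effective_mode(settings, name)
        print(name, mode, RP_FILTER_MODES.get(mode, "unknown"))
```
</pre>
<p class="MsoPlainText"><span style="color:black">Any interface reported as strict on a node with more than one route back to the storage cluster would be a candidate for the behaviour described above.<o:p></o:p></span></p>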
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">I'm still not sure if this is my root cause as the occurrences of the high<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">load are a bit random (anything from every hour to being stable for 2-3<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">days), but since making the rp_filter change this afternoon, so far ...?<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">I've created an RFE for mmnetverify to be able to test across a cluster...<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"><a href="https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503"><span style="color:windowtext;text-decoration:none">https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503</span></a><o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">0<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Simon<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;color:black"><o:p> </o:p></span></p>
<div class="MsoNormal" align="center" style="text-align:center"><span style="font-size:10.5pt;color:black">
<hr size="2" width="100%" align="center">
</span></div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:"Arial",sans-serif;color:gray"><br>
Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this
email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness
or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial
product.</span><span style="font-size:10.5pt;color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
</body>
</html>