<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><font size="2" color="#1f497d" face="Calibri"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">When we started using GPFS, in the 3.3 time frame, we had a lot of issues with running different meta-applications at
the same time: snapshots, mmapplypolicy, mmdelsnapshot, etc. So we ended up using a locking mechanism around all of these to ensure that only one of them was running at a given time. That mostly eliminated the lock-ups, which were unfortunately common before
then. I haven’t tried removing it since.<o:p></o:p></span></font></p>
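<p class="MsoNormal"><font size="2" color="#1f497d" face="Calibri">A minimal sketch of that kind of serialization, assuming every administrative job runs on the same node and is launched through one wrapper. The lock file path and the function name run_exclusive are made up for illustration; this is not necessarily how our own mechanism is implemented.</font></p>
<pre>
```python
import fcntl
import subprocess

# Hypothetical lock file location; any path writable by the admin user works.
LOCK_PATH = "/var/lock/gpfs-admin.lock"


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd (a list of arguments) while holding an exclusive file lock.

    Other callers using the same lock_path block until the lock is free,
    so at most one wrapped command runs at a time on this node.
    Returns the exit code of the command.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks here if another wrapped command currently holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(cmd)
        finally:
            # Release explicitly; closing the file would also drop the lock.
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```
</pre>
<p class="MsoNormal"><font size="2" color="#1f497d" face="Calibri">A cron job would then call something like run_exclusive(["mmcrsnapshot", "fs0", "nightly"]), where fs0 and nightly are placeholder device and snapshot names. Note that a flock like this only serializes jobs started on one node; commands launched from several nodes would need a lock that all of them can see.</font></p>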
<p class="MsoNormal"><font size="2" color="#1f497d" face="Calibri"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></font></p>
<p class="MsoNormal"><font size="2" color="#1f497d" face="Calibri"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></font></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-left:.5in"><b><font size="2" face="Tahoma"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";font-weight:bold">From:</span></font></b><font size="2" face="Tahoma"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">
gpfsug-discuss-bounces@spectrumscale.org [mailto:gpfsug-discuss-bounces@spectrumscale.org]
<b><span style="font-weight:bold">On Behalf Of </span></b>Howard, Stewart Jameson<br>
<b><span style="font-weight:bold">Sent:</span></b> Monday, December 07, 2015 12:24 PM<br>
<b><span style="font-weight:bold">To:</span></b> gpfsug-discuss@spectrumscale.org<br>
<b><span style="font-weight:bold">Subject:</span></b> Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting<o:p></o:p></span></font></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:.5in"><font size="3" face="Times New Roman"><span style="font-size:12.0pt"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">Hi All,<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution
for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out.<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">An update on our situation: we have actually uncovered another clue since my last posting. One thing that
is now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running
`mmcrsnapshot`, so, as an experiment, we moved the snapshot to 1 a.m.<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">After this change, the late-night flapping has moved to 1 a.m. and now happens reliably every night at that time.
I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks, and I am wondering if that is still a problem with the `mmcrsnapshot` command. Running the snapshots had not
been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago.<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem?<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">Also, I would note that this is not the only condition under which we see instability in the NFS layer.
We continue to see intermittent instability throughout the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far.<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">Thanks so much to everyone for your help :)<o:p></o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black"><o:p> </o:p></span></font></p>
<p style="margin-left:.5in"><font size="3" color="black" face="Calibri"><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";color:black">Stewart<o:p></o:p></span></font></p>
</div>
</body>
</html>