<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>So as you might expect, we've been poking at this all day.</div>
<div><br>
</div>
<div>We'd typically get to ~1000 entries in the queue, having taken access to the FS away from users (yeah, it's that bad), but the remaining items would stay
<span style="font-weight: bold;">forever</span> as far as we could see. By copying the file, removing the original and then moving the copy back into place, we're able to get it back into a clean state. But then we ran a sample user job, and instantly the next job hung up the queue
 (we're talking files under 100MB here).</div>
<div><br>
</div>
<div>Interestingly, we looked at the queue to see what was going on (with saferdump, always use saferdump!):</div>
<div><br>
</div>
<div>
<div>  Normal Queue:  (listed by execution order) (state: Active)</div>
<div>      95 Write [6060026.6060026] inflight (18 @ 0) thread_id 44812</div>
<div>      96 Write [13808655.13808655] queued (18 @ 0)</div>
<div>      97 Truncate [6060026] queued</div>
<div>      98 Truncate [13808655] queued</div>
<div>     124 Write [6060000.6060000] inflight (18 @ 0) thread_id 44835</div>
<div>     125 Truncate [6060000] queued</div>
<div>     159 Write [6060013.6060013] inflight (18 @ 0) thread_id 21329</div>
<div>     160 Truncate [6060013] queued</div>
<div>     171 Write [5953611.5953611] inflight (18 @ 0) thread_id 44837</div>
<div>     172 Truncate [5953611] queued</div>
</div>
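<div><br>
</div>
<div>For what it's worth, the pattern in that listing can be picked out of a saved dump with a few lines of Python. This is only a rough sketch: the function name, the regexes and the choice of the second bracketed number as the inode are assumptions based purely on the lines shown above, not on any documented dump format.</div>
<pre>
```python
import re

# Assumed line shapes, taken only from the queue listing above:
#   "95 Write [6060026.6060026] inflight (18 @ 0) thread_id 44812"
#   "97 Truncate [6060026] queued"
WRITE_RE = re.compile(r"^\s*(?:\d+\s+)?Write\s+\[(\d+)\.(\d+)\]\s+(\w+)")
TRUNC_RE = re.compile(r"^\s*(?:\d+\s+)?Truncate\s+\[(\d+)\]\s+(\w+)")


def inflight_writes_with_queued_truncate(dump_text):
    """Return inodes that have an inflight Write and a later queued Truncate."""
    inflight = set()
    hits = []
    for line in dump_text.splitlines():
        write = WRITE_RE.match(line)
        if write:
            # Assumption: the second number in the brackets is the file inode.
            if write.group(3) == "inflight":
                inflight.add(write.group(2))
            continue
        trunc = TRUNC_RE.match(line)
        if trunc and trunc.group(2) == "queued" and trunc.group(1) in inflight:
            hits.append(trunc.group(1))
    return hits


SAMPLE = """\
      95 Write [6060026.6060026] inflight (18 @ 0) thread_id 44812
      96 Write [13808655.13808655] queued (18 @ 0)
      97 Truncate [6060026] queued
      98 Truncate [13808655] queued
     124 Write [6060000.6060000] inflight (18 @ 0) thread_id 44835
     125 Truncate [6060000] queued
"""

if __name__ == "__main__":
    for inode in inflight_writes_with_queued_truncate(SAMPLE):
        print("inflight Write followed by queued Truncate on inode", inode)
```
</pre>
<div>Inode 13808655 is deliberately not reported, because its Write is still queued rather than inflight.</div>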
<div><br>
</div>
<div>Note that each inode that is inflight is followed by a queued Truncate... We are running efix2, because there is an issue with truncate not working prior to this (it doesn't get sent to home), so this smells like an AFM bug to me.</div>
<div><br>
</div>
<div>We have a PMR open...</div>
<div><br>
</div>
<div>Simon</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION">
<div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span style="font-weight:bold">From: </span><<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a>> on behalf of "Leo Earl (ITCS - Staff)" <<a href="mailto:Leo.Earl@uea.ac.uk">Leo.Earl@uea.ac.uk</a>><br>
<span style="font-weight:bold">Reply-To: </span>"<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>" <<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>><br>
<span style="font-weight:bold">Date: </span>Tuesday, 10 October 2017 at 16:29<br>
<span style="font-weight:bold">To: </span>"<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>" <<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>><br>
<span style="font-weight:bold">Subject: </span>Re: [gpfsug-discuss] AFM fun (more!)<br>
</div>
<div><br>
</div>
<div xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
tt
        {mso-style-priority:99;
        font-family:"Courier New";}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle20
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hi Simon,
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">(My first ever post – cue being shot down in flames)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Whilst this doesn’t answer any of your questions (directly)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">One thing we tend to look at when we see what appears to be a static “Queue Length” value is the data that is actually inflight
 from an AFM perspective. That lets us ascertain whether the cause is something like a user writing huge files to the cache, which take time to sync with home and so remain in the queue, giving a static “Queue Length”.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server ~]# mmafmctl gpfs getstate | awk ' $6 >=50 '<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Fileset Name    Fileset Target                                     Fileset State  Gateway Node    Queue State    Queue Length   Queue
 numExec<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">afmcancergenetics <IP here>:/leo/res_hpc/afm-leo Dirty           csgpfs01                Active                60                        126157822<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server ~]#<o:p></o:p></span></p>
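<p class="MsoNormal"><span style="font-size:11.0pt;color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:#1F497D">If awk column numbers feel fragile, the same filter can be sketched in Python by counting from the right-hand end of each row. This is only an illustration: the function name, the HOME placeholder target and the second sample row are made up, and it assumes one fileset per line with Queue Length and Queue numExec as the last two columns.</span></p>
<pre>
```python
def busy_filesets(getstate_text, min_queue_length=50):
    """Return (fileset, queue_length) pairs at or above the threshold."""
    rows = []
    for line in getstate_text.splitlines():
        fields = line.split()
        # Data rows end with two integers: Queue Length and Queue numExec.
        # Header, separator and date lines do not, so they are skipped.
        if len(fields) >= 3 and fields[-1].isdigit() and fields[-2].isdigit():
            queue_length = int(fields[-2])
            if queue_length >= min_queue_length:
                rows.append((fields[0], queue_length))
    return rows


# First data row is from the output above (target replaced by a placeholder);
# the second row is a made-up example of a fileset below the threshold.
SAMPLE = """\
Fileset Name  Fileset Target  Fileset State  Gateway Node  Queue State  Queue Length  Queue numExec
afmcancergenetics HOME:/leo/res_hpc/afm-leo Dirty csgpfs01 Active 60 126157822
example-fileset HOME:/example Active csgpfs01 Active 3 12
"""

if __name__ == "__main__":
    for name, length in busy_filesets(SAMPLE):
        print(name, length)
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;color:#1F497D">Counting from the right means it does not matter whether the output has a Queue State column or not, but rows that have been wrapped over two lines would need joining first.</span></p>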
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">So for instance, we dump the AFM queue and then use the tsfindinode command to have a look at the size of the file which is currently “inflight” from an AFM
 perspective:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server ~]#  mmfsadm dump afm | more<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  Fileset: afm-leo 12 (gpfs)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  mode: independent-writer queue: Normal myFileset  MDS: <c0n334><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  home: <IP> proto: nfs port: 2049   lastCmd: 6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  handler: Mounted Dirty     refCount: 5<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  queue: delay 300 QLen 0+9 flushThds 4 maxFlushThds 4 numExec 436 qfs 0 err 0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728  pReadThreads 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  i/o: pReadGWs 0 (All) pReadChunkSize 134217728 pReadThresh: -2 >> Disabled <<<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  i/o: prefetchThresh 0 (Prefetch)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  iw: afmIwTakeoverTime 0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Priority Queue:  (listed by execution order) (state: Active)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    Write [601414711.601379492] inflight  (377743888481 @ 0) chunks 0 bytes 0 0 thread_id 7630<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    Write [601717612.601465868] inflight  (462997479227 @ 0) chunks 0 bytes 0 1 thread_id 10200<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    Write [601717612.601465870] inflight  (391663667550 @ 0) chunks 0 bytes 0 2 thread_id 10287<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    Write [601717612.601465871] inflight  (377743888481 @ 0) chunks 0 bytes 0 3 thread_id 10333<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    Write [601717612.601573418] queued  (387002794104 @ 0) chunks 0 bytes 0 4<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    Write [601414711.601650195] queued  (342305480538 @ 0) chunks 0 bytes 0 5<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    ResetDirty [538455296.-1] queued etype normal normal  19061213<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    ResetDirty [601623334.-1] queued etype normal normal  19061213<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">    RecoveryMarker [-1.-1] queued etype normal normal  19061213<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">  Normal Queue:  Empty (state: Active)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Fileset: afmdata 20 (gpfs)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">#Use the file inode ID to determine the actual file which is inflight between cache and home<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server cancergenetics]# tsfindinode  -i 601379492 /gpfs/afm/cancergenetics > inode.out<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server ~]# cat /root/inode.out<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">601379492               0      0xCCB6  /gpfs/afm/cancergenetics/Claudia/fastq/PD7446i.fastq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server ~]# ls -l /gpfs/afm/cancergenetics/Claudia/fastq/PD7446i.fastq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">-rw-r--r-- 1 bbn16cku MED_pg 377743888481 Mar 25 05:48 /gpfs/afm/cancergenetics/Claudia/fastq/PD7446i.fastq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">[root@server ~]#<o:p></o:p></span></p>
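<p class="MsoNormal"><span style="font-size:11.0pt;color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:#1F497D">The steps above could also be scripted roughly as follows, to list the inflight writes largest first before running tsfindinode on them. Treat it as a sketch under assumptions: the function name and regex are mine, and it assumes the first number in the parentheses is the length in bytes (it matches the ls -l size above) and the second bracketed number is the file inode (the one passed to tsfindinode above).</span></p>
<pre>
```python
import re

# Assumed line shape, taken from the dump above:
#   "Write [601414711.601379492] inflight  (377743888481 @ 0) chunks 0 ..."
WRITE_RE = re.compile(
    r"^\s*(?:\d+\s+)?Write\s+\[(\d+)\.(\d+)\]\s+(inflight|queued)\s+\((\d+) @ (\d+)\)"
)


def inflight_by_size(dump_text):
    """Return (inode, size_in_bytes) for inflight writes, largest first."""
    entries = []
    for line in dump_text.splitlines():
        match = WRITE_RE.match(line)
        if match and match.group(3) == "inflight":
            entries.append((int(match.group(2)), int(match.group(4))))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


SAMPLE = """\
    Write [601414711.601379492] inflight  (377743888481 @ 0) chunks 0 bytes 0 0 thread_id 7630
    Write [601717612.601465868] inflight  (462997479227 @ 0) chunks 0 bytes 0 1 thread_id 10200
    Write [601717612.601573418] queued  (387002794104 @ 0) chunks 0 bytes 0 4
    ResetDirty [538455296.-1] queued etype normal normal  19061213
"""

if __name__ == "__main__":
    for inode, size in inflight_by_size(SAMPLE):
        print(f"inode {inode}: {size / 2**30:.1f} GiB inflight")
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;color:#1F497D">Each inode it prints can then be fed to tsfindinode -i as shown above to get the path.</span></p>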
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I am not sure if that helps, and you probably already know about this inflight checking…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D">Kind Regards,<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D"><o:p> </o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D">Leo<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D"><o:p> </o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D">Leo Earl </span></b><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#A6A6A6">|</span></b><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#1F497D"> </span></b><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#7F7F7F">Head
 of </span></b><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#7F7F7F">Research & Specialist Computing
<o:p></o:p></span></b></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Room ITCS 01.16, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Calibri",sans-serif;color:#1F497D">+44 (0) 1603 593856<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a> [<a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">mailto:gpfsug-discuss-bounces@spectrumscale.org</a>]
<b>On Behalf Of </b>Venkateswara R Puvvada<br>
<b>Sent:</b> 10 October 2017 05:56<br>
<b>To:</b> gpfsug main discussion list <<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>><br>
<b>Subject:</b> Re: [gpfsug-discuss] AFM fun (more!)<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial",sans-serif">Simon,</span><br>
<br>
<tt><span style="font-size:10.0pt">>Question 1.</span></tt><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>>Can we force the gateway node for the other file-sets to our "02" node.</tt><br>
<tt>>I.e. So that we can get the queue services for the other filesets.</tt><br>
</span><br>
<tt><span style="font-size:10.0pt">AFM automatically maps each fileset to a gateway node, and today there is no option for users to assign a fileset to a particular gateway node. This feature will be supported in future releases.</span></tt><br>
<span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>>Question 2.</tt><br>
<tt>>How can we make AFM actually work for the "facility" file-set. If we shut</tt><br>
<tt>>down GPFS on the node, on the secondary node, we'll see log entries like:</tt><br>
<tt>>2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove</tt><br>
<tt>>operations...</tt><br>
<tt>>So I'm assuming the massive queue is all file remove operations?</tt><br>
</span><br>
<tt><span style="font-size:10.0pt">These are files that were created in cache and then deleted before they were replicated to home. AFM recovery will delete them locally. Yes, it is possible that most of these operations are local remove operations. Try
 finding those operations using the dump command:</span></tt><br>
<br>
<tt><span style="font-size:10.0pt"> mmfsadm saferdump afm all | grep 'Remove\|Rmdir' | grep local | wc -l</span></tt><br>
<br>
<span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>>Alarmingly, we are also seeing entries like:</tt><br>
<tt>>2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache</tt><br>
<tt>>fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name  remote</tt><br>
<tt>>error 5</tt><br>
</span><br>
<tt><span style="font-size:10.0pt">Traces are needed to verify IO errors. Also try disabling the parallel IO and see if replication speed improves.</span></tt><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">mmchfileset device fileset -p afmParallelWriteThreshold=disable</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">~Venkat (<a href="mailto:vpuvvada@in.ibm.com">vpuvvada@in.ibm.com</a>)</span><br>
<br>
<br>
<br>
<span style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">From:        </span><span style="font-size:7.5pt;font-family:"Arial",sans-serif">"Simon Thompson (IT Research Support)" <<a href="mailto:S.J.Thompson@bham.ac.uk">S.J.Thompson@bham.ac.uk</a>></span><br>
<span style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">To:        </span><span style="font-size:7.5pt;font-family:"Arial",sans-serif">"<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>" <<a href="mailto:gpfsug-discuss@spectrumscale.org">gpfsug-discuss@spectrumscale.org</a>></span><br>
<span style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">Date:        </span><span style="font-size:7.5pt;font-family:"Arial",sans-serif">10/09/2017 06:27 PM</span><br>
<span style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">Subject:        </span><span style="font-size:7.5pt;font-family:"Arial",sans-serif">[gpfsug-discuss] AFM fun (more!)</span><br>
<span style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">Sent by:        </span><span style="font-size:7.5pt;font-family:"Arial",sans-serif"><a href="mailto:gpfsug-discuss-bounces@spectrumscale.org">gpfsug-discuss-bounces@spectrumscale.org</a></span><o:p></o:p></p>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="2" width="100%" noshade="" style="color:#A0A0A0" align="center">
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
<br>
<br>
<span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>Hi All,</tt><br>
<br>
<tt>We're having fun (ok not fun ...) with AFM.</tt><br>
<br>
<tt>We have a file-set where the queue length isn't shortening. Watching it</tt><br>
<tt>over 5 sec periods, the queue length increases by ~600-1000 items, and the</tt><br>
<tt>numExec goes up by about 15k.</tt><br>
<br>
<tt>The queues are steadily rising and we've seen them over 1000000 ...</tt><br>
<br>
<tt>This is on one particular fileset e.g.:</tt><br>
<br>
<tt>mmafmctl rds-cache getstate</tt><br>
<tt>                      Mon Oct  9 08:43:58 2017</tt><br>
<br>
<tt>Fileset Name           Fileset Target                 Cache State  Gateway Node  Queue Length  Queue numExec</tt><br>
<tt>------------           --------------                 -----------  ------------  ------------  -------------</tt><br>
<tt>rds-projects-facility  gpfs:///rds/projects/facility  Dirty        bber-afmgw01  3068953       520504</tt><br>
<tt>rds-projects-2015      gpfs:///rds/projects/2015      Active       bber-afmgw01  0             3</tt><br>
<tt>rds-projects-2016      gpfs:///rds/projects/2016      Dirty        bber-afmgw01  1482          70</tt><br>
<tt>rds-projects-2017      gpfs:///rds/projects/2017      Dirty        bber-afmgw01  713           9104</tt><br>
<tt>bear-apps              gpfs:///rds/bear-apps          Dirty        bber-afmgw02  3             2472770871</tt><br>
<tt>user-homes             gpfs:///rds/homes              Active       bber-afmgw02  0             19</tt><br>
<tt>bear-sysapps           gpfs:///rds/bear-sysapps       Active       bber-afmgw02  0             4</tt><br>
<br>
<br>
<br>
<tt>This is having the effect that other filesets on the same "Gateway" are</tt><br>
<tt>not getting their queues processed.</tt><br>
<br>
<tt>Question 1.</tt><br>
<tt>Can we force the gateway node for the other file-sets to our "02" node.</tt><br>
<tt>I.e. So that we can get the queue services for the other filesets.</tt><br>
<br>
<tt>Question 2.</tt><br>
<tt>How can we make AFM actually work for the "facility" file-set. If we shut</tt><br>
<tt>down GPFS on the node, on the secondary node, we'll see log entries like:</tt><br>
<tt>2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove</tt><br>
<tt>operations...</tt><br>
<br>
<tt>So I'm assuming the massive queue is all file remove operations?</tt><br>
<br>
<tt>Alarmingly, we are also seeing entries like:</tt><br>
<tt>2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache</tt><br>
<tt>fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name  remote</tt><br>
<tt>error 5</tt><br>
<br>
<tt>Anyone any suggestions?</tt><br>
<br>
<tt>Thanks</tt><br>
<br>
<tt>Simon</tt><br>
<br>
<br>
<tt>_______________________________________________</tt><br>
<tt>gpfsug-discuss mailing list</tt><br>
<tt>gpfsug-discuss at spectrumscale.org</tt><br>
</span><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=_THXlsTtzTaQQnCD5iwucKoQnoVZmXwtZksU6YDO5O8&s=LlIrCk36ptPJs1Oix2ekZdUAMcH7ZE7GRlKzRK1_NPI&e="><tt><span style="font-size:10.0pt">https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=_THXlsTtzTaQQnCD5iwucKoQnoVZmXwtZksU6YDO5O8&s=LlIrCk36ptPJs1Oix2ekZdUAMcH7ZE7GRlKzRK1_NPI&e=</span></tt></a><span style="font-size:10.0pt;font-family:"Courier New""><br>
<br>
</span><br>
<br>
<o:p></o:p></p>
</div>
</div>
</div>
</span>
</body>
</html>