Simon,

> Question 1.
> Can we force the gateway node for the other file-sets to our "02" node?
> I.e. so that we can get the queue services for the other filesets.

AFM automatically maps each fileset to a gateway node; today there is no
option for users to assign a fileset to a particular gateway node. This
feature will be supported in a future release.
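Until then, you can at least see which gateway node each fileset is
currently mapped to from the getstate output. A rough sketch, assuming
the fileset name and gateway node are the first and fourth columns, as
in the output you pasted below:

  # list fileset -> gateway node pairs, skipping the two header lines
  mmafmctl rds-cache getstate | awk 'NR>2 {print $1, $4}'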
> Question 2.
> How can we make AFM actually work for the "facility" file-set? If we shut
> down GPFS on the node, on the secondary node, we'll see log entries like:
> 2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
> operations...
> So I'm assuming the massive queue is all file remove operations?

These are files which were created in the cache and deleted before they
were replicated to home; AFM recovery will delete them locally. Yes, it
is possible that most of these operations are local removes. You can
count them with the dump command:

  mmfsadm saferdump afm all | grep 'Remove\|Rmdir' | grep local | wc -l
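If you want file and directory removes counted separately, the same dump
can be split into two counts (assuming the same 'Remove' and 'Rmdir'
keywords as above):

  # count local file removes, then local directory removes
  mmfsadm saferdump afm all | grep local | grep -c 'Remove'
  mmfsadm saferdump afm all | grep local | grep -c 'Rmdir'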
> Alarmingly, we are also seeing entries like:
> 2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
> fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name remote
> error 5

Traces are needed to verify the IO errors (remote error 5 is EIO, an I/O
error at home). Also try disabling parallel IO and see whether the
replication speed improves:

  mmchfileset device fileset -p afmParallelWriteThreshold=disable
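To confirm the change took effect, you can list the fileset's AFM
attributes; a sketch, with "device" and "fileset" as placeholders for
your file system and fileset names:

  # show AFM attributes for the fileset, including write thresholds
  mmlsfileset device fileset --afm -L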
~Venkat (vpuvvada@in.ibm.com)


From: "Simon Thompson (IT Research Support)" <S.J.Thompson@bham.ac.uk>
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Date: 10/09/2017 06:27 PM
Subject: [gpfsug-discuss] AFM fun (more!)
Sent by: gpfsug-discuss-bounces@spectrumscale.org

----------------------------------------------------------------------

Hi All,

We're having fun (ok, not fun ...) with AFM.

We have a file-set where the queue length isn't shortening; watching it
over 5 sec periods, the queue length increases by ~600-1000 items, and
the numExec goes up by about 15k.

The queues are steadily rising and we've seen them over 1000000 ...

This is on one particular fileset, e.g.:

mmafmctl rds-cache getstate
Mon Oct 9 08:43:58 2017

Fileset Name           Fileset Target                 Cache State  Gateway Node  Queue Length  Queue numExec
---------------------  -----------------------------  -----------  ------------  ------------  -------------
rds-projects-facility  gpfs:///rds/projects/facility  Dirty        bber-afmgw01       3068953         520504
rds-projects-2015      gpfs:///rds/projects/2015      Active       bber-afmgw01             0              3
rds-projects-2016      gpfs:///rds/projects/2016      Dirty        bber-afmgw01          1482             70
rds-projects-2017      gpfs:///rds/projects/2017      Dirty        bber-afmgw01           713           9104
bear-apps              gpfs:///rds/bear-apps          Dirty        bber-afmgw02             3     2472770871
user-homes             gpfs:///rds/homes              Active       bber-afmgw02             0             19
bear-sysapps           gpfs:///rds/bear-sysapps       Active       bber-afmgw02             0              4

This is having the effect that other filesets on the same "Gateway" are
not getting their queues processed.

Question 1.
Can we force the gateway node for the other file-sets to our "02" node?
I.e. so that we can get the queue services for the other filesets.

Question 2.
How can we make AFM actually work for the "facility" file-set? If we shut
down GPFS on the node, on the secondary node, we'll see log entries like:
2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
operations...

So I'm assuming the massive queue is all file remove operations?

Alarmingly, we are also seeing entries like:
2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name remote
error 5

Anyone any suggestions?

Thanks

Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss