[gpfsug-discuss] AFM fun (more!)
Simon Thompson (IT Research Support)
S.J.Thompson at bham.ac.uk
Mon Oct 9 13:57:08 BST 2017
Hi All,
We're having fun (ok, not fun ...) with AFM.
We have a fileset where the queue length isn't shortening. Watching it
over 5-second periods, the queue length increases by ~600-1000 items and
numExec goes up by about 15k.
The queues are steadily rising and we've seen them at over 1,000,000 ...
This is on one particular fileset e.g.:
mmafmctl rds-cache getstate
Mon Oct 9 08:43:58 2017
Fileset Name           Fileset Target                 Cache State  Gateway Node  Queue Length  Queue numExec
---------------------  -----------------------------  -----------  ------------  ------------  -------------
rds-projects-facility  gpfs:///rds/projects/facility  Dirty        bber-afmgw01  3068953       520504
rds-projects-2015      gpfs:///rds/projects/2015      Active       bber-afmgw01  0             3
rds-projects-2016      gpfs:///rds/projects/2016      Dirty        bber-afmgw01  1482          70
rds-projects-2017      gpfs:///rds/projects/2017      Dirty        bber-afmgw01  713           9104
bear-apps              gpfs:///rds/bear-apps          Dirty        bber-afmgw02  3             2472770871
user-homes             gpfs:///rds/homes              Active       bber-afmgw02  0             19
bear-sysapps           gpfs:///rds/bear-sysapps       Active       bber-afmgw02  0             4
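For spotting which filesets are backing up, the getstate output can be filtered mechanically. A minimal sketch using awk over captured output; the sample rows and the column positions (fileset, target, state, gateway, queue length, numExec) are assumed from the listing above, and in practice you'd pipe live `mmafmctl rds-cache getstate` output instead:

```shell
# Sample rows standing in for live `mmafmctl rds-cache getstate` output
# (column order assumed from the listing above).
sample='rds-projects-facility gpfs:///rds/projects/facility Dirty bber-afmgw01 3068953 520504
rds-projects-2015 gpfs:///rds/projects/2015 Active bber-afmgw01 0 3
rds-projects-2016 gpfs:///rds/projects/2016 Dirty bber-afmgw01 1482 70
bear-apps gpfs:///rds/bear-apps Dirty bber-afmgw02 3 2472770871'

# Print fileset, gateway node and queue length where the queue is over 1000.
backlog=$(printf '%s\n' "$sample" | awk '$5+0 > 1000 { print $1, $4, $5 }')
echo "$backlog"
```

Run against live output this gives a quick list of which filesets (and which gateway) are holding the long queues.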
This is having the effect that other filesets on the same gateway are
not getting their queues processed.
Question 1.
Can we force the gateway node for the other filesets over to our "02"
node, i.e. so that the queues for the other filesets get serviced?
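On Question 1: as far as I know the 4.2.x line picks the gateway for a fileset itself, but later Spectrum Scale releases (5.0.2 onwards, from memory; please verify against the docs for your release) added a per-fileset afmGateway attribute. A hypothetical sketch of what that would look like, using the fileset and node names from the listing above; the attribute's availability on your version is an assumption to check:

```shell
# Hypothetical sketch (verify afmGateway support in your release):
# pin the user-homes fileset's AFM gateway to the bber-afmgw02 node.
mmchfileset rds-cache user-homes -p afmGateway=bber-afmgw02

# Confirm the fileset's AFM attributes afterwards.
mmlsfileset rds-cache user-homes --afm -L
```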
Question 2.
How can we make AFM actually work for the "facility" fileset? If we shut
down GPFS on the node, then on the secondary node we'll see log entries like:
2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
operations...
So I'm assuming the massive queue is all file remove operations?
Alarmingly, we are also seeing entries like:
2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name remote
error 5
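For reference, "remote error 5" here looks like a raw errno value (an assumption about how AFM logs these); on Linux, errno 5 is EIO ("Input/output error"), which would point at an I/O failure on the remote/home side rather than a logical AFM problem. A quick check:

```shell
# Translate errno 5 to its symbolic name and message (Linux: EIO).
python3 -c "import errno, os; print(errno.errorcode[5], '-', os.strerror(5))"
# prints: EIO - Input/output error
```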
Anyone have any suggestions?
Thanks
Simon