[gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?

Billich Heinrich Rainer (ID SD) heinrich.billich at id.ethz.ch
Wed Jan 8 17:02:18 GMT 2020


Hello,


I'm still new to AFM, so here is a basic question on how recovery works for an SW cache:

We have an AFM SW cache in recovery mode. Recovery first ran policies on the cache cluster, but now I see a ‘tcpcachescan’ process on cache slowly scanning home via NFS: single host, single process, and no parallelism as far as I can see, though I may be wrong. This scan of home on a cache afmgateway takes very long while further updates on cache queue up. Home has about 100M files. After 8 hours I see about 70M entries in the file /var/mmfs/afm/…/recovery/homelist, i.e. we get about 2500 lines/s. (We may have very many changes on cache due to some recursive ACL operations, but I’m not sure.)

So I expect about 12 hours to pass while the file lists are built, before recovery starts to update home. I see some risk: during this time new changes pile up on cache. Could memory become an issue? Could the cache fill up while we can’t evict?
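
As a rough check of that estimate, here is a small Python sketch of the arithmetic. It uses only the figures quoted above and assumes (neither is confirmed AFM behaviour) that the scan rate stays constant and that homelist ends up with roughly one line per file at home:

```python
# Back-of-the-envelope estimate of the home scan duration.
# All inputs are the approximate figures from this message.
# Assumptions: constant scan rate, one homelist line per file at home.
total_files_home = 100_000_000   # files at home (approx.)
entries_so_far = 70_000_000      # lines in homelist so far (approx.)
hours_elapsed = 8                # scan time so far

# Observed rate in lines per second.
rate_per_s = entries_so_far / (hours_elapsed * 3600)

# Projected total and remaining scan time in hours.
total_hours = total_files_home / rate_per_s / 3600
remaining_hours = total_hours - hours_elapsed

print(f"scan rate: {rate_per_s:.0f} lines/s")
print(f"estimated total: {total_hours:.1f} h, remaining: {remaining_hours:.1f} h")
```

With these inputs the rate comes out at roughly 2400 lines/s and the full scan at a bit over 11 hours, which is in line with the 12-hour guess.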

I wonder

  *   Is this expected and normal behavior? What can we do about it?
  *   Will every reboot of a gateway node trigger a recovery of all AFM filesets and a full scan of home? This would make normal rolling updates very impractical. Or is there some better way?

Home is a GPFS cluster, so we could easily produce the needed file list on home with a policy scan in a few minutes.

Thank you. I welcome any clarification, advice or comments.

Kind regards,

Heiner

--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================


