[gpfsug-discuss] AFM - how to update directories with deleted files during prefetch

Billich Heinrich Rainer (PSI) heiner.billich at psi.ch
Fri Jun 30 11:07:10 BST 2017


Hello

I have a short question about AFM prefetch and some more remarks regarding AFM and its use for data migration. I understand that many of you have done this for very large amounts of data and numbers of files. I would welcome any input, comments or remarks. Sorry if this is a bit too long for a mailing list.

Short:
How can I tell an AFM cache to update a directory when I do a prefetch? I know about ‘find .’ or ‘ls -lsR’, but this really is not an option for us as it takes too long. Mostly I want to update the directories to make the AFM cache aware of file deletions on home. On home I can use a policy run to find all directories which changed since the last update and pass them to prefetch on the AFM cache.

I know that I can find some workaround based on the directory list, like an ‘ls -lsa’ just for those directories, but this doesn’t sound very efficient. And depending on cache effects and timeout settings it may or may not work (o.k., it will work most of the time).
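
A minimal sketch of the workaround described above, assuming the list of changed directories has already been produced by a policy run on home and mapped to the corresponding paths on the cache. The function name `refresh_directories` is hypothetical. It lists each directory once, non-recursively, and stats its entries, which is roughly what ‘ls -lsa’ does. Whether this actually makes the cache revalidate the directory against home still depends on the cache effects and timeout settings mentioned above.

```python
import os
from typing import Dict, Iterable


def refresh_directories(directories: Iterable[str]) -> Dict[str, int]:
    """List each directory once (no recursion) and stat every entry.

    Returns a mapping of directory -> number of entries seen, or -1 if
    the directory itself could not be read (for example, it was deleted).
    """
    results: Dict[str, int] = {}
    for directory in directories:
        try:
            count = 0
            with os.scandir(directory) as entries:
                for entry in entries:
                    try:
                        # Equivalent of the per-entry lstat() done by 'ls -l'.
                        entry.stat(follow_symlinks=False)
                    except OSError:
                        # Entry vanished between readdir and stat; skip it.
                        continue
                    count += 1
            results[directory] = count
        except OSError:
            results[directory] = -1
    return results
```

Because only the listed directories are touched, the cost scales with the number of changed directories and their entries rather than with the size of the whole fileset.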

We do regular file deletions and will accumulate millions of deleted files on cache over time if we don’t update the directories to make the AFM cache aware of the deletions.

Background:
We will use AFM to migrate data on filesets to another cluster. We have to do this several times in the next few months, hence I want a reliable and easy-to-use procedure. The old system is home, the new system is a read-only AFM cache. We want to use ‘mmafmctl prefetch’ to move the data. Home will be in use while we run the migration. Once almost all data is moved, we take a (short) break for a last sync and make the read-only AFM cache a ‘normal’ fileset. During the break I want to use policy runs and prefetch only, and no time-consuming ‘ls -lsR’ or ‘find .’. I don’t want to run these metadata-intensive POSIX operations during normal operation, either.

More general:
AFM can be used for data migration. But I don’t see how to use it efficiently. How do I do incremental transfers, how do I ensure that we really have identical copies before we switch, and how do I keep the switch time short, i.e. the time when neither old nor new is accessible to clients?

Wish – maybe an RFE?
I can use policy runs to collect all items on home that changed since the last update. I wish I could pass this list to AFM prefetch to apply all updates on the AFM cache, too, in the same way that backup tools use such a list to do incremental backups.
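
As a rough illustration of the list handling involved, the sketch below turns a policy list into a plain list of paths, one per line. It assumes each record has the form `inode generation snapid -- /full/path`, with the path following the first ` -- ` separator; this format, the function names and the file names are assumptions to verify against your own policy output. Whether and in which format ‘mmafmctl prefetch’ accepts such a list should be checked in the documentation for your release.

```python
from typing import Iterable, Iterator

# Assumed separator between the record's leading fields and the path.
SEPARATOR = " -- "


def paths_from_policy_list(lines: Iterable[str]) -> Iterator[str]:
    """Yield the path part of each policy-list record, skipping bad lines."""
    for line in lines:
        line = line.rstrip("\n")
        _, sep, path = line.partition(SEPARATOR)
        if sep and path:
            yield path


def write_list_file(policy_list: str, list_file: str) -> int:
    """Write one path per line to list_file; return the number of paths."""
    count = 0
    # surrogateescape keeps file names that are not valid UTF-8 intact.
    with open(policy_list, encoding="utf-8", errors="surrogateescape") as src, \
            open(list_file, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for path in paths_from_policy_list(src):
            dst.write(path + "\n")
            count += 1
    return count
```

Splitting on the first separator only means a path that itself contains ` -- ` is kept whole.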

And a tool to create policy lists of home and cache and to compare the lists would be nice, too. Since this runs during the break/switch, it should be fast and reliable and leave no doubt.
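
A minimal sketch of such a comparison, assuming both lists have already been reduced to plain paths and that home and cache are mounted under different roots. The function names and the example roots are hypothetical. It compares names only; sizes or modification times would have to be carried in the lists and compared separately.

```python
import os
from typing import Iterable, List, Set, Tuple


def relative_paths(paths: Iterable[str], root: str) -> Set[str]:
    """Strip the mount-specific root so both sides become comparable."""
    return {os.path.relpath(path, root) for path in paths}


def compare_lists(
    home_paths: Iterable[str],
    home_root: str,
    cache_paths: Iterable[str],
    cache_root: str,
) -> Tuple[List[str], List[str]]:
    """Return (missing_on_cache, stale_on_cache) as sorted relative paths.

    missing_on_cache: present on home, not yet on cache (still to transfer).
    stale_on_cache:   present on cache, gone on home (deleted on home).
    """
    home = relative_paths(home_paths, home_root)
    cache = relative_paths(cache_paths, cache_root)
    return sorted(home - cache), sorted(cache - home)
```

For example, with home holding `a` and `b` and the cache holding `b` and `c`:

```python
missing, stale = compare_lists(
    ["/home/fs/a", "/home/fs/b"], "/home/fs",
    ["/cache/fs/b", "/cache/fs/c"], "/cache/fs",
)
# missing == ["a"], stale == ["c"]
```

Both sets are held in memory, so for very large filesets a merge over two sorted list files may be the better choice.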

Kind regards,

Heiner


