[gpfsug-discuss] Using AFM to migrate files. (Peter Childs) (Peter Childs)

Loic Tortay tortay at cc.in2p3.fr
Sat Oct 22 11:45:23 BST 2016


On 21/10/2016 20:46, Bill Pappas wrote:
[...]
>
> If you are using GPFS as the conduit between the home and cache
> (i.e.  no NFS), I would still ask the same question, more with respect to
> stability for large file lists during the initial prefetch stages.
>
Hello,
I'm in the final stage of what the AFM documentation calls an 
"incremental migration", for a filesystem with 100 million files. (GPFS 
4.1.1, single cluster migration, "hardware/filesystem refresh" use case)

I initially tried to use the NFS transport but found it too unreliable 
(and, in my opinion, very poorly documented).
As I was about to give up on AFM, I tried using the GPFS transport 
(after seeing a trivially simple example on slides by someone from ANL) 
and things just started to work (almost) as I expected.

For the file lists, I use data produced for our monitoring system, which
relies on fast snapshot scans (we compute daily statistics on all objects in
our GPFS filesystems).
Our data gathering tool encodes object names in the RFC 3986 (URL
encoding) format, which is what I found "mmafmctl prefetch" expects for
"special" filenames. I understand that the policy engine does this too,
which, I guess, is what the documentation means by "generate a file list
using policy" (sic), yet "mmafmctl prefetch" does not seem to
accept/like files produced by a simple "LIST" policy (and the
documentation lacks an example).
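For illustration, a minimal Python sketch of RFC 3986 percent-encoding applied to a file list. It assumes "mmafmctl prefetch" wants each path percent-encoded with "/" left as-is; the exact set of characters the command requires to be escaped is an assumption, not something confirmed here. `encode_path` and `write_list` are hypothetical helper names.

```python
import os
from typing import Iterable, TextIO, Union
from urllib.parse import quote


def encode_path(path: Union[str, bytes]) -> str:
    """Percent-encode a path per RFC 3986, keeping "/" unescaped.

    ASSUMPTION: this safe set matches what "mmafmctl prefetch" expects.
    os.fsencode() keeps non-UTF-8 byte names intact, so they are escaped
    byte by byte instead of failing to decode.
    """
    return quote(os.fsencode(path), safe="/")


def write_list(paths: Iterable[Union[str, bytes]], out: TextIO) -> None:
    """Write one encoded path per line (hypothetical list-file layout)."""
    for p in paths:
        out.write(encode_path(p) + "\n")
```

Encoding spaces and newlines matters here: an unescaped newline in a filename would otherwise split one entry into two lines of the list file.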

As you did, I found that trying to prefetch large lists of files does
not work reliably. I remember reading on this list someone (from IBM
Germany, I think) recommending limiting the number of files in a
single prefetch to 2 million. This appears to be the sweet spot for my
needs, as I can split the file list into 2-million-file parts (the largest
fileset in the "home" filesystem has 26 million files) and at the same
time manage the issues I mention below.
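Splitting a large list into 2-million-line parts can be sketched as below. The chunk size comes from the advice quoted above; the `split_list`/`write_parts` names and the `prefix.NNN` output naming are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List

CHUNK = 2_000_000  # files per prefetch, per the advice quoted above


def split_list(lines: Iterable, chunk_size: int = CHUNK) -> Iterator[list]:
    """Yield consecutive chunks of at most chunk_size items."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def write_parts(list_path: str, prefix: str, chunk_size: int = CHUNK) -> List[str]:
    """Split an existing list file into prefix.000, prefix.001, ..."""
    parts = []
    with open(list_path) as src:
        for i, chunk in enumerate(split_list(src, chunk_size)):
            part = f"{prefix}.{i:03d}"
            with open(part, "w") as dst:
                dst.writelines(chunk)  # lines keep their trailing newlines
            parts.append(part)
    return parts
```

With 26 million files, the largest fileset would produce 13 parts, each of which can be prefetched (and watched) separately.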

To keep up with the updates on the "home" filesystem (modified files), I
rely on the "gpfs_winflags" GPFS extended attribute (the
GPFS_WINATTR_OFFLINE bit is set for modified files, see "mmlsattr -L
/cachefs/objectname" output).
By chance, this attribute is included in the files produced for our 
statistics. This allows us to avoid doing a prefetch of all the files 
"continuously", since the file scan indeed appears to use only the 
(single) gateway node for the fileset being prefetched.
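A minimal sketch of selecting only the modified files from such scan data, assuming a hypothetical record format of "encoded_path<TAB>winflags" per line with winflags as an integer. The numeric value of GPFS_WINATTR_OFFLINE below is an assumption (it mirrors the Windows FILE_ATTRIBUTE_OFFLINE value); check gpfs.h on your system before relying on it.

```python
from typing import Iterable, Iterator

# ASSUMPTION: value of GPFS_WINATTR_OFFLINE; verify against gpfs.h.
GPFS_WINATTR_OFFLINE = 0x1000


def modified_paths(records: Iterable[str]) -> Iterator[str]:
    """Yield paths whose winflags have the OFFLINE bit set.

    Each record is a hypothetical "encoded_path<TAB>winflags" line, with
    winflags written in decimal or 0x-prefixed hex. Records without a
    flags field are skipped.
    """
    for line in records:
        path, _, flags = line.rstrip("\n").partition("\t")
        if not flags:
            continue
        if int(flags, 0) & GPFS_WINATTR_OFFLINE:
            yield path
```

Only the paths this yields would then go into the next (much smaller) prefetch list, instead of re-prefetching everything.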

In my specific configuration/environment, there are still several issues:
. There is a significant memory and "slab" leak on the gateway nodes
   which can easily lead to a completely unreachable gateway node.
   These leaks appear directly related to the number of files
   prefetched. Stopping GPFS on the gateway node only releases some of
   the memory but none of the "slab", which requires a node reboot.
. There is also a need to increase the "fs.file-max" sysctl on the
   gateway nodes to a value larger than the default (I use 10M) to
   keep the kernel from running out of file descriptors, since this
   also leads to the node becoming unreachable.
. Sometimes, an AFM association will go to the "Unmounted" state (for
   no apparent reason). The only reliable way I found to bring it back
   to the "Active" state is to: unmount the "cache" filesystem from
   all nodes mounting it, unmount/remount the "home" filesystem on
   the gateway nodes, then remount the "cache" filesystem where it is
   needed (gateway nodes, etc.) and finally run "ls -la" in the
   "cache" filesystem to bring the AFM associations back into the Active
   state. As I'm doing an "incremental migration", the "home" filesystem
   is still actively used and unmounting it on all nodes (as suggested
   by the documentation) is not an option.
. I include directories and symlinks in the file lists for the
   prefetch. This ensures symlink targets are there without needing an
   "ls" or "stat" in the "cache" filesystem ("incremental migration").
. The only reliable way I found to have "mmafmctl prefetch" accept
   file lists is to use the "--home-list-file" & "--home-fs-path"
   options.
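Driving one prefetch per list part with those two options might look like the sketch below. The device, fileset and path values are hypothetical, and the "-j FilesetName" form is assumed from the mmafmctl syntax rather than taken from the message. It defaults to a dry run that only prints the commands; whether to wait for one prefetch to drain before starting the next (given the gateway memory leak above) is left to the operator.

```python
import shlex
import subprocess
from typing import List


def prefetch_argv(device: str, fileset: str,
                  home_list_file: str, home_fs_path: str) -> List[str]:
    """Build one "mmafmctl prefetch" command line (ASSUMED argument order)."""
    return ["mmafmctl", device, "prefetch", "-j", fileset,
            "--home-list-file", home_list_file,
            "--home-fs-path", home_fs_path]


def run_prefetches(device: str, fileset: str, parts: List[str],
                   home_fs_path: str, dry_run: bool = True) -> List[List[str]]:
    """Issue one prefetch per list part, sequentially.

    With dry_run=True the commands are only printed; otherwise each one is
    run and a non-zero exit status raises CalledProcessError.
    """
    cmds = [prefetch_argv(device, fileset, p, home_fs_path) for p in parts]
    for argv in cmds:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
    return cmds
```

For example, run_prefetches("cachefs", "proj1", ["/tmp/list.000", "/tmp/list.001"], "/homefs") prints two commands without executing anything.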

In my experience, and in my environment, using AFM for migrations requires a
significant amount of work and hand-holding to get a useful result.
Since this migration is actually only the first step in an extensive
multiple-filesystem migration/merge plan, I'm pondering whether I'll use
AFM for the rest of the operations.


Sorry, this mail is too long,
Loïc.
-- 
|     Loïc Tortay <tortay at cc.in2p3.fr>  -  IN2P3 Computing Centre      |
