[gpfsug-discuss] Mass UID migration suggestions
Jonathan Buzzard
jonathan at buzzard.me.uk
Sat Jul 1 10:20:18 BST 2017
On 30/06/17 16:20, hpc-luke at uconn.edu wrote:
> Hello,
>
> We're trying to change most of our users' UIDs. Is there a clean way to
> migrate all of one user's files with, say, `mmapplypolicy`? We have to change the
> owner of around 273539588 files, and my estimates for runtime are around 6 days.
>
> What we've been doing is indexing all of the files and splitting them up by
> owner which takes around an hour, and then we were locking the user out while we
> chown their files. I made it multi threaded as it weirdly gave a 10% speedup
> despite my expectation that multi threading access from a single node would not
> give any speedup.
>
> Generally I'm looking for advice on how to make the chowning faster. Would
> spreading the chowning processes over multiple nodes improve performance? Should
> I not stat the files before running lchown on them, since lchown checks the file
> before changing it? I saw mention of inodescan(), in an old gpfsug email, which
> speeds up disk read access, by not guaranteeing that the data is up to date. We
> have a maintenance day coming up where all users will be locked out, so the file
> handles(?) from GPFS's perspective will not be able to go stale. Is there a
> function with similar constraints to inodescan that I can use to speed up this
> process?

My suggestion is to do some development work in C and write a custom
program to do it for you. That way you can hook into the GPFS API and
leverage its fast file system scanning interface. Take a look at the
tsbackup.C file in the samples directory. Obviously this is going to
require someone with appropriate coding skills to develop. On the other
hand, given that it is a one-off and the input is strictly controlled, so
error checking can be kept to a minimum, it should be a couple of hundred
lines of C, tops.

My tip for this would be to load the new UIDs into a sparse array so you
can use the current UID to index straight into the array for the new UID,
which speeds things up. It burns RAM, but these days RAM is cheap and
plentiful, and speed is the major consideration here.
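
As a rough sketch of that lookup table (this is not taken from the GPFS
samples, and the names uid_map, uid_map_init, uid_map_set, uid_map_lookup
and NO_MAPPING are made up for illustration), assuming the old UIDs all fall
below some known bound so the table can be sized to the highest old UID + 1:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Sentinel for "no new UID for this owner". (uid_t)-1 is also the value
 * chown(2)/lchown(2) treat as "leave the owner unchanged". */
#define NO_MAPPING ((uid_t)-1)

/* Hypothetical directly indexed table: new_uid[old] holds the replacement
 * UID for old, or NO_MAPPING if that user is not being migrated. */
typedef struct {
    uid_t *new_uid;
    size_t size;      /* number of slots: highest old UID + 1 (assumed known) */
} uid_map;

/* Allocate the table and mark every slot as unmapped. Returns 0 on success. */
static int uid_map_init(uid_map *m, size_t size)
{
    assert(m != NULL);
    if (size == 0 || size > SIZE_MAX / sizeof *m->new_uid)
        return -1;
    m->new_uid = malloc(size * sizeof *m->new_uid);
    if (m->new_uid == NULL)
        return -1;
    for (size_t i = 0; i < size; i++)
        m->new_uid[i] = NO_MAPPING;
    m->size = size;
    return 0;
}

/* Record that files owned by `from` should end up owned by `to`.
 * Returns -1 if `from` does not fit in the table. */
static int uid_map_set(uid_map *m, uid_t from, uid_t to)
{
    if ((size_t)from >= m->size)
        return -1;
    m->new_uid[from] = to;
    return 0;
}

/* One bounds check and one array read per file: no hashing, no searching. */
static uid_t uid_map_lookup(const uid_map *m, uid_t old)
{
    return (size_t)old < m->size ? m->new_uid[old] : NO_MAPPING;
}

static void uid_map_free(uid_map *m)
{
    free(m->new_uid);
    m->new_uid = NULL;
    m->size = 0;
}
```

In the scanning loop you would call uid_map_lookup() with each file's
current owner and skip the file when it returns NO_MAPPING. With a 4-byte
uid_t, as on typical Linux systems, a table covering UIDs up to one million
costs about 4 MB.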

In theory, this technique should get the job done in a few hours.

One thing to bear in mind is that once the UID change is complete you
will have to back up the entire file system again.

JAB.
--
Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.