[gpfsug-discuss] Change uidNumber and gidNumber for billions of files
Jonathan Buzzard
jonathan.buzzard at strath.ac.uk
Tue Jun 9 12:20:45 BST 2020
On 08/06/2020 18:44, Lohit Valleru wrote:
> Hello Everyone,
>
> We are planning to migrate from LDAP to AD, and one of the best solutions
> was to change the uidNumber and gidNumber to what SSSD or Centrify would
> resolve.
>
> May I know, if anyone has come across a tool/tools that can change the
> uidNumbers and gidNumbers of billions of files efficiently and in a
> reliable manner?
Not to my knowledge.
> We could spend some time to write a custom script, but wanted to know if
> a tool already exists.
>
If you can be sure that all files under a specific directory belong to a
specific user and you have no ACLs, then a whole bunch of "chown -R"
commands would be reasonable, for example if you have a lot of user home
directories.
What I do in these scenarios is use a small sqlite database, which in
this case holds the directory that I want to chown, the target UID and
GID, and a status field. Initially I set the status field to -1, which
indicates the entry has not been processed. The script sets the status
field to -2 when it starts processing an entry, and on completion sets
it to the exit code of the command you are running. This way when the
script is finished you can see any directory hierarchies that had a
problem, and if it dies early you can see where it got up to (that -2).
You can also do things like set all non-zero status codes back to -1
and run again with a simple SQL update on the database from the sqlite CLI.
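As a rough illustration of that bookkeeping, here is a minimal Python
sketch. The table name "jobs", the column names and the function names
are all made up for the example, and it assumes one row per top-level
directory. Treat it as a starting point rather than a finished tool.

```python
import sqlite3
import subprocess

PENDING, RUNNING = -1, -2  # status conventions described above


def init_db(path):
    """Open (or create) the work-queue database. Schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "directory TEXT PRIMARY KEY, uid INTEGER, gid INTEGER, "
        "status INTEGER DEFAULT -1)"
    )
    db.commit()
    return db


def add_job(db, directory, uid, gid):
    """Queue a directory; the status column defaults to -1 (not processed)."""
    db.execute(
        "INSERT OR IGNORE INTO jobs (directory, uid, gid) VALUES (?, ?, ?)",
        (directory, uid, gid),
    )
    db.commit()


def chown_cmd(directory, uid, gid):
    """Default command builder: a plain recursive chown."""
    return ["chown", "-R", f"{uid}:{gid}", directory]


def run_pending(db, make_cmd=chown_cmd):
    """Process every row still at -1, recording -2 while it runs."""
    rows = db.execute(
        "SELECT directory, uid, gid FROM jobs WHERE status = ?", (PENDING,)
    ).fetchall()
    for directory, uid, gid in rows:
        db.execute(
            "UPDATE jobs SET status = ? WHERE directory = ?", (RUNNING, directory)
        )
        db.commit()  # commit first so a crash leaves the -2 marker behind
        rc = subprocess.call(make_cmd(directory, uid, gid))
        if rc < 0:
            # Python reports death by signal N as -N, which would collide
            # with -1/-2, so store it shell-style as 128+N instead.
            rc = 128 - rc
        db.execute(
            "UPDATE jobs SET status = ? WHERE directory = ?", (rc, directory)
        )
        db.commit()


def reset_failed(db):
    """Set every non-zero status back to -1 so the next run retries it."""
    db.execute("UPDATE jobs SET status = ? WHERE status != 0", (PENDING,))
    db.commit()
```

After a run, "SELECT directory, status FROM jobs WHERE status != 0" from
the sqlite CLI lists anything that failed or was interrupted.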
If you don't need to modify ACLs but have mixed ownership under
directory hierarchies then a script is reasonable, but not a shell
script. The overhead of exec'ing chown billions of times on individual
files will be astronomical. You need something like Perl or Python, and
to make use of the built-in chown facilities of the language to avoid all
those execs. That said, I suspect you will see a significant speed-up
from using C.
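To make the "no exec per file" point concrete, a minimal Python sketch
might look like the following. It assumes you already have old-to-new
UID and GID mappings as dictionaries; the function names and the
dictionary layout are invented for the example. It uses os.lchown so
that symlinks themselves are changed rather than their targets, and it
has no error handling or parallelism.

```python
import os


def remap_one(path, st, uid_map, gid_map):
    """lchown a single entry if its owner or group is in the maps."""
    new_uid = uid_map.get(st.st_uid, st.st_uid)
    new_gid = gid_map.get(st.st_gid, st.st_gid)
    if (new_uid, new_gid) == (st.st_uid, st.st_gid):
        return 0  # nothing to change, skip the system call
    os.lchown(path, new_uid, new_gid)
    return 1


def remap_tree(root, uid_map, gid_map):
    """Walk root without following symlinks; return the number of changes."""
    changed = remap_one(root, os.lstat(root), uid_map, gid_map)
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                st = entry.stat(follow_symlinks=False)
                changed += remap_one(entry.path, st, uid_map, gid_map)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return changed
```

Something like remap_tree("/gpfs/home", {1000: 20000}, {100: 30000})
would be the call, with the path and numbers here purely illustrative.
It needs to run as root to change ownership.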
If you have ACLs to contend with then I would definitely spend some
time and write some C code using the GPFS library. It will be a *LOT*
faster than any script ever will be. Dealing with mmgetacl and mmputacl
in any script is horrendous and you will have billions of execs of each
command.
As I understand it GPFS stores each ACL once and each file then points
to the ACL. Theoretically it would be possible to just modify the stored
ACLs for a very speedy update of all the ACLs on the
files/directories. However I would imagine you need to engage IBM and
bend over while they empty your wallet for that option :-)
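The biggest issue to take care of IMHO is whether any of the input
UID/GID numbers also exist in the output set. If so, life just got a lot
harder, as you don't get a second chance to run the script/program if
there is a problem: a file owned by such an ID could be either not yet
converted or already converted, and you can't tell which.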
In this case I would be very tempted to remove such clashes prior to the
main change. You might be able to do that incrementally before the main
switch and update your LDAP to match.
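Checking for such clashes up front is cheap. A minimal Python sketch,
assuming the mapping is available as an old-to-new dictionary (the
function name is made up for the example):

```python
def find_clashes(id_map):
    """Return the old->new pairs whose new ID is also somebody's old ID."""
    old_ids = set(id_map)
    return {
        old: new
        for old, new in id_map.items()
        if new in old_ids and new != old
    }


# Illustrative numbers only: 1000 is moving to 2000, but 2000 is itself
# an old ID due to be moved, so 1000 -> 2000 is reported as a clash.
clashes = find_clashes({1000: 2000, 2000: 3000, 1500: 5000})
print(clashes)
```

Run it once for the UID mapping and once for the GID mapping. An empty
result means the change can safely be re-run after a failure.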
Finally, as far as I am aware, if you are using TSM for backup you will
probably need to back every file up again after the change of
ownership.
JAB.
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG