[gpfsug-discuss] Change uidNumber and gidNumber for billions of files
Jonathan Buzzard
jonathan.buzzard at strath.ac.uk
Tue Jun 9 14:57:08 BST 2020
On 09/06/2020 14:07, Stephen Ulmer wrote:
> Jonathan brings up a good point that you’ll only get one shot at this —
> if you’re using the file system as your record of who owns what.
Not strictly true. If my existing UIDs are in the range 10000-19999 and
my target UIDs are in the range 50000-99999, for example, then I get an
unlimited number of shots at it.
It is only if the target and source ranges have any overlap that there
is a problem and that should be easy to work out in advance.
If it were me and there was overlap between the input and output states, I
would go via an intermediate state where there is no overlap. Linux has
had 32-bit UIDs for a very long time now (since kernel 2.4, from
memory), so non-overlapping mappings are perfectly possible to arrange.
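
Working out the overlap in advance can be sketched in a few lines of C. This is only an illustration: the struct id_range type and the ranges_overlap() name are made up for the example, and it assumes the old and new IDs can each be described by one inclusive range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive ID range, e.g. {10000, 19999}. Hypothetical helper type. */
struct id_range {
    uint32_t lo;
    uint32_t hi;
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
bool ranges_overlap(struct id_range a, struct id_range b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);   /* ranges must be well formed */
    return a.lo <= b.hi && b.lo <= a.hi;
}
```

If ranges_overlap() returns true for the source and target ranges, pick an intermediate range that overlaps neither and do the remap in two passes.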
> With respect to that, it is surprising how easy the sqlite C API is to
> use (though I would still recommend Perl or Python), and equally
> surprising how *bad* the JOIN performance is. If you go with sqlite,
> denormalize *everything* as it’s collected. If that is too dirty for
> you, then just use MariaDB or something else.
Actually, thinking on it more, a generic C program that maps arbitrary
UIDs/GIDs to new UIDs/GIDs is a really simple piece of code and should be
nearly as fast as chown -R. It will be very slightly slower, as you have
to look the mapping up for each file. Read the mappings from a CSV
file into memory and just use nftw/lchown calls to walk the file system
and change the UID/GID as necessary.
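
A minimal sketch of the nftw/lchown walk might look like the following. The mapping tables are hard-coded with made-up IDs rather than read from CSV, and the names (struct id_pair, map_id, remap_entry, remap_tree) are invented for the example. It assumes a POSIX system and needs to run as root to actually change ownership.

```c
#define _XOPEN_SOURCE 700   /* must come first: exposes nftw() and lchown() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct id_pair {
    unsigned int old_id;
    unsigned int new_id;
};

/* Hypothetical site mappings; a real tool would load these from a CSV file. */
static const struct id_pair uid_map[] = { {5001, 50001}, {5002, 50002} };
static const struct id_pair gid_map[] = { {5001, 50001} };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Return the mapped ID, or the original ID if the table has no entry for it. */
unsigned int map_id(const struct id_pair *map, size_t n, unsigned int id)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].old_id == id)
            return map[i].new_id;
    return id;
}

/* nftw() callback: remap the owner and group of a single entry. */
static int remap_entry(const char *path, const struct stat *sb,
                       int type, struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS) {               /* stat failed, so sb is not usable */
        fprintf(stderr, "cannot stat %s\n", path);
        return 0;                       /* keep walking */
    }
    uid_t uid = map_id(uid_map, COUNT(uid_map), sb->st_uid);
    gid_t gid = map_id(gid_map, COUNT(gid_map), sb->st_gid);
    if (uid == sb->st_uid && gid == sb->st_gid)
        return 0;                       /* nothing to change */
    if (lchown(path, uid, gid) != 0)    /* lchown: never follows symlinks */
        perror(path);
    return 0;
}

/* Walk the tree under root; FTW_PHYS stops nftw() following symlinks. */
int remap_tree(const char *root)
{
    assert(root != NULL);
    return nftw(root, remap_entry, 64, FTW_PHYS);
}
```

Calling remap_tree("/some/fileset") from main() does the whole walk. Entries whose IDs are not in the tables are skipped without an lchown call, which is what makes a rerun after an interruption cheap when the ranges do not overlap.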
If you are willing to sacrifice some error checking on the input mapping
file (not unreasonable to assume it is good) and have some hard-coded
site settings (avoiding processing command-line arguments), then 200
lines of C tops should do it. Depending on how big your input UID/GID
ranges are, you could even use array indexing for the mapping. For
example, on our system the UIDs start at just over 5000 and end just
below 6000, with quite a lot of holes. Just allocate an array of 6000
ints, which is only ~24KB, and off you go with something like

    new_uid = uid_mapping[uid];

Nice super speedy lookup of mappings. If you need to manipulate ACLs,
then C is the only way to go anyway.
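
For illustration, the array-indexed table and its CSV loader could be sketched roughly as below. The function names and the "old,new" one-pair-per-line CSV layout are assumptions for the example, and TABLE_SIZE of 6000 simply matches the numbers quoted above. Holes in the range map to themselves, so unmapped IDs are left alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 6000u   /* assumption: every old ID is below 6000 */

/* Allocate a table in which every ID initially maps to itself. */
uint32_t *mapping_new(size_t size)
{
    uint32_t *table = malloc(size * sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < size; i++)
        table[i] = (uint32_t)i;
    return table;
}

/* Parse one "old,new" line into the table; 0 on success, -1 on a bad line. */
int mapping_add_line(uint32_t *table, size_t size, const char *line)
{
    unsigned long old_id, new_id;
    if (sscanf(line, "%lu ,%lu", &old_id, &new_id) != 2)
        return -1;
    if (old_id >= size || new_id > UINT32_MAX)
        return -1;
    table[old_id] = (uint32_t)new_id;
    return 0;
}

/* Load every line of a CSV stream; returns the number of mappings, or -1. */
long mapping_load(FILE *in, uint32_t *table, size_t size)
{
    char line[128];
    long count = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (mapping_add_line(table, size, line) != 0)
            return -1;
        count++;
    }
    return count;
}

/* Constant-time lookup; IDs beyond the table are returned unchanged. */
uint32_t mapping_lookup(const uint32_t *table, size_t size, uint32_t id)
{
    assert(table != NULL);
    return id < size ? table[id] : id;
}
```

The bounds check in mapping_lookup() costs next to nothing and stops a stray UID above the table size (root-squashed nobody, say) from indexing off the end of the array.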
JAB.
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG