[gpfsug-discuss] Change uidNumber and gidNumber for billions of files

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Tue Jun 9 14:57:08 BST 2020


On 09/06/2020 14:07, Stephen Ulmer wrote:
> Jonathan brings up a good point that you’ll only get one shot at this — 
> if you’re using the file system as your record of who owns what.

Not strictly true. If my existing UIDs are in the range 10000-19999 and 
my target UIDs are in the range 50000-99999, for example, then I get an 
unlimited number of shots at it, because a file that has already been 
converted no longer matches any source UID.

It is only if the target and source ranges overlap that there is a 
problem, and that should be easy to work out in advance.
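
A minimal sketch of that advance check, assuming the source and target 
IDs can each be described as one inclusive range. The names id_range and 
ranges_overlap are made up for illustration.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Inclusive ID range, e.g. {10000, 19999}. Hypothetical helper type. */
struct id_range {
    uid_t lo;
    uid_t hi;
};

/* Two inclusive ranges overlap when each one starts no later than
 * the other one ends. */
static bool ranges_overlap(struct id_range a, struct id_range b)
{
    return a.lo <= b.hi && b.lo <= a.hi;
}
```

With the ranges from the example above, ranges_overlap() returns false, 
so the conversion could be re-run safely.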

If it were me and there was overlap between input and output states, I 
would go via an intermediate state where there is no overlap. Linux has 
had 32-bit UIDs for a very long time now (we are talking the 2.4 kernel 
series, from memory), so non-overlapping mappings are perfectly 
possible to arrange.
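
One way to picture the intermediate state is to split a direct mapping 
into two passes through a scratch range of otherwise unused IDs. This is 
only a sketch: id_map, split_via_scratch and apply_pass are hypothetical 
names, and it assumes you have already checked that the scratch range is 
not used by any real file or account.

```c
#include <stddef.h>
#include <sys/types.h>

/* One "old ID -> new ID" pair. Hypothetical helper type. */
struct id_map {
    uid_t from;
    uid_t to;
};

/* Split direct[] into two passes: pass1 moves every source ID to a
 * unique scratch ID, pass2 moves each scratch ID to its final target.
 * Assumes [scratch_base, scratch_base + n) is unused on the system. */
static void split_via_scratch(const struct id_map *direct, size_t n,
                              uid_t scratch_base,
                              struct id_map *pass1, struct id_map *pass2)
{
    for (size_t i = 0; i < n; i++) {
        uid_t scratch = scratch_base + (uid_t)i;
        pass1[i] = (struct id_map){ direct[i].from, scratch };
        pass2[i] = (struct id_map){ scratch, direct[i].to };
    }
}

/* Apply one pass to a single ID; IDs not in the table are unchanged. */
static uid_t apply_pass(const struct id_map *pass, size_t n, uid_t id)
{
    for (size_t i = 0; i < n; i++)
        if (pass[i].from == id)
            return pass[i].to;
    return id;
}
```

Each pass on its own has disjoint source and target sets, so either pass 
can be re-run after an interruption, provided pass 1 has completed over 
the whole file system before pass 2 starts.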

> With respect to that, it is surprising how easy the sqlite C API is to
> use (though I would still recommend Perl or Python), and equally
> surprising how *bad* the JOIN performance is. If you go with sqlite,
> denormalize *everything* as it’s collected. If that is too dirty for
> you, then just use MariaDB or something else.

Actually, thinking on it more, a generic C program for arbitrary UID/GID 
to UID/GID mapping is a really simple piece of code and should be 
nearly as fast as chown -R. It will be very slightly slower, as you have 
to look the mapping up for each file. Read the mappings from a CSV 
file into memory and just use nftw/lchown calls to walk the file system 
and change the UID/GID as necessary.
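
A rough sketch of that approach, not a tested tool. The CSV layout 
("u,OLD,NEW" for users, "g,OLD,NEW" for groups), the MAX_MAP limit and 
all function names are assumptions made for illustration; only nftw() 
and lchown() come from the description above.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MAP 4096            /* hard-coded site limit (assumption) */

struct id_map { unsigned long from, to; };

static struct id_map uid_map[MAX_MAP], gid_map[MAX_MAP];
static size_t n_uid, n_gid;

/* Read lines of the assumed form "u,OLD,NEW" or "g,OLD,NEW".
 * Error checking is deliberately minimal: parsing simply stops at
 * the first line that does not match. */
static int load_mapping(FILE *fp)
{
    char kind;
    unsigned long from, to;

    while (fscanf(fp, " %c,%lu,%lu", &kind, &from, &to) == 3) {
        if (kind == 'u' && n_uid < MAX_MAP)
            uid_map[n_uid++] = (struct id_map){ from, to };
        else if (kind == 'g' && n_gid < MAX_MAP)
            gid_map[n_gid++] = (struct id_map){ from, to };
        else
            return -1;          /* unknown kind or table full */
    }
    return 0;
}

/* Linear search; IDs without an entry map to themselves. */
static unsigned long lookup(const struct id_map *map, size_t n,
                            unsigned long id)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].from == id)
            return map[i].to;
    return id;
}

/* nftw() callback: change ownership only when the mapping says so. */
static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS) {       /* stat failed, sb is not usable */
        perror(path);
        return 0;
    }
    uid_t uid = (uid_t)lookup(uid_map, n_uid, sb->st_uid);
    gid_t gid = (gid_t)lookup(gid_map, n_gid, sb->st_gid);
    if ((uid != sb->st_uid || gid != sb->st_gid)
        && lchown(path, uid, gid) != 0)
        perror(path);           /* report the failure and keep walking */
    return 0;
}

/* Walk the tree without following symlinks (FTW_PHYS), which matches
 * lchown() changing the link itself rather than its target. */
static int remap_tree(const char *root)
{
    return nftw(root, visit, 64, FTW_PHYS);
}
```

A main() would open the mapping file, call load_mapping() on it and then 
call remap_tree() on the top-level directory. The linear lookup is fine 
for a few hundred entries; a sorted table with bsearch() would be the 
obvious next step for larger mappings.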

If you are willing to sacrifice some error checking on the input mapping 
file (not unreasonable to assume it is good) and have some hard-coded 
site settings (avoiding processing command line arguments), then 200 
lines of C tops should do it. Depending on how big your input UID/GID 
ranges are, you could even use array indexing for the mapping. For 
example, on our system the UIDs start at just over 5000 and end just 
below 6000, with quite a lot of holes. Just allocate an array of 6000 
ints, which is only ~24KB, and off you go with something like

	new_uid = uid_mapping[uid];

Nice super speedy lookup of mappings. If you need to manipulate ACLs 
then C is the only way to go anyway.
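
Fleshing out the array lookup a little, as a sketch only. MAX_UID, 
UNMAPPED, init_uid_mapping and map_uid are hypothetical names. The holes 
need a sentinel, since 0 is a real UID (root); (uid_t)-1 is used here 
because chown() and lchown() already treat it as "leave unchanged".

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_UID  6000           /* site-specific bound from the example */
#define UNMAPPED ((uid_t)-1)    /* marks holes in the table */

/* 6000 entries of 4 bytes each is about 24KB where uid_t is 32-bit. */
static uid_t uid_mapping[MAX_UID];

/* Mark every slot as a hole before the real mappings are filled in. */
static void init_uid_mapping(void)
{
    for (size_t i = 0; i < MAX_UID; i++)
        uid_mapping[i] = UNMAPPED;
}

/* Bounds-checked lookup; out-of-range or unmapped UIDs are returned
 * unchanged. */
static uid_t map_uid(uid_t uid)
{
    if (uid >= MAX_UID || uid_mapping[uid] == UNMAPPED)
        return uid;
    return uid_mapping[uid];
}
```

After init_uid_mapping(), each line of the mapping file becomes a single 
assignment such as uid_mapping[5001] = 50001, and the per-file lookup is 
one bounds check plus one array read.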

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


