[gpfsug-discuss] Change uidNumber and gidNumber for billions of files

Lohit Valleru valleru at cbio.mskcc.org
Wed Jun 10 16:31:20 BST 2020


Thank you, everyone, for the inputs.

The answers to some of the questions are as follows:

> From Jez: I've done this a few times in the past in a previous life.  In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem.
- Yes, we had considered/attempted this, and it works pretty well. It is actually much faster than using SSSD auto ID mapping.
But the main issue with this approach was automating the entry of uidNumbers and gidNumbers for all the enterprise users/groups across the agency. Both approaches have their pros and cons.

For now, we wanted to see the amount of effort that would be needed to change the uidNumbers and gidNumbers on the filesystem side, in case the other option of entering existing uidNumber/gidNumber data on AD does not work out.

> Does the filesystem have ACLs? And which ACLs?
Since we have CES servers that export the filesystems over the SMB protocol, the filesystems use NFS4 ACL mode.
As far as we know, only one fileset is extensively using NFS4 ACLs.

> Can we take a downtime to do this change?
For the current GPFS storage clusters that are in production, we are thinking of taking a downtime to do this change per cluster. For new storage clusters, we are thinking of changing to AD before any new data is written to the storage.

> Do the uidNumbers/gidNumbers conflict?
No. The current uidNumbers and gidNumbers are in the 1000-8000 range, while the new uidNumbers/gidNumbers are above 1000000.

I was thinking of taking a backup of the current state of the filesystem with respect to POSIX permissions/owner/group and the respective quotas, then disabling quotas during a downtime before making changes.

I will most likely start small with a single lab, and only change files without ACLs.
May I know if anyone has a method/tool to find out which files/dirs have NFS4 ACLs set? As far as we know it is just one fileset/lab, but it would be good to confirm whether they are set on any other files/dirs in the filesystem. The usual methods do not seem to work.

Jonathan/Aaron,
Thank you for the inputs regarding the scripts/APIs/symlinks and ACLs. I will try to see what I can do given the current state.
I too wish the GPFS API could be better at managing these kinds of scenarios, but I understand that changes this large might be pretty rare.

Thank you,
Lohit

On June 10, 2020 at 6:33:45 AM, Jonathan Buzzard (jonathan.buzzard at strath.ac.uk) wrote:

On 10/06/2020 02:15, Aaron Knister wrote:  
> Lohit,  
>  
> I did this while working @ NASA. I had two tools I used, one  
> affectionately known as "luke file walker" (to modify traditional unix  
> permissions) and the other known as the "milleniumfacl" (to modify posix  
> ACLs). Stupid jokes aside, there were some real technical challenges here.  
>  
> I don't know if anyone from the NCCS team at NASA is on the list, but if  
> they are perhaps they'll jump in if they're willing to share the code :)  
>  
> From what I recall, I used uthash and the GPFS APIs to store in memory  
> a hash of inodes and their uid/gid information. I then walked the  
> filesystem using the GPFS APIs and could look up the given inode in the  
> in-memory hash to view its ownership details. Both the inode traversal  
> and directory walk were parallelized/threaded. The way I actually  
> executed the chown was particularly security-minded. There is a race  
> condition that exists if you chown /path/to/file. All it takes is either  
> a malicious user or someone monkeying around with the filesystem while  
> it's live to accidentally chown the wrong file if a symbolic link ends  
> up in the file path.  

Well I would expect this needs to be done with no user access to the  
system. Or at the very least no user access for the bits you are  
currently modifying. Otherwise you are going to end up in a complete mess.  

> My workaround was to use openat() and fchmod (I  
> think that was it, I played with this quite a bit to get it right), and  
> for every path to be chown'd I would walk the hierarchy, opening each  
> component with the O_NOFOLLOW flag to be sure I didn't accidentally  
> stumble across a symlink in the way.  
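
A minimal sketch of that component-by-component walk might look like the
following (hypothetical helper names, assuming absolute paths and a process
running as root; fchown() rather than fchmod() does the ownership change
here):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open an absolute path one component at a time, refusing to traverse
   symbolic links. Returns a descriptor on the final component, or -1. */
static int open_nofollow(const char *path)
{
    char *copy = strdup(path);
    char *save = NULL;
    int fd = open("/", O_RDONLY | O_DIRECTORY);

    for (char *comp = strtok_r(copy, "/", &save); comp != NULL && fd >= 0;
         comp = strtok_r(NULL, "/", &save)) {
        int next = openat(fd, comp, O_RDONLY | O_NOFOLLOW);
        close(fd);
        fd = next;        /* -1 here means a symlink or vanished component */
    }
    free(copy);
    return fd;
}

/* chown by descriptor, so the object we opened is the object we change. */
static int chown_nofollow(const char *path, uid_t uid, gid_t gid)
{
    int fd = open_nofollow(path);
    if (fd < 0)
        return -1;
    int rc = fchown(fd, uid, gid);
    close(fd);
    return rc;
}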

Or you could just use lchown so you change the ownership of the symbolic  
link rather than the file it is pointing to. You need to change the  
ownership of the symbolic link, not the file it is linking to; the target  
will be picked up elsewhere in the scan. If you don't change the ownership  
of the symbolic links you are going to be left with a bunch of links owned  
by non-existent users. No race condition exists if you are doing it  
properly in the first place :-)  

I concluded that the standard nftw() library call was more suited to this  
than the GPFS inode scan. I could see no way to turn an inode into a  
path to the file, which lchown, gpfs_getacl and gpfs_putacl all require.  

I think the problem with the GPFS inode scan is that it is aimed at backup  
applications. Consequently it lacks some features that more general-purpose  
programs looking for a quick way to traverse the file system would want.  
Another example is that the gpfs_iattr_t structure returned from  
gpfs_stat_inode does not contain any information as to whether the  
file is a symbolic link, the way a standard stat call does.  
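
A rough sketch of what such an nftw()-based remapping pass could look like
(the map_uid()/map_gid() lookup table is assumed rather than shown, and
error handling is minimal):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* map_uid()/map_gid() stand in for whatever old-to-new lookup table you
   build (flat file, hash, ...); they are assumed here, not shown. */
extern uid_t map_uid(uid_t old);
extern gid_t map_gid(gid_t old);

static int remap(const char *path, const struct stat *st,
                 int type, struct FTW *ftw)
{
    (void)type; (void)ftw;

    uid_t nu = map_uid(st->st_uid);
    gid_t ng = map_gid(st->st_gid);

    if (nu == st->st_uid && ng == st->st_gid)
        return 0;                         /* nothing to change */

    /* lchown changes a symbolic link itself, never its target */
    if (lchown(path, nu, ng) != 0)
        fprintf(stderr, "lchown failed: %s\n", path);

    return 0;                             /* keep walking even on errors */
}

int main(int argc, char **argv)
{
    if (argc != 2)
        return 2;
    /* FTW_PHYS: report symlinks, do not follow them; 64 fds is arbitrary */
    return nftw(argv[1], remap, 64, FTW_PHYS | FTW_MOUNT) ? 1 : 0;
}

FTW_PHYS makes nftw report symbolic links instead of following them, and
lchown then changes the link inode itself, so the walk never gets redirected
through a link.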

> I also implemented caching of open  
> path component file descriptors since odds are I would be  
> chowning/chgrp'ing files in the same directory. That bought me some  
> speed up.  
>  

More reasons to use nftw for now, no need to open any files :-)  

> I opened up RFEs at one point, I believe, for GPFS API calls to do this  
> type of operation. I would ideally have liked a mechanism to do this  
> based on inode number rather than path which would help avoid issues of  
> race conditions.  
>  

Well lchown to the rescue, but that does require a path to the file. The  
biggest problem is the inability to get a path given an inode using the  
GPFS inode scan, which is why I steered away from it.  

In theory you could use gpfs_igetattrsx/gpfs_iputattrsx to modify the  
UID/GID of the file, but they are returned in an opaque format, so it's  
not possible :-(  

> One of the gotchas to be aware of, is quotas. My wrapper script would  
> clone quotas from the old uid to the new uid. That's easy enough.  
> However, keep in mind, if the uid is over their quota your chown  
> operation will absolutely kill your cluster. Once a user is over their  
> quota the filesystem seems to want to quiesce all of its accounting  
> information on every filesystem operation for that user. I would check  
> for adequate quota headroom for the user in question and abort if there  
> wasn't enough.  

Had not thought of that one. Surely the simple solution would be to set  
the quotas on the mapped UIDs/GIDs after the change has been made. Then  
the filesystem operation would not be for the user over quota but for  
the new user?  

The other alternative is to dump the quotas to a file and remove them.  
Change the UIDs and GIDs, then restore the quotas on the new UIDs/GIDs.  

As I said earlier surely the end users have no access to the file system  
while the modifications are being made. If they do all hell is going to  
break loose IMHO.  

>  
> The ACL changes were much more tricky. There's no way, of which I'm  
> aware, to atomically update ACL entries. You run the risk that you could  
> clobber a user's ACL update if it occurs in the milliseconds between you  
> reading the ACL and updating it as part of the UID/GID update.  
> Thankfully we were using Posix ACLs which were easier for me to deal  
> with programmatically. I still had the security concern over symbolic  
> links appearing in paths to have their ACLs updated either maliciously  
> or organically. I was able to deal with that by modifying libacl to  
> implement ACL calls that used variants of xattr calls that took file  
> descriptors as arguments and allowed me to throw nofollow flags. That  
> code is here (https://github.com/aaronknister/acl/commits/nofollow).  
> I couldn't take advantage of the GPFS API's here to meet my  
> requirements, so I just walked the filesystem tree in parallel if I  
> recall correctly, retrieved every ACL and updated if necessary.  
>  
> If you're using NFS4 ACLs... I don't have an easy answer for you :)  

You call gpfs_getacl, walk the array of ACL entries returned changing any  
UIDs/GIDs as required, and then call gpfs_putacl. You can modify both  
POSIX and NFSv4 ACLs with these calls. Given that they only take a path to  
the file, that is another reason to use nftw rather than the GPFS inode scan.  

As I understand it, even if your file system is set to an ACL type of "all",  
any individual file/directory can only have either POSIX *or* NFSv4 ACLs  
(ignoring the fact that you can set your filesystem's ACL type to the  
undocumented Samba setting), so both can be handled automatically.  
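
A rough sketch of that per-file rewrite is below. I am working from memory
of the gpfs.h names (gpfs_acl_t, GPFS_GETACL_STRUCT, GPFS_ACL_VERSION_NFS4,
the ace_v1[]/ace_v4[] arrays and their who fields), so treat those as
assumptions to be checked against the header on your system:

#include <gpfs.h>
#include <stdio.h>
#include <stdlib.h>

#define ACLBUF 0x10000       /* oversized buffer for the returned ACL */

/* same old-to-new id lookup table as the chown pass; assumed, not shown */
extern unsigned int map_id(unsigned int old);

static int remap_acl(const char *path)
{
    gpfs_acl_t *acl = calloc(1, ACLBUF);
    int i, rc;

    if (acl == NULL)
        return -1;
    acl->acl_len = ACLBUF;   /* tell gpfs_getacl how much room it has */

    if (gpfs_getacl(path, GPFS_GETACL_STRUCT, acl) != 0) {
        free(acl);           /* not a GPFS object, or the ACL did not fit */
        return -1;
    }

    if (acl->acl_version == GPFS_ACL_VERSION_NFS4) {
        for (i = 0; i < acl->acl_nace; i++)
            /* real code must skip the OWNER@/GROUP@/EVERYONE@ special
               entries and only rewrite named user/group entries */
            acl->ace_v4[i].aceWho = map_id(acl->ace_v4[i].aceWho);
    } else {
        /* POSIX (v1) entries live in ace_v1[] and carry an analogous
           who field; the owner/group/other/mask entries must likewise
           be left alone */
    }

    rc = gpfs_putacl(path, GPFS_PUTACL_STRUCT, acl);
    free(acl);
    return rc;
}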

Note that if you are using nftw to walk the file system you get a  
standard stat structure for every file/directory, so you could just  
skip symbolic links. I don't think you can set an ACL on a symbolic  
link anyway. You certainly can't set standard permissions on them.  

It would be sensible to wrap the main loop in  
gpfs_lib_init/gpfs_lib_term in this scenario.  
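
Something like the following, for instance, reusing the remap() callback
from the nftw sketch further up the thread (gpfs_lib_init()/gpfs_lib_term()
take a flags argument which, as far as I recall, must be zero):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <gpfs.h>
#include <stdio.h>
#include <sys/stat.h>

/* remap() is the nftw callback from the earlier sketch */
extern int remap(const char *, const struct stat *, int, struct FTW *);

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 2;
    }

    if (gpfs_lib_init(0) != 0) {   /* flags are reserved and should be zero */
        perror("gpfs_lib_init");
        return 1;
    }

    int rc = nftw(argv[1], remap, 64, FTW_PHYS | FTW_MOUNT);

    gpfs_lib_term(0);
    return rc ? 1 : 0;
}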


JAB.  

--  
Jonathan A. Buzzard Tel: +44141-5483420  
HPC System Administrator, ARCHIE-WeSt.  
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG  
_______________________________________________  
gpfsug-discuss mailing list  
gpfsug-discuss at spectrumscale.org  
http://gpfsug.org/mailman/listinfo/gpfsug-discuss  