[gpfsug-discuss] Change uidNumber and gidNumber for billions of files

Wed Jun 10 08:33:03 BST 2020

Quota … I thought there was a work around for this.

I think it went along these lines:

Set the soft quota to what you want.
Set the hard quota 150% more.
Set the grace period to 1 second.

I think the issue is that when you are over the soft quota, every operation has to quiesce until you hit the hard limit or the grace period. Whereas once you hit grace, it no longer does this. I was just looking for the slide deck about this, but can’t find it at the moment! Tomer spoke about it at one point.

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "aaron.knister at gmail.com" <aaron.knister at gmail.com>
Reply to: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Wednesday, 10 June 2020 at 02:16
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files

Lohit,

I did this while working @ NASA. I had two tools I used, one affectionately known as "luke file walker" (to modify traditional unix permissions) and the other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, there were some real technical challenges here.

I don't know if anyone from the NCCS team at NASA is on the list, but if they are perhaps they'll jump in if they're willing to share the code :)

From what I recall, I used uthash and the gpfs APIs to store in memory a hash of inodes and their uid/gid information. I then walked the filesystem using the gpfs APIs and could look up a given inode in the in-memory hash to view its ownership details. Both the inode traversal and the directory walk were parallelized/threaded.

The way I actually executed the chown was particularly security-minded. There is a race condition that exists if you chown /path/to/file. All it takes is either a malicious user or someone monkeying around with the filesystem while it's live to accidentally chown the wrong file if a symbolic link ends up in the file path. My workaround was to use openat() and fchmod (I think that was it, I played with this quite a bit to get it right), and for every path to be chown'd I would walk the hierarchy, opening each component with the O_NOFOLLOW flag to be sure I didn't accidentally stumble across a symlink in the way. I also implemented caching of open path component file descriptors, since odds are I would be chowning/chgrp'ing files in the same directory. That bought me some speedup.
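A minimal sketch of the symlink-safe walk described above, in Python rather than the original C. Assumptions: the helper name chown_nofollow is hypothetical, the root directory is trusted, and the final ownership change uses os.chown() with dir_fd and follow_symlinks=False (fchownat() with AT_SYMLINK_NOFOLLOW underneath) instead of opening the file itself, so it is not necessarily the exact call sequence used at the time. It also omits the file descriptor cache and the threading.

```python
import os

# O_NOFOLLOW makes the open fail if the component is a symlink;
# O_DIRECTORY makes it fail if the component is not a directory.
DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW


def chown_nofollow(root, relpath, uid, gid):
    """Change ownership of root/relpath without following any symlink.

    Every intermediate directory is opened relative to its parent's
    file descriptor, so a symlink swapped into the path raises OSError
    instead of redirecting the chown somewhere else.
    """
    parts = [p for p in relpath.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        raise ValueError("relpath must be non-empty and must not contain '..'")

    dir_fd = os.open(root, DIR_FLAGS)  # assumption: root itself is trusted
    try:
        for name in parts[:-1]:
            next_fd = os.open(name, DIR_FLAGS, dir_fd=dir_fd)
            os.close(dir_fd)
            dir_fd = next_fd
        # If the last component is a symlink, the link itself is changed,
        # never its target.
        os.chown(parts[-1], uid, gid, dir_fd=dir_fd, follow_symlinks=False)
    finally:
        os.close(dir_fd)
```

For example, chown_nofollow("/gpfs/fs0", "home/alice/data.txt", 50001, 60001) would open /gpfs/fs0, then home, then alice, and only then change data.txt; the path and the numeric ids here are made up. Keeping the most recently used directory descriptors open, as the message describes, would save most of those opens when many files share a directory.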

I opened up RFE's at one point, I believe, for gpfs API calls to do this type of operation. I would ideally have liked a mechanism to do this based on inode number rather than path which would help avoid issues of race conditions.

One of the gotchas to be aware of is quotas. My wrapper script would clone quotas from the old uid to the new uid. That's easy enough. However, keep in mind, if the uid is over their quota your chown operation will absolutely kill your cluster. Once a user is over their quota the filesystem seems to want to quiesce all of its accounting information on every filesystem operation for that user. I would check for adequate quota headroom for the user in question and abort if there wasn't enough.
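The headroom check could look something like the sketch below. Everything in it is an assumption for illustration: the Quota record, the has_headroom name, the 5% safety margin, and the convention that a soft limit of 0 means "no limit". How the usage and limit figures are obtained (quota reports or the quota API) is left out.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    used_kib: int  # current block usage charged to the id
    soft_kib: int  # soft block limit; 0 is treated as "no limit" (assumption)


def has_headroom(target: Quota, incoming_kib: int, margin: float = 0.05) -> bool:
    """Return True if charging incoming_kib to the target id keeps it
    below its soft limit, with a safety margin left over."""
    if target.soft_kib == 0:
        return True
    return target.used_kib + incoming_kib <= target.soft_kib * (1 - margin)


def check_or_abort(target: Quota, incoming_kib: int) -> None:
    """Refuse to start the chown pass if the new id would go over quota."""
    if not has_headroom(target, incoming_kib):
        raise SystemExit("not enough quota headroom for the new uid; aborting")
```

Here incoming_kib would be the usage currently charged to the old uid, since that is what moves to the new uid as files are chown'd.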

The ACL changes were much more tricky. There's no way, of which I'm aware, to atomically update ACL entries. You run the risk that you could clobber a user's ACL update if it occurs in the milliseconds between you reading the ACL and updating it as part of the UID/GID update. Thankfully we were using Posix ACLs which were easier for me to deal with programmatically. I still had the security concern over symbolic links appearing in paths to have their ACLs updated either maliciously or organically. I was able to deal with that by modifying libacl to implement ACL calls that used variants of xattr calls that took file descriptors as arguments and allowed me to throw nofollow flags. That code is here (https://github.com/aaronknister/acl/commits/nofollow). I couldn't take advantage of the GPFS API's here to meet my requirements, so I just walked the filesystem tree in parallel if I recall correctly, retrieved every ACL and updated if necessary.
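The "update if necessary" step can be illustrated on the text form of a POSIX ACL with numeric qualifiers (entries such as user:1001:r-x). This is only a sketch of the id remapping; the function name remap_acl and the mapping dictionaries are hypothetical, and it does not read or write ACLs on disk, so it does nothing about the read-modify-write race or the symlink concern mentioned above.

```python
def remap_acl(entries, uid_map, gid_map):
    """Rewrite numeric uid/gid qualifiers in POSIX ACL text entries.

    entries: strings like "user:1001:r-x" or "default:group:2001:rwx".
    Returns (new_entries, changed) so the caller can skip the write
    when nothing in the ACL refers to a migrated id.
    """
    out, changed = [], False
    for entry in entries:
        fields = entry.split(":")
        # An optional "default" prefix shifts tag and qualifier by one field.
        i = 1 if fields[0] == "default" else 0
        tag, qualifier = fields[i], fields[i + 1]
        if tag == "user":
            mapping = uid_map
        elif tag == "group":
            mapping = gid_map
        else:
            mapping = {}  # mask and other entries carry no id
        if qualifier.isdigit() and int(qualifier) in mapping:
            fields[i + 1] = str(mapping[int(qualifier)])
            changed = True
        out.append(":".join(fields))
    return out, changed
```

Entries with an empty qualifier (user::rw-, group::r--) describe the owning user and group and are left alone, since those follow the file's own uid/gid.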

If you're using NFS4 ACLs... I don't have an easy answer for you :)

We did manage to migrate UID numbers for several hundred users and half a billion inodes in a relatively small amount of time with the filesystem active. Some of the concerns about symbolic links can be mitigated if there are no users active on the filesystem while the migration is underway.

-Aaron

On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru <valleru at cbio.mskcc.org> wrote:
Hello Everyone,

We are planning to migrate from LDAP to AD, and one of the best solutions was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve.

May I know, if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner?
We could spend some time to write a custom script, but wanted to know if a tool already exists.

Please do let me know if anyone else has come across a similar situation, and the steps/tools used to resolve it.

Regards,
Lohit
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss