[gpfsug-discuss] Quick delete of huge tree
jonathan.buzzard at strath.ac.uk
Tue Apr 20 12:47:44 BST 2021
On 20/04/2021 12:18, Ulrich Sibiller wrote:
> Hello *,
> I have to delete a subtree of about ~50 million files in thousands of
> subdirs, ~14TB of data. Running a recursive rm is very slow so I
> set up a simple policy file:
> RULE 'delstuff' DELETE
> DIRECTORIES_PLUS
> WHERE PATH_NAME LIKE '/mypath/%'
> This kinda works but is not really fast, either. It even requires a
> second run because files and directories within the tree will be
> processed in arbitrary order so it will happen quite frequently that
> a directory is going to be deleted before its content has been
> removed completely. For those dirs I see an error message and have to
> delete afterwards.
You are going to have to remove all the inodes and hence the speed is
going to be dependent on your metadata performance no matter how you
approach the problem. I doubt that you are going to get much better than
a recursive rm.
You could try running a series of parallel rm's attacking subsections of
the directory tree on different nodes, but I suspect that it will make
very little difference. You could use my sync/restore script to split the
problem up between different nodes.
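As a rough illustration of the parallel approach, here is a minimal
single-node sketch. It is not the sync/restore script mentioned above; the
function name parallel_rm and the default of 4 jobs are made up for the
example, and it assumes bash plus GNU find and xargs. Spreading the work
across nodes would mean running something like it on each node against a
different part of the tree.

```shell
# Sketch only: delete each immediate child of a tree with its own rm,
# running several at once on this node.
# Usage: parallel_rm TREE [JOBS]   (both names are hypothetical)
parallel_rm() {
    tree=$1
    njobs=${2:-4}
    # Refuse an empty or non-directory argument so nothing unintended is removed.
    [ -n "$tree" ] && [ -d "$tree" ] || { echo "not a directory: $tree" >&2; return 1; }
    # One rm per immediate child, at most $njobs running concurrently.
    find "$tree" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -n 1 -P "$njobs" rm -rf --
    # Remove the top-level directory itself, which should now be empty.
    rmdir -- "$tree"
}
```

If the tree is very unbalanced (one huge subdirectory and many small ones)
this will not help much, since one rm ends up doing most of the work.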
> I am wondering if there's a quicker way. Given the fact that this is
> a whole tree I think there should be a quick way to unlink the
> complete inode hierarchy.
> Unfortunately we are not using a fileset for that tree...
> So are there any ideas how to solve that more efficiently?
Does it matter that it takes a long time? Use screen or tmux to run the
command in the background, safe from accidental detaches, and forget
about it. Check on it now and then to see how it's going, or just forget
about it; it will finish. Something like
screen -S filedel -L -Logfile /tmp/filedel.log -d -m /bin/rm -rf <tree>
Consider using mv to move it out of the way or hide it while the delete is
in progress. If you do that, think carefully about backups; you don't
want to back it all up again while it is being deleted :-)
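The move-then-delete idea could look roughly like this. It is a sketch
under assumptions: the function name hide_then_delete and the
".deleting-" prefix are invented for the example, and the rename stays in
the same parent directory so that it is a metadata-only operation rather
than a copy.

```shell
# Sketch only: rename a tree to a hidden name next to it, then delete it.
# Usage: hide_then_delete TREE   (function name is hypothetical)
hide_then_delete() {
    tree=${1%/}
    # Refuse an empty or non-directory argument so nothing unintended is removed.
    [ -n "$tree" ] && [ -d "$tree" ] || { echo "not a directory: $tree" >&2; return 1; }
    parent=$(dirname -- "$tree")
    name=$(basename -- "$tree")
    # Hidden name in the same parent directory; $$ (the shell PID) makes
    # a clash with an earlier run less likely.
    hidden="$parent/.deleting-$name.$$"
    # The original path is free for reuse as soon as the rename returns.
    mv -- "$tree" "$hidden" || return 1
    # The slow part: this is what you would leave running under screen or tmux.
    rm -rf -- "$hidden"
}
```

Whatever hidden name you pick is the thing to exclude from the backup
while the delete runs.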
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG