[gpfsug-discuss] Quick delete of huge tree

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Tue Apr 20 12:47:44 BST 2021


On 20/04/2021 12:18, Ulrich Sibiller wrote:
> 
> Hello *,
> 
> I have to delete a subtree of about 50 million files in thousands of
> subdirs, ~14TB of data. Running a recursive rm is very slow so I
> set up a simple policy file:
> 
> RULE 'delstuff' DELETE
>    DIRECTORIES_PLUS
>    WHERE PATH_NAME LIKE '/mypath/%'
> 
> This kinda works but is not really fast, either. It even requires a 
> second run because files and directories within the tree will be
> processed in arbitrary order so it will happen quite frequently that
> a directory is going to be deleted before its content has been 
> removed completely. For those dirs I see an error message and have to
> delete afterwards.
> 
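
The second pass described above could be done as a depth-first sweep for leftover empty directories. This is only a sketch, assuming GNU find; the function name sweep_empty_dirs and the throwaway demo tree are made up for illustration and are not part of the original policy run.

```shell
#!/bin/bash
# Sketch: remove directories left empty after the files are gone.
# sweep_empty_dirs is a hypothetical helper name.
sweep_empty_dirs() {
    local tree=$1
    # GNU find's -delete implies -depth, so children are visited (and
    # removed) before their parents are tested for emptiness.
    find "$tree" -type d -empty -delete
}

# Demonstration on a throwaway tree so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/tree/a/b/c" "$demo/tree/d"
touch "$demo/tree/d/keep"
sweep_empty_dirs "$demo/tree"
```

Directories that still contain files are left alone, so the sweep is safe to repeat.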

You are going to have to remove all the inodes and hence the speed is 
going to be dependent on your metadata performance no matter how you 
approach the problem. I doubt that you are going to get much better than 
a recursive rm.

You could try running a series of parallel rm's, each attacking a 
subsection of the directory tree on a different node, but I suspect that 
it will make very little difference. You could use my sync/restore script 
to split the problem up between different nodes:

https://github.com/digitalcabbage/syncrestore
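
For a single node, the parallel rm idea can be sketched roughly as follows. This is not the script linked above, just a minimal illustration assuming GNU find and xargs; the function name parallel_rm and the worker count are made up for the example.

```shell
#!/bin/bash
# Sketch: delete the top-level entries of a tree in parallel on one node.
# parallel_rm and its default of 8 workers are illustrative assumptions.
parallel_rm() {
    local tree=$1 workers=${2:-8}
    # One rm per top-level entry, up to $workers at a time. NUL-separated
    # names keep unusual filenames intact; -r skips rm if nothing is found.
    find "$tree" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -n 1 -P "$workers" rm -rf --
    # Remove the now-empty top-level directory itself.
    rmdir -- "$tree"
}

# Demonstration on a throwaway tree so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/tree/a/b" "$demo/tree/c"
touch "$demo/tree/a/b/f1" "$demo/tree/c/f2" "$demo/tree/f3"
parallel_rm "$demo/tree" 4
```

To spread the work over several nodes you would hand different subdirectories to different hosts instead, but the metadata servers remain the limit either way.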

> I am wondering if there's a quicker way. Given the fact that this is
> a whole tree I think there should be a quick way to unlink the
> complete inode hierarchy.
> 
> Unfortunately we are not using a fileset for that tree...
> 
> So are there any ideas how to solve that more efficiently?
> 

Does it matter that it takes a long time? Use screen or tmux to run the 
command in the background, safe from accidental detaches. Check on it 
now and then to see how it's going, or just forget about it; it will 
finish. Something like

screen -S filedel -L -Logfile /tmp/filedel.log -d -m /bin/rm -rf <tree>

Consider using mv to move it out of the way or hide it while the delete 
is in progress. If you do that, think carefully about backups: you don't 
want to back it all up again while it is being deleted :-)
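
A rough sketch of the hide-then-delete approach, assuming the hidden name stays in the same directory (same file system and fileset) so that mv is a plain rename rather than a copy. The function name hide_tree and the ".deleting." prefix are invented for the example.

```shell
#!/bin/bash
# Sketch: rename a tree to a hidden name next to it, then delete it.
# hide_tree and the ".deleting." prefix are illustrative assumptions.
hide_tree() {
    local tree=$1 parent name hidden
    parent=$(dirname -- "$tree")
    name=$(basename -- "$tree")
    hidden="$parent/.deleting.$name.$$"
    # Staying in the same directory keeps this a single rename.
    mv -- "$tree" "$hidden" || return 1
    # Print the new location so the caller can delete it.
    printf '%s\n' "$hidden"
}

# Demonstration on a throwaway tree so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/tree/sub"
touch "$demo/tree/sub/file"
hidden=$(hide_tree "$demo/tree")
rm -rf -- "$hidden"
```

On the real tree the final rm would be the long-running part, so that is the command to start under screen or tmux as shown earlier.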


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


