Streaming rm, a faster alternative to find … -exec rm

Anyone who has done Linux or Unix administration will be very familiar with the “find … -exec rm …” idiom for cleaning up temporary files. For example, if some application is creating temporary files, we can remove those which are more than seven days old with a command like this:

find /var/spool/thisapp -name thisapp\*.tmp -mtime +7 -exec rm "{}" ";"

Some years ago I came across a situation where an application was creating large numbers of temporary files in a huge directory structure and someone was using a find command like the one above to clear them down. It occurred to me that it would be easy to write a command which took the names of files on the standard input and removed them. I wrote a crude version and ran the command like this (using the above example):

find /var/spool/thisapp -name thisapp\*.tmp -mtime +7 | strrm

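It was ten times faster than the original form! It took thirty seconds versus five minutes for find … -exec. Admittedly there were 46,000 files to remove in that example. The most likely reason is that -exec terminated with “;” starts a separate rm process for every file found, whereas strrm is a single process which removes each file as its name arrives.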
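To make the per-file cost visible, here is a small sketch which substitutes echo for rm, so nothing is deleted. The helper name count_exec_runs is hypothetical and not part of strrm. Each echo started by find prints exactly one line, so the line count is the number of processes find had to start.

```shell
#!/bin/sh
# count_exec_runs: hypothetical helper, for illustration only.
# Prints how many commands find starts when -exec is terminated with ";".
# echo stands in for rm, so no file is removed; every echo run prints one line.
count_exec_runs() {
    find "$1" -name 'thisapp*.tmp' -exec echo rm {} ';' | wc -l | tr -d ' '
}

# Example: count_exec_runs /var/spool/thisapp
```

On a directory tree holding 46,000 matching files this would report 46,000 separate command starts, which is the overhead a single streaming process avoids.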

A more comprehensive strrm

Recently I decided it was worth rewriting my streaming rm, and you can download the source code. These are its options:

  • -n|--dryrun Don’t do the remove, but echo the file name(s) to the standard output.
  • -v|--verbose Normally strrm runs silently; with this option it echoes each file name as it’s removed.
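As a rough illustration of how these two options behave, here is a minimal shell stand-in. The function name strrm_sketch is hypothetical, and this is not the real strrm source. It prints names as they are rather than with octal escapes, it assumes one name per line, and it assumes directories are removed with rmdir. Because it calls an external rm for every name it will not reproduce the speed of the real program; it only shows the interface.

```shell
#!/bin/sh
# strrm_sketch: hypothetical stand-in for strrm, for illustration only.
# Reads one path per line from standard input.
strrm_sketch() {
    dryrun=0
    verbose=0
    for arg in "$@"; do
        case "$arg" in
            -n|--dryrun)  dryrun=1 ;;
            -v|--verbose) verbose=1 ;;
            *) echo "strrm_sketch: unknown option: $arg" >&2; return 2 ;;
        esac
    done

    # IFS= and -r keep leading blanks and backslashes in the names intact.
    while IFS= read -r name; do
        if [ "$dryrun" -eq 1 ]; then
            printf '%s\n' "$name"        # report only, remove nothing
            continue
        fi
        if [ -d "$name" ] && [ ! -L "$name" ]; then
            rmdir -- "$name" || continue # succeeds only on empty directories
        else
            rm -f -- "$name" || continue
        fi
        if [ "$verbose" -eq 1 ]; then
            printf '%s\n' "$name"        # echo the name once it has gone
        fi
    done
    return 0
}
```

It would be used in the same way as the real command, for example: find /var/spool/thisapp -name thisapp\*.tmp -mtime +7 | strrm_sketch --dryrun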

Removing files with awkward names

Apart from speed, strrm has another advantage: it doesn’t care about weird file names. Consider this rather artificial example, a file name which has three backspaces in it:

vger:~/tmp(99)+>- ls Annoy*
Annoy???ingFileName
vger:~/tmp(100)+>- ls Annoy* | strrm --dryrun
Annoy\010\010\010ingFileName
vger:~/tmp(101)+>- ls Annoy* | strrm --verbose
Annoy\010\010\010ingFileName
vger:~/tmp(102)+>- ls Annoy*
ls: No match.
vger:~/tmp(103)+>-
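If you want to reproduce that test, a file like this can be created with printf, which turns each \b into a backspace character (octal 010). The helper name make_awkward_file is hypothetical; it is only a sketch for setting up the example.

```shell
#!/bin/sh
# make_awkward_file: hypothetical helper, for illustration only.
# Creates an empty file in directory $1 whose name contains three backspaces
# and prints the full path of the new file.
make_awkward_file() {
    awkward=$(printf 'Annoy\b\b\bingFileName')
    : > "$1/$awkward"
    printf '%s\n' "$1/$awkward"
}

# Example: make_awkward_file "$HOME/tmp"
```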

Removing empty directory paths

For various reasons it is possible to end up with empty directory paths, i.e. paths which contain only directories and no files.  This will delete all the empty directories working back from the longest path:

find testdir -type d | awk '{printf("%04d %s\n",length($0),$0)}' | sort -rn | sed 's/[0-9][0-9][0-9][0-9] //' | strrm --dryrun

Even though it cannot delete non-empty directories, I’ve shown it with “--dryrun” just in case it doesn’t do exactly what you expect. Once you’re happy you can remove the “--dryrun”.
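The ordering matters because a child path is always longer than its parent, so sorting by length guarantees each directory is tried before the one that contains it. Here is a small sketch of that idea which can be tried on a scratch tree. The function names are hypothetical, and rmdir stands in for strrm on the assumption that strrm likewise refuses to remove a directory which still has something in it.

```shell
#!/bin/sh
# list_dirs_deepest_first: hypothetical helper, for illustration only.
# Prints every directory under $1, longest path first (same pipeline as above).
list_dirs_deepest_first() {
    find "$1" -type d |
        awk '{printf("%04d %s\n", length($0), $0)}' |
        sort -rn |
        sed 's/^[0-9][0-9][0-9][0-9] //'
}

# prune_empty_dirs: hypothetical helper using rmdir as a stand-in for strrm.
prune_empty_dirs() {
    list_dirs_deepest_first "$1" |
        while IFS= read -r dir; do
            # rmdir refuses non-empty directories, so no files are lost.
            rmdir -- "$dir" 2>/dev/null || true
        done
}

# Example: prune_empty_dirs testdir
```

Note that the four-digit prefix assumes no path is longer than 9,999 characters.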

 
