Published on ONLamp.com (http://www.onlamp.com/)


FreeBSD Basics

Find: Part Two

03/14/2002

In the last article, I introduced the Unix find command. This week, I'd like to continue by demonstrating some more of the switches that are available with this handy command.

Let's continue where we left off, with this example:

find . \( -atime +7 -o -size +`expr 10 \* 1024 \* 2` \) -print

As a recap, this command looks for any files in the current directory and its subdirectories (represented by .) that have not been accessed for more than 7 days (-atime +7) or (-o) that are larger than a certain size (-size +). The escaped parentheses \( and \) group the two tests so that -print applies to a file matching either one; without them, -print would attach only to the -size test, and files matched by -atime alone would not be listed. I used the expr command to calculate the size for me. Since I was aiming for 10MB and find measures size in 512-byte blocks, I needed to calculate 10 times 1024 times 2 (as 2 times 512 is 1024).
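To see how -o and -print interact, here is a small sketch you can run in a scratch directory. The directory and the file names a.log and b.tmp are made up for the demonstration; the point is only that -print binds to the test immediately before it unless the tests are grouped.

```shell
# Scratch directory with two files (names are hypothetical).
demo=$(mktemp -d)
touch "$demo/a.log" "$demo/b.tmp"

# Without grouping, -print attaches only to the -name '*.tmp' test.
ungrouped=$(cd "$demo" && find . -name '*.log' -o -name '*.tmp' -print | sort)

# With \( \), -print applies to a match on either side of -o.
grouped=$(cd "$demo" && find . \( -name '*.log' -o -name '*.tmp' \) -print | sort)

echo "ungrouped: $ungrouped"
echo "grouped:"
echo "$grouped"
rm -rf "$demo"
```

The ungrouped form lists only the .tmp file, while the grouped form lists both.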

Notice that I used the ` or "backquote" (usually the key at the top left of a PC keyboard, below Esc). In Unix, whenever you want the output of one command inserted into the arguments of another command, put the command that produces the output between backquotes; this is known as command substitution. Because I put the calculation between backquotes, the shell ran expr first and handed its result to the -size switch of the find command.

The last thing I want you to notice is that I also had to quote the two * in the command using the \ character. When calculating math, * represents multiply; however, to the shell it represents a wildcard. By placing a \ before the *, the shell won't interpret it as a wildcard, so expr receives the * and will know that I want it to perform a multiplication.
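If the backquotes and backslashes feel awkward, a POSIX shell offers two other spellings that should give the same result. This is only a sketch; the variable names are mine, not part of any command.

```shell
# Backquotes and $( ) are two spellings of command substitution.
blocks_backquote=`expr 10 \* 1024 \* 2`
blocks_dollar=$(expr 10 \* 1024 \* 2)

# Arithmetic expansion needs neither expr nor the backslashes,
# because the shell does not expand wildcards inside $(( )).
blocks_arith=$((10 * 1024 * 2))

echo "$blocks_backquote $blocks_dollar $blocks_arith"
```

All three variables hold 20480, the number of 512-byte blocks in 10MB.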

Let's try some more examples. Let's say I have a large directory structure and I wish to search for a certain pattern and remove all of the files that match this pattern. There are several ways to do this with the find command, so let's compare some of these methods.

In my home directory, I have a directory called tmp that contains a subdirectory named tst. This tst directory has a lot of files and subdirectories, and some of these files end with a .old extension. Let's start by seeing just how many files live in my tst directory:

cd ~/tmp/tst
find . -print | wc -l
   269

Notice that when the find command ran, it printed each file found on a separate line. I could then pipe that result to the word count (wc) command using the switch that counted the lines (-l). This told me that I have 269 files (including directories, since to Unix, directories are really files) in my tst directory.

Let's see how many of these files have a .old extension:

find . -name "*.old" -print | wc -l
   67

Now, how can I go about removing these *.old files? One way is to use the -exec switch and have it call the rm command like so:


find . -name "*.old" -exec rm {} \;

Once that is finished, I can repeat this command to see if there are any remaining *.old files:

find . -name "*.old" -print | wc -l
   0

This command works, but it may not always be the best way to remove a large number of files. When -exec is terminated with \;, a separate rm process is created for every file that find finds. This may not be an issue if you are only finding a small number of files on your home computer, but it may be one if you are finding hundreds or thousands of files on a production system. Either way, this method consumes more resources and is slower than the other methods.
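The per-file cost comes from the \; terminator. POSIX also defines a + terminator for -exec, which asks find to collect many names and run the command once per batch; check find(1) on your system to confirm it is supported. The sketch below uses echo in place of rm and three made-up file names so that nothing is deleted and the number of invocations is visible.

```shell
demo=$(mktemp -d)
touch "$demo/1.old" "$demo/2.old" "$demo/3.old"

# \; runs echo once per file: three lines of output.
per_file=$(find "$demo" -name '*.old' -exec echo {} \; | wc -l)

# + runs echo once for the whole batch: a single line of output.
batched=$(find "$demo" -name '*.old' -exec echo {} + | wc -l)

echo "runs with \\;: $per_file, runs with +: $batched"
rm -rf "$demo"
```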

Let's look at a second way to delete these files, this time using xargs:


find . -name "*.old" -print | xargs rm

You'll note that I didn't have to include the \; string at the end of this command, as that string is only used to terminate commands that are passed to -exec. By using xargs, I will still remove all of the files that end in .old, but instead of creating a separate rm process for each file that is found, xargs starts as few rm processes as it can, usually just one. As find finds each file, it prints the name on its own line. This list is piped to xargs, which joins the lines into a single space-separated argument list and passes that list to the rm command.
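One assumption hides in that pipeline: that no filename contains a space, quote, or newline, since xargs splits its input on whitespace. Both FreeBSD and GNU provide find -print0 and xargs -0, which separate names with a NUL character instead; they are widely available but worth confirming in your manual pages. A minimal sketch, with made-up file names:

```shell
demo=$(mktemp -d)
touch "$demo/my file.old" "$demo/plain.old"

# NUL-separated names survive embedded spaces intact.
find "$demo" -name '*.old' -print0 | xargs -0 rm

remaining=$(find "$demo" -name '*.old' -print | wc -l)
echo "remaining .old files: $remaining"
rm -rf "$demo"
```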

There is actually a third way to remove these files, using the -delete switch with find:

find . -name "*.old" -delete

This command has the simplest syntax and is the most efficient way of removing files. The -delete switch doesn't need to start a separate process at all: the files are removed by the find process itself. Nothing has to be passed to another command as an argument list, either, so there is no limit to run into; xargs, by contrast, has to split a long list across several rm invocations to stay under the system's limit on argument-list size. If you are searching a deep directory structure or have very long filenames, that limit is reached sooner. If you are curious as to the actual limit, there is a sysctl value that has been set for you:

sysctl -a | grep kern.argmax
kern.argmax: 65536

The 65536 represents the maximum number of bytes (or characters) in an argument list.
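The same limit can usually be read through the POSIX getconf utility, and you can watch how xargs divides its input into several invocations by giving it an artificially small batch size. This is a sketch only: the letters a through e stand in for filenames, and -n 2 is there purely to make the batching visible.

```shell
# The limit as reported by getconf; on FreeBSD this should match kern.argmax.
limit=$(getconf ARG_MAX 2>/dev/null || echo unknown)
echo "ARG_MAX: $limit"

# -n 2 caps each invocation at two arguments: five names, three echo runs.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo | wc -l)
echo "invocations: $batches"
```

Without -n, xargs picks the batch size itself so that each command line fits within the limit.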

Before moving on to some other switches, I should mention that you may want to verify which files find will find before removing them. In my examples, I was just removing old files in one of my test directories. If you are concerned that find may find some files you don't want deleted, run your command like this first:

find . -name "*.old" -print

This will give you a list of all the matching files. If the list looks good, use the -delete switch to remove the files as in the example mentioned above.
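A variation on the same idea is to put echo in front of the command you intend to run, so that find prints each command instead of executing it. The scratch directory and file names below are made up for the sketch.

```shell
demo=$(mktemp -d)
touch "$demo/report.old" "$demo/report.txt"

# echo stands in for rm, so the command is printed rather than run.
preview=$(cd "$demo" && find . -name '*.old' -exec echo rm {} \;)
echo "$preview"

# The preview deleted nothing.
still_there=$(find "$demo" -name '*.old' -print | wc -l)
echo ".old files still present: $still_there"
rm -rf "$demo"
```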

Or, you can do the above in just one find command by using -ok like so:

find . -name "*.old" -ok rm {} \;

The -ok will prompt for verification before executing the command that follows it. You'll note that I do have to use the rm command; I can't use the -delete switch. And, as with using -exec, I have to use the {} \; syntax in order for -ok to work.

Let's take a look at some more switches. The -ls switch will give the inode number, number of blocks, permissions, number of hard links, owner, group, size in bytes, last modification time, and name for each file that is found. For example, the following command will show me the first ten directories that find encounters, starting from my home directory; you'll note that I specified that I only wanted to see directories by using the -type d switch.

cd
find . -type d -ls | head
 976142  8 drwxr-xr-x  39 genisis  wheel  4096 Mar  3 17:52 .
1413099  2 drwxr-xr-x   2 genisis  wheel   512 Mar  3 13:38 ./pdfs
 373539  2 drwxr-xr-x   2 genisis  wheel   512 Feb  6 12:38 ./tst
1087249  2 drwxr-xr-x   2 genisis  wheel   512 Oct  4 07:29 ./perlscripts
 650764  2 drwx------   2 genisis  wheel   512 Mar  3 17:52 ./mail
 706616  2 drwx------   4 genisis  wheel   512 Sep 22 14:29 ./.kde
 706635  2 drwx------  11 genisis  wheel   512 Nov  7 12:36 ./.kde/share
 706636  4 drwx------   3 genisis  wheel  1536 Mar  2 18:38 ./.kde/share/config
 785986  2 drwx------   2 genisis  wheel   512 Sep 22 14:36 ./.kde/share/config/colors
 706682  2 drwx------   3 genisis  wheel   512 Mar  2 18:36 ./.kde/share/fonts

Let's get a little fancier with the -ls switch. Earlier in the article, we piped some output to wc -l to see how many files matched a certain pattern. Let's be a bit more particular and see how many subdirectories are in my home directory:

find . -type d -print | wc -l
   256

Actually, there are only 255 subdirectories, as one of them is my current directory. Now, let's get a better idea of how this directory structure is laid out using this command:

find . -type d -ls | awk '{print $4 - 2, $NF}' | sort -rn | head
37 .
26 ./.kde/share/apps/kio_http/cache
18 ./.kde/share/apps
15 ./.gimp-1.2
9 ./tmp/tst
9 ./.kde/share
8 ./tmp/tst/h
8 ./tmp/tst/g
8 ./tmp/tst/f
8 ./tmp/tst/e

Wow, that's pretty cool. It looks like there are 37 subdirectories directly beneath my home directory (.), 26 subdirectories in the .kde/share/apps/kio_http/cache subdirectory, and so on.

Now, let's see how this find command worked. I started by using the -ls switch, which gave a fair bit of information regarding each directory as it was found. This information was piped to the awk utility, which is used to extract data from particular fields. You'll note that the original -ls output was laid out in fields: inode number, number of blocks, permissions, number of links, etc. I told awk to take the value in column 4 (the number of links, which is $4 to awk) and subtract 2 from it. On a traditional Unix filesystem, every directory has at least two links, its entry in its parent directory and its own . entry, and each subdirectory adds one more through its .. entry, so what remains is the number of subdirectories. I also wanted to know the name of each directory; since this was the very last column, I used $NF to represent that field. By placing these instructions within the curly braces {}, I told awk to do this to every line that it received from the find command.

I then piped the results from awk to the sort command; by using -rn, I told sort to sort the numerical output from largest to smallest so I could see which directories had the most subdirectories. I didn't want to bore you with all the output, so I also piped the final results to the head command so it would only display the first ten.
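To try the awk and sort stages on their own, you can feed them a few lines of saved -ls output. The sketch below reuses three lines from the listing shown earlier in this article; the variable names are mine. Keep in mind that $NF only captures the whole name when the path contains no spaces.

```shell
# Three lines copied from the -ls listing earlier in the article.
sample='976142  8 drwxr-xr-x  39 genisis  wheel    4096 Mar  3 17:52 .
1413099  2 drwxr-xr-x  2 genisis  wheel   512 Mar  3 13:38 ./pdfs
706635  2 drwx------ 11 genisis  wheel  512 Nov  7 12:36 ./.kde/share'

# Field 4 is the link count; subtracting 2 leaves the subdirectory count.
ranked=$(printf '%s\n' "$sample" | awk '{print $4 - 2, $NF}' | sort -rn)
echo "$ranked"
```

The home directory comes out on top with 37, followed by ./.kde/share with 9 and ./pdfs with 0.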

Another find switch is the -perm switch. An example is to search for any files that have their permissions set to 777, that is, set to read, write, and execute for everyone. This can be easily done with this command:

find . -perm 777 -print

The above command searches for files with the exact permissions of 777. If you are concerned with only a certain bit, rather than all the permission bits, you can do something like this:

find . -perm -4000 -print

This example will only yield files that have the SUID bit set. (Read more about permissions in a previous article.) Another handy find command is this one:

find . -perm -0002 -print

It will find all files that are writable by others. Note that you can use -0002, -002, -02, or -2 and receive the same result as leading 0s are assumed.
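The difference between an exact mode and a leading - is easy to check in a scratch directory. In this sketch the file names open, shared, and private are made up, and -type f is added only to keep the scratch directory itself out of the counts.

```shell
demo=$(mktemp -d)
touch "$demo/open" "$demo/shared" "$demo/private"
chmod 777 "$demo/open"
chmod 646 "$demo/shared"
chmod 600 "$demo/private"

# Exact match: only the file whose mode is precisely 777.
exact=$(find "$demo" -type f -perm 777 -print | wc -l)

# Leading -: any file with the other-write bit set, whatever else is set.
world_writable=$(find "$demo" -type f -perm -0002 -print | wc -l)

echo "exactly 777: $exact, writable by others: $world_writable"
rm -rf "$demo"
```

The exact test matches one file, while the -0002 test matches both the 777 file and the 646 file.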

The last two switches I want to cover in today's article are useful when backing up or replicating directory structures. Let's start with -depth. Let's say I want to back up my entire home directory to a mounted directory named /backup. I can do this:

find . -depth -print | cpio -dump /backup

This command may also work without the -depth switch, but not always. By default, find lists the files it finds by starting at the point mentioned in the find command, in my case . or my home directory. That is, it lists first the directory, and then the contents of that directory. If it encounters a directory that has read-only permissions, find will still provide a list of the contents of that directory to the cpio command, but the cpio command won't have permission to replicate the files in that subdirectory. It is interesting to note that cpio will still be able to create the directory, but as it does, it will set the permissions to read only, so it won't be able to create any files below that directory.

However, if you remember to use -depth, find will list the contents of each directory before it lists the directory itself. This means that the files will already have been replicated by cpio before it sets the read-only permissions on the parent directory.
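The change in ordering is easy to see with a tiny tree. The directory sub and the file inside it are made up for this sketch; with only one entry per directory, the order of the output does not depend on the filesystem.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/sub/file"

# Default order: each directory is listed before its contents.
default_order=$(cd "$demo" && find . -print)

# With -depth: contents are listed first, the directory itself last.
depth_order=$(cd "$demo" && find . -depth -print)

echo "default:"; echo "$default_order"
echo "with -depth:"; echo "$depth_order"
rm -rf "$demo"
```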

What if I don't want to replicate my entire home directory, just portions of it? This is where the -prune switch comes into play. Let's say I want to back up everything in my home directory except my tmp directory. I could do this:

find . -type d -name tmp -prune -o -print | cpio -dump /backup

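You'll note that the syntax seems a little bit backwards. I used the -name switch to match any directories (-type d) named tmp, and -prune tells find not to descend into them. I then used the logical or -o so that everything else will be printed and piped to cpio. Be aware that -prune has no effect when -depth is also given, so the two switches can't be combined in one command.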
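Before piping a pruned listing to cpio, it is worth checking what find actually prints. Here is a small sketch using a made-up tree with one directory to keep and one tmp directory to skip:

```shell
demo=$(mktemp -d)
mkdir "$demo/keep" "$demo/tmp"
touch "$demo/keep/a" "$demo/tmp/b"

# tmp is pruned, so neither it nor its contents reach -print.
listed=$(cd "$demo" && find . -type d -name tmp -prune -o -print | sort)
echo "$listed"
rm -rf "$demo"
```

Only ., ./keep, and ./keep/a are listed; ./tmp and ./tmp/b are left out.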

I hope the examples provided in the last two articles have helped you to become more comfortable with the find command and its syntax. In the next article, I'd like to continue with the cpio command and compare its usage to the tar command.

Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.



Copyright © 2009 O'Reilly Media, Inc.