ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


FreeBSD Basics

Read The Friendly Manpage! -- Part Two

10/11/2000

In last week's article, we were left hanging in the /usr/share/man directory. In today's article, we'll manipulate the files in this directory and learn something about formatted text, unformatted text, compressed data, and control characters. And, after traveling this circuitous route, we may even learn something interesting about manpages.

Let's do an ls of the directory where the manpages are stored:

cd /usr/share/man
ls -aF
./         cat4/     catn/      man4/     mann/
../        cat5/     ja/        man5/     whatis
cat1/      cat6/     man1/      man6/
cat1aout/  cat7/     man1aout/  man7/
cat2/      cat8/     man2/      man8/
cat3/      cat9/     man3/      man9/

Remember that there are subdirectories for each of the 9 sections of the manual. The subdirectories that begin with man contain unformatted data; the subdirectories that begin with cat contain pre-formatted data. In just a moment, we'll do an exercise that will show the difference between formatted and unformatted data.

But first, do an ls of the man1 and cat1 directories; I've snipped my output (indicated by <snip>) to just show the first 10 lines of each. You can ls more than one directory at a time; use -C to keep the multi-column output:

ls -C man1 cat1 |more
man1:
./              indent.1.gz       pkg_version.1.gz
../             indxbib.1.gz      pl2pm.1.gz
CC.1.gz         info.1.gz         pod2html.1.gz
Mail.1.gz       install-info.1.gz pod2man.1.gz
[.1.gz          install.1.gz      popd.1.gz
a2p.1.gz        intro.1.gz        pr.1.gz
addftinfo.1.gz  introduction.1.gz printenv.1.gz
addr2line.1.gz  ipcrm.1.gz        printf.1.gz
alias.1.gz      ipcs.1.gz         ps.1.gz
alloc.1.gz      ipftest.1.gz      psbb.1.gz
<snip>

cat1:
./              indent.1.gz       pkg_version.1.gz
../             indxbib.1.gz      pl2pm.1.gz
CC.1.gz         info.1.gz         pod2html.1.gz
Mail.1.gz       install-info.1.gz pod2man.1.gz
[.1.gz          install.1.gz      popd.1.gz
a2p.1.gz        intro.1.gz        pr.1.gz
addftinfo.1.gz  introduction.1.gz printenv.1.gz
addr2line.1.gz  ipcrm.1.gz        printf.1.gz
alias.1.gz      ipcs.1.gz         ps.1.gz
alloc.1.gz      ipftest.1.gz      psbb.1.gz
<snip>

Notice that every file has a .gz extension. This means that all of the manpages have been compressed to conserve disk space. This is a good thing, as the online manual is huge. The utility used to compress the files is called gzip.

When I first discovered gzip, I thought, "What a great way to conserve disk space"; at the time I had a 602 MB drive and disk space was an issue. I merrily became the superuser (mistake number one), went to the root directory (mistake number two), and told gzip to compress every file on my FreeBSD system while giving me stats on how much space I had saved by issuing the command gzip -rv (trust me, you don't want to try that one). After I had finished rendering that installation of FreeBSD useless, I learned a valuable lesson: Keep the files that came compressed with FreeBSD compressed, and keep the files that came uncompressed with FreeBSD uncompressed.

However, feel free to compress any files you have created in your home directory; compression can also be very useful when you want to e-mail a file to someone. Let's say I want to e-mail a friend a PDF file; PDF files are notoriously large, and my poor friend is still using his old 14.4 kbps modem. I can save him some time downloading that e-mail attachment if I do this first:

cd ~/pdf_files
ls -l framerel.pdf
-rwxr-xr-x 1 genisis  wheel 31840 Sep 26 16:01 framerel.pdf

gzip -v framerel.pdf
framerel.pdf:  32.9% -- replaced with framerel.pdf.gz

ls -l framerel*
-rwxr-xr-x 1 genisis wheel 21392 Sep 26 16:01 framerel.pdf.gz

Notice that the gzip utility was able to compress this file by about a third of its original size; it also replaced the original file with its compressed counterpart and added a .gz extension to the original name.

When my friend receives this file, he won't be able to do anything with it until he uncompresses it like so:

gunzip framerel.pdf

Note that my friend didn't have to specify the .gz extension: gunzip assumes the file it is unzipping will have a .gz extension, and will complain if it doesn't.
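If you would rather keep the original file around while you experiment, here is a minimal sketch of the same round trip. It assumes a scratch directory and a made-up file called sample.txt (neither comes from the example above), and it uses the -c switch so that gzip writes to standard output instead of replacing the file:

```shell
# Work in a throwaway directory; sample.txt is a hypothetical stand-in
# for one of your own files.
workdir=$(mktemp -d)

# Build a repetitive text file so there is something worth compressing.
yes 'frame relay notes' | head -n 2000 > "$workdir/sample.txt"

# -c sends the compressed data to standard output, so sample.txt stays put.
gzip -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"

# -l lists the compressed size, the uncompressed size, and the ratio.
gzip -l "$workdir/sample.txt.gz"

# Uncompress to standard output and compare byte for byte with the original.
gunzip -c "$workdir/sample.txt.gz" | cmp - "$workdir/sample.txt" && echo "round trip OK"
```

The trade-off is that you now have two copies on disk, so remember to delete whichever one you don't need.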

This is also a good time to introduce the file utility; if anyone e-mails you an attachment or you happen to find a file on your FreeBSD system and don't know what type of data it contains, don't just send it to your screen using the cat or more commands. If it is not a text file, it may do nasty things to your screen. The file command will tell you what type of data is contained within the file like so:

file framerel.pdf.gz
framerel.pdf.gz: gzip compressed data, deflated, original filename, last modified: Tue Sep 26 16:01:34 2000, os: Unix
gunzip frame*
file framerel.pdf
framerel.pdf: PDF document, version 1.2

This is very useful information as I now know that the contents of this file will appear as random garbage characters unless I use a reader specifically designed to read PDF files. I was nice when I named this file with a .pdf extension; I could have just as easily named it something less descriptive like open_me.

Let's compare these outputs to an executable file, say the ls command:

whereis -b ls
ls: /bin/ls

file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), statically linked, stripped

And finally, let's compare it to a file I created using an editor and saved as myfile:

file myfile
myfile: ASCII text

Out of the four file commands, the last command was the only one that revealed ASCII text; therefore, myfile is the only file that is safe to send to the more or cat commands or to a text editor.
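If you want to make a habit of checking first, the idea can be wrapped in a small shell function. This is only a sketch: safe_view is a name I made up, and it assumes your version of file understands the --mime-type switch (reasonably recent versions do; older ones may not).

```shell
# safe_view is a hypothetical helper, not a standard command.
# It asks file(1) for the MIME type (-b leaves out the filename) and
# only sends the file to a pager when that type starts with "text/".
safe_view() {
    kind=$(file -b --mime-type "$1") || return 1
    case "$kind" in
        text/*)
            ${PAGER:-more} "$1"
            ;;
        *)
            echo "$1 looks like $kind; not sending it to the screen" >&2
            return 1
            ;;
    esac
}
```

Typing safe_view myfile would page the text file, while safe_view framerel.pdf.gz would print a warning and leave your screen alone.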

Now, let's go back to the /usr/share/man directory to look at the difference between the unformatted manpages contained in the man subdirectories and the formatted manpages contained in the cat subdirectories. Since all of the manpages have been gzipped, you could first uncompress the data using the gunzip utility, then use cat or more to view the data, then finally remember to re-compress the file so you can continue to conserve disk space. Fortunately, the zcat utility saves you all three of these steps: it writes the uncompressed data to your screen and leaves the compressed file on disk untouched.
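You can check what zcat does to the file on disk without going anywhere near the system manpages. The sketch below assumes a scratch directory and a made-up file called notes; it takes a checksum of the compressed file before and after running zcat:

```shell
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/notes"
gzip "$workdir/notes"                  # replaces notes with notes.gz

before=$(cksum < "$workdir/notes.gz")
zcat "$workdir/notes.gz"               # uncompressed text goes to standard output
after=$(cksum < "$workdir/notes.gz")

# The compressed file is unchanged and no uncompressed copy was left behind.
[ "$before" = "$after" ] && [ ! -e "$workdir/notes" ] && echo "notes.gz untouched"
```

On systems that use gzip's zcat, zcat behaves the same as gunzip -c; only the output stream is uncompressed, never the file.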

Let's zcat the whatis manpage, as it is a nice short manpage that fits on one screen. We'll start with the unformatted version:

zcat man1/whatis.1

View the output of this command here.

Notice that this doesn't look anything like the whatis manpage as you are used to seeing it. (If you forget what the whatis manpage looks like, do a man whatis.) Instead, this file contains remarks and formatting commands along with the actual data. Let's compare this to the pre-formatted version contained within the cat subdirectory:

zcat cat1/whatis.1

View the output of this command here.

Notice that the pre-formatted version looks like the manpage you are used to seeing, minus the highlighting. However, something very interesting happens if we try to save a formatted manpage into a file. Let's send the output of zcatting the formatted whatis manpage to a test file in our home directory:

zcat cat1/whatis.1 > ~/test

Now, let's view the test file using the cat utility:

cat ~/test

Your test file should look exactly like the zcatted whatis.1 file. Now, send the test file to the more paging utility:

more ~/test

Your results should look exactly like a manpage, with highlighting included. Finally, open up the test file using your favorite text editor; I'll use Pico, but you can use any editor.

pico ~/test

View the output of this command here.

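Yuck, what a mess. It's funny how a lot of ^H characters can make a file so unreadable. However, if you look very carefully, and mentally try to remove the ^H's, you should be able to recognize the text in your file. In case you're wondering, ^H is the backspace control character. A formatted manpage highlights text by overstriking: a bold letter is stored as the letter, a backspace, and the same letter again, while an underlined letter is stored as an underscore, a backspace, and the letter.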

We've just discovered an interesting difference in functionality between the cat utility, the more utility, and an editor. By default, cat passes control characters through untouched and leaves it to your terminal to act on them (each backspace moves the cursor back one position, so the next character simply overprints the previous one), more interprets the backspace sequences as highlighting, and text editors display the control characters themselves. Thus we have an unhighlighted but readable file with cat, a highlighted file with more, and a mess with a text editor.

You can force cat to display control characters instead of passing them through by using a switch. Try this:

cat -v ~/test

Your output should display all of the ^H characters, just like the text editor did.
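To see where those ^H characters come from, you can build a tiny overstruck file by hand. In this sketch the temporary file and the words in it are my own invention; printf's \b is the backspace character, which cat -v displays as ^H:

```shell
sample=$(mktemp)

# "Bold" is written as each letter, a backspace, and the letter again;
# "Use" is written as an underscore, a backspace, and then each letter.
printf 'B\bBo\bol\bld\bd _\bU_\bs_\be\n' > "$sample"

cat -v "$sample"
# prints: B^HBo^Hol^Hld^Hd _^HU_^Hs_^He
```

Send the same file to plain cat and you should just see the two words, since the terminal overprints each character in place.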

Understanding this behavior will come in handy if you ever want to send a manpage to a file: Perhaps you want to transfer some manpages to your non-Unix laptop or want to include some interesting snippets of a manpage when replying to an e-mail. If you just redirect the output of the man command to a file like this:

man whatis > ~/test

your resulting file will contain all of those irritating ^H characters. However, if you pipe the output through the col command before sending it to your file, you will lose the ^H characters. Try it:

man whatis | col -b > ~/test

then send ~/test to your favorite text editor to see the difference.

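If you read the manpage for the col command, you'll discover why this works: the -b switch tells col not to output any backspaces and to print only the last character written to each column position. The col command also discards all of the control characters it doesn't recognize. And, fortunately for us, col doesn't recognize that many control characters.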
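Here is a small sketch of col at work on a hand-made overstruck line, so you don't need a whole manpage to try it. The sample text is made up; I've added the -x switch so that col outputs spaces rather than converting them to tabs:

```shell
# Each letter of "Bold" is overstruck with itself and each letter of "Use"
# is an underscore, a backspace, and the letter. col -b keeps only the last
# character written to each column position.
plain=$(printf 'B\bBo\bol\bld\bd _\bU_\bs_\be\n' | col -bx)

printf '%s\n' "$plain"
# prints: Bold Use
```

Piping the result through cat -v is a quick way to convince yourself that no ^H characters survived.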

This trick is also very handy if you ever transfer an ASCII file from an MS-DOS-based operating system to your FreeBSD system. If you've ever done this before, you've discovered a ^M at the end of every line. MS-DOS-based operating systems end each line with a carriage return followed by a linefeed, and ^M is how that carriage return is displayed. You could use your arrow keys to navigate to each of these characters so you can press the delete key, but it is much easier to do this:

col -b < dosfile > unixfile

This command tells col to strip the control characters from a file called dosfile and then send the results to a new file called unixfile.
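If you only want to get rid of the carriage returns and leave everything else in the file alone, the tr utility is a common alternative to col. This is a different tool from the one used above, shown here as a sketch with a made-up two-line dosfile in a scratch directory:

```shell
workdir=$(mktemp -d)

# Create a small file with DOS-style line endings (carriage return + linefeed).
printf 'first line\r\nsecond line\r\n' > "$workdir/dosfile"
cat -v "$workdir/dosfile"      # each line ends in ^M

# -d deletes every occurrence of the listed character, here the carriage return.
tr -d '\r' < "$workdir/dosfile" > "$workdir/unixfile"
cat -v "$workdir/unixfile"     # the ^M characters are gone
```

Unlike col, tr doesn't touch tabs, spaces, or any other control characters, which may or may not be what you want.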

Or, if it's too late and you've already opened up the file in vi, try this:

:%! col -bx

This will remove all of those pesky ^M or ^H characters without having to leave vi. We'll save the explanation of how that works for a later article dealing with the vi editor.

Now we've finally reached the part of the article where we can tie together all of this stuff to better understand how the man command works. When you type:

man name_of_manpage

the man utility searches the /usr/share/man subdirectories, in order, for the first reference of the manpage you wish to view. You can alter this default behavior like so:

man -a name_of_manpage

which will force man to read all of the subdirectories; this switch is useful if you think a manpage is in more than one section in the manual and you wish to view them all.

If man doesn't find the manpage here, it will then look in /usr/local/man. If you do a listing of this directory and its subdirectories, you will find the manpages for the programs you installed yourself: i.e., any ports or packages that you built.
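The "first match wins" search can be mimicked with a few lines of shell. This is an illustration only, not how man is actually implemented: find_manpage is a name I made up, and searching the sections in plain numeric order is an assumption on my part, as the real order is configurable.

```shell
# find_manpage NAME DIR...  (hypothetical helper, for illustration only)
# Looks through each manual directory in the order given and prints the
# first compressed, unformatted page it finds for NAME.
find_manpage() {
    name=$1
    shift
    for root in "$@"; do
        for sec in 1 2 3 4 5 6 7 8 9; do
            page="$root/man$sec/$name.$sec.gz"
            if [ -f "$page" ]; then
                echo "$page"
                return 0
            fi
        done
    done
    echo "no manual entry for $name" >&2
    return 1
}

# Example, using the two directories discussed above:
#   find_manpage whatis /usr/share/man /usr/local/man
```

Removing the return 0 line would make the function keep going and list every match, which is roughly the effect of the -a switch.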

Once man has found the manpage, the formatted copy is sent to a pager so it can be displayed on your screen one page at a time. If you are using FreeBSD 4.0 or earlier, the default pager is the more utility. If you are using FreeBSD 4.1 or later, the default pager is the less utility. In true Unix style, the less utility actually offers more functionality than the more utility. Regardless of which pager your system uses, the pager will correctly interpret the ^H characters to expose the highlighted text. If you prefer to read your manpages without that glaring white text, you can start your manpage like so:

man whatis | col -b | more

You can substitute the word more with less if you prefer the less paging utility.

We've covered a lot of ground in the last couple of articles. In the next few articles I want to discuss some of the neat utilities that can be built using the ports collection.

Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.



 

Copyright © 2009 O'Reilly Media, Inc.