ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


HTTP Wrangler

An Amble Through Apache Configuration

by Rael Dornfest
03/02/2000

The next attraction on our Apache tour is an amble through httpd.conf, the web server's main configuration file.

At first glance, the httpd.conf file can seem intimidating. Heck, the third introductory paragraph reads, "You have been warned." But despite this rather ominous beginning, Apache is surprisingly simple to configure thanks to its well-thought-out default settings.

In this article, I'll cover a selection of Apache configuration directives, the settings that define how Apache should actually run: where files are located on your server, how much of the machine's resources Apache may use, which content visitors are allowed to see, and how many concurrent visitors the server can handle. I won't describe every directive, as some are self-evident, others are perfectly fine left at the default, and still others are so involved they warrant their own columns.

Defaults and Why They're Default

Apache's default settings came about through a collaborative effort between the core development team and Apache end-users -- people like you. As a result, a newcomer can relatively easily download Apache, unpack it from its virtual box, plug it in, and get it running in only a few minutes.

Don't, however, mistake simplicity for lack of power. As we amble through httpd.conf, you'll notice scores of directives we'll pass up without mention. Heed the warning at the beginning of the file to read the documentation carefully before trampling through some of the more complicated settings. Beyond the defaults, the httpd.conf file reflects the preferences of the user and localized security issues.

Visit http://www.apache.org/docs for detailed documentation on all Apache runtime configuration directives.

Access.conf, Srm.conf, and Why You Don't Care

Take a gander at the contents of your Apache installation's configuration directory, called either conf or etc depending upon your layout preferences and installation method. Mine (a source install using the default prefix and layout) is /usr/local/apache/conf. The binary installs I've seen tend to be the same. The Apache 1.3.9 RPM installation under Red Hat 6.1 creates a conf directory at /etc/httpd/conf.


% cd /usr/local/apache/conf
% ls
access.conf  httpd.conf  magic  mime.types  srm.conf

You'll notice a few other files aside from our friend, httpd.conf. For simplicity's sake, the two other .conf files, access.conf and srm.conf, have been deprecated in favor of consolidating all configuration directives inside httpd.conf. While, in some instances, it may make sense to keep a few directives in these two files (or another arbitrary file), it's not a particularly standard practice anymore. For further information, take a look at the AccessConfig and ResourceConfig directives.
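
As a minimal sketch of that consolidation, assuming Apache 1.3 on a Unix system, the two directives can be pointed at /dev/null so that the server reads httpd.conf alone. Check the comments in your own httpd.conf before relying on this; some distributions already ship with these lines in place.

```
# httpd.conf (Apache 1.3, Unix): keep every directive in this one file.
# Pointing both directives at /dev/null tells Apache not to read
# srm.conf or access.conf at all.
ResourceConfig /dev/null
AccessConfig /dev/null
```

To go the other way and keep some directives in a separate file, each directive takes a file name instead, such as conf/srm.conf, interpreted relative to the ServerRoot.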

As for mime.types and magic, if enough readers are interested, I'll discuss MIME-types and the mod_magic module in later columns.

Diving In

Alright, let's dive in. Open your httpd.conf file in the text editor of your choice.

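httpd.conf is organized in the following manner:

1. Global Environment -- directives that control the operation of the Apache server process as a whole
2. 'Main' Server Configuration -- directives that define the main or default server, which answers any request not handled by a virtual host
3. Virtual Hosts -- settings that let a single Apache server process answer requests for different IP addresses or hostnames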

I don't cover Virtual Hosts in this column, but if you're dying to give this feature a whirl, visit the Resources section at the end of this article.

Section 1. Global Environment

KeepAlive
The HTTP protocol is "stateless," meaning that each request/response pair between web browser and server is independent. If, for example, you visit a web page that contains three embedded images, your browser actually makes four separate connections to that web server -- one for the page itself, and one for each of the images in turn.

KeepAlive, an extension to HTTP, provides a persistent connection between browser and server so that the same connection can handle multiple request/response pairs. The result is lower latency, since the time otherwise spent establishing a new connection for each request is saved.

I'll leave the details of this set of directives to the more-than-ample httpd.conf comments.
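
For orientation, the set of directives in question looks roughly like the sketch below. The values shown are the ones commonly found in a stock Apache 1.3 httpd.conf; treat them as an assumption and compare them against your own copy.

```
# Allow more than one request per connection (persistent connections)
KeepAlive On

# Maximum number of requests served over one persistent connection;
# 0 means unlimited
MaxKeepAliveRequests 100

# Seconds to wait for the next request on the same connection
# before closing it
KeepAliveTimeout 15
```

A shorter KeepAliveTimeout frees idle child processes sooner, at the cost of forcing slow clients to reconnect more often.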

Server-Pool Regulation
Apache under Unix is multi-process, meaning that each request is handled by a separate copy, or child process of the httpd program. Win32 Apache is multi-threaded -- the server handles each request internally rather than generating another instance of the program. If this sounds like bad elevator music to you, don't worry about it -- unless you're running under Win32, in which case I direct you to Apache.org's "Using Apache With Microsoft Windows" documentation.

Server-pool regulation balances the overhead required to spawn child processes with the memory and processor resources associated with running multiple copies of the httpd program. While the defaults are a reasonable place to start, it's only by watching a) the number of httpd processes running at any one point, and b) your server's memory and CPU usage when you're receiving the largest number of concurrent hits that you can start tuning Apache to your particular circumstances.

For example, say you're running an old 486 with little memory as your experimental web server and you want the bare minimum of resources devoted to Apache. You might set MinSpareServers, MaxSpareServers, and StartServers to a very low number. Someone running an overloaded server which handles mail, news, and web traffic might want to limit MaxClients. That way, when their site is rumored to be the last place to purchase the toy-to-have du jour, the sudden flurry of web hits won't disrupt mail services.
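
A sketch of the low-resource case might look like the following. The numbers are hypothetical starting points chosen for illustration, not recommendations; the right values depend on your machine and traffic.

```
# Hypothetical values for a small, memory-starved test machine

# Child processes created at startup
StartServers 2

# Keep at least this many idle children waiting for requests
MinSpareServers 1

# Kill off idle children beyond this number
MaxSpareServers 3

# Upper limit on simultaneous child processes, and so on simultaneous
# requests; lowering it protects other services on the same machine
MaxClients 20
```

Lowering MaxClients is the setting that matters for the overloaded mail, news, and web server: requests beyond the limit wait in a queue instead of spawning more httpd processes.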

It boils down to this: Watch, tune, watch, tune. Incidentally, if you have a particularly nice configuration you're willing to share, please post a message on the O'Reilly Network Apache Forum.

Section 2. 'Main' Server Configuration

We now wander into Section 2 to take a peek at some of the basic directives that define the server's main web site.

Port
Think of a port as a television channel. Just as you expect to find "The B-Movie Channel" consistently on channel 123, web browsers expect to find web servers at port 80. (This analogy is an oversimplification of how ports work, but it'll do for the purposes of our discussion.) This doesn't mean that your web server has to run on port 80 -- that's necessary only if you want your web site to be found at the port browsers try by default.

Suppose you wish to hide a test or experimental server from the outside world. You have a DSL line and have configured your router to only allow requests directed at port 80, served by your public web server. Configure your experimental server to listen on a different port number -- 8000 for instance. You can choose any port number as long as the port is not reserved for use by another service, and the number is greater than 1023 if you're not running as root.

To visit a web site hosted on a port other than 80, your visitors must include the port number in the URL they type into their browsers. For example, to visit the web site at port 8000 on your local machine, use the following URL: http://localhost:8000
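
In httpd.conf, the experimental server described above comes down to a single line. This is a minimal sketch using the port number from the example; any unreserved port above 1023 would do equally well.

```
# Listen on port 8000 instead of the standard port 80.
# Ports above 1023 don't require starting Apache as root.
Port 8000
```

After changing the port, restart Apache and request http://localhost:8000 to confirm the server answers there.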

ServerAdmin
(default: you@your.address)
I'm always surprised by the occasional "500 Internal Server Error" message I encounter directing me to send e-mail to the server administrator at you@your.address. Be sure to set the ServerAdmin directive so that your visitors don't have to resort to guesswork to report problems they have with your site.
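
A minimal sketch of the fix, assuming a hypothetical address at example.com; substitute a mailbox that someone actually reads.

```
# Address shown to visitors in server-generated error pages.
# webmaster@example.com is a placeholder, not a real contact.
ServerAdmin webmaster@example.com
```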

DocumentRoot
(default: usually {ServerRoot}/htdocs)
As the name suggests, this is, in the simplest case, the location where the static content of your web site lives -- HTML files, images, or sounds -- content that doesn't change on a request-by-request basis. This is where you would store the files you wish to make available for public viewing. Most folks use the directory as the root of an organized hierarchy of directories. Here's an example:
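
One possible layout is sketched below, assuming the default source-install prefix mentioned earlier. The subdirectory and file names are hypothetical, chosen only to illustrate the idea of a hierarchy under the document root.

```
# Static content lives under this directory
DocumentRoot "/usr/local/apache/htdocs"

# A hypothetical hierarchy beneath it and the URLs that result:
#
#   htdocs/index.html          ->  http://localhost/
#   htdocs/images/logo.gif     ->  http://localhost/images/logo.gif
#   htdocs/sounds/welcome.wav  ->  http://localhost/sounds/welcome.wav
#   htdocs/docs/faq.html       ->  http://localhost/docs/faq.html
```

The path after the hostname in each URL mirrors the path beneath the document root, which is why a tidy directory structure tends to produce tidy URLs.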

On occasion, non-static content such as Server Side Includes (SSI), embedded PHP scripting code, and CGI scripts (to name a few examples) resides right along with static content in the document root directory. Be sure, however, to think this strategy through -- you must understand how these dynamic content generators affect your site's performance and security. For more information on some of the dynamic content generators I mention here, visit the Resources section at the end of this article.

ScriptAlias
(default: usually {ServerRoot}/cgi-bin/)
CGI scripts, historically the most common dynamic content generators, usually reside outside of the main document tree in the location specified by the ScriptAlias directive. The ScriptAlias directive indicates that anything in the specified directory should be run as a program rather than simply sent to the browser as a file.
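
As a sketch, assuming the default source-install prefix used earlier in this article, the directive maps a URL prefix to a directory on disk. The script name in the comment is hypothetical.

```
# Anything requested under /cgi-bin/ is executed as a program found in
# the named directory rather than sent to the browser as a file.
# Keep the trailing slash on both the URL prefix and the directory
# so the two stay consistent.
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"

# Example: http://localhost/cgi-bin/hello.cgi would run
#          /usr/local/apache/cgi-bin/hello.cgi
```

Because the directory sits outside the document root, visitors can't download the scripts' source by requesting them through an ordinary document URL.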

DirectoryIndex
(default: index.html)
You've probably noticed when visiting a web site that the URL you enter to get there often doesn't contain the name of the document itself; the same holds true when clicking on many links within a site. The URL looks something like http://www.oreillynet.com rather than http://www.oreillynet.com/documentname.html. Behind the scenes, the web server is looking in the document root directory for the file specified in the DirectoryIndex directive to display by default. In other words, while you enter only http://www.oreillynet.com into your browser, you're probably actually viewing a document called index.html within the O'Reilly Network web site's document root.

By default, httpd.conf specifies this document as index.html; this, as with almost everything else in Apache, is configurable. If you're used to the Windows three-character file suffix limit or are in an environment where some folks will be editing documents for online publication under Windows and uploading these to your server, you might use index.htm as your default. Some servers assume a default.html or default.htm document. Thankfully, you may specify a space-delimited list of one or more directory indexes in order of preference from left to right:


DirectoryIndex index.html index.htm default.html default.htm

If you're using a dynamic content generator within your document hierarchy, you can just as easily designate a file with its extension as the default:

DirectoryIndex index.php index.cgi index.html default.html

Amble Over

And thus ends our brief amble through the Apache server configuration file. I hope you've enjoyed the tour and now have a bit more of a handle on just what Apache configuration is all about (and how wonderfully configurable it is). I may have glossed over a few topics that interest you, so I'll end by suggesting several excellent tutorials and detailed documentation already available on the Web. (Why reinvent the wheel when someone else has already gone to the trouble of constructing perfectly round ones?)

An Important Note on Security and Performance

This article should in no way be considered a comprehensive tutorial. When it comes to security and performance, almost everything's situation-specific. Be careful: educate yourself by reading the documentation, ask an Apache-savvy friend for help, consult your system administrator, and join in on (or just lurk in the corner of) newsgroup and mailing list discussions.


Resources

The following is a list of starting points from which to explore some of the topics covered (or not) in this article.

Tune in Next Time ...

Fun with Logs!

As always, if you'd like me to cover anything in particular in this column, feel free to post your suggestions to the O'Reilly Network Apache Forum.

Copyright © 2009 O'Reilly Media, Inc.