ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


FreeBSD Basics: HTTP Proxies

by Dru Lavigne
07/03/2003

In my previous article, I introduced some of the benefits to be gained by using a proxy. In today's article, I'd like to concentrate on HTTP proxies. We'll take a look at some of the HTTP proxies available in the ports collection and which proxies are suited for which needs.

If you have any familiarity with HTTP proxies, your first thought is probably Squid, the excellent HTTP proxy. Since there are already many fine articles and tutorials on using and configuring Squid, I won't cover that product in this series. For those who are disappointed, I'll leave you with a few URLs:

Squid is an example of a very configurable HTTP proxy that can scale into very large networks. This is great if you are an administrator of a very large network, but overkill if you simply want to surf safely from your FreeBSD box or enforce a policy on a small home network. Thinking as a user, what are some of the irritants that go along with web browsing? The following quickly come to mind:

Depending upon the web browser you use, some of these irritants can be dealt with directly. Others require you to install additional proxy software. Let's start by taking a look at some common browsers, then move on to complementary proxies.

Web Browser Features

As of this writing, these are the latest (non-forbidden) versions of three popular web browsers:

Keep in mind that new features are added with new versions, so features that are missing now may appear in later versions. Also, every web browser has a "Preferences" section, so if your browser isn't listed here, check it out to see what features are available.

For these browsers, the Preferences section is found under the Edit menu of Netscape and Mozilla, and under the File menu of Opera. You'll find that Netscape offers far fewer preferences than Mozilla or Opera, because this is an older version of Netscape.

All three browsers have an appropriately named setting that allows you to deal with cookies. Each also allows you to enable or disable Java and JavaScript. Finally, if you have a slow Internet connection and plenty of disk space, you may find a speed improvement by tweaking each browser's cache settings.

Dealing with popup windows is a newer feature, so it is not found in this version of Netscape. In Opera, click on General to find the setting to disallow popups. Mozilla takes this a step further by letting you disable popups either entirely or on a site-by-site basis. To disable popups altogether, go to Privacy & Security->Popup Windows and read the warning on the ramifications. Alternatively, as you encounter a site with an irritating popup, simply right-click the page and choose to "Reject popup windows from this site."

bfilter

Now, let's see what some of the applications in the ports collection can do to augment the features already provided by your favorite web browser. I'll start with bfilter. This HTTP proxy not only controls popup windows, it also stops those annoying flashing ads and promises to disable webbugs. To build this port, become the superuser and:

# cd /usr/ports/net/bfilter
# make install clean

The port will install an application to /usr/local/bin/bfilter and a configuration file to /usr/local/etc/bfilter/config. Once the build is finished, leave the superuser account and type bfilter in order to start the proxy. Then verify that the proxy is listening for requests:

$ sockstat -4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
dlavigne bfilter  20336    3 tcp4   127.0.0.1:8080        *:*

You'll note that bfilter listens on port 8080 on the loopback address. Because 127.0.0.1 is the loopback address, only programs running on the same machine can connect to the proxy. The comments in the configuration file describe the listening address; if you wish to listen on a particular interface instead, specify its IP address in the configuration file.
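If you'd rather check from a script than read sockstat output, a minimal sketch along these lines should do. It simply attempts a TCP connection; the function name is my own, and the address and port are the ones from the example above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8080 is where bfilter was listening in the example above.
    print(is_listening("127.0.0.1", 8080))
```

A successful connection only tells you that something is listening on that port; sockstat is still the way to confirm that it is actually bfilter.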

bfilter is not a transparent proxy, meaning you will have to configure your web browser to use the proxy. Go into the Preferences section of your browser and you should find a setting that deals with Proxies. Type in the IP address and port number used by bfilter. In my example, bfilter is running on the same machine as my web browser, so I use 127.0.0.1 as the IP address and 8080 as the port number. If you are running bfilter on a separate computer, change the IP address in its configuration file to reflect the IP address of the NIC attached to your internal network. Then set the browsers on the computers in your network to use that IP address in their Proxies section of Preferences.
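The same idea applies to any HTTP client, not just a browser: a non-transparent proxy has to be named explicitly. As a rough sketch, here is how a Python script could be pointed at the proxy using the standard urllib.request module. The helper name is hypothetical, and the address assumes bfilter is running locally on port 8080 as in my example.

```python
import urllib.request

# Assumption: the proxy is listening where bfilter was in the example above.
PROXY_URL = "http://127.0.0.1:8080"


def make_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Build an opener that sends plain-HTTP requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = make_proxy_opener()
    # With the proxy running, opener.open("http://...") would fetch a page
    # through it. Nothing is fetched here, so the sketch runs without a network.
    print("HTTP requests would go through", PROXY_URL)
```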

bfilter also has a rules file, found in /usr/local/etc/bfilter/rules. However, I found that the default rules worked flawlessly at catching popup windows and flashing ads. If you're looking for an easy-to-use proxy that works out of the box, bfilter is a very nice solution.

middleman

Another HTTP proxy I enjoy using is middleman. Like bfilter, it works as is, but what makes this proxy interesting are the additional features that provide an enticing way to learn more about HTTP and what is happening behind the scenes every time you visit a web site.

First, let's build the port:

# cd /usr/ports/www/middleman
# make install clean

Note that the name of the installed application will be /usr/local/bin/mman. You also need to know the name of the default configuration file in order to start the application. If you just type mman, you'll receive the help text. Instead, use the -c (config-file) switch to start the proxy:

# mman -c /usr/local/etc/mman.xml

I found that the proxy needs to be started as the superuser. Don't forget to check the port mman is listening on and set the Proxies section of your browser accordingly:

# sockstat -4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
root     mman       575    0 tcp4   127.0.0.1:8080        *:*

If you plan on using middleman, take the time to read /usr/local/share/doc/middleman/README.html. This is the only documentation on the product, but it is very thorough and full of interesting ideas on how to use a proxy.

Although the default configuration will probably suit your needs, you should check out the included web interface by typing mman into your browser. This will allow you to view:

A Bit About HTTP

If you've never managed an HTTP server or HTTP proxy before, you may be amazed at the amount of interaction that occurs whenever a web browser connects to a web server. I mentioned in the last article that we would be referring to the HTTP RFC (2616). Let's do a very quick rundown on how the HTTP protocol works; I'll leave it to you to refer to the RFC to fill in the details that interest you.

Whenever you browse a web site, your browser must make a separate request for every item on that page. For example, if I type slashdot.org into my browser, I'll see the following entries in my mman cache:

Note that every GIF or image is a separate request, as each is stored as a separate file on the web server. In order for my web browser to display the main page of Slashdot's site, it had to individually request each of the 11 .gifs, the one .ico, and the HTML page that explained how to format everything together.
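To get a feel for how quickly these requests add up, the sketch below counts the extra items a page refers to. It uses Python's standard html.parser module and only looks at img and link tags, which is a simplification of what a real browser does. The sample page is made up for illustration; it is not Slashdot's markup.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs that a browser would fetch in addition to the page itself."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])


def list_resources(html: str) -> list:
    """Return the image and link URLs referenced by an HTML document."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """
<html><head><link rel="shortcut icon" href="/favicon.ico"></head>
<body><img src="/greendot.gif"><img src="/pix.gif"></body></html>
"""

if __name__ == "__main__":
    found = list_resources(SAMPLE_PAGE)
    # One request for the HTML page plus one for each referenced item.
    print(len(found) + 1, "requests in total:", found)
```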

In HTTP, there are two types of packets: request packets and response packets. The request packet always comes from the web browser. This makes sense, as a web browser is a client and the job of a client is to make requests. Not surprisingly, the response packets always come from the web server.

A web browser's request packet has three components: a method, headers, and a body.

The method indicates what the client is requesting. The methods are all listed and explained in the RFC and typically are written in uppercase. The most common method is the GET method, as typically your web browser wants to "get" a particular page or image from the web server. If you take a look at your mman log, or for that matter, the log from any HTTP proxy or HTTP server, you'll see GET requests:

Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topiccomdex.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topiccomdex.gif
<snip>

Here, mman issued the GET request on behalf of my browser, then placed a copy of the requested item into its cache.
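If you want to pull just the requests out of a log like this, a short script is enough. The sketch below assumes each entry occupies a single line in the real log file and follows the layout of the sample above (day, date, time, process ID in brackets, then the event). I have only the lines shown here to go on, so treat the pattern as a starting point, not a description of mman's log format.

```python
import re

# Pattern inferred from the sample log lines; other entry types are ignored.
LOG_LINE = re.compile(
    r"^(?P<day>\w{3}) (?P<date>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] request: (?P<method>[A-Z]+) (?P<url>\S+)$"
)


def requested_urls(log_text: str) -> list:
    """Return (method, url) pairs for every 'request:' entry in the log text."""
    pairs = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            pairs.append((match.group("method"), match.group("url")))
    return pairs


SAMPLE_LOG = """\
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/pix.gif
"""

if __name__ == "__main__":
    for method, url in requested_urls(SAMPLE_LOG):
        print(method, url)
```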

A web server's response packet also has three components: a status, headers, and a body.

That is, the request packet sends a method, and the web server responds with a status message. Status messages are numerical, and again are listed in the RFC. You've probably run across a "404 error," as 404 is the status number representing "not found." The most common status is 200 or OK. If a web browser issues a GET request and the server finds the requested resource, it will send it back along with a status of 200. If it can't find the requested file, it will instead send a status of 404.
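You don't need to memorize the status numbers. As a small illustration, Python's standard http module carries a table of the registered codes and their reason phrases; the helper name below is my own.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a status code together with its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    for code in (200, 404):
        print(describe_status(code))
```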

You probably noticed that both request and response packets contain headers and a body. The body usually contains the requested page or image. So, when my web browser made a GET request for http://images.slashdot.org:80/greendot.gif, the web server found the GIF and sent a response packet with a status of 200 and the GIF itself in the body of that packet.
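To make the layout concrete, here is a rough sketch that composes a minimal GET request and splits a response into its status, headers, and body. It works on byte strings only and never touches the network. The sample response is invented for illustration, and a real server would send more headers than this.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Compose a minimal HTTP/1.1 GET request: request line, headers, no body."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, resource, protocol version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",
        "",                      # a blank line ends the headers
        "",                      # a GET request normally carries no body
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes):
    """Split a raw response into (status_code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


# Hypothetical response: a 200 status, two headers, and a six-byte body.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"GIF89a"
)

if __name__ == "__main__":
    print(build_get_request("images.slashdot.org", "/greendot.gif").decode("ascii"))
    print(split_response(SAMPLE_RESPONSE))
```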

Displaying Headers with mman

Headers are the interesting part of HTTP packets. They contain useful information that helps the web browser and web server to communicate effectively. They also contain sensitive information about both the web server and web browser. Here are the results of my clicking on Show Headers in mman's web interface:

Unfiltered
Type		Value
Host		mman
User-Agent	Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619
Accept		text/xml,application/xml,application/xhtml+xml,text/html;
	q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;
	q=0.2,*/*;q=0.1
Accept-Language	en-us,en;q=0.5
Accept-Encoding	gzip,deflate,compress;q=0.9
Accept-Charset	ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive	300
Proxy-Connection keep-alive
Referer		http://mman/headers

Filtered
Type		Value
Host		mman
Accept		text/xml,application/xml,application/xhtml+xml,text/html;
	q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;
	q=0.2,*/*;q=0.1
Accept-Language	en-us,en;q=0.5
Accept-Encoding	gzip,deflate,compress;q=0.9
Accept-Charset	ISO-8859-1,utf-8;q=0.7,*;q=0.7
Referer		http://mman/headers
User-Agent	Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)

Remember, every HTTP packet includes headers. Here you are seeing the values that are sent by my web browser. The Unfiltered section contains the defaults used by my web browser. It clearly shows my operating system and the version and type of web browser I am using. The Filtered section shows that mman changed some of those headers before sending them to the web server. If I don't like those new values, I can simply click on Config, select header, and edit, say, the User-Agent. This configuration section is quite powerful, as you can add, delete, and modify the contents of headers. Don't do this just for kicks, however. Make sure you've read the RFC and understand the ramifications of the particular header value you have the urge to muck about with.
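The difference between the two listings boils down to dropping some headers and replacing others. The sketch below shows that idea in a few lines; it illustrates the technique and says nothing about how mman itself is implemented. The function name and its parameters are my own.

```python
def filter_headers(headers, drop=(), replace=None):
    """Return a new header list with some headers dropped and others replaced.

    Header names are compared case-insensitively, as HTTP requires.
    """
    drop_names = {name.lower() for name in drop}
    replacements = {name.lower(): value for name, value in (replace or {}).items()}
    result = []
    for name, value in headers:
        key = name.lower()
        if key in drop_names:
            continue
        result.append((name, replacements.get(key, value)))
    return result


# A few of the unfiltered headers from the listing above.
UNFILTERED = [
    ("Host", "mman"),
    ("User-Agent", "Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619"),
    ("Keep-Alive", "300"),
    ("Proxy-Connection", "keep-alive"),
]

# The replacement User-Agent shown in the filtered listing above.
NEW_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)"

if __name__ == "__main__":
    filtered = filter_headers(
        UNFILTERED,
        drop=("Keep-Alive", "Proxy-Connection"),
        replace={"User-Agent": NEW_AGENT},
    )
    for name, value in filtered:
        print(f"{name}: {value}")
```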

It's also interesting to see the headers being sent by a web server. If I type the following into my browser, remembering to use two periods between the word "headers" and the site's address:

headers..www.mp3.com

I'll see this:

*Server header:*

HTTP/1.1 200 OK
Date: Sat, 21 Jun 2003 21:17:43 GMT
Server: Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2
	mod_mp3idver/0.12 rwh/1.1 bw/3.37 rewrite/3.3 include/3.6
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html

Notice that there aren't any secrets on the server end either. The header clearly indicates the type and version of web server software in use. If you are responsible for maintaining a web server, remember that every HTTP packet leaving your server reveals whether or not you've kept up with your web server patches!
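To see just how much a Server header gives away, you can break it into its product and version pairs. This is a minimal sketch using a regular expression of my own devising; it skips parenthesized comments such as (Unix) and is not a complete parser for the header grammar in the RFC.

```python
import re

# A product token, a slash, then a version that runs to the next space.
PRODUCT = re.compile(r"([A-Za-z0-9_.+-]+)/([^\s()]+)")


def server_products(server_header: str) -> list:
    """Return (product, version) pairs advertised in a Server header value."""
    without_comments = re.sub(r"\([^)]*\)", " ", server_header)
    return PRODUCT.findall(without_comments)


if __name__ == "__main__":
    # The start of the Server header value from the listing above.
    value = "Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2"
    for product, version in server_products(value):
        print(product, version)
```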

Controlling Access

mman also supports features that can be very useful in a networked environment. For one, it can force users to authenticate before they are allowed to use the Internet. To set this up, I'll click on config, select access, and add a policy. I'll then be presented with a form.

If I leave the IP address section empty, the access policy will affect every IP address that connects to the proxy. I can then set values in the username and password fields. Before saving the policy, I need to configure what access users will be allowed once they input the correct username and password. My choices are:

If you decide to create your own policy, remember to create a second policy that will allow you as an administrator to configure mman. If you plan on configuring mman on the same computer that is running the proxy software, keep the default policy, but place it below your new policy that affects your users.

Now, when users open up their web browsers, the browser itself will prompt them for the username and password you created in your policy. If they type it in correctly, they will be able to access the Internet, according to the parameters you set in your policy.
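Behind that prompt, the browser typically answers the proxy by adding a Proxy-Authorization header to its requests. I haven't confirmed which authentication scheme mman uses, so the sketch below assumes the common Basic scheme, in which the username and password are joined with a colon and base64-encoded. The credentials are obviously made up.

```python
import base64


def proxy_authorization(username: str, password: str) -> str:
    """Build a Proxy-Authorization header value for the Basic scheme.

    Assumption: the proxy uses Basic authentication; check its documentation.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print("Proxy-Authorization:", proxy_authorization("alice", "secret"))
```

Keep in mind that base64 is an encoding, not encryption. Anyone who can capture the header can recover the password.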

The last feature I wish to mention is limits. This configuration allows you to control Internet access according to month, day, and time. For example, you could configure a policy that limits Internet access to the hours of 9:00 to 17:00 on Monday to Friday.
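The logic behind such a limit is simple enough to sketch. The function below answers whether a given moment falls inside the Monday to Friday, 9:00 to 17:00 window from my example. It only illustrates the rule and has nothing to do with mman's own configuration format; the names are hypothetical.

```python
from datetime import datetime, time

OPEN = time(9, 0)    # access begins at 9:00
CLOSE = time(17, 0)  # access ends at 17:00


def access_allowed(moment: datetime) -> bool:
    """True on Monday through Friday from 9:00 up to, but not including, 17:00."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and OPEN <= moment.time() < CLOSE


if __name__ == "__main__":
    # Saturday, 21 June 2003, the day the header listings above were captured.
    print(access_allowed(datetime(2003, 6, 21, 16, 4)))
    # The following Monday, mid-morning.
    print(access_allowed(datetime(2003, 6, 23, 10, 0)))
```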

Conclusion

It seems that I've barely scratched the surface of the middleman proxy server. Perhaps I've piqued your interest and you will try this application for yourself.

In the next article, I'd like to finish the proxy series by taking a look at SMTP proxies.

Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.



Copyright © 2009 O'Reilly Media, Inc.