Apache 2.0 Basics

Writing Input Filters for Apache 2.0

09/20/2001

In the last two articles, we covered writing output filters for Apache 2.0. In this article, we'll focus on input filters. The two filter types are very similar and many of the concepts that we have already covered with output filters can be successfully applied to input filters. However, input filters differ from output filters enough that it is important to dedicate an article to them.

The first difference between input and output filters is that there are two different types of input filters -- those that filter all content and those that filter only request data. The big difference is that the former will filter headers as well as body data, while the latter only filters body data. The example input filter we'll cover today will modify the headers of each request.

The second difference is how filters are called. There are no functions analogous to the set of ap_f* functions discussed in the last article. This means that developers who write input filters are forced to handle buckets and bucket brigades directly.

The final difference is the order in which filters affect the data. With output filters, the content generator starts with nothing, generates the base content, and passes that content down the filter stack to be modified; at the bottom of the filter stack, the data is sent to the network and an empty brigade is returned back up the stack.

Input filters work in reverse. The first thing to do in a function that requests data from the network is to call the ap_get_brigade() function with an empty brigade. Each filter passes the brigade to the next filter until the last filter in the stack receives it, still empty. The last filter then fills the brigade with data from the network and returns it to the previous filter. Each previous filter in turn modifies the data and returns it, until the original caller receives a full bucket brigade.
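
To make the direction of travel concrete, here is a minimal, self-contained sketch of the pull model described above. It does not use the real Apache types: toy_brigade, toy_filter, and toy_get_brigade are hypothetical stand-ins invented for this illustration, and the brigade is flattened to a single buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT the real
 * ap_filter_t or apr_bucket_brigade types. */
typedef struct toy_brigade {
    char data[128];   /* flat buffer standing in for a list of buckets */
    size_t len;       /* 0 means the brigade is empty */
} toy_brigade;

typedef struct toy_filter toy_filter;
struct toy_filter {
    int (*func)(toy_filter *f, toy_brigade *b);
    toy_filter *next;
};

/* Plays the role of ap_get_brigade(): hand the brigade to filter f. */
static int toy_get_brigade(toy_filter *f, toy_brigade *b)
{
    assert(f != NULL && b != NULL);
    return f->func(f, b);
}

/* Last filter in the stack: fills the empty brigade "from the network". */
static int toy_network_filter(toy_filter *f, toy_brigade *b)
{
    static const char wire[] = "get /index.html";
    (void)f;
    memcpy(b->data, wire, sizeof(wire) - 1);
    b->len = sizeof(wire) - 1;
    return 0;
}

/* A filter higher in the stack: it first asks the next filter for data,
 * then modifies that data on the way back up. */
static int toy_upcase_filter(toy_filter *f, toy_brigade *b)
{
    size_t i;
    int rv = toy_get_brigade(f->next, b);
    if (rv != 0) {
        return rv;
    }
    for (i = 0; i < b->len; i++) {
        if (b->data[i] >= 'a' && b->data[i] <= 'z') {
            b->data[i] = (char)(b->data[i] - 'a' + 'A');
        }
    }
    return 0;
}
```

A caller would chain the filters (the upcase filter's next pointing at the network filter) and call toy_get_brigade on the top one with an empty brigade. The data only exists once the call has reached the bottom of the stack, and each filter sees it as the call unwinds.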

The one function we haven't covered yet is ap_get_brigade. This function calls the next filter in the stack. It has four arguments to control its behavior:

apr_status_t ap_get_brigade(ap_filter_t *filter, apr_bucket_brigade *bucket, ap_input_mode_t mode, apr_size_t *readbytes);

The first argument is the next filter in the stack. The second, the bucket brigade, is where the data from the network is stored. Remember that this brigade is always empty when the function is first called, and is filled out when ap_get_brigade returns.

The mode defines how data is read from the filters. There are three options for this parameter: AP_MODE_BLOCKING, AP_MODE_NONBLOCKING, and AP_MODE_PEEK. The first two are self-explanatory: we read from the network in either blocking or non-blocking mode. The third is more complex. After the first request has been made, Apache needs to determine whether another request is coming over the same socket. If there is one, Apache doesn't send the end of the response immediately; to save network bandwidth, it waits until the second request has been fully processed. Most filters can safely ignore the mode when it is AP_MODE_PEEK, because Apache's core filters will return the correct information along with an empty brigade.

The final parameter, readbytes, is the number of bytes requested from the network on input, and the number of bytes returned on output. It tells the requesting function how much data is available to be processed. If this value is 0 on input, input filters return one line of data.
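
The in/out behavior of readbytes can be sketched in isolation. The following is only a toy model of the semantics as described here, not Apache code: toy_source and toy_read are hypothetical names, and a plain character buffer stands in for the network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte source standing in for the network; not an APR type. */
typedef struct toy_source {
    const char *data;
    size_t len;
    size_t pos;
} toy_source;

/* On input, *readbytes is the number of bytes wanted (0 means "one line").
 * On output, *readbytes is the number of bytes actually copied to out. */
static void toy_read(toy_source *s, char *out, size_t outsize,
                     size_t *readbytes)
{
    size_t avail, want;

    assert(s != NULL && out != NULL && readbytes != NULL);
    avail = s->len - s->pos;
    want = *readbytes;
    if (want == 0) {
        /* Line mode: read up to and including the next newline. */
        const char *start = s->data + s->pos;
        const char *nl = memchr(start, '\n', avail);
        want = nl ? (size_t)(nl - start) + 1 : avail;
    }
    if (want > avail) {
        want = avail;
    }
    if (want > outsize) {
        want = outsize;
    }
    memcpy(out, s->data + s->pos, want);
    s->pos += want;
    *readbytes = want;
}
```

The point to notice is that the caller must reset the value before every call, because each call overwrites the request with the result.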

Now that we have the basics of input filters, let's look at the details. As in the previous article, there is an example module that implements the input filter described below. This module was implemented after the London ApacheCon. At that event, CDs were passed out to conference attendees. The problem was that the CD was created on Windows, so all the HTML files used backslashes and spaces in the URLs, instead of forward slashes and %20. This made the CD unusable for anybody on a non-Windows platform. Because most of the conference attendees were not using Windows, most people were upset about the CD. To resolve this problem, I created a simple Apache 2.0 module that filters each request to ensure that its URL is valid.

Now, let's dissect the ApacheCon input filter:

static apr_status_t apcon_filter_in(ap_filter_t *f, apr_bucket_brigade *b, ap_input_mode_t mode, apr_size_t *readbytes)
{
    const char *str;
    const char *begin;
    int length;
    apr_bucket *e;
    apr_bucket *d;
    char data[256];
    int i,j;

We start by declaring all of the variables the filter needs. The purpose of each variable will become clear as we proceed through the filter.

ap_get_brigade(f->next, b, mode, readbytes);

I will stress again that the very first call in every input filter should be to ap_get_brigade. This fills out the brigade to be used by the rest of the filter.

e = APR_BRIGADE_FIRST(b);
 
if (e->type == NULL) {
    return APR_SUCCESS;
}

Once we have a brigade, the first thing we must do is access the first bucket in the brigade. If the type of this bucket is NULL, then the brigade is empty, and we can just return APR_SUCCESS to the higher filters.

apr_bucket_read(e, &str, &length, 1);
 
if (strncmp("GET ", str, strlen("GET "))) {
    return APR_SUCCESS;
}
apr_bucket_split(e, strlen("GET "));
e = APR_BUCKET_NEXT(e);

At this point, we know that we have a valid brigade, and that there is data in it. The first thing we must do is to read from the bucket to get a string of data that we can process. In this case, this filter is very simple and only knows how to handle GET requests. Once we know that we have a GET request, we split the bucket so that we are just dealing with the URL and the HTTP version.
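
Bucket data arrives as a pointer plus a length and is not guaranteed to be NUL-terminated, so a prefix test may need to check the length before comparing. The helper below is a hedged, standalone sketch of that check; get_prefix_len is a hypothetical name and is not part of APR or of the module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an APR function). Returns the offset just past
 * the "GET " method token, or 0 if buf does not start with it. */
static size_t get_prefix_len(const char *buf, size_t len)
{
    static const char method[] = "GET ";
    const size_t mlen = sizeof(method) - 1;

    assert(buf != NULL || len == 0);
    /* Check the length first so we never read past the end of buf. */
    if (len < mlen || memcmp(buf, method, mlen) != 0) {
        return 0;
    }
    return mlen;
}
```

A non-zero return value is the offset at which the bucket would be split, which corresponds to the strlen("GET ") argument passed to apr_bucket_split in the filter.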

apr_bucket_read(e, &str, &length, 1);
/* this should work, because we are just searching for HTTP/1.0 or HTTP/1.1 */
begin = str + (strlen(str) - 3);
do {
    begin--;
} while (strncmp("HTTP", begin, 4) && (begin > str));
apr_bucket_split(e, begin - str - 1);

This segment isolates the URL from the HTTP version. We don't care about the HTTP version, but the filter needs to have the URL isolated from everything else.
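
The backward search relies on strlen(str), which assumes the bucket data is NUL-terminated. As a sketch of the same idea driven by the length that apr_bucket_read reports, the standalone function below scans backward for the " HTTP/" token. The name url_length is an assumption made for this example and does not appear in the module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an APR function). Given the request line with
 * the method already removed, returns the number of bytes that belong to
 * the URL, i.e. the offset of the space before "HTTP/". If no version
 * token is found, the whole buffer is treated as the URL. */
static size_t url_length(const char *buf, size_t len)
{
    static const char tok[] = " HTTP/";
    const size_t tlen = sizeof(tok) - 1;
    size_t i;

    assert(buf != NULL || len == 0);
    if (len < tlen) {
        return len;
    }
    /* Search from the end, because the URL itself may contain spaces. */
    for (i = len - tlen + 1; i-- > 0; ) {
        if (memcmp(buf + i, tok, tlen) == 0) {
            return i;
        }
    }
    return len;
}
```

For the input "\foo bar HTTP/1.0" this yields 8, the length of "\foo bar", which matches the begin - str - 1 split point computed in the filter.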

  apr_bucket_read(e, &str, &length, 1);
  i = 0;
  j = 0;
  while (i < length) {
    if (str[i] == ' ') {
      data[j++] = '%';
      data[j++] = '2';
      data[j++] = '0';
      i++;
    }
    else if (str[i] == '\\') {
      data[j++] = '/';
      i++;
    }
    else {
      data[j++] = str[i++];
    }
  }

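We are almost done at this stage. We have just traversed the entire URL and copied it to a new string, replacing each space with %20 and each backslash with a forward slash. Note that data is a fixed 256-byte array and the loop does not check j against that size, so a real filter would need to guard against longer URLs.
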
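
The same rewrite can be sketched as a standalone function with an explicit bound on the output buffer. This is an illustration under stated assumptions, not the module's code: fix_url and its "too small" return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the module). Copies in[0..inlen) to out,
 * replacing each space with "%20" and each backslash with '/'.
 * Returns the number of bytes written, or (size_t)-1 if out is too small. */
static size_t fix_url(const char *in, size_t inlen, char *out, size_t outsize)
{
    size_t i, j = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < inlen; i++) {
        if (in[i] == ' ') {
            if (outsize - j < 3) {      /* j never exceeds outsize */
                return (size_t)-1;
            }
            out[j++] = '%';
            out[j++] = '2';
            out[j++] = '0';
        }
        else {
            if (j >= outsize) {
                return (size_t)-1;
            }
            out[j++] = (in[i] == '\\') ? '/' : in[i];
        }
    }
    return j;
}
```

Because a space grows from one byte to three, the output can be up to three times the size of the input, which is why the bound is checked on every write rather than once up front.
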
    d = apr_bucket_transient_create(data, j);
    apr_bucket_setaside(d, f->c->pool);
    APR_BUCKET_INSERT_AFTER(e, d);
    APR_BUCKET_REMOVE(e);
    apr_bucket_destroy(e);
    return APR_SUCCESS;
}

Finally, we have to put the new URL into a bucket, and insert that bucket into the brigade in the correct location. This filter cheats a bit, because one of its goals is to be a teaching filter, so we use a transient bucket and call apr_bucket_setaside immediately. This is done so that I have a mechanism for teaching about apr_bucket_setaside. The bucket insertion is done by inserting after the original URL bucket, and then removing the original.
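
The reason setting aside matters here is lifetime: the rewritten URL lives in a local array that disappears when the filter returns. The toy model below illustrates that idea only. toy_bucket and its functions are hypothetical, and malloc stands in for the pool that the real apr_bucket_setaside is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bucket; NOT the real apr_bucket. */
typedef struct toy_bucket {
    const char *data;  /* bytes the bucket refers to */
    size_t len;
    char *owned;       /* non-NULL once the bucket owns a private copy */
} toy_bucket;

/* "Transient": the bucket only borrows the caller's memory. */
static toy_bucket toy_transient_create(const char *buf, size_t len)
{
    toy_bucket b;
    assert(buf != NULL || len == 0);
    b.data = buf;
    b.len = len;
    b.owned = NULL;
    return b;
}

/* "Setaside": copy the borrowed bytes into storage that outlives the
 * caller's buffer. Returns 0 on success, -1 if allocation fails. */
static int toy_setaside(toy_bucket *b)
{
    char *copy;
    assert(b != NULL);
    if (b->owned != NULL) {
        return 0;                       /* already owns its data */
    }
    copy = malloc(b->len ? b->len : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, b->data, b->len);
    b->owned = copy;
    b->data = copy;
    return 0;
}

static void toy_destroy(toy_bucket *b)
{
    assert(b != NULL);
    free(b->owned);
    b->owned = NULL;
    b->data = NULL;
    b->len = 0;
}
```

After toy_setaside, the original buffer can be overwritten or go out of scope without affecting the bucket, which is the property the filter needs before it returns the brigade to its caller.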

To try this module, configure your Apache 2.0 server with --with-module=filters:/path/to/mod_apachecon. This will copy the module into your Apache source tree and add it to the build system. This filter is activated automatically and operates on every request. The easiest way to test it is to telnet to the server and make a request for a file such as:

GET \foo bar HTTP/1.0

Just be sure that you have a file named "foo bar" in your DocumentRoot directory.

Next time, we will discuss how to write modules that can be extended by other modules.

Ryan Bloom is a member of the Apache Software Foundation, and the Vice President of the Apache Portable Run-time project.

