LinuxDevCenter.com

Simplify Network Programming with libCURL

by Q Ethan McCallum
05/05/2005

The curl command-line tool is a one-stop shop for data transfer. It supports HTTP, FTP, LDAP, and other protocols. However, people who use it as just a download tool don't do it justice.

curl's inner workings use the libCURL client library. So can your programs, to make them URL aware. libCURL-enabled tools can perform downloads, replace fragile FTP scripts, and otherwise take advantage of networking without any (explicit) socket programming. The possibilities are endless, especially with libCURL using an MIT/X-style license agreement.

This article explains how to use libCURL's "easy" API, which is simple and should suit most needs. (I plan to cover the more powerful but more complex "shared" and "multi" interfaces in a future article.) It uses the following scenarios to demonstrate libCURL programming:

  • HTTP GET: to fetch content from a URL
  • Anonymous FTP download: to fetch a remote file
  • HTTP POST: to simulate a web form, such as a search engine call
  • Authenticated FTP upload: to log in to a remote host and push a file

Stubs though they may be, the samples are working tools that you can use as building blocks for your own libCURL experiments. Feel free to download the example code and join in.

libCURL is a C library. My examples are in C++, but a proficient C programmer should be able to follow along. That said, I've discovered a template technique that should make libCURL a little easier for C++ programmers.

I tested the sample code under Fedora Core 3, libCURL version 7.12.3. As libCURL is under active development, the examples may require slight modifications to work under different library versions.

curl "Easy" Interface Basics

A typical client/server scenario involves a connection, plus one or many request/response iterations. Consider an HTTP transfer:

  1. The client establishes a connection with the server
  2. The client sends a request (usually a GET or POST operation)
  3. The server sends back some data (HTML or an error message)
  4. The client and server terminate the connection

libCURL sits in the middle of this process. To use it, configure a context object with request data (URL, parameters) and response handlers (callback functions). Pass this context to the library, which handles low-level network transport (connection initiation and teardown, data transfer) and calls your response handler(s).

Notice that libCURL doesn't really do anything with the data; it's more of a data transfer framework that fires your callbacks to do the heavy lifting. This clean separation of transport and handling abstracts your development from low-level networking and protocol concerns so that you can focus on writing your application.

Using libCURL's "easy" interface, then, involves the following sequence of API calls:

  1. curl_global_init(), to initialize the curl library (once per program)
  2. curl_easy_init(), to create a context
  3. curl_easy_setopt(), to configure that context
  4. curl_easy_perform(), to initiate the request and fire any callbacks
  5. curl_easy_cleanup(), to clean up the context
  6. curl_global_cleanup(), to tear down the curl library (once per program)

The function curl_easy_setopt() deserves attention:

curl_easy_setopt(CURL* ctx , CURLoption key , value )

The parameters are the context, the option name, and the option value, respectively. Think of value as a void* (it's really not, but bear with me), because it can be any data type. That data should, however, befit whatever key sets.

HTTP GET: Fetch a Web Page

The stub program step1 performs a simple HTTP GET operation. It prints the response headers to standard error and the body to standard output.

First, step1 calls curl_easy_init() to create a context object (CURL*):

CURL* ctx = curl_easy_init() ;

It then calls curl_easy_setopt() several times to configure the context. (CURLOPT_URL is the target URL.)

curl_easy_setopt( ctx , CURLOPT_URL , argv[1] ) ;

CURLOPT_WRITEHEADER is an open FILE* to which libCURL will write the response headers. step1 sends them to stderr.

Similarly, CURLOPT_WRITEDATA is a FILE* destination (here, stdout) for the response body. This is often text data for HTTP requests but may be binary data for FTP or other transfer types. Note that libCURL defines "read" as "sent data" and "write" as "received data"; some people may find these terms confusing.

CURLOPT_VERBOSE is helpful for debugging. This option tells libCURL to print low-level diagnostic messages to standard error.

curl_easy_perform() makes the actual URL call. In the event of an error, curl_easy_strerror() prints an error message:

const CURLcode rc = curl_easy_perform( ctx ) ;

// for curl v7.11.x and earlier, look into
// the option CURLOPT_ERRORBUFFER instead
if( CURLE_OK != rc ){
  std::cerr << "Error from CURL: "
    << curl_easy_strerror( rc) << std::endl ;
} ... 

Otherwise, you can call curl_easy_getinfo() to fetch transfer statistics. Similar to curl_easy_setopt(), it takes a constant as a key and a void* in which to store the data:

long statLong ;
curl_easy_getinfo( ctx , CURLINFO_HTTP_CODE , &statLong ) ;
std::cout << "HTTP response code: " << statLong << std::endl ;

You must match the key constant to the pointer you provide: for example, CURLINFO_HTTP_CODE (the numeric HTTP response code, such as 200 or 404) requires a long variable, whereas CURLINFO_SIZE_DOWNLOAD (the number of bytes downloaded) requires a double.

Call curl_easy_cleanup() to clean up the context object. Do this after any calls to curl_easy_getinfo(), or you risk a segmentation fault.

curl_easy_setopt() doesn't copy any of the pointers you assign to context values, nor does curl_easy_cleanup() destroy them. You are responsible for ensuring pointer validity throughout the context's lifetime and for cleaning up any resources after the context's teardown.

FTP Download (Fetch a Remote File)

Web services are gaining steam, but plenty of systems still use plain old FTP jobs to transfer data between applications.

Scripts typically feed the ftp command instructions via standard input. expect offers better error handling, because it simulates an interactive session; but in my experience high-end expect skills are fairly rare. Most annoying is that these scripts run outside of the main application, so they bypass any tracing or error-handling facilities.

step2 addresses these concerns by moving the FTP pull into the native-code application itself. (Pretend that step2 is a code excerpt from a larger, long-running app.) It also demonstrates how to process the data as it downloads, so you don't have to store it in a temporary file.

step2 and step1 share a lot of code, some of which I've already explained.

The CURLOPT_WRITEFUNCTION context option specifies the function libCURL will call as it downloads the remote file:

size_t showSize( ... ) ;
curl_easy_setopt( ctx , CURLOPT_WRITEFUNCTION , showSize ) ;

The for() loop in lines 202-239 sets up the FTP calls. The CURLOPT_URL option is a URL created by concatenating the name of the target file with the name of the server. libCURL will try to use the same network connection for all of the FTP calls, because they share a context.

The value assigned to CURLOPT_WRITEDATA is available in the CURLOPT_WRITEFUNCTION callback (here, showSize()). This can be any data type, either native or user defined. The callback uses the value as a means to keep state between invocations. In step2, this is a pointer to a custom XferInfo object that stores information about the downloaded file and the number of times the library has invoked the callback:

class XferInfo {
  public:
  void add( int more ) ;
  int getBytesTransferred() const ;
  int getTimesCalled() ;
} ;

...

XferInfo info ;
curl_easy_setopt( ctx , CURLOPT_WRITEDATA , &info ) ;

In turn, the showSize() callback does all of the work. It tracks the size of the files downloaded from the FTP server. Note its signature:

extern "C"
size_t showSize(
  void *source ,
  size_t size ,
  size_t nmemb ,
  void *userData
)

All CURLOPT_WRITEFUNCTION callbacks use this signature.

C++ users must expose callback functions with C linkage, hence the extern "C" declaration. You can't specify an object member function as a callback, but I've found a template technique to pass the work to an object indirectly.
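
As an illustration of handing the work to an object, here is one common pattern: a plain function with C linkage that forwards each chunk to a virtual member function. It is not necessarily the template technique mentioned above, and ChunkHandler, forwardToHandler(), and ByteCounter are hypothetical names.

```cpp
#include <stddef.h>

// Hypothetical interface for objects that want to receive downloaded data.
class ChunkHandler {
  public:
    virtual ~ChunkHandler() {}
    // Returns the number of bytes handled.
    virtual size_t handle( const char* data , size_t length ) = 0 ;
} ;

// Plain function with C linkage, usable as a CURLOPT_WRITEFUNCTION value.
// It assumes CURLOPT_WRITEDATA was set to a ChunkHandler*.
extern "C"
size_t forwardToHandler( void* source , size_t size , size_t nmemb ,
                         void* userData ){
  ChunkHandler* handler = static_cast< ChunkHandler* >( userData ) ;
  return handler->handle( static_cast< const char* >( source ) ,
                          size * nmemb ) ;
}

// Hypothetical handler that only counts the bytes it sees.
class ByteCounter : public ChunkHandler {
  public:
    ByteCounter() : total_( 0 ) {}
    virtual size_t handle( const char* , size_t length ){
      total_ += length ;
      return length ;
    }
    size_t total() const { return total_ ; }
  private:
    size_t total_ ;
} ;
```

When registering such an object, convert its address to ChunkHandler* before passing it as the CURLOPT_WRITEDATA value, so that the cast inside forwardToHandler() recovers exactly the pointer type that was stored.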

source is a buffer of data. I usually cast this to a char* because I process text data (HTML, XML). This example doesn't use this parameter because showSize() doesn't do anything with the data itself.

Because source is not NUL-terminated, you can't use standard string functions to determine its length. Instead, use the product size*nmemb.
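
To make the length rule concrete, the sketch below walks a chunk using only size*nmemb as its bound. countLines() is a hypothetical callback, and it assumes the CURLOPT_WRITEDATA value is the address of a long that holds the running total.

```cpp
#include <stddef.h>

// Hypothetical callback: counts newline characters in each chunk.
// Assumes userData points to a long used as the running total.
extern "C"
size_t countLines( void* source , size_t size , size_t nmemb ,
                   void* userData ){
  const char* text = static_cast< const char* >( source ) ;
  const size_t length = size * nmemb ;          // the only reliable length
  long* lines = static_cast< long* >( userData ) ;

  for( size_t ix = 0 ; ix < length ; ++ix ){    // never read past length
    if( '\n' == text[ix] ){
      ++( *lines ) ;
    }
  }
  return length ;   // report every byte as processed
}
```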

userData is the value assigned to the CURLOPT_WRITEDATA context option. Note that the libCURL manual calls this parameter stream, likely because it's a FILE* when using the default (libCURL internal) write function. I call it userData because that's a little less confusing.

As userData is void*, you must cast it back to its proper data type. showSize() casts it to an XferInfo* and calls the object's add() member function to record the number of bytes transferred in this call:

extern "C"
size_t showSize( ... ){

  XferInfo* info = static_cast< XferInfo* >( userData ) ;
  const int bufferSize = size * nmemb ;

  info->add( bufferSize ) ;

  ...

  return bufferSize ;
}

On success, your callback should return the number of bytes it processed (size*nmemb). libCURL compares this with the number of bytes it passed your function and aborts the transfer if they don't match. To signal an error and stop the transfer, return 0 or any other number less than size*nmemb; curl_easy_perform() then returns CURLE_WRITE_ERROR.
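
Putting the pieces together, a complete callback in the style of showSize() could look like the sketch below. The XferInfo member bodies are assumptions, because the article shows only the declarations, and the NULL check is an addition for illustration.

```cpp
#include <stddef.h>

// Hypothetical stand-in for the article's XferInfo. The member bodies
// are assumptions; the original excerpt shows only the declarations.
class XferInfo {
  public:
    XferInfo() : bytes_( 0 ) , calls_( 0 ) {}
    void add( int more ){ bytes_ += more ; ++calls_ ; }
    int getBytesTransferred() const { return bytes_ ; }
    int getTimesCalled() const { return calls_ ; }
  private:
    int bytes_ ;
    int calls_ ;
} ;

extern "C"
size_t showSize( void* source , size_t size , size_t nmemb ,
                 void* userData ){
  (void) source ;   // this callback ignores the data itself

  if( NULL == userData ){
    return 0 ;      // a count that differs from size*nmemb stops the transfer
  }

  XferInfo* info = static_cast< XferInfo* >( userData ) ;
  const size_t bufferSize = size * nmemb ;
  info->add( static_cast< int >( bufferSize ) ) ;

  return bufferSize ;   // success: every byte accounted for
}
```

After curl_easy_perform() returns, the caller can read the totals back with getBytesTransferred() and getTimesCalled().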

A callback may fire several times for the same download, because the library hands you the file data in chunks. This is memory efficient if your code operates on piecemeal data, such as with low-level text parsing. Otherwise, you must store the data yourself as it comes in and handle it once the call to curl_easy_perform() returns.
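
For the store-it-yourself case, one simple approach is to append each chunk to a std::string. appendChunk() is a hypothetical callback, and the sketch assumes the CURLOPT_WRITEDATA value is the address of a std::string that outlives the transfer.

```cpp
#include <stddef.h>
#include <string>

// Hypothetical callback: appends each chunk to a std::string.
// Assumes userData is the address of that string.
extern "C"
size_t appendChunk( void* source , size_t size , size_t nmemb ,
                    void* userData ){
  std::string* body = static_cast< std::string* >( userData ) ;
  const size_t length = size * nmemb ;
  try {
    body->append( static_cast< const char* >( source ) , length ) ;
  } catch( ... ){
    // Don't let a C++ exception escape into the C library;
    // a short count tells libCURL to stop the transfer instead.
    return 0 ;
  }
  return length ;
}
```

Once curl_easy_perform() returns, the string holds the whole response and can be parsed in a single pass.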
