Linux DevCenter    
Published on Linux DevCenter

Program Your Computer to See

by Chris Halsall

Related Articles:

How the CamCal Program Works - Chris Halsall walks you through the CamCal set-up and calibration in this companion article to "Program Your Computer to See," which is an introduction to the science of Computer Vision.

Computer Vision is the science of applying algorithms to still or moving images in order to automatically extract meaningful, reproducible data. Research in the area has been underway for over 30 years and has mostly been concentrated in large, well-funded labs with access to mainframes or specialized hardware because of the extremely intensive computation needed.

In recent years, as everyone knows, Moore's law has continued its exponential march, and consumer-level computing devices have started matching mainframe performance and memory sizes of yesteryear. At the same time, charge-coupled device (CCD) camera technology continues to get better, smaller, and cheaper. This means that some of what used to be confined to the labs is now becoming practical for the home or office.

The range of applications possible with CV is very broad. A few examples include optical character recognition (OCR), handwriting interpretation, gesture recognition, face tracking, security or traffic monitoring, and feature extraction from satellite or aerial photographs. Biometric data can be determined for face, iris, or fingerprint identification. With fast enough processing and stereo cameras, autonomous robots and vehicles become possible.

But despite the increases in raw data processing power available, CV continues to be very difficult, with advances being hard won and the most successful algorithms often being extremely complex. While high-reliability solutions to limited domain problems are becoming more common, no computer program can look at an arbitrary image and tell you what it contains. Even something as simple as reliably recognizing one face from a database of many requires graduate or Ph.D. level knowledge and a great deal of code.

Intel's Open Source Computer Vision Library

In June of this year, at the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Intel announced the availability of the first alpha version of its "Open Source Computer Vision Library" (OSCVL) for Microsoft Windows. On September 13th, the library, version Alpha3.1, was made available for Linux for the first time.

The library provides several hundred functions that implement many of the most common algorithms in use in CV applications. It's intended to be a substrate upon which both researchers and commercial developers can immediately begin being productive at building CV solutions, instead of having to first reimplement the basic building blocks from white papers and sprinkled bits of source code.

The code is under a BSD-like license which allows either source or binary redistribution so long as the copyright notice remains attached. It is perfectly OK to take the library and produce a commercial product with it, so long as Intel's name is not used to promote the product. Full details, including download links, are available at Intel's Open Source Computer Vision Library page.

Intel has been working on the OSCVL for approximately three years, and the effort has grown from one man with a vision (no pun intended), Gary Bradski, to a multi-national team of developers. The library has an acceptance and navigating committee of some of the world's leading CV experts and a Web-based users group of over 500 researchers.

The facilities provided by the library vary from the common and easy-to-understand to the very complex. Some of the former include camera calibration, image statistics and histograms, gesture recognition, arbitrarily sized matrix math support, edge detection, and flood filling. The more complex include optical flow algorithms, segmentation, eigen objects, and embedded hidden Markov models. Reading the PDF manual included with the library or linked off the above site is recommended.

It's important to point out that the OSCVL relies on another free library, Intel's "Image Processing Library" (IPL). The IPL is currently available only in x86 binary format, although Intel has committed to releasing it as truly open source within the fourth quarter of this year. The IPL will work with any x86-based processor using optimized C and will take advantage of MMX or SSE instructions if the processor supports them.

Once released as source, porting to and optimizing for other processors will become possible. In addition, Linux distributions may choose to include the libraries for vision applications. This should not be misinterpreted as a tie-in to Intel processors -- because of the license, anyone could port the OSCVL to work with another image processing library. It would probably be quicker, however, to simply wait for the IPL to become available.


Following the link above, navigate to the download section, where you'll find the manuals, library headers, the libraries themselves, and several sample applications (for Windows). Download the IPL package (currently IPL-2.2-1.i386.rpm), and install it as root (e.g., rpm -Uvh ipl-2.2-1.i386.rpm).

Next, download the OSCVL itself, which currently is available only as a source tar-ball, version Alpha 3.1. Uncompress this on any partition where you have at least 80 megabytes of free space available (e.g., tar -xzvf opencv-0.3.1.tar.gz), and cd into the created directory (cd opencv-0.3.1.b). Run the configure script (./configure) and then run make.

Once the build has completed successfully, become root and execute make install. This places the library files in /usr/local/lib/ and the include files in the opencv subdirectory in /usr/include/.

Keep in mind, you may need to update your /etc/ file to include the /usr/local/lib/ directory in the library search path, or add it to the LD_LIBRARY_PATH environment variable. You will know you need to do this if you get "cannot find library" messages when you try to run a program linked against the library. If you update the file, be sure to run the command ldconfig (as root) in order to update the cache, or the changes won't be "seen" until the next reboot.
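As a rough illustration of the environment-variable route, the following Python sketch checks whether a directory already appears in a colon-separated search path and appends it if not. The function name with_library_dir is hypothetical, and the script only prints a suggested export line; it does not change your shell or touch any system file.

```python
import os


def with_library_dir(current_value, directory):
    """Return a colon-separated search path that includes `directory`."""
    # Empty entries are dropped so the result never has stray colons.
    entries = [entry for entry in current_value.split(":") if entry]
    wanted = os.path.normpath(directory)
    # Compare normalized forms so "/usr/local/lib/" matches "/usr/local/lib".
    if wanted not in (os.path.normpath(entry) for entry in entries):
        entries.append(wanted)
    return ":".join(entries)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("export LD_LIBRARY_PATH=" + with_library_dir(current, "/usr/local/lib/"))
```

Setting the variable this way affects only the shell session in which it is exported, which makes it a convenient way to test before making a system-wide change.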

Camera calibration

Almost all cameras suffer from barrel distortion, with wide angle or "fish eye" lenses being the worst. Further, every camera and lens will have slightly different characteristics. The result is that straight lines in the target scene become curves in the captured image, particularly at the edges. The upper-left portion of the image below, captured with an inexpensive B&W security camera, demonstrates this well.

Comparison of results from two different lenses.

Corrected vs. uncorrected image from small security camera.

In Computer Vision applications, camera calibration is often a critical first step in extracting meaningful data. After all, how is a computer algorithm supposed to understand what it is looking at when straight lines are curves? Calibration, or extracting the camera's "intrinsics," registers the 2D data the camera provides to the 3D world it is viewing and allows the distortion to be removed from the video stream. The OSCVL has been used to undistort the lower right-hand portion of the image.
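To make the idea of lens distortion more concrete, here is a minimal Python sketch of a two-coefficient radial distortion model, a common textbook formulation. The coefficients k1 and k2 and both function names are illustrative assumptions, not values or calls taken from the OSCVL, and the library's own model may include further terms.

```python
def distort(x, y, k1, k2):
    """Apply radial distortion to normalized image coordinates (x, y).

    The coordinates are measured from the optical centre. In this
    convention a negative k1 pulls points toward the centre, which is
    the barrel-type behaviour described in the text.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def undistort(xd, yd, k1, k2, iterations=20):
    """Approximately invert distort() by fixed-point iteration."""
    x, y = xd, yd  # start from the distorted point as a first guess
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y


if __name__ == "__main__":
    bent = distort(0.5, 0.3, k1=-0.2, k2=0.02)
    print("distorted:", bent)
    print("recovered:", undistort(bent[0], bent[1], k1=-0.2, k2=0.02))
```

The point at the optical centre is left untouched, while points farther out move more, which is why the effect is most visible at the edges of the frame.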

Once the intrinsics are known, it is also possible to calculate the true (X,Y,Z) coordinates of targets of known characteristics. With two or more calibrated cameras viewing a scene, arbitrary 3D extraction and background elimination can be performed. This leads to applications like inexpensive but complex gesturing functions (e.g., the computer knows where on the screen or your desk you're looking or pointing) and automatic 3D feature extraction of objects in a scene (how big is that vehicle, and how fast is it going).
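The relationship between the intrinsics and real-world coordinates can be sketched with the standard pinhole camera model. The Python below is a simplified illustration that ignores lens distortion; the parameter names fx, fy (focal lengths in pixels) and cx, cy (principal point) follow common convention, and the numeric values are made up for the example rather than measured from any camera.

```python
def project(point, fx, fy, cx, cy):
    """Project a 3D point (X, Y, Z) in camera coordinates to pixel (u, v)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return fx * X / Z + cx, fy * Y / Z + cy


def distance_from_size(real_width, pixel_width, fx):
    """Estimate the distance to a fronto-parallel target of known width.

    Rearranges the projection equation: pixel_width = fx * real_width / Z.
    The result is in the same unit as real_width.
    """
    return fx * real_width / pixel_width


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 image.
    print(project((0.1, 0.0, 1.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0))
    # A 2 cm square that spans 10 pixels in the image.
    print(distance_from_size(0.02, 10.0, fx=500.0))
```

With these made-up numbers, a 2 cm square that appears 10 pixels wide works out to roughly one metre from the camera, which is the kind of calculation a target of known characteristics makes possible.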

CamCal application for Linux

The OSCVL package for Windows comes with several sample applications, but they all currently rely on Video for Windows to run. Because of its importance for most further CV work, the sample camera calibration application was the first to be ported to run under Linux, using Video4Linux devices. It is currently distributed separately, as it was not ported by Intel.

To build the package, uncompress it as you did above for the OSCVL, and cd into the created directory. Execute the ./ command, and if there are no warnings about missing components, run make. Once it's finished building, running make install (as root) will place the calib application in /usr/local/bin/. If you don't want to install right away, you can run the application from the src/ directory.

When the application launches, it tries to open the /dev/video device and, if successful, starts capturing two frames per second from this source. If unsuccessful, it will complain and wait for you to set the correct device from the Preferences panel, available from the menu. If you continue to have difficulty opening the device, you might need to set the device's ownership, group, or file-mode permissions to allow access.
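If the device refuses to open, a quick check like the following Python sketch can help narrow down whether the problem is a missing device node or a permissions issue. The function name describe_device is hypothetical, and the sketch assumes the application needs both read and write access to the device, which may not hold for every driver.

```python
import os
import stat


def describe_device(path):
    """Report whether a capture device node exists and is accessible."""
    if not os.path.exists(path):
        return "missing"
    # Assumption: the capture application opens the device read/write.
    if not os.access(path, os.R_OK | os.W_OK):
        mode = stat.filemode(os.stat(path).st_mode)
        return "no access (mode %s)" % mode
    return "ok"


if __name__ == "__main__":
    for device in ("/dev/video", "/dev/video0", "/dev/video1"):
        print(device, "->", describe_device(device))
```

A "no access" result prints the file mode, which shows at a glance whether the owner, group, or mode needs adjusting.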

It's possible to have multiple copies running at the same time, each connecting to a different Video4Linux device (/dev/video0, /dev/video1, etc.). If you have more than one camera, be sure to calibrate each separately.

The calibration process is quite simple. It involves the use of a chessboard pattern placed on a planar (flat) surface, as seen in the image below. Several different patterns are included with the package in the targets/ directory, appropriate for 8.5 by 11 inch paper. One of these should be printed out with a high quality laser printer and mounted on a rigid, flat surface.

Calibration screen shot.

Calibration software running under Linux; Tux is helping hold target.

Next, you need to set the "etalon" values for the target you've decided to use. The target that Tux is holding in the image is an 11 by 8 target with squares of 2cm each. The linear size value isn't actually needed for undistortion calculations, as it's only a scaling factor for 3D calculations, but it's a good idea to give the correct values even if you don't plan to use 3D extraction right away.

Once the sizes are known, the program will look for the chessboard pattern in the scene and draw red Xs at all found intersections. If the number of intersections found matches what is expected, and the points appear to be of high quality, the red Xs turn green, and, if appropriate, the found points are stored for processing. It is interesting to note that the library is able to determine the intersection points with an accuracy of 0.1 pixel along each axis.
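To show what "the number of intersections expected" might look like in practice, here is a small Python sketch that lists the interior corner positions of a chessboard target lying on the Z = 0 plane. It assumes the "11 by 8" figure counts squares, so the interior intersections form a 10 by 7 grid; if the figure instead counts intersections, the ranges would need adjusting. The function name is hypothetical and is not part of the OSCVL.

```python
def chessboard_corners(squares_x, squares_y, square_size):
    """Return (X, Y, Z) coordinates of the interior corners of a chessboard.

    Assumes squares_x and squares_y count squares, so there are
    (squares_x - 1) * (squares_y - 1) interior intersections. The board
    lies on the Z = 0 plane with its origin at one outer corner.
    """
    corners = []
    for row in range(1, squares_y):
        for col in range(1, squares_x):
            corners.append((col * square_size, row * square_size, 0.0))
    return corners


if __name__ == "__main__":
    points = chessboard_corners(11, 8, 2.0)  # 2 cm squares, units in cm
    print(len(points), "interior corners")
    print("first:", points[0], "last:", points[-1])
```

Changing the square size only scales these coordinates, which matches the remark above that the linear size matters for 3D work rather than for undistortion.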

Several screen shots' worth of samples must be acquired, with the target moving around the field of view of the camera and with the plane of the target varying as well; that is, the six degrees of freedom must be exercised. By default, 20 such images are collected, and this can be done sequentially by pressing the "Start" button and moving the target around, or by pressing the "Single" button when the target has been moved to the next location in space.

Once the full set of images has been acquired, a call to the OSCVL is made, which processes the found intersection points and tries to calculate the intrinsics. The processing can take from a few seconds to a minute or so, depending on the speed of the processor. If successful, the calculated intrinsics are displayed in the lower right-hand corner of the window, and the captured video frames are subsequently undistorted to show the effect visually.

If the calculations fail (because of interlace shutter noise or the target placements being too coplanar), the process is reset, and another set of images must be captured. You can also restart the calibration by pressing the "Start" button again, to train another camera, or if you're not satisfied with the current results.

Once the intrinsics have been extracted for a camera, they can be written out using the "File->Save As" menu. These files are plain text and can be re-read into the calibration program using the "File->Open" menu, or into other applications which will be working with data from the same camera. Being plain text files, they can be edited by hand. Try adjusting the various values to see what each does.

It is also possible to turn the undistortion on and off by toggling the check box. Interpolation filtering can be toggled in the same way; turning it off results in faster processing, but with the risk of "jaggies."
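The trade-off between speed and "jaggies" can be illustrated by comparing two ways of reading a pixel value at a non-integer position, which is what undistortion has to do for every output pixel. The Python sketch below contrasts nearest-neighbour sampling with bilinear interpolation on a tiny grayscale image stored as a list of rows. It is a generic illustration, not the filter the OSCVL actually uses, and it assumes the coordinates fall inside the image.

```python
import math


def sample_nearest(image, x, y):
    """Read the single pixel closest to (x, y): fast, but blocky."""
    return image[int(round(y))][int(round(x))]


def sample_bilinear(image, x, y):
    """Blend the four pixels surrounding (x, y): smoother, but more work."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the neighbours so samples on the last row or column stay in bounds.
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


if __name__ == "__main__":
    image = [[0, 100],
             [100, 200]]
    print("nearest: ", sample_nearest(image, 0.4, 0.4))
    print("bilinear:", sample_bilinear(image, 0.4, 0.4))
```

Nearest-neighbour sampling reads one pixel per output pixel while bilinear interpolation reads four and blends them, which is where the extra processing time goes.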


The critical camera calibration functions of the OSCVL, briefly outlined above, are only a small example of the features and abilities delivered. By building on this library, and with only a small amount of research, just about any developer should be able to add sophisticated CV functions to their applications.

Intel plans to keep adding functions and abilities to the library over time, with several new features currently being tested internally. Contributions are encouraged, so long as they're free from anything that would prevent them from being placed under the OSCVL license. That means you won't find patented algorithms in the library unless they're released appropriately, or the patents expire.

Personally, I think this is yet another example of Intel "getting it." While there's no question that applications built with the library will encourage the purchase of faster processors, because of the BSD-like license, there is no lock-in to Intel processors specifically.

For developers and researchers alike, the benefits are huge. Previously, duplication of effort, or lots of hunting around for code snippets, was required to get the basics in place before any real new work could begin. With processors as fast as they are now, and only getting faster by the quarter, ever more advanced real world applications are becoming possible. Cameras also continue to become more sensitive, smaller, and less expensive.

The OSCVL is an excellent example of what makes open source work: You make the substrate, or infrastructure, free and available for everyone to work on, and then just watch as people do amazing things. The time is ripe for CV applications to become common.

There is no doubt: we live in very interesting times.

Chris Halsall is the Managing Director of Ideas 4 Lease (Barbados). Chris is a specialist... at automating information gathering and presentation systems.


Copyright © 2009 O'Reilly Media, Inc.