Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Optimizing Linux System Performance

by Swayam Prakasha
06/07/2007

Performance optimization in Linux doesn't always mean what we might think. It's not just a matter of outright speed; sometimes it's about tuning the system so that it fits into a small memory footprint. You'd be hard-pressed to find a programmer who does not want to make programs run faster, regardless of the platform. Linux programmers are no exception; some take an almost fanatical approach to the job of optimizing their code for performance. As hardware becomes faster, cheaper, and more plentiful, some argue that performance optimization is less critical, particularly people who try to enforce deadlines on software development. Not so: even today's most advanced hardware, combined with the latest in compiler optimization technology, cannot come close to the performance benefits that can be attained by fixing some small problems in a program, or even by switching to an entirely different and much faster design.

Several ideas can be applied to programs to make them perform better. By keeping these ideas in mind while writing (and revising) the code, we can expect better and faster programs. When we talk about performance, we need to consider several different things. One is the absolute amount of time it takes the software to complete a given task. Consider a web server that serves client requests correctly but pauses for a few seconds every time before it begins to send pages to the client. In such a case, the web server is failing to perform adequately in terms of the total time required to complete the task.

Another thing to consider is the amount of CPU time required by a program. CPU time is a measure of the time spent by the computer's processors executing the code. Many programs spend most of their time waiting for something to happen: input to arrive, output to be written to disk, and so on. While a program waits, the CPU will usually be serving other requests, so the program is not using CPU time. However, some programs are primarily CPU-bound, and for such programs a saving in CPU time may result in a substantial saving in absolute time.

It is important to note here that if your program uses a lot of CPU time, it can slow down all the other processes on your system. CPU time can be further separated into system time and user time. System time is the amount of CPU time used by the kernel on your behalf; it accrues when the program calls functions such as open() and fork(). User time is the amount of CPU time used by your program's own code, for example on string manipulation and arithmetic.

A third thing to consider for performance is the time spent doing I/O. Some programs, such as network servers, spend most of their lives handling I/O, while other programs spend little time on I/O-related tasks. Thus, I/O optimization can be critical for some projects and completely unimportant for others.

Performance optimization basically consists of the following steps:

  1. Define the performance problem.
  2. Identify the bottlenecks and carry out a root cause analysis.
  3. Remove the bottlenecks by appropriate methodologies.
  4. Repeat steps 2 and 3 until we have a satisfactory resolution.

It is important to note here that bottlenecks can occur at various points in a system. Determining the bottlenecks is a step-by-step procedure of narrowing down the root causes. Performance optimization is a relatively complex process that requires correlating many types of information with the source code to locate and analyze performance bottlenecks.

When focusing on performance optimization, a system administrator needs tools to measure and monitor the system as well as to identify the bottlenecks. On Linux, various tools are available to do this. top is a nice little interactive program that provides real-time, system-level information: a snapshot of the system at the moment it is being viewed. mtop is a top-like utility for monitoring MySQL. It shows any slow queries or locks that are active at the time, with the exact SQL that is being executed. The sysstat package, the procfs and sysfs virtual filesystems, and the sysctl command are valuable for Linux performance measurement and tuning. Another tool, sar (part of sysstat), can be used to collect a wealth of information on system activity (such as CPU utilization, memory usage, network and buffer usage, etc.) and to pinpoint a potential bottleneck.

Problems with Loops

Let us first understand the performance problems caused by loops. Loops magnify the effects of otherwise minor performance problems, because the code within the loop is executed many times. Always make sure that code which need not be executed on every pass is moved outside the loop.

Let us consider the following code segment:

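#include <stdio.h>

int main(void)
{
     int five;
     int cnt;
     int end;
     int temp;

     /* Deliberately wasteful: the body repeats work on every pass */
     for (cnt = 0; cnt < 2 * 100000 * 7 / 15 + 6723;
                   cnt += (5 - 2) / 2)
     {
        temp = cnt / 16430;   /* computed but never used */
        end = cnt;            /* only the final value matters */
        five = 5;             /* same value on every pass */
     }
     printf("printing values of five=%d and end = %d\n", five, end);
     return 0;
}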

If we carefully observe the code, we can see that several things can be moved outside the loop. The value of "end" needs to be calculated only once, after the loop is through. The assignment to the variable five is the same on every pass, so it can be done once outside the loop, and the variable temp is never used, so calculating it is dead code that can be removed altogether.

When optimizing for performance, we should avoid floating-point data types such as "float" and "double" unless there is a real need for them. Floating-point operations can take more time than their integer counterparts, particularly on processors without hardware floating-point support. Also, if we have a small function that is called very frequently, it is better to declare it as "inline" so that the compiler can avoid the overhead of a function call.

Another way to improve performance is to increase the block size. As we know, many operations are done on blocks of data. By increasing the block size, we can transfer more data at once, which reduces how often we have to call the more time-consuming functions.

Take Care of Expensive Calls

When we are interested in optimizing the code, we always want to replace expensive operations (those that take more time) with inexpensive ones. System calls in general are expensive operations, and some library functions that are built on them are more expensive still. Let us have a look at one example.

If we come across a piece of code such as system("ls /etc"); we can see how expensive it is. The program first has to fork and execute the shell. The shell needs to do its initialization, and then it forks and executes ls. Definitely not a desirable piece of code.

The first step in getting the system tweaked for both speed and reliability is to chase down the latest versions of the required device drivers. Another useful key is to understand what the bottlenecks are and how they can be taken care of. We can find the bottlenecks by running system monitoring utilities such as the top command.

Optimizing Disk Access

It's always worth giving attention to disk access. There are various techniques that can produce significant improvements in disk performance.

First, read up on the hdparm command, which sets various flags and modes on the IDE disk driver subsystem. There are two options we need to look at: the -c option sets 32-bit I/O support, and the -d option enables or disables the using_dma flag for the drive. In most cases this flag is already set to 1, but if yours is not, you are going to suffer from performance issues. Try changing it by placing a command like this

hdparm -d 1 /dev/hda

at the end of the /etc/rc.d/rc.local file.

Similarly,

hdparm -c 1 /dev/hda

at the end of the /etc/rc.d/rc.local file will enable support for 32-bit I/O.

GNU profiler (gprof)

After we have done what we can to optimize our code by hand, the compiler can help with optimization as well. One tool that we can use to analyze our program's execution is the GNU profiler (gprof). With it, we can find out where the program is spending most of its time. With profile information we can determine which pieces of the program are slower than expected. These sections are good candidates to be rewritten so that the program executes faster. The profiler collects data during the execution of a program. Profiling can also be considered another way to learn the source code.

To profile a program with gprof, the gprof package must be installed on your system, and the program must be compiled with a special option. Assuming that we have a program sample_2007.c, the following command can be used to compile it:

$ gcc -pg -o sample_2007 sample_2007.c

Note here that the -pg option enables the basic profiling support in gcc. The program will run somewhat slower when profiling is enabled, because it needs to spend time collecting data as well. When the profiled program is run and exits normally, the profiling support creates a file named gmon.out in the current directory. This file is later used by gprof to analyze the code.

We can run the following command to get the output (which we have redirected to a file):

$ gprof sample_2007 gmon.out > output.txt

gprof is useful not only for determining how much time is spent in various routines; it also tells you which routines invoke other routines. By using gprof, we can find out which sections of our code are causing the largest delays. Analyzing the program with gprof is an efficient way of determining which function accounts for a large percentage of the overall time spent executing the program.

A Few Things to Know About kprof

Kprof is a graphical tool that displays the execution profiling output generated by the gprof profiler. Kprof is very useful as it displays the information in list or tree view and it makes the information easy to understand.

Kprof has the following features

Swayam Prakasha has been working in information technology for several years, concentrating on areas such as operating systems, networking, network security, electronic commerce, Internet services, LDAP, and Web servers. Swayam has authored a number of articles for trade publications, and he presents his own papers at industry conferences. Currently he works at Unisys Bangalore in the Linux Systems Group.


Copyright © 2009 O'Reilly Media, Inc.