ONLamp.com

Linux Compatibility on BSD for the PPC Platform: Part 4

by Emmanuel Dreyfus
06/21/2001

In part 3, we looked at several bug fixes that helped us get some interesting Linux applications working. However, a few remaining bugs broke even more interesting binaries, such as the Java Development Kit (JDK) or RealPlayer. In this part, we'll focus on the bug fixes needed to get a working Java Virtual Machine (JVM) on PowerPC-based ports of NetBSD. Surprisingly, most of the bugs we encountered were located in machine-independent code, yet they did not cause any known problems on the alpha, i386, and m68k Linux emulations.

A tricky bug fix: The brk() issue

At first, the Java Virtual Machine for Linux seemed like a very interesting binary to run on the PowerPC, but the Linux compatibility layer was not accurate enough to get it working. We tried the Blackdown team's JDK, and the JVM hung as soon as it was invoked. This was caused by the JVM's native thread model, which makes use of Linux Real Time (RT) signals. We had not yet looked at RT signal emulation, and it obviously had bugs. Fortunately, the JVM provides an alternative threading model, Green Threads, enabled by the -green flag.

Green Threads do not make use of RT signals so they are more likely to run under emulation. But a simple test quickly exhibits a new problem:

$ java -green -version
java: ../../../../../src/solaris/hpi/green_threads/src/dl-malloc.c:1636:
malloc_extend_top: Assertion '((size_t)((char*)(((mbinptr)(&(av_[2 *
(0)])))->fd) + top_size) & (pagesz - 1)) == 0' failed.
Abort (core dumped)

Tracing the process with ktrace(1), we can see that this error is triggered just after a series of brk() calls. Unix processes use the brk() system call to increase the size of the heap. Upon a brk() call, the kernel maps new pages of memory at the top of the process's heap, then returns the new heap top address to the process. This address is called the break value.

The brk() call sets the break value to an absolute address. When called with a null value, brk() returns the current break value. There is also an sbrk() library call, which moves the break value by a relative offset. Here, sbrk() is implemented as two brk() calls: the first is made with a null value to get the current break address, and the second sets the break to the current value plus the offset. For more information, see the brk(2) man page.
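The relationship between the two calls can be sketched with a small userland model. This is only an illustration of the semantics described above, not libc or kernel code: the names heap_model, model_brk and model_sbrk are hypothetical, and the model returns the previous break from model_sbrk(), as sbrk(3) is documented to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userland model of a process heap; not kernel code. */
struct heap_model {
    uintptr_t brk;   /* current break value */
    uintptr_t limit; /* highest break value the model accepts */
};

/*
 * Linux-style brk() system call semantics: a null (or unacceptable)
 * request leaves the break untouched, and the current break is
 * returned in every case.
 */
static uintptr_t model_brk(struct heap_model *h, uintptr_t addr)
{
    assert(h != NULL);
    if (addr != 0 && addr <= h->limit)
        h->brk = addr;
    return h->brk;
}

/*
 * sbrk() expressed as two brk() calls: one to read the current break,
 * one to move it by the requested offset. Returns the previous break.
 */
static uintptr_t model_sbrk(struct heap_model *h, intptr_t incr)
{
    uintptr_t cur = model_brk(h, 0);          /* first call: query   */
    (void)model_brk(h, cur + (uintptr_t)incr); /* second call: adjust */
    return cur;
}
```

Starting the model at a break of 0x100109d8 and calling model_sbrk() with an offset of 0x21 leaves the break at 0x100109f9.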

Debugging the brk() problem in the JVM without access to the sources could have been tricky, but thanks to Kevin Hendricks from the Blackdown team, it was possible to work on a test program that reproduced the problem.

The failure was caused by alignment issues: the JVM wants the break value to be aligned on a page boundary. It therefore makes two sbrk() calls: one to allocate some space and get the resulting break value, and a second to adjust the break value so that it ends up on a page boundary. The final value is then tested for page alignment, and a non-aligned value triggers the assertion we saw. When tracing the JVM calls to brk() while running natively on a Linux system, we get this:

brk(0)          = 0x100109d8
brk(0x100109f9) = 0x100109f9
brk(0)          = 0x100109f9
brk(0x20021000) = 0x20021000

Each pair of brk() calls is the result of one sbrk() library call. And here is the result when running the JVM under emulation on NetBSD:

brk(0)          = 0x10011000
brk(0x10011021) = 0x10011021
brk(0)          = 0x10012000
brk(0x12012fdf) = 0x12012fdf

The difference is obvious: NetBSD ended up with a non-aligned break value, and this is what led to the assertion failure. Why did NetBSD get a non-aligned break value? Because NetBSD simply returned the value requested by the calling process, here 0x12012fdf.

In fact, the problem is that between the second and third brk() calls, the break value presented by the NetBSD kernel to the user process has changed. The JVM uses the return value of the second brk() call to compute the offset needed to page-align the break value. Here, with a return value of 0x10011021, the JVM concludes that a 0xfdf adjustment is needed to reach a page-aligned address, and it calls sbrk() with a 0xfdf offset. Unfortunately, the actual break value is now 0x12012000. Adding 0xfdf to it leads us to 0x12012fdf, which is not page aligned.
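The arithmetic in this paragraph can be checked with a short sketch. It assumes a 4 KB (0x1000) page size, which is what the 0xfdf figure implies; the helper names page_align_correction and is_page_aligned are hypothetical and do not come from the JVM sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes to add to addr to reach the next page boundary
 * (0 if addr is already aligned). pagesz must be a power of two.
 */
static uintptr_t page_align_correction(uintptr_t addr, uintptr_t pagesz)
{
    assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
    return (pagesz - (addr & (pagesz - 1))) & (pagesz - 1);
}

/* Non-zero when addr sits exactly on a page boundary. */
static int is_page_aligned(uintptr_t addr, uintptr_t pagesz)
{
    assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
    return (addr & (pagesz - 1)) == 0;
}
```

With these helpers, page_align_correction(0x10011021, 0x1000) yields 0xfdf. Adding that correction to 0x10011021 gives an aligned address, whereas adding it to a break that has already been moved to a page boundary gives an address ending in 0xfdf, which fails the alignment test.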

The explanation for this inconsistency is that NetBSD always sets the break value to a page-aligned address. On a brk() system call with a non-aligned address, it returns the requested value to the user process while the real break value is set to a page-aligned address. This is why the next brk() call reads a different break value. The idea behind this behavior is that you only have to call brk() once to get a page-aligned break value. Linux, on the other hand, can set the break value to a non-page-aligned address, and you need to call brk() at least twice to get a page-aligned break.
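A minimal userland model of the behavior just described, assuming 4 KB pages, reproduces the first three lines of the NetBSD trace. The function names are hypothetical and this is not the NetBSD kernel code, only a sketch of the semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE ((uintptr_t)0x1000) /* assumption: 4 KB pages */

/* Round addr up to the next page boundary. */
static uintptr_t round_up_to_page(uintptr_t addr)
{
    return (addr + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/*
 * Unfixed emulation as described in the text: the real break is
 * rounded up to a page boundary, but the caller gets back the address
 * it asked for. A null request reports the real, rounded break.
 */
static uintptr_t unfixed_emul_brk(uintptr_t *real_break, uintptr_t addr)
{
    assert(real_break != NULL);
    if (addr == 0)
        return *real_break;               /* query: rounded value   */
    *real_break = round_up_to_page(addr); /* set: kernel rounds up  */
    return addr;                          /* ...but echoes request  */
}
```

Starting from a real break of 0x10011000, a request for 0x10011021 returns 0x10011021, and the following query returns 0x10012000. The process therefore sees two different break values without having asked for any change in between.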

The brk() emulation can be fixed by keeping track of each Linux process's idea of its break value. The kernel keeps setting the real break value on a page-aligned address while returning the requested address, which may not be page aligned, to the user process. The fix is to remember this returned value and return it as the break address on the next brk() call.

We just have to find a place to store the process's idea of the break value. It fits nicely in struct linux_emuldata (defined in sys/compat/linux/common/linux_emuldata.h), which is referenced by the p_emuldata member of a Linux process's struct proc (defined in sys/sys/proc.h). The new field we introduce in struct linux_emuldata to keep track of the process's idea of the break value is called p_break.
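The idea of the fix can be sketched in userland as follows. This is a simplified model, not the actual NetBSD change: only the p_break field name comes from the text, while struct emuldata_model, its real_break member, the fixed_emul_brk function and the 4 KB page size are assumptions made for illustration. In particular, the real kernel does not keep the page-aligned break in the emulation data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE ((uintptr_t)0x1000) /* assumption: 4 KB pages */

/* Hypothetical stand-in for the per-process emulation data. */
struct emuldata_model {
    uintptr_t real_break; /* page-aligned break kept by the kernel   */
    uintptr_t p_break;    /* break as the Linux process last saw it  */
};

/*
 * Fixed emulation: the real break is still rounded up to a page
 * boundary, but the requested value is remembered in p_break and
 * reported on later queries, so the process sees a consistent break.
 */
static uintptr_t fixed_emul_brk(struct emuldata_model *ed, uintptr_t addr)
{
    assert(ed != NULL);
    if (addr != 0) {
        ed->real_break = (addr + MODEL_PAGE_SIZE - 1)
            & ~(MODEL_PAGE_SIZE - 1);
        ed->p_break = addr; /* remember what the process asked for */
    }
    return ed->p_break;
}
```

In this model, a request for 0x10011021 followed by a query returns 0x10011021 both times, so an additional 0xfdf adjustment lands on 0x10012000, a page-aligned address.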

With this fix to the way Linux brk() is emulated, we get minimal support for the Java Virtual Machine. A simple program such as "Hello world" now works:

/* Hello.java -- A simple test for the JVM */
public class Hello {
        static public void main (String[] args) {
                System.out.println("Hello");
        }
}
