Published on ONLamp.com (http://www.onlamp.com/)


Linux Compatibility on BSD for the PPC Platform: Part 2

by Emmanuel Dreyfus
05/17/2001

Managing dynamic executables

Previously in this series:

Linux Compatibility on BSD for the PPC Platform -- The Linux compatibility layer allows BSD to run Linux binary applications. Emmanuel Dreyfus explains how he implemented this on NetBSD for the PowerPC platform.

In this article, we'll take a closer look at the problems that prevent dynamic Linux binaries from working in compatibility mode on the NetBSD/PowerPC platform. These include the way arguments are passed to the Linux program and the handling of the ELF auxiliary table.

Passing arguments to the program

The first problem was that Linux's ld.so did not get its command-line arguments. In fact, no program running in Linux emulation, whether statically or dynamically linked, was able to get its arguments. This can be demonstrated by building the following sample program on a Linux box (statically, of course) and trying to run it on the NetBSD box:

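/*
 * arg.c -- An argument printer
 */
#include <stdio.h>
#include <stdlib.h>   /* atoi() */

int main (int argc, char **argv) {
  int i;

  /* Print each argument the kernel handed to us. */
  for (i = 0; i < argc; i++)
    printf ("argv[%d]=%s\n", i, argv[i]);

  /* Return the first argument as the exit status, to test return values. */
  if (argc > 1)
    return atoi (argv[1]);
  return 0;
}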

This program tests argument and return-value passing between the kernel and the emulated executable. When we ran it, we got no output at all: the program received an argc of zero, which demonstrated the problem with passing command-line arguments.

The arguments are passed to the program on the stack. When preparing the program launch, the kernel sets up the stack so that the program will be able to find argc, argv, and envp. To inspect this mechanism a bit deeper, we can use a stack dumper, like the following piece of code:

/*
* sd.c -- A stack dumper
*/
#include <stdio.h>
#include <sys/types.h>
#include <ctype.h>

extern long end;
extern long etext;
extern long edata;
extern char **environ;

void stackdump (long, char **);

int main (int argc, char **argv) {
 long sign = 0x89abcdef;

 printf ("argc=0x%p\n", &argc);
 printf ("argv=0x%p\n", &argv);
 printf ("environ=0x%p\n", &environ);
 stackdump (sign, argv);

 return 0;
}

void stackdump (long arg, char **argv) {
  unsigned long i,j;
  long signature = 0x01234567;
 
  if (0)
    printf ("%lx %lx\n", arg, signature);

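  /* etext, edata and end are linker symbols: their addresses are what matter. */
  printf ("etext=0x%lx\nedata=0x%lx\nend=0x%lx\n",
    (unsigned long)&etext, (unsigned long)&edata, (unsigned long)&end);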
  for (i = (((long)argv-0x400)/16)*16; i <= 0x7fffffff; i=i+16) {
    printf ("%08lx ",i);
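    /* Read bytes as unsigned char so values above 0x7f print as two hex digits. */
    for (j=0; j <= 15; j=j+2) {
      printf ("%02x", (*(unsigned char*)(i+j)));
      printf ("%02x ", (*(unsigned char*)(i+j+1)));
    }
    for (j = 0; j <= 15; j++) {
      if (isprint (*(unsigned char*)(i+j)))
        printf ("%c", *(unsigned char*)(i+j));
      else
        printf (".");
    }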
    printf ("\n");
  }
}

This program also uses global and local variables to help study argument passing. It dumps the stack from an arbitrary address until it reaches the end of the stack and crashes, because the pages after the stack are not accessible in user mode. We do not really care about this crash, because by then the program has displayed what we are looking for. However, it can be a problem if you are working on a terminal that is unable to scroll back and you want to pipe the stack dump's output to more(1) or less(1).

If you want to do this, you will have to modify the program so it catches the SIGSEGV signal. You will also have to ensure that linux_sendsig() in linux_machdep.c does not crash anything. Most likely, you will keep that function empty. The easy solution is certainly to get a terminal that has a scrollback feature.
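One way to stop the dump cleanly instead of crashing is sketched below. It is only an illustration, not code from the NetBSD tree: try_read_byte() and on_fault() are hypothetical helper names, and the sketch assumes a POSIX system where sigaction(), sigsetjmp(), and siglongjmp() behave normally, which under emulation in turn requires working signal delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* Signal handler: abandon the faulting read and jump back to the probe. */
static void on_fault (int sig) {
  (void)sig;
  siglongjmp (probe_env, 1);
}

/*
 * Try to read one byte at addr.
 * Returns 1 and stores the byte in *out if the address is readable,
 * or 0 if the read raised SIGSEGV or SIGBUS.
 */
int try_read_byte (const volatile unsigned char *addr, unsigned char *out) {
  struct sigaction sa, old_segv, old_bus;
  volatile int ok = 0;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_fault;
  sigemptyset (&sa.sa_mask);
  sigaction (SIGSEGV, &sa, &old_segv);
  sigaction (SIGBUS, &sa, &old_bus);

  /* The second argument asks sigsetjmp() to save the signal mask too. */
  if (sigsetjmp (probe_env, 1) == 0) {
    *out = *addr;   /* may fault */
    ok = 1;
  }

  /* Put the previous handlers back before returning. */
  sigaction (SIGSEGV, &old_segv, NULL);
  sigaction (SIGBUS, &old_bus, NULL);
  return ok;
}
```

In sd.c, the dump loop could call try_read_byte() for each address and leave the loop the first time it returns 0.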

By dumping the stack, you can see the parameters you gave to the program and its environment. The stack dumper also gives you the address of argc, which is the place where the program stores argc on the stack. In fact, the program copied that value from a higher address on the stack before entering main(). If we do not get the appropriate value for argc, we must find out where the program gets its argc, and fix the way the NetBSD kernel sets up the stack so that argc gets written where the emulated binary expects it.

Note: This is a stack dump with the desired stack layout, not the original one.

argc=0x7fffe8a8
argv=0x7fffe8ac

<snip>
7fffe8a0 7fff e8c0 0180 0744 0000 0001 7fff e904 ................
7fffe8b0 7fff e8b0 0000 0006 0184 0000 0184 0000 ................
7fffe8c0 7fff e8e0 0180 05cc 0000 0000 0000 0000 ................
7fffe8d0 7fff e8e0 4186 65e0 7fff e9e0 4186 5d60 ....A.e.....A.]'
7fffe8e0 7fff e8f0 4188 9580 7fff e9e0 4186 5d60 ....A.......A.]'
7fffe8f0 0000 0000 0000 0000 0000 0000 0000 0000 ................
7fffe900 0000 0001 7fff eab0 0000 0000 7fff eab5 ................

Next to this copied argc, here at 0x7fffe8a8, stands a pointer to **argv, at 0x7fffe8ac. This is more interesting because, looking at the address it points to, 0x7fffe904, we can find the **argv pointer that was set up by the kernel. Next to it, at 0x7fffe900, we have the argc value set up by the kernel. In this example, everything is fine, but if the kernel does not set up argc at the place the executable expects it, searching around the address referenced by the pointer to **argv (here at 0x7fffe8ac) is a good option.

When searching for the argc value set up by the kernel, the idea is to look for an integer value (4 bytes on the PowerPC) equal to the actual number of arguments given to the program (the program name itself being the first argument, so that number is at least 1). Next to argc we have **argv, which points to the *argv array. Each element of this array is a pointer to a null-terminated argument string, so it is easy to identify.
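The search described above can be expressed as a small check. The sketch below assumes the conventional System V layout that the dump shows at 0x7fffe900 (the argc word, then the argv pointers, a null word, then the envp pointers). The names looks_like_argc() and parse_initial_stack() are hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct initial_stack {
  long argc;              /* number of arguments */
  const uintptr_t *argv;  /* argc string pointers, then a null word */
  const uintptr_t *envp;  /* environment string pointers, then a null word */
};

/* Heuristic: does sp look like the argc word written by the kernel? */
int looks_like_argc (const uintptr_t *sp, long expected_argc) {
  long i;

  if (expected_argc < 1 || (long)sp[0] != expected_argc)
    return 0;
  for (i = 1; i <= expected_argc; i++)
    if (sp[i] == 0)                    /* every argv[] slot must be non-null */
      return 0;
  return sp[expected_argc + 1] == 0;   /* argv[] ends with a null word */
}

/* Split the block starting at the argc word into its three parts. */
struct initial_stack parse_initial_stack (const uintptr_t *sp) {
  struct initial_stack s;

  s.argc = (long)sp[0];
  s.argv = &sp[1];
  s.envp = &sp[1 + s.argc + 1];
  return s;
}
```

A stack dumper could slide looks_like_argc() over each word of the dumped region and report the addresses where it returns 1.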

We can figure out what the problem is by running the stack dumper with various arguments. On the PowerPC, the problem was that argc had to be set up on a 16-byte boundary. There was also a special trick: if argc was already going to land on a 16-byte boundary, the emulated binary expected it 16 bytes lower on the stack.

To fix this problem, and get arguments passed to the program, we need to modify the stack pointer before writing argc, **argv and **envp on the stack. Setting up the stack is normally done by the copyargs() function, which lives in sys/kern/kern_exec.c. But it is possible to supply a customized copyargs() function by filling the appropriate field of COMPAT_LINUX's struct execsw. This is done in sys/kern/exec_conf.c, using the linux_copyargs_function macro. That macro should be defined in sys/compat/linux/arch/powerpc/linux_exec.h.

Thus, by modifying this macro, we can use a customized copyargs() function. The Alpha port of COMPAT_LINUX already did this: its customized function is linux_copyargs(), in sys/compat/linux/arch/alpha/linux_exec_alpha.c. The file cannot simply be called linux_exec.c, because there is already a linux_exec.c in sys/compat/linux/common, and when you build the kernel all object files end up in the same build directory. Two source files with the same name would make the second object file overwrite the first, leading to a link error. The Alpha file was intended to be architecture-independent, so we reuse the Alpha version with some PowerPC add-ons. The result is sys/compat/linux/arch/powerpc/linux_exec_powerpc.c, whose code is common to the Alpha and PowerPC platforms. It should later be moved to the architecture-independent sys/compat/linux/common/linux_exec.c file.

linux_copyargs() first calls the standard copyargs() function to set up the argv and envp arrays. It leaves a gap of linux_elf_aux_argsize bytes for the ELF auxiliary table (we will take a look at this later), and then it writes argc and the **argv and **envp pointers. The PowerPC-specific alignment is done by this code section:

#ifdef LINUX_SHIFT
/*
 * Seems that PowerPC Linux binaries expect argc to start on a
 * 16-byte-aligned address. And we need one more 16-byte shift
 * if it was already 16-byte aligned.
 */
   stack = (void *)(((unsigned long)stack - 1) & ~LINUX_SHIFT);
#endif

LINUX_SHIFT is a macro, defined as 0x0000000fUL in sys/compat/linux/arch/powerpc/linux_exec.h. The #ifdef test keeps the Alpha version from applying this PowerPC-specific fix, which would break NetBSD/Alpha Linux emulation, so the file remains architecture-independent.
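To see why subtracting 1 before masking produces the extra 16-byte shift, the arithmetic can be tried in isolation. This is a minimal user-space sketch, not kernel code: align_for_argc() is a hypothetical name, and only the LINUX_SHIFT value comes from the article.

```c
#include <stdint.h>

#define LINUX_SHIFT 0x0000000fUL

/*
 * Round sp down to a 16-byte boundary. Because 1 is subtracted first,
 * an address that is already aligned moves down by a full 16 bytes.
 */
uintptr_t align_for_argc (uintptr_t sp) {
  return (sp - 1) & ~(uintptr_t)LINUX_SHIFT;
}
```

For example, align_for_argc(0x7fffe9c4) yields 0x7fffe9c0, while the already aligned 0x7fffe9c0 yields 0x7fffe9b0.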

With this fix, statically linked executables get their arguments. However, a dynamically linked program still fails, because ld.so does not find the ELF auxiliary table.

The ELF auxiliary table

The ELF dynamic linker (ld.so) needs more information than just argc, **argv, and **envp to actually link a program. It must be able to locate the ELF section that lists the shared libraries the program needs. This kind of information is passed to ld.so through the ELF auxiliary table, which the kernel sets up on the stack. The table contains a few entries, each made of two fields: a type and a value. The details of each field are specified in the System V Release 4 PowerPC ABI.
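The type/value structure of the table can be pictured with a short sketch. The struct and function names below are hypothetical. The numeric codes mirror the standard ELF values AT_NULL (0), AT_PHDR (3), and AT_ENTRY (9), and the sketch assumes both fields are one machine word wide.

```c
#include <stddef.h>

/* Type codes, mirroring the standard AT_NULL, AT_PHDR and AT_ENTRY values. */
enum {
  AUX_NULL  = 0,   /* marks the end of the table */
  AUX_PHDR  = 3,   /* address of the program headers */
  AUX_ENTRY = 9    /* entry point of the program */
};

/* One entry of the auxiliary table: a type and its value. */
struct aux_entry {
  unsigned long type;
  unsigned long value;
};

/* Walk the table until the terminating entry; return the match or NULL. */
const struct aux_entry *aux_find (const struct aux_entry *table,
                                  unsigned long type) {
  for (; table->type != AUX_NULL; table++)
    if (table->type == type)
      return table;
  return NULL;
}
```

A dynamic linker does essentially this kind of lookup when it needs, for instance, the program entry point.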

By looking at the create_elf_tables() function in the Linux kernel source file linux/fs/binfmt_elf.c, we can learn how the table should be laid out so that Linux's ld.so works. The job is nearly the same on the PowerPC and Alpha platforms, so we can use the NetBSD/Alpha version again. The PowerPC platform just has a special trick: the ELF auxiliary table must also be aligned on a 16-byte boundary. This is a bit difficult to understand from the Linux kernel sources, but there are comments about it in linux/fs/binfmt_elf.c, and also in the shove_aux_table() function, which is in linux/arch/ppc/kernel/process.c.

We therefore have to add another LINUX_SHIFT conditional before writing the ELF auxiliary table:

#ifdef LINUX_SHIFT
/*
 * From Linux's arch/ppc/kernel/process.c:shove_aux_table().
 * GNU ld.so expects the ELF auxiliary table to start on a
 * 16-byte boundary on the PowerPC.
 */
   stack = (void *)(((unsigned long)stack + LINUX_SHIFT) & ~LINUX_SHIFT);
#endif
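This second adjustment rounds up rather than down, and it leaves an already aligned address untouched. A minimal user-space sketch of the arithmetic follows; align_for_aux_table() is a hypothetical name, and only the LINUX_SHIFT value comes from the article.

```c
#include <stdint.h>

#define LINUX_SHIFT 0x0000000fUL

/* Round sp up to the next 16-byte boundary; aligned values are unchanged. */
uintptr_t align_for_aux_table (uintptr_t sp) {
  return (sp + LINUX_SHIFT) & ~(uintptr_t)LINUX_SHIFT;
}
```

For example, align_for_aux_table(0x7fffe9c1) yields 0x7fffe9d0, while 0x7fffe9c0 stays at 0x7fffe9c0.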

Finding out where ld.so really expects the table was fairly difficult: When dynamic linking does not work, it is impossible to even output a string from the program, so stack-dumping a dynamically linked program is not an option. I had to blindly try a few different alignments and test the result before I managed to get it to work.

When the ELF auxiliary table is correctly set up on the stack, dynamically linked Linux binaries should link and run. Using GNU ld-1.7.0.so, everything was fine: ld.so got its arguments, and the program was able to run (and then crash, but this was caused by another bug that we will study in the next section). However, when upgrading to GNU ld-2.1.3.so, we discovered a new problem: dynamically linked executables no longer got their arguments. This problem will be studied in a later section. In the next section, we focus on the other bug, the one that crashed Linux binaries dynamically linked with GNU ld-1.7.0.so.

A mmap() fix

At this point, it is obvious that ld.so was successfully launched: The kernel trace did show attempts to open() and mmap() files such as /emul/lib/libc.so.6. But the mmap() call failed.

mmap() is used to map physical memory and files into a process's virtual address space. It is widely used when linking shared libraries, because the library code doesn't have to be copied into each process's memory. Instead, mmap() maps the shared library file from the disk into the virtual address space of several user processes. When a process uses the library, it is loaded into physical memory by the virtual memory subsystem, but it will never be loaded twice, because other processes share the library through their virtual memory mappings. The library is loaded once and used several times. If you need more information about the mmap() system call, take a look at the mmap(2) man page.

To debug this kind of problem, it is useful to write a small test program that exercises the faulty system call. Here is a simple mmap() tester:

/*
 * mmap.c -- mmap() tester
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

int main (int argc, char **argv) {
  int fd;
  char* ptr;

  fd = open ("/etc/passwd", O_RDONLY, 0);
  if (fd < 0) {
    printf ("open failed\n");
    exit(-1);
  }

  ptr = mmap (NULL, 512, PROT_READ, MAP_PRIVATE|MAP_FILE, fd, 0);

  /* mmap() reports failure with MAP_FAILED, not NULL. */
  if (ptr == MAP_FAILED) {
    perror ("mmap failed");
    exit(-1);
  }

  printf ("%c-%c-%c-%c\n", ptr[0], ptr[1], ptr[2], ptr[3]);

  return 0;
}

Using this program, it is clear that the problem is caused by our mmap() emulation and nothing else. After some investigation, we found that the culprit was the size of the offset argument to mmap(). This argument is 32 bits long on a PowerPC Linux system, but 64 bits long on a PowerPC NetBSD system. As a result, when a Linux executable made an mmap() system call, NetBSD took as its offset the actual argument given by the Linux executable plus the next 32 bits of data on the stack.

Adding a wrapper function that correctly handles the offset argument and transfers control to linux_sys_mmap() fixes the problem. This wrapper function is defined in sys/compat/linux/arch/powerpc/linux_mmap_powerpc.c. Obviously, this is not very clean design, and it would be better to define a linux_off_t in architecture-dependent linux_mmap.h files, and then use them in the architecture-independent linux_sys_mmap() function.
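The effect of the size mismatch can be reproduced with plain integer arithmetic. The sketch below is not the kernel wrapper itself. The function names are hypothetical, and it assumes the 32-bit offset and the following 32-bit word are read together as one big-endian 64-bit quantity, and that the offset is zero-extended when widened.

```c
#include <stdint.h>

/*
 * What a big-endian kernel sees if it reads a 64-bit offset where the
 * caller only stored a 32-bit one: the real offset lands in the high
 * half and the unrelated next word fills the low half.
 */
uint64_t misread_offset (uint32_t linux_offset, uint32_t next_word) {
  return ((uint64_t)linux_offset << 32) | next_word;
}

/* What the wrapper must do instead: widen the 32-bit offset by itself. */
uint64_t widen_offset (uint32_t linux_offset) {
  return (uint64_t)linux_offset;
}
```

With an offset of 0x1000 followed by an unrelated word 0xdeadbeef, misread_offset() gives 0x00001000deadbeef, whereas widen_offset() gives the intended 0x1000.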

After this mmap() fix, we are able to run dynamically linked programs such as the stack dumper or the argument printer. Everything is fine with ld-1.7.0.so (which is available with Linux's glibc-1), but upgrading to ld-2.1.3.so, which comes with glibc-2, breaks argument passing for dynamic executables.

Back to argument passing

When using ld-2.1.3.so, the argument-passing problem was a bit weird: ld.so was able to link the program, and this meant that it was able to find the program's arguments (if ld.so does not get the arguments, it complains by displaying an error message). That suggested the stack layout for arguments was good. But on the other hand, the program itself wasn't able to retrieve its arguments anymore: When running the argument printer, the program displayed a null **argv. This suggested the stack layout for the arguments was bad.

Running the stack dumper, it was obvious that the program expected its arguments 16 bytes lower than the place they actually were. Modifying the stack layout or the stack pointer did not fix the problem, because if the arguments were set up where the program expected them, then ld.so did not find them, and it was not able to link the program.

In fact, the problem is that ld.so and the executable expected the arguments to be in two different places. Duplicating the arguments was therefore a possible workaround. With such a duplication, here is the stack layout the kernel produced before transferring control to ld.so:

7fffe9b0 0000 0001 7fff eab0 0000 0000 7fff eab5 ................
7fffe9c0 0000 0001 7fff eab0 0000 0000 7fff eab5 ................

You can recognize on each line argc (here 0000 0001), the **argv pointer, a null pointer, and the **envp pointer. When the kernel transferred control to ld.so, the stack pointer was at 0x7fffe9c0. Ld.so was able to find its arguments at 0x7fffe9c0, and the idea was that the program would find its arguments 16 bytes lower, at 0x7fffe9b0.

Unfortunately, this did not work, because ld.so makes use of the stack. It used the space between 0x7fffe9b0 and 0x7fffe9bf, and when it transferred control to the program, the stack layout looked like this:

7fffe9b0 0000 0000 0000 0000 0000 0000 0000 0000 ................
7fffe9c0 0000 0001 7fff eab0 0000 0000 7fff eab5 ................

And again, the program was not able to find its arguments, because the place where it expected them had been overwritten by ld.so.

A good solution here would be to understand why ld.so gives a stack pointer that is 16 bytes too low to the program. It was not possible to achieve this, so I had to hack a bad solution. The idea here is that ld.so gives the program a stack pointer which is 16 bytes too low. So if we can regain control after ld.so has done its job, and before the program is actually started, we can adjust the stack pointer so that the program can find its arguments.

The problem is how to get control between ld.so and the program. Because ld.so does not return to kernel mode before launching the program, we have to fool ld.so into thinking it is launching the program, whereas it is actually running our code.

This can be done through the entry in the ELF auxiliary table that describes where the program entry point is. ld.so uses that entry to launch the program. We can modify this entry so that ld.so will transfer control to a small piece of code that we uploaded onto the process stack. This code adjusts the stack pointer and then jumps to the real program entry point. This approach is an ugly hack, but at least it worked. Here is the stack pointer adjustment code (thanks to Wolfgang Solfrank for helping me write it):

#include <machine/asm.h>
#define LINUX_SP_WRAP_OFFSET 0x10

.globl _C_LABEL(linux_sp_wrap_start)
.globl _C_LABEL(linux_sp_wrap_end)
.globl _C_LABEL(linux_sp_wrap_entry)
_C_LABEL(linux_sp_wrap_start):
  addi  1,1,LINUX_SP_WRAP_OFFSET
  mflr  12
  bl  1f
1:
  mflr  11
  mtlr  12
  lwz  12, _C_LABEL(linux_sp_wrap_entry)-1b(11)
  mtctr  12
  bctr
_C_LABEL(linux_sp_wrap_entry):
.long  0  /* original program entry point, set up by the kernel */
_C_LABEL(linux_sp_wrap_end):

Its use is triggered by the LINUX_SP_WRAP macro, which is defined in PowerPC-specific linux_exec.h, just like the LINUX_SHIFT macro. The kernel just copies this code from kernel space to the user stack, sets up the program entry point at the linux_sp_wrap_entry location, and sets the entry point in the ELF auxiliary table to the location on the stack where the code was just uploaded.
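The three steps the kernel performs can be modeled in user space as follows. This is a sketch under assumptions, not the NetBSD code: install_sp_wrap() is a hypothetical name, memcpy() stands in for whatever the kernel uses to copy data out to the user stack, and the entry point is assumed to be a 32-bit word stored in native byte order.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy the wrapper code [wrap_start, wrap_end) to dst, store the real
 * program entry point in the slot at wrap_entry, and return the address
 * that should replace the entry point in the ELF auxiliary table.
 */
unsigned char *install_sp_wrap (unsigned char *dst,
                                const unsigned char *wrap_start,
                                const unsigned char *wrap_entry,
                                const unsigned char *wrap_end,
                                uint32_t real_entry) {
  size_t len  = (size_t)(wrap_end - wrap_start);
  size_t slot = (size_t)(wrap_entry - wrap_start);

  memcpy (dst, wrap_start, len);                       /* upload the code */
  memcpy (dst + slot, &real_entry, sizeof real_entry); /* fill in the slot */
  return dst;                                          /* new entry point */
}
```

The returned address is what ld.so would then jump to, believing it to be the program entry point.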

We can have a closer look at what the assembly instructions actually do. First, we adjust the stack pointer, which is GPR1, by adding 16 to it. This is done by the addi 1,1,LINUX_SP_WRAP_OFFSET instruction.

Then we load into GPR12 the value at the linux_sp_wrap_entry location. To do this, we have to tamper with the Link Register, so it is saved before that operation and restored afterward. This is done with the mflr 12 instruction, which saves the Link Register to GPR12, and the mtlr 12 instruction, which restores the Link Register from the value contained in GPR12.

The next goal is to get the value at the linux_sp_wrap_entry address in GPR12. By the bl 1f instruction (the f stands for the next label 1), we branch to label 1, and we save the Program Counter into the Link Register. mflr 11 copies the value contained in the Link Register into GPR11. We now have the address of label 1 in GPR11.

The difficult part is the lwz 12, _C_LABEL(linux_sp_wrap_entry)-1b(11) instruction. It adds the difference between the address of linux_sp_wrap_entry and the address of label 1 (the 1b stands for the previous label 1) to GPR11, and loads the word located at the resulting address into GPR12. We end up with the word stored at linux_sp_wrap_entry, that is, the original program entry point, in GPR12.

We copy the value of GPR12 to the CTR register, using the mtctr 12 instruction. Then we can use the bctr instruction, which branches to the address contained in CTR.

This may look a bit complicated, but this is caused by two problems we need to address: First, we want the code to be able to be relocated (hence the use of the Link Register), and second, we want to do a long branch to the program entry. We must use the CTR to do this long branch.
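The relocation argument can be checked with a small C model of the wrapper. It is only an illustration: struct cpu_state and run_sp_wrap() are hypothetical, and the offsets used in the usage note (label 1 at byte 12, the entry slot at byte 32) assume that each of the eight instructions before the slot occupies 4 bytes.

```c
#include <stdint.h>
#include <string.h>

#define LINUX_SP_WRAP_OFFSET 0x10

/* The two registers the model cares about. */
struct cpu_state {
  uintptr_t sp;   /* GPR1, the stack pointer */
  uintptr_t pc;   /* where execution continues after bctr */
};

/*
 * Model the wrapper running from the copy located at code.
 * label1_off and entry_off are the byte offsets of label 1 and of the
 * linux_sp_wrap_entry slot from the start of the wrapper.
 */
void run_sp_wrap (struct cpu_state *cpu, const unsigned char *code,
                  size_t label1_off, size_t entry_off) {
  uintptr_t r11;
  uint32_t r12;

  cpu->sp += LINUX_SP_WRAP_OFFSET;        /* addi 1,1,0x10 */
  r11 = (uintptr_t)(code + label1_off);   /* bl 1f ; mflr 11 */
  /* lwz 12, entry-1b(11): only the difference of two offsets is used. */
  memcpy (&r12, (const void *)(r11 + (entry_off - label1_off)), sizeof r12);
  cpu->pc = r12;                          /* mtctr 12 ; bctr */
}
```

Calling run_sp_wrap() with offsets 12 and 32 on two copies of the same 36-byte image placed at different addresses gives the same pc in both cases, because the load address is computed relative to wherever label 1 happens to be.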

This hack was rather inelegant, but it fixed the problem. Using this method, it was possible to get arguments in programs linked with ld-2.1.3.so. What is surprising is that it did not break linking with ld-1.7.0.so.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.



Copyright © 2009 O'Reilly Media, Inc.