ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


IRIX Binary Compatibility, Part 4

by Emmanuel Dreyfus
10/10/2002

Native Implementation of Signals

Signals are the difficult part of IRIX emulation. However, before examining the way they work on IRIX, let us study the signals implementation in NetBSD/mips.

A user process enters the kernel by a trap. When a trap is caught, the hardware transfers control to the kernel. Assembly code in sys/arch/mips/mips/locore.S builds a trap frame (this is a struct frame, defined in sys/arch/mips/include/proc.h) on the kernel stack, in which CPU registers are saved. Then the trap() function from sys/arch/mips/mips/trap.c is called to handle the trap.

When resuming the process execution, the kernel just restores the CPU registers from the trap frame. This restores the program counter, stack pointer, and so on.

When the kernel invokes a signal handler, it has to trick the user process so that on return to userland, it executes the signal handler instead of resuming normal execution where it was stopped when the trap occurred.

This is done by modifying the trap frame. The saved program counter is modified so that on return to userland the signal handler is run. Registers A0, A1 and A2 are set to the signal handler's arguments. This is done for native processes in sys/arch/mips/mips/mips_machdep.c:sendsig().
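As a rough illustration of this trick, here is a minimal user-space sketch in C. The names struct trapframe_sketch and redirect_to_handler() are hypothetical, and the structure is far simpler than the real struct frame; only the registers discussed above are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced trap frame: only the registers discussed here. */
struct trapframe_sketch {
    uint32_t a0, a1, a2;   /* first three argument registers */
    uint32_t sp;           /* stack pointer */
    uint32_t ra;           /* return address */
    uint32_t pc;           /* program counter restored on return to userland */
};

/*
 * Rewrite the saved registers so that the next return to userland enters
 * the handler as if it had been called with three arguments.
 */
void
redirect_to_handler(struct trapframe_sketch *tf, uint32_t handler,
    uint32_t arg0, uint32_t arg1, uint32_t arg2)
{
    assert(tf != NULL);
    tf->a0 = arg0;      /* signal number */
    tf->a1 = arg1;      /* second handler argument */
    tf->a2 = arg2;      /* address of the saved context */
    tf->pc = handler;   /* userland resumes in the handler */
}
```

Nothing else in the frame is touched, so the stack pointer the process sees is still the one saved when the trap occurred.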

The difficult task is resuming the process normally once the signal handler is executed. To achieve this, we must save the machine state from the trap frame somewhere, and restore it correctly later. This is done by copying the machine state to a struct sigcontext (defined in sys/arch/mips/include/signal.h) on the process' user stack. The struct sigcontext address is handed to the signal handler through its third argument, as documented in sigaction(2).
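The saving step can be sketched in user space as follows. This is only an illustration under stated assumptions: the user stack is simulated by a byte array, addresses are offsets into it, and struct regs_sketch, struct sigcontext_sketch and push_sigcontext() are hypothetical names, not the kernel's. The 8-byte alignment is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced machine state: the real trap frame holds every register. */
struct regs_sketch {
    uint32_t pc, sp, ra, a0, a1, a2;
};

/* Hypothetical stand-in for struct sigcontext. */
struct sigcontext_sketch {
    struct regs_sketch sc_regs;
};

/*
 * Copy the saved registers into a sigcontext placed just below the
 * interrupted stack pointer of a simulated user stack. Returns the
 * offset of the sigcontext, which would become the handler's third
 * argument.
 */
size_t
push_sigcontext(uint8_t *stack, size_t sp, const struct regs_sketch *regs)
{
    struct sigcontext_sketch sc;
    size_t scp;

    assert(stack != NULL && regs != NULL);
    assert(sp >= sizeof(sc));

    sc.sc_regs = *regs;
    /* Leave room for the structure, then align down to 8 bytes. */
    scp = (sp - sizeof(sc)) & ~(size_t)7;
    /* A kernel would use copyout() here instead of memcpy(). */
    memcpy(stack + scp, &sc, sizeof(sc));
    return scp;
}
```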

Another requirement is to get control back after the signal handler is executed. On MIPS processors, there is an RA register, which holds the Return Address. Function returns are implemented in assembly by a jump to RA:

 j      $ra

The RA saved in the trap frame is used to control where the signal handler will return. We want to return to the kernel in order to undo the trap frame modification so that the process will be able to resume normally the next time it returns to userland. Unfortunately, it is not possible to just jump to kernel code from a user program: the user process must do a trap to return to the kernel.

This is done through a dedicated system call: sigreturn(2). The RA is set to return to a small piece of code known as the signal trampoline, which the kernel copies onto the user stack. This works because the signal handler returns to code in user space. The signal trampoline then calls sigreturn(2).

Here is the signal trampoline for NetBSD/mips, as defined in sys/arch/mips/mips/locore.S:

 addu    a0, sp, 16              # address of sigcontext
 li      v0, SYS___sigreturn14   # sigreturn(scp)
 syscall

If the signal handler did not screw up the stack pointer, the signal context structure will be 16 bytes above the stack pointer. The signal trampoline sets up A0 with the address of the struct sigcontext, which will be the first argument to sigreturn(2).

It is important to hand the struct sigcontext to sigreturn(2), because it needs it in order to restore the process trap frame. This is done in sys/arch/mips/mips/mips_machdep.c:sigreturn().

sigreturn(2) is a system call that does not return: once the trap frame is restored and we return to user space, execution will resume after the trap that occurred before the signal handler execution. The context of the signal handler does not exist anymore.
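The return path can be sketched the same way. In this hedged user-space model, the user stack is again a byte array with offsets standing in for addresses, and trampoline_scp() and sigreturn_sketch() are hypothetical names. The constant 16 comes from the addu instruction in the trampoline; everything else is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIGCONTEXT_OFFSET 16    /* from "addu a0, sp, 16" in the trampoline */

/* Hypothetical, reduced register set and sigcontext. */
struct regs_sketch {
    uint32_t pc, sp, ra;
};

struct sigcontext_sketch {
    struct regs_sketch sc_regs;
};

/* What the trampoline computes into A0 before issuing the system call. */
size_t
trampoline_scp(size_t handler_sp)
{
    return handler_sp + SIGCONTEXT_OFFSET;
}

/*
 * Restore the trap frame from the sigcontext found at offset scp of the
 * simulated user stack. Returns 0 on success, -1 if the structure does
 * not fit inside the stack (a kernel would fail its copyin() instead).
 */
int
sigreturn_sketch(struct regs_sketch *tf, const uint8_t *stack,
    size_t stack_size, size_t scp)
{
    struct sigcontext_sketch sc;

    assert(tf != NULL && stack != NULL);
    if (scp > stack_size || stack_size - scp < sizeof(sc))
        return -1;
    memcpy(&sc, stack + scp, sizeof(sc));
    *tf = sc.sc_regs;   /* the handler's own context is simply discarded */
    return 0;
}
```

After a successful call, the frame holds the values saved before the handler ran, so the next return to userland resumes the interrupted code.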

IRIX Implementation of Signal Delivery

On IRIX, things are much more complicated. Once again, doing things incrementally is a good solution, allowing different problems to be addressed separately.

Hence, implementing an irix_sendsig() and an irix_sigreturn() as plain copies of the native sendsig and sigreturn is a good first step. The only original item we need is an IRIX signal trampoline. It can be made from the native signal trampoline, since the only thing that needs to change is the system call number for irix_sigreturn:

 addu      $a0, $sp, 16             # address of sigcontext
 li        $v0, IRIX_SYS_sigreturn + SYSCALL_SHIFT
 syscall                            # irix_sys_sigreturn(scp)
 break     0                        # just in case sigreturn fails

Here we are storing into A0 the address of the sigcontext, which is 16 bytes above the stack pointer. A0 is used to store the first argument to the irix_sys_sigreturn system call.

This is really basic, and we are far from emulating what really happens on IRIX, but, in fact, it will even "just work" for a lot of programs where the signal handler does not use its arguments.

Let us review the arguments of the signal handler on IRIX. According to the IRIX sigaction(2) man page, we have two situations.

If the SA_SIGINFO flag was not set on sigaction(2) call, then the signal handler is invoked with:

If SA_SIGINFO was set:

If the signal handler attempts to use its arguments, we must accurately emulate them. When SA_SIGINFO is not set, things are simple because this is exactly what the native sendsig() sets up. However, there is a problem when SA_SIGINFO is set.

It is easy to modify irix_sendsig() so that it saves the context in a struct irix_ucontext instead of a struct irix_sigcontext when SA_SIGINFO is set. The problem lies in irix_sys_sigreturn(): how can it tell whether the context is to be restored from a struct irix_sigcontext or from a struct irix_ucontext? Assuming the wrong structure will restore a corrupted context, and this means a crash for the process, because execution will resume wherever the wrongly restored PC register points.

We have the beginning of an answer by looking at IRIX's sigreturn man page.

IRIX's sigreturn expects three arguments:

Once again, Linux has already implemented this, so it is tempting to pick up good ideas from Linux. Looking at the Linux source for sigreturn is not very instructive: when the first argument is NULL, Linux uses the mysterious ucp pointer to find the process context. Because we have no real idea about how to handle this ucp pointer and the SA_SIGINFO flag, we will just leave them behind for now. The easy thing to do is to modify irix_sendsig() and irix_sigreturn() to make use of struct irix_sigcontext instead of the native struct sigcontext. At least this will make programs that do not set SA_SIGINFO happy with the content of the signal context if they happen to access it.

The next problem where our emulation may not be accurate enough is probably the stack layout and CPU registers on signal handler invocation. If the IRIX process makes any assumption about them, we can fail. The method for checking stack and CPU register contents on signal handler invocation is simple: it is exactly the same one we used on program startup.

We will use it on the following test program:

/* signal11.c -- a simple sigaction(2) tester */
#include <stdio.h>
#include <strings.h>    /* bzero() */
#include <unistd.h>
#include <signal.h>

void func (int, siginfo_t *, void *);

int main (int argc, char** argv) {
        struct sigaction sa;
        sigset_t ss;

        bzero(&ss, sizeof(ss));         /* empty signal mask */

        sa.sa_handler = NULL;
        sa.sa_mask = ss;
        sa.sa_flags = 0;                /* SA_SIGINFO is not set here */
        sa.sa_sigaction = func;

        if (sigaction(SIGHUP, &sa, NULL))
                printf("sigaction() failed\n");
        kill(getpid(), SIGHUP);         /* SIGHUP is signal 1 */
        return 0;
}

/* The handler only reports that it ran; it does not use its arguments */
void func (int sig, siginfo_t *si, void *v) {
        printf("signal handler\n");
}

Then, using gdb on IRIX, we can break at the beginning of the signal handler:

$ gdb ./signal11
Breakpoint 1 at 0x400e38: file signal11.c, line 27.

(gdb) r
Starting program: /tmp/signal11

Program received signal SIGHUP, Hangup.
0xfa455a4 in _kill () at kill.s:15
15      kill.s: No such file or directory.
(gdb) c
Continuing.

Breakpoint 1, func () at signal11.c:27
27              printf("signal handler");
(gdb) info registers
          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 fffffffe 00000000 00023dd5 00000001 00000000 7fff2b60 00400e28
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   0fb4f9b0 00000000 0fb4e931 00000042 00000001 0000001d 0faee284 ffffffff
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  00023f14 7fff2f74 7fff2f7c 7fff2fc8 00000000 00000000 00000000 00000000
            t8       t9       k0       k1       gp       sp       fp       ra
 R24  00400e10 00400e28 00000000 00000014 1000c040 7fff2b10 00000000 0faee2e8
            pc    cause      bad       hi       lo      fsr      fir
      00400e38 00000024 00000000 0000005c 0000015b 00000000 00000000
(gdb) x/32w $sp
0x7fff2b10:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2b20:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2b30:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2b40:     0x00000000      0x00000000      0x00000000      0x0fb4f9b0
0x7fff2b50:     0x00000000      0x00000001      0x7fff2b60      0x7fff2b60
0x7fff2b60:     0xffffffff      0x00000000      0x00000000      0x0fa455a4
0x7fff2b70:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2b80:     0x00000000      0x00000000      0x00000000      0x00023dd5

In the A0, A1 and A2 registers we have the arguments to the signal handler. A2 points to the struct sigcontext, and we can therefore start to figure out how the stack is set up for signal delivery.

It is interesting to discover where the different pointers go. For example, at 0x7fff2b4c, we have a pointer to 0x0fb4f9b0. gdb can tell us that it points to errno:

(gdb) x/1w 0x0fb4f9b0
0xfb4f9b0 <errno>:  0x00000000

Running the same program with SA_SIGINFO enabled gives a slightly different result (A2 now points to 0x7fff2b10):

0x7fff2ac0:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2ad0:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2ae0:     0x00000000      0x00000000      0x00000000      0x00000000
0x7fff2af0:     0x00000000      0x00000000      0x00000000      0x0fb4f9b0
0x7fff2b00:     0x00000000      0x80000001      0x00000000      0x7fff2b10
0x7fff2b10:     0x0000000f      0x00000000      0x00000000      0x00000000
0x7fff2b20:     0x00000000      0x00000000      0x7fff0000      0x00006b58
0x7fff2b30:     0x00000000      0x00000000      0x00000000      0x00000000

Starting at 0x7fff2b10, we now have a struct irix_ucontext. The big difference is that at 0x7fff2b08 we now have a NULL pointer, whereas we previously had a pointer to the struct irix_sigcontext. At 0x7fff2b04 we have the signal number. We end up with the following signal stack frame:

struct irix_sigframe {
     int isf_pad1[7];
     int *isf_uep;   /* 0x7fff2afc Pointer to errno in userspace */
     int isf_errno;  /* 0x7fff2b00 errno value */
     int isf_signo;  /* 0x7fff2b04 signal number */
     struct irix_sigcontext *isf_scp; /* 0x7fff2b08 sigcontext pointer */
     struct irix_ucontext *isf_ucp;   /* 0x7fff2b0c ucontext pointer */
     union {                          /* 0x7fff2b10 sigcontext/ucontext */
         struct irix_ucontext iuc;
         struct irix_sigcontext isc;
     } isf_ctx;
};
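The addresses in the comments can be cross-checked with offsetof(). The sketch below is only a verification aid: struct irix_sigframe_hdr32 is a hypothetical mirror of the frame header in which pointers are stored as 32-bit integers, so that the offsets match the o32 layout even when compiled on a 64-bit host. The context union is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit mirror of the header part of struct irix_sigframe. */
struct irix_sigframe_hdr32 {
    int32_t  isf_pad1[7];
    uint32_t isf_uep;       /* pointer to errno in userspace */
    int32_t  isf_errno;     /* errno value */
    int32_t  isf_signo;     /* signal number */
    uint32_t isf_scp;       /* sigcontext pointer */
    uint32_t isf_ucp;       /* ucontext pointer */
};

/* Address of a header field, given the address of the frame. */
#define FIELD_ADDR(base, field) \
    ((uint32_t)((base) + offsetof(struct irix_sigframe_hdr32, field)))

/* The context follows the header, so the frame starts one header earlier. */
uint32_t
sigframe_base_from_ctx(uint32_t ctx_addr)
{
    assert(ctx_addr >= sizeof(struct irix_sigframe_hdr32));
    return ctx_addr - (uint32_t)sizeof(struct irix_sigframe_hdr32);
}
```

With the context at 0x7fff2b10, the frame starts at 0x7fff2ae0, and FIELD_ADDR() gives back the addresses noted in the structure comments.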

Note that it is not yet possible to guess that the isf_errno field holds the errno value. We will discover this a bit later.

In the CPU register values, the content of register RA is also interesting: this is the address of the signal trampoline. An x/20i $ra command typed in gdb would show it, but I cannot reproduce it here because of the copyright SGI holds on it. There are several things to note about this signal trampoline.

First, dumping it would show that it does not start at RA but a few instructions before. If you happen to run the above test program, you can see the whole IRIX 6.5 signal trampoline by issuing an x/32i $ra-100. This suggests two possibilities: either there are several entry points to the signal trampoline, or the signal trampoline is invoked before the signal handler.

It is easy to find out what is happening: we just have to set a breakpoint at the beginning of the signal trampoline and run the test program again (do not pay attention to the file names, gdb is obviously confused):

(gdb) b *$ra-100
Breakpoint 2 at 0xfadce9c: file iconv_converter.c, line 21556.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/signal10

Program received signal SIGHUP, Hangup.
0xfa44cc0 in kill () at regcomp.c:575
575     regcomp.c: No such file or directory.
(gdb) c
Continuing.

Breakpoint 2, <signal handler called>
(gdb)

We do break at the beginning of the signal trampoline. The signal trampoline is therefore executed before the signal handler: it calls the signal handler, and then sigreturn().

Another thing to note is that the signal trampoline is not on the user stack. In fact, on IRIX, gdb is kind enough to give us its symbol name when we dump it using x/32i $ra-100: _sigtramp.

We can check that this symbol is defined in libc itself:

$ nm -D libc.so.1 | grep sigtramp
0faee444 A _sigtramp

On IRIX, the signal trampoline is therefore provided by the libc. This is another big difference from NetBSD.

The IRIX signal trampoline is also quite big compared to NetBSD's native one. Copyright forbids us from reproducing it, but at least we can tell what it does (and this will probably say more to most readers). The signal frame is called sf here, and it is as defined earlier.

This is where we discover that sf.isf_errno holds the errno value. We also discover that the kernel signals the use of a struct irix_ucontext instead of a struct irix_sigcontext by setting the high bit of A0 on signal handler invocation.
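The high-bit convention is easy to express in C. In this small sketch the macro and function names are hypothetical; only the 0x80000000 mask is taken from the dumps, where the signal number field reads 0x80000001 when SA_SIGINFO is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the flag carried in the high bit of A0. */
#define A0_UCONTEXT_FLAG 0x80000000u

/* Build the A0 value handed to the trampoline. */
uint32_t
a0_encode(uint32_t signo, int use_ucontext)
{
    assert((signo & A0_UCONTEXT_FLAG) == 0);
    return use_ucontext ? (signo | A0_UCONTEXT_FLAG) : signo;
}

/* Non-zero when the context was saved as a ucontext. */
int
a0_uses_ucontext(uint32_t a0)
{
    return (a0 & A0_UCONTEXT_FLAG) != 0;
}

/* Signal number with the flag stripped. */
uint32_t
a0_signo(uint32_t a0)
{
    return a0 & ~A0_UCONTEXT_FLAG;
}
```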

The function irix_sigreturn can discover if the context is to be restored from a struct irix_ucontext instead of a struct irix_sigcontext when sf.isf_scp is NULL. sf.isf_ucp then points to the struct irix_ucontext.
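A minimal sketch of that decision follows. The enum and the function name are hypothetical, and the pointers are modeled as 32-bit user addresses; a real implementation would go on to copy the chosen structure in from user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which structure holds the saved context. */
enum ctx_kind_sketch {
    CTX_NONE,           /* neither pointer is usable */
    CTX_SIGCONTEXT,     /* restore from a struct irix_sigcontext */
    CTX_UCONTEXT        /* restore from a struct irix_ucontext */
};

/*
 * Choose the context to restore from the two pointers stored in the
 * signal frame: a non-NULL scp wins, otherwise ucp is used.
 */
enum ctx_kind_sketch
pick_context(uint32_t isf_scp, uint32_t isf_ucp)
{
    if (isf_scp != 0)
        return CTX_SIGCONTEXT;
    if (isf_ucp != 0)
        return CTX_UCONTEXT;
    return CTX_NONE;
}
```

The two stack dumps exercise both branches: scp and ucp are both 0x7fff2b60 without SA_SIGINFO, while scp is NULL and ucp is 0x7fff2b10 with it.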

Now that we know everything about the signal trampoline, what should we do with it? Well, just use it. In sys/compat/irix/irix_signal.c:irix_sendsig(), instead of returning to the signal handler with the return address set to a signal trampoline on the stack (which is NetBSD-flavor signal delivery), we can return to the signal trampoline and let it do all the work for us. This behavior is much closer to the way signals are handled on IRIX.

The only problem is that the kernel does not have the signal trampoline location. The signal trampoline is in a userland program, and we do not have any simple way of digging it out. It is not possible to assume that we can find it at a hard coded place in user memory, since it will change with libc versions.

To learn the signal trampoline address, we can run a shell command such as:

$ nm -D libc.so.1 |grep _sigtramp
0faee444 A _sigtramp

If a shell command can do it, the kernel can do it. A first idea could be to find the address the same way we do it from the shell: by looking up the symbol table. This means loading the ELF section containing the symbol table, then walking through the table looking for our symbol, et voila.

This method could work, but it is still extremely weak. First, looking up the symbol is time-consuming. Second, it means importing into the kernel a lot of user level code, thus increasing kernel size and increasing the odds of creating kernel bugs. Third, it is not trivial to find The Right Place for this symbol check in the kernel sources. Doing it that way would be a nasty hack.
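To give an idea of the work involved, the table walk alone looks roughly like this once the symbols are in memory. This is a deliberately simplified sketch: struct symbol_sketch is a hypothetical flattened entry, not the real ELF symbol format, and all the code needed to locate and load the ELF sections is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened symbol entry: a name and its address. */
struct symbol_sketch {
    const char *name;
    uint32_t    value;
};

/*
 * Linear search through the table. Returns the symbol address,
 * or 0 when the name is not found.
 */
uint32_t
lookup_symbol(const struct symbol_sketch *tab, size_t nsyms, const char *name)
{
    size_t i;

    assert(tab != NULL && name != NULL);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].value;
    }
    return 0;
}
```

Every lookup costs a string comparison per symbol visited, which is part of why this approach is unattractive inside the kernel.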

I could not imagine that it was done in such an ugly way in the IRIX kernel. There must have existed some way for the program to inform the kernel of the signal trampoline address.

After some investigation, this appears to be true: each time the sigaction(2) system call is invoked on IRIX, libc passes the signal trampoline address as a fourth argument to the system call:

/* signal10.c -- a sigaction tester */
#include <stdio.h>
#include <strings.h>    /* bzero() */
#include <unistd.h>
#include <signal.h>

void func (int, siginfo_t *, void *);

int main (int argc, char** argv) {
        struct sigaction sa;
        sigset_t ss;

        bzero(&ss, sizeof(ss));         /* empty signal mask */

        sa.sa_handler = NULL;
        sa.sa_mask = ss;
        sa.sa_flags = 0;
        sa.sa_sigaction = func;

        if (sigaction(SIGHUP, &sa, NULL)) {
                printf("sigaction() failed\n");
                return -1;
        }
        printf("sigaction() successful. Now sending SIGHUP\n");
        kill(getpid(), SIGHUP);         /* SIGHUP is signal 1 */
        return 0;
}

/* Empty handler: we only care about how sigaction() is called */
void func (int sig, siginfo_t *si, void *v) {
        return;
}

$ gdb ./signal10
(gdb) b sigaction
Breakpoint 1 at 0x400b90
(gdb) r
Starting program: ./signal10
Breakpoint 1 at 0xfa40df0: file sigaction.c, line 25.

Breakpoint 1, _sigaction () at sigaction.c:26
(gdb) x/5i $pc
0xfa40df0 <_sigaction+12>:      lw      $v0,-31524($gp)
0xfa40df4 <_sigaction+16>:      addiu   $sp,$sp,-32
0xfa40df8 <_sigaction+20>:      sw      $ra,28($sp)
0xfa40dfc <_sigaction+24>:      lw      $v0,0($v0)
0xfa40e00 <_sigaction+28>:      sw      $gp,24($sp)
(...)

No system call here. In fact the sigaction() function in libc calls another libc function. By tracing in gdb with the stepi command, we can discover it. Another way is to guess that its name is related:

$ nm -D /usr/lib/libc.so.1|grep sigaction
0fa40d94 A _ksigaction
0fa40dd4 A _sigaction
0fa40d94 W ksigaction
0fa40dd4 W sigaction

If in gdb we attempt a break on ksigaction...

$ gdb ./signal10
(gdb) b ksigaction
Breakpoint 1 at 0xfa40da4: file ksigaction.s, line 23.
(gdb) r
Starting program: ./signal10

Breakpoint 1, _ksigaction () at ksigaction.s:23
23      ksigaction.s: No such file or directory.
Current language:  auto; currently asm
(gdb) x/4i $pc
0xfa40da4 <_ksigaction>:        li      $v0,1162
0xfa40da8 <_ksigaction+4>:      syscall
0xfa40dac <_ksigaction+8>:      bnez    $a3,0xfa40dbc <_ksigaction+24>
0xfa40db0 <_ksigaction+12>:     nop
(gdb) info reg
          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 ffffffe0 00000000 00000000 00000001 7fff2f30 00000000 0faee284
(...)

(gdb) x/4i $a3
0xfaee284 <_sigtramp>:  lui     $t0,0x8000
0xfaee288 <_sigtramp+4>:        addiu   $sp,$sp,-48
0xfaee28c <_sigtramp+8>:        and     $t0,$a0,$t0
0xfaee290 <_sigtramp+12>:       sw      $a0,36($sp)

We've got it. A3 holds the fourth argument to the system call. Once we know where the signal trampoline is, the only remaining problem is to remember it between the time sigaction(2) gives it to us and the time we need it for signal delivery. We cannot use a kernel global variable, since several processes may use different versions of libc, and hence different _sigtramp addresses. We must store this information per process.

This is done using a field of the struct proc. Each process is described by a proc structure, which is defined in sys/sys/proc.h. In this structure, we find various information such as the process credentials, a pointer to the process' struct emul, the process pid, and so on. There is a p_emuldata field, which is a pointer to emulation specific data. Each emulation subsystem is free to use it to store per process emulation specific data.

We just have to define a struct irix_emuldata with an appropriate field to store the signal trampoline address. Then we must allocate this structure at process creation time and set the struct proc p_emuldata field to point to it. At process termination time we must de-allocate the struct irix_emuldata, to avoid a memory leak in the kernel. To do this we need a hook at fork, exec and exit time. The emulation subsystem provides a way of doing this without having to provide emulation-specific versions of these system calls: in struct emul, we have 3 fields named e_proc_exec, e_proc_fork, and e_proc_exit, which point to 3 functions invoked on exec, fork and exit.

Thus, we can define the 3 functions: irix_e_proc_exec(), irix_e_proc_fork(), and irix_e_proc_exit(), fill the fields in the emul_irix_o32 and emul_irix_n32 struct emul (if you don't remember them go back to article 2 of the series), and we have enough support for allocating and freeing our irix_emuldata structures.
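The life cycle of the per-process data can be modeled in user space as follows. This is a hedged sketch, not the kernel code: struct proc_sketch, struct irix_emuldata_sketch, the ied_sigtramp field and all function names are hypothetical, malloc() and free() stand in for the kernel allocator, and copying the parent's data to the child on fork is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-process emulation data. */
struct irix_emuldata_sketch {
    uint32_t ied_sigtramp;      /* trampoline address given by sigaction */
};

/* Hypothetical, reduced struct proc. */
struct proc_sketch {
    int   p_pid;
    void *p_emuldata;
};

/* exec hook: make sure the process has zeroed emulation data. */
int
sketch_e_proc_exec(struct proc_sketch *p)
{
    assert(p != NULL);
    if (p->p_emuldata == NULL)
        p->p_emuldata = malloc(sizeof(struct irix_emuldata_sketch));
    if (p->p_emuldata == NULL)
        return -1;
    ((struct irix_emuldata_sketch *)p->p_emuldata)->ied_sigtramp = 0;
    return 0;
}

/* fork hook: the child gets its own copy of the parent's data. */
int
sketch_e_proc_fork(struct proc_sketch *child, const struct proc_sketch *parent)
{
    struct irix_emuldata_sketch *ied;

    assert(child != NULL && parent != NULL && parent->p_emuldata != NULL);
    ied = malloc(sizeof(*ied));
    if (ied == NULL)
        return -1;
    *ied = *(const struct irix_emuldata_sketch *)parent->p_emuldata;
    child->p_emuldata = ied;
    return 0;
}

/* exit hook: release the data so nothing leaks. */
void
sketch_e_proc_exit(struct proc_sketch *p)
{
    assert(p != NULL);
    free(p->p_emuldata);
    p->p_emuldata = NULL;
}

/* Called from the sigaction path to remember the trampoline. */
void
sketch_set_sigtramp(struct proc_sketch *p, uint32_t addr)
{
    assert(p != NULL && p->p_emuldata != NULL);
    ((struct irix_emuldata_sketch *)p->p_emuldata)->ied_sigtramp = addr;
}

/* Called at signal delivery time to find the trampoline again. */
uint32_t
sketch_get_sigtramp(const struct proc_sketch *p)
{
    assert(p != NULL && p->p_emuldata != NULL);
    return ((const struct irix_emuldata_sketch *)p->p_emuldata)->ied_sigtramp;
}
```

Because each process owns its structure, a child that later registers a different trampoline does not disturb its parent.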

Once all of this is done, we achieve a very good level of compatibility with IRIX signal delivery. Additionally, this makes the code in irix_sendsig() and irix_sigreturn() much simpler, since it does not have to handle most of the signal frame anymore: the signal trampoline does this. We are even able to successfully emulate signal delivery for binaries such as autocad, which provide their own signal trampoline.

One other interesting thing to note is that since that code was written, Jason Thorpe implemented signal trampolines provided by libc for NetBSD native processes, thus adopting the same scheme IRIX used.

The libc-provided signal trampoline was adopted in NetBSD because it removes the need to execute code on the stack. Memory pages mapped on the stack can therefore be made non-executable (the Memory Management Units of all modern CPUs are able to enforce such rules), and we are able to fix a whole class of security problems. With a non-executable stack, it is no longer possible to exploit a buffer overflow on a local variable by executing user-supplied code stored on the stack.

In future articles we will move on to multi-threading support in IRIX binaries.

Acknowledgements

I would like to thank David Brownlee, Hubert Feyrer, and John Kloss for reviewing this paper, and of course all the NetBSD developers that spent some time answering my kernel questions on the project's mailing lists.

References

NetBSD-current kernel sources
The NetBSD project, 1993-2001.

ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-current/tar_files/src/sys.tar.gz
Via CVSWeb: http://cvsweb.netbsd.org/bsdweb.cgi/syssrc/

NetBSD man pages
The NetBSD project, 1993-2001.

System V Release 4 Application Binary Interface: MIPS processor supplement
The Santa Cruz Operation, Inc, 1990-1996.

Linux Compatibility on BSD for the PPC Platform
Emmanuel Dreyfus, 2001

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.



Copyright © 2009 O'Reilly Media, Inc.