Linux Compatibility on BSD for the PPC Platform: Part 3
Tuning: Fixing system-call-specific issues
A simple bug fix: ioctl() issues
Now the time has come to try running real Linux binaries and see what happens. We discover many small problems here. For example, the Linux ioctl() TIOCGETA and TIOCGWINSZ calls fail for no apparent reason.
ioctl() is used to perform non-standard operations on devices. It is widely used to get and set terminal parameters. For example, ioctl() TIOCGETA is used to get the terminal's struct termios, and ioctl() TIOCGWINSZ is used to get the terminal window size. If you need more information about ioctl(), refer to the ioctl(2) man page.
After some investigation with ktrace(1), it became clear that the ioctl com argument was wrong: Linux tried to do an ioctl() TIOCGETA, and NetBSD understood it as a different ioctl() (and thus, it failed). This is caused by a struct linux_termios mismatch.
The ioctl com parameter is computed from the ioctl type (read, write, read/write, or nothing), its group (the letter in the ioctl definition), its number, and the size of the third argument to ioctl(). Here the problem is that in our NetBSD definition, the struct linux_termios is not the same size as the real Linux struct termios, so the two sides compute different com values for the same request. This happens because the struct linux_termios is defined in sys/compat/linux/common/linux_termios.h, which is treated as architecture-independent, although the structure is not. Moving the definition to an architecture-dependent file fixes the problem.
One fake bug: lstat() issues
There are also fake problems. For example, lstat() fails with glibc-2. A program built on a glibc-1 LinuxPPC system worked fine on the Linux system with glibc-1, but it broke on NetBSD when using glibc-2. If I had had a glibc-2 LinuxPPC system on which to try my binary built on a glibc-1 LinuxPPC system, I would have understood quickly that the failure was normal: A program using lstat() and dynamically linked against glibc-1 cannot work with glibc-2. Let's study why it failed.
glibc-2.1.3 sources are available here.
Alternatively, you can browse the source using CVSWeb.
Here is a simple program that tests lstat(). It was built on a LinuxPPC system that uses glibc-1.
/*
 * lstat.c - An lstat() tester
 */
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv) {
    const char *file_name = "/etc";   /* default path to test */
    struct stat buf;
    int res;

    if (argc >= 2)
        file_name = argv[1];          /* optional path from the command line */
    res = lstat (file_name, &buf);
    if (res < 0) {
        /* Report the return value and arguments, then the errno message */
        printf ("res=%d file_name=%s &buf=%p\n", res, file_name, (void *) &buf);
        perror ("lstat() failed");
        exit (-1);
    }
    return 0;
}
Now, if we try to use glibc-2.1.3, the same binary will fail. According to the kernel trace, the lstat() system call is successful, but the program gets a -1 return value (errno set to EINVAL). The result is modified by the glibc glue code. Looking at glibc-2.1.3 sources, we discover there is a mechanism for dealing with the multiple versions of the struct stat that exist on the Linux system (Linux-2.4 defines a struct old_kernel_stat and a struct stat). glibc has to detect the version of the stat structure expected by the program, and if the kernel does not provide that structure, it has to convert it. Here's how it works:
- lstat() is defined in glibc/io/lstat.c, and it calls __lxstat() with _STAT_VER as the first argument. This function gets statically linked into the executable, and therefore the _STAT_VER parameter is hard-coded into the executable with a value specific to the struct stat that is expected. When linking with glibc-1.99, the value is 0.
- __lxstat() is defined in glibc/sysdeps/unix/sysv/linux/lxstat.c. It tests its first argument (which it calls vers), returns if it is _STAT_VER_KERNEL, and otherwise calls xstat_conv() with vers as the first argument (so xstat_conv() receives the executable's _STAT_VER). The call from lstat() to __lxstat() is dynamic. __lxstat() compares vers to _STAT_VER_KERNEL, which is specific to the current kernel's struct stat. On glibc-2.1.3, this value is 3.
- xstat_conv() is defined in glibc/sysdeps/unix/sysv/linux/xstatconv.c. Its job is to convert the kernel's struct stat into what the executable expects. It checks two possibilities for the vers parameter:
  - If it is equal to _STAT_VER_KERNEL, it just returns.
  - If it is equal to _STAT_VER_LINUX, the struct old_kernel_stat is converted to a struct stat, and it returns.
  - Otherwise, it returns an error (EINVAL).
Obviously, when a binary linked with glibc-1 runs on a glibc-2 system, we hit the "otherwise" case in xstat_conv(). The conclusion is that glibc-2 does not expect the user to use lstat() in a binary built for glibc-1. Building the binary on a glibc-2 Linux system fixes the problem, and the binary works fine with NetBSD's Linux emulation. Nothing needed fixing in the NetBSD emulation code, so we could consider it a glibc-2 bug.
open() unable to create files
This is a really annoying bug: It causes open() to ignore the O_CREAT flag. Therefore, open() system calls that need to create a file fail because the file does not exist. The reason is silly: In Linux's fcntl.h, the O_CREAT flag definition is like this: #define O_CREAT 0100. Looking at it, if you do not use C octal notation every day, you may think that this is a hexadecimal value, and that the Linux code adds the leading "0x" where it needs to use this value. Therefore, you might write this in NetBSD's linux_fcntl.h file: #define LINUX_O_CREAT 0x0100
Just remember that in C, a leading 0 marks an octal constant: "0100" means 100 in octal, which is 64 in decimal and 40 in hexadecimal. You may think this is the silliest mistake described in this document. Well, I made it, so I hope this section will be useful for people who have forgotten how to define an octal value in C.