BSD DevCenter
oreilly.comSafari Books Online.Conferences.


IRIX Binary Compatibility, Part 5
Pages: 1, 2, 3

Lists, Locks and Reference Counts

But things are not that simple. For various reasons, we need to be able to walk through the threads list in a share group. Therefore, we need a chained list of processes in the share group.

There are handy macros in the NetBSD headers <sys/queue.h> that are really helpful here. Documentation can be found in the queue(3) manual page. There are macros to declare the list, to walk through it, to insert elements in it, and so on. All the pointer arithmetic is done in the macro, and the programmer can focus on the actual use of the list. This is wonderful.

Creating the list is easy. To store the pointers to next/previous elements in the list, we can use a field of the emulation specific part of struc proc (it is pointed to by the p_emuldata field, and we created a struct irix_emuldata to describe it. Check the previous article about signal emulation if you need details about this). The only problem is deciding where to store the head of the list. Storing it in the irix_emuldata structure is not a good idea, since we have no guarantee at all that the processes holding the head of the queue will not terminate before the others.

The solution is to allocate a structure, which describes the share group. It is defined in sys/compat/irix/irix_prctl.h and is called struct irix_share_group.

/* IRIX share group structure */
struct irix_share_group {
        LIST_HEAD(isg_head, irix_emuldata) isg_head;    /* list head */
        struct lock isg_lock;                           /* list lock */
        int isg_refcount;

Here we have the list head, defined using the LIST_HEAD macro, one lock on the list and a reference counter.

The List

The list declaration will expand in the declaration of a structure. The first argument to LIST_HEAD is the structure name. This name does not really matter; we use the name of the field in struct irix_share_group, as this is more convenient. The second argument does matter; it is the type of the list elements. Here we specify that our list is made of struct irix_emuldata elements. (Note that 'struct' is omitted. The macro will add it.)

In the struct irix_emuldata, we have one field that holds the list pointers. The declaration is done with another macro:

LIST_ENTRY(irix_emuldata) ied_sglist;

The Lock

The lock is used because several processes could try to manipulate the list at once. This can of course happen on a multiprocessor system, where the two processors may be in kernel mode at the same time, trying to manipulate the same kernel structure. It can also happen on a uniprocessor system, too. Imagine we have to walk the share group list to apply some collective operation. For some reason this operation may cause the process to sleep, awaiting some particular event. During the sleep, other processes will be scheduled, and they may try to modify the list too. When our first process wakes up, it may be working on a list element that no longer exists, which is quite likely to cause a panic.

The solution to this is to use a lock. Locks are manipulated through the lockmgr(9) function. Simplest uses are to get a shared or exclusive lock and to release a lock. Here is an example of lockmgr(9) use:

(void)lockmgr(&lock, LK_SHARED, NULL);
/* Do something */
(void)lockmgr(&lock, LK_RELEASE, NULL);

We use shared locks when we want to read the list. Several processes may be doing this at once. They will all ask for a shared lock and get it.

Exclusive locks are requested when we want a process to modify the list. Upon exclusive lock request, lockmgr(9) will sleep until all shared locks are released, which guarantees that no reading process is still reading. Once exclusive lock is acquired by one process, any other process calling lockmgr(9) for a shared or exlcusive lock will go to sleep, thus ensuring that no process can read while our process is writing.

The Reference Count

The reference count enables us to track the irix_share_group structure usage. When a process is added to the list, we increase the count. When a process is released, we decrease the count. When the count hits zero, we can free the structure, since nobody is using it anymore.

One of the first jobs of irix_sproc() is to check if the irix_share_group structure has already been allocated. If this is not the case, the structure is allocated and the calling process is added to the list. The child process will be added later, as we cannot add it before it is actually created.

Parent and Child Life in the Kernel

fork1() has an argument named func which is used to specify the child entry point. This can be a NULL pointer if we want to return immediately to userland, or it can be the address of a kernel function used to perform some custom tasks before returning to userland. Since we need to arrange the child environment and to add it to the share group list, we use this feature. The child entry function is called irix_sproc_child() and is defined in sys/compat/irix/irix_prctl.c.

It is worth mentioning that the child entry function must be a kernel function. Passing a userland address here will cause a panic. To return to a custom userland address after fork1(9), the solution is to specify a kernel child entry function and to modify the child registers saved onto the stack in that function so that, on return to userland, the process will execute our userland function.

While executing irix_sproc_child(), we need some information from the parent. (The kernel stores all fundamental information about a process in the proc structure, so we read that.) This can lead to some problems if we just let the parent exit irix_sproc() while irix_sproc_child() is executing. If the parent returns to userland and catches a deadly signal, our child can be left manipulating a stale proc structure. This is very likely to cause a panic.

This situation can arise easily in a multiprocessor system where the second CPU will execute the child while the first CPU executes the parent. But it can happen on a uniprocessor system as well. If the child sleeps awaiting some resource, the kernel may schedule the parent to run and resume the child later.

The solution is to force the parent to sleep on a flag and make the child awaken the parent when it will no longer be fatal for the child to be orphaned. The address of the flag is passed to the child using the arg argument to fork1(9). This argument is a pointer that gets passed to the child entry function. For irix_sproc_child(), we defined it as a pointer to the irix_sproc_child_args structure (defined in sys/compat/irix/irix_prctl.c):

struct irix_sproc_child_args {
        struct proc **isc_proc;
        void *isc_entry;
        void *isc_arg;
        size_t isc_len;
        int isc_inh;
        struct proc *isc_parent;
        struct irix_share_group *isc_share_group;
        int isc_child_done;

The structure is allocated in irix_sproc() and is used to transmit a lot of information between the parent and the child: the child proc structure pointer (isc_proc), the entry, arg, len, and inh arguments that were given to the sproc(2) system call (isc_entry, isc_arg, isc_len, and isc_inh), the parent proc structure address (isc_parent), the share group structure address (isc_share_group), and the flag on which the parent is sleeping, awaiting for the child to finish building itself (isc_child_done).

Now that you have the whole picture, let us move to something even more interesting. irix_sproc_child() needs to set up the stack and registers just like IRIX does. Of course this is not documented anywhere, and this is why things are getting more interesting. Time has come to reverse engineer IRIX's sproc(2) implementation. But this will come in the next part, so stay tuned.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.

Return to the BSD DevCenter.

Sponsored by: