Published on (
 See this if you're having trouble printing code examples

Introduction to Stackless Python

by Cameron Laird

A multi-player game

The Stackless stew is bubbling so rapidly, it's hard for casual observers to keep track of the action. To understand Stackless, it helps to keep in mind the principal actors involved:

Melvin E. Conway -- author of seminal 1963 paper on coroutines.

Mike Fletcher -- author of very large collaborative virtual reality system relying on microthreads.

Steven Majewski -- on staff with the department of Medicine-Molecular Physiology at the University of Virginia. Important contributor to early development of Python. Author of mid-90s analyses of generators, continuations, and coroutines in Python.

Gordon McMillan -- author of Stackless-related Python Enhancement Proposals. Independent consultant, "heavily using Stackless since May" 2000 in coroutine-based production applications.

Tim Peters -- long-time Python contributor and "insider." Recently hired by for first paid Python work in his career. Deeply knowledgeable about abstract language concepts. Recommended continuation as right generalization for coroutine-enabled re-implementation. Author of 1994 paper on coroutines in Python.

Sam Rushing -- independent consultant. Wants coroutines, particularly for his work with "massively parallel" messaging systems, partially financed by eGroups. Source of Tismer's initial inspiration.

Christian Tismer -- independent consultant. Inventor of Stackless Python.

Guido van Rossum -- Python's inventor. Wary of Stackless's consequences. Recently hired by BeOpen to head

Just van Rossum -- programming typographer. Coauthor of microthreading implementation. Stackless enthusiast. Brother of Guido van Rossum.

Will Ware -- programmer and designer with Xionics Document Technologies, with experience both "close to the silicon" and on very large software projects.

Do you want your favorite language to be smaller, faster, more flexible, and safer? That's Stackless Python's claim.

Stackless Python is an alternative implementation of Python created by independent developer Christian Tismer. He started with the conventional Python language processor managed by the language's inventor, Guido van Rossum, and patched his own Stackless invention in place of a small but central part of Python's internals. Stackless Python is the result. This article introduces Tismer's technology and its significance. In future articles, you'll be able to read about how to make your own start at programming Stackless Python, as well as the prospects for a merger between Stackless and the main Python distribution.

Why Stackless

Like most interesting technologies, the travelers on the Stackless voyage bring several separate but closely linked stories. Crew members have strikingly different motivations and knowledge. So far, though, they have pulled together for a relatively smooth ride. The sidebar at the right hints at how different ideas have been unfolding all through the decade, finally to find a common vehicle in Stackless Python.

Tismer is the adventurer in this tale. He created Stackless because he wanted to see if he could do something that "seemed impossible" when he first heard about it. He ran across the idea initially in the python-dev mailing list for technical discussions of Python implementation. There, in early 1999, Sam Rushing and others made the point that the current Python design did not support coroutines well. Coroutines are a concurrency construct, that is, a model for managing computations that should happen without necessarily having to wait on other work a computer needs to do. Threads and events are alternative formulations of concurrency or multi-processing. Python, along with Java and other languages, directly supports a threading model. Threads are commonly taught in college programming courses.

However, coroutines are widely viewed as "safer" than threads, because they seem to demand less expertise and care for correct use. Melvin E. Conway introduced coroutines in 1963 to model symmetric computation, where each coroutine operates with the same "rank." This contrasts with our usual procedural codings, with one method in charge of calling another. Languages such as SIMULA and Modula make good use of coroutines, and special-purpose libraries make coroutines available for C++ and Java.

Rushing ran into a technical limitation of Python's implementation, having to do with the way the Python byte-code interpreter keeps the information it needs as control flows from one method to another. The standard Python distribution mixes

in the same "stack."

Holding two different kinds of information in the same stack simplified Python's early development. It makes concurrency difficult, though. Even before Rushing's experiments, Python experts had bumped into this limitation several times, as, for example, when Steven Majewski had worked in the mid-'90s on "generators," close relatives of coroutines. Generators and microthreads are two more multiprocessing or concurrency models that we'll see Stackless Python enables.

Enter continuations

In the course of describing the difficulties as precisely as possible, python-dev contributors Rushing and Tim Peters emphasized continuations. A continuation can be viewed as the most primitive or fundamental multiprocessing construct. All the others -- generators, coroutines, threads, and so on -- have straightforward expressions in terms of continuations.

While Tismer didn't know what continuations were when he started reading this mailing list thread, the challenge was the kind that appealed to him. He read the literature on multiprocessing control structures. He looked at previous work with Python. He studied the layout of the Python "frames," that is, the segments of the stack where information about different invoked methods is kept.

Eventually, Tismer convinced himself he could do precise surgery on Python to transplant in a new calling-frame manager that operates outside the conventional stack. As he likes to say, "after heavily brainwashing myself, it was finally quite easy."

Tismer presented "Continuations and Stackless Python" to the Eighth International Python Conference just after New Year 2000. He had in hand patches for Python that eliminated the stack-frame limitations, and he could offer Stackless to those who wanted to use it.

Layers of Stackless

Keep in mind that "Stackless Python" means different things to different people. For Tismer, Stackless Python is an internal or implementation matter. With changes to "just a small number of C modules," he has decoupled the frame stack from the C stack. That work is done and useful now.

Tismer continues to pour effort into Stackless to improve several details. Stackless is already entirely compatible with existing "pure" Python applications -- those developed exclusively in Python. Many Python applications, however, involve fragments coded in other languages, generally C. Python's willingness to "play nicely" with other languages is one of its principal virtues for many programming teams.

The Stackless Python of summer 2000 has a slightly different interface for cross-language work than conventional Python does. Some mixed-language projects won't allow Stackless Python simply to "drop in" as a replacement. One of Tismer's tasks is to rework Stackless Python's internals so that it fits better in these existing mixed-language projects.

Tismer is also searching for improved abstractions that will better support complementary high-level constructs such as coroutines. Should an official coroutine be implemented as a layer above continuations, or as an "independent" interface? What precise definition for coroutines best fits the Python style?

Apart from these details, Stackless is significant purely as an implementation matter. Conventional Python limits recursion depth, because it keeps so much information on the stack. Large-scale applications that nest or recurse deeply can fail simply through exhaustion of the stack. Stackless Python is much less constrained in this regard, with almost all of its state information kept in heap memory. Also, concurrent algorithms that exploit Stackless capabilities exposed as coroutines run at up to three times the speed of the corresponding Python codings in terms of threads.

The implementation change has several layers of consequence, though. The main programming interface Tismer has exposed up to this point is the continuation module. While generally regarded as too abstruse for working Python application developers, continuation is just the right building block for such system-level programmers as Gordon McMillan and Will Ware. With continuation as their base, they've layered new concurrency definitions that appear to perform much better than threads and are also easier to understand. This kind of experimentation, going on through 2000, has been so successful that several applications are already in production.

When Tismer talks about Stackless, he's generally thinking about the Python interpretive kernel, and how to strengthen its architecture. This makes possible the innovative control structures that interest McMillan and Rushing. Their Python-level module definitions, in turn, provide coroutines, microthreads, and other upper-level concurrency models that bear on practical programming problems.


Will Ware's microthreads, for example, journey a long way from the "ivory tower." Scripting languages are rightly becoming popular implementation vehicles for games. The higher-level constructs in common scripting languages encourage customization and extensibility at a pace that's crucial in gaming, where time-to-market is so important. Performance is also essential, though, and conventional threading is just too clumsy to satisfy the requirements typical in game programming.

Microthreads are a perfect answer. In a description Ware co-authored:

Microthreads are useful when you want to program many behaviors happening simultaneously. Simulations and games often want to model the simultaneous and independent behavior of many people, many businesses, many monsters ... With microthreads, you can code these behaviors as Python functions. ...

Microthreads switch faster and use much less memory than OS threads. You can run thousands of microthreads simultaneously.

What's the relation between these benefits and Stackless's implementation details? Here's a quick sketch:

Continuations are the general-purpose concurrency construct. A continuation represents all the future computations of a particular program. Capturing all this control flow in a single conceptual object makes it programmable: It becomes possible to calculate or reason over the control flow. In particular, there's great scope for optimizing assignment of different calculations to different processes or threads or even hosts.

The example Tismer likes is one he credits to Jeremy Hylton: Think of the simple program

     x = 2
     y = x + 1
     z = x * 2

For this example, "the continuation of x = 2 is y = x + 1; z = x * 2. ... [E]very single line of code has a continuation that represents the entire future execution of the program."

Continuations are sufficiently general that they can model threads efficiently. As it happens, the threads native to most operating systems (OSs) have accumulated a lot of baggage through the years. Legend advertises threads as "lightweight." They're supposed to be much nimbler than processes, for example. This is no longer true, though. Common industry practice has caused threads and processes to converge in their resource demands. Threads are not particularly lightweight.

Think for a minute about an ideal thread, not the one leading OSs provide, but a minimal flow of execution burdened with as little communication and synchronization overhead as possible. That's a microthread, and continuations can model microthreads as well as they model any other concurrency structure.

This is important. It's well known among experts that common multi-threaded Web servers, for example, can gag when they spawn more than a modest number of threads -- fifty, in common cases. Stackless microthreads not only possess all the programming ease Python delivers, but even mediocre equipment handles thousands and tens of thousands of microthreads.

It's not just reliability and performance that Stackless gives microthreads. Stackless should also make microthreads and related control structures "serializable," or persistent: They will have a representation that allows state to be saved and restored. At the level of game-playing, it means that there will be a thoroughly natural and complete way to pause or save any simulation object, or even to relocate it to a different process or host. A game player might choose to pack his identity away, and then re-start again at the same point months later. While serialization isn't available in a finished form, it's already in Tismer's plans. The Python jargon for serialize is "pickle." As Tismer puts the situation, "Microthreads are easier to pickle than general continuations. . . . [M]icrothreads are most likely to be the first things we will be able to pickle." Early experiments suggest it'll work nicely, once the API becomes more definite.

Join in the fun

The challenge of this introduction has been to explain Stackless' promise for the future, without losing contact with its present reality. To help bring the abstractions back into proper focus, tune in again next week for an article on Stackless Programming. There you'll see how easy it is for you to start to practice with Stackless Python for your own work.

Cameron Laird is the vice president of Phaseit, Inc. and frequently writes for the O'Reilly Network and other publications.

Discuss this article in the O'Reilly Network Python Forum.

Return to the Python DevCenter.


Copyright © 2009 O'Reilly Media, Inc.