 Published on ONLamp.com (http://www.onlamp.com/)


Getting Loopy with Python and Perl

by Aahz
06/27/2002

This article is based in part on my O'Reilly Open Source Convention 2002 tutorial, "Python for [Perl] Programmers." However, this article includes more comparison of Perl and Python than the tutorial does. The tutorial targets experienced programmers of all sorts, with the non-Python examples drawn from Perl. In this article I'll compare Python's loop constructs to Perl's.

Perl has two basic looping constructs: while and for/foreach. This doesn't count variations, such as "<statement> until EXPR" or do...while, nor the fact that for/foreach has two different forms. Python also has only two looping constructs: while and for. Unlike Perl, Python's loops have no variations; instead, the for loop uses a special protocol that generalizes well. Both Perl and Python have functional constructs that loop over a sequence, but that's outside the scope of this article.

The variation in available loop constructs exemplifies the basic difference between Perl and Python: Perl's motto is TMTOWTDI (There's More Than One Way To Do It), whereas Python's counter-motto is "There's Only One Way." Python's motto is the short form of one element of Python's design philosophy: "There should be one--and preferably only one--obvious way to do it." To see the rest of Python's design philosophy, start the Python interactive interpreter (Python 2.1.2 or later) and type "import this".



So how do Python's looping constructs actually work? Let's start with a basic Perl idiom:

   
    while (<STDIN>) {
        print;
    }

and compare it to the equivalent Python idiom:

    
    import sys
    for line in sys.stdin:
        sys.stdout.write(line)

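The main thing to notice is that Python uses a for loop instead of a while loop. There are two reasons for this: first, Python does not allow assignment inside an expression, so a line cannot be read and tested in the loop condition itself; second, Python 2.2 introduced iterators, and a file object can be used directly as the sequence in a for loop.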

Prior to Python 2.2, the loop would have been written like this:


    import sys
    while 1:
        line = sys.stdin.readline()
        if not line:
            break
        sys.stdout.write(line)
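
To make the for loop's protocol a little more concrete, here is a rough sketch of what the loop does behind the scenes. It is an illustration under stated assumptions, not Python's actual implementation: io.StringIO stands in for sys.stdin, the names fake_stdin and copied are made up for the example, and the built-in next() is the spelling used by current Python releases (in Python 2.2 the call is written it.next()).

```python
import io

# Stand-in for sys.stdin (an assumption, so the sketch runs without real input).
fake_stdin = io.StringIO("first\nsecond\n")

copied = []
it = iter(fake_stdin)          # ask the object for its iterator
while True:
    try:
        line = next(it)        # Python 2.2 spelling: it.next()
    except StopIteration:      # raised once the iterator is exhausted
        break
    copied.append(line)        # the body of the equivalent for loop
```

The for statement hides the iter() call, the repeated next() calls, and the StopIteration handling, which is why the loop over sys.stdin reads so cleanly.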

In general, Python's for loop works much like Perl's for/foreach list form. In order to loop over a sequence of numbers, you need to produce a list:


    for i in range(10):
        print i

This will print the numbers from 0 through 9. Like Perl arrays, Python lists are zero-based, and the range() function caters to that. To prove that range() is in fact creating a list of numbers, fire up the Python interpreter:

    
    >>> range(10)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Contrast this with Perl's standard indexing loops:

    
    for ($i=0; $i<10; $i++) {
        print "$i\n";
    }

or

    
    foreach $i (0..9) {
        print "$i\n";
    }

Note particularly how range() specifies a value one higher than the maximum index; this makes it easy to use with the len() function. Generally speaking, range() is fast enough and consumes little enough memory that generating an entire list doesn't hurt. But if you're worried about that, Python does have the xrange() function that produces one number at a time.
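
As a small illustration of how range() pairs with len(), the sketch below walks a list by index. The list contents and the names items and indexed are made up for the example, and the code is written to run under current Python releases.

```python
items = ["spam", "eggs", "ham"]

indexed = []
# len(items) is 3, so range(len(items)) covers the indexes 0, 1, and 2:
# exactly the valid positions in the list, with no off-by-one adjustment.
for i in range(len(items)):
    indexed.append((i, items[i]))
```

Because the stop value is excluded, the length of the list can be passed to range() unchanged.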

Aside from the lack of assignment, Python's while loops function almost identically to their Perl counterparts:

Perl:

   
    $done = 0;
    while (!$done) {
        $input = getInput();
        if (defined($input)) {
            process($input);
        } else {
            $done = 1;
        }
    }

Python:


    done = False
    while not done:
        input = getInput()
        if input is not None:
            process(input)
        else:
            done = True

Note that False is a new built-in value in Python 2.2.1 (in Python 2.3, Boolean operations will return True/False instead of 1/0). In general, any zero or empty value is false in a Boolean context: False, None, 0, "", (), [], {}, and class instances whose __nonzero__() or __len__() method returns 0.
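
The truth-value rules can be checked with a short sketch. The EmptyQueue class and the list names are hypothetical, invented for this example. The class relies on __len__() rather than __nonzero__() so that the code also runs under current Python releases, where __nonzero__() has been renamed __bool__().

```python
class EmptyQueue:
    """Hypothetical container that always reports a length of zero."""
    def __len__(self):
        return 0

# Every value here should be treated as false.
false_values = [False, None, 0, "", (), [], {}, EmptyQueue()]

# Non-empty containers are true even when their contents are false.
# The string "0" is true as well, unlike in Perl.
true_values = [True, 1, "0", (0,), [None], {"k": 0}]

seen_false = [v for v in false_values if not v]
seen_true = [v for v in true_values if v]
```

If the rules hold, seen_false ends up with all eight entries of false_values and seen_true with all six entries of true_values.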

Note also that in Perl, if getInput() starts returning a hash or array instead of a scalar, the while loop needs to be modified, whereas the Python loop will continue to work fine with a dict or a list. That's because in Python, everything is done with reference semantics (called "binding" in Python because one does not access references directly); one can get the same effect in Perl by explicitly using references.
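
Here is a minimal sketch of that point: the same loop handles a string, a list, and a dict without modification. The get_input() and process() functions and the pending queue are hypothetical stand-ins for the getInput() and process() routines above.

```python
# Hypothetical input source: a string, a list, a dict, then None to signal the end.
pending = ["text", [1, 2], {"key": "value"}, None]

def get_input():
    """Return the next pending value (stand-in for getInput())."""
    return pending.pop(0)

processed = []

def process(value):
    """Record the value (stand-in for real processing)."""
    processed.append(value)

done = False
while not done:
    value = get_input()      # the name is bound to whatever object comes back
    if value is not None:
        process(value)
    else:
        done = True
```

The loop tests only for None, so the type of object that the name is bound to never matters.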

In addition to iterators, Python 2.2 added generators. Generators are functions that return an iterator. It may help to think of them as something like resumable closures. Here's a subset implementation of grep:


    from __future__ import generators
    import sys
    import re

    def grep(seq, regex):
        regex = re.compile(regex)
        for line in seq:
            if regex.search(line):
                yield line

The "yield" keyword turns an ordinary function into a generator; calling the generator returns an iterator that wraps the generator's code. Each time the yield executes (handing back a result just as "return" does), the generator is paused, but its stack frame is retained (including all local variables), pending another call to the iterator's next() method. None of the generator's code runs until the first call to next().
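
The pause-and-resume behavior can be observed with a small sketch. The count_up() generator and the log list are made up for this illustration, and the built-in next() is the spelling used by current Python releases (in Python 2.2 the call is written it.next()).

```python
log = []

def count_up(limit):
    log.append("started")                    # runs only on the first next()
    n = 0
    while n < limit:
        yield n                              # pause here; n survives the pause
        log.append("resumed after %d" % n)
        n += 1

it = count_up(2)
before = list(log)    # snapshot: calling count_up() has not run any of its code
first = next(it)      # runs the body up to the first yield
second = next(it)     # resumes right after the yield, with n still bound
```

Comparing before with the final contents of log shows that the generator's body did not start until the first next() call, and that the local variable n kept its value across the pause.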

Returning to this specific example, calling grep() returns an iterator. Each call to the iterator's next() method returns one match. This happens implicitly in a for loop:


    regex = r"\s+\w+\s+"
    for line in grep(sys.stdin, regex):
        sys.stdout.write(line)

The advantage of all this is that it's simple, clean, and efficient. You can write straightforward code--but it doesn't need to hog memory by creating an entire list before returning. Generators can be pipelined; imagine recreating other Unix utilities, such as uniq. Even if the source is a list, at least there's no need to create temporary lists.
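
As a sketch of pipelining, the grep generator from above can feed a simplified uniq. This uniq() is a hypothetical, minimal version that only drops adjacent duplicate lines; it is not a full reimplementation of the Unix utility, and the sample lines are invented for the example.

```python
import re

def grep(seq, regex):
    """Yield only the lines that match regex (same as the grep shown earlier)."""
    pattern = re.compile(regex)
    for line in seq:
        if pattern.search(line):
            yield line

def uniq(seq):
    """Yield each line unless it repeats the line immediately before it."""
    previous = object()        # sentinel that never equals a real line
    for line in seq:
        if line != previous:
            yield line
        previous = line

lines = ["a b c\n", "a b c\n", "nomatch\n", "x y z\n", "a b c\n"]

# Each stage pulls one line at a time from the stage before it,
# so no intermediate list is built between grep and uniq.
result = list(uniq(grep(lines, r"\s+\w+\s+")))
```

Only the final list() call collects results; replacing it with a for loop would keep the whole pipeline streaming.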

The "from __future__" statement is needed because "yield" is a new keyword in Python 2.2 and must be enabled explicitly. When Python 2.3 is released, "from __future__ import generators" will no longer be required, but keeping it won't harm anything (it allows the code to run under Python 2.2 or any later version).

For more information about converting Perl code to Python, see the Python/Perl Phrasebook. (The phrasebook is seven years old and out of date, but still quite useful.)

Special thanks to Cathy Mullican (menolly@spy.net) for refreshing my badly outdated Perl knowledge.

Aahz has been programming in Python for more than three years and enjoys teaching people how to use Python.

Copyright © 2009 O'Reilly Media, Inc.