ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Why I Stopped Coding and Why I'd Start Again

by Brian McConnell
01/18/2007

It's been about a year since I wrote a program. I've written snippets of code since then but have mostly focused on managing other developers. This is partly because, as a business owner, my spare time is scarce; I have many tasks to juggle. It's also because programming stopped being fun for me.

I have been fascinated with computers and telecom systems since I was a child, but in the past several years, coding has become more of a chore, in large part because the task of actually designing something useful is a small one compared with the much less interesting work that's now required.

My favorite language is Python. I am partial to it for several reasons, but mainly because Python source is easy to write and easy to read. I have always disliked C/C++ and derivative languages such as Java because they are, for me anyway, a chore to work with. For whatever reason, I have a tendency not to see punctuation marks when I am writing. It's a trivial thing, but I would spend a half hour trying to track down a missing { in Java. The languages themselves make sense; they just weren't much fun to work with. I'd spend more time debugging a program or trying to figure out some external library than I would working on whatever problem I was trying to solve.

I like Python particularly for its versatility. I could use it as a command-line tool in one instance, a standalone app in another, or a web script in a third. Other languages can do this, but it seemed especially easy with Python. I've used it to build a lot of useful little apps, for example, a Spanish flashcard program that I use to memorize vocabulary. I also like it because it is as close to procedural code as you can get. Anytime I needed to build a small utility of some sort, it was a great choice, unless I wanted to do something like talk to a database, or distribute something for use on other people's computers.

When I started my most recent venture, I really wanted to use Python for most of our applications. It's easy to read, runs on everything, and because of that, is cheap to support. The problem is that to do anything useful, you invariably need to talk to a database, and that's where Python falls flat on its face. I am not picking Python out for abuse here; it's what I've worked with the most. And in any case, my critique of current programming tools applies across the board.

Python should have become the dominant language for web apps but instead has remained in the shadows of Perl and PHP (and now Ruby). Most systems I see outside of large organizations run on PHP, ASP (I know, yuck), and in the past year or so, Ruby. You can find services that run on Python, like Zoomr, but they are the exception.

One of the main reasons for this is that getting Python to talk to databases can be quite an annoyance. While it does provide a high-level DB API, that API in turn talks to external modules that are designed to work with specific databases such as MySQL. It sounds simple enough, but if you're supporting servers that are scattered all over, it's a pain in the ass to keep track of which versions of what have been installed on which boxes. I never understood why, in the early 21st century, we're still using proprietary wire protocols to send SQL queries to database servers. I wrote a piece about this, Really Simple Database Protocol, in 2005, but I digress.
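The split between the portable DB API and the driver-specific modules is visible even in a tiny program. The sketch below uses the standard library's `sqlite3` module, which implements the DB API 2.0 (PEP 249), so it runs anywhere Python does; the table and function names are illustrative. Only `open_demo_db()` is tied to a driver. Swapping in MySQL would mean importing a separately installed third-party driver such as MySQLdb on every server, and changing the query placeholders, since `sqlite3` uses `?` while MySQLdb uses `%s`.

```python
import sqlite3  # stdlib DB-API 2.0 driver; a MySQL driver would be a separate install


def count_users(conn) -> int:
    """Portable DB-API code: works with any PEP 249 connection object."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM users")
    (count,) = cur.fetchone()
    cur.close()
    return count


def open_demo_db():
    """The only driver-specific part: connect and load some sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    # sqlite3 uses the 'qmark' placeholder style; MySQL drivers typically use '%s'.
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ana",), ("ben",)])
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = open_demo_db()
    print(count_users(conn))  # prints 2
    conn.close()
```

`count_users()` would run unchanged against any DB-API connection; the per-server installation headache lives entirely in the driver import.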

My gripe applies to most development environments; to one degree or another, they all force you to waste time on tasks unrelated to actual design and development. Some languages force you to deal with external libraries. Others force you to load bloated IDEs and ship gigundous distributable packages. Still others force you to deal with the author's idiosyncratic ideas about how code should be as unreadable as humanly possible.

What I want from a programming language is very simple:

I'll admit that I am lazy. Tinkering with computers is fun when you're first learning about them. I remember how I'd happily spend hours trying to figure out how to get the drivers for a Dialogic phone card to work. Eventually, though, you reach a point where your curiosity about the machine gets replaced by a desire to do useful things with the machine, and that's where most development tools fail.

That's my gripe with programming today. I remember "back in the day" when you'd turn a computer on and it would start up in BASIC, as if to say "teach me to do something." Today's computers are really fancy, but if you want to write software, and especially if you want to do something really useful, prepare to spend a lot of time just getting your development tools laid out on the table.

Here's a domain where a company like Google could make a really useful contribution to the trade, not by creating an operating system, but by creating the web generation's answer to BASIC. I mention Google because the tech industry has a history of launching new languages about every ten years. The last big industry push to create a standard language was Java. Since then we've slid backward into a confusing muddle of acronyms and competing languages. Google could set a standard here and, dare I say it, make a lot of headway simply by dusting the rough edges off Python and getting it baked into all sorts of devices.

My vision is to take a mature language like Python and add just two new features to the interpreter. It is a simple change, but it will create a real web operating system--one that is simple to build upon yet infinitely extensible.

Why I'd Start Coding Again

I use Python and Google as the primary examples in this article, although this idea could apply to other languages and commercial services. I picked Python because it is easy to read, powerful, and extensible. I picked Google because it has turned into a Bell Labs for software development and has hired many leading open source developers. Google could make an important contribution to software development, much as Sun did by promoting Java and Microsoft did by promoting Visual Basic.

I first wrote about this idea in Dr. Dobb's Journal in 1998. That article, entitled "Concept Oriented Programming," proposed extending object-oriented programming languages with a DNS-like system that would store reusable code, and that would enable developers to write tiny programs that load libraries as needed from the cloud. My original article is somewhat dated now, so I decided to revisit it in this O'Reilly piece, and to show some specific examples of how a hyperlinked version of Python would look. While this article is not a detailed technical specification, it shows how this could be implemented. (Shortly before publication, I learned of UrlImport, a Python module that implements part of what I describe in this article.)

In plain English, you could write very lightweight programs that would load most of their underlying modules on the fly. Such a program might look like this:

import /widgets/foo.py as=foo
import http://2_0_1.popupdialog.widgets.py.code.google.com as=popup crc=a6771234

x = foo.widget()
x.say("Hello World")

y = popup.widget()
y.say("Well Hello There")

What I've done in this imaginary program is simple. In the first import statement, I load a locally stored library, just like a normal import. In the second import statement, I load a library from a network repository. I instantiate objects from each of these libraries and tell them to print a greeting.

The details of how the networked import command works are less important than what it enables you to do. Properly implemented, it will mean that you can write tiny programs that will grab whatever underlying code they need at runtime. No more bloated distributable packages. No more install scripts. Just send your program and say "run mywidget.pyc, have fun!".
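The import syntax above is imaginary, but its shape is easy to pin down. Here is a minimal sketch of how a runtime might parse such a directive into a source, an alias, and a dictionary of remaining options such as `crc=`. `ImportDirective` and `parse_import_directive` are hypothetical names, and defaulting the alias to the file name when `as=` is missing is an assumption, not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ImportDirective:
    source: str                                   # local path or URL
    alias: str                                    # value of as=
    options: dict = field(default_factory=dict)   # crc=, type=, runonce=, ...


def parse_import_directive(line: str) -> ImportDirective:
    """Parse the hypothetical 'import <source> key=value ...' syntax."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "import":
        raise SyntaxError(f"not an import directive: {line!r}")
    source = tokens[1]
    options = {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if not sep or not value:
            raise SyntaxError(f"expected key=value, got {token!r}")
        options[key] = value
    # Assumption: without as=, use the last path component minus any ".py" suffix.
    default_alias = source.rstrip("/").rsplit("/", 1)[-1].removesuffix(".py")
    alias = options.pop("as", default_alias)
    return ImportDirective(source, alias, options)


if __name__ == "__main__":
    print(parse_import_directive(
        "import http://2_0_1.popupdialog.widgets.py.code.google.com as=popup crc=a6771234"))
```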

Baked into the Box

One of the goals of this manifesto, or whatever you want to call this, is to have a networked programming language baked into every box. The runtime environment will be built in, much as Python is preloaded on Mac OS X.

With most languages, the standard library becomes ever more bloated with each new release. With this technique, we can radically downsize the runtime environment. All we need is the interpreter (updated to recognize network imports and a couple of other functions). It can get everything else it needs from networked repositories.

The goal is to make the runtime environment so small that it can be a part of any computing device.

Batteries Not Needed

Python's "Batteries Included" philosophy enables developers to do quite a lot without venturing beyond the standard libraries. With a fully networked language, we no longer need to distribute a large standard library with the runtime environment or applications. Everything you need will be stored in the public network.

The distinction between the standard library and extras developed by other authors will also go away. With today's languages, it is usually best to stay with the standard libraries if at all possible, to avoid headaches when you share the application with other users, to install it on other machines, etc. Distribution and installation headaches can be a powerful disincentive to use third-party code.

With a networked language, there will be no build or install process. You'll just share an application and it will get what it needs on the fly.

DNS for Objects

While the networked import function could load from a simple HTTP file store or similar service (this will work fine), we could take this a step further to create a DNS namespace for code. This section is a bit of a sidebar, so if it seems too forward-looking or distracting, skip forward to the next section.

We could create a .code (or .turing) top-level domain. From there, we could create a DNS namespace for objects that works much like the namespace for Internet hosts. The second-level domains would be reserved for major language branches, for example, *.py.turing. Third-level domains would go to anyone who wanted to host a code repository, while a std subdomain would be reserved for official releases and libraries. (For example, the official standard Python libraries would live in std.py.turing.)

Using this system would be as easy as including an import statement in a program, and might read as:

import http://2_0_1.doodad.widgetcorp.py.turing as=doodad crc=aa123456

The runtime engine, when it sees an import statement like this, will do a standard DNS lookup and go grab the file from the appropriate server, just as in the earlier example. The idea here is to create a persistent address space for reusable code.
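Under this scheme a module's hostname carries its version, name, publisher, and language. The sketch below splits a name such as `2_0_1.doodad.widgetcorp.py.turing` into those parts, assuming the five-label layout described above. `CodeName` and `parse_code_name` are hypothetical; the lookup itself would be an ordinary DNS query (for example via `socket.getaddrinfo`), which is not shown because no `.turing` domain exists.

```python
from typing import NamedTuple, Tuple


class CodeName(NamedTuple):
    version: Tuple[int, ...]  # e.g. (2, 0, 1)
    module: str               # e.g. "doodad"
    publisher: str            # e.g. "widgetcorp" (or "std" for official releases)
    language: str             # e.g. "py"
    tld: str                  # e.g. "turing"


def parse_code_name(hostname: str) -> CodeName:
    """Split a hypothetical version.module.publisher.language.tld hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) != 5:
        raise ValueError(f"expected version.module.publisher.language.tld, got {hostname!r}")
    version_label, module, publisher, language, tld = labels
    parts = version_label.split("_")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"bad version label {version_label!r}")
    return CodeName(tuple(int(p) for p in parts), module, publisher, language, tld)


if __name__ == "__main__":
    print(parse_code_name("2_0_1.doodad.widgetcorp.py.turing"))
```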

This may or may not happen, and in any case, the import statement can pull modules from any domain. Commercial organizations will likely host early implementations of this, but it's good to know that we can move in this direction if the idea catches on.

Security

One of the primary concerns people raise about hyperlinked code is that a small program could call libraries that in turn call others that contain malicious instructions. If this system is built incorrectly, this is a real risk, but if it is done right, hyperlinked programs will actually be more secure than programs that are distributed as a single package.

While a detailed discussion of security is beyond the scope of this article, there are two measures we can use to make this a trustworthy system: a validation mechanism that enables a runtime environment to know that a module it fetched from a repository has the same checksum or certificate as the author intended, and a trusted code repository.

In the examples in this article, I imagine that the system uses a simple validation technique, a 32-bit checksum (CRC), to detect changes. An author would include the checksums in the program's main module. At runtime, a module swapped out in one of the code repositories would cause a hyperlinked import to fail because the checksum for the replaced module would be different. I know there are more secure ways to do this, such as cryptographic hashes or signed certificates, but I wanted to use a simple example.
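Python's standard library already provides a CRC-32 of this kind. A minimal sketch of the check, assuming the `crc=` value is the 8-digit hexadecimal CRC-32 of the module's source bytes (`verify_crc32` and `ChecksumError` are hypothetical names):

```python
import zlib


class ChecksumError(ImportError):
    """Raised when fetched module source does not match the expected checksum."""


def verify_crc32(data: bytes, expected_hex: str) -> None:
    """Compare the CRC-32 of data with an 8-digit hex value such as 'a6771234'."""
    actual = zlib.crc32(data) & 0xFFFFFFFF  # Python 3 already returns an unsigned value
    if actual != int(expected_hex, 16):
        raise ChecksumError(f"CRC mismatch: expected {expected_hex}, got {actual:08x}")


if __name__ == "__main__":
    source = b"The quick brown fox jumps over the lazy dog"
    verify_crc32(source, "414fa339")   # passes silently
```

A CRC-32 reliably catches accidental corruption, but anyone who controls a repository can craft a replacement module with a matching CRC, so a production system would more likely compare a cryptographic digest such as `hashlib.sha256(data).hexdigest()`.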

The second and more important security measure is a trusted code repository. One of its jobs is to remove modules that are flagged as malicious, hopefully before they can inflict damage. The ability to disable modules will help to halt the spread of malicious code and to disable the parts of a program that are dangerous. Imagine, for example, a simple Trojan program that invokes an apparently harmless module that deletes files at a certain date. If this module is identified and reported before that time, the code repository can disable it and flag it as harmful. When D-Day arrives, the runtime environment either cannot retrieve this module or learns that it has been flagged and refuses to run it. Even if millions of people have downloaded this program, the harmful portion will be defunct. With a conventional program distributed as a single package, there isn't an efficient way to recall harmful components once they're in the wild.
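The recall mechanism amounts to one extra check before a module runs. In this sketch the runtime asks the trusted repository for its list of flagged modules on every load and refuses anything on it. Both hooks, `fetch_revocations` and `load`, are hypothetical stand-ins for repository and loader code that the article does not specify.

```python
from typing import Callable, Mapping


class ModuleRevokedError(ImportError):
    """Raised when the trusted repository has flagged a module as harmful."""


def guarded_load(url: str,
                 fetch_revocations: Callable[[], Mapping[str, str]],
                 load: Callable[[str], object]) -> object:
    """Consult the repository's revocation list before loading a module.

    fetch_revocations() returns {module_url: reason}; load(url) does the
    actual fetch-and-import. Both are hypothetical hooks.
    """
    revoked = fetch_revocations()  # asked on every run, so a later flag still takes effect
    if url in revoked:
        raise ModuleRevokedError(f"{url} was flagged by the repository: {revoked[url]}")
    return load(url)
```

Because the list is consulted at load time rather than at download time, a module flagged after millions of copies of the main program have spread is still blocked on the next run.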

One of the reasons I chose to use Google as an example in this article is that it is uniquely qualified to address these issues, having both the intellectual and technical resources required to build a system that is reliable, trusted, and also easy to deal with. The security issues surrounding hyperlinked code aren't trivial, but they are all solvable, and if this is done right, it will be a more efficient way to build and distribute software.

Dynamically Loaded Modules

Unless you need to talk to a low-level hardware API or do something especially CPU intensive, you can do quite a lot without leaving Python. At least, that's been my experience. Where possible, I always tried to write programs without going outside of the core libraries, mainly so that I could distribute the programs onto other computers without running into a rat's nest of configuration and admin issues.

With this system, it will be easy for developers to share code and for people to use shared code without creating a lot of version control and distribution headaches for themselves. Sharing a module will be as easy as uploading the module to a trusted repository and then referencing it in a program. This will look something like the following:

wellhithere.py

import http://2_0_1.wellhithere.widgets.py.code.google.com as=wellhithere crc=aa712345

x = wellhithere.widget()
x.say("It's certainly nice to see you")

As you might have guessed, this fetches a copy of wellhithere.py (version 2.0.1) and refers to it locally as wellhithere. The program instantiates a widget from this module and tells it to say something. It's pretty basic stuff, with the twist that the import directive makes it easy to load external libraries on the fly. The crc option allows the runtime to verify the module's checksum.

When executing this application, the runtime engine will use a procedure like this one:

This can be done in such a way that indicating a local or networked source becomes simple. An import path such as /widgets/helloworld indicates a local file path. A path such as http://helloworld.foocorp.com points to a network file store. There could be other syntax, of course, but this is one example of an easy way to do this.
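The local-versus-network rule above, together with the loading step, can be approximated in today's Python. This sketch treats any `http://` or `https://` path as a network source and everything else as a local file, then executes the source in a fresh module object. `read_source` and `load_module` are hypothetical names; the sketch skips caching and does not register the module in `sys.modules`, and a fuller implementation would more likely hook into `importlib` through a finder on `sys.meta_path`.

```python
import types
import urllib.request
from pathlib import Path


def read_source(path: str) -> bytes:
    """Treat http(s):// paths as network sources and anything else as a local file."""
    if path.startswith(("http://", "https://")):
        with urllib.request.urlopen(path, timeout=10) as response:
            return response.read()
    return Path(path).read_bytes()


def load_module(name: str, source: bytes, origin: str) -> types.ModuleType:
    """Execute source in a fresh module object, roughly as an import would."""
    module = types.ModuleType(name)
    module.__file__ = origin
    code = compile(source, origin, "exec")  # origin appears in tracebacks
    exec(code, module.__dict__)
    return module
```

In a hyperlinked runtime, the checksum check would sit between `read_source()` and `load_module()`, so that altered source is never executed.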

Dynamically Loaded Binaries

Like many languages, Python allows you to talk to external binaries, typically written in C or C++, either to reach hardware APIs or to speed up especially CPU-intensive tasks. The issue with these, of course, is that it is difficult and often impossible to make this type of code portable, because the API the C program talks to (a telecom interface, for example) does not exist on many platforms.

For this type of module, how about a slightly different syntax to tell the runtime environment to load an external binary? This also requires changes to the runtime environment, which I'll discuss later. In our hypothetical language, we'd write:

import http://1_0_9.fastfft.widgets.code.foocorp.com type=bin runonce=n as=fastfft

x = fastfft()
fft = x.transform(some_pcm_audio)

This short example adds two parameters to the import statement, type=bin and runonce=y/n. This tells the runtime engine to fetch and launch a compiled binary that will process data from the application, not unlike a DLL. The runonce parameter tells the runtime engine whether it can launch multiple instances of the binary or only one.
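The `runonce` flag boils down to bookkeeping inside the runtime. A minimal sketch, assuming `runonce=y` means every import of the same URL shares one helper process and `runonce=n` means each import gets its own. `BinaryRegistry` is hypothetical, and the caller-supplied `launch` hook stands in for the platform-specific code that would actually start the binary.

```python
from typing import Callable, Dict, List


class BinaryRegistry:
    """Hypothetical runtime bookkeeping for 'type=bin' imports."""

    def __init__(self, launch: Callable[[str], object]):
        self._launch = launch                  # starts the binary, returns a handle
        self._shared: Dict[str, object] = {}   # url -> handle, for runonce=y
        self.instances: List[object] = []      # every handle launched so far

    def get(self, url: str, runonce: str) -> object:
        if runonce not in ("y", "n"):
            raise ValueError(f"runonce must be 'y' or 'n', got {runonce!r}")
        if runonce == "y" and url in self._shared:
            return self._shared[url]           # reuse the single shared instance
        handle = self._launch(url)
        self.instances.append(handle)
        if runonce == "y":
            self._shared[url] = handle
        return handle
```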

The runtime interpreter hides the messiness of this process from the Python application and does something like this:

  1. Try to fetch a copy of the binary for the target platform; throw an exception if a problem occurs (for example, file not found, CRC error, etc.).
  2. Launch an instance of the binary and tell it to talk to the runtime environment via interprocess communication (for example, a localhost TCP socket).
  3. Pass data to/from the Python app via the interprocess communication interface.

This is a simple and fairly clean way to make it easy for Python apps to use external binaries. This itself is not news. A voice-scripting language I used years ago did something similar to this, minus the dynamic load and binding trick. The goal here is to make it easy to talk to binaries, and to do so in a way that does not require the user to run a build command before running the application.

In the example above, I want to use a C program that performs a fast Fourier transform on a segment of audio data. This is a computationally intensive task, so it'll be a lot faster in compiled C than in an interpreted language such as Python. In this framework, the runtime engine launches an instance of fastfft.exe in the background and tells fastfft.exe to talk to the engine at localhost:nnnn.

From my perspective, I am just talking to something that looks like any other object. When I invoke a method, the runtime engine sends a message to the external application via IPC; the external application does its thing and sends a response back, which the runtime engine returns to my application. Simple.

Again, the details of how this works behind the scenes are not so important. I use TCP via localhost as an example, mainly because any networked appliance will, by definition, be able to talk via a localhost socket. In a real-world version of this, C and C++ developers will have a wrapper library that provides a simple interface in and out of their programs. The key requirement here is to eliminate the need to preload external modules or to run a build operation before running the program. It's worth losing a little bit of performance to gain this flexibility.

Built-In Version Control

You may have noticed that this system has a built-in form of version control. Code repositories must explicitly create a separate path for every version of a module. Likewise, developers would be required, or at least strongly encouraged, to explicitly refer to a unique version at runtime.
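Because each version lives at its own immutable path, a runtime can cache fetched modules by URL and never needs to check a cached copy for updates. The sketch below takes the strict reading of the rule and rejects URLs that do not start with a version label such as `2_0_1`; `VersionedCache` and the injected `fetch` function are hypothetical.

```python
import re
from typing import Callable, Dict

# Assumed convention from the examples: the first hostname label is the version, e.g. 2_0_1.
_VERSIONED_URL = re.compile(r"^https?://\d+(?:_\d+)*\.")


class VersionedCache:
    """Caches module source by URL; safe only because versioned paths never change."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if not _VERSIONED_URL.match(url):
            raise ValueError(f"unpinned module URL (no version label): {url}")
        if url not in self._store:
            self._store[url] = self._fetch(url)  # first use: go to the repository
        return self._store[url]                  # later uses: no network traffic
```

A more lenient runtime, following the "strongly encouraged" reading, could instead fetch unpinned URLs every time and simply never cache them.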

Bootstrapping to a Web OS

This is a simple extension of a proven language. At the very least, it will make building and sharing applications easier, though it has potential beyond that.

Imagine a minimal machine that has the basic guts you need in a computer: storage, I/O, and a network connection. It has a lightweight, built-in interpreter and is designed to run only Python apps. In effect, it's a Python computer and operating system.

When you first take this computer home, it boots up with a picture of a snake and a command line. You type the name of a program you want to run. Any program. Maybe you want to run an MP3 player, so you type "PyMP3".

Your computer is brand new, and you don't have that program yet, so behind the scenes it tells itself to:

import http://pymp3.code.widgetcorp.com

It then runs this program, and because it probably contains import statements of its own, automatically works through the underlying packages it needs to run. This all happens in the background, and after a short while, the program runs, as if it had been on your machine all along.
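Working through the underlying packages is a graph traversal: fetch a module, read its import directives, and repeat for each dependency. This depth-first sketch returns every module reachable from the program in an order where dependencies come first, fetches each module only once, and, as a design choice of the sketch rather than of Python itself, rejects circular imports. `imports_of` is a hypothetical hook that fetches a module and returns the URLs it imports.

```python
from typing import Callable, List, Set


def resolve_dependencies(root: str, imports_of: Callable[[str], List[str]]) -> List[str]:
    """Return every module reachable from root, dependencies before dependents."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(url: str) -> None:
        if url in done:
            return                      # already fetched and ordered
        if url in in_progress:
            raise ImportError(f"circular import involving {url}")
        in_progress.add(url)
        for dep in imports_of(url):     # hypothetical: fetch url, list its imports
            visit(dep)
        in_progress.remove(url)
        done.add(url)
        order.append(url)               # all of url's dependencies are already in order

    visit(root)
    return order
```

For a player that imports a decoder and a user interface which both import the same FFT module, the FFT module is fetched once and appears first in the load order.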

While many companies could do this, Google is in a unique position to make a system like this a reality. With its data center infrastructure and world-class software engineers, it can easily fund a project to make the necessary modifications to the Python interpreter (as well as other languages if desired) and to operate a trusted code repository--the two key ingredients required to build this system.

With the right sponsorship, a system like this could lead to major changes in the way developers build and share software. It could do so at a surprisingly modest cost, because this system requires only straightforward modifications to existing and widely used programming tools. Just as hyperlinking was a straightforward enhancement to document markup languages that enabled the development of the World Wide Web, hyperlinked source code will bring similar benefits to software development.

Brian McConnell is an inventor, author, and serial telecom entrepreneur. He has founded three telecom startups since moving to California. The most recent, Open Communication Systems, designs cutting-edge telecom applications based on open standards telephony technology.



Copyright © 2009 O'Reilly Media, Inc.