Python DevCenter


Beginning Python for Bioinformatics

Python Lists

Python strings can contain only characters, but Python lists have no such limitation: a list is an ordered sequence of arbitrary Python objects, including other lists. Also, unlike a string, a list can be changed after it is created, so you can insert, delete, and replace its elements. Lists are written as a series of objects, separated by commas, inside square brackets. Let's look at some lists, and some operations you can perform on them.



>>> bases = ['A', 'C', 'G', 'T'] 
>>> bases 
['A', 'C', 'G', 'T'] 
>>> bases.append('U') 
>>> bases 
['A', 'C', 'G', 'T', 'U'] 
>>> bases.reverse() 
>>> bases 
['U', 'T', 'G', 'C', 'A'] 
>>> bases[0] 
'U' 
>>> bases[1] 
'T' 
>>> bases.remove('U') 
>>> bases 
['T', 'G', 'C', 'A'] 
>>> bases.sort() 
>>> bases 
['A', 'C', 'G', 'T']

In this example we created a list of single characters called bases. Then we added an element to the end, reversed the order of all the elements, retrieved elements by their index position, removed an element with the value 'U', and sorted the elements. Removing an element illustrates a case where a method needs an additional piece of information, namely the value that we want removed from the list. As you can see in the picture below, PyCrust takes advantage of Python's ability to describe its own objects, displaying what most operations require in a call tip pop-up window.


A tooltip showing usage of the 'remove' method.
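The session above doesn't show two things mentioned at the start of this section: nesting one list inside another, and inserting, deleting, or replacing elements at a given position. It also hides one detail of remove(). Here is a minimal sketch, written as a Python 3 script rather than a PyCrust session; the names record and seq are illustrative only, and 'N' is just a placeholder element.

```python
# A list can hold objects of any type, including another list.
record = ['gene1', 22, ['A', 'C', 'G', 'T']]
print(record[2][1])  # second item of the nested list: C

bases = ['A', 'C', 'G', 'T']
bases[3] = 'U'         # replace: assign a new value to an index position
bases.insert(0, 'N')   # insert 'N' before index 0
del bases[0]           # delete the element at index 0
print(bases)           # ['A', 'C', 'G', 'U']

# remove() deletes only the FIRST element equal to its argument...
seq = ['A', 'T', 'A', 'T']
seq.remove('T')
print(seq)             # ['A', 'A', 'T']

# ...and raises ValueError when no element matches.
try:
    seq.remove('U')
except ValueError:
    print("'U' is not in the list")
```

Because remove() raises an error instead of silently doing nothing, code that isn't sure a value is present can check first with the in operator, as in `if 'U' in seq: seq.remove('U')`.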

We've talked about objects having methods, such as the remove() method of a list object, and how a method performs a task and, perhaps, returns a result. Python has another very similar feature, called a function. About the only difference between a function and a method is that a function isn't associated with a particular object.

Note: Whether something should be defined as a function or a method is, in part, a design choice. In fact, we're going to create several functions below and then redefine them as methods to demonstrate Python's support for object-oriented programming.
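Python's built-in sorted() function and the list sort() method make the difference concrete, since they do similar work. The function isn't attached to any object: it takes the list as an argument and returns a new list. The method belongs to a particular list object and changes that object. A short sketch:

```python
bases = ['T', 'G', 'C', 'A']

# sorted() is a built-in function: the list is passed in as an argument,
# and a new, sorted list is returned. The original is left alone.
in_order = sorted(bases)
print(in_order)  # ['A', 'C', 'G', 'T']
print(bases)     # ['T', 'G', 'C', 'A']

# sort() is a method of this particular list object: it reorders the
# list in place and returns None.
result = bases.sort()
print(bases)     # ['A', 'C', 'G', 'T']
print(result)    # None
```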

Python Functions

A function performs an operation on the values passed to it and, usually, returns a result. Python comes with many predefined functions, and you can also define your own. Let's look at a few of the built-in functions: len() returns the number of items in a sequence; dir() returns a list of strings naming the attributes of an object; list() returns a new list initialized from the items of another sequence.

>>> dna = 'CTGACCACTTTACGAGGTTAGC' 
>>> bases = ['A', 'C', 'G', 'T'] 
>>> len(dna) 
22 
>>> len(bases) 
4 
>>> dir(dna) 
['__add__', '__class__', '__contains__', '__delattr__',  
'__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',  
'__getslice__', '__gt__', '__hash__', '__init__', '__le__',  
'__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__',  
'__repr__', '__rmul__', '__setattr__', '__str__', 'capitalize',  
'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs',  
'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower',  
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',  
'lstrip', 'replace', 'rfind', 'rindex', 'rjust', 'rstrip', 'split',  
'splitlines', 'startswith', 'strip', 'swapcase', 'title',  
'translate', 'upper'] 
>>> dir(bases) 
['__add__', '__class__', '__contains__', '__delattr__',  
'__delitem__', '__delslice__', '__doc__', '__eq__', '__ge__',  
'__getattribute__', '__getitem__', '__getslice__', '__gt__',  
'__hash__', '__iadd__', '__imul__', '__init__', '__le__', '__len__',  
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__repr__',  
'__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__',  
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',  
'reverse', 'sort'] 
>>> list(dna) 
['C', 'T', 'G', 'A', 'C', 'C', 'A', 'C', 'T', 'T', 'T',  
'A', 'C', 'G', 'A', 'G', 'G', 'T', 'T', 'A', 'G', 'C'] 
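The dir() listings above are long mostly because of the names that begin with underscores, which are special methods Python uses internally. Since dir() returns an ordinary list of strings, we can filter it like any other list. A sketch that keeps only the ordinary methods of a list:

```python
bases = ['A', 'C', 'G', 'T']

# Keep only attribute names that don't start with an underscore.
public = [name for name in dir(bases) if not name.startswith('_')]
print(public)
```

The exact result depends on the Python version. Under Python 3, for example, it includes clear and copy, which the older listing above doesn't have.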

Next, we're going to define some functions of our own that will perform useful operations on biological sequence data.

User-defined Functions

Here is how to create your own function in Python. The first line begins with the keyword def, followed by the name of the function and any arguments (expected input values) in parentheses, and ends with a colon. The subsequent lines make up the body of the function and must be indented. If a string literal appears as the first line of the body, it becomes the function's documentation string, or docstring. A return statement sends a result back to the caller; in the functions below it is the last line of the body.
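Two of these rules are easy to check directly: the docstring is stored on the function object as its __doc__ attribute, and a function with no return statement hands back None. A sketch using two hypothetical functions, gc_count() and show(), that aren't part of this article's code:

```python
def gc_count(dna):
    """Return the number of G and C bases in a DNA string."""
    return dna.count('G') + dna.count('C')

def show(dna):
    """Print a DNA string. There is no return statement."""
    print(dna)

print(gc_count.__doc__)                    # the docstring, kept on the function
print(gc_count('CTGACCACTTTACGAGGTTAGC'))  # 11
print(show('ACGT'))                        # prints ACGT, then None
```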

Let's define some functions in the PyCrust shell. Then we can try each function with some sample data and see the result returned by the function.

>>> def transcribe(dna): 
...     """Return dna string as rna string.""" 
...     return dna.replace('T', 'U') 
...      
>>> transcribe('CCGGAAGAGCTTACTTAG') 
'CCGGAAGAGCUUACUUAG' 

In this example we created a function called transcribe that expects a string representing a DNA sequence. Strings have a replace() method that returns a copy of the original string with every occurrence of one substring replaced by another; here, each 'T' becomes 'U'. In three lines of code we've given ourselves a consistent way to transcribe a string of DNA into RNA. Let's create another function. How about reverse?
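Before moving on, one detail of transcribe() is worth checking. Because replace() returns a copy, the DNA string we pass in is never modified. And because replace() matches characters exactly, lowercase input passes through unchanged. A quick script-style check of the same function:

```python
def transcribe(dna):
    """Return dna string as rna string."""
    return dna.replace('T', 'U')

dna = 'CCGGAAGAGCTTACTTAG'
rna = transcribe(dna)
print(rna)  # CCGGAAGAGCUUACUUAG
print(dna)  # CCGGAAGAGCTTACTTAG: the original string is untouched

# replace() is case-sensitive, so a lowercase 't' is not converted.
print(transcribe('acgt'))  # acgt
```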

>>> def reverse(s): 
...     """Return the sequence string in reverse order.""" 
...     letters = list(s) 
...     letters.reverse() 
...     return ''.join(letters) 
...      
>>> reverse('CCGGAAGAGCTTACTTAG') 
'GATTCATTCGAGAAGGCC' 

There are a few new things in this function that need explanation. First, we've used an argument name of "s" instead of "dna". You can give arguments whatever names you like in Python, but it is something of a convention to use short names based on their expected value or meaning, so "s" for string is fairly common in Python code. The other reason to use "s" instead of "dna" here is that this function works correctly on any string, not just strings representing DNA sequences, so "s" better reflects the function's generic utility.

You can see that the reverse function takes in a string, creates a list of its characters, and reverses the order of the list. Now we need to put the list back together so that we can return a string. Python string objects have a join() method that combines the elements of a list into a single string, placing the string that join() is called on between each pair of elements. Since we don't want any separator, we call join() on an empty string, written as two quotes ('' or "").
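Python's slice notation offers a shorter route to the same result: a slice with a step of -1 walks the string backwards and builds the reversed copy directly, with no intermediate list. The sketch below compares it with the reverse() function defined above, shows join() with and without a separator, and confirms that nothing in reverse() is specific to DNA.

```python
def reverse(s):
    """Return the sequence string in reverse order."""
    letters = list(s)
    letters.reverse()
    return ''.join(letters)

seq = 'CCGGAAGAGCTTACTTAG'
print(reverse(seq))                # GATTCATTCGAGAAGGCC
print(seq[::-1] == reverse(seq))   # True: the slice gives the same result

# join() puts the string it is called on between the list elements.
print('-'.join(['A', 'C', 'G']))   # A-C-G
print(''.join(['A', 'C', 'G']))    # ACG

# reverse() works on any string, not just DNA.
print(reverse('stressed'))         # desserts
```

The list-based version in the article spells out each step, which is useful while learning. The slice is the more common idiom in everyday Python code.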

In order to calculate the complement of a DNA sequence, we need a way to map each of the four bases to its complement. For that, we'll use another Python data structure called a dictionary.
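As a preview, here is a minimal sketch of the idea. It isn't necessarily how this article goes on to define it, and the names complement_of and complement are hypothetical. A dictionary is written as key: value pairs inside curly braces, and looking up a key returns its value. We can build the complementary strand by looking up each base in turn and joining the results.

```python
# Hypothetical name: a dictionary mapping each DNA base to its complement.
complement_of = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

print(complement_of['G'])  # C

def complement(dna):
    """Return the complementary strand of a DNA string (sketch)."""
    # Look up each base; any character that isn't a key raises KeyError.
    return ''.join([complement_of[base] for base in dna])

print(complement('CCGGAAGAGCTTACTTAG'))  # GGCCTTCTCGAATGAATC
```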
