15 Collections and Looping: Lists and for
A list, as its name implies, is a list of data (integers, floats, strings, Booleans, or even other lists or more complicated data types). Python lists are similar to arrays or vectors in other languages. Like letters in strings, elements of a list are indexed starting at 0 using [] syntax. We can also use brackets to create a list with a few elements of different types, though in practice we won’t do this often.
Just like with strings, we can use [:] notation to get a sublist “slice,” and we can use the len() function to get the length of a list.
Unlike strings, though, lists are mutable, meaning we can modify them after they’ve been created, for example, by replacing an element with another element. As mentioned above, lists can even contain other lists!
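The operations described so far can be sketched in a few lines. The element values here are illustrative assumptions, not data from the text.

```python
# A list holding elements of several types (illustrative values).
a_list = [1, 2.4, "CYP6B", True, [1, 2]]

print(a_list[0])      # first element: 1
print(a_list[1:3])    # slice of indices 1 and 2: [2.4, 'CYP6B']
print(len(a_list))    # 5

# Lists are mutable: replace the element at index 2.
a_list[2] = "CYP6B2"
print(a_list)         # [1, 2.4, 'CYP6B2', True, [1, 2]]

# The last element is itself a list, so it can be indexed again.
print(a_list[4][0])   # 1
```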
We will typically want our code to create an empty list, and then add data elements to it one element at a time. An empty list is returned by calling the
list() function with no parameters. Given a variable which references a list object, we can append an element to the end using the
.append() method, giving the method the element we want to append as a parameter.
This syntax might seem a bit odd compared to what we’ve seen so far. Here new_list.append("G") tells the list object referenced by the new_list variable to run its .append() method, passing the string "G" as a parameter. We’ll explore the concepts of objects and methods more formally in later chapters. For now, consider the list not just a collection of data, but a “smart” object with which we can interact using . methods.
Note that the
.append() method asks the list to modify itself (which it can do, because lists are mutable), but this operation doesn’t return anything of use.
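As a minimal sketch of building a list one element at a time (the appended strings are illustrative):

```python
new_list = list()        # an empty list
new_list.append("A")     # the list modifies itself in place
new_list.append("G")
print(new_list)          # ['A', 'G']
print(len(new_list))     # 2
```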
This type of command opens up the possibility for some insidious bugs; for example, a line like
new_list = new_list.append("C") looks innocent enough and causes no immediate error, but it is probably not what the programmer intended. The reason is that the
new_list.append("C") call successfully asks the list to modify itself, but then the
None value is returned, which would be assigned to the
new_list variable with the assignment. At the end of the line,
new_list will refer to
None, and the list itself will no longer be accessible. (In fact, it will be garbage collected in due time.) In short, use some_list.append(el), not some_list = some_list.append(el).
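The pitfall is easy to reproduce. In this sketch the names result and buggy are hypothetical, chosen only for illustration:

```python
new_list = ["A", "G"]
result = new_list.append("C")   # the list is modified in place...
print(new_list)                 # ['A', 'G', 'C']
print(result)                   # ...but the call itself returns None

# The buggy pattern: the variable ends up referring to None.
buggy = ["A", "G"]
buggy = buggy.append("C")
print(buggy)                    # None
```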
We often want to sort lists, which we can do in two ways. First, we could use the
sorted() function, which takes a list as a parameter and returns a new copy of the list in sorted order, leaving the original alone. Alternatively, we could call a list’s .sort() method to ask a list to sort itself in place.
As with the .append() method above, the .sort() method returns None, so the following will almost surely result in a bug: a_list = a_list.sort().
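A short sketch contrasting the two approaches, using an illustrative list of integers:

```python
a_list = [5, 2, 9, 1]

# sorted() returns a new sorted list and leaves the original alone.
sorted_copy = sorted(a_list)
print(sorted_copy)   # [1, 2, 5, 9]
print(a_list)        # [5, 2, 9, 1]

# .sort() sorts the list in place and returns None.
returned = a_list.sort()
print(a_list)        # [1, 2, 5, 9]
print(returned)      # None
```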
At this point, one would be forgiven for thinking that
. methods always return
None and so assignment based on the results isn’t useful. But before we move on from lists, let’s introduce a simple way to split a string up into a list of substrings, using the
.split() method on a string data type. For example, let’s split up a string wherever the subsequence "TA" occurs.
If the sequence was instead
"CGCGTATACAGA", the resulting list would have contained
["CGCG", "", "CAGA"] (that is, one of the elements would be a zero-length empty string). This example illustrates that strings, like lists, are also “smart” objects with which we can interact using
. methods. (In fact, so are integers, floats, and all other Python types that we’ll cover.)
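As a sketch of .split() in action, the following assumes the separator is "TA", which is consistent with the result quoted above. The first sequence is an illustrative assumption.

```python
seq = "CGCGTAGTACAGA"            # illustrative sequence
subs_list = seq.split("TA")      # returns a new list of substrings
print(subs_list)                 # ['CGCG', 'G', 'CAGA']

# Two adjacent separators leave an empty string between them.
seq2 = "CGCGTATACAGA"
subs_list2 = seq2.split("TA")
print(subs_list2)                # ['CGCG', '', 'CAGA']
```

Unlike .append() and .sort(), .split() returns something useful, so assigning its result to a variable is exactly what we want.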
Tuples (Immutable Lists)
As noted above, lists are mutable, meaning they can be altered after their creation. In some special cases, it is helpful to create an immutable version of a list, called a “tuple” in Python. Like lists, tuples can be created in two ways: with the
tuple() function (which returns an empty tuple) or directly.
Tuples work much like lists: we can call len() on them and extract elements or slices with [] syntax. We can’t, however, change, remove, or insert elements.
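A brief sketch of tuple behavior, with illustrative values. The try/except construct is used here only to show the error without stopping the program; it is an assumption of this example rather than something covered so far.

```python
empty_tuple = tuple()                 # an empty tuple
a_tuple = (1, 2.4, "CYP6B")           # a tuple created directly

print(len(a_tuple))                   # 3
print(a_tuple[0])                     # 1
print(a_tuple[1:3])                   # (2.4, 'CYP6B')

# Attempting to modify a tuple raises a TypeError.
try:
    a_tuple[0] = 5
except TypeError as err:
    print("Cannot modify a tuple: " + str(err))
```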
Looping with for
A for-loop in Python executes a block of code, once for each element of an iterable data type: one which can be accessed one element at a time, in order. As it turns out, both strings and lists are such iterable types in Python, though for now we’ll explore only iterating over lists with for-loops.
A block is a set of lines of code that are grouped as a unit; in many cases they are executed as a unit as well, perhaps more than one time. Blocks in Python are indicated by being indented an additional level (usually with four spaces—remember to be consistent with this indentation practice).
When using a for-loop to iterate over a list, we need to specify a variable name that will reference each element of the list in turn.
In the above, one line is indented an additional level just below the line defining the for-loop. In the for-loop, the
gene_id variable is set to reference each element of the
gene_ids list in turn. Here’s the output of the loop:
Using for-loops in Python often confuses beginners, because a variable (e.g.,
gene_id) is being assigned without using the standard
= assignment operator. If it helps, you can think of the first loop through the block as executing gene_id = gene_ids[0], the next time around as executing gene_id = gene_ids[1], and so on, until all elements of gene_ids have been used.
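A minimal sketch of such a loop follows. The three gene IDs are hypothetical values chosen for illustration.

```python
gene_ids = ["CYP6B", "AGP4", "CATB"]   # hypothetical gene IDs

for gene_id in gene_ids:
    # This indented line is the block; it runs once per element.
    print("gene_id is " + gene_id)

# Output:
# gene_id is CYP6B
# gene_id is AGP4
# gene_id is CATB
```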
Blocks may contain multiple lines (including blank lines) so that multiple lines of code can work together. Here’s a modified loop that keeps a
counter variable, incrementing it by one each time.
The output of this loop would be the same as the output above, with an additional line printing
3 (the contents of
counter after the loop ends).
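A loop of this kind might look like the following sketch, again assuming a hypothetical list of three gene IDs:

```python
gene_ids = ["CYP6B", "AGP4", "CATB"]   # hypothetical gene IDs
counter = 0

for gene_id in gene_ids:
    print("gene_id is " + gene_id)

    counter = counter + 1              # still inside the block

# Not indented, so this runs once, after the loop ends.
print(counter)                         # 3
```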
Some common errors when using block structures in Python include the following, many of which will result in an IndentationError:
- Not using the same number of spaces for each indentation level, or mixing tab indentation with multiple-space indentation. (Most Python programmers prefer using four spaces per level.)
- Forgetting the colon : that ends the line before the block.
- Using something like a for-loop line that requires a block, but not indenting the next line.
- Needlessly indenting (creating a block) without a corresponding for-loop definition line.
We often want to loop over a range of integers. Conveniently, the range() function returns a list of numbers (in Python 3, a memory-efficient range object that can be looped over in the same way). It commonly takes two parameters: (1) the starting integer (inclusive) and (2) the ending integer (exclusive). Thus we could program our for-loop slightly differently by generating a list of integers to use as indices, and iterating over that:
The output of one of the loops above:
The second example above illustrates the rationale behind the inclusive/exclusive nature of the
range() function: because indices start at zero and go to one less than the length of the list, we can use
range(0, len(ids)) (as opposed to needing to modify the ending index) to properly iterate over the indices of
ids without first knowing the length of the list. Seasoned programmers generally find this intuitive, but those who are not used to counting from zero may need some practice. You should study these examples of looping carefully, and try them out. These concepts are often more difficult for beginners, but they are important to learn.
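As a sketch of both styles of range()-based loop, assuming a hypothetical list named ids:

```python
ids = ["CYP6B", "AGP4", "CATB"]        # hypothetical IDs

# range(0, 4) covers 0, 1, 2, 3: start inclusive, end exclusive.
for number in range(0, 4):
    print(number)

# Iterating over indices rather than elements.
indices = list(range(0, len(ids)))     # [0, 1, 2]
for index in indices:
    print(str(index) + ": " + ids[index])

# Output of the second loop:
# 0: CYP6B
# 1: AGP4
# 2: CATB
```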
Loops and the blocks they control can be nested, to powerful effect:
In the above, the outer for-loop controls a block of five lines; contained within is the inner for-loop controlling a block of only two lines. The outer block is principally concerned with the variable i, while the inner block is principally concerned with the variable j. We see that both blocks also make use of variables defined outside them; the inner block makes use of total, i, and j, while lines specific to the outer block make use of total and i (but not j). This is a common pattern we’ll be seeing more often. Can you determine the value of total at the end without running the code?
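For readers who want something to experiment with, here is one possible nested loop with the same shape (an outer block of five lines containing an inner block of two). The ranges and the arithmetic are assumptions made for illustration, and the final value is left for you to work out.

```python
total = 0

for i in range(0, 3):
    print("i is " + str(i))
    for j in range(0, 4):
        print("    j is " + str(j))
        total = total + i * j          # uses total, i, and j
    print("total so far: " + str(total))

print("final total: " + str(total))
```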
Python and a few other languages include specialized shorthand syntax for creating lists from other lists known as list comprehensions. Effectively, this shorthand combines a for-loop syntax and list-creation syntax into a single line.
Here’s a quick example: starting with a list of numbers
[1, 2, 3, 4, 5, 6], we generate a list of squares (
[1, 4, 9, 16, 25, 36]):
Here we’re using a naming convention of
num in nums, but like a for-loop, the looping variable can be named almost anything; for example,
squares = [x ** 2 for x in nums] would accomplish the same task.
List comprehensions can be quite flexible and used in creative ways. Given a list of sequences, we can easily generate a list of lengths.
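The two comprehensions just described can be sketched as follows. The list seqs is a hypothetical example.

```python
nums = [1, 2, 3, 4, 5, 6]
squares = [num ** 2 for num in nums]
print(squares)                         # [1, 4, 9, 16, 25, 36]

seqs = ["TAC", "TC", "CGTA", "TGG"]    # hypothetical sequences
lens = [len(seq) for seq in seqs]
print(lens)                            # [3, 2, 4, 3]
```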
These structures support “conditional inclusion” as well, though we haven’t yet covered operators like ==.
The next example generates a list of 1s for each element where the first base is "T", and then uses the sum() function to sum up the list, resulting in a count of sequences beginning with "T".
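A sketch of conditional inclusion, assuming a hypothetical list of sequences and using == to compare the first base of each:

```python
seqs = ["TAC", "TC", "CGTA", "TGG"]    # hypothetical sequences

# Include a 1 only for sequences whose first base is "T".
ones = [1 for seq in seqs if seq[0] == "T"]
print(ones)                            # [1, 1, 1]

count = sum(ones)
print(count)                           # 3
```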
Although many Python programmers often use list comprehensions, we won’t use them much in this book. Partially, this is because they are a feature that many programming languages don’t have, but also because they can become difficult to read and understand owing to their compactness. As an example, what do you think the following comprehension does?
[x for x in range(2, n) if x not in [j for i in range(2, sqrtn) for j in range(i*2, n, i)]] (Suppose n = 100 and sqrtn = 10. This example also makes use of the fact that range() can take a step argument, as in range(start, stop, step).)
- What is the value of total at the end of each loop set below? First, see if you can compute the answers by hand, and then write and execute the code with some added print() statements to check your answers.
- Suppose we say the first for-loop block above has “depth” 1 and “width” 4, and the second has depth 2 and width 4, and the third has depth 3 and width 4. Can you define an equation that indicates what the total would be for a nested for-loop block with depth d and width w? How does this equation relate to the number of times the interpreter has to execute the line total = total + 1?
- Determine an equation that relates the final value of total below to the value of
- Given a declaration of a sequence string, like
seq = "ATGATAGAGGGATACGGGATAG", and a subsequence of interest, like
subseq = "GATA", write some code that prints all of the locations of that substring in the sequence, one per line, using only the Python concepts we’ve covered so far (such as
len(), for-loops, and
.split()). For the above example, the output should be 3, 11, and 18, one per line.
Your code should still work if the substring occurs at the start or end of the sequence, or if the subsequence occurs back to back (e.g., in
"GATA" occurs at positions 1, 7, and 11). As a hint, you may assume the subsequence is not self-overlapping (e.g., you needn’t worry about locating
"GAGAGAGAGA", which would occur at positions 1, 3, 5, and 7).
- Suppose we have a matrix represented as a list of columns:
cols = [[10, 20, 30, 40], [5, 6, 7, 8], [0.9, 0.10, 0.11, 0.12]]. Because each column is an internal list, this arrangement is said to be in “column-major order.” Write some code that produces the same data in “row-major order”; for example,
[[10, 5, 0.9], [20, 6, 0.10], [30, 7, 0.11], [40, 8, 0.12]]. You can assume that all columns have the same number of elements and that the matrix is at least 2 by 2.
This problem is a bit tricky, but it will help you organize your thoughts around loops and lists. You might start by first determining the number of rows in the data, and then building the “structure” of
rows as a list of empty lists.
- It returns a special data type known as None, which allows for a variable to exist but not reference any data. (Technically, None is a type of data, albeit a very simple one.) None can be used as a type of placeholder, and so in some situations isn’t entirely useless.
- Tuples are a cause of one of the more confusing parts of Python, because they are created by enclosing a list of elements inside of parentheses, but function calls also take parameters listed inside of parentheses, and mathematical expressions are grouped by parentheses, too! Consider the expression (4 + 3) * 2. Is (4 + 3) an integer, or a single-element tuple? Actually, it’s an integer. By default, Python looks for a comma to determine whether a tuple should be created, so (4 + 3) is an integer, while (4 + 3, 8) is a two-element tuple and (4 + 3,) is a single-element tuple. Use parentheses deliberately in Python: either to group mathematical expressions, create tuples, or call functions, where the function name and opening parenthesis are neighboring, as in print(a). Needlessly adding parentheses (and thereby accidentally creating tuples) has been the cause of some difficult-to-find bugs.
- An alternative to range() in Python 2.7 is xrange(), which produces an iterable type that works much like a list of numbers but is more memory efficient. In more recent versions of Python (3.0 and above) the range() function works like xrange(), and xrange() has been removed. Programmers using Python 2.7 may wish to use xrange() for efficiency, but we’ll stick with range() so that our code works with the widest variety of Python versions, even if it sacrifices efficiency in some cases. There is one important difference between range() and xrange() in Python 2.7: range() returns a list, while xrange() returns an iterable type that lacks some features of true lists. For example, nums = range(0, 4) followed by nums[3] = 1000 would result in [0, 1, 2, 1000], while nums = xrange(0, 4) followed by nums[3] = 1000 would produce an error.