The phrase “control flow” refers to the fact that constructs like for-loops change the flow of program execution away from the simple top-to-bottom order. There are several other types of control flow we will cover, two of which are “conditional” in nature.
If-statements allow us to conditionally execute a block of code, depending on a variable referencing a Boolean True or False, or more commonly a condition that returns a Boolean True or False. The syntax is fairly simple, described here with an example.
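As a minimal sketch of that syntax (the variable names and the length cutoffs are made up for illustration):

```python
# Hypothetical example: classify a sequence by its length.
seq = "ACTAGACGTA"
seq_len = len(seq)

if seq_len < 5:
    size = "short"       # runs only if the first condition is True
elif seq_len < 15:
    size = "medium"      # checked only if the first condition was False
else:
    size = "long"        # catchall when no condition above was True

print("Sequence is", size)
```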
All the lines from the starting if to the last line in an elif: or else: block are part of the same logical construct. Such a construct must have exactly one if conditional block, may have one or more elif blocks (they are optional), and may have exactly one catchall else block at the end (also optional). Each conditional is evaluated in order: the first one that evaluates to True will run, and the rest will be skipped. If an else block is present, it will run as a “last resort” if none of the earlier if or elif blocks did.
Just like with for-loops, if-statements can be nested inside of other blocks, and other blocks can occur inside if-statement blocks. Also just like for-loops, Python uses indentation (standard practice is four spaces per indentation level) to indicate block structure, so you will get an error if you needlessly indent (without a corresponding control flow line like for, if, elif, or else) or forget to indent when an indentation is expected.
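As an illustration of nesting, here is a sketch that counts short and long sequences with an if-statement inside a for-loop. The list contents and the cutoff of 8 bases are assumptions chosen for the example.

```python
# Hypothetical list of sequences; the cutoff of 8 is an assumption.
seqs = ["ACGTAGAC", "CAGTAGAGC", "GACGA", "CGATAGG"]

count_short = 0
count_long = 0

for seq in seqs:
    num_bases = len(seq)
    if num_bases < 8:
        count_short = count_short + 1
    else:
        count_long = count_long + 1

print("Number short:", count_short, "number long:", count_long)
```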
The above code would print Number short: 2 number long: 2.
While-loops are less often used (depending on the nature of the programming being done), but they can be invaluable in certain situations and are a basic part of most programming languages. A while-loop executes a block of code so long as a condition remains True. Note that if the condition never becomes False, the block will execute over and over in an “infinite loop.” If the condition is False to begin with, however, the block is skipped entirely.
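A minimal sketch of such a loop, assuming a counter that starts at 0 and a stopping value of 4:

```python
counter = 0
while counter < 4:
    print("Counter is now: " + str(counter))
    counter = counter + 1   # without this line the loop would never end

print("Done. Counter ends with: " + str(counter))
```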
The above will print Counter is now: 0, followed by Counter is now: 1, Counter is now: 2, Counter is now: 3, and finally Done. Counter ends with: 4. As with using a for-loop over a range of integers, we can also use a while-loop to access specific indices within a string or list.
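The idea can be sketched as follows; the sequence itself is a made-up example, and base_index is the integer index that is advanced on each pass.

```python
seq = "ACAGT"       # hypothetical sequence
base_index = 0

while base_index < len(seq):
    base = seq[base_index]          # [] syntax with an integer index
    print("base is: " + base)
    base_index = base_index + 1     # move on to the next position

print("Done")
```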
The above code will print base is: A, then base is: C, and so on, ending with base is: T before finally printing Done. While-loops can thus be used as a type of fine-grained for-loop, to iterate over elements of a string (or list), in turn using simple integer indexes and [] syntax. While the above example adds 1 to base_index on each iteration, it could just as easily add some other number. Adding 3 would cause it to print every third base, for example.
Boolean Operators and Connectives
We’ve already seen one type of Boolean comparison, <, which returns whether the value of its left-hand side is less than the value of its right-hand side. There are a number of others:
||<=||less than or equal to?||
||>=||greater than or equal to?||
||!=||not equal to?||
These comparisons work for floats, integers, and even strings and lists. Sorting on strings and lists is done in lexicographic order: an ordering wherein item A is less than item B if the first element of A is less than the first element of B; in the case of a tie, the second element is considered, and so on. If in this process we run out of elements for comparison, that shorter one is smaller. When applied to strings, lexicographic order corresponds to the familiar alphabetical order.
Let’s print the sorted version of a Python list of strings, which does its sorting using the comparisons above. Note that numeric digits are considered to be “less than” alphabetic characters, and uppercase letters come before lowercase letters.
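A small sketch of this behavior, using a made-up list of names:

```python
# Hypothetical list of strings mixing digits, uppercase, and lowercase.
names = ["chr10", "Chr2", "2L", "chrX", "ChrM"]
sorted_names = sorted(names)
print(sorted_names)   # ['2L', 'Chr2', 'ChrM', 'chr10', 'chrX']

# Lexicographic comparison: ties are broken element by element,
# and the shorter item is smaller when one runs out of elements.
print("apple" < "apples")     # True
print([1, 2] < [1, 2, 3])     # True
print("Zebra" < "apple")      # True: uppercase sorts before lowercase
```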
Boolean connectives let us combine conditionals that return True or False into more complex statements that also return Boolean types.
These can be grouped with parentheses, and usually should be to avoid confusion, especially when more than one test follows a not.
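As a brief sketch with the connectives and, or, and not (the values of a and b are arbitrary), note how the placement of parentheses after a not changes the result:

```python
a = 8
b = 3

both = a > 5 and b > 5       # True and False
either = a > 5 or b > 5      # True or False
print(both)                  # False
print(either)                # True

# Parentheses decide how much of the expression the not applies to.
grouped = not (a < 5 or b < 5)      # not (False or True)
ungrouped = (not a < 5) or b < 5    # (not False) or True
print(grouped)               # False
print(ungrouped)             # True
```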
Finally, note that generally each side of an and or or should result in only True or False. The expression a == 3 or a == 7 has the correct form, whereas a == 3 or 7 does not. (In fact, 7 in the latter context will be taken to mean True, and so a == 3 or 7 will always be treated as True.)
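A quick sketch makes the difference visible (the value 5 is arbitrary):

```python
a = 5

wrong = a == 3 or 7        # evaluates as (a == 3) or 7
right = a == 3 or a == 7   # two complete comparisons

print(wrong)         # 7, which counts as true in a condition
print(bool(wrong))   # True
print(right)         # False
```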
Notice the similarity between = and ==, and yet they have dramatically different meanings: the former is the variable assignment operator, while the latter is an equality test. Accidentally using one where the other is meant is an easy way to produce erroneous code. For instance, a line reading count == 1 won’t initialize count to 1; rather, it will return whether it already is 1 (or result in an error if count doesn’t exist as a variable at that point). The reverse mistake is harder to make, as Python does not allow variable assignment with = in if-statement and while-loop conditions.
Suppose the intent is to determine whether the length of seq is a multiple of 3 (as determined by the result of len(seq)%3 using the modulus operator), with the result stored in remainder and the test mistakenly written as if remainder = 0:. The if-statement should actually be if remainder == 0:. In many languages, this would be a difficult-to-find bug (remainder would be assigned 0, and the condition would then be decided by that assigned value rather than by any comparison!). In Python, the result is an error: SyntaxError: invalid syntax.
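The corrected test can be sketched as below, on a made-up 9-base sequence. The second half uses the built-in compile() function only as a way to show, without crashing the script, that Python rejects the single = version; the exact wording of the error message may vary between Python versions.

```python
seq = "ACTAGGACT"            # hypothetical sequence of 9 bases
remainder = len(seq) % 3     # modulus operator: remainder after dividing by 3

if remainder == 0:           # equality test, not assignment
    print("Number of bases is a multiple of 3")

# The mistaken version, held as text so that this script still runs.
buggy_source = "if remainder = 0:\n    pass\n"
try:
    compile(buggy_source, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

print("Python rejected the single = version:", rejected)
```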
Still, a certain class of dangerous comparison is common to nearly every language, Python included: the comparison of two float types for equality or inequality.
Although integers can be represented exactly in binary arithmetic (e.g., 751 in binary is represented exactly as 1011101111), floating-point numbers can only be represented approximately. This shouldn’t be an entirely unfamiliar concept; for example, we might decide to keep only four decimal places when doing calculations on pencil and paper, working with 1/3 as 0.3333. The trouble is that these small errors can compound in difficult-to-predict ways. If we decide to compute (1/3)*(1/3)/(1/3) as 0.3333*0.3333/0.3333, working left to right we’d start with 0.3333*0.3333 = 0.11108889, which we cut off after four decimal places to get 0.1110. This is then divided by 0.3333 (giving 0.33303...) and cut off again to produce an answer of 0.3330. So, even though we know that (1/3)*(1/3)/(1/3) == 1/3, our calculation process would call them unequal because it ultimately tests 0.3330 against 0.3333!
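The pencil-and-paper calculation can be replayed with Python's decimal module. This is only an illustration of the four-decimal-place arithmetic described above, where digits beyond the fourth decimal place are dropped (ROUND_DOWN); keep_four is a hypothetical helper name, and none of this is how floats are stored internally.

```python
from decimal import Decimal, ROUND_DOWN

def keep_four(value):
    """Keep four decimal places, dropping any further digits."""
    return value.quantize(Decimal("0.0001"), rounding=ROUND_DOWN)

third = Decimal("0.3333")              # 1/3 kept to four decimal places

product = keep_four(third * third)     # 0.11108889 becomes 0.1110
result = keep_four(product / third)    # 0.33303... becomes 0.3330

print(product)            # 0.1110
print(result)             # 0.3330
print(result == third)    # False
```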
Modern computers have many more digits of precision (about 15 decimal digits at a minimum, in most cases), but the problem remains the same. Worse, numbers that don’t need rounding in our Base-10 arithmetic system do require rounding in the computer’s Base-2 system. Consider 0.2, which in binary is 0.001100110011, and so on. Indeed, 0.2 * 0.2 / 0.2 == 0.2 results in False.
While comparing floats with <, >, <=, and >= is usually safe (within extremely small margins of error), comparison of floats with == and != usually indicates a misunderstanding of how floating-point numbers work. In practice, we’d determine if two floating-point values are sufficiently similar, within some defined margin of error.
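One way to sketch such a test is shown here. The margin epsilon is an assumed value that would need to suit the calculation at hand; math.isclose() from the standard library offers a similar check based on a relative tolerance.

```python
import math

a = 0.2 * 0.2 / 0.2
b = 0.2
epsilon = 1e-9    # assumed margin of error for this example

exactly_equal = a == b
close_enough = abs(a - b) < epsilon

print(exactly_equal)         # False
print(close_enough)          # True
print(math.isclose(a, b))    # True
```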
Counting Stop Codons
As an example of using conditional control flow, we’ll consider the file seq.txt, which contains a single DNA string on the first line. We wish to count the number of potential stop codons "TAG", "TAA", or "TGA" that occur in the sequence (on the forward strand only, for this example).
Our strategy will be as follows: First, we’ll need to open the file and read the sequence from the first line. We’ll need to keep a counter of the number of stop codons that we see; this counter will start at zero and we’ll add one to it for each "TAG", "TAA", or "TGA" subsequence we see. To find these three possibilities, we can use a for-loop and string slicing to inspect every 3bp subsequence of the sequence; the 3bp sequence at index 0 of seq occurs at seq[0:3], the one at position 1 occurs at seq[1:4], and so on.
We must be careful not to attempt to read a subsequence that doesn’t occur in the sequence. If seq = "AGAGAT", there are only four possible 3bp sequences, and attempting to select the one starting at index 4, seq[4:7], would not raise an error; Python would quietly return the two-base string "AT", which is not a full codon. To make matters worse, string indexing starts at 0, and there are also the peculiarities of the inclusive/exclusive nature of [] slicing and the range() function.
To help out, let’s draw a picture of an example sequence, with various indices and 3bp subsequences we’d like to look at annotated.
Given a starting index index, the 3bp subsequence is defined as seq[index:index + 3]. For the sequence above, len(seq) would return 15. The first start index we are interested in is 0, while the last start index we want to include is len(seq) - 3 (here, 12). If we were to use the range() function to return a list of start indices we are interested in, we would use range(0, len(seq) - 3 + 1), where the + 1 accounts for the fact that range() includes the first index, but is exclusive in the last index.
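To check this index arithmetic on a small case, here is a sketch with a made-up 15-base sequence (a stand-in, not necessarily the one pictured):

```python
seq = "ATGTAGCTGATAAAT"    # hypothetical sequence of 15 bases

start_indices = list(range(0, len(seq) - 3 + 1))
print(start_indices)           # 0 through 12
print(seq[0:3])                # ATG, the first 3bp window
print(seq[12:15])              # AAT, the last 3bp window

# Slicing past the end does not fail; it returns a shorter string.
too_far = "AGAGAT"[4:7]
print(too_far)                 # AT
```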
We should also remember to run .strip() on the read sequence, as we don’t want the inclusion of any \n newline characters messing up the correct computation of the sequence length!
Notice in the code below (which can be found in the file stop_count_seq.py) the commented-out print line. While coding, we used this line to print each codon to be sure that 3bp subsequences were reliably being considered, especially the first and last (the last being AAT). This is an important part of the debugging process because it is easy to make small “off-by-one” errors with this type of code. When satisfied with the solution, we simply commented out the print statement.
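What follows is a sketch of the approach rather than the exact contents of stop_count_seq.py. So that it runs on its own, the file-reading lines are shown as comments and a made-up 15-base string stands in for the contents of seq.txt; the names fhandle, stop_counter, and codon are assumptions.

```python
# With the real file, the sequence could be read like this:
# fhandle = open("seq.txt", "r")
# seq = fhandle.readline().strip()   # .strip() removes the trailing \n
# fhandle.close()
seq = "ATGTAGCTGATAAAT"    # hypothetical stand-in for the file contents

stop_counter = 0
for index in range(0, len(seq) - 3 + 1):
    codon = seq[index:index + 3]
    # print(codon)    # uncomment to check the first and last windows
    if codon == "TAG" or codon == "TAA" or codon == "TGA":
        stop_counter = stop_counter + 1

print(stop_counter)
```

For this stand-in sequence the count is 3 (TAG at index 3, TGA at index 7, and TAA at index 10).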
For windowing tasks like this, it can occasionally be easier to access the indices with a while-loop.
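A while-loop version might look like the following sketch, again run on a made-up sequence in place of the contents of seq.txt:

```python
seq = "ATGTAGCTGATAAAT"    # hypothetical stand-in for the file contents

stop_counter = 0
index = 0
while index <= len(seq) - 3:         # last valid start is len(seq) - 3
    codon = seq[index:index + 3]
    if codon == "TAG" or codon == "TAA" or codon == "TGA":
        stop_counter = stop_counter + 1
    index = index + 1                # slide the window by one base

print(stop_counter)
```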
If we wished to access nonoverlapping codons, we could use index = index + 3 rather than index = index + 1 without any other changes to the code. Similarly, if we wished to inspect 5bp windows, we could replace instances of 3 with 5 (or use a variable to hold the window size).
- The molecular weight of a single-stranded DNA string (in g/mol) is (count of "A")*313.21 + (count of "T")*304.2 + (count of "C")*289.18 + (count of "G")*329.21 - 61.96 (to account for removal of one phosphate and the addition of a hydroxyl on the single strand).
Write code that prints the total molecular weight for the sequence in the file seq.txt. The result should be 21483.8. Call your program
- The file seqs.txt contains a number of sequences, one sequence per line. Write a new Python program that prints the molecular weight of each on a new line. For example:
You may wish to use substantial parts of the answer for question 1 inside of a loop of some kind. Call your program
- The file ids_seqs.txt contains the same sequences as seqs.txt; however, this file also contains sequence IDs, with one ID per line followed by a tab character (\t) followed by the sequence. Modify your program from question 2 to print the same output, in a similar format: one ID per line, followed by a tab character, followed by the molecular weight. The output format should thus look like so (but the numbers will be different, to avoid giving away the answer):
Call your program
Because the tab characters cause the output to align differently depending on the length of the ID string, you may wish to run the output through a command line tool with a -t option, which automatically formats tab-separated input.
- Create a modified version of the program in question 3 of chapter 15, “Collections and Looping, Part 1: Lists and for,” so that it also identifies the locations of subsequences that are self-overlapping. For example,
"at positions 1, 3, 5, 7, and 14.
- Python is one of the only languages that require blocks to be delineated by indentation. Most other languages use pairs of curly brackets to delineate blocks. Many programmers feel this removes too much creativity in formatting of Python code, while others like that it enforces a common method for visually distinguishing blocks.
- In the absence of parentheses for grouping, and takes precedence over or, much like multiplication takes precedence over addition in numeric algebra. Boolean logic, by the way, is named after the nineteenth-century mathematician and philosopher George Boole, who first formally described the “logical algebra” of combining truth values with connectives like “and,” “or,” and “not.”
- You can see this for yourself: print(0.2*0.2/0.2 == 0.2) prints False! Some mathematically oriented languages are able to work entirely symbolically with such equations, bypassing the need to work with numbers at all. This requires a sophisticated parsing engine but enables such languages to evaluate even generic expressions like x*x/x == x as true.
- Yes, this sort of detailed logical thinking can be tedious, but it becomes easier with practice. Drawing pictures and considering small examples is also invaluable when working on programming problems, so keep a pencil and piece of paper handy.