# Appendix B: Probability

## B.0.1 Probability Distributions

##### Probability Mass Function

A probability mass function for a discrete random variable assigns to each possible value $k$ the probability $P(k)$ of the variable taking that value. These distributions are normalized, meaning that summing over all possible values gives $1$. In other words:

$\sum_k P(k) = 1$
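As a small sketch of this normalization, consider the PMF of a fair six-sided die (a hypothetical example, not from the text); exact fractions make the sum exactly $1$:

```python
from fractions import Fraction

# P(k) for a fair six-sided die: each face has probability 1/6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# Summing over all possible values gives exactly 1.
total = sum(pmf.values())
print(total)  # 1
```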

##### Probability Density Functions

For a continuous random variable $x$, a probability density function gives a value $f(x)$ describing the relative likelihood of the variable taking the value $x$, and probabilities are computed from

$P(a \le x \le b) = \int_a^b f(x)dx$

Integrating over all values of $x$ gives $1$:

$\int_{-\infty}^{\infty} f(x)dx = 1$

## B.0.2 Conditional Probability

In probability, we often want to talk about conditional probability, which gives the probability of an event given that another event is known to have occurred.

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

Bayes’ Theorem tells us that:

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

We can also expand the probability of an event into the conditional probabilities given each of a set of mutually exclusive, exhaustive events $A_i$, weighted by the probabilities of those events:

$P(x) = \sum_{i=1}^N P(x|A_i)P(A_i)$

This expression $P(x)$ is often called the evidence. The probability of $A$ given $x$, called the posterior probability, is then

$P(A|x) = \frac{P(x|A)P(A)}{P(x)}$

Also in this equation, $P(x|A)$ is called the likelihood, and $P(A)$ is called the prior.
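As a quick numerical sketch of the evidence and posterior (the priors and likelihoods below are made-up illustrative values, not from the text):

```python
# Hypothetical numbers: two disjoint events A_1, A_2 with known priors,
# and the likelihood of an observation x under each.
priors = {"A1": 0.01, "A2": 0.99}        # P(A_i)
likelihoods = {"A1": 0.95, "A2": 0.05}   # P(x | A_i)

# Evidence: P(x) = sum_i P(x | A_i) P(A_i)
evidence = sum(likelihoods[a] * priors[a] for a in priors)

# Posterior: P(A_1 | x) = P(x | A_1) P(A_1) / P(x)
posterior_A1 = likelihoods["A1"] * priors["A1"] / evidence
print(round(evidence, 4), round(posterior_A1, 4))
```

Note that even with a high likelihood $P(x|A_1)$, a small prior $P(A_1)$ keeps the posterior modest, which is the usual lesson of Bayes' Theorem.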

## B.0.3 Bernoulli Distribution

One of the simplest probability distributions we should consider is the Bernoulli Distribution. It describes a single trial with a probability of success denoted $p$ and a probability of failure $q=1-p$.

Example 1: A coin toss. Heads or tails? Typically for a fair coin, this will be such that $p=q=0.5$.

Example 2: Selecting a single nucleotide from the genome at random. Is it a G or C, or an A or T? For the human genome, the GC content is about $0.417$, but it varies from chromosome to chromosome.
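A minimal simulation sketch of Example 2, using the GC-content figure from the text as the success probability $p$ (the sample size and seed are arbitrary choices):

```python
import random

random.seed(0)

# Bernoulli trial with success probability p
# (p = approximate human-genome GC content, from the text).
p = 0.417
q = 1 - p

def bernoulli(p):
    # One draw: True ("success") with probability p.
    return random.random() < p

# The empirical success frequency over many draws approaches p.
n = 100_000
freq = sum(bernoulli(p) for _ in range(n)) / n
print(round(freq, 3))
```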

## B.0.4 Binomial Distribution

Central to the Binomial distribution is the binomial coefficient ${n \choose k} = \frac{n!}{(n-k)!k!}$, the number of ways to have $k$ successes out of $n$ trials. The probability of observing $k$ successes out of $n$ trials is $P(k|p,n) = {n \choose k} p^k q^{n-k}$. Since $p+q=1$, and $1$ raised to the $n$th power is still $1$, we can see that these probabilities are normalized (sum to one) by the equation:

$1 = (p+q)^n = \sum_{k=0}^n {n \choose k} p^k q^{n-k}$

Example 1: Consider tossing a fair coin 100 times, and repeating this for 1000 trials. A histogram for $k$, the number of heads in $n$ tosses, is shown in Figure B.1. The expected value of $k$ is $E[k] = np$, and the variance is $var(k) = npq$.
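The pmf, its normalization, and the mean and variance formulas above can be checked directly for the coin-toss example ($n=100$, $p=0.5$):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(k | p, n) = C(n, k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.5
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

# Probabilities sum to one: (p + q)^n = 1.
print(round(sum(probs), 10))  # 1.0

# Mean E[k] = n p and variance var(k) = n p q.
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
print(round(mean, 6), round(var, 6))  # 50.0 25.0
```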

## B.0.5 Multinomial Distribution

The multinomial distribution deals with situations in which there are more than two possible outcomes (for example, DNA nucleotides). The multinomial coefficient, $M(\vec{n})$, in this example describes the number of ways to have a sequence of length $n$, with the numbers of occurrences of A, C, G, and T being $n_A, n_C, n_G,$ and $n_T$ respectively.

$M(\vec{n}) = \frac{n!}{n_A! n_C! n_G! n_T!}$

The probability of a particular set of observed counts $\vec{n} = (n_A, n_C, n_G,n_T)$ depends on the frequencies $\vec{p} = (p_A, p_C, p_G,p_T)$ by the expression:

$P(\vec{n}|\vec{p}) = \frac{n!}{n_A! n_C! n_G! n_T!} \prod_{i \in \{A,C,G,T\}} p_i^{n_i}$
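This expression can be computed directly; the counts and uniform base frequencies below are a hypothetical illustration (a length-10 sequence), not taken from the text:

```python
from math import factorial, prod

def multinomial_prob(counts, probs):
    # P(n | p) = n! / (n_A! n_C! n_G! n_T!) * prod_i p_i^{n_i}
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p**c for p, c in zip(probs, counts))

# Hypothetical counts for a length-10 sequence, uniform base frequencies.
counts = (3, 2, 2, 3)             # n_A, n_C, n_G, n_T
probs = (0.25, 0.25, 0.25, 0.25)  # p_A, p_C, p_G, p_T
print(round(multinomial_prob(counts, probs), 6))
```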

## B.0.6 Poisson Distribution

The Poisson Distribution describes the number of observed events given an expected number of occurrences $\lambda$. Consider, for example, the number of red cars driving past a particular street corner in your neighborhood in a given hour. Its probability mass function is given by:

$P(k|\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$

It has one parameter $\lambda$, which is also the expected value of $k$ and the variance ($E[k] = var(k) = \lambda$).
It can be shown that for fixed $\lambda = np$, as $n \rightarrow \infty$, the Binomial distribution converges to the Poisson distribution.
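A minimal numerical sketch of this limit, with the illustrative choice $\lambda = 3$ and increasing $n$ (these values are assumptions, not from the text):

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    # P(k | lambda) = lambda^k e^{-lambda} / k!
    return lam**k * exp(-lam) / factorial(k)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# For fixed lambda = n p, the Binomial pmf approaches the Poisson pmf
# as n grows.
lam, k = 3.0, 2
for n in (10, 100, 10_000):
    print(n, round(binom_pmf(k, n, lam / n), 6), round(poisson_pmf(k, lam), 6))
```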

In many applications, the $\lambda$ parameter for the Poisson distribution is treated as a rate. In this case, the expected value of $k$ is $E[k] = \lambda \ell$, so that $\lambda$ describes the rate of occurrence of the event per unit time (or per unit distance, or whatever unit suits the application). The probability distribution is given by:

$P(k,\ell|\lambda) = \frac{(\lambda \ell)^k e^{-\lambda \ell}}{k!}$

Example: Consider modeling the number of mutations $k$ observed in a length of DNA $\ell$.
Consider the important case when $k=0$ for the Poisson Distribution.

$P(k=0,\ell|\lambda) = e^{-\lambda \ell}$

The probability that $k$ is greater than zero is therefore:

$P(k > 0,\ell|\lambda) = 1 - e^{-\lambda \ell}$

(B.1)

Now consider $\ell$ being replaced by a random variable $x$.

## B.0.7 Exponential Distribution

The Exponential distribution is related to the Poisson by equation (B.1). It describes the length $x$ one must wait before an event occurs. Its cumulative distribution function $F(x)$, the probability that at least one event occurs within a length $x$, is the same as this last equation:

$F(x|\lambda) = 1 - e^{- \lambda x}$

And the probability density function is the derivative of the cumulative distribution function:

$P(x|\lambda) = \lambda e^{- \lambda x}$

Note that whereas the Poisson distribution is a discrete distribution, the Exponential distribution is continuous and $x$ can take on any real value such that $x \ge 0$.
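Since $F(x|\lambda)$ has a closed form, Exponential samples can be drawn by inverse-CDF sampling; a small sketch (with the arbitrary choices $\lambda = 2$ and a fixed seed) checks that the sample mean approaches the distribution's mean, $1/\lambda$:

```python
import random
from math import log

random.seed(0)

def exponential_sample(lam):
    # Inverse-CDF sampling: solving F(x) = u for x gives
    # x = -ln(1 - u) / lambda, with u uniform on [0, 1).
    u = random.random()
    return -log(1.0 - u) / lam

lam = 2.0
n = 200_000
samples = [exponential_sample(lam) for _ in range(n)]

# The mean of the Exponential distribution is 1 / lambda.
print(round(sum(samples) / n, 2))  # close to 0.5
```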

## B.0.8 Normal Distribution

A very important distribution for dealing with data, the normal distribution models many natural processes. We already saw that the Binomial distribution looks like the normal distribution for large $n$. Indeed, by the Central Limit Theorem, the sum of many independent random variables approaches the normal distribution. Here is the p.d.f.:

$P(x|\mu,\sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{\frac{-(x - \mu)^2}{2 \sigma^2}}$
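As a sketch of the normalization property from section B.0.1 applied to this p.d.f., the integral can be approximated numerically (midpoint rule over $[-10, 10]$ for the standard normal, an illustrative setup):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    # P(x | mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

mu, sigma = 0.0, 1.0

# Midpoint-rule integration over a wide range; the tails beyond
# 10 standard deviations are negligible, so the result is close to 1.
a, b, steps = -10.0, 10.0, 100_000
h = (b - a) / steps
total = sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h
print(round(total, 6))  # 1.0
```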

## B.0.9 Extreme Value Distribution

The maximum value in a set of random variables $X_1, ..., X_N$ is also a random variable. This variable is described by the Generalized Extreme Value Distribution, which includes the Gumbel Distribution as a special case.

$F(x|\mu,\sigma,\xi) = \exp \left\{ - \left[ 1+\xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\xi} \right\}$

For ${1+\xi(x - \mu)/\sigma \gt 0}$, and with a location parameter $\mu$, a scale parameter $\sigma$, and a shape parameter $\xi$. In the limit that $\xi \rightarrow 0$, the distribution reduces to the Gumbel Distribution:

$F(x|\mu,\sigma) = \exp \left\{ - \exp \left( - \frac{x - \mu}{\sigma} \right) \right\}$

In many cases, for practical purposes, one makes the assumption that $\xi \rightarrow 0$. 
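A simulation sketch of the $\xi \rightarrow 0$ case: the maximum of $n$ i.i.d. Exponential(1) draws is approximately Gumbel with location $\mu = \ln n$ and scale $\sigma = 1$ (a standard illustrative fact; the sample sizes and seed below are arbitrary):

```python
import random
from math import exp, log

random.seed(0)

# The maximum of n i.i.d. Exponential(1) draws is approximately Gumbel
# with location mu = ln(n) and scale sigma = 1.
n, trials = 500, 5_000
mu = log(n)

maxima = [max(random.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def gumbel_cdf(x, mu, sigma):
    # F(x | mu, sigma) = exp(-exp(-(x - mu) / sigma))
    return exp(-exp(-(x - mu) / sigma))

# Empirical fraction of maxima below mu vs. the Gumbel CDF at x = mu,
# which is exp(-1) ~ 0.368.
empirical = sum(m <= mu for m in maxima) / trials
print(round(empirical, 2), round(gumbel_cdf(mu, mu, 1.0), 2))
```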