# Chapter 5: Pseudorandom Generators

### Return to Table of Contents

We have already seen that randomness is essential for cryptographic security. Following Kerckhoff’s principle, we assume that an adversary knows *everything* about our cryptographic algorithms except for the outcome of the internal random choices made when running the algorithm. Private randomness truly is the *only* resource honest parties can leverage for security in the presence of an adversary.

One-time secrecy for encryption is a very demanding requirement, when we view randomness as a resource. One-time secrecy essentially requires one to use independent, uniformly random bits to mask every bit of plaintext that is sent. As a result, textbook one-time pad is usually grossly impractical for practical use.

In this chapter, we ask whether it might be beneficial to settle for a distribution of bits which is not uniform but merely “looks uniform enough.” This is exactly the idea behind **pseudorandomness**. A pseudorandom distribution is not uniform in a strict mathematical sense, but is *indistinguishable* from the uniform distribution by polynomial-time algorithms (in the sense defined in the previous chapter).

Perhaps surprisingly, a relatively small amount of *truly uniform* bits can be used to generate a huge amount of *pseudorandom* bits. Pseudorandomness therefore leads to an entire new world of cryptographic possibilities, an exploration that we begin in this chapter.

**5.1: Definition**

As mentioned above, a pseudorandom distribution “looks uniform” to all polynomial-time computations. We already know of a distribution that “looks uniform” — namely, the uniform distribution itself! A more interesting case is when *λ* uniform bits are used to *deterministically* (*i.e.*, without further use of randomness) produce *λ* + *ℓ* pseudorandom bits, for *ℓ* > 0. The process that “extends” *λ* bits into *λ* + *ℓ* bits is called a pseudorandom generator. More formally:

### Definition 5.1: PRG Security

*Let G* : {0,1}^{λ} → {0,1}^{λ + ℓ} *be a deterministic function with ℓ* > 0. *We say that G is a pseudorandom generator (PRG) if* ℒ

^{G}

_{prg-real}≋ ℒ

^{G}

_{prg-rand},

*where*:

*The value ℓ is called the stretch of the PRG. The input to the PRG is typically called a seed.*

#### Discussion:

- Is 0010110110 a random string? Is it pseudorandom? What about 0000000001? Do these questions make any sense?

Randomness and pseudorandomness are not properties of individual strings, they are properties of the *process* used to generate the string. We will try to be precise about how we talk about these things (and you should too). When we have a value *z* = *G*(*s*) where *G* is a PRG and *s* is chosen uniformly, we can say that *z* was “chosen pseudorandomly”, but not that *z* “is pseudorandom”. On the other hand, a *distribution* can be described as “pseudorandom.” The same goes for describing a value as “chosen uniformly” and describing a distribution as “uniform.”

- Pseudorandomness can happen only in the computational setting, where we restricts focus to polynomial-time adversaries. The exercises ask you to prove that for all functions
*G*(with positive stretch), ℒ^{G}_{prg-real }≢≡ ℒ^{G}_{prg-rand}(note the use of ≡ rather than ≋). That is, the output distribution of the PRG can never actually*be*uniform in a mathematical sense. Because it has positive stretch, the best it can hope to be is pseudorandom. - It’s sometimes convenient to think in terms of statistical tests. When given access to some data claiming to be uniformly generated, your first instinct is probably to perform a set of basic
*statistical tests*: Are there roughly an equal number of 0s as 1s? Does the substring 01010 occur with roughly the frequency I would expect? If I interpret the string as a series of points in the unit square [0,1)^{2}, is it true that roughly*π*/4 of them are within Euclidean distance 1 of the origin?^{1}

^{1}For a list of statistical tests of randomness that are actually used in practice, see http://csrc.nist.gov/

publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf.

The definition of pseudorandomness is kind of a “master” definition that encompasses all of these statistical tests and more. After all, what is a statistical test, but a polynomial-time procedure that obtains samples from a distribution and outputs a yes-or-no decision? Pseudorandomness implies that *every* statistical test will “accept” when given pseudorandomly generated inputs with essentially the same probability as when given uniformly sampled inputs.

- Consider the case of a length-doubling PRG (so
*ℓ*=*λ*; the PRG has input length*λ*and output length 2*λ*). The PRG only has 2^{λ}possible inputs, and so there are at most only 2^{λ}possible outputs. Among all of {0,1}^{2λ}, this is a miniscule fraction indeed. Almost all strings of length 2*λ*are*impossible outputs*of*G*. So how can outputs of*G*possibly “look uniform?”

The answer is that it’s not clear how to take advantage of this observation. While it is true that most strings (in a *relative* sense as a fraction of 2* ^{2λ}*) are impossible outputs of

*G*, it is also true that 2

^{λ}of them are possible, which is certainly a lot from the perspective of a program who runs in polynomial time in

*λ*. Recall that the problem at hand is designing a distinguisher to behave as differently as possible in the presence of pseudorandom and uniform distributions. It is not enough to behave differently on just a few strings here and there — an individual string can contribute at most 1/2

^{λ}to the final distinguishing advantage. A successful distinguisher must be able to recognize a huge number of outputs of

*G*so it can behave differently on them, but there are exponentially many

*G*-outputs, and they may not follow any easy-to-describe pattern.

Below is an example that explores these ideas in more detail.

#### Example

*Let G be a length-doubling PRG as above, let t be an arbitrary string t* ∈ {0,1}* ^{2λ}*,

*and consider the following distinguisher ?*

_{t}that has the value t hard-coded:*What is the distinguishing advantage of ? _{t}?*

We can see easily what happens when ?* _{t}* is linked to ℒ

^{G}_{prg-rand}. We get:

What happens when linked to ℒ^{G}_{prg-real} depends on whether *t* is a possible output of *G*, which is a simple property of *G*. We always allow distinguishers to depend arbitrarily on *G*. In particular, it is fine for a distinguisher to “know” whether its hard-coded value *t* is a possible output of *G*. What the distinguisher “doesn’t know” is which library it is linked to, and the value of *s* that was chosen in the case that it is linked to ℒ^{G}_{prg-real}.

Suppose for simplicity that *G* is injective (the exercises explore what happens when *G* is not). If *t* is a possible output of *G*, then there is exactly one choice of *s* such that *G*(*s*) = *t*, so we have:

Hence, the distinguishing advantage of ?* _{t}* is | 1/2

^{2λ}− 1/2

^{λ}≤ 1/2

^{λ}. If

*t*is not a possible output of

*G*, then we have:

Hence, the distinguishing advantage of ?_{t} is |1/2^{2λ} − 0| = 1/2^{2λ}.

In either case, ?_{t} has negligible distinguishing advantage. This merely shows that ?* _{t}* (for any hard-coded

*t*) is not a particularly helpful distinguisher for any PRG. Of course, any candidate PRG might be insecure because of other distinguishers, but this example should serve to illustrate that PRG security is at least compatible with the fact that some strings are impossible outputs of the PRG

#### Related Concept: Random Number Generation

The definition of a PRG includes a *uniformly sampled* seed. In practice, this PRG seed has to come from somewhere. Generally a source of “randomness” is provided by the hardware or operating system, and the process that generates these random bits is (confusingly) called a random number generator (RNG).

In this course we won’t cover low-level random number generation, but merely point out what makes it different than the PRGs that we study:

- The job of a PRG is to take a small amount of “ideal” (in other words, uniform) randomness and extend it.
- By contrast, an RNG usually takes many inputs over time and maintains an internal state. These inputs are often from physical/hardware sources. While these inputs are “noisy” in some sense, it is hard to imagine that they would be statistically
*uniform*. So the job of the RNG is to “refine” (sometimes many) sources of noisy data into uniform outputs.

#### Perspective on this Chapter

PRGs are a fundamental cryptographic building block that can be used to construct more interesting things. But you are unlikely to ever find yourself designing your own PRG or building something from a PRG that couldn’t be built from some higher-level primitive instead. For that reason, we will not discuss specific PRGs or how they are designed in practice. Rather, the purpose of this chapter is to build your understanding of the concepts (like pseudorandomness, indistinguishability, proof techniques) that will be necessary in the rest of the class.

**5.2: Application: Shorter Keys in One-Time-Series Encryption**

PRGs essentially convert a smaller amount of uniform randomness into a larger amount of *good-enough-for-polynomial-time* randomness. So a natural first application of PRGs is to obtain one-time-secure encryption but with keys that are shorter than the plaintext. This is the first of many acts of crypto-magic which are impossible except when restricting our focus to polynomial-time adversaries.

The main idea is to hold a short key *k*, expand it to a longer string using a PRG *G*, and use the result as a one-time pad on the (longer) plaintext. More precisely, let *G* : {0,1}^{λ} → {0,1}^{λ + ℓ} be a PRG, and define the following encryption scheme:

**Construction 5.2: Pseudo-OTP**

The resulting scheme will not have (perfect) one-time secrecy. That is, encryptions of *m _{L}* and

*m*will not be identically distributed in general. However, the distributions will be

_{R}*indistinguishable*if

*G*is a secure PRG. The precise flavor of security obtained by this construction is the following.

### Definition 5.3:

*Let Σ be an encryption scheme, and let* ℒ^{Σ}_{ots-L} *and* ℒ^{Σ}_{ots-R} *be defined as in Definition 2.5 (and repeated below for convenience). Then Σ has (computational) one-time secrecy if* ℒ

^{Σ}

_{ots-L}≋ ℒ

^{Σ}

_{ots-R}.

*That is, if for all polynomial-time distinguishers ?, we have*Pr[? ◊ ℒ

^{Σ}

_{ots-L}= 1] ≈ Pr[? ◊ ℒ

^{Σ}

_{ots-R}= 1].

### Claim 5.4

*Let *pOTP[*G*] *denote Construction 5.2 instantiated with PRG G. If G is a secure PRG then* pOTP[*G*] *has computational one-time secrecy*.

#### Proof:

We must show that ℒ^{pOTP[G]}_{ots-L} ≋ ℒ^{pOTP[G]}_{ots-R}. As usual, we will proceed using a sequence of hybrids that begins at ℒ^{pOTP[G]}_{ots-L} and ends at ℒ^{pOTP[G]}_{ots-R}. For each hybrid library, we will demonstrate that it is indistinguishable from the previous one. Note that we we are allowed to use the fact that *G* is a secure PRG. In practical terms, this means that if we can express some hybrid library in terms of ℒ^{G}_{prg-real} (one of the libraries in the PRG security definition), we can replace it with its counterpart ℒ^{G}_{prg-rand} (or vice-versa). The PRG security of *G* says that such a change will be indistinguishable.

The starting point is ℒ^{pOTP[G]}_{ots-L} , shown here with the details of pOTP[*G*] filled in

The first hybrid step is to factor out the computations involving *G*, in terms of the ℒ^{G}_{prg-real} library.

From the PRG security of *G*, we may replace the instance of ℒ^{G}_{prg-real} with ℒ^{G}_{prg-rand}. The resulting hybrid library ℒ_{hyb-2} is indistinguishable from the previous one.

A subroutine has been inlined. Note that the resulting library is precisely ℒ^{OTP}_{ots-L}! Here, OTP is instantiated on plaintexts of size *λ* + *ℓ* (and the variable *k* has been renamed to *z*).

The (perfect) one-time secrecy of OTP allows us to replace ℒ^{OTP}_{ots-L} with ℒ^{OTP}_{ots-R}; they are interchangeable.

The rest of the proof is essentially a “mirror image” of the previous steps, in which we perform the same steps but in reverse (and with *m _{R}* hanging around instead of

*m*).

_{L}A statement has been factored out into a subroutine, which happens to exactly match ℒ^{G}_{prg-rand}.

From the PRG security of *G*, we can replace ℒ^{G}_{prg-rand} with ℒ^{G}_{prg-real}. The resulting library is indistinguishable from the previous one.

A subroutine has been inlined. The result is ℒ^{pOTP[G]}_{ots-R} .

Summarizing, we showed a sequence of hybrid libraries satisfying the following:

ℒ^{pOTP[G]}_{ots-L} ≡ ℒ_{hyb-1} ≋ ℒ_{hyb-2} ≡ ℒ_{hyb-3} ≡ ℒ_{hyb-4} ≡ ℒ_{hyb-5} ≋ ℒ_{hyb-6} ≡ ℒ^{pOTP[G]}_{ots-R} .

Hence, ℒ^{pOTP[G]}_{ots-L} ≋ ℒ^{pOTP[G]}_{ots-R} , and pOTP has (computational) one-time secrecy.

**5.3: Taking the Contrapositive Point-of-View**

We just proved the statement “if *G* is a secure PRG, then pOTP[*G*] has one-time secrecy,” but let’s also think about the contrapositive of that statement:

*G*] scheme is

**not**one-time secret, then

*G*is

**not**a secure PRG.

If the pOTP scheme is not secure, then there is some distinguisher ? that can distinguish the two ℒ_{ots-*} libraries with better than negligible advantage. Knowing that such an ? exists, can we indeed break the security of *G*?

Imagine going through the sequence of hybrid libraries with this hypothetical ? as the calling program. We know that one of the steps of the proof must break down since ? successfully distinguishes between the endpoints of the hybrid sequence. Some of the steps of the proof were unconditional; for example, factoring out and inlining subroutines *never* has an effect on the calling program. These steps of the proof always hold; they are the steps where we write ℒ_{hyb-i} ≡ ℒ_{hyb-(i + 1)}.

The steps where we write ℒ_{hyb-i} ≋ ℒ_{hyb-(i + 1)} are *conditional*. In our proof, the steps ℒ_{hyb-1} ≋ ℒ_{hyb-2} and ℒ^{OTP}_{ots-R} ≋ ℒ_{hyb-4} relied on *G* being a secure PRG. So if a hypothetical ? was able to break the security of pOTP, then that same ? must also successfully distinguish between ℒ_{hyb-1} and ℒ_{hyb-2}, or between ℒ_{hyb-5} and ℒ_{hyb-6} — the only conditional steps of the proof.

Let’s examine the two cases:

- Suppose the hypothetical ? successfully distinguishes between ℒ
_{hyb-1}and ℒ_{hyb-2}. Let’s recall what these libraries actually look like:

Interestingly, these two libraries share a common component that is linked to either ℒ_{prg-real}or ℒ_{prg-rand}. (This is no coincidence!) Let’s call that common library ℒ^{∗}and write

Since ? successfully distinguishes between ℒ_{hyb-1}and ℒ_{hyb-2}, the following advantage is non-negligible:

But with a change of perspective, this means that ? ◊ ℒ^{∗} is a calling program that successfully distinguishes ℒ^{G}_{prg-real} from ℒ^{G}_{prg-rand}. In other words, ? ◊ ℒ^{∗} **breaks the PRG security of G!**

- Suppose the hypothetical ? only distinguishes ℒ
_{hyb-5}from ℒ_{hyb-6}. Going through the reasoning above, we will reach a similar conclusion but with a different ℒ^{∗}than before (in fact, it will be mostly the same library but with*m*replaced with_{L}*m*)._{R}

So you can think of our security proof as a very roundabout way of saying the following:

*If you give me an adversary/distinguisher ? that breaks the one-time secrecy of*pOTP[

*G*],

*then I can use it to build an adversary that breaks the PRG security of G. More specifically, G is guaranteed to be broken by (at least) one of the two distinguishers:*

^{2}

^{2}The statement is somewhat non-constructive, since we don’t know for sure which of the two distinguishers will be the one that actually works. But a way around this is to consider a single distinguisher that flips a coin and acts like the first one with probability 1/2 and acts like the second one with probability 1/2.

In fact, this would be the “traditional” method for proving security that you might see in many other cryptography texts. You would suppose you are given an adversary ? breaking pOTP, then you would demonstrate why at least one of the adversaries given above breaks the PRG. Unfortunately, it is quite difficult to explain to students how one is supposed to *come up with* these PRG-adversaries in terms of the one-time-secrecy adversaries. For that reason, we will reason about proofs in the “if this is secure then that is secure” realm, rather than the contrapositive “if that is insecure then this is insecure.”

**5.4: Extending the Stretch of a PRG**

Recall that the *stretch* of a PRG is the amount by which the PRG’s output length exceeds its input length. A PRG with very long stretch seems much more useful than one with small stretch. Is there a limit to the stretch of a PRG? Using only *?λ* bits of true uniform randomness, can we generate 100*λ*, or even *λ ^{3}* pseudorandom bits?

In this section we will see that once you can extend a PRG a little bit, you can also extend it a lot. This is another magical feat that is possible with pseudorandomness but not with truly uniform distributions. We will demonstrate the concept by extending a PRG with stretch *λ* into one with stretch 2*λ*, but the idea can be used to increase the stretch of any PRG indefinitely (see the exercises).

### Construction 5.5: PRG Feedback

*Let G* : {0,1}^{λ} → {0,1}* ^{2λ} be a length-doubling PRG (*i.e.

*, a PRG with stretch λ). When x*∈ {0,1}

*: {0,1}*

^{2λ}, we write x_{left}to denote the leftmost λ bits of x and x_{right}to denote the rightmost λ bits. Define the length-tripling function H*→ {0,1}*

^{λ}*:*

^{3λ}as follows

### Claim 5.6:

*If G is a secure length-doubling PRG, then H (defined above) is a secure length-tripling PRG.*

#### Proof:

We want to show that ℒ^{H}_{prg-real} ≋ ℒ^{H}_{prg-rand}. As usual, we do so with a hybrid sequence. Since we assume that *G* is a secure PRG, we are allowed to use the fact that ℒ^{G}_{prg-real} ≋ ℒ^{G}_{prg-rand}. In this proof, we will use the fact twice: once for each occurrence of *G* in the code of *H*.

The starting point is ℒ^{H}_{prg-real}, shown here with the details of *H* filled in.

The first invocation of *G* has been factored out into a subroutine. The resulting hybrid library includes an instance of ℒ^{G}_{prg-real}.

From the PRG security of *G*, we can replace the instance of ℒ^{G}_{prg-real} with ℒ^{G}_{prg-rand}. The resulting hybrid library is indistinguishable.

A subroutine has been inlined

Choosing 2*λ* uniformly random bits and then splitting them into two halves has exactly the same effect as choosing *λ* uniformly random bits and independently choosing *λ* more.

The remaining appearance of *G* has been factored out into a subroutine. Now ℒ^{G}_{prg-real} makes its second appearance

Again, the PRG security of *G* lets us replace ℒ^{G}_{prg-real} with ℒ^{G}_{prg-rand}. The resulting hybrid library is indistinguishable.

A subroutine has been inlined.

Similar to above, concatenating *λ* uniform bits with 2*λ* independently uniform bits has the same effect as sampling 3*λ* uniform bits. The result of this change is ℒ^{H}_{prg-rand}.

Through this sequence of hybrid libraries, we showed that:

ℒ^{H}_{prg-real} ≡ ℒ_{hyb-1} ≋ ℒ_{hyb-2} ≡ ℒ_{hyb-3} ≡ ℒ_{hyb-4} ≡ ℒ_{hyb-5} ≋ ℒ_{hyb-6} ≡ ℒ_{hyb-7} ≡ ℒ^{H}_{prg-rand}.

Hence, *H* is a secure PRG.

**Exercises:**

5.1: Let *G* : {0,1}^{λ} → {0,1}^{λ + ℓ}. Define im(*G*) = {*y* ∈ {0,1}^{λ + ℓ} | ∃*s* : *G*(*s*) = *y*}, the set of possible outputs of *G*. For simplicity, assume that *G* is injective (i.e., 1-to-1).

(a). What is |im(*G*)|, as a function of *λ* and *ℓ*?

(b). A string *y* is chosen randomly from {0,1}^{λ + ℓ}. What is the probability that *y* ∈ im(*G*), expressed as a function of *λ* and *ℓ*?

5.2: Let *G* : {0,1}* ^{λ}* → {0,1}

^{λ + ℓ}be an injective PRG. Consider the following distinguisher:

(a). What is the advantage of ? in distinguishing ℒ^{G}_{prg-real} and ℒ^{G}_{prg-rand}? Is it negligible?

(b). Does this contradict the fact that *G* is a PRG? Why or why not?

(c). What happens to the advantage if *G* is not injective?

5.3: Let *G* : {0,1}* ^{λ}* → {0,1}

^{λ + ℓ}be an injective PRG, and consider the following distinguisher:

What is the advantage of ? in distinguishing ℒ^{G}_{prg-real} from ℒ^{G}_{prg-rand}?

*Hint*: When computing Pr[? ◊ ℒ^{G}_{prg-rand} outputs 1], separate the probabilities based on whether *x* ∈ im(*G*) or not. (im(*G*) is defined in a previous problem)

5.4: For any PRG *G* : {0,1}* ^{λ}* → {0,1}

^{λ + ℓ}there will be many strings in {0,1}

^{λ + ℓ}that are impossible to get as output of

*G*. Let

*S*be any such set of impossible

*G*-outputs, and consider the following adversary that has

*S*hard-coded:

What is the advantage of ? in distinguishing ℒ^{G}_{prg-real} from ℒ^{G}_{prg-rand}? Why does an adversary like this one not automatically break every PRG?

5.5: Show that the scheme from Section 5.2 does not have *perfect* one-time secrecy, by showing that there must exist two messages *m _{1}* and

*m*whose ciphertext distributions differ.

_{2}*Hint*: There must exist strings *s _{1}*,

*s*∈ {0,1}

_{2}^{2λ}where

*s*∈ im(

_{1}*G*), and

*s*∉ im(

_{2}*G*). Use these two strings to find two messages

*m*and

_{1}*m*whose ciphertext distributions assign different probabilities to

_{2}*s*and

_{1}*s*. Note that it is legitimate for an attacker to “know”

_{2}*s*and

_{1}*s*, as these are properties of

_{2}*G*alone, and independent of the random choices made when executing the scheme.

5.6: (a). Let ? be any function. Show that the following function *G* is **not** a secure PRF (even if ? is a secure PRG!). Describe a successful distinguisher and explicitly compute its advantage:

(b).Let *G* : {0,1}* ^{λ}* → {0,1}

^{λ + ℓ}be a candidate PRG. Suppose there is a polynomial-time algorithm

*V*with the property that it inverts

*G*with non-negligible probability. That is,

Show that if an algorithm *V* exists with this property, then *G* is not a secure PRG. In other words, construct a distinguisher contradicting the PRG-security of *G* and show that it achieves non-negligible distinguishing advantage.

*Note:* Don’t assume anything about the output of *V* other than the property shown above. In particular, *V* might very frequently output the “wrong” thing.

5.7: In the “PRG feedback” construction *H* in Section 5.4, there are two calls to *G*. The security proof applies the PRG security rule to both of them, starting with the first. Describe what happens when you try to apply the PRG security of *G* to these two calls in the opposite order. Does the proof still work, or must it be in the order that was presented?

5.8: Let *ℓ’* > *ℓ* > 0. Extend the “PRG feedback” construction to transform any PRG of stretch *ℓ* into a PRG of stretch *ℓ’*. Formally define the new PRG and prove its security using the security of the underlying PRG.

5.9: Let *G* : {0,1}* ^{λ}* → {0,1}

^{3λ}be a length-tripling PRG. For each function below, state whether it is also a secure PRG. If the function is a secure PRG, give a proof. If not, then describe a successful distinguisher and explicitly compute its advantage.

5.10: Let *G* : {0,1}* ^{λ}* → {0,1}

^{3λ}be a length-tripling PRG. Prove that each of the following functions is also a secure PRG:

Note that *H* includes half of its input directly in the output. How do you reconcile this fact with the conclusion of Exercise 5.6(b)?

5.11: A frequently asked question in cryptography forums is whether it’s possible to determine which PRG implementation was used by looking at output samples.

Let *G*_{1} and *G*_{2} be two PRGs with matching input/output lengths. Define two libraries ℒ^{G1}_{which-prg} and ℒ^{G2}_{which-prg} as follows:

Prove that if *G*_{1} and *G*_{2} are both secure PRGs, then ℒ^{G1}_{which-prg} ≋ ℒ^{G2}_{which-prg} — that is, it is infeasible to distinguish which PRG was used simply by receiving output samples.

5.12: Prove that if PRGs exist, then P ≠ NP

*Hint:* {*y* | ∃*s* : *G*(*s*) = *y*} ∈ NP

*Note:* This implies that ℒ^{G}_{prg-real ≢≡} ℒ^{G}_{prg-rand} for all *G*. That is, there can be no PRG that is secure against computationally unbounded distinguishers.