# Chapter 11: Message Authentication Codes

### Return to Table of Contents

In a chosen-ciphertext attack, the adversary can decrypt arbitrary ciphertexts of its choosing. It would be helpful if there was a way for the decryption algorithm to distinguish between ciphertexts that were honestly generated (by the library itself) and ciphertexts created by the adversary. For example, if the decryption algorithm was able to detect and reject (*i.e.*, raise an error) ciphertexts not created honestly, this could be a promising approach to achieve CCA security.

The problem of *determining the origin* of information is known as **authentication**, and it’s a different problem then that of *hiding* information. For example, we may wish to know whether a ciphertext was created honestly by the library itself, while there is no need to *hide* the ciphertext.

In the libraries that define CCA security, honestly generated ciphertexts are generated by the libraries with knowledge of the secret key *k*, and again decrypted by the library with knowledge of the key. We would like a way to prove to the receiver (decryption procedure) that a piece of information (the ciphertext) was created by someone (the encryption procedure) who knows the secret key.

The tool for the job is called a **message authentication code**, or **MAC** for short. A MAC scheme consists of two algorithms:

- KeyGen: samples a secret key for the MAC
- MAC(
*k,m*): given the secret key*k*and a plaintext*m*∈ ?, output a MAC (sometimes called a “tag”)*t*. Typically MAC is deterministic.

We will follow the (confusing) convention of terminology, and use the term “MAC” to sometimes refer to the scheme itself, and sometimes just the “tag” associated with the message (i.e., “the MAC of *m*”).

Think of a MAC as a kind of signature that will be appended to a piece of data. Only someone with the secret key can generate (and verify) a valid MAC. Our intuitive security requirement is that: an adversary who sees valid MACs of many messages cannot produce a **forgery** — a valid MAC of a *different* message.

**11.1: Security Definition**

This is a totally new kind of security requirement. It has nothing to do with *hiding* information. Rather, it is about whether the adversary can actually generate a particular piece of data (*e.g.*, a MAC forgery).

To express such a requirement in a security definition, we need a slightly different approach to defining the libraries. Consider a much simpler statement: “no adversary should be able to guess a uniformly chosen value.” We can formalize this idea with the following two libraries:

The left library allows the calling program to attempt to guess a uniformly chosen “target” string. The right library doesn’t even bother to verify the calling program’s guess — in fact it doesn’t even bother to sample a random target string!

Focus on the difference between these two libraries. Their GUESS subroutines give the same output on nearly all inputs. There is only one input *r* on which they disagree. If a calling program can manage to find that input *r*, then it can easily distinguish the libraries. Therefore, if the libraries are indistinguishable, it means that the adversary cannot find/generate one of these special inputs! That’s the kind of property we want to express.

Indeed, in this case, an adversary who makes *q* queries to the GUESS subroutine achieves an advantage of at most *q*/2* ^{λ}*. For polynomial-time adversaries, this is a negligible advantage (since

*q*is a polynomial function of

*λ*).

**The MAC security definition**. Let’s follow the pattern from the simple example above. We want to say that no adversary can generate a MAC forgery. So our libraries will provide a mechanism to let the adversary *check* whether it has a forgery. One library will actually perform the check, and the other library will simply assume that a forgery can never happen. The two libraries are different only in how they behave when the adversary calls a subroutine on a *true forgery*. So by demanding that the two libraries be indistinguishable, we are actually demanding that it is difficult for the calling program to generate a forgery.

### Definition 11.1: MAC Security

*Let Σ be a MAC scheme. We say that Σ is a* **secure MAC** *if* ℒ^{Σ}_{mac-real} ≋ ℒ^{Σ}_{mac-fake}, *where:*

#### Discussion

- We do allow the adversary to request MACs of chosen messages, hence the GETMAC subroutine. This means that the adversary is always capable of obtaining valid MACs. However, MACs that were generated by GETMAC don’t count as forgeries — only a MAC of a
*different*message would be a forgery.

For this reason, the ℒ_{mac-fake} library keeps track of which MACs were generated by GETMAC, in the set ?. It is clear that these MACs should always be judged valid by VER. But for all other inputs to VER, ℒ_{mac-fake} simply answers false.

- The adversary “wins” by successfully finding
*any*forgery — a valid MAC of*any*“fresh” message. The definition doesn’t care whether it’s the MAC of any particular*meaningful*message.

**11.2: A PRF is a MAC**

The definition of a PRF says (more or less) that even if you’ve seen the output of the PRF on several chosen inputs, all other outputs look independently uniformly random. Furthermore, uniformly chosen values are hard to guess. Putting these two observations together, and we’re close to to the MAC definition. Indeed, a PRF is a secure MAC.

### Claim 11.2

*The scheme* MAC(*k,m*) = *F*(*k,m*) *is a secure MAC, when F is a secure pseudorandom function*.

**11.3: CBC-MAC**

A PRF typically supports only messages of a fixed length, but we will soon see that it’s useful to have a MAC that supports longer messages. A classical approach to extend the input size of a MAC involves the CBC block cipher mode applied to a PRF.

### Construction 11.3: CBC-MAC

*Let F be a PRF with in* = *out* = *λ. Then for every fixed parameter ℓ, CBC-MAC refers to the following MAC scheme:*

CBC-MAC differs from CBC encryption mode in two important ways: First, there is no initialization vector. Indeed, CBC-MAC is deterministic (you can think of it as CBC encryption but with the initialization vector fixed to all zeroes). Second, CBC-MAC outputs only the last block.

### Claim 11.4

*If F is a secure PRF with in* = *out* = *λ*, *then for every fixed ℓ, CBC-MAC is a secure MAC for message space *? = {*0,1*}* ^{λℓ}*.

Note that we have restricted the message space to messages of exactly *ℓ* blocks. Unlike CBC encryption, CBC-MAC is **not** suitable for messages of variable lengths. If the adversary is allowed to request the CBC-MAC of messages of different lengths, then it is possible for the adversary to generate a forgery (see the homework).

**11.4: Encrypt-Then-MAC**

Our motivation for studying MACs is that they seem useful in constructing a CCA-secure encryption scheme. The idea is to combine a MAC with a CPA-secure encryption scheme. The decryption algorithm can verify the MAC and raise an error if the MAC is invalid. There are several natural ways to combine a MAC and encryption scheme, but *not all are secure!* (See the exercises.) The safest way is known as encrypt-then-MAC:

### Construction 11.5: Enc-then-MAC

*Let E denote an encryption scheme, and M denote a MAC scheme where E*.? ⊆ *M*.? *(*i.e., *the MAC scheme is capable of generating MACs of ciphertexts in the E scheme). Then let EtM denote the encrypt-then-MAC construction given below:*

Importantly, the scheme computes a MAC *of the CPA ciphertext*, and not of the plaintext! The result is a CCA-secure encryption scheme:

### Claim 11.6

*If E has CPA security and M is a secure MAC, then EtM (Construction 11.5) has CCA security.*

#### Proof

As usual, we prove the claim with a sequence of hybrid libraries:

The starting point is ℒ^{EtM}_{cca-L}, shown here with the details of the encrypt-then-MAC construction highlighted. Our goal is to eventually swap *m _{L}* with

*m*. But the CPA security of

_{R}*E*should allow us to do just that, so what’s the catch?

To apply the CPA-security of *E*, we must to factor out the relevant call to *E*.Enc in terms of the CPA library ℒ^{E}_{cpa-L}. This means that *k _{e}* becomes private to the ℒ

_{cpa-L}library. But

*k*is also used in the last line of the library as

_{e}*E*.Dec(

*k*,

_{e}*c*). The

*CPA*security library for

*E*provides no way to carry out such

*E*.Dec statements!

The operations of the MAC scheme have been factored out in terms of ℒ^{M}_{mac-real}. Notably, in the DEC subroutine the condition “*t* ≠ *M*.MAC(*k _{m}*,

*c*)” has been replaced with “not VER(

*c*,

*t*).”

We have applied the security of the MAC scheme, and replaced ℒ_{mac-real} with ℒ_{mac-fake}.

We have inlined the ℒ_{mac-fake} library. This library keeps track of a set *S* of values for the purpose of the CCA interface, but also a set ? of values for the purposes of the MAC. However, it is clear from the code of this library that *S* and ? always have the same contents.

Therefore, the two conditions “(*c*,*t*) ∈ *S*” and “(*c*,*t*) ∉ ?” in the DEC subroutine are *exhaustive!* The final line of DEC is *unreachable*. This hybrid highlights the intuitive idea that an adversary can either query DEC with a ciphertext generated by CHALLENGE (the (*c*,*t*) ∈ *S* case) — in which case the response is null — or with a different ciphertext — in which case the response will be **err** since the MAC will not verify.

The unreachable statement has been removed and the redundant variables *S* and ? have been unified. Note that this hybrid library never uses *E*.Dec, making it possible to express its use of the *E* encryption scheme in terms of ℒ_{cpa-L}.

The statements involving the encryption scheme *E* have been factored out in terms of ℒ_{cpa-L}.

We have now reached the half-way point of the proof. The proof proceeds by replacing ℒ_{cpa-L} with ℒ_{cpa-R}, applying the same modifications as before (but in reverse order), to finally arrive at ℒ_{cca-R}. The repetitive details have been omitted, but we mention that when listing the same steps in reverse, the changes appear very bizarre indeed. For instance, we add an unreachable statement to the DEC subroutine; we create a redundant variable ? whose contents are the same as *S*; we mysteriously change one instance of *S* (the condition of the second if-statement in DEC) to refer to the other variable ?. Of course, all of this is so that we can factor out the statements referring to the MAC scheme (along with ?) in terms of ℒ_{mac-fake} and finally replace ℒ_{mac-fake} with ℒ_{mac-real}.

**Exercises**

11.1: Consider the following MAC scheme, where *F* is a secure PRF with *in* = *out* = *λ*:

Show that the scheme is **not** a secure MAC. Describe a distinguisher and compute its advantage.

11.2: Consider the following MAC scheme, where *F* is a secure PRF with *in* = *out* = *λ*:

Show that the scheme is **not** a secure MAC. Describe a distinguisher and compute its advantage.

11.3: Consider the following MAC scheme, where *F* is a secure PRF with *in* = *out* = *λ*:

Show that the scheme is **not** a secure MAC. Describe a distinguisher and compute its advantage.

11.4: Consider the following MAC scheme, where *F* is a secure PRF with *in* = 2*λ* and *out* = *λ*:

In the argument to *F*, we write *i*‖*m _{i}* to denote the integer

*i*(written as a

*λ*-bit binary number) concatenated with the message block

*m*. Show that the scheme is

_{i}**not**a secure MAC. Describe a distinguisher and compute its advantage.

11.5: Suppose we expand the message space of CBC-MAC to ? = ({0,1}^{λ})^{∗}. In other words, the adversary can request a MAC on any message whose length is an exact multiple of the block length *λ*. Show that the result is **not** a secure MAC. Construct a distinguisher and compute its advantage.

*Hint:* Request a MAC on two single-block messages, then use the result to forge the MAC of a two-block message.

11.6: Let *E* be a CPA-secure encryption scheme and *M* be a secure MAC. Show that the following encryption scheme (called encrypt & MAC) is **not** CCA-secure:

Describe a distinguisher and compute its advantage.

11.7: Let *E* be a CPA-secure encryption scheme and *M* be a secure MAC. Show that the following encryption scheme Σ (which I call encrypt-and-encrypted-MAC) is **not** CCA-secure:

Describe a distinguisher and compute its advantage.

11.8: In Construction 8.4, we encrypt one plaintext block into two ciphertext blocks. Imagine applying the Encrypt-then-MAC paradigm to this encryption scheme, but (erroneously) computing a MAC of *only* the second ciphertext block.

In other words, let *F* be a PRP with block length *blen* = *λ*, and let *M* be a MAC scheme for message space {0,1}* ^{λ}*. Define the following encryption scheme:

Show that the scheme does **not** have CCA security. Describe a successful attack and compute its advantage.

*Hint:* Obtain valid encryptions (*r,x,t*), (*r’,x’,t’*), and (*r”,x”,t”*) of unknown plaintexts *m, m’, m”*. Consider what information about the plaintexts is leaked via:

*k*

_{e},

*k*

_{m}), (

*r’,x,t*)) ⊕ Dec((

*k*

_{e},

*k*

_{m}), (

*r”,x,t*)) ⊕

*x’*⊕

*x”*.

11.9: When we combine different cryptographic ingredients (*e.g.*, combining a CPA-secure encryption scheme with a MAC to obtain a CCA secure scheme) we generally require the two ingredients to use *separate, independent keys*. It would be more convenient if the entire scheme just used a single *λ*-bit key.

(a). Suppose we are using Encrypt-then-MAC, where both the encryption scheme and MAC have keys that are *λ* bits long. Refer to the proof of security in the notes (11.4) and **describe where it breaks down** when we modify Encrypt-then-MAC to use the same key for both the encryption & MAC components:

(b). While Encrypt-then-MAC requires independent keys *k*_{e} and *k*_{m} for the two components, show that they can both be *derived* from a single key using a PRF. In more detail, let *F* be a PRF with *in* = 1 and *out* = *λ*. Prove that the following modified Encrypt-then-MAC construction is CCA-secure:

You should not have to re-prove all the tedious steps of the Encrypt-then-MAC security proof. Rather, you should apply the security of the PRF in order to reach the *original* Encrypt-then-MAC construction, whose security we already proved (so you don’t have to repeat).