- What metadata is
- What metadata can reveal
- Why metadata is difficult to protect
What Is Metadata?
Metadata is information about data, as opposed to the data itself. A few examples illustrate the distinction.
- For a phone call, the metadata will include the phone numbers involved, the start time of the call, and the length of the call. For cell phone calls, the metadata will likely include the location of your phone (the GPS coordinates), the cell tower that you are connected to, and even the type of phone you are using. Metadata of phone calls would not include the audio transmission itself; that would be the “data.” Historically, phone-call metadata was recorded for billing purposes.
- Most modern digital photographs include information about the time and place the photo was taken, the type of camera used, and its settings. In this case, the photo itself is the data. Many websites, such as Facebook, Twitter, and Instagram, remove this metadata for your privacy when you upload a photo or video. Others do not, such as Google, Flickr, and YouTube.
- Almost all modern color printers print a forensic code on each page, which may or may not be visible to the human eye. Printer manufacturers added these codes at the request of the US government, which feared that color printers would be used to counterfeit money. In this case, the printed sheet (less the forensic code) would be the data, and the information encoded by the forensic code would be the metadata. Forensic codes have been known to include the day and time the sheet was printed and the serial number of the printer used.
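The phone-call example can be sketched as a small data structure that keeps the metadata apart from the data. This is only an illustration: the class names, field names, sample values, and the rule of rounding up to whole minutes are all assumptions, not how any carrier actually stores or bills calls.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import ceil


@dataclass
class CallMetadata:
    """Information about the call, not the conversation itself."""
    caller: str
    callee: str
    start: datetime
    duration: timedelta
    cell_tower: str
    phone_model: str


@dataclass
class PhoneCall:
    metadata: CallMetadata
    audio: bytes  # the "data": what was actually said


def billing_record(call: PhoneCall) -> dict:
    """Build a billing entry from metadata alone; the audio is never read."""
    m = call.metadata
    return {
        "caller": m.caller,
        "callee": m.callee,
        "start": m.start.isoformat(),
        # Assumed billing rule: round up to the next whole minute.
        "billed_minutes": ceil(m.duration.total_seconds() / 60),
    }


# Made-up sample call for illustration.
call = PhoneCall(
    metadata=CallMetadata(
        caller="+1-555-0100",
        callee="+1-555-0199",
        start=datetime(2013, 6, 5, 9, 30),
        duration=timedelta(minutes=2, seconds=5),
        cell_tower="tower-17",
        phone_model="ExamplePhone 3",
    ),
    audio=b"...recorded speech would go here...",
)
print(billing_record(call))
```

The billing function never touches the audio, which is why metadata can be logged and retained separately from the contents of a call.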
The first disclosure by Edward Snowden revealed that the NSA was collecting all the metadata of calls made by Verizon customers, forcing a conversation about metadata into the public consciousness. A debate ensued over what privacy this practice invaded. Earlier that year, the Associated Press fought back against the Justice Department’s collection of metadata by subpoena, saying, “These records potentially reveal communications with confidential sources across all of the newsgathering activities undertaken by the AP during a two-month period, provide a road map to AP’s newsgathering operations, and disclose information about AP’s activities and operations that the government has no conceivable right to know.” A court opinion noted that, from GPS data gathered through such metadata collection, one “can deduce whether he is a weekly churchgoer, a heavy drinker, a regular at the gym, an unfaithful husband, an outpatient receiving medical treatment, an associate of particular individuals or political groups.”
In an internal document, the NSA has referred to metadata as being one of the agency’s “most useful tools.”
Metadata and the Internet
When you visit a website, information is sent between your computer and the website’s server through the internet. At a basic level, your computer sends a message to the server requesting the contents of the website, and the server sends those contents back to your computer. The information sent over the internet is often referred to as traffic, and each message is broken up into many shorter messages called packets. Each packet has three main parts:
- The header includes the internet address of the sender and the receiver (e.g., your computer and the website’s server) and a description of the type of data that is being sent (e.g., HTML).
- The data is the content of the message (e.g., the content of the web page or part of the web page).
- The trailer indicates the end of the packet and provides proof that the packet has not been corrupted in transit (using a hash function).
The metadata is composed of the header and the trailer. The header is difficult to protect or conceal because it indicates where a packet should be sent. Just as a letter needs an address to be delivered, so does a packet. Your internet address, or IP address, is related to your physical location; in fact, your physical location can often be determined from your IP address.
This description applies to any information that is sent over the internet, including email, video streaming, VoIP calls, and instant messages.
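The three-part packet structure described above can be sketched in a few lines of Python. This is a simplified model, not a real network protocol: the `Packet` class, the function names, the sequence number in the header, the choice of SHA-256 as the hash function, the tiny 16-byte packet size, and the example addresses (taken from ranges reserved for documentation) are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Packet:
    header: dict   # sender, receiver, type of data, position in the message
    data: bytes    # one piece of the message content
    trailer: str   # hash of the data, used to detect corruption in transit


def make_packets(message: bytes, sender: str, receiver: str,
                 content_type: str, size: int = 16) -> list:
    """Break a message into packets carrying at most `size` bytes of data."""
    packets = []
    for seq, start in enumerate(range(0, len(message), size)):
        chunk = message[start:start + size]
        header = {"from": sender, "to": receiver,
                  "type": content_type, "seq": seq}
        trailer = hashlib.sha256(chunk).hexdigest()
        packets.append(Packet(header, chunk, trailer))
    return packets


def is_intact(packet: Packet) -> bool:
    """Recompute the hash and compare it with the trailer."""
    return hashlib.sha256(packet.data).hexdigest() == packet.trailer


def metadata(packet: Packet) -> dict:
    """Everything in the packet except the content itself."""
    return {"header": packet.header, "trailer": packet.trailer}


page = b"<html>Hello, world!</html>"
for p in make_packets(page, "192.0.2.10", "198.51.100.7", "HTML"):
    print(metadata(p)["header"], is_intact(p))
```

Anyone relaying these packets has to read the `to` field to deliver them, so the header stays exposed no matter what the `data` field contains.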
In Context: Protecting a Whistleblower
In May 2017, Reality Winner disclosed NSA documents reporting on Russian interference in the 2016 US presidential election. Her arrest, days before the story was published, prompted much speculation about how she was so quickly identified as the whistleblower, with many people blaming the Intercept for its handling of the story. Reality Winner had anonymously mailed a color printout of the documents to the Intercept. In standard journalistic fashion, the Intercept sent a photograph of the documents to the NSA for verification. The same photograph was redacted and made public in the Intercept’s reporting. Shortly after the publication of the story, several people pointed out that the printer’s forensic code was visible in the photo, and from it they determined the day and time the document was printed and the serial number of the printer. To best protect its source, the Intercept should have redacted the forensic code from the photo. While it is possible that the FBI could have identified Reality Winner from this information, it is more likely that she was outed by logs of file accesses on her work computer.
- CNN. “AP Blasts Feds for Phone Records Search.” May 14, 2013.
- Electronic Frontier Foundation. “Justice Department Subpoena of AP Journalists Shows Need to Protect Calling Records.” May 13, 2013.
- Electronic Frontier Foundation. “Secret Code in Color Printers Lets Government Track You.” October 16, 2005.
- Snowden Archive—the SIDtoday Files. “The Rewards of Metadata.” Intercept, January 23, 2004.
- New Yorker. “The Metadata Program in Eleven Documents.” December 31, 2013.
- Intercept. “Top-Secret NSA Report Details Russian Hacking Effort Days before 2016 Election.” June 5, 2017.
- Atlantic. “The Mysterious Printer Code That Could Have Led the FBI to Reality Winner.” June 6, 2017.