Chapter 18. Data Analysis and Coding

Introduction

Piled before you lie hundreds of pages of fieldnotes you have taken, observations you’ve made while volunteering at city hall. You also have transcripts of interviews you have conducted with the mayor and city council members. What do you do with all this data? How can you use it to answer your original research question (e.g., “How do political polarization and party membership affect local politics?”)? Before you can make sense of your data, you will have to organize and simplify it in a way that allows you to access it more deeply and thoroughly. We call this process coding.[1] Coding is the iterative process of assigning meaning to the data you have collected in order to both simplify and identify patterns. This chapter introduces you to the process of qualitative data analysis and the basic concept of coding, while the following chapter (chapter 19) will take you further into the various kinds of codes and how to use them effectively.

To those who have not yet conducted a qualitative study, the sheer amount of collected data will be a surprise. Qualitative data can be absolutely overwhelming—it may mean hundreds if not thousands of pages of interview transcripts, or fieldnotes, or retrieved documents. How do you make sense of it? Students often want very clear guidelines here, and although I try to accommodate them as much as possible, in the end, analyzing qualitative data is a bit more of an art than a science: “The process of bringing order, structure, and interpretation to a mass of collected data is messy, ambiguous, time-consuming, creative, and fascinating. It does not proceed in a linear fashion: it is not neat. At times, the researcher may feel like an eccentric and tormented artist; not to worry, this is normal” (Marshall and Rossman 2016:214).

To complicate matters further, each approach (e.g., Grounded Theory, deep ethnography, phenomenology) has its own language and bag of tricks (techniques) when it comes to analysis. Grounded Theory, for example, uses in vivo coding to generate new theoretical insights that emerge from a rigorous but open approach to data analysis. Ethnographers, in contrast, are more focused on creating a rich description of the practices, behaviors, and beliefs that operate in a particular field. They are less interested in generating theory and more interested in getting the picture right, valuing verisimilitude in the presentation. And then there are some researchers who seek to account for the qualitative data using almost quantitative methods of analysis, perhaps counting and comparing the uses of certain narrative frames in media accounts of a phenomenon. Qualitative content analysis (QCA) often includes elements of counting (see chapter 17). For these researchers, having very clear hypotheses and clearly defined “variables” before beginning analysis is standard practice, whereas the same would be expressly forbidden by those researchers, like grounded theorists, taking a more emergent approach.

All that said, there are some helpful techniques to get you started, and these will be presented in this and the following chapter. As you become more of an expert yourself, you may want to read more deeply about the tradition that speaks to your research. But know that there are many excellent qualitative researchers that use what works for any given study, who take what they can from each tradition. Most of us find this permissible (but watch out for the methodological purists that exist among us).

 

Null
[Untitled image] by Mary Cullen on Unsplash

Qualitative Data Analysis as a Long Process!

Although most of this and the following chapter will focus on coding, it is important to understand that coding is just one (very important) aspect of the long data-analysis process. We can consider seven phases of data analysis, each of which is important for moving your voluminous data into “findings” that can be reported to others. The first phase involves data organization. This might mean creating a special password-protected Dropbox folder for storing your digital files. It might mean acquiring computer-assisted qualitative data-analysis software (CAQDAS) and uploading all transcripts, fieldnotes, and digital files to its storage repository for eventual coding and analysis. Finding a helpful way to store your material can take a lot of time, and you need to be smart about this from the very beginning. Losing data because of poor filing systems or mislabeling is something you want to avoid. You will also want to ensure that you have procedures in place to protect the confidentiality of your interviewees and informants. Filing signed consent forms (with names) separately from transcripts and linking them through an ID number or other code that only you have access to (and store safely) are important.

Once you have all of your material safely and conveniently stored, you will need to immerse yourself in the data. The second phase consists of reading and rereading or viewing and reviewing all of your data. As you do this, you can begin to identify themes or patterns in the data, perhaps writing short memos to yourself about what you are seeing. You are not committing to anything in this third phase but rather keeping your eyes and mind open to what you see. In an actual study, you may very well still be “in the field” or collecting interviews as you do this, and what you see might push you toward either concluding your data collection or expanding so that you can follow a particular group or factor that is emerging as important. For example, you may have interviewed twelve international college students about how they are adjusting to life in the US but realized as you read your transcripts that important gender differences may exist and you have only interviewed two women (and ten men). So you go back out and make sure you have enough female respondents to check your impression that gender matters here. The seven phases do not proceed entirely linearly! It is best to think of them as recursive; conceptually, there is a path to follow, but it meanders and flows.

Coding is the activity of the fourth phase. The second part of this chapter and all of chapter 19 will focus on coding in greater detail. For now, know that coding is the primary tool for analyzing qualitative data and that its purpose is to both simplify and highlight the important elements buried in mounds of data. Coding is a rigorous and systematic process of identifying meaning, patterns, and relationships. It is a more formal extension of what you, as a conscious human being, are trained to do every day when confronting new material and experiences. The “trick” or skill is to learn how to take what you do naturally and semiconsciously in your mind and put it down on paper so it can be documented and verified and tested and refined.

At the conclusion of the coding phase, your material will be searchable, intelligible, and ready for deeper analysis. You can begin to offer interpretations based on all the work you have done so far. This fifth phase might require you to write analytic memos, beginning with short (perhaps a paragraph or two) interpretations of various aspects of the data. You might then attempt stitching together both reflective and analytical memos into longer (up to five pages) general interpretations or theories about the relationships, activities, patterns you have noted as salient.

As you do this, you may be rereading the data, or parts of the data, and reviewing your codes. It’s possible you get to this phase and decide you need to go back to the beginning. Maybe your entire research question or focus has shifted based on what you are now thinking is important. Again, the process is recursive, not linear. The sixth phase requires you to check the interpretations you have generated. Are you really seeing this relationship, or are you ignoring something important you forgot to code? As we don’t have statistical tests to check the validity of our findings as quantitative researchers do, we need to incorporate self-checks on our interpretations. Ask yourself what evidence would exist to counter your interpretation and then actively look for that evidence. Later on, if someone asks you how you know you are correct in believing your interpretation, you will be able to explain what you did to verify this. Guard yourself against accusations of “cherry-picking,” selecting only the data that supports your preexisting notion or expectation about what you will find.[2]

The seventh and final phase involves writing up the results of the study. Qualitative results can be written in a variety of ways for various audiences (see chapter 20). Due to the particularities of qualitative research, findings do not exist independently of their being written down. This is different for quantitative research or experimental research, where completed analyses can somewhat speak for themselves. A box of collected qualitative data remains a box of collected qualitative data without its written interpretation. Qualitative research is often evaluated on the strength of its presentation. Some traditions of qualitative inquiry, such as deep ethnography, depend on written thick descriptions, without which the research is wholly incomplete, even nonexistent. All of that practice journaling and writing memos (reflective and analytical) help develop writing skills integral to the presentation of the findings.

Remember that these are seven conceptual phases that operate in roughly this order but with a lot of meandering and recursivity throughout the process. This is very different from quantitative data analysis, which is conducted fairly linearly and processually (first you state a falsifiable research question with hypotheses, then you collect your data or acquire your data set, then you analyze the data, etc.). Things are a bit messier when conducting qualitative research. Embrace the chaos and confusion, and sort your way through the maze. Budget a lot of time for this process. Your research question might change in the middle of data collection. Don’t worry about that. The key to being nimble and flexible in qualitative research is to start thinking and continue thinking about your data, even as it is being collected. All seven phases can be started before all the data has been gathered. Data collection does not always precede data analysis. In some ways, “qualitative data collection is qualitative data analysis.… By integrating data collection and data analysis, instead of breaking them up into two distinct steps, we both enrich our insights and stave off anxiety. We all know the anxiety that builds when we put something off—the longer we put it off, the more anxious we get. If we treat data collection as this mass of work we must do before we can get started on the even bigger mass of work that is analysis, we set ourselves up for massive anxiety” (Rubin 2021:182–183; emphasis added).

The Coding Stage

A code is “a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute for a portion of language-based or visual data” (Saldaña 2014:5). Codes can be applied to particular sections of or entire transcripts, documents, or even videos. For example, one might code a video taken of a preschooler trying to solve a puzzle as “puzzle,” or one could take the transcript of that video and highlight particular sections or portions as “arranging puzzle pieces” (a descriptive code) or “frustration” (a summative emotion-based code). If the preschooler happily shouts out, “I see it!” you can denote the code “I see it!” (this is an example of an in vivo, participant-created code). As one can see from even this short example, there are many different kinds of codes and many different strategies and techniques for coding, more of which will be discussed in detail in chapter 19. The point to remember is that coding is a rigorous systematic process—to some extent, you are always coding whenever you look at a person or try to make sense of a situation or event, but you rarely do this consciously. Coding is the process of naming what you are seeing and how you are simplifying the data so that you can make sense of it in a way that is consistent with your study and in a way that others can understand and follow and replicate. Another way of saying this is that a code is “a researcher-generated interpretation that symbolizes or translates data” (Vogt et al. 2014:13).

As with qualitative data analysis generally, coding is often done recursively, meaning that you do not merely take one pass through the data to create your codes. Saldaña (2014) differentiates first-cycle coding from second-cycle coding. The goal of first-cycle coding is to “tag” or identify what emerges as important codes. Note that I said emerges—you don’t always know from the beginning what will be an important aspect of the study or not, so the coding process is really the place for you to begin making the kinds of notes necessary for future analyses. In second-cycle coding, you will want to be much more focused—no longer gathering wholly new codes but synthesizing what you have into metacodes.

You might also conceive of the coding process in four parts (figure 18.1). First, identify a representative or diverse sample set of interview transcripts (or fieldnotes or other documents). This is the group you are going to use to get a sense of what might be emerging. In my own study of career obstacles to success among first-generation and working-class persons in sociology, I might select one interview from each career stage: a graduate student, a junior faculty member, a senior faculty member.

Figure 18.1. Coding Process in Four Stages

Second, code everything (“open coding”). See what emerges, and don’t limit yourself in any way. You will end up with a ton of codes, many more than you will end up with, but this is an excellent way to not foreclose an interesting finding too early in the analysis. Note the importance of starting with a sample of your collected data, because otherwise, open coding all your data is, frankly, impossible and counterproductive. You will just get stuck in the weeds.

Third, pare down your coding list. Where you may have begun with fifty (or more!) codes, you probably want no more than twenty remaining. Go back through the weeds and pull out everything that does not have the potential to bloom into a nicely shaped garden. Note that you should do this before tackling all of your data. Sometimes, however, you might need to rethink the sample you chose. Let’s say that the graduate student interview brought up some interesting gender issues that were pertinent to female-identifying sociologists, but both the junior and the senior faculty members identified as male. In that case, I might read through and open code at least one other interview transcript, perhaps a female-identifying senior faculty member, before paring down my list of codes.

This is also the time to create a codebook if you are using one, a master guide to the codes you are using, including examples (see Sample Codebooks 1 and 2). A codebook is simply a document that lists and describes the codes you are using. It is easy to forget what you meant the first time you penciled a coded notation next to a passage, so the codebook allows you to be clear and consistent with the use of your codes. There is not one correct way to create a codebook, but generally speaking, the codebook should include (1) the code (either name or identification number or both), (2) a description of what the code signifies and when and where it should be applied, and (3) an example of the code to help clarify (2). Listing all the codes down somewhere also allows you to organize and reorganize them, which can be part of the analytical process. It is possible that your twenty remaining codes can be neatly organized into five to seven master “themes.” Codebooks can and should develop as you recursively read through and code your collected material.[3]

Fourth, using the pared-down list of codes (or codebook), read through and code all the data. I know many qualitative researchers who work without a codebook, but it is still a good practice, especially for beginners. At the very least, read through your list of codes before you begin this “closed coding” step so that you can minimize the chance of missing a passage or section that needs to be coded. The final step is…to do it all again. Or, at least, do closed coding (step four) again. All of this takes a great deal of time, and you should plan accordingly.

Researcher Note

People often say that qualitative research takes a lot of time. Some say this because qualitative researchers often collect their own data. This part can be time consuming, but to me, it’s the analytical process that takes the most time. I usually read every transcript twice before starting to code, then it usually takes me six rounds of coding until I’m satisfied I’ve thoroughly coded everything. Even after the coding, it usually takes me a year to figure out how to put the analysis together into a coherent argument and to figure out what language to use. Just deciding what name to use for a particular group or idea can take months. Understanding this going in can be helpful so that you know to be patient with yourself.

—Jessi Streib, author of The Power of the Past and Privilege Lost 

Note that there is no magic in any of this, nor is there any single “right” way to code or any “correct” codes. What you see in the data will be prompted by your position as a researcher and your scholarly interests. Where the above codes on a preschooler solving a puzzle emerged from my own interest in puzzle solving, another researcher might focus on something wholly different. A scholar of linguistics, for example, may focus instead on the verbalizations made by the child during the discovery process, perhaps even noting particular vocalizations (incidence of grrrs and gritting of the teeth, for example). Your recording of the codes you used is the important part, as it allows other researchers to assess the reliability and validity of your analyses based on those codes. Chapter 19 will provide more details about the kinds of codes you might develop.

Saldaña (2014) lists seven “necessary personal attributes” for successful coding. To paraphrase, they are the following:

  1. Having (or practicing) good organizational skills
  2. Perseverance
  3. The ability and willingness to deal with ambiguity
  4. Flexibility
  5. Creativity, broadly understood, which includes “the ability to think visually, to think symbolically, to think in metaphors, and to think of as many ways as possible to approach a problem” (20)
  6. Commitment to being rigorously ethical
  7. Having an extensive vocabulary[4]

Writing Analytic Memos during/after Coding

Coding the data you have collected is only one aspect of analyzing it. Too many beginners have coded their data and then wondered what to do next. Coding is meant to help organize your data so that you can see it more clearly, but it is not itself an analysis. Thinking about the data, reviewing the coded data, and bringing in the previous literature (here is where you use your literature review and theory) to help make sense of what you have collected are all important aspects of data analysis. Analytic memos are notes you write to yourself about the data. They can be short (a single page or even a paragraph) or long (several pages). These memos can themselves be the subject of subsequent analytic memoing as part of the recursive process that is qualitative data analysis.

Short analytic memos are written about impressions you have about the data, what is emerging, and what might be of interest later on. You can write a short memo about a particular code, for example, and why this code seems important and where it might connect to previous literature. For example, I might write a paragraph about a “cultural capital” code that I use whenever a working-class sociologist says anything about “not fitting in” with their peers (e.g., not having the right accent or hairstyle or private school background). I could then write a little bit about Bourdieu, who originated the notion of cultural capital, and try to make some connections between his definition and how I am applying it here. I can also use the memo to raise questions or doubts I have about what I am seeing (e.g., Maybe the type of school belongs somewhere else? Is this really the right code?). Later on, I can incorporate some of this writing into the theory section of my final paper or article. Here are some types of things that might form the basis of a short memo: something you want to remember, something you noticed that was new or different, a reaction you had, a suspicion or hunch that you are developing, a pattern you are noticing, any inferences you are starting to draw. Rubin (2021) advises, “Always include some quotation or excerpt from your dataset…that set you off on this idea. It’s happened to me so many times—I’ll have a really strong reaction to a piece of data, write down some insight without the original quotation or context, and then [later] have no idea what I was talking about and have no way of recreating my insight because I can’t remember what piece of data made me think this way” (203).

All CAQDAS programs include spaces for writing, generating, and storing memos. You can link a memo to a particular transcript, for example. But you can just as easily keep a notebook at hand in which you write notes to yourself, if you prefer the more tactile approach. Drawing pictures that illustrate themes and patterns you are beginning to see also works. The point is to write early and write often, as these memos are the building blocks of your eventual final product (chapter 20).

In the next chapter (chapter 19), we will go a little deeper into codes and how to use them to identify patterns and themes in your data. This chapter has given you an idea of the process of data analysis, but there is much yet to learn about the elements of that process!

Qualitative Data-Analysis Samples

The following three passages are examples of how qualitative researchers describe their data-analysis practices. The first, by Harvey, is a useful example of how data analysis can shift the original research questions. The second example, by Thai, shows multiple stages of coding and how these stages build upward to conceptual themes and theorization. The third example, by Lamont, shows a masterful use of a variety of techniques to generate theory.

Example 1: “Look Someone in the Eye” by Peter Francis Harvey (2022)

I entered the field intending to study gender socialization. However, through the iterative process of writing fieldnotes, rereading them, conducting further research, and writing extensive analytic memos, my focus shifted. Abductive analysis encourages the search for unexpected findings in light of existing literature. In my early data collection, fieldnotes, and memoing, classed comportment was unmistakably prominent in both schools. I was surprised by how pervasive this bodily socialization proved to be and further surprised by the discrepancies between the two schools.…I returned to the literature to compare my empirical findings.…To further clarify patterns within my data and to aid the search for disconfirming evidence, I constructed data matrices (Miles, Huberman, and Saldaña 2013). While rereading my fieldnotes, I used ATLAS.ti to code and recode key sections (Miles et al. 2013), punctuating this process with additional analytic memos. (2022:1420)

Example 2:” Policing and Symbolic Control” by Mai Thai (2022)

Conventional to qualitative research, my analyses iterated between theory development and testing. Analytical memos were written throughout the data collection, and my analyses using MAXQDA software helped me develop, confirm, and challenge specific themes.…My early coding scheme which included descriptive codes (e.g., uniform inspection, college trips) and verbatim codes of the common terms used by field site participants (e.g., “never quit,” “ghetto”) led me to conceptualize valorization. Later analyses developed into thematic codes (e.g., good citizens, criminality) and process codes (e.g., valorization, criminalization), which helped refine my arguments. (2022:1191–1192)

Example 3: The Dignity of Working Men by Michèle Lamont (2000)

To analyze the interviews, I summarized them in a 13-page document including socio-demographic information as well as information on the boundary work of the interviewees. To facilitate comparisons, I noted some of the respondents’ answers on grids and summarized these on matrix displays using techniques suggested by Miles and Huberman for standardizing and processing qualitative data. Interviews were also analyzed one by one, with a focus on the criteria that each respondent mobilized for the evaluation of status. Moreover, I located each interviewee on several five-point scales pertaining to the most significant dimensions they used to evaluate status. I also compared individual interviewees with respondents who were similar to and different from them, both within and across samples. Finally, I classified all the transcripts thematically to perform a systematic analysis of all the important themes that appear in the interviews, approaching the latter as data against which theoretical questions can be explored. (2000:256–257)

Sample Codebook 1

This is an abridged version of the codebook used to analyze qualitative responses to a question about how class affects careers in sociology. Note the use of numbers to organize the flow, supplemented by highlighting techniques (e.g., bolding) and subcoding numbers.

01. CAPS: Any reference to “capitals” in the response, even if the specific words are not used

01.1: cultural capital
01.2: social capital
01.3: economic capital

(can be mixed: “0.12”= both cultural and asocial capital; “0.23”= both social and economic)

01. CAPS: a reference to “capitals” in which the specific words are used [bold: thus, 01.23 means that both social capital and economic capital were mentioned specifically

02. DEBT: discussion of debt

02.1: mentions personal issues around debt
02.2: discusses debt but in the abstract only (e.g., “people with debt have to worry”)

03. FirstP: how the response is positioned

03.1: neutral or abstract response
03.2: discusses self (“I”)
03.3: discusses others (“they”)

Sample Coded Passage:

“I was really hurt when I didn’t get that scholarship.  It was going to cost me thousands of dollars to stay in the program, and I was going to have to borrow all of it.  My faculty advisor wasn’t helpful at all.  They told 03.2
me not to worry about it, because it wasn’t really that much money!  I almost fell over when they said that!  Like, do they not understand what it’s like to be poor?  I just felt so isolated then.  I was on my own. 02.1. 01.3
I couldn’t talk to anyone about it, because no one else seemed to worry about it. Talk about economic capital!” 01.3

*Question: What other codes jump out to you here? Shouldn’t there be a code for feelings of loneliness or alienation? What about an emotions code?

Sample Codebook 2

CODE DEFINITION WHEN TO APPLY IN VIVO EXAMPLE
ALIENATION Feeling out of place in academia Any time uses the word alienation or impostor syndrome or feeling out of place “I was so lonely in graduate school. It was an alienating experience.”
CULTURAL CAPITAL Knowledge or other cultural resources that affect success in academia When “cultural capital” is used but also when knowledge or lack of knowledge about cultural things are discussed “We went to a fancy restaurant after my job interview and I was paralyzed with fear because I did not know which fork I was supposed to be using. Yikes!”
SOCIAL CAPITAL Social networks that advance success in academia When “social capital” is used but also when social networks are discussed or knowing the right people “I didn’t know who to turn to. It seemed like everyone else had parents who could help them and I didn’t know anyone else who had ever even gone to college!”

This is an example that uses "word" categories only, with descriptions and examples for each code

Further Readings

Elliott, Victoria. 2018. “Thinking about the Coding Process in Qualitative Analysis.” Qualitative Report 23(11):2850–2861. Address common questions those new to coding ask, including the use of “counting” and how to shore up reliability.

Friese, Susanne. 2019. Qualitative Data Analysis with ATLAS.ti. 3rd ed. A good guide to ATLAS.ti, arguably the most used CAQDAS program. Organized around a series of “skills training” to get you up to speed.

Jackson, Kristi, and Pat Bazeley. 2019. Qualitative Data Analysis with NVIVO. 3rd ed. Thousand Oaks, CA: SAGE. If you want to use the CAQDAS program NVivo, this is a good affordable guide to doing so. Includes copious examples, figures, and graphic displays.

LeCompte, Margaret D. 2000. “Analyzing Qualitative Data.” Theory into Practice 39(3):146–154. A very practical and readable guide to the entire coding process, with particular applicability to educational program evaluation/policy analysis.

Miles, Matthew B., and A. Michael Huberman. 1994. Qualitative Data Analysis: An Expanded Sourcebook. 2nd ed. Thousand Oaks, CA: SAGE. A classic reference on coding. May now be superseded by Miles, Huberman, and Saldaña (2019).

Miles, Matthew B., A. Michael Huberman, and Johnny Saldaña. 2019. Qualitative Data Analysis: A Methods Sourcebook. 4th ed. Thousand Oaks, CA; SAGE. A practical methods sourcebook for all qualitative researchers at all levels using visual displays and examples. Highly recommended.

Saldaña, Johnny. 2014. The Coding Manual for Qualitative Researchers. 2nd ed. Thousand Oaks, CA: SAGE. The most complete and comprehensive compendium of coding techniques out there. Essential reference.

Silver, Christina. 2014. Using Software in Qualitative Research: A Step-by-Step Guide. 2nd ed. Thousand Oaks, CA; SAGE. If you are unsure which CAQDAS program you are interested in using or want to compare the features and usages of each, this guidebook is quite helpful.

Vogt, W. Paul, Elaine R. Vogt, Diane C. Gardner, and Lynne M. Haeffele2014. Selecting the Right Analyses for Your Data: Quantitative, Qualitative, and Mixed Methods. New York: The Guilford Press. User-friendly reference guide to all forms of analysis; may be particularly helpful for those engaged in mixed-methods research.


  1. When you have collected content (historical, media, archival) that interests you because of its communicative aspect, content analysis (chapter 17) is appropriate. Whereas content analysis is both a research method and a tool of analysis, coding is a tool of analysis that can be used for all kinds of data to address any number of questions. Content analysis itself includes coding.
  2. Scientific research, whether quantitative or qualitative, demands we keep an open mind as we conduct our research, that we are “neutral” regarding what is actually there to find. Students who are trained in non-research-based disciplines such as the arts or philosophy or who are (admirably) focused on pursuing social justice can too easily fall into the trap of thinking their job is to “demonstrate” something through the data. That is not the job of a researcher. The job of a researcher is to present (and interpret) findings—things “out there” (even if inside other people’s hearts and minds). One helpful suggestion: when formulating your research question, if you already know the answer (or think you do), scrap that research. Ask a question to which you do not yet know the answer.
  3. Codebooks are particularly useful for collaborative research so that codes are applied and interpreted similarly. If you are working with a team of researchers, you will want to take extra care that your codebooks remain in synch and that any refinements or developments are shared with fellow coders. You will also want to conduct an “intercoder reliability” check, testing whether the codes you have developed are clearly identifiable so that multiple coders are using them similarly. Messy, unclear codes that can be interpreted differently by different coders will make it much more difficult to identify patterns across the data.
  4. Note that this is important for creating/denoting new codes. The vocabulary does not need to be in English or any particular language. You can use whatever words or phrases capture what it is you are seeing in the data.
definition

License

Icon for the Creative Commons Attribution-ShareAlike 4.0 International License

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.