Coding spontaneous brand awareness questions using R

Open-ended text questions require coding (i.e., categorization) before they can be analyzed quantitatively. A common metric is spontaneous brand awareness, where respondents type in the names of all the brands they recall.

Best practice is to provide a separate input box for each brand. That way, each response maps to exactly one category (one-to-one coding). If all the brands are typed into a single text field, coding is much harder: each response can map to several categories (one-to-many), and almost every response is likely to be unique. With one-to-one coding, the same response tends to recur across respondents, so the coding process is much faster. This article assumes a one-to-one text capture and coding process.

Each variable containing open-ended text should be stored as a text variable in Crunch. If any are not, set them as text variables first.

The process is summarized as follows:

Identify the text variables by alias
Export a list of all the unique responses to a CSV
Categorize (code) the responses in the CSV
Feed the CSV back into the R script to generate categorical variables
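
The four steps above can be tried out locally before touching a real dataset. The sketch below is only an illustration in base R: the data frame `responses`, its values, and the `codebook` are hypothetical stand-ins for Crunch text variables and the hand-coded CSV, and `match()` is used here in place of the Crunch-side logic shown later in this article.

```r
# Hypothetical local stand-in for three open-ended text variables
responses <- data.frame(
  q1a_1 = c("Coke", "coca cola", "Pepsi"),
  q1a_2 = c("pepsi", "Fanta", NA),
  q1a_3 = c(NA, "Coke", NA),
  stringsAsFactors = FALSE
)

# Steps 1-2: collect the unique responses across all text columns
unique_responses <- unique(unlist(lapply(responses, as.vector)))
unique_responses <- unique_responses[!is.na(unique_responses)]

# Step 3: coding normally happens by hand in the CSV; it is simulated here
codebook <- data.frame(
  unique_responses = unique_responses,
  code = c("Coca-Cola", "Coca-Cola", "Pepsi", "Pepsi", "Fanta"),
  stringsAsFactors = FALSE
)

# Step 4: look up each raw response in the codebook (one-to-one mapping)
coded <- as.data.frame(
  lapply(responses, function(x) {
    codebook$code[match(x, codebook$unique_responses)]
  }),
  stringsAsFactors = FALSE
)

print(coded)
```

Because each input box holds a single brand, every raw response maps to exactly one code, which is what makes the simple lookup possible.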

The first step is to log in and access your dataset with R. You'll also need to load the following packages:

library(crunch)
library(dplyr)
library(purrr)

In R, you then list the variables using their aliases. The example below has three variables, but you can include as many as you like.

text_responses <- list(ds$q1a_1, ds$q1a_2, ds$q1a_3)

The next piece of code generates the CSV file. It will be exported to your working directory (as set in RStudio). You can change the name of the output CSV by editing the file argument in write.csv.

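# Pull the values of each text variable and keep one copy of each distinct response
unique_responses <- unique(unlist(lapply(text_responses, as.vector)))
output <- as.data.frame(unique_responses)
# Empty column to be filled in by hand
output$code <- ""
# row.names = FALSE keeps R's row numbers out of the file
write.csv(output, file = "raw_codes.csv", row.names = FALSE)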
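
As an optional refinement of the export step above, missing and empty responses can be dropped and the list sorted before it is written, so that similar spellings sit next to each other in the CSV. This is a minimal sketch in base R: the `raw` vector is hypothetical and stands in for the values pulled from Crunch, and the file is written to a temporary directory only so the example runs on its own.

```r
# Hypothetical raw answers, standing in for the values pulled from the text variables
raw <- c("Pepsi", "coke", "Coke", "", NA, "Fanta", "Pepsi")

# Drop missing and empty answers, then keep one copy of each response
kept <- raw[!is.na(raw) & nzchar(raw)]
unique_responses <- unique(kept)

# Sort case-insensitively so "coke" and "Coke" end up adjacent.
# The strings themselves are left unchanged.
unique_responses <- unique_responses[order(tolower(unique_responses))]

output <- data.frame(
  unique_responses = unique_responses,
  code = "",
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "raw_codes.csv")
write.csv(output, file = csv_path, row.names = FALSE)
```

The responses are deliberately not trimmed or lower-cased in the file itself, on the assumption that the later step compares against the exact text stored in each variable.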

From here, open the CSV and type the category/code you want in the "code" column on the right. Sorting unique_responses alphabetically makes the coding easier. When done, save the CSV (the example below assumes it was saved as raw_codes_done.csv) and read it into R as follows.

case_data <- read.csv("raw_codes_done.csv", stringsAsFactors = FALSE)
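
Before generating variables, it can help to check the coded file for rows that were left without a code. The sketch below is an illustration only: it writes a small hypothetical coded file to a temporary directory so that it runs on its own. With a real file you would skip that part and inspect `case_data` as read above.

```r
# Hypothetical coded file, written to a temp directory so the sketch is self-contained
csv_path <- file.path(tempdir(), "raw_codes_done.csv")
write.csv(
  data.frame(
    unique_responses = c("Coke", "coca cola", "Pepsi", "asdf"),
    code = c("Coca-Cola", "Coca-Cola", "Pepsi", ""),
    stringsAsFactors = FALSE
  ),
  file = csv_path,
  row.names = FALSE
)

case_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Check that the two columns the later step relies on are present
stopifnot(all(c("unique_responses", "code") %in% names(case_data)))

# Rows that were left without a code
is_uncoded <- is.na(case_data$code) | !nzchar(trimws(case_data$code))
uncoded <- case_data[is_uncoded, ]
print(uncoded)

# Number of raw responses assigned to each code
code_counts <- table(case_data$code[!is_uncoded])
print(code_counts)
```

Any rows listed in `uncoded` can then be coded, or assigned to a catch-all category such as "Other", before moving on.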

And then run the following code to generate each coded categorical variable.

for (i in seq_along(text_responses)) {
  text_var <- text_responses[[i]]
  # Alias and display name for the new coded variable
  coded_alias <- paste0(alias(text_var), "_coded")
  output_name <- paste(name(text_var), "Coded")
  ds[[coded_alias]] <- makeCaseVariable(
    # One case per code, matching any raw response assigned to that code
    cases = case_data %>%
      group_by(code) %>%
      summarize(expression = list(text_var %in% unique_responses)) %>%
      select(name = code, expression) %>%
      pmap(base::list),
    name = output_name
  )
}

Finally, you can derive a new categorical array, if you wish, from all the text variables you've coded. From there, if you want a net count of mentions for each brand, you can transform it as described in "Transforming different mentions of coded text data into a single multiple response variable in R".
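
To make the idea of a net count concrete, here is a small local sketch in base R. It is not the Crunch transformation itself: the `coded` data frame and its values are hypothetical, and the point is only that a respondent who mentions the same brand in two boxes is counted once for that brand.

```r
# Hypothetical coded answers, one column per input box
coded <- data.frame(
  q1a_1_coded = c("Coca-Cola", "Pepsi", "Fanta"),
  q1a_2_coded = c("Pepsi", "Pepsi", NA),
  q1a_3_coded = c(NA, "Coca-Cola", NA),
  stringsAsFactors = FALSE
)

# All brands that appear anywhere in the coded columns
all_codes <- unlist(coded, use.names = FALSE)
brands <- sort(unique(all_codes[!is.na(all_codes)]))

# One logical column per brand: TRUE if the respondent mentioned it in any box
mentioned <- sapply(brands, function(b) rowSums(coded == b, na.rm = TRUE) > 0)

# Net count of respondents mentioning each brand
net_counts <- colSums(mentioned)
print(net_counts)
```

In this example the second respondent typed Pepsi twice but contributes only one mention to the Pepsi total.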
