Open-ended text questions require coding (i.e., categorization) before they can be analyzed quantitatively. A common example is spontaneous brand awareness, where respondents type in the names of all the brands they can recall.
Best practice is to provide a separate input box for each brand. That way, coding is a simple one-to-one mapping of each response to a category. If all the brands were entered together in one text field, coding would be much harder, because the relationship is one-to-many and almost every response is likely to be unique. With one-to-one capture, responses tend to overlap, so the coding process is much faster. The article below assumes one-to-one text capture and coding.
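To illustrate why one-to-one capture speeds things up, here is a minimal offline sketch in plain R (no Crunch connection). The box contents, brand names and the lookup table are hypothetical. Because each box holds a single brand, one table mapping each distinct response to exactly one code can be reused for every box.

```r
# Hypothetical answers from three one-brand-per-box inputs (NA = box left empty)
box1 <- c("Coke", "pepsi", "Coca Cola")
box2 <- c("Pepsi", "Coke", NA)
box3 <- c(NA, "Fanta", "coke")

# One-to-one lookup (hypothetical codes): each distinct response maps to one code
lookup <- c("Coke" = "Coca-Cola", "coke" = "Coca-Cola", "Coca Cola" = "Coca-Cola",
            "Pepsi" = "Pepsi", "pepsi" = "Pepsi", "Fanta" = "Fanta")

# The same table codes every box; empty or unmatched answers stay NA
coded_boxes <- lapply(list(box1, box2, box3), function(x) unname(lookup[x]))
```

A single combined field such as "coke, pepsi and fanta" would first have to be split into separate brands, and the raw string would rarely match any existing key, which is the one-to-many problem described above.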
Each open-ended variable should be stored as a text variable in Crunch. If it is not, convert it to a text variable first.
The process is summarized as follows:
- Identify the text variables by alias
- Export a list of all the unique responses to a CSV file
- Categorize (code) the responses in the CSV
- Read the coded CSV back into R to generate categorical variables
The first step is to log in and load your dataset in R. You'll also need to load the following packages:
library(crunch)
library(dplyr)
library(purrr)
Next, list the text variables in R using their aliases. The example below has three variables, but you can include as many as you like.
text_responses <- list(ds$q1a_1, ds$q1a_2, ds$q1a_3)
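With many input boxes, typing each variable out gets tedious. A minimal sketch, assuming the aliases follow a numbered pattern such as q1a_1, q1a_2, and so on: build the aliases with paste0() and look each one up with [[ ]]. The helper get_text_vars() is hypothetical. On a Crunch dataset, ds[["q1a_1"]] returns the variable with that alias; here a plain data.frame with made-up data stands in so the sketch runs offline.

```r
# Hypothetical helper: fetch each variable by alias with [[ ]]
get_text_vars <- function(ds, aliases) {
  lapply(aliases, function(a) ds[[a]])
}

# Build aliases q1a_1 ... q1a_3 instead of typing each one
text_aliases <- paste0("q1a_", 1:3)

# Stand-in dataset with hypothetical data, used only for offline testing
ds_demo <- data.frame(
  q1a_1 = c("Coke", "Pepsi"),
  q1a_2 = c("Fanta", NA),
  q1a_3 = c(NA, "Coke"),
  stringsAsFactors = FALSE
)
text_responses_demo <- get_text_vars(ds_demo, text_aliases)
```

With a real Crunch dataset, text_responses <- get_text_vars(ds, text_aliases) would give the same kind of list as the line above, which the coding loop later relies on when it calls alias() and name() on each element.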
The next piece of code generates the CSV file. It is written to your working directory (as set in RStudio). You can change the name of the output CSV by editing the file argument of write.csv().
unique_responses <- unique(unlist(lapply(text_responses, as.vector)))
output <- as.data.frame(unique_responses)
output$code <- ""
# row.names = FALSE stops write.csv from adding an unnamed index column
write.csv(output, file = "raw_codes.csv", row.names = FALSE)
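A variant of this export that drops missing and blank answers and sorts the rest alphabetically can make the coding pass easier, since variant spellings of the same brand end up close together. This is a minimal offline sketch with hypothetical data; with Crunch, the raw list would come from lapply(text_responses, as.vector). The kept values are deliberately not trimmed or otherwise altered, so they still match the raw text in Crunch when the codes are read back. The file is written to tempdir() only so the sketch does not overwrite your own raw_codes.csv.

```r
# Hypothetical raw answers standing in for lapply(text_responses, as.vector)
raw_demo <- list(c("Coke", "pepsi", ""), c("Pepsi", NA, "Coke"), c("coke", "Fanta", "  "))

responses_demo <- unique(unlist(raw_demo))
# Drop missing and whitespace-only answers, leaving the kept values unchanged
responses_demo <- responses_demo[!is.na(responses_demo) & trimws(responses_demo) != ""]
responses_demo <- sort(responses_demo)  # alphabetical order groups similar spellings

output_demo <- data.frame(unique_responses = responses_demo, code = "",
                          stringsAsFactors = FALSE)

out_file <- file.path(tempdir(), "raw_codes.csv")
write.csv(output_demo, file = out_file, row.names = FALSE)
```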
From here, open the CSV and type the category (code) for each response in the "code" column on the right. Sorting the unique_responses column alphabetically helps, because variant spellings of the same brand end up next to each other. When you're done, save the CSV (the example below assumes it was saved as raw_codes_done.csv) and read it back into R, as shown below.
case_data <- read.csv("raw_codes_done.csv", stringsAsFactors = FALSE)
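Before generating variables, it can be worth checking for responses that were left uncoded, since a blank code would otherwise turn into an unnamed category. A minimal sketch with a hypothetical coded sheet (case_data_demo) in place of your real case_data; NA and blank codes are both treated as "not coded".

```r
# Hypothetical coded sheet as it might come back from the spreadsheet
case_data_demo <- data.frame(
  unique_responses = c("Coke", "coke", "Fanta", "Pepsi", "pepsi", "Tango"),
  code = c("Coca-Cola", "Coca-Cola", "Fanta", "Pepsi", "Pepsi", NA),
  stringsAsFactors = FALSE
)

# NA and blank codes both mean the row was never coded
uncoded <- is.na(case_data_demo$code) | trimws(case_data_demo$code) == ""
if (any(uncoded)) {
  message("Uncoded responses: ",
          paste(case_data_demo$unique_responses[uncoded], collapse = ", "))
}

# Keep only coded rows so no category is created for an empty code
case_data_demo <- case_data_demo[!uncoded, ]
```

Here "Tango" is reported and dropped; running the same check on case_data tells you whether the spreadsheet needs another pass.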
Then run the following code to generate a coded categorical variable for each text variable.
for (i in seq_along(text_responses)) {
  text_var <- text_responses[[i]]
  coded_alias <- paste0(alias(text_var), "_coded")   # e.g. q1a_1_coded
  output_name <- paste(name(text_var), "Coded")
  # One case per code: TRUE where the raw text is one of that code's responses
  ds[[coded_alias]] <- makeCaseVariable(
    cases = case_data %>%
      filter(!is.na(code), code != "") %>%   # skip responses left uncoded
      group_by(code) %>%
      summarize(expression = list(text_var %in% unique_responses)) %>%
      select(name = code, expression) %>%
      pmap(base::list),
    name = output_name
  )
}
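To see what each case expression does, here is a local emulation in plain R (no Crunch connection) with hypothetical data. Each code becomes one case holding the set of raw responses assigned to it, and a row takes the code whose set contains its response. The helper emulate_cases() is hypothetical and not part of the crunch package; it assumes, as case logic generally works, that the first matching case wins and unmatched rows stay missing. Because each response carries exactly one code, the order of the cases does not change the result here.

```r
# Hypothetical coded lookup and one text variable's raw values
case_data_demo <- data.frame(
  unique_responses = c("Coke", "coke", "Fanta", "Pepsi", "pepsi"),
  code = c("Coca-Cola", "Coca-Cola", "Fanta", "Pepsi", "Pepsi"),
  stringsAsFactors = FALSE
)
q1a_1_demo <- c("pepsi", "Coke", NA, "Fanta", "coke")

# One "case" per code: the set of raw responses assigned to it
cases_demo <- split(case_data_demo$unique_responses, case_data_demo$code)

# Each row takes the first code whose set contains its response; others stay NA
emulate_cases <- function(x, cases) {
  out <- rep(NA_character_, length(x))
  for (code in names(cases)) {
    hit <- is.na(out) & x %in% cases[[code]]
    out[hit] <- code
  }
  factor(out, levels = names(cases))
}

q1a_1_coded_demo <- emulate_cases(q1a_1_demo, cases_demo)
```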
Finally, if you wish, you can derive a new categorical array from all the coded variables. From there, if you want a net count of mentions for each brand, you can transform it as described in the article "Transforming different mentions of coded text data into a single multiple response variable in R".
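The idea behind a net count can be sketched offline in plain R with hypothetical coded data: a respondent counts as mentioning a brand if that brand appears in any of their coded boxes, and a brand typed twice by the same respondent still counts once. This only illustrates the logic; it is not the method used in the article referenced above.

```r
# Hypothetical coded answers for three respondents across three boxes
coded_demo <- data.frame(
  q1a_1_coded = c("Coca-Cola", "Pepsi", NA),
  q1a_2_coded = c("Pepsi", NA, NA),
  q1a_3_coded = c("Coca-Cola", NA, "Fanta"),
  stringsAsFactors = FALSE
)
brands <- c("Coca-Cola", "Fanta", "Pepsi")

# One TRUE/FALSE column per brand: mentioned in any box (duplicates count once)
mentions <- sapply(brands, function(b) rowSums(coded_demo == b, na.rm = TRUE) > 0)

# Net share of respondents mentioning each brand
net_share <- colMeans(mentions)
```

Respondent 1 names Coca-Cola in two boxes but contributes only one mention, which is what distinguishes a net count from a simple tally of all responses.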