How to create a new 'top 2 box' summary using R
It's common in research to analyze a scale question by considering the "top boxes" (or "bottom boxes") scores. If you have a 1–5 agreement scale (i.e., Likert scale) then the "top 2 box" might be Strongly Agree + Agree. In the following example, the "top 2 box" takes the Almost always and Quite often categories:
Crunch offers two ways to condense the scale into any subset of categories:
- subtotals (sometimes called nets by researchers) displayed alongside all of the categories (that's the case in the example above), or
- any number of derived multiple response variables that preserve only the selected categories.
The latter is useful when you want to work with the items (e.g., in a cross tab), summarized as their ‘top box’ score, alongside other variables.
This article describes how to create a 'top 2 box' variable (the second option) using R. It's very likely in the future that this will be achieved more effortlessly with Crunch Automation and/or in the web-app itself.
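To make "items summarized as their top box score" concrete, here is a minimal base-R sketch that does not touch Crunch at all. The item names, scale labels, and responses are invented purely for illustration; the sketch only shows the arithmetic behind a top 2 box, not how Crunch computes it internally.

```r
# Toy data invented for illustration; not taken from any Crunch dataset.
scale_levels <- c("Strongly disagree", "Disagree", "Neutral",
                  "Agree", "Strongly agree")

responses <- data.frame(
  item_a = factor(c("Agree", "Strongly agree", "Neutral", "Disagree", "Agree"),
                  levels = scale_levels),
  item_b = factor(c("Neutral", "Agree", "Disagree", "Disagree", "Strongly agree"),
                  levels = scale_levels)
)

# The categories that "count" toward the top 2 box.
top2 <- c("Strongly agree", "Agree")

# For each item, flag the responses that fall in the top 2 box,
# then take the share of flagged responses.
top2_share <- sapply(responses, function(x) mean(x %in% top2))
print(top2_share)
```

Each item is reduced to a single number (the share of respondents in the selected categories), which is what makes the items easy to compare against other variables.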
Do I need to keep my original variables intact or bind them up?
You have two choices when making a 'top 2 box' in Crunch. You can either:
- "use up" the contributing variables to make a new question, or
- keep your existing variables intact. In Crunch this is called deriving a new variable.
In Crunch's web-app, you can only do the first option (at the time this article was written), and only with separate variables: use the Create Variable + button in the very bottom left and choose Multiple Response. Your contributing variables need to be separate; they cannot already be bound up in a Categorical Array. If they are, you'll first need to separate the variables from the Categorical Array:
However, if you want to use the second option to create a 'top 2 box' variable AND keep your original variables (in the categorical array) as they are, then you must use R code.
Using R to derive a new 'top box' variable from a categorical array
To get started:
- Log in and load your dataset.
- Work out which variables you are going to use. If you're using a categorical array of variables, then you just need its alias so R knows what to work with.
- The following example uses the Brand Health sample dataset and a variable called Newspaper Readership. You can see the alias newspaper_freq on the right-hand side:
- Use the following command to create a 'top 2 box' variable summary from the grid, counting the responses for Almost always and Quite often:
- You do not need to write the following command as two lines, but a separate line makes it easier to specify what is going to "count" in the top 2 box.
- Because R syntax requires precise capitalization, spacing, and punctuation, it's recommended that you open the Properties > Edit of your variable and then copy/paste the labels, to prevent any errors.
- The c() function is the combine function: it combines the values in the parentheses into a vector, which is then stored as count_me.
count_me <- c("Almost always (at least 3 out of 4 issues)", "Quite often (at least 1 out of 4 issues)")
ds$reader_t2b <- deriveArray(ds$newspaper_freq, name = 'Newspaper Readership (Often) Top 2 Box', selections = count_me)
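Because a single mistyped label is the most likely way for this command to go wrong, it can help to check the labels before deriving anything. The sketch below is one possible approach, not part of the crunch package: check_labels() is a hypothetical helper, and the available vector is typed in by hand with a placeholder third entry. In a live session you would most likely fill it from names(categories(ds$newspaper_freq)) instead.

```r
# Hypothetical helper (not a crunch function): stop early if any label we
# intend to select does not exactly match an available category label.
check_labels <- function(wanted, available) {
  unmatched <- setdiff(wanted, available)
  if (length(unmatched) > 0) {
    stop("No category named: ",
         paste(sprintf('"%s"', unmatched), collapse = ", "))
  }
  invisible(TRUE)
}

# Assumption: in a live session this would come from the dataset, e.g.
# names(categories(ds$newspaper_freq)). The third entry is a placeholder.
available <- c(
  "Almost always (at least 3 out of 4 issues)",
  "Quite often (at least 1 out of 4 issues)",
  "Some other category (placeholder)"
)

count_me <- c("Almost always (at least 3 out of 4 issues)",
              "Quite often (at least 1 out of 4 issues)")

# Passes silently because both labels match exactly.
check_labels(count_me, available)

# A label with the wrong capitalization is caught before it reaches Crunch.
typo_result <- tryCatch(
  check_labels("almost always (at least 3 out of 4 issues)", available),
  error = function(e) conditionMessage(e)
)
print(typo_result)
```

The comparison is exact, so differences in capitalization, spacing, or punctuation are all reported as unmatched labels.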
Explanation of command
In the second line of the above command, deriveArray() creates the derived variable, which is stored as a new variable called reader_t2b within the dataset (ds). Each element is described below:
- The first argument is the dataset (stored as ds) and the alias of the categorical array to use (newspaper_freq).
- name='Newspaper Readership (Often) Top 2 Box'
  - Becomes the name of the new variable.
- selections = count_me
  - Refers to the count_me vector set up on the first line of the command. An alternative approach is to use the values instead of the labels: it could be selections = c(1, 2), which selects the first and second values. However, Crunch recommends using the labels, as there is less room for error (especially if something about the source variable changes).
After you finish, a new variable appears at the bottom of your Variable Organizer (Accordion) for you to use.
Other approaches with R
There are other approaches to doing this with R code, as noted on the deriveArray() page. For example, if you don't have a categorical array set up, and instead want to derive a top box variable from other discrete variables, then you can just specify those variables (using their aliases).
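As a rough sketch of that alternative, the function below passes a list of separate variables to deriveArray() instead of a categorical array. The aliases paper_a and paper_b, the new alias papers_t2b, and the variable name are all hypothetical, and the sketch assumes the separate variables share the same category labels. It is wrapped in a function so that nothing is sent to Crunch until you call it with a dataset you have already loaded.

```r
# A sketch only: the aliases `paper_a`, `paper_b`, and `papers_t2b` are
# hypothetical and must be replaced with aliases from your own dataset.
derive_top2_from_separate <- function(ds) {
  # Labels that "count" toward the top 2 box; assumed to exist, spelled
  # identically, in every contributing variable.
  count_me <- c("Almost always (at least 3 out of 4 issues)",
                "Quite often (at least 1 out of 4 issues)")

  # Pass the separate variables as a list rather than as an existing array.
  ds$papers_t2b <- crunch::deriveArray(
    list(ds$paper_a, ds$paper_b),
    name = "Newspaper Readership Top 2 Box (separate variables)",
    selections = count_me
  )

  # Return the dataset so the caller keeps the updated copy.
  ds
}
```

In a session where the crunch package is installed and ds is loaded, you would call it as ds <- derive_top2_from_separate(ds). Check the deriveArray() page for the exact behaviour in your package version before relying on this.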