Stacking a dataset using R
Stacking is a process where the data is transformed so that variables (columns) are rearranged to act as cases (rows). Data that calls for this treatment is sometimes called hierarchical data.
For example, if you ask each respondent about 5 different occasions, then you may like to stack the data so that each row in the new dataset is an occasion. In this case, if you had 500 rows in your original dataset, you'll end up with 2,500 rows (500 x 5) in the stacked data. Another example is where the same rating scale is applied to different brands and you want to analyse the data with each case representing a different brand (rather than a different respondent).
You can stack a dataset using R. The R code pulls the data into RStudio, reshapes it, and then stores the result as a new dataset.
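To make the row arithmetic concrete, here is a minimal local sketch using tidyr on a toy data frame. The column names (id, occ_1 to occ_5, rating) are hypothetical and chosen only for illustration; the sketch uses 3 respondents rather than 500, so 3 rows become 15.

```r
library(tidyr)

# Hypothetical toy data: 3 respondents, each rating 5 occasions
wide <- data.frame(
  id    = 1:3,
  occ_1 = c(4, 2, 5),
  occ_2 = c(3, 3, 1),
  occ_3 = c(5, 4, 2),
  occ_4 = c(1, 5, 3),
  occ_5 = c(2, 1, 4)
)

# Stack the five occasion columns so that each row is one occasion
long <- pivot_longer(
  wide,
  cols = starts_with("occ_"),
  names_to = "occasion",
  names_prefix = "occ_",
  values_to = "rating"
)

nrow(wide)  # 3 rows: one per respondent
nrow(long)  # 15 rows: one per respondent per occasion
```

The same multiplication applies at any scale: the stacked row count is the original row count times the number of occasions.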
We give a generic worked example here:
Stacking by writing out the full alias structure
There are a few bits to change in the R code below, highlighted in blue for you. Remember, you'll need to install the packages tidyr and stringr as well (and dplyr, which supplies mutate() and row_number()).
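library(crunch)
library(tidyr)
library(dplyr)
# `ds` is assumed to be a Crunch dataset you have already loaded, e.g. with loadDataset()
variables_to_retain <- c("retain_1", "retain_2")
variables_to_stack <- list(
  'q1' = c('q1_1', 'q1_2', 'q1_3', 'q1_4', 'q1_5', 'q1_6'),
  'q2' = c('q2_1', 'q2_2', 'q2_3', 'q2_4', 'q2_5', 'q2_6'),
  'q3' = c('q3_1', 'q3_2', 'q3_3', 'q3_4', 'q3_5', 'q3_6'),
  'q4' = c('q4_1', 'q4_2', 'q4_3', 'q4_4', 'q4_5', 'q4_6'),
  'q5' = c('q5_1', 'q5_2', 'q5_3', 'q5_4', 'q5_5', 'q5_6')
)
# Build a reshaping spec: one row per original alias, mapped to its stacked variable
spec <- tibble(
  .name = unname(unlist(variables_to_stack)),
  .value = rep(names(variables_to_stack), lengths(variables_to_stack))
) %>%
  group_by(.value) %>%
  mutate(occasion = row_number()) %>%  # number the occasions 1..6 within each stacked variable
  ungroup()
keep_vars <- c(variables_to_retain, spec$.name)
# Pull the data down from Crunch, then stack it according to the spec
stacked <- ds[keep_vars] %>%
  as.data.frame(force = TRUE) %>%
  pivot_longer_spec(spec)
ds_stacked <- newDataset(stacked, name = "new Stacked dataset")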
Some notes on the changes - these are all aliases unless noted otherwise. In the generic example above, there were 6 brands being considered, hence the pattern in the original aliases.
- variables_to_retain - any existing variable that is not listed here or in variables_to_stack will be dropped from the new dataset. In the new dataset, you may like to crosstabulate by other variables (such as demographics), so include a list of these aliases here to retain them. It's also a very good idea to retain a unique respondent ID variable (if you want to do any future linking or matching to your unstacked dataset).
- variables_to_stack - a list of the variables you want to stack. Each entry is named after the new stacked variable and holds the original aliases, written as "strings". In the above generic example, 6 variables pertaining to question 1 are being collapsed into one stacked variable called 'q1'. Likewise, 6 variables pertaining to question 2 are being collapsed into one variable 'q2'.
- "new Stacked dataset" - you have the option at this point to give your new dataset a name.
In the above, the resulting stacked dataset will have 6 times as many rows as the original dataset.
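The reshaping step can be tried locally, without Crunch, on a small toy table. The sketch below is an illustration under stated assumptions: the data, the resp_id column and the 3-brand layout are hypothetical, and grouping by .value before numbering is assumed to be how the occasion counter restarts at 1 for each stacked variable.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: 2 respondents, 2 questions, 3 brands each
wide <- tibble(
  resp_id = c(101, 102),
  q1_1 = c(1, 2),   q1_2 = c(3, 4),   q1_3 = c(5, 6),
  q2_1 = c(10, 20), q2_2 = c(30, 40), q2_3 = c(50, 60)
)

variables_to_stack <- list(
  q1 = c("q1_1", "q1_2", "q1_3"),
  q2 = c("q2_1", "q2_2", "q2_3")
)

# One spec row per original column: which stacked variable it feeds,
# and which occasion (brand position) it represents
spec <- tibble(
  .name  = unname(unlist(variables_to_stack)),
  .value = rep(names(variables_to_stack), lengths(variables_to_stack))
) %>%
  group_by(.value) %>%
  mutate(occasion = row_number()) %>%
  ungroup()

stacked <- pivot_longer_spec(wide, spec)

nrow(stacked) / nrow(wide)  # 3: one stacked row per respondent per brand
```

Each respondent now contributes one row per brand, with q1 and q2 side by side and occasion recording which brand the row came from. Without the grouping step, occasion would run from 1 to 6 across both questions and the q1 and q2 values would not line up on the same rows.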
The Occasion Variable:
The code above creates a variable in the new dataset with alias = occasion. You can then convert that variable to a categorical variable and use it to indicate which occasion from the unstacked data each row represents. For example, 1 = Brand A, 2 = Brand B, 3 = Brand C, etc.
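# Convert the numeric occasion variable to a categorical variable
type(ds_stacked$occasion) <- "categorical"
# Name the categories in order; the final name applies to the "No Data" (missing) category
names(categories(ds_stacked$occasion)) <- c("Brand A", "Brand B", "Brand C", "Brand D", "Brand E", "Brand F", "No Data")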
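If you want to check the labelling locally before touching the Crunch dataset, a rough base R analogue is to turn the occasion numbers into a factor. This is only a sketch on a hypothetical vector with three brands; it does not use the Crunch API and has no "No Data" category.

```r
# Hypothetical occasion codes as they would appear in a stacked table
occasion <- c(1, 2, 3, 1, 2, 3)

# Map each code to its brand label, in code order
brand <- factor(
  occasion,
  levels = 1:3,
  labels = c("Brand A", "Brand B", "Brand C")
)

table(brand)  # each brand appears twice
```

The order of the labels must follow the order of the codes, just as the category names above must follow the order of the categories.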