Stacking a dataset using R

Stacking is a process that restructures the data so that variables (columns) are rearranged to act as cases (rows). The resulting structure is sometimes called hierarchical data.

For example, if you ask each respondent about 5 different occasions, you may like to stack the data so that each row in the new dataset is an occasion. In this case, if you had 500 rows in your original dataset, you'll end up with 2,500 rows in the stacked data. Another example is where the same rating scale is applied to different brands and you want to analyse the data with each case representing a different brand (rather than a different respondent).
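As a rough illustration of the first example, the sketch below stacks a made-up local data frame of 500 respondents and 5 occasions using pivot_longer() from tidyr. The column names (resp_id, rating_1 to rating_5) and the random ratings are assumptions for illustration only. This is not the Crunch script described later, just the same reshaping idea applied to a plain data frame.

```r
library(tidyr)

set.seed(1)  # reproducible made-up ratings
n <- 500     # number of respondents in the toy dataset

# Wide layout: one row per respondent, one rating column per occasion
wide <- data.frame(
  resp_id  = seq_len(n),
  rating_1 = sample(1:5, n, replace = TRUE),
  rating_2 = sample(1:5, n, replace = TRUE),
  rating_3 = sample(1:5, n, replace = TRUE),
  rating_4 = sample(1:5, n, replace = TRUE),
  rating_5 = sample(1:5, n, replace = TRUE)
)

# Stacked layout: one row per respondent per occasion
stacked <- pivot_longer(
  wide,
  cols         = starts_with("rating_"),
  names_to     = "occasion",
  names_prefix = "rating_",
  values_to    = "rating"
)

nrow(wide)     # 500 rows, one per respondent
nrow(stacked)  # 2500 rows, 500 respondents x 5 occasions
```

Keeping resp_id in the stacked data means each occasion row can still be traced back to its respondent.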

You can stack a dataset using R.

The R script pulls the data into RStudio, reshapes it, and then stores the result as a new dataset.

We give a generic worked example here: Stacking restructure Crunch.R

Stacking by writing out the full alias structure

There are a few bits to change in the R code below, highlighted in blue for you. Remember, you'll need to install the packages loaded at the top of this script.

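# Packages used by the full script (install them first if needed)
library(tidyr)
library(crunch)
library(stringr)
library(dplyr)
library(purrr)

# Log in to Crunch
login()

# The dataset to stack: replace "your_URL" with your dataset's URL
ds <- loadDataset("your_URL")

# Name for the new stacked dataset
new_ds_name <- "New stacked dataset"

# Aliases of variables to carry over into the new dataset
variables_to_retain <- c("retain_1", "retain_2")

# Each new stacked variable (q1, q2, ...) and the original aliases that
# are collapsed into it, one alias per occasion
variables_to_stack <- list(
  'q1' = c('q1_1', 'q1_2', 'q1_3', 'q1_4', 'q1_5', 'q1_6'),
  'q2' = c('q2_1', 'q2_2', 'q2_3', 'q2_4', 'q2_5', 'q2_6'),
  'q3' = c('q3_1', 'q3_2', 'q3_3', 'q3_4', 'q3_5', 'q3_6'),
  'q4' = c('q4_1', 'q4_2', 'q4_3', 'q4_4', 'q4_5', 'q4_6'),
  'q5' = c('q5_1', 'q5_2', 'q5_3', 'q5_4', 'q5_5', 'q5_6')
)

# One label per occasion
occasion_labels <- c("occasion 1", "occasion 2", "occasion 3", "occasion 4", "occasion 5", "occasion 6")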
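Before running the script, it may help to check which of the packages it loads are not yet installed. A minimal sketch, assuming all five packages can be installed from CRAN with install.packages(); missing_packages() is a hypothetical helper written for this example and is not part of the script.

```r
# Packages loaded at the top of the stacking script
required <- c("tidyr", "crunch", "stringr", "dplyr", "purrr")

# Hypothetical helper: returns the packages in `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to install them (assumes CRAN is reachable):
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```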

And then run the full script as per this .R file: Stacking restructure Crunch Dec 2021.R

The generic example above considers 6 occasions, which is why each original alias ends in a suffix from _1 to _6 and why 6 occasion labels are nominated.

variables_to_retain - all existing variables are dropped from the new dataset unless they are listed here. In the new dataset, you may like to crosstabulate by other variables (such as demographics), so include a list of these aliases here to retain them. It's also a very good idea to retain a unique respondent ID variable (in case you want to do any future linking or matching to your unstacked dataset).
variables_to_stack - a list of the variables you want to stack, with each entry specified as a vector of alias "strings". In the above generic example, 6 variables pertaining to question 1 are being collapsed into one stacked variable called 'q1'. Likewise, 6 variables pertaining to question 2 are being collapsed into one variable called 'q2'.
new_ds_name - the name for your new dataset ("New stacked dataset" in the example). You set it at this point, and it is used later in the R script.

In the example above, the resulting stacked dataset will have 6 times as many rows as the original dataset (one row per respondent per occasion).
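To show how the three settings fit together, here is a minimal base R sketch that applies the same kind of configuration to a small local data frame with 3 occasions. It is not the contents of the .R file: stack_local() is a hypothetical helper, the data values are made up, and the real script works against a Crunch dataset rather than a plain data frame.

```r
# Toy wide data: 2 respondents, 2 questions, 3 occasions (made-up values)
wide <- data.frame(
  resp_id = 1:2,
  gender  = c("F", "M"),
  q1_1 = c(1, 2), q1_2 = c(3, 4), q1_3 = c(5, 6),
  q2_1 = c("a", "b"), q2_2 = c("c", "d"), q2_3 = c("e", "f"),
  stringsAsFactors = FALSE
)

variables_to_retain <- c("resp_id", "gender")
variables_to_stack <- list(
  q1 = c("q1_1", "q1_2", "q1_3"),
  q2 = c("q2_1", "q2_2", "q2_3")
)
occasion_labels <- c("occasion 1", "occasion 2", "occasion 3")

# Every stacked variable needs exactly one alias per occasion
stopifnot(all(lengths(variables_to_stack) == length(occasion_labels)))

# Hypothetical helper: builds one block of rows per occasion, then binds them
stack_local <- function(df, retain, to_stack, labels) {
  pieces <- lapply(seq_along(labels), function(i) {
    # The i-th alias of every group, e.g. q1_2 and q2_2 when i is 2
    cols <- vapply(to_stack, function(aliases) aliases[i], character(1))
    piece <- df[, c(retain, cols), drop = FALSE]
    names(piece) <- c(retain, names(to_stack))
    piece$occasion <- labels[i]
    piece
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

stacked <- stack_local(wide, variables_to_retain, variables_to_stack, occasion_labels)

nrow(stacked)   # 6 rows: 2 respondents x 3 occasions
names(stacked)  # resp_id, gender, q1, q2, occasion
```

Only the retained variables and the new stacked variables survive, which mirrors the point that anything not listed is dropped.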

Tip: if you want to retain all the variables except those being stacked, you can use this line upfront:

variables_to_retain <- setdiff(aliases(allVariables(ds)), unlist(variables_to_stack))
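The effect of that line can be seen with plain character vectors. In this sketch, all_aliases is a made-up stand-in for what aliases(allVariables(ds)) would return, so no Crunch login is needed to try it.

```r
# Stand-in for the aliases of every variable in the dataset (made-up names)
all_aliases <- c("resp_id", "gender", "q1_1", "q1_2", "q2_1", "q2_2")

variables_to_stack <- list(
  q1 = c("q1_1", "q1_2"),
  q2 = c("q2_1", "q2_2")
)

# unlist() flattens the list into one vector of aliases;
# setdiff() keeps every alias that is not in that vector
variables_to_retain <- setdiff(all_aliases, unlist(variables_to_stack))

print(variables_to_retain)  # "resp_id" "gender"
```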

