Setting up a dataset with R
Overview
The following is a resource guide for Data Processing departments.
Basic setup
- Install R, RStudio, and the crunch R package
- Load your dataset
- File your dataset in the correct project folder
Optional setup
- Load your metadata into an Excel spreadsheet
- Convert XML to an Excel file
Upfront cleaning
Checking and changing variable properties
- Change the variable names
  - Use a descriptive title that summarizes what the variable is (e.g., 'Gender')
- Add variable descriptions (optional but recommended)
  - Typically, this includes the actual wording of the question (e.g., 'Q1: Are you male or female?')
- Change the aliases (optional)
- Correct the variable types
  - Change numeric variables that should be categorical (e.g., numeric answer codes) to categorical
  - Change categorical variables that should be numeric to numeric
  - Set date/time variables as date/time
- Make any variables that are meant to be weights available as weights
- Make any variables that are meant to be filters available as filters
- Correct the values of any variables whose mean you want to show, including:
  - NPS recoding, reversing rating scales, or other recodes
  - Capping (if relevant)
  - Setting outliers to missing values (if relevant)
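The type fixes and recodes above can be sketched in base R on a local data frame. This is a minimal illustration, not the Crunch workflow: the column names, the spending cap of 1000, and the "more than 3 times the median" outlier rule are hypothetical, and coding NPS as -100/0/100 (so the mean equals the score) is one common convention rather than a requirement.

```r
# Hypothetical raw survey data; all column names are illustrative
survey <- data.frame(
  q_region    = c(1, 2, 2, 3, 1, 3),           # numeric codes that are really categories
  q_date      = c("2023-01-05", "2023-01-09", "2023-02-01",
                  "2023-02-14", "2023-03-02", "2023-03-20"),
  q_recommend = c(10, 9, 9, 7, 6, 10),         # 0-10 likelihood to recommend
  q_satisfy   = c(1, 2, 5, 4, 3, 5),           # 1-5 scale, 1 = very satisfied
  q_spend     = c(20, 35, 5000, 50, 15, 40),   # amount spent
  stringsAsFactors = FALSE
)

# Variable types: numeric codes -> categorical (factor), text -> Date
survey$q_region <- factor(survey$q_region, levels = 1:3,
                          labels = c("North", "South", "West"))
survey$q_date <- as.Date(survey$q_date)

# NPS recode: 9-10 promoter = 100, 7-8 passive = 0, 0-6 detractor = -100,
# so the mean of the recoded variable is the Net Promoter Score
survey$nps <- ifelse(survey$q_recommend >= 9, 100,
                     ifelse(survey$q_recommend >= 7, 0, -100))

# Reverse the 1-5 scale so that a higher value means more satisfied
survey$q_satisfy_rev <- 6 - survey$q_satisfy

# Capping: values above an assumed ceiling of 1000 are set to 1000
survey$q_spend_capped <- pmin(survey$q_spend, 1000)

# Outliers to missing: values above 3x the median (an assumed rule) become NA
spend_limit <- 3 * median(survey$q_spend)
survey$q_spend_clean <- ifelse(survey$q_spend > spend_limit, NA, survey$q_spend)

mean(survey$nps)  # 50: 4 promoters, 1 passive, 1 detractor out of 6
```

Capping keeps the respondent's answer in the mean at a bounded value, while setting outliers to missing removes it from the base; which is appropriate depends on the question.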
Creating multi-variable sets
- Define multiple response (MR) questions
  - Derive a new variable, with the option to hide the source variables that contribute to it
- Define categorical arrays
  - Derive a new variable, with the option to hide the source variables that contribute to it
- Split categorical arrays into their component variables
- Split multiple response questions into their component variables
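To show what an MR question and a categorical array summarize, here is a base R sketch that treats each set as a group of ordinary columns. It is only an analogy: in Crunch the set becomes a single variable with subvariables, which this code does not create. The q5/q6 names, options, and values are made up.

```r
# Hypothetical multiple response question q5: one 0/1 column per option
survey <- data.frame(
  q5_tv     = c(1, 0, 1, 1),
  q5_radio  = c(0, 0, 1, 0),
  q5_online = c(1, 0, 0, 1)
)

# "Define" the MR set by collecting its member columns by prefix
q5_items <- grep("^q5_", names(survey), value = TRUE)
selected <- survey[q5_items] == 1          # logical matrix, one column per option

# Percent selecting each option; these can add up to more than 100
q5_pct <- colMeans(selected) * 100

# Percent selecting at least one option
q5_any <- mean(rowSums(selected) > 0) * 100

# Hypothetical categorical array q6: several items sharing one scale
scale_levels <- c("Poor", "Fair", "Good")
q6 <- data.frame(
  q6_price   = factor(c("Good", "Poor", "Good", "Fair"), levels = scale_levels),
  q6_service = factor(c("Fair", "Good", "Good", "Good"), levels = scale_levels)
)

# Items x categories table, the usual way an array is displayed
q6_table <- t(sapply(q6, table))
```

Splitting a set back into component variables corresponds to working with the individual columns again, which is why the source variables are usually hidden rather than deleted.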
Creating more variables (optional)
- Banded time variables (as an approach to time-series smoothing)
  - Improved smoothing techniques will become available soon in the Crunch web app
- Any of the following:
  - Filters
  - Interaction variables
  - Weights
  - Banners (called Multitables in Crunch)
  - Other variables you may need
- Standardize variables
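A few of these derived variables can be sketched in base R. Banding dates into calendar months is only a crude alternative to true smoothing; the weight is a simple one-variable cell weight (not raking across several variables) with made-up 50/50 targets; and all column names are hypothetical. Filters and banners (Multitables) are Crunch objects and are not shown.

```r
# Hypothetical survey data; names and values are illustrative
survey <- data.frame(
  date   = as.Date(c("2023-01-03", "2023-01-20", "2023-02-07",
                     "2023-02-25", "2023-03-10")),
  gender = factor(c("Male", "Female", "Female", "Male", "Female")),
  age    = factor(c("18-34", "35+", "18-34", "35+", "35+")),
  score  = c(6, 8, 7, 9, 10)
)

# Banded time variable: collapse daily dates into calendar months
survey$month <- format(survey$date, "%Y-%m")

# Interaction variable: one category per gender/age combination
survey$gender_age <- interaction(survey$gender, survey$age, sep = " / ")

# Simple cell weight on one variable so each gender counts 50% (assumed targets)
target      <- c(Female = 0.5, Male = 0.5)
observed    <- prop.table(table(survey$gender))        # Female 0.6, Male 0.4
cell_weight <- target[names(observed)] / as.numeric(observed)
survey$weight <- unname(cell_weight[as.character(survey$gender)])

# Standardized variable: z-score with mean 0 and standard deviation 1
survey$score_z <- as.numeric(scale(survey$score))
```

Here each woman gets a weight of 0.5 / 0.6 and each man 0.5 / 0.4, so the weighted gender split matches the targets while the weights still sum to the sample size.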
Setting the base
- Define missing values for each variable
  - For example, include or exclude 'Don’t Knows' from a scale
- Rebase questions on other questions
  - Fix survey skips
  - Base a question on another question’s response
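Setting the base comes down to deciding which answers count as missing. The base R sketch below assumes "Don't know" is stored as code 99 and that q2 should only be answered by respondents who said "Yes" at q1; both are hypothetical. Row 3 stands in for a skip error: a respondent who should have skipped q2 but has an answer.

```r
# Hypothetical data: q2 should only be asked of people aware of the brand (q1)
survey <- data.frame(
  q1_aware  = c("Yes", "Yes", "No", "Yes", "No"),
  q2_rating = c(4, 99, 3, 5, NA),   # 1-5 rating; 99 = "Don't know" (assumed code)
  stringsAsFactors = FALSE
)

# Missing values: treat "Don't know" as missing so it is excluded from the mean
survey$q2_rating[survey$q2_rating %in% 99] <- NA

# Fix the survey skip / rebase: q2 only counts for respondents aware at q1
survey$q2_rating[survey$q1_aware != "Yes"] <- NA

q2_base <- sum(!is.na(survey$q2_rating))         # 2 valid answers
q2_mean <- mean(survey$q2_rating, na.rm = TRUE)  # (4 + 5) / 2 = 4.5
```

Left uncorrected, the 99 code alone would pull the mean far above the top of the 1-5 scale, which is why missing values are best defined before any means are reported.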
Changing variable summary information
- Set subtotals (NETs)
- Merge categories
  - Derive a new variable, with the option to hide the source variables that contribute to it
- Create banded categories ("buckets")
  - From a numeric variable
  - From a categorical variable
  - Derive a new variable, with the option to hide the source variables that contribute to it
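Subtotals, merged categories, and buckets can be sketched in base R as below. The category labels and age breaks are hypothetical. The sketch mirrors one distinction from the list: a NET is a summary reported alongside the original categories (here computed as a Top-2 box percentage), whereas merging and banding produce a new variable and leave the original untouched.

```r
# Hypothetical survey data with a 5-point satisfaction scale and a numeric age
sat_levels <- c("Very dissatisfied", "Dissatisfied", "Neutral",
                "Satisfied", "Very satisfied")
survey <- data.frame(
  satisfaction = factor(c("Very satisfied", "Satisfied", "Neutral", "Dissatisfied",
                          "Satisfied", "Very dissatisfied"), levels = sat_levels),
  age = c(19, 34, 35, 52, 67, 41)
)

# Subtotal (NET): Top-2 box percentage, computed without changing the variable
top2 <- mean(survey$satisfaction %in% c("Satisfied", "Very satisfied")) * 100

# Merge categories into a new variable; the original is left as it was
survey$satisfaction_3 <- survey$satisfaction
levels(survey$satisfaction_3) <- list(
  Dissatisfied = c("Very dissatisfied", "Dissatisfied"),
  Neutral      = "Neutral",
  Satisfied    = c("Satisfied", "Very satisfied")
)

# Banded categories ("buckets") from a numeric variable: [18,35), [35,55), [55,Inf)
survey$age_band <- cut(survey$age, breaks = c(18, 35, 55, Inf),
                       labels = c("18-34", "35-54", "55+"), right = FALSE)
```

Using `right = FALSE` makes each band include its lower bound, so a 35-year-old falls into "35-54" rather than "18-34".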
Housekeeping
- Set up the Variable Organizer ("Accordion")
  - Make folders and sub-folders in the variable organizer
  - Put variables into folders
  - Change or set the order of variables within each folder
- Hide variables you no longer need to see, to reduce clutter, such as:
  - Variables that were used to define the categorical arrays and/or MR questions
Final cleaning
- Remove cases you don’t want in the dataset (a.k.a. "exclusions"), such as:
  - Bad sample (e.g., respondents who did not meet the screener criteria)
  - Outliers
  - Speedsters
  - Flat-liners/straight-liners
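The exclusion rules above can be expressed as logical flags. In this base R sketch, a speedster is anyone finishing in less than a third of the median interview time and a straight-liner is anyone giving the same answer to every grid item; both are assumed rules, not fixed standards, and the column names are hypothetical. The flagged rows are dropped from a local data frame; applying exclusions to a Crunch dataset is not shown.

```r
# Hypothetical survey data with a screener, interview length, and a 3-item grid
survey <- data.frame(
  id         = 1:6,
  screener   = c("Pass", "Pass", "Fail", "Pass", "Pass", "Pass"),
  duration_s = c(600, 540, 480, 90, 620, 700),   # interview length in seconds
  g1 = c(4, 5, 3, 2, 3, 5),                      # grid items on a 1-5 scale
  g2 = c(2, 5, 4, 3, 3, 4),
  g3 = c(5, 5, 2, 4, 3, 3),
  stringsAsFactors = FALSE
)

# Bad sample: did not pass the screener
bad_sample <- survey$screener != "Pass"

# Speedsters: finished in less than one third of the median time (assumed rule)
speedster <- survey$duration_s < median(survey$duration_s) / 3

# Straight-liners: gave the identical answer to every grid item
grid <- survey[c("g1", "g2", "g3")]
straightliner <- apply(grid, 1, function(r) length(unique(r)) == 1)

# Combine the flags and keep only the cases that pass every check
exclude <- bad_sample | speedster | straightliner
clean <- survey[!exclude, ]
```

Keeping the individual flags, rather than only the final subset, makes it easy to report how many cases each rule removed.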
Analysis
Exporting
- Export a summary CSV of your metadata (aliases, labels, folders)
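Once you have the alias, label, and folder of each variable in a data frame (pulling them from Crunch is not shown here), writing the summary CSV takes a single base R call. The metadata values and file name below are hypothetical; the file goes to a temporary directory so the sketch leaves nothing behind.

```r
# Hypothetical metadata table: one row per variable
metadata <- data.frame(
  alias  = c("q1", "q2", "age"),
  label  = c("Gender", "Satisfaction", "Age"),
  folder = c("Demographics", "Core questions", "Demographics"),
  stringsAsFactors = FALSE
)

# Write the summary without R's row-number column
out_file <- file.path(tempdir(), "metadata_summary.csv")
write.csv(metadata, out_file, row.names = FALSE)

# Read it back to confirm the round trip
check <- read.csv(out_file, stringsAsFactors = FALSE)
```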