You can go through your variables and describe (or redescribe) them one by one using R. However, in some circumstances a more automated approach makes sense:
- You have lots of variables, and you want to do things quickly
- You want to hand the responsibility for defining the variable metadata to someone who is not an R user
Below is an approach we've developed to help you manage metadata cleaning and organizing using a CSV. Many researchers and DP departments know how to work with CSVs efficiently: changing variable names, using formulae, bulk find/replace of text, and bulk copying of text where necessary.
The approach has two parts:
- Running the R code (in RStudio)
- Tidying up your variable information in the CSV that the code generates, then re-importing it into the code
Rather than detailing the code on this page, we've documented it in the .R file itself, using comments (lines prefixed with a # sign). You can download the code here: Variable Setup Script v16.R
Most of your work with this code will be setting up the CSV it generates. This is explained below (a worked example will follow shortly).
The R code
Some guidance:
- You first need to log in and load your dataset with R
- You may want to change the names of the .csv files that are generated:
# File name for the CSV that the script writes out
output_label <- "variable_organizing.csv"
# File name of your edited CSV that the script reads back in
meta_organize <- read.csv("variable_organizing_done.csv", stringsAsFactors = FALSE)
Unless you're very confident, don't run all the code at once. It's best to work down the script chunk by chunk, running the parts that perform the functions you need and skipping any you don't need or want to run.
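As a rough illustration of the export/re-import cycle that output_label and read.csv() handle, here is a base-R sketch. It is not the code in Variable Setup Script v16.R: the stand-in data frame replaces values the script would pull from your Crunch dataset, and the example rows and the use of tempdir() are assumptions made so the sketch runs on its own.

```r
# Hypothetical stand-in for the variable metadata the script would pull from
# your Crunch dataset: a few columns filled in, the grouping and folder
# columns left blank for you to complete.
meta <- data.frame(
  display.name      = c("age_categories", "Q1_1", "Q1_2"),
  description       = c("What is your age?", "Brand A", "Brand B"),
  array.alias       = "",
  array.name        = "",
  array.description = "",
  selected.values   = "",
  folder            = "",
  alias             = c("age_categories", "Q1_1", "Q1_2"),
  stringsAsFactors  = FALSE
)

# Write the CSV for editing in a spreadsheet. tempdir() keeps this sketch
# out of your working directory; the real script uses its own file names.
output_label <- file.path(tempdir(), "variable_organizing.csv")
write.csv(meta, output_label, row.names = FALSE)

# In practice you would now edit the file and save it under a new name.
# Here that step is simulated by copying the file unchanged.
done_file <- file.path(tempdir(), "variable_organizing_done.csv")
invisible(file.copy(output_label, done_file, overwrite = TRUE))

# Read the edited file back. colClasses = "character" keeps columns you left
# entirely blank as empty strings; without it, read.csv() can convert an
# all-blank column to logical NA.
meta_organize <- read.csv(done_file, stringsAsFactors = FALSE,
                          colClasses = "character")
str(meta_organize)
```

Reading every column as character is a design choice for this sketch: it means later checks can compare blank cells with "" instead of testing for NA.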
The CSV
The generated CSV has 4 columns already filled in, plus some empty columns. You can edit any of the columns.
Working left to right along the columns:
- display.name
- This is the variable name in Crunch: the title of the variable you see in the Variable Organizer in the web app (you can think of it as a variable title or headline).
- Crunch's name does not serve the same function as the SPSS variable name. Crunch's equivalent of the SPSS variable name is the alias (the last column), which is used in code, such as R, and when merging data.
- By default, Crunch copies over the SPSS variable name as both the variable alias and name in Crunch. Typically, though, you will want to tidy this to be more suitable for analysis/reporting.
- For example, the SPSS variable name might be age_categories, which comes into Crunch as both the name and the alias. You might change the name to just Age; unless you have a particular reason to do otherwise, leave the alias as it is.
- description
- This is the SPSS variable label, and it acts like the subtitle on a variable card. It can optionally be shown on tables and exports.
- Typically, researchers use the question text (from the questionnaire) as the description.
- You can delete it if you don't want a description at all, though this is not common practice.
- Importantly, if you are creating a categorical array or multiple response variable, the description becomes the label of each subvariable.
- array.alias
- This is how you specify the grouping of variables into multiple response variables (multiple check-box questions) or categorical arrays (matrix-style questions).
- Enter the same alias for the new variable in each row that should belong to it.
- For example, if you have 6 variables (rows in the CSV) that should form a single variable, you could put Q1 in each of those 6 rows. When the R code reads the CSV, it will know that all the rows marked Q1 belong together.
- array.name
- This becomes the name of the new array: its title, just like the name of a single-variable question. Enter the same value for each variable in the array.
- array.description
- This becomes the description of the new grouped variable.
- selected.values
- If you want the array to be a multiple response variable, enter the selected value(s) in this column, separated by commas. Most commonly, multiple response variables are arrays of 1s and 0s (plus missing data), so in that case you would put 1 against all the subvariables of the array.
- folder
- This is the folder you want to put the variable in. For multi-variable questions, repeat the same folder on each row (as you do for array.alias, array.name and array.description).
- alias
- The alias assigned to each variable on import (equivalent to the SPSS variable name). As noted above, you probably don't need to change it unless you're merging data and need to align variables.
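To make the column rules above concrete, here is a base-R sketch of how a script might interpret the edited CSV: rows sharing an array.alias are grouped, the array-level columns that should be repeated on every row are checked for consistency, and selected.values is split on commas. This is an illustration, not the logic in Variable Setup Script v16.R. The example rows (based on the Q1 and age_categories examples above), the name array_specs, its fields, and the rule that a blank selected.values means a categorical array are all hypothetical. It also assumes the CSV was read with every column as character, so blank cells are empty strings rather than NA.

```r
# Hypothetical edited CSV contents: six rows grouped into one array with
# alias Q1, plus a single variable renamed from age_categories to Age.
meta_organize <- data.frame(
  display.name      = c("Age", paste0("Q1_", 1:6)),
  description       = c("What is your age?", paste("Brand", LETTERS[1:6])),
  array.alias       = c("", rep("Q1", 6)),
  array.name        = c("", rep("Brands used", 6)),
  array.description = c("", rep("Which of these brands have you used?", 6)),
  selected.values   = c("", rep("1", 6)),
  folder            = c("Demographics", rep("Brands", 6)),
  alias             = c("age_categories", paste0("Q1_", 1:6)),
  stringsAsFactors  = FALSE
)

# Rows with a blank array.alias stay as single variables.
is_array <- meta_organize$array.alias != ""
singles  <- meta_organize[!is_array, ]

# Split the remaining rows into one group per array.alias.
arrays <- split(meta_organize[is_array, ], meta_organize$array.alias[is_array])

# For each group, check that the array-level columns were repeated
# consistently, then collect what would be needed to build the array.
array_specs <- lapply(arrays, function(rows) {
  for (col in c("array.name", "array.description", "folder", "selected.values")) {
    if (length(unique(rows[[col]])) != 1) {
      stop("Column '", col, "' differs within array '", rows$array.alias[1], "'")
    }
  }
  # selected.values is a comma-separated list, e.g. "1" or "1,2".
  selected <- trimws(strsplit(rows$selected.values[1], ",")[[1]])
  list(
    subvariables       = rows$alias,
    subvariable.labels = rows$description,  # descriptions label the subvariables
    name               = rows$array.name[1],
    description        = rows$array.description[1],
    folder             = rows$folder[1],
    # Assumption: no selected values means a categorical array.
    type               = if (length(selected) == 0) "categorical array" else "multiple response",
    selected           = selected
  )
})

str(array_specs)
```

In the real workflow, each entry of array_specs would be handed to whatever Crunch calls the script uses to create the array and move it into its folder; that step is left out here because it needs a live dataset.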