Setting up a Dataset: How Data Should Look in Crunch

This article is part of The Definitive Guide to Importing and Preparing Data.

End users will find Crunch most beneficial when they have a 'clean and tidy' dataset to work with. 'Clean' means that incomplete respondents, straight-liners, and respondents who fail other important survey metrics have been removed. 'Tidy' refers to how the data is organized in Crunch, which is particularly important because researchers can self-serve more effectively when they can quickly find the data they are looking for.

Crunch is designed to deal specifically with survey questionnaire structures, such as grids. To use Crunch's features most effectively, those setting up datasets should seek to structure the dataset like a questionnaire, rather than a tab book. The basic principle is as follows: aim to set up the dataset so the researcher can easily retrieve the information they need.

If you are setting up a dataset from an SPSS file (.sav), please be aware that this type of file can lose some of the metadata that is important for analysis. Similarly, a direct survey importer may not set things up exactly as you would like them. During data collection, questions may be set up as multiple-response questions and categorical arrays (typically scales structured like a grid). When imported into Crunch, the options may appear as individual variables in the Variable Sidebar and need to be recombined into a multiple-response variable or categorical array in the Crunch dataset. Finally, renaming the variables with more descriptive wording for the question can help the end user work more efficiently in Crunch.
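As a rough illustration of the recombination step, the sketch below scans a list of variable aliases and proposes groups that look like the split-out items of one grid or multiple-response question. It is a minimal sketch in plain Python, not Crunch functionality: the `candidate_arrays` helper, the `stem_number` alias pattern, and the sample aliases are all assumptions made for illustration, and real files may use a different naming scheme.

```python
import re
from collections import defaultdict

def candidate_arrays(aliases):
    """Group aliases that share a stem and differ only by a numeric suffix.

    Hypothetical helper: assumes items of one question were exported
    as stem_1, stem_2, ... (an assumption, not a Crunch convention).
    """
    groups = defaultdict(list)
    for alias in aliases:
        match = re.fullmatch(r"(.+?)_(\d+)", alias)
        if match:
            groups[match.group(1)].append(alias)
    # Only stems with two or more members are worth reviewing as arrays.
    return {stem: members for stem, members in groups.items() if len(members) > 1}

# Hypothetical aliases from an imported file.
aliases = ["age", "q5_1", "q5_2", "q5_3", "q7_1", "region"]
print(candidate_arrays(aliases))
```

The output is only a list of candidates to review against the questionnaire; whether a group should become a multiple-response variable or a categorical array still depends on how the question was asked.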

Setting up a dataset like a questionnaire includes:

Succinct summary variable names (i.e., titles on variable cards, acting as easy-to-understand headlines)
Variable descriptions which include the questionnaire wording, with the question numbering optional
Multiple response and categorical array variables (with the individual subvariables hidden)
Organize the variables in the Variable Sidebar (left-side panel) with folders that correlate to the questionnaire sections

Detailed Tips

The tips below fall under the Setup phase in The Definitive Guide to Importing and Preparing Data.

Utilize Folders: Follow the structure of the questionnaire for creating the initial folders

Avoid long lists of variables in the Variable Sidebar by utilizing folders.
Use the questionnaire breaks as the default (e.g., "Screener" section, "Awareness" section, "KPI" section), unless researchers ask for something different.
Sub-folders (i.e., folders within folders) can be used to organize even further.
Once you are finished organizing the variables into folders, confirm that the overall Variable Sidebar makes sense.
- For example, a "Weight" variable doesn’t belong in a Screener section.
- Tip: You can use the “Organize variables” section of the Dataset Properties menu to organize variables into folders and quickly change the variable names and descriptions.
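Before organizing variables in Crunch, it can help to draft the folder plan on paper or in a small script and check it for gaps. The sketch below is one way to do that in plain Python; the `check_folder_plan` helper, the folder names, and the aliases are hypothetical, and nothing here calls Crunch itself.

```python
def check_folder_plan(folder_plan, all_aliases):
    """Return (unassigned aliases, aliases placed in more than one folder)."""
    seen = {}
    for folder, aliases in folder_plan.items():
        for alias in aliases:
            seen.setdefault(alias, []).append(folder)
    unassigned = sorted(set(all_aliases) - set(seen))
    duplicated = {alias: folders for alias, folders in seen.items() if len(folders) > 1}
    return unassigned, duplicated

# Hypothetical plan that follows the questionnaire sections.
folder_plan = {
    "Screener": ["age", "region"],
    "Awareness": ["aware_brand"],
    "KPI": ["nps"],
    "Weights": ["weight"],
}
all_aliases = ["age", "region", "gender", "aware_brand", "nps", "weight"]

unassigned, duplicated = check_folder_plan(folder_plan, all_aliases)
print("Not yet in a folder:", unassigned)
print("In more than one folder:", duplicated)
```

A check like this only catches missing or doubled variables; whether each variable sits in a sensible section still needs a visual review of the Variable Sidebar.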

Succinct Variable Names: Variable names should be accurate without being long-winded

For example: "7032 - AGE" and "7132 - AGE VARIABLE" may not be very useful as titles. Alternatively, you could rename them "All Age Categories" and "Age - Young vs. Old".
- Note:
  1. You could create subtotals on the "All Age Categories" variable so that you may only need one age variable.
  2. End users can also make additional variables using the Combine Categories functionality.

Sentence Case: Variable names should be sentence case

It is easier to read words in sentence case at a glance when viewing and expanding the folders in the Variable Sidebar.
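If many names arrive in all caps or title case, a small helper can produce a first draft in sentence case for review. This is a minimal sketch in plain Python; the `sentence_case` function and its `keep` list of acronyms are assumptions for illustration, and proper nouns such as brand names would still need manual correction.

```python
def sentence_case(name, keep=("KPI", "NPS")):
    """Capitalize only the first word, leaving listed acronyms in upper case."""
    result = []
    for i, word in enumerate(name.split()):
        if word.upper() in keep:
            result.append(word.upper())
        elif i == 0:
            result.append(word[:1].upper() + word[1:].lower())
        else:
            result.append(word.lower())
    return " ".join(result)

print(sentence_case("ALL AGE CATEGORIES"))
print(sentence_case("Brand Awareness KPI"))
```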

Question Numbers: Include the question number in the description rather than the title

Generally, it is not necessary to include the question numbering (e.g., "Q4", "Q3004") in the variable name, for several reasons:

When building and saving analyses for reporting, the question number is generally not needed in the title of the chart or graph.
When the question number is included in the description with the full question text, it more closely resembles how it was shown in the questionnaire.
End users can still use the search bar at the top of the page to find a variable by question number as the search also includes descriptions.
- Note: Aliases are not considered when searching in Crunch, only the names, descriptions, and category labels. For this reason, including question numbers in the description is a simple way to make them accessible but not prominent in the Variable Sidebar.
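The move from title to description can be drafted in bulk before the names are applied in Crunch. The sketch below is a plain-Python illustration under stated assumptions: the `split_question_number` helper is hypothetical, and the pattern assumes question numbers look like "Q4" or "Q3004" (optionally followed by a letter) at the start of the title.

```python
import re

# Assumed pattern: "Q" + digits (+ optional letter), then a separator.
QUESTION_NUMBER = re.compile(r"^(Q\d+[a-z]?)[\s.:\-]+(.+)$", re.IGNORECASE)

def split_question_number(title, question_text):
    """Return (clean title, description prefixed with the question number)."""
    match = QUESTION_NUMBER.match(title.strip())
    if not match:
        # No leading question number: leave both fields as they are.
        return title.strip(), question_text
    number, clean_title = match.groups()
    return clean_title.strip(), f"{number.upper()}. {question_text}"

name, description = split_question_number(
    "Q4 - Brand awareness",
    "Which of these brands have you heard of?",
)
print(name)
print(description)
```

Keeping the number at the start of the description means a search for "Q4" still finds the variable, while the title on the variable card stays short.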

Scale Questions as Categorical Arrays: Scale-type questions should normally be set up as a Categorical Array

The first step in setting up a dataset is to compensate for the loss of metadata in an SPSS file. In this case, the metadata that is lost is the fact that these variables are grouped as a single categorical array. See: Group Variables into Arrays
Rather than having a long list of the individual array questions in the Variable Sidebar, combining them into one array variable consolidates the list in the Sidebar while allowing the end user to compare across the subvariables in a grid. The user is also able to expand the array and look at the individual subvariables independently if needed.
When a researcher is analyzing an array crossed with another variable, Crunch will automatically give the user the option to isolate the array by subvariable (the Tabs feature).
- Tip: Set a concise name and sensible description for the array, then set short subvariable labels for the individual items in the array.
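Imported item labels often repeat the full question stem before the item text. One way to draft short subvariable labels is to strip the wording that all items share, as in the plain-Python sketch below. The `short_labels` helper and the sample labels are hypothetical, and the approach assumes the shared stem comes first and ends at a space.

```python
import os

def short_labels(labels):
    """Remove the stem shared by all labels, cutting only at a word boundary."""
    if len(labels) < 2:
        return list(labels)
    stem = os.path.commonprefix(labels)
    # Trim back to the last space so a shared first letter is not cut off.
    stem = stem[: stem.rfind(" ") + 1]
    return [label[len(stem):].strip(" :-") or label for label in labels]

# Hypothetical labels as they might appear after import.
labels = [
    "Q5: Satisfaction with - Price",
    "Q5: Satisfaction with - Product",
    "Q5: Satisfaction with - Packaging",
]
print(short_labels(labels))
```

The stripped stem is a reasonable starting point for the array's description, with the short item text used as the subvariable labels.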

Clear Category Values: Remove values for categorical questions that have no intrinsic ordering to the categories (e.g., “Which region do you live in”)

It can be helpful to remove the category values from nominal/categorical variables so that a mean is not calculated when producing tables. These means could be meaningless when shown on a variable that is not a true scale question.
- Note: We also suggest removing the values on scales if means are not likely to be used in the analysis.
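To see why a mean on a nominal variable is misleading, the sketch below computes one from arbitrary region codes and then shows that no mean is produced once the values are cleared. It is a plain-Python illustration only: the category records with `id`, `name`, and `numeric_value` keys, and both helper functions, are assumptions made for this example rather than a description of Crunch's API.

```python
def mean_of_values(categories, responses):
    """Average the numeric values of the selected categories, if any exist."""
    values = {cat["id"]: cat["numeric_value"] for cat in categories}
    scored = [values[r] for r in responses if values.get(r) is not None]
    return sum(scored) / len(scored) if scored else None

def clear_numeric_values(categories):
    """Return copies of the categories with their numeric values removed."""
    return [{**cat, "numeric_value": None} for cat in categories]

# Hypothetical region question: the codes carry no ordering.
region = [
    {"id": 1, "name": "North", "numeric_value": 1},
    {"id": 2, "name": "South", "numeric_value": 2},
    {"id": 3, "name": "East", "numeric_value": 3},
    {"id": 4, "name": "West", "numeric_value": 4},
]
responses = [1, 4, 4, 2]

print(mean_of_values(region, responses))                        # 2.75, not a meaningful "average region"
print(mean_of_values(clear_numeric_values(region), responses))  # None
```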
