Before you start slicing and dicing your data, you need to be sure that you're working with a tidy dataset.
The image on the left is not a tidy dataset: it's a raw data file that has been uploaded into Crunch without any additional work. The image on the right shows what you should be getting, a tidy dataset. It is the same data, except:
- the variables have nice titles
- the variables are organized into folders
- multiple response and matrix/grid questions have been grouped together as one variable (this is not visible in the image below)
A typical convention is that folders reflect the sections in the questionnaire. You can have folders within folders. You can have as many folders as you like.
How do I get a tidy dataset?
If you have a data processing (DP) department setting up the dataset for you, you should give them dataset specs instead of table specs. We provide guidance below on what to include in the dataset specs, including an Excel template.
If you're looking to DIY dataset setup, then you should refer to the Definitive Guide for Uploading and Preparing Data. It is recommended that you first work through this orientation course using a tidy dataset (which can be provided for you).
What goes into a dataset spec?
Crunch enables researchers to do most things themselves, including making new variables and tidying data. On the other hand, the better the initial dataset setup, the better your experience will be.
Exactly what you do versus what the DP department does depends on the relationship in your organization, but we offer the following as a guide:
Remember, the setup is not a one-pass opportunity:
- Your DP can make further manipulations to the dataset
- You can make things within the app yourself - such as nets (subtotals) on question categories, filters, weights, and other variables.
Note: As per the image above, you generally don't need DP to make filters for you. You make the table you want and then filter in Crunch (with simple drag-drop, as you'll see).
Key elements to set up in a dataset
The following is a breakdown of what you see in Crunch. These elements are called metadata because they are information about the data.
Title - this should be something pithy to describe the variable (no more than a few words). Movie Rankings is a reasonable title. The subsequent variable "Thank you! You are.... " should be relabelled "Closing Message".
Folder - these are expandable organizers in bold text in the left-hand panel. You can have folders within folders (as many levels as you like). You can also put variables into a hidden folder (for example, the Closing Message is not important to the analysis and you can ask for that to be hidden).
Description - this provides more detailed information about the variable. Typically, it is the wording of the question.
Notes - these are other notes on the variable. A common use of notes is to describe the variable's base (e.g. "Base: Amongst all respondents").
Alias - all variables have an alias, which is only used for data processing purposes, so you don't need to do anything with it. You can toggle off the display of aliases in your settings if they are a distraction.
Variable Type - notice that Movie Rankings is a single variable in the above, as reflected in the Variable Sidebar (panel on the left). If it were 7 distinct variables (one for each movie), that would be an incorrect setup. Movie Rankings is a categorical array: how a question appears in the questionnaire is generally how it should be set up in Crunch (here, as a grid).
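If it helps to picture these elements together, the sketch below writes two rows of a dataset spec as small Python records. It is only an illustration of the information to capture, not anything Crunch requires: the class name, field names, alias values, folder names and the "Movie 1" to "Movie 7" labels are all placeholders invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableSpec:
    """One row of a dataset spec. All field names here are hypothetical."""
    alias: str                       # processing name; rarely needs attention
    title: str                       # short, pithy label
    folder: str                      # usually the questionnaire section
    description: str = ""            # typically the question wording
    notes: str = ""                  # e.g. the base
    variable_type: str = "unspecified"
    subvariables: List[str] = field(default_factory=list)
    hidden: bool = False             # True = ask for it to go in a hidden folder


# A grid question set up as ONE categorical array, not 7 separate variables.
movie_rankings = VariableSpec(
    alias="movie_rankings",                      # placeholder alias
    title="Movie Rankings",
    folder="Movies",                             # placeholder folder name
    description="(question wording goes here)",
    notes="Base: Amongst all respondents",
    variable_type="categorical array",
    subvariables=[f"Movie {i}" for i in range(1, 8)],  # placeholder labels
)

# A variable that is not needed for analysis: relabelled and hidden.
closing_message = VariableSpec(
    alias="closing_message",                     # placeholder alias
    title="Closing Message",
    folder="Hidden",                             # placeholder folder name
    hidden=True,
)

for spec in (movie_rankings, closing_message):
    print(spec.title, "|", spec.variable_type, "|", len(spec.subvariables), "items")
```

Each record maps onto one row of a spreadsheet, which is usually the easier format to hand to DP.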
Data specs checklist
We recommend working through the following checklist, based on your questionnaire and study, so that you have the optimal setup for your dataset from the start.
- List the multiple response questions within your questionnaire and their categories
  - This way DP will ensure that all the variables involved in each question are collated together.
  - This includes multiple check-box questions (which Crunch calls Multiple Response) and scale-type grids (which Crunch calls Categorical Arrays)
- List any questions that may need to be rebased
  - For instance, a question where a filter applies because of a skip or an answer to a prior question.
- List any scale-type questions for which:
  - You need summary tables for top 2 box, bottom 2 box, etc.
  - You need the mean on a particular scale (e.g. -5 to +5, or 0 to +10)
  - You need the questions banded up in a particular way (e.g. 0-2, 3-4, 5-6, 7-10)
- List the questions that require NPS recoding or banding
  - For example, likelihood-to-recommend questions that have a scale (typically 0-10)
- List the scale/numeric questions where you need the mean calculated in more than one way
  - For example, excluding 0 or DK, as well as including 0 or DK
- List any "calculated" variables which are not specified in the questionnaire
  - For example, “take Birth Year and create a new Age variable”
- List any questions that need to be “depiped”
  - That is, where a concept is piped through to a question
  - Sometimes the concepts are shown in rotation, or randomized, and need “delooping”
- Note whether you need any hierarchical reforming of the data (called stacking)
  - For example, you collect information at the respondent level, but you need to analyse it at the brand level.
- Note whether you need any weighting variables set up in advance
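To make the scale-related items in the checklist concrete, here is a minimal plain-Python sketch of what NPS recoding, top 2 box, banding and "the mean calculated more than one way" amount to. It does not use Crunch; the answers are made up, and the NPS cut-offs are the conventional ones (9-10 promoters, 0-6 detractors), which your study may or may not follow. DK is always left out of the mean here because it has no numeric value.

```python
# Made-up answers to a 0-10 likelihood-to-recommend question.
# None stands in for "Don't know" (DK).
answers = [10, 10, 9, 9, 9, 8, 7, 6, 3, 0, None]

# DK has no numeric value, so drop it before any calculation.
valid = [a for a in answers if a is not None]


def percent(values, condition):
    """Percentage of values that satisfy the condition."""
    return 100 * sum(1 for v in values if condition(v)) / len(values)


# NPS recode: promoters score 9-10, detractors 0-6, passives 7-8.
promoters = percent(valid, lambda a: a >= 9)
detractors = percent(valid, lambda a: a <= 6)
nps = promoters - detractors

# Top 2 box / bottom 2 box: the two highest and two lowest scale points.
top2 = percent(valid, lambda a: a >= 9)
bottom2 = percent(valid, lambda a: a <= 1)

# Banding, using the example bands from the checklist.
BANDS = [(0, 2), (3, 4), (5, 6), (7, 10)]
band_counts = {f"{low}-{high}": 0 for low, high in BANDS}
for a in valid:
    for low, high in BANDS:
        if low <= a <= high:
            band_counts[f"{low}-{high}"] += 1
            break

# The same mean calculated two ways: with and without the zeros.
mean_with_zero = sum(valid) / len(valid)
non_zero = [a for a in valid if a != 0]
mean_without_zero = sum(non_zero) / len(non_zero)

print("NPS:", nps, "| top 2 box:", top2, "| bottom 2 box:", bottom2)
print("Bands:", band_counts)
print("Means:", mean_with_zero, round(mean_without_zero, 2))
```

Each choice shown here (where the bands fall, whether zeros count towards the mean) changes the numbers, which is why it is worth stating them in the spec rather than leaving them to be guessed.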
Also, be sure to add any further items of your own.
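If stacking is on your list, it can help to show DP the shape you are after. The sketch below is a plain-Python illustration of the idea, not how Crunch or your DP team performs it: the column names, brand names and ratings are invented, and dropping unrated brands is just one possible choice.

```python
# Respondent-level data: one row per respondent, one column per brand.
# All names and values are made up for illustration.
respondents = [
    {"respondent_id": 1, "rating_brand_a": 4, "rating_brand_b": 2},
    {"respondent_id": 2, "rating_brand_a": 5, "rating_brand_b": None},
]

# Which columns belong to which brand.
BRAND_COLUMNS = {"rating_brand_a": "Brand A", "rating_brand_b": "Brand B"}


def stack(rows, brand_columns):
    """Reshape to brand level: one row per respondent-brand pair."""
    stacked = []
    for row in rows:
        for column, brand in brand_columns.items():
            rating = row.get(column)
            if rating is None:
                continue  # this respondent did not rate this brand
            stacked.append(
                {
                    "respondent_id": row["respondent_id"],
                    "brand": brand,
                    "rating": rating,
                }
            )
    return stacked


brand_level = stack(respondents, BRAND_COLUMNS)
for row in brand_level:
    print(row)
```

Two respondents become three brand-level rows, because respondent 2 did not rate Brand B. Whether such gaps are dropped or kept as empty rows is worth stating in the spec.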
Dataset template
You can use the following template and adapt it to suit your needs: