The Script Builder is a Google Sheets workbook that consolidates all of the key metadata from a dataset into one place. It acts as both a metadata reference and a tool for automatically generating Crunch Automation.
The Script Builder takes a lot of the legwork out of writing Crunch Automation by hand for many of the key repetitive tasks that go into turning a raw import into a tidy dataset. Furthermore, the Script Builder can be used at any stage, not just at the initial import of data, to generate Crunch Automation for key tasks and/or to serve as a metadata reference.
The principle goal for anyone analyzing their work in Crunch is that they are working with a neat and tidy dataset. It is only with a neat and tidy dataset that analysts can effectively do their analysis and reporting.
Creating a neat and tidy Crunch dataset primarily involves fixing and adding metadata. This is particularly relevant at the time of initial import of the data (either from a file or from a data collection platform).
As per the Definitive Guide to Uploading and Preparing Data, some key tasks need to be checked off as part of the setup phase, such as:
- Organizing variables into folders
- Creating arrays (either multiple response or categorical arrays)
- Changing the variable metadata — titles, descriptions, and notes
- Changing category metadata — labels, numeric values, and/or date assignment
- Reordering variables, subvariables, categories, and labels
- Setting subtotals (nets)
There are more tasks than the ones described above that involve a dataset setup. For example, recoding values to missing (i.e., “rebasing”), creating top-2-box summaries, creating nets on multiple response variables, transforming data, and so on.
The aim of the Script Builder is not to comprehensively handle every task in the Definitive Guide and every command in the list of Crunch Automation commands; rather, the Script Builder is designed to deal with just the core tasks, as described in the following.
What does the Script Builder do?
The Script Builder conveniently consolidates all the relevant metadata into a single Google Sheets workbook where the editor can make alterations. In Google Sheets, the user can take advantage of Google Sheet functions: apply a filter/sort rows, find/replace options, formulas and functions, macros, and more.
To use the Script Builder:
- Click the Script builder link in the dataset menu:
- After clicking this link, it may take several seconds to create your Google Sheets workbook, depending on how many variables your dataset has. When the sheet is ready, you will see a screen like this:
- Click the link to open your script builder.
- After making some worksheet changes, you may encounter this message in the Google Sheet:
- Click Continue and authenticate to your Google account:
- Google notifies you of the access this Google Sheet is requesting:
- Click Allow to continue.
- Make all the changes that are necessary in the various tabs of the Script Builder.
- Review the Crunch Automation code on the latest tab, which is automatically generated for you.
- Choose the Crunch Automation script, either by:
- Clicking the conveniently marked button within Google Sheets (last tab) to use the code as-is, or
- Copying/pasting the script elsewhere to make any amendments/changes before running it from within the Crunch web app.
You can then repeat the above steps as many times as you need. We encourage you, though, not to run one command at a time.
Tabs overview and what they do
The Script Builder includes the following four tabs, with which you can generate scripts:
- Schema tab
- Variables tab
- Subvariables tab
- Categories tab
Each tab generates the Crunch Automation commands, which are described in the following sections:
- CHANGE command (TITLE/DESCRIPTION/NOTES)
- ORGANIZE command
- CREATE CATEGORICAL ARRAY command
- CREATE MULTIPLE DICHOTOMY command
- CHANGE command (LABELS)
- LABEL CATEGORIES command
- SET NUMERIC VALUE command and UNSET NUMERIC VALUE command
- SET DATES command
- SET MISSING command
- DISPLAY command
- CREATE CATEGORICAL RECODE command
- Using a filter on the second row in each tab can assist with sorting and filtering to specific variables.
- Using the find/replace function can be useful in cleaning variable metadata (e.g., removing extraneous field information).
- It's recommended that you try to make as many changes as you can in fewer scripts (ideally one script), rather than doing lots of little changes in many scripts.
The Schema tab
The schema tab allows you to change variable aliases and types. As noted on each of the Crunch Automation pages, these are schema-changing commands and can only be run in the first script.
Most commonly, for the average dataset setup, you don’t need to work in this tab at all and you can safely skip it.
If you need to do conversions of variable types, it is often best to use the CREATE CONVERT command or to derive other variables (e.g., when you are recoding). These commands are outside of the scope and purpose of the Script Builder.
Changing a variable alias
- Simply change the alias in column A (“Alias”).
- The cell will turn green to indicate a change.
Changing a variable type
- Simply change the type in column B (“Data Type”). There are a fixed number of options you can change it to.
- If you select “Date/Time”, then you must include the date/time mask in column C. This will be indicated by red cells if it is missing.
The Variables tab
The variables tab is where most of the work is done to help you achieve a tidy dataset. It allows you to organize variables into folders (including reordering them and/or hiding them), changing metadata labels, and, importantly, building arrays.
The first two columns (A and B) are inflexible—these are the Variable (alias) column and the Data Type. If you try to alter them, the change will simply revert back automatically. The reason is that these changes must be done on the Schema tab (see above). The alias and variable type are only included here for reference purposes.
Organizing variables into folders
- The “Folder” (column C), which allows you to specify the folder you want to put a variable into.
- It will create the folder if that folder doesn’t exist.
- As per the ORGANIZE command documentation, you use the | symbol to create subfolders.
- If you want to put it into a HIDDEN or SECURE folder, you must include that from the start. For example, if you want to make a subfolder “System Vars” within the hidden directory, you must put HIDDEN|System Vars.
Changing variable details
Changing the metadata is very simple in the Script Builder:
- Overwrite the cell for the “Title”, “Description”, and “Notes” fields respectively in columns D, E, and F.
- Use copy/paste and find/replace to assist in this task.
- The cells will turn green indicating a change (and that the subsequent Crunch Automation code has been generated).
Building arrays (the Array Builder)
Use the following commands to create arrays, which is a key part of a dataset setup if the datafile (e.g., SPSS file) lacks the metadata that defines the arrays:
Creating multiple response and categorical arrays share a nearly identical process. The only difference is that a multiple response takes a “SELECTED” value as defined by the “Dichotomize” column in the Script Builder.
The way variables are grouped into a new array is via the new target alias: you nominate a target alias in the “Array Builder > Alias” (column G) and you put the same target alias for all the rows (variables) you want to group together.
Let’s illustrate using the following example. There are six variables (rows) that should belong to one array, which is a multiple response:
To build an array:
- Put qlb as the target alias in column G (“alias” of the Array Builder). Use series fill to drag the alias down for all the relevant rows.
- Notice that the remaining cells in the array builder for this new variable turn green, red, or grey:
- Green = it will work as-is and you don’t need to input anything, though it is highly recommended to include a title for the new variable. In this case, a suggested title would be “Awareness”.
- Red = it will not work as indicated as all subvariables need a label. In this case, the easiest thing to do is to copy over the information from column E and do a find/replace to remove the “Awareness:” title from the labels.
- Grey = don’t include anything here, which is because there can only be one title/description/notes and selected value (“dichotomize”). As a result, the grey section is a follower to the first row (green) and repeats the information automatically (for reference purposes).
- Optionally, enter a value for “Dichotomize”. This is crucial in this example, otherwise, you will create a categorical array.
- As per the documentation for the CREATE MULTIPLE DICHOTOMY command, you can either specify a category id (number), or
- You could include a “1” or “Yes” (in this example).
Hint: You can look up the variable category information on the Categories tab of this Script Builder, in order to check what the categories of all the variables are.
In order to specify a new variable (“Awareness”) that is a multiple response, the Array Builder information in the Script Builder appears as follows:
By default, all of the input variables are hidden (i.e., variables q1b_1...q1b_6). The Crunch Automation script will include the subcommand HIDE INPUTS. You can edit this out of the Crunch Automation script if needed, but in the vast majority of scenarios, it is prudent to hide the original (non-array) variables.
The Subvariables tab
The Subvariables Tab displays the subvariables that are part of each array. The sole purpose of this tab in the Script Builder is to change the labels.
You cannot change the order of subvariables with Crunch Automation: you can only derive new arrays or change the order of subvariables in the web app (using drag/drop). Changing the order of subvariables via scripting (Crunch Automation) is being considered for future development.
This tab displays both the alias of the variable set (Column A, the “Variable” column) and the individual aliases of the subvariables (Column B, the “Alias” column). You cannot change either aliases here—any changes will revert back automatically.
You cannot have duplicate subvariable labels. For example, you cannot have two subvariables labeled “Apple”.
Currently, you cannot use the Subvariables Tab to create subtotals (nets) on a multiple response variable, but this may be incorporated in future releases. For steps on how to create a net on a multiple response variable, please refer to the Definitive Guide.
The Categories tab
The categories tab allows you to see all the categories for each variable within the dataset.
For reference, you can see the variable alias and the category ids (“code”). You cannot change these—they are simply here for reference.
Hint: Turn row 2 into a Filter row and use that to filter to specific variable aliases.
Using the Categories tab, you can change information on an existing variable, including:
- Change the category labels.
- Set and unset categories to be missing.
- Change the numeric values associated with categories.
- Create subtotals (nets).
You can also create new variables using the Combine column (e.g., collapsing categories into broader buckets). This is an alternative to using subtotals.
Changing category labels
- Overwrite the information in column C.
- The cell will turn green to indicate a change.
- Duplicate category labels are permitted.
Setting and unsetting categories to be missing
- Set the category to missing or not with TRUE/FALSE values.
- If you intend to recode missing to other information (sometimes referred to in DP as ‘rebasing’) then please refer to the Definitive Guide. There are numerous commands associated with recoding to missing (‘rebasing’).
Changing the numeric values associated with categories
- Input a numeric value (float) in the “Value” column.
- As a float, it can involve decimals and a negative sign.
- You can leave a category blank for the value column, meaning that a particular category has no value. On a scale, for instance, you might have the category “Not applicable” with no numeric value, so it doesn’t skew the scale mean.
Hint: Generally speaking, most categorical variables should not have any values (numeric or date) associated with them. Datafiles (such as SPSS files) can automatically assign a numeric value where one is not appropriate (e.g., Male and Females have numeric values of 1s and 2s where these are not meaningful). A better practice is to clean out all of the numeric values first and then only set values where it is appropriate. Crunch recommends only setting values for ordinal and date categories.
Creating subtotals (nets)
- Put the label for each subtotal in the Subtotal column.
- Repeating labels for a subtotal tells the Script Builder that you want categories (rows) to belong to the same subtotal. In the following example, two subtotals are created, “Bottom 2” and “Top 2”:
- Subtotals always default to the top. If you want subtotals in other positions, you can edit the Crunch Automation code. Although traditional tabs may have subtotals (nets) interspersed, Crunch recommends having all subtotals at the top (or bottom), which is because the user can interactively switch subtotals on/off (which traditional tabs do not allow).
Combining categories into a new variable (nets)
- The “Combine” column operates the same as the “Subtotal” column but creates a new variable rather than subtotals.
- The Script Builder automatically generates a target alias and title for the new variable, using the suffix “_combined” in both cases. You can edit the Crunch Automation script if you want an alternative alias or title.
- The description and notes fields are copied over from the original variable.
Reviewing the Automation Script
After you have made your changes in the Schema, Variables, Subvariables, and Categories tabs, go to the Automation script tab:
Review the Crunch Automation script that has been created for you. At this point you can:
- Click the Generate automation script button to regenerate the script. If you are making lots of script changes, it may be helpful to turn off automatic script generation via the Crunch settings screen and then click the Generate automation script button on this tab when you are ready.
- Manually edit the Crunch automation script in the cell provided.
- Copy and paste the generated Crunch automation script to another file, text editor, or location of your choosing.
- Click the Run automation script button to send this script to the Crunch API to be run. When you do this, the script results appear in the bottom box.
In your workbook, there is a Crunch menu you can access to execute various functions:
- Revert selected values — after you change the value of a cell, you can highlight one or more cells to revert them to their original values before you made your change.
- Reload dataset schema — re-downloads and re-populates your workbook with the latest dataset schema from the Crunch API. This can be helpful, especially after running a Crunch automation script.
- Generate automation script — generates your automation script manually in case you have turned off automatic script generation in the settings menu. This does the same thing as clicking the same button on the Automation script tab.
- Run automation script — sends your automation script to the Crunch API to be run. This does the same thing as clicking the same button on the Automation script tab.
- Remove API key — this is a security feature that you can activate that will remove your personal API key from the worksheet in case you plan to share this workbook with other people. Note that if you later wish to reload the dataset schema or run an automation script from within the workbook, you will need to add your API key back via the Settings screen.
- Settings — opens the Crunch settings screen.
You can access the Settings screen to review and change the API key, Domain, and Dataset ID used in this workbook:
You can also modify these settings:
- Automatically generate automation code — for extremely large datasets or workbooks where you are making hundreds of changes, Google Sheet performance will be better if you uncheck this box and generate automation code manually via the link from the Crunch menu or by clicking the button on the Automation script tab.
- Ignore hidden variables — this is enabled by default to suppress all hidden variables you may have modified in the past. Uncheck this box and reload your dataset schema in order to view all hidden variables.