Overview
Before using Crunch Automation, we recommend you review the following article to familiarize yourself with the basic concepts of a declarative language:
We also highly recommend that you save your complete Crunch Automation declaration in a text file on your own system. If changes need to be made to a dataset’s declaration, you should reimport the raw data and run the modified declaration on the newly imported raw data.
This page describes how to use Crunch Automation, a scripting language for executing common Crunch commands. Crunch Automation runs inside the Crunch app, which means that (1) no development environment is required and (2) no code is run on users' computers. Crunch Automation is executed via scripts, which are plain text files containing a sequence of automation commands that run on existing datasets. Examples of available operations include creating filters, creating new derived variables, and changing variable attributes.
You must be the dataset’s current editor in order to execute a Crunch Automation script on a dataset. Crunch Automation can be accessed from within a dataset using the dataset dropdown menu in the upper left corner:
Note the permissions required to run each type of script:
- You must be an account admin to run user scripts.
- User scripts must be run at the account level (top-level folder).
- You must have editor permission on a folder to run folder scripts.
See the how-to guide for further information on how to use Crunch Automation in the web app.
Writing a script
Scripts are plain-text files that consist of a list of commands. Each command is terminated by a semicolon (;), and the next command starts on a new line.
Comments
Scripts support comments. To turn a line into a comment, begin it with the pound (#) character.
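The two rules above (semicolon-terminated commands and # comment lines) can be illustrated with a short Python sketch. It is not the parser Crunch uses. The function name split_commands is hypothetical, and the sketch assumes that comments occupy whole lines and that no semicolon appears inside a quoted string.

```python
def split_commands(script_text: str) -> list[str]:
    """Return the commands in a script, without comments or terminators."""
    # Drop blank lines and lines whose first non-space character is '#'.
    kept = [
        line.strip()
        for line in script_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    # Re-join the remaining lines and split on ';'. This is naive: a ';'
    # inside a quoted string would also split here, which a real parser
    # would have to handle.
    joined = " ".join(kept)
    return [part.strip() for part in joined.split(";") if part.strip()]


script = """\
# Retitle one subvariable
CHANGE TITLE IN my_cat_array[var_1] WITH "My Variable Name";
# Derive age from birth year
CREATE MATERIAL NUMERIC 2022 - birthyr AS age;
"""

commands = split_commands(script)
for command in commands:
    print(command)
```

A check like this can be handy for counting the commands in a saved script before submitting it, but the server's own validation remains the authority.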
Variables
Variables are referred to by their unique alias in a dataset. If a RENAME command is used, refer to the variable by its new alias from that point on.
Aliases do not need quotation marks within the script unless they are invalid. Valid aliases:
- cannot start with a number
- cannot contain spaces
- cannot contain any special characters, such as ()!@#$%^&*[]{}?¿`\'".,/
You cannot create new variables with invalid aliases.
If your dataset already includes invalid aliases, you can refer to them in an Automation script by using backticks (`). You are encouraged to rename them using the RENAME command.
# Example use of an alias without escaping
variable_alias
# Example of an alias that has spaces escaped
`alias with space`
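As a rough illustration of the alias rules above, the following Python sketch checks an alias against the three listed restrictions and wraps it in backticks when it fails. The function names are hypothetical, the character set is taken from the examples listed on this page (which are given as examples, not a complete set), and the actual Crunch validator may apply additional rules.

```python
# Special characters listed on this page; the real validator may reject more.
SPECIAL_CHARACTERS = set("()!@#$%^&*[]{}?¿`\\'\".,/")


def is_valid_alias(alias: str) -> bool:
    """Apply the three rules listed for valid aliases."""
    if not alias:
        return False
    if alias[0].isdigit():  # cannot start with a number
        return False
    if any(ch.isspace() for ch in alias):  # cannot contain spaces
        return False
    # cannot contain any of the listed special characters
    return not any(ch in SPECIAL_CHARACTERS for ch in alias)


def alias_reference(alias: str) -> str:
    """Return the alias as it would be written in a script."""
    if is_valid_alias(alias):
        return alias
    if "`" in alias:
        # How to escape a backtick inside an alias is not covered here.
        raise ValueError("alias contains a backtick")
    return f"`{alias}`"


print(alias_reference("variable_alias"))
print(alias_reference("alias with space"))
```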
Referring to variables in an array
When working with an array of variables, each variable is referenced as array_alias[variable_alias].
For example, if you are working with an array (my_cat_array) that contains variables (var_1, var_2, var_3), you would refer to var_1 as follows:
CHANGE TITLE IN my_cat_array[var_1] WITH "My Variable Name";
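When several variables in the same array need new titles, the commands can be generated rather than typed by hand. The Python sketch below reuses only the CHANGE TITLE form shown above. The helper name change_title_commands and the titles are hypothetical, and it assumes that titles contain no double quotes, since quote escaping is not described here.

```python
def change_title_commands(array_alias: str, titles: dict[str, str]) -> list[str]:
    """Build one CHANGE TITLE command per variable in an array."""
    commands = []
    for variable_alias, title in titles.items():
        if '"' in title:
            # Escaping rules for quotes are not covered in this sketch.
            raise ValueError(f"title for {variable_alias} contains a double quote")
        commands.append(
            f'CHANGE TITLE IN {array_alias}[{variable_alias}] WITH "{title}";'
        )
    return commands


lines = change_title_commands(
    "my_cat_array",
    {"var_1": "My Variable Name", "var_2": "Another Name"},
)
print("\n".join(lines))
```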
Expressions
The Crunch Automation language allows you to use logical and arithmetic expressions in filters or other commands.
Parentheses are respected when determining the order of evaluation, and the following operators are supported:
- <
- >
- >=
- <=
- ==
- AND
- OR
- ORNM
- +
- -
- *
- /
Running a script
Every dataset’s shoji:entity representation links to a scripts catalog.
To run a script, a client must send a POST request to the scripts catalog. The payload is a shoji:entity whose body attribute contains the text of the script file inline:
POST /api/datasets/123456/scripts/ HTTP/1.1
{
  "element": "shoji:entity",
  "body": {
    "body": "<Script contents>"
  }
}
The server validates that the commands are consistent with the dataset’s schema and returns either a 202 response that reports the progress of the script’s execution, or a 400 response that lists any detected errors.
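The request described above could be assembled in Python roughly as follows. This is a sketch using only the standard library. It builds the request without sending it, and the host name, the Content-Type header, and the function names are assumptions. Authentication is omitted because it is not covered on this page.

```python
import json
import urllib.request


def build_run_script_request(scripts_catalog_url: str, script_text: str) -> urllib.request.Request:
    """Wrap the script text in a shoji:entity and prepare a POST request."""
    payload = {"element": "shoji:entity", "body": {"body": script_text}}
    return urllib.request.Request(
        scripts_catalog_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def describe_run_status(status: int) -> str:
    """Map the two documented status codes to a short description."""
    if status == 202:
        return "accepted: the script is executing"
    if status == 400:
        return "rejected: the server detected errors in the script"
    return f"unexpected status {status}"


# Placeholder host; the path mirrors the example request above.
request = build_run_script_request(
    "https://example.invalid/api/datasets/123456/scripts/",
    "CREATE MATERIAL NUMERIC 2022 - birthyr AS age;",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would send it. A 400 response surfaces there as urllib.error.HTTPError, whose code attribute can be passed to describe_run_status.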
Managing a script
Scripts cannot be managed after they run. Once a script is executed, its record remains as an execution log, a reference of the changes it made to the dataset.
DELETE is also not supported on scripts. The only way to undo or remove a script is to revert the dataset to a savepoint taken before its execution.
Reverting a script
There are two ways to revert the output of a script:
- UNDO - Delete the artifacts and variables created by a script, or
- RESTORE - A hard revert that returns the dataset to the state it was in before the script ran, deleting not only the artifacts and variables created by the script but also reverting all other changes made afterwards through the Crunch app, rCrunch, or any other API calls.
The difference between the two is that a hard revert restores the dataset and drops all ensuing scripts and their output (artifacts and variables), while an undo deletes only the artifacts and variables created by this script; changes made by other scripts and this script's record remain in place. In both cases, the associated script will be deleted.
Undo
Undoing a script's output is accomplished by sending a DELETE request on the script's output catalog.
In cases where there are any dependencies that prevent the artifacts from being deleted, the request will return a 409 response.
Restore
The application stores a savepoint of the dataset directly before the execution of the commands in a script.
To perform a script-driven restore where all the associated non-versioned artifacts are deleted, the client must send a POST request to the revert endpoint. This starts an asynchronous task that iteratively drops all the artifacts from all scripts and restores the dataset.
In cases where there are any dependencies that prevent this script from being reverted, the request will return a 409 response. Keep in mind that some automation commands (e.g. ORGANIZE) cannot be reverted by the undo or restore commands.
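The two revert requests can be sketched in Python in the same spirit. The URLs of the output catalog and the revert endpoint are not given on this page, so the functions below take them as parameters and the example values are placeholders. Sending an empty body with the POST is also an assumption, as are the function names.

```python
import urllib.request


def build_undo_request(output_catalog_url: str) -> urllib.request.Request:
    """Undo: a DELETE request on the script's output catalog."""
    return urllib.request.Request(output_catalog_url, method="DELETE")


def build_restore_request(revert_url: str) -> urllib.request.Request:
    """Restore: a POST request to the revert endpoint (empty body assumed)."""
    return urllib.request.Request(revert_url, data=b"", method="POST")


def describe_revert_status(status: int) -> str:
    """Interpret the response to either request."""
    if status == 409:
        return "blocked: dependencies prevent the revert"
    if 200 <= status < 300:
        return "accepted"
    return f"unexpected status {status}"


# Placeholder URLs; in practice they would be read from the script's entity.
undo = build_undo_request("https://example.invalid/placeholder/output/")
restore = build_restore_request("https://example.invalid/placeholder/revert/")
print(undo.get_method(), restore.get_method())
print(describe_revert_status(409))
```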
Working with variables
See the following articles to learn more about how to work with variables using Crunch Automation:
Material variables
When you create new variables based on existing variables, such as case statements or recodes, they are stored as a “formula” that expresses the logic you describe. When you include the MATERIAL keyword in a Crunch Automation CREATE command (that is, CREATE MATERIAL), the system evaluates these computed expressions eagerly and saves the concrete data values to the dataset instead of the formula. The result no longer depends on the input variables and will not change if the inputs or their data are changed or updated. Note that material variables are always public (part of the dataset schema).
In general, Crunch recommends that data processing teams create calculated variables as material variables, so that they can be certain that the values will not change due to any upstream dependency in the logic they express. Material variables may be required to align datasets to ensure that variable schemas are compatible.
Example
CREATE MATERIAL NUMERIC 2022 - birthyr AS age;
CREATE MATERIAL CATEGORICAL CUT age
  BREAKS MIN, 30, 45, 65, MAX
  LABELS "Under 30" CODE 1, "30–44" CODE 2, "45–64" CODE 3, "65+" CODE 4
  AS age_groups;
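The difference between a formula and a material variable can be illustrated with a plain Python analogy. This is not how Crunch stores data. The sample birth years are made up, and the interval boundaries (30 falls in "30–44", 65 falls in "65+") are an assumption inferred from the labels in the example above.

```python
from bisect import bisect_right

birthyr = [1995, 1980, 1960, 1950]  # made-up input data


def age_formula() -> list[int]:
    """Formula-style variable: recomputed from birthyr on every read."""
    return [2022 - year for year in birthyr]


# Material-style variable: evaluated once, and the values are stored.
age_material = [2022 - year for year in birthyr]

# An upstream change affects the formula but not the stored values.
birthyr[0] = 2000
print(age_formula())
print(age_material)

BREAKS = [30, 45, 65]
LABELS = ["Under 30", "30–44", "45–64", "65+"]


def age_group(age: int) -> str:
    """Assign a label, treating each break as the start of the next group."""
    return LABELS[bisect_right(BREAKS, age)]


age_groups = [age_group(age) for age in age_material]
print(age_groups)
```

The stored list keeps its original values after the input changes, which is the property that makes material variables predictable for downstream work.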
Crunch Automation commands
See the following sections to view a current list of all of the commands:
- Dataset commands
- Dashboard commands
- System commands