Every Crunch dataset is defined by raw data plus a script that consists of a series of declarations:
- Raw data is whatever is imported into Crunch – CSV, SPSS, or any other format.
- Declarations are performed upon this raw data to define a schema and metadata that clean it, organize it, and make it readable and explorable for you and your clients (see the example below).
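For instance, the opening of a script for a freshly imported survey might look something like the sketch below. The raw aliases (q4, q7) are hypothetical and the command syntax is approximate – consult the Crunch Automation command reference for the exact grammar of RENAME, CHANGE TITLE, and DROP VARIABLES.

```
RENAME q4 TO gender;
CHANGE TITLE IN gender WITH "Gender of respondent";
DROP VARIABLES q7;
```

Each statement declares how the raw data should appear rather than performing a one-off edit to it.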
Crunch Automation scripts are written in a declarative scripting language. A declarative script differs from scripts in most other languages you may be familiar with because it is not procedural – the data is not edited and then saved in a new state. The raw data still exists in Crunch, and the declarations in the script provide the lens through which that data is viewed by you and your customers.
This has a few ramifications:
- You will never need to “revert” your dataset along a series of save points to track down a state change – the raw data and its transformations have no versioning.
- The raw data and declarations are always internally consistent – it is not possible to reference a variable that has been deleted or whose alias has changed, because attempting to apply such a declaration raises an exception, ensuring that your data and schema are always synchronized.
- Applying an existing schema (or part of an existing schema) to a new set of raw data is very simple – you can just import the new raw data and apply the necessary declarations. For example, if many of your datasets have an identical set of demographic variables, you can use the same set of declarations to clean and organize those variables, perhaps using a simple find-and-replace if the new raw data uses different aliases (as sketched below).
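To make that concrete, suppose a new wave of raw data arrives with the demographic questions under the raw aliases q1 and q2 (hypothetical, as is the command syntax). Only the RENAME statements need to point at the new raw aliases; every declaration that refers to the cleaned aliases can be reused verbatim:

```
RENAME q1 TO gender;
RENAME q2 TO age;
ORGANIZE gender, age INTO "Demographics";
CHANGE TITLE IN gender WITH "Gender of respondent";
```

Because the later declarations refer to the cleaned aliases (gender, age) rather than the raw ones, the same script body works wave after wave once the first two lines are adjusted.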
In the future, editors will be able to access a dataset’s entire Crunch Automation script, edit it, and reapply it. So if your raw data is a monthly survey that adds new questions every month, you will be able to edit the script to handle those variables as the data is streamed in.
Schema versus Metadata
It is important to understand the difference between schema and metadata.
A dataset’s schema describes the changes made to the definition of the raw data – changing variable types, dropping variables, transforming variables, and changing variable aliases.
A dataset’s metadata defines cosmetic changes that do not affect the schema. This can include variable and category names and descriptions, the organization of variables into folders, and the creation of filters and derived variables.
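The distinction is easiest to see side by side. In the sketch below (hypothetical alias q22, approximate command syntax), the first command is a schema change because it alters the alias by which the variable is referenced, while the other two only touch metadata – the human-readable title and description:

```
RENAME q22 TO income;
CHANGE TITLE IN income WITH "Household income";
CHANGE DESCRIPTION IN income WITH "Total annual household income before tax";
```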
Types of Crunch Automation
There are two kinds of Crunch Automation scripts – dataset scripts and system scripts:
- Dataset scripts — applied to a single dataset. They can extend the schema by defining additional variables based on existing ones, organize variables into variable folders, create dashboards, and update variable names and descriptions to be human-readable (see the sketch after this list).
- System scripts — applied to a specific dataset folder or to the account. They can organize datasets into dataset folders, grant/remove/edit permissions on dataset folders, or create/alter users on the account.
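A minimal dataset script might therefore combine schema and metadata work in one place. As in the earlier sketches, the alias d3, the titles, and the folder name are hypothetical and the command syntax is approximate; the commands for creating derived variables and dashboards have their own grammar, documented in the command reference:

```
RENAME d3 TO employment;
CHANGE TITLE IN employment WITH "Employment status";
CHANGE DESCRIPTION IN employment WITH "Respondent's current employment status";
ORGANIZE employment INTO "Demographics";
```

A system script, by contrast, contains folder- and account-level commands and is run against a dataset folder or the account rather than a single dataset.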
How Crunch Automation works
A Crunch Automation script can be applied via the API or via the Crunch user interface (at the account, folder, or dataset level) by selecting Crunch Automation from the main menu. Currently it is possible to run multiple partial scripts sequentially, but this is not recommended. If you do choose to run partial scripts, commands that alter the schema must be run first; this ordering guarantees that a metadata-altering command is not invalidated by an underlying schema change (see the example below).
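As an illustration of that ordering rule (hypothetical alias q5, approximate syntax as in the earlier sketches): if the two commands below were split into partial scripts and the metadata-altering CHANGE TITLE were run first, it would refer to an alias (region) that does not yet exist, so the schema-altering RENAME has to run first.

```
RENAME q5 TO region;
CHANGE TITLE IN region WITH "Region of residence";
```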
We highly recommend that you save your complete Crunch Automation script to a text file in your own system. This means if you ever need to reimport your data, you can reapply the script in its entirety.