See Crunch Automation basics for more information.
About schema commands
Schema is a complete, low-level description of the type of each variable used for validation and integrity checking—data that conforms to a schema is valid. The schema RENAME/REPLACE commands must always appear at the top of the script, since they can change the schema of the dataset on which they’re running.
After the first non-schema command is declared, the app returns an error if any schema command is found subsequently in the script (i.e., after the schema commands run, the schema for the dataset becomes immutable).
Crunch Automation makes the schema immutable to avoid further mistakes and alignment issues (for appending data). If applicable, you should make changes to the schema prior to running the script to address any errors that occur in data collection.
You can also use the CREATE CONVERT command to create derivations (of a different variable type and alias), which are not schema-changing by definition.
About the REPLACE CONVERT command
The REPLACE CONVERT command allows you to change the type of the mentioned variables to another type. This mutation changes the variables' data.
It is necessary to provide options depending on the target type to indicate how to handle the values found in the original column.
When converting multiple variables into one command, they should all be of the same type.
If the input variable is a derived variable, it returns an error since it's not possible to perform casting with options to a derived variable.
To categorical
Converting to categorical requires that you indicate the desired categories with their labels and optional codes, and whether it is missing or not. Any extra value in the data that isn't specified will get an automatic category assigned with the next available code.
In the case of converting numeric to categorical types, the original numeric value will be stored in each category's numeric_value attribute, unless a NUMERIC VALUE has been specified.
It is only possible to convert text and numeric columns to categorical.
To datetime
Converting to datetime is only allowed from text and categorical variables. It is necessary to provide a date mask indicating the format to parse the values as; additionally, an optional resolution can be indicated using the following valid format:
- ["Y", "3M", "M", "W", "D", "h", "m", "s", "ms", "us"]
You must include the % symbol in the script (e.g., "%Y-%m-%D"). See the example script below.
Anything that doesn't comply to the format will be converted to missing data.
To text
Converting to text is only possible from numeric and categorical variables. In the case of numeric variables, the numbers will be converted to a floating-point string (adding a .0 decimal in the case of integers).
To numeric
It is possible to convert text and categorical variables to numeric. In the case of categorical variables, the numeric_value attribute will be used as the final value.
REPLACE CONVERT alias, ..., alias TO CATEGORICAL WITH
VALUE "string"|number TO "label" [CODE code [NUMERIC VALUE <float>] [MISSING]],
...,
VALUE "string"|number TO "label" [CODE code [NUMERIC VALUE <float>] [MISSING]],
;
REPLACE CONVERT alias, ..., alias TO DATETIME FORMAT "string" [RESOLUTION "string"];
REPLACE CONVERT alias, ..., alias TO NUMERIC;
REPLACE CONVERT alias, ..., alias TO TEXT;
Using the DATE attribute
The REPLACE CONVERT command also allows you to create categorical-date variables by specifying a DATE attribute to each of the categories defined in the command.
Creating a categorical date variable requires that all the non-missing categories contain a DATE attribute that should contain a valid and unique ISO-8601 date string:
REPLACE CONVERT my_num TO CATEGORICAL WITH
value 1 to "Level 1" DATE "2022-01-21",
value 2 to "Level 2" DATE "2022-01-22",
value 3 to "Level 3" DATE "2022-01-23",
value 4 to "Level 4" DATE "2022-01-24",
value 5 to "Level 5" CODE 500 DATE "2022-01-25" MISSING
AS my_cat;
Example
REPLACE CONVERT var1, var2 TO NUMERIC;
REPLACE CONVERT var3, var4 TO CATEGORICAL
WITH
VALUE "a" TO "Letter A" CODE 1 MISSING,
VALUE "b" TO "Letter B" CODE 2;
REPLACE CONVERT var3, var4 TO CATEGORICAL
WITH
VALUE 1 TO "First",
VALUE 2 TO "Second",
VALUE 3 TO "Third";
REPLACE CONVERT my_date_str TO DATETIME FORMAT "%Y-%m-%D";