The following functions are used to apply declaration commends to multiple variables in Crunch Automation.
Using the subvariables_of function
The subvariables_of function allows you to pass multiple array aliases and returns all variables in those arrays in the format array_alias[variable_alias].
subvariables_of(my_array1, my_array2, ...)
If any non-arrays are passed to this function an error is returned.
Using the ALL() function
You can use the ALL() function to indicate all the variables in a dataset. This function takes a single argument that refers to all variables of a certain type:
For example, use the following for variables my_num1, my_num2, my_text, and my_categorical:
ALL() "my_num1", "my_num2", "my_text", "my_categorical"
all("numeric") "my_num1", "my_num2"
All("text") "my_text"
Using the STARTSWITH / ENDSWITH functions
The STARTSWITH() and ENDSWITH() functions allow you to match variables aliases that either start or end with certain strings (case sensitive):
STARTSWITH("prefix_")
ENDSWITH("_suffix")
Using regex matching variables
It is now possible to expand variables using regex matching against the dataset schema:
regex("quoted_regex")
which evaluates the regular expression against the beginning and end of each alias in the dataset.
For variables a_1, a_2, a_21, a_a, b_1:
regex("a_.") "a_1", "a_2", "a_a"
regex("a_.+") "a_1", "a_2", "a_a", "a_21"
REGEX("a_[1-9]") "a_1", "a_2"
regex("a_[a-z]") "a_a"
Additionally, a LIKE() function is available as well that performs SQL-like % matching:
% character will match any string
_ character will match any single character
For variables 1a2, 2a3, 1b2:
like("%a%") "1a2", "2a3"
LIKE("1%") "1a2", "1b2"
LIKE("1_2") "1a2", "1b2"
As with the rest of the syntax, the function calls are not case sensitive, but the matching argument is.
These can be combined in the same way with other kinds of expansions:
- a_1...a_10
- regex("b.?")
- other alias
which returns the full list of all three expansions.
Using expressions
Expressions support function calls, and the following functions are supported:
- add_valid(numeric, ..., numeric): Takes a list of numeric values or numeric arrays and returns the row sum of the valid values.
- all_missing(array): Returns rows where all of the subvariables have missing values.
- all_valid(array): Returns rows where none of the subvariables have missing values.
- any_missing(array): Returns rows where any of the subvariables have a missing value.
- complete_cases(variable, ..., variable): Returns the matching rows where values of all variables involved are not missing.
- duplicates(variable): Returns the column removing any repeated value. Returning only the first occurrence for each.
- is_missing(variable): Returns boolean type with True values on those rows that have a missing value.
- is_valid(variable): Returns boolean type with True on rows that don’t have a missing value.
- straightline(array): Returns a logical variable selecting those rows where only one value has been selected for all responses.
- trim(numeric_variable, min_number, max_number): Replaces all values lesser than min_number and replaces them with it, same for max_number.
Additionally, all boolean expressions can be wrapped in a NOT (expr) negation.
The special value NULL can be used in expressions to refer to system missing values for categorical variables.
Example expressions:
CREATE FILTER Age > 18 AND (Gender = "F" OR Gender = "Female") NAME "Adult women";
CREATE FILTER favorite_browser IN ["Chrome", "Firefox"] AND NOT is_missing(answered) NAME "Top Browsers";
CREATE FILTER 3 <= rating < 8 AND (Age / 2) + 7 NAME "Medium rating dating range";
CREATE FILTER opinion = NULL NAME "No opinion";
CREATE NUMERIC (2020 - Age) * 2 AS double_age NAME "Double Age";
Example script:
# First commands specify the Schema block
REPLACE CONVERT v1, v2 TO CATEGORICAL;
REPLACE CONVERT v3, v4 TO NUMERIC;
RENAME v1 TO Age;
RENAME v2 TO Year;
# This is the first non-schema command; no schema commands are possible
# after this.
# Sets the exclusion filter on the dataset
SET EXCLUSION v3 < 0;
CREATE FILTER Age < 20 AS "Teenagers";