Crunch expressions are used to compute on a dataset, to do nuanced selects, updates, and deletes, and to accomplish many other routine operations. Expressions are JSON objects in which each term is wrapped in an object which declares whether the term is a variable, a value, or a function, etc. While verbose, doing so allows us to be more explicit about the operations we wish to do.
Expressions generally contain references to variables, values, or columns of values, often composed in functions. The output of expressions can be other variables, values, or cube aggregations, depending on the context and expression content. Some endpoints have special semantics, but the general structure of the expressions follows the model described below.
Variable terms
Terms refer to variables when they include a “variable” member. The value is the URL for the desired variable. For example:
- {"variable": "../variables/X/"}
- {"variable": "https://[workspace].crunch.io/api/datasets/48ffc3/joins/abcd/variables/Y/"}
URLs must either be absolute or relative to the URL of the current request. For example, to refer to a variable in a query at https://[workspace].crunch.io/api/datasets/48ffc3/cube/, a variable at https://[workspace].crunch.io/api/datasets/48ffc3/variables/9410fc/ may be referenced by its full URL or by “../variables/9410fc/”.
Value terms
Terms refer to data values when they include a “value” member. Its value is any individual data value; that is, a value that is addressable by a column and row in the dataset. For example:
- {"value": 13}
- {"value": [3, 4, 5]}
Column terms
Terms refer to columns (construct them, actually) when they include a “column” member. The value is an array of data values. You may include “type” and/or “references” members as well.
- {"column": [1, 2, 3, 17]}
- {"column": [{"?": -2}, 1, 4, 1], "type": {"class": "categorical", "categories": [...], ...}}
Function terms
Terms refer to functions (and operators) when they include a “function” member. The value is the identifier for the desired function. They parameterize the function with an “args” member, whose value is an array of terms, one for each argument. Examples:
- {"function": "==", "args": [{"variable": "../variables/X/"}, {"value": 13}]}
- {"function": "contains", "args": [{"variable": "../joins/abcd/variables/Y/"}, {"value": "foo"}]}
You may include a “references” member to provide a name, alias, description, etc to the output of the function.
The following is a list of all supported functions available for crunch expressions. Note that these functions can be used in conjuction to compose an expression.
Array functions
- array — returns the given Frame as an array. The type of each variable in the Frame must be close enough to form a supertype for the array.
- get — returns a subvariable from the given column.
- subvariables — return a Frame containing subvariables of the given array.
- tiered — return a variable formed by collapsing the given array’s subvariables in the given category tiers.
- complete_cases — returns a logical column, with “selected” for each row where all of the input variables are valid, and “missing” where they are all missing.
Binary functions
In general, these operate only on “numeric” types.
- + — add
- - — subtract
- * — multiply
- / — div divide
- // — floor division
- ^ — power
- % — modulus
- & — bitwise and
- | — bitwise or
- ~ — invert
Logic functions
These all return a “logical” categorical column with just three categories: one is marked “selected”, one is marked “missing”, and the other is neither.
User interfaces may use the presence of a “selected” category to decide to reduce analyses to only show the “selected” category.
Any “logical” column can be used as a filter expression; rows which result in a “selected” value will match the filter, and those which are “missing” or “other” will not.
- == — equals. Exact matches will return “selected”. For non-matching values, if either input term is missing, the result is missing. Otherwise, the result is “other”.
- != — not equals. Exact matches will return “other”. For non-matching values, if either input term is missing, the result is missing. Otherwise, the result is “selected”. This is the same result as not(==).
- =><= — between
- between — between
- < — less than
- > — greater than
- <= — less than or equal
- >= — greater than or equal
- in — “selected” for each row where A is an element of array B, or a key of object B.
- all — “selected” for each row where all subvariables in a multiple_response array are selected.
- any — “selected” for each row where any subvariable in a multiple_response array is selected.
- is_none_of_the_above — “selected” for each row where no subvariables in a multiple_response array are selected, unless all subvariables have missing values.
- contains — “selected” for each row where text value A is a substring of text value B.
- icontains — Case-insensitive version of ‘contains’.
- ~= — compare against regular expression (regex).
- and — logical and. A “selected” value and “selected” results in “selected”. If either input term is missing, the result is missing. Otherwise, the result is “other”.
- or — logical or. If either input term is “selected”, the result is “selected”. A “missing” value or “missing” results in “missing”. Otherwise, the result is “other”.
- not — logical not; this is the “relative complement”–any missing values will remain missing.
- not_selected — logical not; this is the “absolute complement”–any missing values will become “selected”.
- selected — returns “selected” only for “selected” categories; “other” and “missing” values will become “other”.
- is_valid — logical array of rows which are valid for the given column.
- is_missing — logical array of rows which are missing for the given column.
- any_missing — logical array of rows where any of the subvariables are missing.
- all_valid — logical array of rows where all of the subvariables are valid.
- all_missing — logical array of rows where all of the subvariables are missing.
Selection functions
- as_selected — returns the given variable reduced to the [1, 0, -1] “logical” categories.
- selected_array — returns a bool array from the given categorical, plus None/__none__/__any__
- selected_depth — returns a numeric column containing the number of selected categories in each row of the given array.
- selections — returns the given array, reduced to the [1, 0, -1] “logical” categories, plus an __any__ magic subvariable.
Miscellaneous functions
- add_valid — returns a numeric column with the sum of the input numeric variables; returns no data when all the columns are missing.
- alter_categories — returns a categorical column with the same values but updates the metadata of the input variable. Only allowed changes are categories names and order and subvariables names.
- bin — returns column’s values broken into equidistant bins.
- case — evaluates the given conditions in order, selecting the corresponding choice.
- cast — returns a Column of column’s values cast to the given type.
- char_length — returns the length of each string (or missing reason) in the given column.
- copy_variable — returns a copy of the column with a copy of its metadata.
- combine_categories — returns a column of categories combined according to the category_info.
- combine_responses — combines the given categorical variables into a new one.
- lookup — maps each row of source through its keys index to a corresponding value.
- missing — returns the given column as missing for the given reason.
- normalize — returns a column with the given values normalized so sum(c) == len(c).
- row — returns a numeric column with row indices.
- typeof — returns (a copy of) the type of the given column.
- unmissing — returns the given column with user missing replaced by valid values.
{
"function": "alter_categories",
"args": [{"value": {
"categories": [
{"id": "<cat_id>", "name": "<new name>"},
// ... repeat for each category to be renamed
],
"order": [1, 2, 3, 4],
"subvariables": [
{"id": "<subvar_id>", "name": "<new name>"},
// ... repeat for each subvariable to be renamed
]
}}]
}
Date functions
- default_rollup_resolution — default_rollup_resolution
- datetime_to_numeric — converts the given datetime column to numeric.
- format_datetime — converts datetime values to strings using the fmt as strftime mask.
- numeric_to_datetime — converts the given numeric column to datetime with the given resolution.
- parse_datetime — parses string to datetime using optional format string.
- rollup — returns column’s values (which must be type datetime) into calendrical bins.
- difftime — takes two columns as input and an optional resolution value and returns a numeric column representing the number of units diference between the two columns.
- now — has no parameters and returns the current time on the server.
- datetime — given a set of numeric columns (year, day, month), returns a datetime variable. Optionally add hours, minutes, and seconds. Note that seconds is a floating point number with no more than 6 decimal places.
Frame functions
- page — returns the given frame, limited/offset by the given values.
- make_frame or select — returns a frame built from the given map of variables, optionally specifying the frame’s variable order.
- frame_subset — returns a frame NOT including the indicated variables.
- sheet — returns the given frame, limited/offset in the number of variables.
- dependents — returns the given frame with only dependents of the given variable.
- adapt — returns the given frame adapted to the given to_key.
- join — returns a JoinedFrame from the given list of subframes.
- find — returns a Frame with those variables which match the given criteria.
- flatten — returns a frame including all variables, plus all subvariables as dotted ids.
{
"function": "make_frame",
"args": [
{
"map": {
<destination id>: <variable expression>,
<destination id>: <variable expression>,
...
}
},
{"value": ["destination id 2", "destination id 1", ...]}
]
}
- make_frame or select: Receives as its first argument a map expression where destination id is the ID that the mapped variable will have on the resulting frame and where variable expression` can be a reference to an existing variable from the source frame ({"variable": "source frame id"}) or a completely new variable that doesn’t exist on the source frame. An optional second argument defines the order in which variables must be arranged in the resulting frame.
{
"function": "frame_subset",
"args": [
// the frame to subset
{"frame": "unsecured"},
// the IDs of variables to explicitly include
{"value": null},
// the IDs of variables to explicitly exclude
{"value": ["000001", "000009", "001874", ...]}
]
}
- frame_subset: Same as select but will exclude the passed variable IDs from the source frame.
Measures functions
- cube_count
- cube_distinct_count
- cube_max — a measure which returns the maximum value in a column.
- cube_mean
- cube_min — a measure which returns the minimum value in a column.
- cube_missing_frequencies — returns an object with parallel ‘code’ and ‘count’ arrays.
- cube_quantile
- cube_stddev — a measure which returns the standard deviation value in a column.
- cube_sum
- cube_valid_count
- cube_weighted_max
- cube_weighted_min
- top — returns the given (1D/1M) cube, filtered to its top N members.
Cube functions
- autocube — returns a cube crossing A by B (which may be None).
- autofreq — returns a cube of frequencies for A.
- cube — returns a Cube instance from the given arguments.
- each — yields one expression result per item in the given iterable.
- multitable — returns cubes for each target variable crossed by None + each template variable.
- transpose — transposes the given cube, rearranging its (0-based) axes to the given order.
- stack — returns a cube of 1 more dimension formed by stacking the given array.
Filter terms
Terms that refer to filters entities by URL are shorthand for the logical expression stored in the entity. So, {"filter": "../filters/59fc4d/"} yields the Crunch expression contained in the Filter entity’s “expression” attribute. Filter terms can be combined together with other expressions as well. For example, {"function": "and", "args": [{"filter": "../filters/59fc4d/"}, {"function": "==", "args": [{"variable": "../variables/X/"}, {"value": 13}]}]} would “and” together the logical expression in filter 59fc4d with the X == 13 expression.