Each dimension of an analysis can be a single variable, a function over a variable, a traversal of its subvariables (for array variables), or a combination of multiple variables (e.g. A + B). Any expression you can use in a “make_frame” command can be used as a dimension. The key difference is that the system uses the distinct values of the result rather than all of its values. Variables that are already “categorical” or “enumerated” simply use their “categories” or “elements” as the extent; other variables form their extents from their distinct values.
For example, if “3ffd45” is a categorical variable with three categories (one of which is “No Data”: -1), then the following dimension expressions:
{
    "dimensions": [
        {"variable": "datasets/ab8832/variables/3ffd45/"},
        {"function": "+", "args": [{"variable": "datasets/ab8832/variables/2098f1/"}, {"value": 5}]}
    ]
}
…would form a result cube with two dimensions: one using the categories of variable “3ffd45”, and one using the distinct values of (variable “2098f1” + 5). If that expression yields the distinct values [5, 15, 25, 35] (that is, variable “2098f1” itself has the distinct values [0, 10, 20, 30]), then we would obtain a cube with the following extents:
|    | 1 | 2 | -1 |
| 5  |   |   |    |
| 15 |   |   |    |
| 25 |   |   |    |
| 35 |   |   |    |
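To make the idea of extents concrete, here is a minimal Python sketch that mirrors the example above. It does not call any real API: the row-level data in `expression_results` and the helper `distinct_values` are hypothetical, and sorting the distinct values is an assumption made only so the output is easy to read.

```python
import json

# The query from the example above, written as a Python dict.
query = {
    "dimensions": [
        {"variable": "datasets/ab8832/variables/3ffd45/"},
        {
            "function": "+",
            "args": [
                {"variable": "datasets/ab8832/variables/2098f1/"},
                {"value": 5},
            ],
        },
    ],
}

# Hypothetical inputs, invented for illustration only.
# Category ids of the categorical variable "3ffd45" (-1 is "No Data").
category_ids = [1, 2, -1]
# Row-level results of the second dimension's expression, with repeats.
expression_results = [5, 15, 5, 35, 25, 15, 35]


def distinct_values(values):
    """Collapse row-level values to their distinct values (sorted here by assumption)."""
    return sorted(set(values))


# A categorical dimension takes its extent from its categories;
# any other dimension takes its extent from the distinct values of its result.
extents = [category_ids, distinct_values(expression_results)]
shape = tuple(len(extent) for extent in extents)

print(json.dumps(query, indent=2))
print(extents)  # [[1, 2, -1], [5, 15, 25, 35]]
print(shape)    # (3, 4), in dimension order
```

The seven row-level values collapse to four distinct values, so the cube has 3 x 4 cells regardless of how many rows the dataset contains.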
Each dimension used in a cube query needs to be reduced to distinct values. For categorical or enumerated variables, we only need to refer to the variable, and the system will automatically use the “categories” or “elements” metadata to determine the distinct values. For other types, the default is to scan the variable’s data to find the unique values present and use those. Often, however, we want a more sophisticated approach: numeric variables, for example, are usually more useful when binned into a handful of ranges, like “0 to 10, 10 to 20, …, 90 to 100”, rather than 100 distinct points (or many more when dealing with non-integers). The available dimensioning functions vary from type to type; the most common are:
- categorical: {"variable": url}
- text: {"variable": url}
- numeric: Group the distinct values into a smaller number of bins via:
  - {"function": "bin", "args": [{"variable": url}]}
- datetime: Roll up seconds into hours, days into months, or any other grouping via:
  - {"function": "rollup", "args": [{"variable": url}, {"value": variable.rollup_resolution}]}
- categorical_array:
  - One dimension for the subvariables: {"each": url}
  - One dimension for the categories: {"variable": url}
- multiple response:
  - One dimension for the subvariables: {"each": url}
  - One dimension for the selected-ness, which means transforming the array from a set of arbitrary categories to a standard “selected” set of categories (1, 0, -1) via:
    - {"function": "selections", "args": [{"variable": url}]}
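The list above can be sketched as a small Python helper that builds the dimension expressions for a given variable type. This is an illustrative sketch, not part of any client library: the function names, the type strings passed to `dimensions_for` (in particular `"multiple_response"`), and the `"M"` resolution value in the usage example are all assumptions.

```python
def variable_dim(url):
    """Dimension that refers directly to a variable."""
    return {"variable": url}


def each_dim(url):
    """Dimension that traverses the subvariables of an array variable."""
    return {"each": url}


def function_dim(name, *args):
    """Dimension produced by applying a named function to its arguments."""
    return {"function": name, "args": list(args)}


def dimensions_for(var_type, url, rollup_resolution=None):
    """Return the list of dimension expressions for one variable.

    `var_type` strings are hypothetical labels chosen for this sketch.
    """
    if var_type in ("categorical", "text"):
        return [variable_dim(url)]
    if var_type == "numeric":
        return [function_dim("bin", variable_dim(url))]
    if var_type == "datetime":
        # The resolution would normally come from variable.rollup_resolution.
        return [function_dim("rollup", variable_dim(url), {"value": rollup_resolution})]
    if var_type == "categorical_array":
        # Two dimensions: subvariables, then categories.
        return [each_dim(url), variable_dim(url)]
    if var_type == "multiple_response":
        # Two dimensions: subvariables, then selected-ness.
        return [each_dim(url), function_dim("selections", variable_dim(url))]
    raise ValueError(f"no dimensioning rule for type {var_type!r}")


# Usage: assemble a query body from two variables. Treating "2098f1" as
# numeric is an assumption for illustration.
query = {
    "dimensions": (
        dimensions_for("categorical", "datasets/ab8832/variables/3ffd45/")
        + dimensions_for("numeric", "datasets/ab8832/variables/2098f1/")
    )
}
print(query)
```

Returning a list from `dimensions_for` keeps the array types, which contribute two dimensions each, interchangeable with the single-dimension types when concatenating.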