Lakehouse Filtering Specification
When using the Tabulation API, row-level filtering on the data is achieved using a JSON filter specification.
A filter is either:
- A logical operator applied to filter expressions.
- A condition operator applied to a dataset variable.
- An array operator applied to an array variable's axes.
Basic Filter Expressions Examples
{"state": {"==": 17}}
or
{"and": [..., ...]}A logical operator must use a list as the value.
A condition operator uses a value type matching the operator.
Operators Reference
Condition Operators
This summarizes the value types supported:
| Operator | Supported value types | Description |
== |
categorical, numeric, text, datetime |
Value is an exact match |
> |
numeric, datetime |
Value is greater than |
>= |
numeric, datetime |
Value is greater than or equal |
< |
numeric, datetime |
Value is less than |
<= |
numeric, datetime |
Value is less than or equal |
in |
categorical, numeric |
Value matches any in the specified list |
is_missing |
all variable types | Value is null or belongs to a missing category |
Array Operators
| Operator | Description |
axis_selected |
The value of the specified axis is a selected category |
axis_any |
The value of at least one of the specified axis is a selected category |
axis_all |
The value of every specified axis is a selected category |
Logical Operators
| Operator | Description |
and |
All conditions must be true |
or |
At least one must be true |
not |
Negates the enclosed conditions |
selected |
For array variables: checks selected values (excludes missing) |
not_selected |
For array variables: negates selected check |
Using Variable Values
When constructing a condition operator, the filter values must match the values in the dataset schema (see Crunch Logical Schema). A common mistake is to use the label.
We'll use the following schema and metadata from the example files to, to go through a few examples:
Schema
{
"state": {
"name": "state",
"type": "categorical",
"values": [1, 17]
},
"sex": {
"name": "sex",
"type": "categorical",
"values": [1, 2]
},
"age": {
"name": "age",
"type": "numeric"
},
"race_mr": {
"name": "race_mr",
"type": "array",
"value": {
"type": "categorical",
"values": [1, 0]
},
"axes": [
{
"name": "subvariables",
"type": "categorical",
"values": [
"race_mr_1",
"race_mr_2",
"race_mr_3",
"race_mr_4",
"race_mr_5",
"race_mr_6",
"race_mr_7",
"race_mr_8",
"race_mr_9"
]
}
]
}
}Metadata
{
"state": {
"label": "state",
"description": "State based on self-reported zipcode",
"values": [
{"value": 1, "label": "AL", "missing": null},
{"value": 17, "label": "IL", "missing": null}
]
},
"sex": {
"label": "sex",
"description": "SEX. [ENTER RESPONDENT'S SEX:]",
"values": [
{"value": 1, "label": "Male", "missing": null},
{"value": 2, "label": "Female", "missing": null}
]
},
"age": {
"label": "age",
"description": "Age in years"
},
"race_mr": {
"label": "Race - Multiple Response",
"description": "Which of the following describes your race? You can select as many as apply. White, Black or African American, Asian or Asian American or some other race.",
"notes": "",
"axes": {
"subvariables": {
"values": [
{
"value": "race_mr_1",
"label": "White (e.g., Caucasian, European, Irish, Italian, Arab, Middle Eastern)"
},
{
"value": "race_mr_2",
"label": "Black or African-American (e.g., Negro, Kenyan, Nigerian, Haitian)"
},
...
]
}
},
}Examples
Equality and Inclusion
// State is IL
{"state": {"==": 17}}
// State is AL or IL
{"state": {"in": [1, 17]}}
// State is not IL
{"not": [{"state": {"==": 17}}]}
// Females in State IL
{"and": [{"sex": {"==": 2}}, {"state": {"==": 17}}]}Comparison Operators (Datetime)
// Survey response started before March 1st, 2026
{"starttime": {"<": "2026-03-01"}}
// Survey response started on or after March 1st, 2026
{"starttime": {">=": "2026-03-01"}}
// Survey response started within a date range
{"and": [
{"starttime": {">=": "2026-01-01"}},
{"starttime": {"<": "2026-03-01"}}
]}Comparison Operators (Numeric)
// Age greater than 30
{"age": {">": 30}}
// Age between 18 and 65 (inclusive)
{"and": [{"age": {">=": 18}}, {"age": {"<=": 65}}]}
// Age less than 25
{"age": {"<": 25}}Missing Values
// State is missing category
{"state": {"is_missing": true}}
// State is valid (non-missing)
{"not": [{"state": {"is_missing": true}}]}Combining Conditions
// Females under 30 OR Males over 50
{"or": [
{"and": [{"sex": {"==": 2}}, {"age": {"<": 30}}]},
{"and": [{"sex": {"==": 1}}, {"age": {">": 50}}]}
]}Array Variables
For array variables (e.g., multiple response and categorical arrays), use array operators to filter by axis:
// Selected "Hispanic/Latino" in the "Which of the following describes your race?" multiple response question
{"race_mr": {"axis_selected": "race_mr_2"}}
// Selected ANY of "White" and "Hispanic/Latino"
{"race_mr": {"axis_any": ["race_mr_1", "race_mr_7"]}}
// Selected ALL (both) "White" and "Hispanic/Latino"
{"race_mr": {"axis_all": ["race_mr_1", "race_mr_7"]}}