See Crunch Automation basics for more information.
The CREATE FILTER command now requires that you check the "Strict subvariable syntax" checkbox at the top of the Crunch Automation panel.
With this box checked, whenever you refer to a single element (a "subvariable") of an array on its own, rather than explicitly in the context of its parent array, you must use the array_alias[subvariable_alias] syntax to indicate which array contains the subvariable.
The CREATE FILTER command allows you to create filters for a dataset. All filters created are public filters, viewable and usable by anyone who has access to the dataset.
The filter is defined by a boolean condition: cases for which the condition evaluates to TRUE are included in the filter.
Most importantly, filters are not variables, so you won’t be able to view them in the Variable Sidebar on the left in the Crunch web app. Filters are only found in the filter drop-down menus in the app. For that reason, there is no “AS” part of the command (because you’re not storing the filter as a variable).
The COMPLETE CASES modifier limits the filter to rows where every variable involved in the condition has valid data.
For more details on how to write conditions and expressions, see the Expressions section of the Crunch Automation basics guide.
When writing the condition, compare:
- Categorical variables: against the category ID, not the assigned numeric value.
- Numeric variables: against the numeric value.
Because a category's ID and its numeric value can differ, you may wish to use labels instead of codes when working with categorical variables. For example, write var_gender = "Female" rather than var_gender = 2, because it may be unclear whether “Female” has a category code of 1 or 2.
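As a minimal sketch of both rules together, the script below compares a categorical variable against a label and a numeric variable against a number. The aliases var_gender (categorical) and age (numeric) are assumptions made for illustration and do not refer to a real dataset.
CREATE FILTER
var_gender="Female" AND age>=18
NAME "Females 18+";
The general syntax of the command is as follows.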
CREATE FILTER
condition
[COMPLETE CASES]
NAME "string";
Use cases
Below are several examples of scripts that can be built using the CREATE FILTER command.
Time Period: Suppose we want to create a filter for a certain time period. This can be done by expressing the following inequality using dates.
CREATE FILTER
wave_date>="2024-07-01" AND wave_date<"2024-10-01"
NAME "2024 Q3";
Two Categorical Variables: In this example, we are creating a filter for males under 40 years old.
CREATE FILTER
gender="Male" AND (age_group="18-24 years" OR age_group="25-39 years")
NAME "Males Under 40";
Multiple Response Variable: If you'd like to filter based on a user selecting a particular option in a multiple response question, there are two ways you can declare that in the script.
Option 1:
CREATE FILTER
q5_mr[q5_1]=1 AND s1="Female"
NAME "Females - Shopped Store 1 P12M";
Option 2:
CREATE FILTER
ANY(q5_mr, [q5_1]) AND s1="Female"
NAME "Females - Shopped Store 1 P12M";
Categorical Array: Here is an example that filters for respondents who visit a specific brand's store very or somewhat frequently (options 4 and 5).
CREATE FILTER
q17_array[q7_3] IN [4, 5]
NAME "Visit Brand's Store Regularly";
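Complete Cases: As a sketch of the COMPLETE CASES modifier, the script below reuses the gender and age_group variables from the earlier example and places the modifier where the command syntax puts it, after the condition and before NAME. Going by the description of the modifier, a respondent who is male but has no valid age_group data would be left out of this filter, even though the first half of the OR condition is true for that respondent. The filter name is only an illustration.
CREATE FILTER
gender="Male" OR age_group="18-24 years"
COMPLETE CASES
NAME "Male or 18-24 (Complete Cases)";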