See Crunch Automation basics for more information.
The CREATE LOGICAL command allows you to create a logical variable from an expression. A logical expression evaluates to TRUE or FALSE for each row (case). The resulting logical variable takes one of three possible values: Selected, Other, or Missing.
- If an expression evaluates to TRUE, then the case is assigned 1 = “Selected”
- If an expression evaluates to FALSE, then the case is assigned 0 = “Other”
- If the truth of the expression for a case cannot be determined, then the case is Missing (NA). (Note: an expression involving “False OR Missing” conditions is Missing, because it cannot be determined from Missing data if it is True.)
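The three rules above can be sketched outside Crunch as a small three-valued OR. This is only an illustration in Python, not Crunch code: `None` stands in for Missing, and the helper names `kleene_or` and `to_logical_value` are hypothetical.

```python
from typing import Optional

def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR; None stands in for Missing (assumption for this sketch)."""
    if a is True or b is True:
        return True   # one True operand is enough, even if the other is Missing
    if a is None or b is None:
        return None   # False OR Missing: the truth cannot be determined
    return False      # both operands are known to be False

def to_logical_value(result: Optional[bool]) -> str:
    """Map an expression result to the label a logical variable would carry."""
    if result is None:
        return "Missing"
    return "Selected" if result else "Other"

print(to_logical_value(kleene_or(False, None)))   # Missing
print(to_logical_value(kleene_or(True, None)))    # Selected
print(to_logical_value(kleene_or(False, False)))  # Other
```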
This variable can later be used in filters or conditions. It is frequently used to create ‘nets’ across variables.
For more information on how to write expressions, see the Expressions section of the Crunch Automation basics article.
In the expressions:
- For categorical variables, an integer in an expression (for example, X = 1) refers to the category id (code), not to the category's assigned numeric value, which may differ.
- For numeric variables, an integer in an expression refers to the numeric value.
For that reason, you may wish to use the category names instead of codes when working with categorical variables. For example, var_gender = "Female" rather than var_gender = 2, because it may be unclear if “Female” has a category code of 1 or 2.
CREATE LOGICAL
expression
AS alias
[TITLE "string"]
[DESCRIPTION "string"]
[NOTES "string"];
Use cases
Logical variables can be used in many different scenarios. The following example, taken from the 2019 Core Trends Mobile & Broadband example, shows how to create a dichotomous variable by netting over two questions.
The following script creates a variable that captures the respondents who answered “Yes” to either Use Internet or Email or Mobile Internet Use:
CREATE LOGICAL
eminuse = "Yes" OR intmob = "Yes"
AS use_internet
TITLE "All Internet Users"
DESCRIPTION "NET variable";
Consider a similar case from the same dataset, where respondents may not have valid responses for the variables involved. In the following example, we create an “any smart phone or cell phone” variable summarizing all three:
CREATE LOGICAL
smart2 = "Yes, smartphone" OR ql1 = "Yes, have cell phone" OR ql1a = "Yes, someone in household has cell phone"
AS any_phone
TITLE "Any smart or cell phone";
The script uses the standard evaluation of OR, which is usually the correct thing to do. If you instead want to treat the variables as valid whether or not they were asked, and so assume that Missing represents a False response, use the alternative operator ORNM, whose result is Missing only if both of its arguments are Missing.
If you need to change how missing data are handled (e.g., ‘rebasing’ variables), consider using CREATE CATEGORICAL CASE or the ORNM operator, which evaluates “Missing OR False” as False rather than Missing.
Using a 'Truth Table' to understand missingness
Logic involving missing data can be counterintuitive at the aggregate level. Results and base sizes can appear to differ between analyses, often because of how the logic treats missing data.
Consider the following logic expression:
var_A = 1 OR var_B = 1
Depending on how respondents with missing data on just these two variables are treated, the final result could be 83%, 63%, or 75%. The three types of logical evaluation for OR are:
- OR — default
- ORNM — "Or Non-Missing"
- CC — "Complete Cases"
The outcome of these is illustrated in the following Truth Table:
Respondent | var_A | var_B | OR | ORNM | CC |
---|---|---|---|---|---|
1 | T | T | T | T | T |
2 | T | F | T | T | T |
3 | T | NA | T | T | NA |
4 | F | T | T | T | T |
5 | F | F | F | F | F |
6 | F | NA | NA | F | NA |
7 | NA | T | T | T | NA |
8 | NA | F | NA | F | NA |
9 | NA | NA | NA | NA | NA |
True / Base | 3 / 6 | 3 / 6 | 5 / 6 | 5 / 8 | 3 / 4 |
Percents | 50% | 50% | 83% | 63% | 75% |
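The table can be reproduced with a short simulation. The following is a sketch in Python rather than Crunch Automation: `None` stands in for NA, and the function names (`or_default`, `or_nm`, `or_cc`, `percent_true`) are hypothetical names chosen for this illustration. Percentages are rounded half up so that 62.5% is reported as 63%, matching the table.

```python
import math
from typing import Optional

Tri = Optional[bool]  # True, False, or None (None stands in for NA)

def or_default(a: Tri, b: Tri) -> Tri:
    """OR: True if either side is True; NA if the result cannot be determined."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def or_nm(a: Tri, b: Tri) -> Tri:
    """ORNM: NA only if both sides are NA; otherwise a single NA counts as False."""
    if a is None and b is None:
        return None
    return bool(a) or bool(b)

def or_cc(a: Tri, b: Tri) -> Tri:
    """CC: NA if either side is NA; evaluated only for complete cases."""
    if a is None or b is None:
        return None
    return a or b

def percent_true(values):
    """Return (true count, valid base, percent rounded half up), ignoring NA."""
    valid = [v for v in values if v is not None]
    pct = math.floor(100 * sum(valid) / len(valid) + 0.5)
    return sum(valid), len(valid), pct

# The nine respondents in the table: every combination of T, F, and NA.
rows = [(a, b) for a in (True, False, None) for b in (True, False, None)]

for name, op in [("OR", or_default), ("ORNM", or_nm), ("CC", or_cc)]:
    true_count, base, pct = percent_true([op(a, b) for a, b in rows])
    print(f"{name}: {true_count} / {base} = {pct}%")
```

The only thing that differs between the three functions is how a single NA operand is handled, which is what changes both the count of True cases and the base.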