See Crunch Automation basics for more information.
The CREATE CATEGORICAL CASE command allows you to create your own categorical variable. There are many use cases for this (e.g., allocating respondents to segments, fixing survey skips (DP jargon sometimes refers to this as "rebasing"), and combining variables into one). CATEGORICAL CASE variables can be thought of as cure-all for many DP tasks.
Each category is defined on a case-by-case basis by a logical condition (see the expressions section for more information on conditions/expressions). When a case (i.e., row of the data, typically a respondent) evaluates to TRUE in the expression (boolean logic), then it is allocated to that category. The web application allows you to do the same thing using the Create Categorical variable builder.
Each category is defined sequentially and the result is the first match. For each condition, in turn, only the cases that haven’t yet been allocated are eligible. It’s equivalent to the "IF, ELSE, ELSE IF" commands you see in other programming languages (such as R and JavaScript). Any unallocated cases that have not met any conditions are then set as missing. You can also set cases to be missing at any time by using NULL.
THEN VARIABLE (Optional Argument)
You can also optionally set each category as a constant or as a fill from another variable. Both examples are illustrated in the following script. If you are filling from another variable, then this is when you use the THEN VARIABLE argument.
Creating a variable with constant categories:
CREATE CATEGORICAL
CASE
WHEN condition THEN "label" [CODE value [MISSING]]
...
WHEN condition THEN "label" [CODE value [MISSING]]
[ELSE "string" [CODE value [MISSING]|ELSE INTO NULL]
END
AS alias
[TITLE "string"]
[DESCRIPTION "string"]
[NOTES "string"];
Creating a variable with a variable fill:
CREATE CATEGORICAL
CASE
WHEN condition THEN VARIABLE alias
...
WHEN condition THEN VARIABLE alias
[ELSE "string" [CODE value [MISSING]|ELSE INTO NULL]
END
AS alias
[TITLE "string"]
[DESCRIPTION "string"]
[NOTES "string"];
Using REPLACE
You can also use REPLACE within the CATEGORICAL CASE command. The difference is that REPLACE does not include the AS... assignment at the end, which is used to indicated a new variable. Instead, the new variables are indicated at the beginning (before WHEN).
Everything else remains the same, including how it computes. The results are not stored on a new variable but instead overwrites the mentioned columns:
REPLACE CATEGORICAL
CASE alias,...,alias
WHEN condition THEN "label" [CODE value [MISSING]]
...
WHEN condition THEN "label" [CODE value [MISSING]]
[ELSE "string" [CODE value [MISSING]|ELSE INTO NULL]
END;
Use cases
There are two use cases in the following: the first use case uses constant output categories and the second one uses a variable fill.
- Allocating respondents to segments (example: combining two questions)
- Setting certain cases with missing or non-missing (example: fixing a survey skip—DP jargon sometimes refers to this as "rebasing a variable")
The data for both cases can be found in the example files (Core Trends Survey 2019) article.
Allocating respondents to segments
Suppose we have the following two variables, and the second variable is a confirmation of the speed status (a double-check in the survey). For analysis purposes, we want to combine this into one (and perhaps clean up the labels as well), which leads us to the third variable on the right:
This can be achieved with the following Crunch Automation script:
CREATE CATEGORICAL CASE
WHEN bbhome2 = "No, higher speed connection" OR bbhome1 = "Higher-speed" THEN "High speed"
WHEN bbhome2 = "Yes, dial-up" OR bbhome1 = "Dial-up" THEN "Dial up"
WHEN bbhome1 = "(VOL) Both Slow-speed/Dial-up and Higher-speed/Broadband" THEN "Both"
WHEN bbhome1 = "(VOL) Access internet only using cell phone or tablet" THEN "Cell/Phone tablet"
ELSE INTO NULL
END
AS bbhome_combined
TITLE "Confirmed Speed of subscription"
DESCRIPTION "Combing questions about home internet speed"
NOTES "Base: Only those with home internet access";
Setting certain cases with missing or non-missing
Consider the following variable, which has n=130 set to system missing (“No Data”). We want to create a categorical case where the system missing values are reallocated to a "No Internet" category. In that sense, the variable is "based on the total" (respondents in the data; i.e., all rows):