This article is part of The Definitive Guide to Uploading and Preparing Data.
There's often a need with surveys to recode values at the case level to valid or missing. This can involve individual responses to another question in the survey (ie: a case-level condition).
- For example, everyone in the survey was asked which US State they live in, but the study covered China, the UK and the USA.
- For example, all respondents were asked which of the following brands they would consider purchasing, but really they should only be exposed to the brands if they were aware of them (ie: like a brand funnel).
The best tool for this task is Crunch Automation, which is designed to deal with recoding at the case-level.
Recode task + command(s) to use
For each of the following recoding tasks, what is the current command to use, and where might we need more/better commands for greater efficiency. An example(s) is given for each use case. In parentheses is the common DP term for the tasks.
Variable Type |
Recoding use case |
Command to use |
Categorical |
Categorical variable - recode system missing to total |
CREATE CATEGORICAL CASE… THEN VARIABLE |
Categorical |
Categorical variable - recode values to missing based on another question |
CREATE CATEGORICAL CASE… THEN VARIABLE |
Categorical |
Categorical variable - setting categories within variable as valid or missing |
SET/UNSET MISSING (for proportions) SET/UNSET VALUES (for scale means) |
Multiple response |
Multiple response - recode subvariables to missing if not selected anything in the multiple response |
CREATE MULTIPLE DICHOTOMY WITH RECODE… EXCLUDE EMPTY |
Multiple response |
Multiple response - recode all subvariables to answers on another question on a per subvariable level |
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS |
Multiple response |
Multiple response - recode all subvariables to the same condition |
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
|
Categorical array |
CREATE CATEGORICAL CASE... THEN VARIABLE
|
|
Categorical array |
Categorical array - recode all subvariables to the same condition (‘rebase’) - this could be “to the total” or to a particular survey skip that applies to the entire array |
CREATE CATEGORICAL CASE... THEN VARIABLE
|
Categorical array |
Categorical array - setting categories within the array as valid or missing |
SET/UNSET MISSING (for proportions) SET/UNSET VALUES (for scale means) |
Numeric |
Numeric - recode values to missing based on another question |
CREATE NUMERIC CASE... THEN VARIABLE |
Numeric |
Numeric - recode missing values to zero |
CREATE NUMERIC CASE... THEN VARIABLE |
Numeric array |
Numeric array - recode subvariables based on any condition |
CREATE NUMERIC CASE... THEN VARIABLE |
Use Cases
Categorical variable - recode system missing to total
In the following example, question 7 is a categorical variable (q7) representing a single selection of your type of employment (3 categories: full-time, part-time, other). The survey had a skip such that it was only asked of those who said they had employment at question 6 (ie: q6 = "Yes, I am employed in some form"). For analysis purposes, we want to recode the missing values in q7 to become valid, so that the analyst can determine the proportion amongst the total sample who have full-time employment.
CREATE CATEGORICAL CASE
WHEN is_missing(q7) THEN "Not employed" CODE 99
ELSE VARIABLE q7
END
AS q7_rc
TITLE "Employment status"
NOTES "Base: amongst all survey respondents";
Categorical variable - recode values to missing based on another question
In the following example, question 9 is a single-select rating of how much you like cola drinks. It was asked of everyone in the study, whereas it should only have been asked of those who actually drink cola (which was at question 1 - "Do you drink cola?"). In other words, a survey skip was not included in the programming, and we want to recode certain cases to missing for question 9.
CREATE CATEGORICAL CASE
WHEN q1 = 1 THEN VARIABLE q9
ELSE INTO NULL
END
AS q9_rc
TITLE "Affect towards colas"
NOTES "Base: amongst those who drink cola";
Multiple response - recode subvariables to missing if not selected anything in the multiple response
In the following example, question 2 asked about consideration - which of the following brands of cola would you consider purchasing? A new variable is being created whereby all new subvariables are being recoded to missing if a case does not provides a SELECTED (1 ="Yes") response on any of the source variables. This is commonly referred to as "rebasing to those answered". The EXCLUDE EMPTY argument is the key here.
CREATE MULTIPLE DICHOTOMY WITH RECODE
q2_1 (SELECTED 1 LABEL "Coke"),
q2_2 (SELECTED 1 LABEL "Diet Coke"),
q2_3 (SELECTED 1 LABEL "Coke Zero"),
q2_4 (SELECTED 1 LABEL "Pepsi"),
q2_5 (SELECTED 1 LABEL "Diet Pepsi"),
q2_6 (SELECTED 1 LABEL "Pepsi Max")
EXCLUDE EMPTY
AS q2_rc
TITLE "Consideration"
NOTES "Base: Amongst those who answered";
Multiple response: recode all subvariables to answers on another question on a per subvariable level
In the following example, question 2 asked about consideration - which of the following brands of cola would you consider purchasing? A new variable is being created whereby each of the subvariables are being recoded to have missingness where a user was not aware of the brand (awareness is question 1 in the study). This is done by using the VALID command for each subvariable. Alternatively, the CREATE MULTIPLE DICHOTOMY FROM CONDITIONS command also allows you to specify the MISSING condition (ie: the reverse logic).
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
MAPPING
SELECT q2_1 = 1 VALID q1_1 = 1 LABEL "Coke",
SELECT q2_2 = 1 VALID q1_2 = 1 LABEL "Diet Coke",
SELECT q2_3 = 1 VALID q1_3 = 1 LABEL "Coke Zero",
SELECT q2_4 = 1 VALID q1_4 = 1 LABEL "Pepsi",
SELECT q2_5 = 1 VALID q1_5 = 1 LABEL "Diet Pepsi",
SELECT q2_6 = 1 VALID q1_6 = 1 LABEL "Pepsi Max"
END
AS q2_rc
TITLE "Consideration"
NOTES "Base: All those aware of each respective brands";
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
MAPPING
SELECT q2_1 = 1 MISSING q1_1 != 1 LABEL "Coke",
SELECT q2_2 = 1 MISSING q1_2 != 1 LABEL "Diet Coke",
SELECT q2_3 = 1 MISSING q1_3 != 1 LABEL "Coke Zero",
SELECT q2_4 = 1 MISSING q1_4 != 1 LABEL "Pepsi",
SELECT q2_5 = 1 MISSING q1_5 != 1 LABEL "Diet Pepsi",
SELECT q2_6 = 1 MISSING q1_6 != 1 LABEL "Pepsi Max"
END
AS q2_rc
TITLE "Consideration"
NOTES "Base: All those aware of each respective brands";
Multiple response - recode all subvariables to the same condition
In the following example, question 2 asked about consideration - which of the following brands of cola would you consider purchasing? A new variable is being created whereby all new subvariables are being recoded to have no missingness (often refered to as "rebasing to the total"). The same specification of valid is happening in one go using the VALID argument after END. In this example, the function is_valid() is being used on a variable that has no missing data (gender) - this means any respondent who does not receive a "Selected" outcome will receive a valid (non-missing) "Other" response. Alternatively, the CREATE MULTIPLE DICHOTOMY FROM CONDITIONS command also allows you to specify the MISSING condition (ie: the reverse logic).
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
MAPPING
SELECT q2_1 = 1 LABEL "Coke",
SELECT q2_2 = 1 LABEL "Diet Coke",
SELECT q2_3 = 1 LABEL "Coke Zero",
SELECT q2_4 = 1 LABEL "Pepsi",
SELECT q2_5 = 1 LABEL "Diet Pepsi",
SELECT q2_6 = 1 LABEL "Pepsi Max"
END
VALID is_valid(gender)
AS q2_rc
TITLE "Consideration"
NOTES "Base: All respondents";
Categorical array - recode all subvariables to answers on another question on a per subvariable level
In the example below, question 1 was about awareness (of 6 brands). Question 3 was a rating scale of how much they like each brand. A new array variable q3_rc is being created such that each subvariable is only non-missing if they are aware of the respective brand.
CREATE CATEGORICAL CASE
WHEN q1_1 = 1 THEN VARIABLE q4_1
ELSE INTO NULL
AS q4_rc_1
DESCRIPTION "Coke";
CREATE CATEGORICAL CASE
WHEN q1_2 = 1 THEN VARIABLE q4_2
ELSE INTO NULL
AS q4_rc_2
DESCRIPTION "Pepsi";
CREATE CATEGORICAL CASE
WHEN q1_3 = 1 THEN VARIABLE q4_3
ELSE INTO NULL
AS q4_rc_3
DESCRIPTION "Fanta";
CREATE CATEGORICAL ARRAY
q4_rc_1, q4_rc_2, q4_rc_3
LABELS COPY(DESCRIPTION)
AS q4_array_rc
TITLE "Rating of brand"
NOTES "Base: Aware of each respective brand";
Categorical array - recode all subvariables to the same condition
See above - the condition (the part after WHEN) will be the same for all variables.
There is no one-step command available at the time this article was written.