This article is part of The Definitive Guide to Importing and Preparing Data.
There is occasionally a need to recode case-level values to be valid or missing. This can involve recoding individual responses to another question in the survey (i.e., a case-level condition) or to the total sample.
Below are several examples:
- All respondents were asked which US State they live in, though the study covered China, the UK, and the USA. The US State question should be recoded for those who live in the USA.
- All respondents were asked which brands they would consider purchasing, but they should have only been exposed to the brands of which they were aware (e.g., a brand funnel question).
Crunch Automation is the best tool for this task as it is designed to deal with recoding at the case level.
Note: These commands all create new variables rather than replacing existing variables.
Recode Task + Command(s) to Use
There are different Crunch Automation commands to use depending on the recoding task and the type of variable. The table below lists several common recoding tasks and their corresponding commands. Common data processing terminology for the task is in parentheses. Use cases and sample scripts can be found below the table or by clicking the links in the Recoding Task column.
Variable Type |
Recoding Task |
Crunch Automation Command |
Categorical |
Recode System Missing to Total (Categorical) |
CREATE CATEGORICAL CASE |
Categorical |
Recode Values to Missing Based on Another Question (Categorical) |
CREATE CATEGORICAL CASE |
Categorical |
Setting Categories of a Variable as Valid/Missing (Categorical) |
|
Multiple Response |
Recode Subvariables to Missing if Nothing is Selected in the Question (Multiple Response) |
CREATE MULTIPLE DICHOTOMY WITH RECODE […] EXCLUDE EMPTY |
Multiple Response |
Recode All Subvariables Based on Another Question (Per Subvariable Level) (Multiple Response) |
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS |
Multiple Response |
Recode All Subvariables to the Same Condition (Multiple Response) |
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS |
Categorical Array |
Recode All Subvariables Based on Another Question (Per Subvariable Level) (Categorical Array) |
This task requires multiple steps:
|
Categorical Array |
Recode All Subvariables to the Same Condition (Categorical Array) |
This task requires multiple steps:
Note: The expression for the valid criteria will be the same for each subvariable. |
Categorical Array |
Setting Categories of a Variable as Valid/Missing (Categorical Array) |
|
Numeric |
Recode Values to Missing Based on Another Question (Numeric) |
CREATE NUMERIC CASE |
Numeric |
CREATE NUMERIC CASE |
|
Numeric Array |
Recode All Subvariables Based on Another Question (Per Subvariable Level) (Numeric Array) |
This task requires multiple steps:
|
Numeric Array |
Recode All Subvariables to the Same Condition (Numeric Array) |
This task requires multiple steps:
Note: The expression for the valid criteria will be the same for each subvariable. |
Use Cases
Recode System Missing to Total (Categorical Variable)
In the following example, question 7 is a categorical variable (alias q7) representing a single selection of a respondent's type of employment (3 categories: full-time, part-time, other). The survey had a skip such that Q7 was only asked of those who said they were employed at question 6 (e.g., q6 = "Yes, I am employed in some form").
For analysis purposes, you may want to recode the missing values in q7 to become valid, so that you can determine the proportion of the total sample who are employed full-time, rather than the proportion of just those who are employed at all.
CREATE CATEGORICAL
CASE
WHEN is_missing(q7) THEN "Not employed" CODE 99
ELSE VARIABLE q7
END
AS q7_rc
TITLE "Employment Status"
DESCRIPTION "What is your employment status?"
NOTES "Base: All survey respondents";
Further documentation: CREATE CATEGORICAL CASE command
Recode Values to Missing Based on Another Question (Categorical)
In the following example, question 9 (alias q9) is a single-select rating of how much the respondent likes cola drinks. It was asked of everyone in the study, though for analysis purposes, it should be based on those who answered "Yes" (option 1) in question 1 - "Do you drink cola?". Since a survey skip was not included in the programming, you may want to recode certain cases as missing for question 9.
CREATE CATEGORICAL
CASE
WHEN q1 = 1 THEN VARIABLE q9
ELSE INTO NULL
END
AS q9_rc
TITLE "Affect Toward Cola"
DESCRIPTION "How much do you like cola drinks?"
NOTES "Base: Those who drink cola";
Further documentation: CREATE CATEGORICAL CASE command
Recode Subvariables to Missing if Nothing is Selected in the Question (Multiple Response)
In the following example, question 2 asked about cola brand consideration ("Which of the following brands of cola would you consider purchasing?"). You may want to create a new variable with all of the subvariables (aliases q2_1 to q2_6) recoded as missing, if a respondent/case does not provide a SELECTED (1 ="Yes") response in any of the source variables. This is commonly referred to as "rebasing to those answered".
Note: The EXCLUDE EMPTY argument is the key here.
CREATE MULTIPLE DICHOTOMY WITH RECODE
q2_1 (LABEL "Coca-Cola"),
q2_2 (LABEL "Diet Coke"),
q2_3 (LABEL "Coke Zero"),
q2_4 (LABEL "Pepsi"),
q2_5 (LABEL "Diet Pepsi"),
q2_6 (LABEL "Pepsi Max")
WITH
SELECTED 1
NOT SELECTED 0
EXCLUDE EMPTY
AS q2_rc
TITLE "Consideration"
DESCRIPTION "Which of the following brands of cola would you consider purchasing?"
NOTES "Base: Those who answered";
Further documentation: CREATE MULTIPLE DICHOTOMY WITH RECODE command
Recode All Subvariables Based on Another Question (Per Subvariable Level) (Multiple Response)
In the following example, question 2 asked about cola brand consideration ("Which of the following brands of cola would you consider purchasing?"). You may want to create a new variable where each of the subvariables is recoded for missingness to account for when a respondent was not aware of the brand (question 1 in the study).
This is achieved by defining a VALID expression for each brand/subvariable with the criteria for that subvariable's base.
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
MAPPING
SELECT q2_1 = 1 VALID q1_1 = 1 LABEL "Coca-Cola",
SELECT q2_2 = 1 VALID q1_2 = 1 LABEL "Diet Coke",
SELECT q2_3 = 1 VALID q1_3 = 1 LABEL "Coke Zero",
SELECT q2_4 = 1 VALID q1_4 = 1 LABEL "Pepsi",
SELECT q2_5 = 1 VALID q1_5 = 1 LABEL "Diet Pepsi",
SELECT q2_6 = 1 VALID q1_6 = 1 LABEL "Pepsi Max"
END
AS q2_rc
TITLE "Consideration"
DESCRIPTION "Which of the following brands of cola would you consider purchasing?"
NOTES "Base: Those aware of the respective brand";
Alternatively, this Crunch Automation command allows you to specify the MISSING condition if that is preferred (i.e., the reverse logic of the above example).
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
MAPPING
SELECT q2_1 = 1 MISSING q1_1 != 1 LABEL "Coca-Cola",
SELECT q2_2 = 1 MISSING q1_2 != 1 LABEL "Diet Coke",
SELECT q2_3 = 1 MISSING q1_3 != 1 LABEL "Coke Zero",
SELECT q2_4 = 1 MISSING q1_4 != 1 LABEL "Pepsi",
SELECT q2_5 = 1 MISSING q1_5 != 1 LABEL "Diet Pepsi",
SELECT q2_6 = 1 MISSING q1_6 != 1 LABEL "Pepsi Max"
END
AS q2_rc
TITLE "Consideration"
DESCRIPTION "Which of the following brands of cola would you consider purchasing?"
NOTES "Base: Those aware of the respective brand";
Further documentation: CREATE MULTIPLE DICHOTOMY FROM CONDITIONS command
Recode All Subvariables to the Same Condition (Multiple Response)
In the following example, question 2 asked about cola brand consideration ("Which of the following brands of cola would you consider purchasing?"). You may want to create a new variable where all of the subvariables are being recoded with no missingness, which is often referred to as "rebasing to the total."
Since you are setting the same valid criteria for all of the subvariables, it is possible to define it on one line after the END command. In this example, the function is_valid() is being used with a variable that has no missing data (e.g., gender, age). This means any respondent who does not receive a "Selected" outcome will receive a valid (non-missing) "Other" response.
CREATE MULTIPLE DICHOTOMY FROM CONDITIONS
MAPPING
SELECT q2_1 = 1 LABEL "Coca-Cola",
SELECT q2_2 = 1 LABEL "Diet Coke",
SELECT q2_3 = 1 LABEL "Coke Zero",
SELECT q2_4 = 1 LABEL "Pepsi",
SELECT q2_5 = 1 LABEL "Diet Pepsi",
SELECT q2_6 = 1 LABEL "Pepsi Max"
END
VALID is_valid(gender)
AS q2_rc
TITLE "Consideration"
DESCRIPTION "Which of the following brands of cola would you consider purchasing?"
NOTES "Base: All respondents";
Alternatively, this Crunch Automation command allows you to specify the MISSING condition in the same way (i.e., the reverse logic of the above example) which can utilize a similar is_missing() function.
Further documentation: CREATE MULTIPLE DICHOTOMY FROM CONDITIONS command
Recode All Subvariables Based on Another Question (Per Subvariable Level) (Categorical Array)
In the following example, question 4 was a rating scale of how much the respondent liked each brand of cola (e.g., Like very much, Somewhat like, Do not like at all). You may want to create a new array variable where each of the subvariables is recoded for missingness to account for when a respondent was not aware of the brand (question 1 in the study).
Notes:
- This task requires multiple steps. You will need to create new individual categorical variables with the correct bases, and then combine them into a new array variable.
- You can include HIDE INPUTS in the array definition to automatically hide the individual variables from the variable sidebar if they were created solely for this purpose.
CREATE CATEGORICAL
CASE
WHEN q1_1 = 1 THEN VARIABLE q4_1
ELSE INTO NULL
END
AS q4_1_rc
DESCRIPTION "Coca-Cola";
CREATE CATEGORICAL
CASE
WHEN q1_2 = 1 THEN VARIABLE q4_2
ELSE INTO NULL
END
AS q4_2_rc
DESCRIPTION "Diet Coke";
CREATE CATEGORICAL
CASE
WHEN q1_3 = 1 THEN VARIABLE q4_3
ELSE INTO NULL
END
AS q4_3_rc
DESCRIPTION "Coke Zero";
CREATE CATEGORICAL ARRAY
q4_1_rc, q4_2_rc, q4_3_rc
LABELS COPY(DESCRIPTION)
HIDE INPUTS
AS q4_array_rc
TITLE "Cola Brand Ratings"
DESCRIPTION "How much do you like each brand of cola?"
NOTES "Base: Those aware of the respective brand";
Further documentation: CREATE CATEGORICAL CASE command and CREATE CATEGORICAL ARRAY command
Recode All Subvariables to the Same Condition (Categorical Array)
In the following example, question 4 was a rating scale of how much the respondent liked each brand of cola (e.g., Like very much, Somewhat like, Do not like at all). You may want to create a new array variable where each of the subvariables is recoded to only those respondents who drink soda daily (question 2 in the study).
Notes:
- This task requires multiple steps. You will need to create new individual categorical variables with the correct bases, and then combine them into a new array variable.
- In this example, the condition after WHEN will be the same for each subvariable.
- You can include HIDE INPUTS in the array definition to automatically hide the individual variables from the variable sidebar if they were created solely for this purpose.
CREATE CATEGORICAL
CASE
WHEN q2 = 1 THEN VARIABLE q4_1
ELSE INTO NULL
END
AS q4_1_rc
DESCRIPTION "Coca-Cola";
CREATE CATEGORICAL
CASE
WHEN q2 = 1 THEN VARIABLE q4_2
ELSE INTO NULL
END
AS q4_2_rc
DESCRIPTION "Diet Coke";
CREATE CATEGORICAL
CASE
WHEN q2 = 1 THEN VARIABLE q4_3
ELSE INTO NULL
END
AS q4_3_rc
DESCRIPTION "Coke Zero";
CREATE CATEGORICAL ARRAY
q4_1_rc, q4_2_rc, q4_3_rc
LABELS COPY(DESCRIPTION)
HIDE INPUTS
AS q4_array_rc
TITLE "Cola Brand Ratings"
DESCRIPTION "How much do you like each brand of cola?"
NOTES "Base: Those who drink cola daily";
Further documentation: CREATE CATEGORICAL CASE command and CREATE CATEGORICAL ARRAY command
Recode Values to Missing Based on Another Question (Numeric)
In the following example, question 5 (alias q5) was a numeric open-ended question asking the respondent to enter the number of colas they drink in a typical week ("How many colas do you drink in a typical week?"). It was asked of everyone in the study, though for analysis purposes, it should be based on those who answered "Yes" (option 1) in question 1 - "Do you drink cola?". Since a survey skip was not included in the programming, you may want to recode certain cases as missing for question 5.
CREATE NUMERIC
CASE
WHEN q1 = 1 THEN VARIABLE q5
ELSE INTO NULL
END
AS q5_rc
TITLE "Number of Colas Per Week"
DESCRIPTION "How many colas do you drink in a typical week?"
NOTES "Base: Those who drink cola";
Further documentation: CREATE NUMERIC CASE command
Recode Missing Values to Zero (Numeric)
In the following example, question 5 (alias q5) was a numeric open-ended question asking the respondent to enter the number of colas they drink in a typical week ("How many colas do you drink in a typical week?"). The survey had a skip such that it was only asked of those who answered "Yes" (option 1) in question 1 - "Do you drink cola?".
For analysis purposes, you may want to recode the missing values in q5 to become valid as zeros, so that you can determine the average number of colas of the total sample, rather than the proportion of those who said that they drink cola.
CREATE NUMERIC
CASE
WHEN is_missing(q5) THEN 0
ELSE VARIABLE q5
END
AS q5_rc
TITLE "Number of Colas Per Week"
DESCRIPTION "How many colas do you drink in a typical week?"
NOTES "Base: All survey respondents";
Further documentation: CREATE NUMERIC CASE command
Recode All Subvariables Based on Another Question (Per Subvariable Level) (Numeric Array)
In the following example, question 6 was a series of numeric open-ended questions asking the respondent to enter the number of each brand of cola they drink in a typical week. You may want to create a new array variable where each of the subvariables is recoded for missingness to account for when a respondent was not aware of the brand (question 1 in the study).
Notes:
- This task requires multiple steps. You will need to create new individual numeric variables with the correct bases, and then combine them into a new array variable.
- You can include HIDE INPUTS in the array definition to automatically hide the individual variables from the variable sidebar if they were created solely for this purpose.
CREATE NUMERIC
CASE
WHEN q1_1 = 1 THEN VARIABLE q6_1
ELSE INTO NULL
END
AS q6_1_rc
DESCRIPTION "Coca-Cola";
CREATE NUMERIC
CASE
WHEN q1_2 = 1 THEN VARIABLE q6_2
ELSE INTO NULL
END
AS q6_2_rc
DESCRIPTION "Diet Coke";
CREATE NUMERIC
CASE
WHEN q1_3 = 1 THEN VARIABLE q6_3
ELSE INTO NULL
END
AS q6_3_rc
DESCRIPTION "Coke Zero";
CREATE NUMERIC ARRAY
q6_1_rc, q6_2_rc, q6_3_rc
LABELS COPY(DESCRIPTION)
HIDE INPUTS
AS q6_array_rc
TITLE "Number of Colas by Brand Per Week"
DESCRIPTION "How many of each brand of cola do you drink in a typical week?"
NOTES "Base: Those aware of the respective brand";
Further documentation: CREATE NUMERIC CASE command and CREATE NUMERIC ARRAY command
Recode All Subvariables to the Same Condition (Numeric Array)
In the following example, question 6 was a series of numeric open-ended questions asking the respondent to enter the number of each brand of cola they drink in a typical week. You may want to create a new array variable where each of the subvariables is recoded to only those respondents who drink soda daily (question 2 in the study).
Notes:
- This task requires multiple steps. You will need to create new individual numeric variables with the correct bases, and then combine them into a new array variable.
- In this example, the condition after WHEN will be the same for each subvariable.
- You can include HIDE INPUTS in the array definition to automatically hide the individual variables from the variable sidebar if they were created solely for this purpose.
CREATE NUMERIC
CASE
WHEN q2 = 1 THEN VARIABLE q6_1
ELSE INTO NULL
END
AS q6_1_rc
DESCRIPTION "Coca-Cola";
CREATE NUMERIC
CASE
WHEN q2 = 1 THEN VARIABLE q6_2
ELSE INTO NULL
END
AS q6_2_rc
DESCRIPTION "Diet Coke";
CREATE NUMERIC
CASE
WHEN q2 = 1 THEN VARIABLE q6_3
ELSE INTO NULL
END
AS q6_3_rc
DESCRIPTION "Coke Zero";
CREATE NUMERIC ARRAY
q6_1_rc, q6_2_rc, q6_3_rc
LABELS COPY(DESCRIPTION)
HIDE INPUTS
AS q6_array_rc
TITLE "Number of Colas by Brand Per Week"
DESCRIPTION "How many of each brand of cola do you drink in a typical week?"
NOTES "Base: Those who drink cola daily";
Further documentation: CREATE NUMERIC CASE command and CREATE NUMERIC ARRAY command