This article is part of The Definitive Guide to Uploading and Preparing Data.
Surveys often require recoding values to valid or missing at the case level. The recode can depend on a respondent's answer to another question in the survey (i.e., a case-level condition).
- For example, everyone in the survey was asked which US State they live in, but the study covered China, the UK and the USA, so the question only applies to respondents in the USA.
- For example, all respondents were asked which of the following brands they would consider purchasing, but they should only have been shown the brands they were aware of (i.e., as in a brand funnel).
The best tool for this task is Crunch Automation, which is designed to deal with recoding at the case-level.
Recode task + command(s) to use
The table below lists, for each recoding task, the current command to use; it also helps show where more or better commands might be needed for greater efficiency. An example is given for each use case. The common DP term for a task, where there is one, is shown in parentheses.
| Variable type | Recoding use case | Command to use |
| --- | --- | --- |
| Categorical | Categorical variable - recode system missing to total | CREATE CATEGORICAL CASE… THEN VARIABLE |
| Categorical | Categorical variable - recode values to missing based on another question | CREATE CATEGORICAL CASE… THEN VARIABLE |
| Categorical | Categorical variable - setting categories within variable as valid or missing | SET/UNSET MISSING (for proportions); SET/UNSET VALUES (for scale means) |
| Multiple response | Multiple response - recode subvariables to missing if not selected anything in the multiple response | CREATE MULTIPLE DICHOTOMY WITH RECODE… EXCLUDE EMPTY |
| Multiple response | Multiple response - recode all subvariables to answers on another question on a per subvariable level | CREATE MULTIPLE DICHOTOMY FROM CONDITIONS |
| Multiple response | Multiple response - recode all subvariables to the same condition | CREATE MULTIPLE DICHOTOMY FROM CONDITIONS |
| Categorical array | Categorical array - recode all subvariables to answers on another question on a per subvariable level | CREATE CATEGORICAL CASE... THEN VARIABLE |
| Categorical array | Categorical array - recode all subvariables to the same condition (‘rebase’) - this could be “to the total” or to a particular survey skip that applies to the entire array | CREATE CATEGORICAL ARRAY CASE FROM |
| Categorical array | Categorical array - setting categories within the array as valid or missing | SET/UNSET MISSING (for proportions); SET/UNSET VALUES (for scale means) |
| Numeric | Numeric - recode values to missing based on another question | CREATE NUMERIC CASE... THEN VARIABLE |
| Numeric | Numeric - recode missing values to zero | CREATE NUMERIC CASE... THEN VARIABLE |
| Numeric array | Numeric array - recode subvariables based on any condition | CREATE NUMERIC CASE... THEN VARIABLE |
Use Cases
Categorical variable - recode system missing to total
In the following example, question 7 (q7) is a categorical variable recording a single selection of the respondent's type of employment (3 categories: full-time, part-time, other). A survey skip meant it was asked only of those who said at question 6 that they were employed (i.e., q6 = "Yes, I am employed in some form"). For analysis purposes, we want to recode the missing values in q7 to a valid category, so that the analyst can determine the proportion of the total sample in full-time employment.
CREATE CATEGORICAL CASE
WHEN is_missing(q7) THEN "Not employed" CODE 99
ELSE VARIABLE q7
END
AS q7_rc
TITLE "Employment status"
NOTES "Base: amongst all survey respondents";
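To see what this recode does to the base, here is a minimal Python sketch. It is not Crunch Automation and does not call any Crunch API; the sample responses and the helper names `recode_missing_to_category` and `proportion` are made up for illustration. It turns missing values into a valid category and compares the full-time proportion before and after.

```python
from collections import Counter

def recode_missing_to_category(values, label):
    """Return a copy of values where None (missing) becomes a valid category."""
    return [label if v is None else v for v in values]

def proportion(values, category):
    """Share of `category` among the non-missing values."""
    valid = [v for v in values if v is not None]
    return Counter(valid)[category] / len(valid)

# Hypothetical q7 responses; None marks respondents who skipped past q7.
q7 = ["Full-time", "Part-time", None, "Full-time", None, "Other", "Full-time", None]
q7_rc = recode_missing_to_category(q7, "Not employed")

before = proportion(q7, "Full-time")    # base: the 5 respondents asked q7
after = proportion(q7_rc, "Full-time")  # base: all 8 respondents
print(before, after)
```

The number of full-time respondents is unchanged; only the base grows, so the proportion falls.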
Categorical variable - recode values to missing based on another question
In the following example, question 9 is a single-select rating of how much you like cola drinks. It was asked of everyone in the study, whereas it should only have been asked of those who actually drink cola (which was at question 1 - "Do you drink cola?"). In other words, a survey skip was not included in the programming, and we want to recode certain cases to missing for question 9.
CREATE CATEGORICAL CASE
WHEN q1 = 1 THEN VARIABLE q9
ELSE INTO NULL
END
AS q9_rc
TITLE "Affect towards colas"
NOTES "Base: amongst those who drink cola";
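The same logic can be sketched in plain Python, purely as an illustration of what the recode does (this is not Crunch syntax, and the data and the helper name `recode_to_missing` are hypothetical). A value is kept only where the condition on the other question holds; everywhere else it becomes missing.

```python
def recode_to_missing(values, condition):
    """Keep a value where its condition is True; otherwise set it to None (missing)."""
    return [v if keep else None for v, keep in zip(values, condition)]

# Hypothetical data: q1 (1 = drinks cola, 2 = does not) and q9 ratings (1-5).
q1 = [1, 2, 1, 1, 2]
q9 = [5, 3, 4, 2, 1]

q9_rc = recode_to_missing(q9, [answer == 1 for answer in q1])
base = sum(v is not None for v in q9_rc)  # cases left in the base
print(q9_rc, base)
```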
Multiple response - recode subvariables to missing if not selected anything in the multiple response
In the following example, question 2 asked about consideration: which of the following brands of cola would you consider purchasing? A new variable is created in which every subvariable is recoded to missing if a case does not provide a SELECTED (1 = "Yes") response on any of the source variables. This is commonly referred to as "rebasing to those answered". The EXCLUDE EMPTY argument is the key here.
CREATE MULTIPLE DICHOTOMY WITH RECODE
q2_1 (SELECTED 1 LABEL "Coke"),
q2_2 (SELECTED 1 LABEL "Diet Coke"),
q2_3 (SELECTED 1 LABEL "Coke Zero"),
q2_4 (SELECTED 1 LABEL "Pepsi"),
q2_5 (SELECTED 1 LABEL "Diet Pepsi"),
q2_6 (SELECTED 1 LABEL "Pepsi Max")
EXCLUDE EMPTY
AS q2_rc
TITLE "Consideration"
NOTES "Base: Amongst those who answered";
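As a rough illustration of "rebasing to those answered", the Python sketch below mimics the behaviour described above for EXCLUDE EMPTY. It is an assumption-laden model, not the Crunch implementation: the data, the 0/1 coding, and the helpers `exclude_empty` and `pct_selected` are invented for the example.

```python
def exclude_empty(rows, selected=1):
    """Set every subvariable to None for cases that selected nothing."""
    return [
        row if any(v == selected for v in row) else [None] * len(row)
        for row in rows
    ]

def pct_selected(rows, col, selected=1):
    """Share of non-missing cases that selected the subvariable in column `col`."""
    valid = [row[col] for row in rows if row[col] is not None]
    return sum(v == selected for v in valid) / len(valid)

# Hypothetical cases; columns are q2_1..q2_3, 1 = selected, 0 = not selected.
q2 = [
    [1, 0, 0],
    [0, 0, 0],  # selected nothing: removed from the base
    [0, 1, 1],
    [0, 0, 0],  # selected nothing: removed from the base
]
q2_rc = exclude_empty(q2)

print(pct_selected(q2, 0), pct_selected(q2_rc, 0))
```

With the empty cases excluded, the first brand is selected by one of two cases rather than one of four.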
Multiple response - recode all subvariables to answers on another question on a per subvariable level
TBA
Multiple response - recode all subvariables to the same condition
TBA
Categorical array - recode all subvariables to answers on another question on a per subvariable level
In the example below, question 1 was about awareness (of 6 brands). Question 4 was a rating scale of how much respondents like each brand. A new array variable q4_array_rc is created such that each subvariable is non-missing only if the respondent is aware of the respective brand. The code shows three of the brands.
CREATE CATEGORICAL CASE
WHEN q1_1 = 1 THEN VARIABLE q4_1
ELSE INTO NULL
END
AS q4_rc_1
DESCRIPTION "Coke";
CREATE CATEGORICAL CASE
WHEN q1_2 = 1 THEN VARIABLE q4_2
ELSE INTO NULL
END
AS q4_rc_2
DESCRIPTION "Pepsi";
CREATE CATEGORICAL CASE
WHEN q1_3 = 1 THEN VARIABLE q4_3
ELSE INTO NULL
END
AS q4_rc_3
DESCRIPTION "Fanta";
CREATE CATEGORICAL ARRAY
q4_rc_1, q4_rc_2, q4_rc_3
LABELS COPY(DESCRIPTION)
AS q4_array_rc
TITLE "Rating of brand"
NOTES "Base: Aware of each respective brand";
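The effect of a per-subvariable condition is that each brand ends up with its own base. A small Python sketch of that idea follows; it is only a model of the logic, not Crunch Automation, and the data, the code 2 for "not aware", and the helper `rebase_per_subvariable` are assumptions made for the example.

```python
def rebase_per_subvariable(ratings, awareness, aware_code=1):
    """For each case and brand, keep the rating only if the case is aware of that brand."""
    return [
        [r if a == aware_code else None for r, a in zip(rating_row, aware_row)]
        for rating_row, aware_row in zip(ratings, awareness)
    ]

# Hypothetical data; columns are Coke, Pepsi, Fanta.
q1 = [[1, 1, 2], [2, 1, 1], [1, 2, 2]]  # awareness: 1 = aware, 2 = not aware
q4 = [[5, 3, 4], [2, 4, 1], [3, 3, 5]]  # ratings

q4_rc = rebase_per_subvariable(q4, q1)
# Count the non-missing cases per brand: the base differs by subvariable.
bases = [sum(row[col] is not None for row in q4_rc) for col in range(3)]
print(q4_rc, bases)
```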
Categorical array - recode all subvariables to the same condition
In the example below, question 1 was whether someone drank cola or not (1 = Yes, 2 = No). Question 3 was rating 6 cola brands on a scale (it is a categorical array question, q3_array). A new variable q3_array_rc is being created such that each subvariable is only non-missing if they drink cola (generally). In other words, the recoding use case here is 'fixing a survey skip' because only those who drink cola (generally) should be asked their opinions on the colas.
CREATE CATEGORICAL ARRAY CASE FROM
q3_array
WHEN q1 = 2 THEN "NA" CODE 99 MISSING
END
AS q3_array_rc
TITLE "Ratings of brands"
NOTES "Base: amongst cola drinkers";