Capping numeric variables
This article is part of The Definitive Guide to Uploading and Preparing Data
Sometimes you want to cap numeric variables at a certain amount (perhaps to remove the effect of outliers). For example, you have a distribution of numeric scores with most in the range 0 to 500, but there are some above 500 including into the 1000's. So you want create a variable that caps (records) the scores at 500.
In the web app
- Not available for capping
- Although you can set a range of missing values under Variable Properties. For example, rather than capping anything over 500 as 500, you could set those scores to missing.
Crunch Automation
The best way to achieve capping is via a derived numeric case variable. It keeps your original source variable intact and creates a new variable. The new variable, being a derivation, will update if you add more rows (eg: in the context of a tracker).
The following cap takes a numeric variable (num) and caps it at a value of 10.
CREATE NUMERIC CASE
WHEN num > 10 THEN 10
ELSE VARIABLE num
END
AS num_capped
NAME "Num capped";