Capping numeric variables
This article is part of The Definitive Guide to Uploading and Preparing Data
Sometimes you want to cap numeric variables at a certain amount (perhaps to remove the effect of outliers). For example, you have a distribution of numeric scores with most in the range 0 to 500, but there are some above 500 including into the 1000's. So you want create a variable that caps (records) the scores at 500.
In the web app
- Not available for capping
- Although you can set a range of missing values under Variable Properties. For example, rather than capping anything over 500 as 500, you could set those scores to missing.
Using R
- The best way to achieve capping is via a derived numeric case variable. It keeps your original source variable intact and creates a new variable. The new variable, being a derivation, will update if you add more rows (eg: in the context of a tracker).
- The following cap takes a numeric variable and caps it at a value of 10.
# Dev version of Crunch may be required
remotes::install_github("Crunch-io/rcrunch")ds$num_capped <- makeCaseWhenVariable(
ds$num > 10 ~ 10,
TRUE ~ ds$num,
name = "Num capped"
)
Crunch Automation
Example (same as the R example above) - capping a variable num at 10.
CREATE NUMERIC CASE
WHEN num > 10 THEN 10
ELSE VARIABLE num
END
AS num_capped
NAME "Num capped";