Overview
The term moving average (or rolling average) is widely used in research to describe an approach to smoothing time-series data.
There are many approaches to time-series smoothing, some of which average the results in a table in special ways (see this example). Because the approaches vary, Crunch is committed to best-practice techniques for time-series smoothing (e.g., providing smoother lines directly on charts).
A traditional approach to time-series smoothing is to create a group of variables that represent overlapping bands of time. For example, if you have a variable that encodes the month, your "3-monthly moving average variable" may contain the categories Jan-Mar, Feb-Apr, Mar-May, and so on. When you use that variable as a crossbreak (columns), the differences between periods shrink, because adjacent columns share part of their sample.
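The smoothing effect of overlapping bands can be illustrated with a small base R sketch. The monthly figures below are made up for illustration, and the sketch averages monthly values with equal weight, whereas a real crossbreak pools respondents (so months with larger samples count for more). It does not use Crunch at all.

```r
# Hypothetical monthly scores for one year (illustrative numbers, not real data).
monthly <- c(Jan = 40, Feb = 46, Mar = 38, Apr = 50, May = 44, Jun = 52,
             Jul = 47, Aug = 55, Sep = 49, Oct = 58, Nov = 51, Dec = 60)

window <- 3                                      # months per band
starts <- seq_len(length(monthly) - window + 1)  # first month of each band

# Mean of each 3-month band; adjacent bands share two of their three months.
banded <- vapply(starts,
                 function(i) mean(monthly[i:(i + window - 1)]),
                 numeric(1))
names(banded) <- paste(names(monthly)[starts],
                       names(monthly)[starts + window - 1],
                       sep = "-")

# Compare the largest period-to-period change before and after banding.
max(abs(diff(monthly)))
max(abs(diff(banded)))
```

With these numbers, the largest jump between adjacent bands is smaller than the largest jump between adjacent months, which is the smoothing the banded crossbreak is meant to achieve.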
This article shows you how to create the overlapping banded time periods in Crunch (i.e., the traditional method). The following two options are described:
- Using the Crunch web app — The easiest way, but takes longer due to a lot of clicking.
- Using R — Faster, but you need to know how to use and adapt the R code for your dataset. For more information on how to use R, see this article.
How to create banded time variables in the Crunch web app
The Crunch web app provides the simplest and most straightforward way to produce variables. Though it may involve a lot of clicking, the upside is that you have visual control over what you're doing without having to know R code. However, Crunch recommends using R code instead if you need more control over automation.
For example, suppose you want to create banded time intervals in which 3 months are banded together at 1-month increments. Over a one-year period, the bands appear as:
- Jan–Mar
- Feb–Apr
- Mar–May
- Apr–Jun
- and so on
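Before clicking through the web app, it can help to list every band you will need. The following is a minimal base R sketch of the pattern described above (3-month windows that advance one month at a time). The names window, step, and bands are illustrative only and are not part of Crunch.

```r
# month.abb is built into base R: "Jan", "Feb", ..., "Dec".
window <- 3  # months per band
step   <- 1  # months between the starts of consecutive bands

starts <- seq(1, length(month.abb) - window + 1, by = step)

# One list entry per band, holding the months that band covers.
bands <- lapply(starts, function(i) month.abb[i:(i + window - 1)])
names(bands) <- vapply(bands,
                       function(m) paste(m[1], m[window], sep = "-"),
                       character(1))

names(bands)         # the band labels, in order
bands[["Feb-Apr"]]   # the months to check for one band
```

Within a single calendar year this yields ten bands, from Jan-Mar through Oct-Dec. Changing window or step gives other schemes, such as non-overlapping quarters with step set to 3.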
In summary, the process has two steps:
- Create a separate variable for each banded time period (repeat this step as many times as necessary).
- Put all the variables together as a multiple response variable.
Creating a separate variable for each banded time period
To create a separate variable:
- Click the Create Variable + button in the lower left of the screen.
- Select Categorical Variable.
- Select the time variable in the variable list.
- Check the periods you want to count.
- Click Save (giving it an appropriate name, such as "Jan–Mar").
Repeat these five steps once for each banded time period you need.
Creating a multiple response question with your variables
To create a multiple response question with your variables:
- Click the Create Variable + button in the lower left of the screen.
- Select Multiple Response.
- Select all the variables you created in the previous section.
- Select Selected to count on the right-hand side.
- Click Save (giving it an appropriate name).
After you save the multiple response variable, you can use it as a crossbreak.
Though it requires a lot of clicking (especially if you have many banded variables to create), the Crunch web app is simple to use. For quicker results, you can instead use R to derive a new categorical variable that is set to the periods you need (e.g., monthly, weekly, quarterly), which is described in the following section.
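To see why the bands are combined as a multiple response rather than a single categorical variable, consider this local analogue in base R. It uses a made-up data frame (resp, with hypothetical columns month and aware) instead of a Crunch dataset and calls no Crunch functions. It only mirrors the logic of the two steps.

```r
# Hypothetical respondent-level data: survey month and a yes/no answer.
set.seed(1)
resp <- data.frame(
  month = factor(sample(month.abb, 600, replace = TRUE), levels = month.abb),
  aware = sample(c(TRUE, FALSE), 600, replace = TRUE)
)

# Band definitions: 3-month windows starting one month apart.
bands <- lapply(1:10, function(i) month.abb[i:(i + 2)])
names(bands) <- vapply(bands,
                       function(m) paste(m[1], m[3], sep = "-"),
                       character(1))

# Step 1 analogue: one logical column per band.
in_band <- sapply(bands, function(m) resp$month %in% m)

# A respondent can belong to up to three bands at once.
table(rowSums(in_band))

# Step 2 analogue: use the bands as columns and summarize within each one.
pct_aware <- apply(in_band, 2, function(sel) 100 * mean(resp$aware[sel]))
round(pct_aware, 1)
```

Because one respondent can fall into several bands, the bands cannot be categories of a single mutually exclusive variable, which is why the web app steps combine them as a multiple response.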
Using R to expedite the process if you have a date/time variable
If you have a date/time variable, the first step above (creating a separate variable for each band) can take many clicks (possibly 60 or more) for just one of the sub-variables. For example, you may need to click on every day in a 3-month period, which is cumbersome and time-consuming.
To save time, you can instead use R to create a categorical version of the time variable, rolled up into whichever period is appropriate (e.g., monthly, weekly, quarterly).
The following describes how to use R code in this scenario:
- Log in to your dataset with R (e.g., with R Studio).
- Use the rollup() function from the crunch package, as shown in the following example:
library(crunch)
ds$monthly_roll <- rollup(ds$date, resolution = "M")
where:
- ds$monthly_roll
  - ds is the dataset.
  - monthly_roll is the alias under which the new variable is stored. You can call it whatever you want (quarterly_roll, cat_date, and so on).
- rollup()
  - This is the function that does the banding. The full documentation explains it in more detail.
- ds$date
  - ds is the dataset.
  - date is the alias of the date/time variable in your dataset (in this example, the source variable's alias is "date"). Remember, you can find (and change) the alias of a variable in the Properties of its variable card (Variable Summaries view of the web app).
- resolution
  - This is the banding you want to apply. "M" means monthly. Other options include "Q" (quarterly), "W" (weekly), and so on. The full documentation is here.
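If you want a feel for what monthly or quarterly banding does to a set of dates before running it on your dataset, the following base R sketch shows the idea on a local vector. It is only an illustration with made-up dates. It does not call rollup() and is not a description of how Crunch implements it.

```r
# Hypothetical interview dates (not taken from any real dataset).
dates <- as.Date(c("2023-01-05", "2023-01-28", "2023-02-14",
                   "2023-04-02", "2023-06-30"))

# Monthly banding: keep the year and month, drop the day.
monthly <- format(dates, "%Y-%m")

# Quarterly banding: base R's quarters() returns "Q1" to "Q4".
quarterly <- paste(format(dates, "%Y"), quarters(dates))

# Count how many dates fall into each band.
table(monthly)
table(quarterly)
```

Every date in the same month (or quarter) collapses to one category, which is what lets you check a handful of periods in the web app instead of clicking on individual days.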
Using R for further automation
See the following article for further information on how to automate (i.e., no clicking) the creation of multiple response variables: