How to create a custom tabs automatically using R
The tab book export feature in Crunch (from Multitables view) produces a tab book for a single multitable, with a single filter (optional) and/or single weight (optional) applied. Whereas you can specify which variables you like, the export tab book feature in the web app doesn't allow you to create a series of custom tab books.
With R, however, you can create a series of custom tabs (as many as you like), which you can recombine into the one master custom tab book. This allows you to flexibly choose:
- whether you want one or more questions on each tab (going down the rows)
- whether you want a specific filter applied to a specific tab
- whether you want a specific weight applied to a specific tab
- allowing you to name a specific tab
The process is broken into distinct phases as follows:
- Set up your dataset: variable organizing, variable creation, multitable creation, filter creation, weight creation.
- Creating your tab book specification (in CSV) from the template attached here
- Running the R code in this article
- Recombining all the separate tabs into the one tab book (some suggestions for this in this article)
The first step - setting up your dataset - is covered sufficiently in many articles on this Help Center, so let's assume you've done that.
Creating your tab book specifications (in CSV template)
Creating your tab book spec is easy - simple download this CSV template: tab_book_spec_example.csv
- Each row (starting from the second row) is a separate tab.
- In the rows columns you will specify which variables you want in the rows. In the R scripts provided there is a choice of specifying by Name (as per the image above) or by Alias. See below for the R Scripts. If you want multiple variables going down the rows on the one tab, simply using a comma to separate it. So in the example picture above, 3 variables are specified by name on row 2.
- In the multitable column, specify the name of your multitable
- Likewise for the filter and weight columns.
- The final column, title, can be whatever you like - this is simply the name of the output Excel file, which will have a number in front of it. So, for example, the second row in the above example will come out as "1 - My first table.xlxs"
Remember: R is cAsE SENitive... so be sure to be exact with upper and lower case characters and the use of spaces. You can't have any commas in the name of the variable (obviously).
Running the R code
Once you save the CSV into your working directory (which you can set from R Studio under the Session menu), then you need to run this code in the attached R scripts. There are two R scripts here. One allows you to specify the rows variables by Name and the other by Alias.
Next, as always, log in and load your dataset.
You change this line of code to reflect your CSV name.
tab_spec <- "tab_book_spec_example.csv"
And then, as per the instructions in the R script, run the rest of the code.
The result will be a folder within your working directory, will all of the tabs as separate Excel files. The folder's name is set automatically to be the same as your dataset.
If you want to further customize the code to include different statistics, this is possible. Please see the notes below.
Recombining the individual tabs into a master tab book
Once you have all the individual tabs as Excels, it's up to you to recombine them into a tab book.
There are number of ways you can do this, and it is up to you which you prefer.
If you wish to continue to use R, there may be packages that you wish to explore available on CRAN.
One approach is to use an extension plug-in to Excel. An example of this is KuTools. It's very easy to use - with selecting all the files (or the folder) and then asking it to combine each file as a separate tab.
Customizing the tab outputs
If you want to customize which statistic(s) is being shown, this is possible.
Within the tabBook() function, you can include the following lines of code to customize the tabs. Some of the most commonly requested customizations are below, but you can see the full list here.
The following line allows you to specify which statistics to include or not include:
fields = list("col_percent", "count_unweighted")
The following line allows you to specify whether you want the statistics on top of each other, rather than side-by-side:
page_layout = list(measure_layout = "long")
You can also set specific statistics to appear in the top and bottom rows as part of the page layout. This is the rows list that is a subset of the page_layout. The rows can include the column base sizes as well as scale means, standard deviations and standard errors of scales. These are shown in the example below in blue text and a further explanation given at the end of this article.
The following line allows you to specify at which level of significance you want column-wise comparison letters. In the example below, using list() allows you to specify two-levels of significance with UPPER and lower case letters.
sig_threshold = list(0.05, 0.01)
So using both examples above, the lines of code included in the tabBook() function (line 66) would be:
multitable = mt,
dataset = ds[, aliases(variables(ds)[vars])],
weight = weight,
output_format = "xlsx",
file = title_str,
format = list(decimal_places = list(percentages = 1, other = 2)),
doc_layout = list(variable_sheets="one_sheet"),
filter = filter,
fields = list("col_percent", "count_unweighted"),
page_layout = list(
measure_layout = "long",
rows = list(
top = list("base_weighted", "base_unweighted"),
bottom = list("scale_mean", "scale_std_dev", "scale_std_err")
sig_threshold = list(0.05, 0.01)
Regarding the options available for the rows argument:
- Rows can contain a “top” array and a “bottom” array which specify header/footer rows to appear; both are optional.
- “top” can contain any or all of “base_unweighted” and “base_weighted”. Ordering of the count summary-rows is respected.
- “bottom” can contain any or all of “base_weighted”, “base_unweighted”, “significant_columns”, “scale_mean”, “scale_std_dev”, “scale_std_err” and “scale_median”.
- Ordering of the count summary-rows is respected.
- Specifying either “base_weighted” or “base_unweighted” in either the “top” or “bottom” array will suppress the automatic appearance of a default base summary row at the bottom. (The default base summary row is either the weighted or unweighted base, depending on the measure.) In that case, only explicitly specified count summary-rows will appear.