The following is an overview of hypothesis testing in Crunch. For a detailed review, including caveats, formulae, and other considerations, please see this page.
We’ll start by outlining the different types of statistical testing available and then show how to interpret them, via a worked example.
Crunch is Power Made Simple. The goal is to help identify possible differences that may be of interest in forming a story about the data. The goal of Crunch is not to replicate the statistical testing of other platforms (whose methods can be outdated). We automatically adjust and apply the most appropriate statistical testing to the data at hand. Our team of expert statisticians ensures that the most appropriate test is applied in every case.
The different types of tests
Crunch offers hypothesis testing within the web app, and on export of the data into tabs/tab books (under the multitable function).
Within the web app, the asterisk icon in the Display Controller tells you that hypothesis testing is on.
For Tables
When hypothesis testing is on, cells are shaded according to the p-value. You’ll notice the scale at the bottom that describes the shading. In essence, the darker the shade, the lower the p-value (i.e., the greater the level of significance). Green is positive, red is negative.
1) Cell comparisons in the web app (shading based on Z-scores)
The default view of the data is called cell comparisons. Cell comparisons are different to pairwise column comparisons, which many researchers are familiar with from traditional tab books. In traditional tab books, column comparisons are represented by letters (A, B, C, D, etc.) to show how one column compares to another (more on this below). Cell comparisons, by contrast, take into account the whole table, as we’ll explain below.
2) Column comparisons in the web app (shading based on pairwise column tests)
If you click under a column when the shading is on, it will allow you to Set Comparisons. This compares that column pairwise to the other columns.
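To give a feel for what a pairwise column comparison does, here is a minimal sketch of a pooled two-proportion z-test. This is an assumption for illustration only: this article does not state which pairwise test Crunch applies, the function name `two_proportion_z` is hypothetical, the counts are made up, and the sketch ignores weighting.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value).

    x1, x2: number of respondents in the row category for each column.
    n1, n2: column base sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion across both columns
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical counts: 43% of 1,000 in one column vs 37% of 1,000 in another.
z, p = two_proportion_z(430, 1000, 370, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Only the two selected columns enter the calculation, which is the key difference from cell comparisons.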
3) Column comparisons in tab books (letters that indicate pair-wise tests)
If you’re exporting a multitable (banner table) to a tab book, you have the choice of the shading options (as per the above), but also to use traditional tab book column comparison letters. Tab books offer other customizations as well.
How to interpret the Z-score shading (and why it's awesome)
Consider the table below, showing Internet Usage at Work by 4-category Age. The figures within the cells are column percentages: each column sums to 100%, and cells are showing the percentage of respondents within that column that fall in each category of the row variable. The marginal or All column shows the percentages of each category overall.
In this table most of the cells are shaded, but what do they mean? 43% of 18-29 year olds are spending over an hour a day online. We’ve indicated this cell as being significantly higher (p < .001) than if work internet usage were independent of Age.
If the row and column variables were completely independent or uncorrelated, the column percentages for every column of Age would exactly match those of the All column.
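The independence baseline can be sketched in a few lines. Under independence, the expected count in each cell is (row total × column total) / N, and the resulting column percentages all equal the All column. The counts and the function name below are hypothetical, chosen only to illustrate the idea.

```python
def expected_column_percentages(counts):
    """Return (expected column %, All column %) under independence.

    counts: list of rows, each a list of cell counts (rows x columns).
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    # Expected count if row and column variables were independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    col_pct = [
        [100 * expected[i][j] / col_totals[j] for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]
    all_pct = [100 * r / n for r in row_totals]
    return col_pct, all_pct

# Hypothetical table: rows are usage categories, columns are age groups.
counts = [[430, 380, 350], [570, 620, 650]]
col_pct, all_pct = expected_column_percentages(counts)
for row, overall in zip(col_pct, all_pct):
    print([round(v, 1) for v in row], "All:", round(overall, 1))
```

Every expected column percentage in a row matches that row’s All figure, so shading only appears where the observed table departs from this baseline.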
How to interpret?
In the case of column percentages, it is saying that the 43% is significantly higher than the figure for everyone who is not 18-29 years old. It is effectively a comparison between the 43% and the row marginal of 38%.
So the 43% is actually being tested against the 37% among those who are not 18-29 years old, and is significantly higher (hence the dark green).
How cell comparisons are shaded
Cell comparisons are based on the Z-score. You can turn the Z-score on for a particular table (using the Display Controller), as shown below.
You can see that the Z-score is 10.80 for the 18-29 year olds, and it is the same but negative for the complementary cell (those who are NOT 18-29 years old, that is, 30+).
The Z-score is known as the standardized residual. Without getting into the computation, Z-scores are a measure of how different the cell is from what we would expect the cell to be based on the row and column average. You can read up more about Z-scores here if you wish.
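For readers who do want a peek at the computation, here is a minimal sketch. It assumes the adjusted standardized residual form, (observed − expected) / sqrt(expected × (1 − row share) × (1 − column share)), which this article does not confirm as Crunch’s exact formula. The function names and counts are hypothetical, and weighting is ignored.

```python
import math

def adjusted_residuals(counts):
    """Adjusted standardized residual for every cell of a two-way table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    z = []
    for i, row in enumerate(counts):
        z_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = (
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            z_row.append((observed - expected) / math.sqrt(variance))
        z.append(z_row)
    return z

def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical 2x2 table: rows = over an hour / not, columns = 18-29 / 30+.
table = [[430, 1110], [570, 1890]]
z = adjusted_residuals(table)
print([[round(v, 2) for v in row] for row in z])
print(two_sided_p(z[0][0]))
```

In a 2x2 table like this one, the score for a cell and the score for its complementary cell have the same size and opposite signs, and transposing the table leaves the scores unchanged.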
Why are Z-scores so cool? Because unlike column-comparison letters, the Z-scores take into account the whole table. It’s not just looking at all the different combinations of column pairs. This makes it easier to spot interesting differences, and see trends, especially when you are glancing through lots of (big) tables.
Many researchers are familiar with column comparison letters. Part of the reason they have persisted is that, up until now, tab books were static and didn’t offer interactivity if you wanted to do a specific pairwise column test. But you can do that at the click of a button with Crunch, which is the topic of the next section.
Does it matter if you have row or column-wise percentages?
No. The other benefit of using the Z-score is that it works both row-wise and column-wise. Look at what happens if the rows of the table are collapsed as well into two categories:
You can see a perfect symmetry. The scores and shading indicate that there are disproportionately more 18-29 year olds amongst those who use the Internet over an hour a day than amongst those who use it less than an hour per day. The below table shows the row percentages.
So the 27% here is being indicated as statistically higher than the 22%. If we expand the row categories back out, the 27% is interpreted against everything that is not over an hour a day.
So even though 27% is lower than the 33% and 49%, that is not the point. There are a lot of respondents (N=20,518) in the “Never” category. So that 14% is dragging the average of “Not over an hour a day” right down (to 22%, as per the above).
Z-scores, therefore, don’t change if you swap the rows and columns. As stated, they make it easier to spot trends. There’s a clear trend between age and use of the Internet at work in the example above. The reverse is true if you look at a table (this time of column percentages) of internet usage at HOME by age.
You can see in the above that the 65+ age group is much more likely to be using the internet at home over an hour a day (presumably, because they are at home more!).
A word of caution though – with very large base sizes (as in the above), it is common for cells to be shaded as significant. Does that mean it’s an important result? Not necessarily. The 83% is not that much higher than the other age groups, and the vast majority of the 18-29 year olds are still using the internet for more than an hour a day. Please see our full article for a discussion on effect size, and how that plays a role in interpreting results alongside significance.
For Graphs
On graphs, Crunch shows confidence intervals instead of using colors since color on graphs usually has different semantic meaning.
What is a Confidence Interval?
A confidence interval can be thought of as an estimate plus-or-minus a certain amount. It’s a way to show the uncertainty around a survey result. For example, if you see a bar that shows a black vertical line (the “point estimate”) at 50%, and the confidence interval is plus-or-minus 5 percentage points, that means we’re reasonably sure (95% confident) that the ‘true’ population value lies between 45% and 55%. “50%±5” or “[45, 55]” can be hard to read in a table, but works well in a graph. The richly colored bar around the black vertical line (the “confidence interval”) represents this range.
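As a rough illustration of where the plus-or-minus comes from, here is a sketch using the textbook normal-approximation interval for a proportion, p ± 1.96 × sqrt(p(1 − p) / n). This is an assumption: this article does not say which interval method Crunch uses, the function name `proportion_ci` is hypothetical, and the sketch ignores weighting.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p: point estimate as a fraction (0.50 for 50%).
    n: base size.
    z: critical value; 1.96 corresponds to 95% confidence.
    """
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# With a hypothetical base of 384 respondents, a 50% estimate has a margin
# of roughly 5 percentage points.
low, high = proportion_ci(0.50, 384)
print(f"[{100 * low:.1f}%, {100 * high:.1f}%]")
```

Because the margin shrinks with the square root of the base size, quadrupling the base only halves the width of the interval.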
Why are Confidence Intervals Important?
- Judge the Precision: The plus-or-minus value helps you grasp how precise the survey result is. A smaller plus-or-minus number means the result is more precise. Note that precision is not the same as being correct. A poorly worded question or a biased sample can still give misleading results, no matter how precise the estimate.
- Informed Decision Making: Knowing the range within which the true value probably falls (the confidence interval) can guide better decisions based on your survey data.
How to Interpret Confidence Intervals in Crunch
- Look at the Range: Each bar will have a black vertical line (the point estimate), which sits at the center of the confidence interval. This range is where the actual value is likely to be (95% confidence). Note that the whole range is equally likely, given the data. The interval is constructed around the point estimate, but the center cannot be considered any more likely than either extreme.
- Check for Overlap: If the confidence intervals of two bars overlap, it approximately* means there’s not a statistically significant difference between the two groups or values you are comparing. You can see this overlap happening between Brand C and Brand D in the example above. Note, though, that statistical best practice is to not treat this significance distinction as binary. The size of the intervals and the degree of overlap are relevant (don’t hold a ruler up to the screen; that defeats the purpose of showing the whole interval).
- Consider the Width: The width of the confidence interval (the total range of the plus-or-minus values) can tell you a lot. A narrower interval (±2%, for example) means you can be more confident in the accuracy of the survey result than if the interval is wider (±10%).
Summary
- Confidence intervals are essentially your “best guess plus-or-minus a certain amount.”
- They provide a handy way to understand the accuracy and reliability of your data.
- If two intervals overlap, it suggests that the difference between the two values may not be significant at 95% confidence.
- The size of the plus-or-minus range offers additional clues about how much confidence you can have in the results.
✳️ Technical note: The overlap between independent confidence intervals is not exactly the same as the margin of error of the difference between the two. In general, the margin of error of the difference is larger than either interval’s own margin because it considers the variance of both together rather than each on its own. If they look like they overlap, there is more than a 5% chance they are not actually different.
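To see why the overlap check is only approximate, here is a sketch comparing it with a test on the difference itself. It assumes independent samples and normal-approximation margins, which this article does not confirm as Crunch’s method. The function names and figures are hypothetical. For independent estimates, the margin of error of the difference is sqrt(m1² + m2²), which is larger than either individual margin but smaller than their sum.

```python
import math

def margin(p, n, z=1.96):
    """Normal-approximation margin of error for one proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def intervals_overlap(p1, n1, p2, n2):
    """Eyeball check: do the two 95% intervals overlap?"""
    return abs(p1 - p2) < margin(p1, n1) + margin(p2, n2)

def difference_margin(p1, n1, p2, n2):
    """Margin of error of the difference between two independent estimates."""
    return math.sqrt(margin(p1, n1) ** 2 + margin(p2, n2) ** 2)

# Hypothetical example: 50% vs 57%, each from 500 respondents.
p1, n1, p2, n2 = 0.50, 500, 0.57, 500
print("intervals overlap:", intervals_overlap(p1, n1, p2, n2))
print("difference:", round(abs(p1 - p2), 3))
print("margin of the difference:", round(difference_margin(p1, n1, p2, n2), 3))
```

In this made-up case the intervals overlap, yet the difference exceeds its own margin of error, which is why the degree of overlap matters more than a yes/no reading.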