Occasionally, it is useful to compare analyses from different sources. A common example is to define “benchmarks” for a given analysis, so that you can quickly compare its results to an established target. A comparison is, in effect, one analysis laid over another so that at least one of their dimensions lines up (typically using the same measures as well). Comparisons are therefore also defined in terms of cubes: one set that defines the base analyses, and another that defines the overlay.
For example, if we have an analysis over two categorical variables “88dd88” and “ee4455”:
{
    "dimensions": [
        {"variable": "../variables/88dd88/"},
        {"variable": "../variables/ee4455/"}
    ],
    "measures": {"count": {"function": "cube_count", "args": []}}
}
then we might obtain a cube with the following output:
  |  1 |  2 | -1 |
1 | 15 | 12 |  9 |
2 | 72 |  8 |  3 |
3 | 23 |  4 | 17 |
Let’s say we then want to overlay a comparison showing benchmarks for 88dd88 as follows:
  |  1 |  2 | -1 | benchmarks |
1 | 15 | 12 |  9 |         20 |
2 | 72 |  8 |  3 |         70 |
3 | 23 |  4 | 17 |         10 |
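Conceptually, the table above joins a one-dimensional column onto the rows of a two-dimensional cube, matching on the row category. A minimal Python sketch of that alignment follows, using the numbers from the example. The dict layout and the function name `overlay_column` are illustrative assumptions, not the format Crunch returns.

```python
# Illustrative only: plain dicts stand in for cube output.

# Counts keyed by 88dd88 category (rows), then ee4455 category (columns).
base = {
    1: {1: 15, 2: 12, -1: 9},
    2: {1: 72, 2: 8, -1: 3},
    3: {1: 23, 2: 4, -1: 17},
}

# Hand-entered benchmark targets, one per 88dd88 category.
benchmarks = {1: 20, 2: 70, 3: 10}


def overlay_column(base, overlay, label):
    """Return a copy of `base` with `overlay` appended as an extra column.

    Rows are matched on the shared dimension's category id, so both
    inputs must cover exactly the same categories.
    """
    if set(base) != set(overlay):
        raise ValueError("overlay categories do not match the base rows")
    return {cat: {**cols, label: overlay[cat]} for cat, cols in base.items()}


combined = overlay_column(base, benchmarks, "benchmarks")
for cat, cols in combined.items():
    print(cat, cols)
```

The category check is the point of the sketch: an overlay only makes sense when its dimension lines up with a dimension of the base.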
Our first pass at this might be to generate the benchmark targets in some other system and hand-enter them into Crunch. To accomplish this, we need to define a comparison. First, we define the “bases”: the cube(s) to which our comparison applies. In our case, that is a single cube based on the one above:
{
    "name": "My benchmark",
    "bases": [{
        "dimensions": [{"variable": "88dd88"}],
        "measures": {"count": {"function": "cube_count", "args": []}}
    }]
}
Notice, however, that we’ve left out the second dimension. This means that this comparison will be available for any analysis where “88dd88” is the row dimension. The base cube here is a sort of “supercube”: a superset of the cubes to which we might apply the comparison. We include the measure to indicate that this comparison should apply to a “cube_count” (frequency count) involving variable “88dd88”.
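The “supercube” rule can be pictured as a simple matching check. The Python sketch below is one possible reading of the paragraph above, not Crunch's actual matching logic: a base applies when its dimensions match the leading dimensions of the analysis and its measures appear unchanged. The helpers `variable_id` and `base_applies` are hypothetical, as is the assumption that "../variables/88dd88/" and "88dd88" refer to the same variable.

```python
def variable_id(ref):
    """Reduce a variable reference to its id.

    Assumption: the last non-empty path segment identifies the variable,
    so "../variables/88dd88/" and "88dd88" are treated as equal.
    """
    return ref.rstrip("/").split("/")[-1]


def base_applies(base, analysis):
    """Return True if `base` acts as a supercube of `analysis`.

    Hypothetical rule: the base's dimensions must match the leading
    dimensions of the analysis, and each base measure must be present
    in the analysis with the same definition.
    """
    base_dims = [variable_id(d["variable"]) for d in base["dimensions"]]
    dims = [variable_id(d["variable"]) for d in analysis["dimensions"]]
    if dims[: len(base_dims)] != base_dims:
        return False
    return all(
        analysis["measures"].get(name) == spec
        for name, spec in base["measures"].items()
    )


count = {"count": {"function": "cube_count", "args": []}}
base = {"dimensions": [{"variable": "88dd88"}], "measures": count}

# The two-variable analysis from the example: 88dd88 is the row dimension.
analysis = {
    "dimensions": [
        {"variable": "../variables/88dd88/"},
        {"variable": "../variables/ee4455/"},
    ],
    "measures": count,
}

# The same analysis with the dimensions swapped.
swapped = {"dimensions": analysis["dimensions"][::-1], "measures": count}

print(base_applies(base, analysis))  # True
print(base_applies(base, swapped))   # False
```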
Then, we need to define the target data. Because we are supplying these values by hand, our measure is simply a static column instead of a function:
{
    "overlay": {
        "dimensions": [{"variable": "88dd88"}],
        "measures": {
            "count": {
                "column": [20, 70, 10],
                "type": {"function": "typeof", "args": [{"variable": "88dd88"}]}
            }
        }
    }
}
Note that our overlay has to have a dimension, too. In this case, we simply re-use variable “88dd88” as the dimension. This ensures that our target data is interpreted with the same category metadata as our base analysis.
We POST the above to datasets/{id}/comparisons/ and can obtain the overlay output at datasets/{id}/comparisons/{comparison_id}/cube/. See the Comparisons endpoint reference for details.
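As a rough illustration, the request could be assembled in Python as follows. Several things here are assumptions rather than documented behavior: that the “bases” and “overlay” fragments are sent together in one JSON body, the placeholder `API_ROOT` and `DATASET_ID` values, and the helper names `build_comparison` and `build_request`. Authentication is omitted, and the request is built but not sent. Consult the Comparisons endpoint reference for the authoritative payload shape.

```python
import json
import urllib.request

# Hypothetical values: substitute your own API root and dataset id.
API_ROOT = "https://example.com/api/"
DATASET_ID = "abc123"


def build_comparison(name, variable, targets):
    """Combine the base and overlay definitions into one body.

    Assumption: both fragments from the text belong in a single payload.
    """
    return {
        "name": name,
        "bases": [{
            "dimensions": [{"variable": variable}],
            "measures": {"count": {"function": "cube_count", "args": []}},
        }],
        "overlay": {
            "dimensions": [{"variable": variable}],
            "measures": {
                "count": {
                    "column": list(targets),
                    "type": {
                        "function": "typeof",
                        "args": [{"variable": variable}],
                    },
                }
            },
        },
    }


def build_request(payload):
    """Build (but do not send) the POST to the comparisons catalog."""
    url = f"{API_ROOT}datasets/{DATASET_ID}/comparisons/"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_comparison("My benchmark", "88dd88", [20, 70, 10])
request = build_request(payload)
# urllib.request.urlopen(request) would send it; that call is left out so
# the sketch runs without network access or credentials.
print(request.get_method(), request.full_url)
```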