Crunch Lakehouse Format Data Model

This article describes the data layout of the Parquet files that contain the row-level respondent data for surveys defined in the Crunch Logical Schema.

Data is stored in long form: each respondent's value for a variable is its own row, rather than each variable being its own column. As a result, this format scales to datasets of any size.

Data columns

column	type	required
row_id	INT	true
var_name	STRING	true
var_type	STRING	true
axis	LIST<STRING>	false
categorical_value	STRING	false
numeric_value	DOUBLE	false
datetime_value	TIMESTAMP[us]	false
text_value	STRING	false
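
As a rough illustration, the column table above can be mirrored in a small Python record type. This is a sketch, not part of the format: the LongRow class and the populated_value_fields helper are hypothetical names, and Python types stand in for the Parquet types (INT as int, DOUBLE as float, TIMESTAMP[us] as datetime, NULL as None).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LongRow:
    """One row of the long-form table (hypothetical helper, not a Crunch API)."""
    # Required columns
    row_id: int
    var_name: str
    var_type: str
    # Optional columns; None stands in for NULL
    axis: Optional[List[str]] = None
    categorical_value: Optional[str] = None
    numeric_value: Optional[float] = None
    datetime_value: Optional[datetime] = None
    text_value: Optional[str] = None


def populated_value_fields(row: LongRow) -> List[str]:
    """Return the names of the value columns that are not NULL."""
    names = ["categorical_value", "numeric_value", "datetime_value", "text_value"]
    return [n for n in names if getattr(row, n) is not None]


age = LongRow(row_id=1, var_name="age", var_type="numeric", numeric_value=42.0)
print(populated_value_fields(age))  # ['numeric_value']
```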

Examples

The first four examples are atomic types. To save space, the table shows only the single non-null value field for each row. In the first row, categorical_value is "male" and the other three value fields are null; for age, numeric_value is 42 and the other three value fields are null:

row_id	var_name	var_type	axis	value
1	gender	categorical	NULL	"male"
1	age	numeric	NULL	42.0
1	firstname	text	NULL	"Joe"
1	birthdate	datetime	NULL	1970-01-01 00:00:00
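
To make the convention above concrete, here is a minimal sketch that stores each value in its own field and then collapses the rows into one wide record per respondent. The dict layout and the to_wide function are illustrative assumptions, not part of the format. The sketch also assumes that at most one value field is populated per row and that text and datetime variables use text_value and datetime_value, by analogy with the two cases described above.

```python
from datetime import datetime

VALUE_FIELDS = ("categorical_value", "numeric_value", "datetime_value", "text_value")

# The four atomic rows from the table; value fields left out of a dict are NULL.
rows = [
    {"row_id": 1, "var_name": "gender", "var_type": "categorical", "axis": None,
     "categorical_value": "male"},
    {"row_id": 1, "var_name": "age", "var_type": "numeric", "axis": None,
     "numeric_value": 42.0},
    {"row_id": 1, "var_name": "firstname", "var_type": "text", "axis": None,
     "text_value": "Joe"},
    {"row_id": 1, "var_name": "birthdate", "var_type": "datetime", "axis": None,
     "datetime_value": datetime(1970, 1, 1)},
]


def to_wide(long_rows):
    """Collapse long-form rows into {row_id: {var_name: value}}."""
    wide = {}
    for row in long_rows:
        # Take the first populated value field; a missing key counts as NULL.
        value = next((row[f] for f in VALUE_FIELDS if row.get(f) is not None), None)
        wide.setdefault(row["row_id"], {})[row["var_name"]] = value
    return wide


wide = to_wide(rows)
print(wide[1]["age"])  # 42.0
```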

For array variables, axis is an array. The first example is a one-dimensional categorical array for awareness of Apple, Borland, and Corel:

row_id	var_name	var_type	axis	categorical_value
1	awareness	categorical	["apple"]	"yes"
1	awareness	categorical	["borland"]	"yes"
1	awareness	categorical	["corel"]	"no"

A two-dimensional categorical array (brand by metric) is represented as:

row_id	var_name	var_type	axis	categorical_value
1	rating	categorical	["apple", "value"]	"1"
1	rating	categorical	["apple", "quality"]	"2"
1	rating	categorical	["borland", "value"]	"2"
1	rating	categorical	["borland", "quality"]	"3"
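
One way to read such rows back is to key each value by its axis path. The sketch below is only an illustration (the by_axis helper and the tuple-keyed dict are assumptions, not a Crunch API); it indexes the four rows above so that a cell can be looked up by brand and metric.

```python
# The four rating rows from the table, written as plain dicts.
rows = [
    {"row_id": 1, "var_name": "rating", "axis": ["apple", "value"], "categorical_value": "1"},
    {"row_id": 1, "var_name": "rating", "axis": ["apple", "quality"], "categorical_value": "2"},
    {"row_id": 1, "var_name": "rating", "axis": ["borland", "value"], "categorical_value": "2"},
    {"row_id": 1, "var_name": "rating", "axis": ["borland", "quality"], "categorical_value": "3"},
]


def by_axis(long_rows, var_name):
    """Map (row_id, axis tuple) -> categorical value for one array variable."""
    return {
        (r["row_id"], tuple(r["axis"])): r["categorical_value"]
        for r in long_rows
        if r["var_name"] == var_name
    }


cells = by_axis(rows, "rating")
print(cells[(1, ("borland", "quality"))])  # prints 3
```

Because the axis list is converted to a tuple, the same helper works unchanged for arrays with one, two, or more dimensions.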

This requires support for variable-length arrays in the axis column. (In a system without this feature, you could support up to a fixed number of dimensions by having separate fields for, say, the first and second axes.)
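
The fixed-width fallback mentioned above could look like the following sketch. The column names axis_1 and axis_2 and the split_axis helper are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def split_axis(axis: Optional[List[str]], max_dims: int = 2) -> Tuple[Optional[str], ...]:
    """Spread an axis list over max_dims fixed columns, padding with None."""
    labels = list(axis or [])
    if len(labels) > max_dims:
        raise ValueError(f"axis has {len(labels)} dimensions, only {max_dims} supported")
    return tuple(labels + [None] * (max_dims - len(labels)))


axis_1, axis_2 = split_axis(["apple", "value"])
print(axis_1, axis_2)          # apple value
print(split_axis(["corel"]))   # ('corel', None)
print(split_axis(None))        # (None, None)
```

The trade-off is visible in the error branch: any variable with more dimensions than the fixed column count cannot be stored, which is the limitation that a variable-length axis column avoids.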
