This document specifies Shoji, a JSON-based data interchange format.
Introduction
When we think of "objects" in reality, we often think of them as having a number of "attributes": an apple might have a red or green color, taste sweeter or more tart, be a Fuji or a Gala, be mine or yours, be made into cider or dipped in caramel.
When it comes to representing such objects, we find that these attributes are often used to categorize them into groups or to order them within a group. We might group apples by ownership and then order by sweetness.
Shoji is a JSON-based document format to represent large numbers of objects efficiently. "Entities" are used to represent individual objects which are collected into "catalogs" and arranged by "orders". For example, a "users" Catalog might refer to zero or more "user" Entities; each "user" Entity, for example, might possess "login_name" and "address" attributes, while the Catalog possesses "full name" and "status" attributes. Data projections are exposed via "views": arbitrary arrangements of values.
However, the attributes of an object are *not* always all exposed in a Shoji Entity. Attributes that are amenable to simple GET and PUT, which define the display and behavior of the object in isolation, are generally best represented in a Shoji Entity object. Attributes that govern how to arrange an object in relation to other objects of the same class are better represented in a Shoji Catalog. Attributes which are function outputs, especially from multiple objects as inputs (that is, extended projections), are best placed in a Shoji View.
This is somewhat contrary to the point of view of most object-oriented programming environments; for example, an Employee in an object-oriented program would likely be an instance of a class with the attributes, "name", "birthday", and "hours_worked()" as peers. But these would be better served as a Catalog that included the "name" attribute from all Employees, a separate set of Entities, each with a "birthday" attribute, and a single View (or configurable set of Views) showing "hours worked" calculated across multiple Employees.
In addition, a group of objects may itself be an object with particular attributes. For example, a set of "files" may be arranged in a hierarchy of "folders", each with a "name" that is editable and therefore distinct from its unique, permanent identifier (URI). These may then also be collected into larger groups with their own attributes.
Examples
A simple Shoji Catalog Object:
{
  "element": "shoji:catalog",
  "self": "http://example.org/users/",
  "index": {
    "1/": {},
    "75/": {},
    "133/": {}
  }
}
A simple Shoji Entity Object:
{
  "element": "shoji:entity",
  "self": "http://example.org/users/1/",
  "body": {
    "last_modified": "2003-12-13 18:30:02Z",
    "name": {"first": "Katsuhiro", "last": "Shoji"},
    "sold_count": 387
  }
}
A simple Shoji View Object:
{
  "element": "shoji:view",
  "self": "http://example.org/users/1/sold_count",
  "value": 387
}
A more complex Shoji Catalog Object:
{
  "element": "shoji:catalog",
  "self": "http://example.org/users/",
  "body": {
    "title": "Users Catalog",
    "description": "The set of user entities for this application.",
    "updated": "2003-12-13T18:30:02Z"
  },
  "catalogs": {
    "bills": "bills/",
    "sellers": "sellers/",
    "sellers by sold count": "sellers/{?sold_count}"
  },
  "orders": {
    "default": "default_order"
  },
  "index": {
    "1/": {"tags": ["active", "contacted"]},
    "75/": {"tags": []},
    "133/": {"tags": ["active"]}
  },
  "views": {
    "Sold Counts": "sold_counts/"
  }
}
...with an Order:
{
  "element": "shoji:order",
  "self": "http://example.org/users/default_order",
  "graph": ["75/", "133/", "1/"]
}
...and a more complex Shoji View Object:
{
  "element": "shoji:view",
  "self": "http://example.org/users/sold_counts/",
  "value": [[387, 18843], [3478, 999], [1, 18]]
}
Namespace and Version
For convenience, this format may be referred to as "Shoji 2.1". This specification uses "Shoji" internally.
Notational Conventions
This specification describes conformance in terms of four artifacts: Shoji Catalog Objects, Shoji Entity Objects, Shoji View Objects, and Shoji Order Objects.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, [RFC2119], as scoped to those conformance targets.
The grammatical rules in this document are to be interpreted as described in ABNF [RFC4234]. Any tokens not defined herein are the same as those in JSON [RFC4627].
Design Goals
Shoji is designed to:
- Support delivery of arbitrary JSON payloads.
- Declare resource relations via hyperlinks.
- Ease client construction of hyperlinks.
- Allow linking to, and other metadata descriptions of, other media types.
- Support both complete dataset exposure and sparse, restricted exposure.
- Encourage caching in a way that minimizes synchronization issues.
- Have payloads be persisted to filesystems and databases without loss of information.
- Be independent of any particular device or implementation language.
Shoji Objects
This specification describes four kinds of Shoji Objects: Shoji Catalog Objects, Shoji Entity Objects, Shoji View Objects, and Shoji Order Objects.
- A Shoji Catalog Object is a representation of a Shoji catalog, including an index that maps entity IRIs to catalog attributes, metadata about the catalog itself, and references to any sub-catalogs and views associated with it.
- A Shoji Entity Object represents exactly one Shoji entity, including attributes contained within it, metadata about the entity itself, and references to any catalogs, fragments, and views associated with it.
- A Shoji View Object represents a read-only set of related attributes, which may be drawn from entities or catalogs, obtained via one of their views.
- A Shoji Order Object represents a (possibly partial and/or strict) order over a set of strings, often the URI keys of a catalog index.
All four kinds of Shoji Objects are specified in terms of JavaScript Object Notation ("JSON", specified in [RFC4627]). Shoji Objects MUST be well-formed JSON objects.
Common structures
Many of Shoji's objects share a few common structures. This section defines those structures and their requirements for convenient reference by the appropriate element definitions.
When a construct is identified as being of a particular kind, it inherits the corresponding requirements from that construct's definition in this section.
Element Declarations
All four kinds of Shoji Object (Catalog, Entity, View, Order) may be present in a given document. Parsers need a means of distinguishing them. To provide this, an "element" object member MUST be used to indicate the type of the object:
namespace = *char
local_name = *char
qualified_name = DQUOTE namespace ":" local_name DQUOTE
element_member = DQUOTE "element" DQUOTE name-separator qualified_name
For example, Shoji Catalog Objects are objects which include the member: {"element": "shoji:catalog"}. This is a generic, namespaced element declaration approach that we hope will be adopted by other JSON-based media types.
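As an illustration of how a parser might use the "element" member to distinguish object kinds, here is a minimal Python sketch. The function name shoji_kind is hypothetical, and splitting on the first ":" is an assumption of this sketch; the grammar above does not say how a namespace that itself contains ":" should be handled.

```python
import json


def shoji_kind(document: str) -> str:
    """Return the local name ("catalog", "entity", ...) of a Shoji object.

    Hypothetical helper: assumes the namespace contains no ":" so that the
    qualified name can be split on the first colon.
    """
    obj = json.loads(document)
    if not isinstance(obj, dict):
        raise ValueError("a Shoji Object must be a JSON object")
    element = obj.get("element")
    if not isinstance(element, str) or ":" not in element:
        raise ValueError("missing or malformed 'element' member")
    namespace, local_name = element.split(":", 1)
    if namespace != "shoji":
        raise ValueError("not a Shoji element: %r" % element)
    return local_name
```

A caller could then dispatch on the returned string to a handler for catalogs, entities, views, or orders.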
Attributes and Tuples
Catalog, Entity, and View objects utilize the same basic data structures, consisting of "attributes" and "tuples" from the relational model:
shojiAttribute = string
shojiAttributeValue = value
shojiAttributeMember = shojiAttribute name-separator shojiAttributeValue
shojiTuple =
begin-object
*1([ shojiAttributeMember *( value-separator shojiAttributeMember ) ])
end-object
Identifiers
An Identifier is a textual value whose content MUST conform to one of the productions in the IRI specification [RFC3987], encompassed in double-quotes:
shojiIRI = DQUOTE IRI DQUOTE
shojiIdentifier = DQUOTE IRI-reference DQUOTE
Example Identifiers:
"http://www.example.com/listings/836759/views"
"/listings/836759"
"/"
Note that every URI [RFC3986] is also an IRI, so a URI may be used wherever an IRI is specified. There are two special considerations:
- when an IRI that is not also a URI is given for dereferencing, it MUST be mapped to a URI using the steps in Section 3.1 of [RFC3987], and
- when an IRI is serving as a "self" value (see Section 2.5.1), it MUST NOT be so mapped, so that the comparison works as described in Section 2.1.3.1.
Comparing Identifiers
Because of the risk of confusion between IRIs that would be equivalent if they were mapped to URIs and dereferenced, the following normalization strategy SHOULD be applied when generating shojiIdentifier values:
* Provide the scheme, if included, in lowercase characters.
* Provide the host, if any, in lowercase characters.
* Only perform percent-encoding where it is essential.
* Use uppercase A through F characters when percent-encoding.
* Prevent dot-segments from appearing in paths.
* For schemes that define a default authority, use an empty authority
if the default is desired.
* For schemes that define an empty path to be equivalent to a path
of "/", use "/".
* For schemes that define a port, use an empty port if the default is
desired.
* Preserve empty fragment identifiers and queries.
* Ensure that all components of the IRI are appropriately character
normalized, e.g., by using NFC or NFKC.
Instances of shojiIdentifier values can be compared to determine whether an entity or catalog is the same as one seen before. Processors MUST compare shojiIdentifier values on a character-by-character basis (in a case-sensitive fashion). Comparison operations MUST be based solely on the IRI character strings and MUST NOT rely on dereferencing the IRIs or URIs mapped from them.
As a result, two IRIs that resolve to the same resource but are not character-for-character identical will be considered different for the purposes of identifier comparison.
For example, these are four distinct identifiers, despite the fact that they differ only in case:
<http://www.example.org/thing>
<http://www.example.org/Thing>
<http://www.EXAMPLE.org/thing>
<HTTP://www.example.org/thing>
Likewise, these are three distinct identifiers, because IRI %-escaping is significant for the purposes of comparison:
<http://www.example.com/~bob>
<http://www.example.com/%7ebob>
<http://www.example.com/%7Ebob>
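The split between generator-side normalization and consumer-side comparison can be sketched in Python as follows. Both function names are hypothetical, and only one of the normalization steps listed above (uppercase hexadecimal digits in percent-encoding) is shown; the remaining steps are left out of this sketch.

```python
import re

# Matches one percent-encoded octet such as "%7e" or "%7E".
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_percent_encoding(iri: str) -> str:
    """Generator-side step: use uppercase A-F in percent-encoded octets."""
    return _PCT.sub(lambda m: m.group(0).upper(), iri)


def same_identifier(a: str, b: str) -> bool:
    """Consumer-side comparison: plain, case-sensitive string equality.

    No normalization and no dereferencing happens here.
    """
    return a == b
```

Under this sketch, "http://www.example.com/%7ebob" and "http://www.example.com/%7Ebob" compare as different, which is why a generator is advised to normalize before emitting identifiers.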
IRI Patterns
In addition to IRI and IRI-reference tokens, Shoji processors MUST be prepared to parse IRI Pattern expansions:
op = "/" / ";" / "?"
varname = 1*( iunreserved )
required = "!"
vardefault = *( iunreserved / pct-encoded )
var = varname [ required / ( "=" vardefault ) ]
vars = var *( "," var )
expansion = "{" [ op ] vars "}"
And just to save someone looking up [RFC3987] yet again:
iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
An IRI Pattern is a sequence of characters that contains any number of embedded expansions. Each expansion references one or more variables whose values are used when determining the substitution value for an expansion. An IRI Pattern becomes an IRI when all the expansions are substituted with their values (see 2.4.2). The generated IRI will be an IRI reference, i.e. either an absolute IRI or a relative reference.
shojiIRIPattern = DQUOTE IRI-Pattern DQUOTE
shojiIRIPatternArray =
begin-array
*1([ shojiIRIPattern *( value-separator shojiIRIPattern ) ])
end-array
shojiIRIPatternMember = string name-separator shojiIRIPattern
shojiIRIPatternObject =
begin-object
*1([ shojiIRIPatternMember *( value-separator shojiIRIPatternMember ) ])
end-object
IRI Pattern Object Names
When IRI Patterns are contained within a shojiIRIPatternObject, the name portion of each shojiIRIPatternMember identifies the semantics of the link. For example, an Entity that represents a User resource might include a single shojiCatalog member; the identified resource returns a Catalog, to be sure, but the IRIPattern which identifies that resource may be opaque. By supplying the member name "invoices", the Entity declares that the Catalog resource returns Invoice resources related to the current User resource. In this way, shojiIRIPatternObject names function similarly to link relation types found in other specifications. Authors MAY use existing registries of link relation types but are not bound to.
IRI Pattern Substitution
Shoji client processors MUST NOT dereference an IRI Pattern directly; instead, they MUST construct an IRI reference to dereference. This section describes, in prose, the algorithm for doing so.
For each var, if the corresponding variable is defined, substitute its value. The value MUST be a Unicode string. Processors are free to use any data types internally but MUST convert them to a representation as a Unicode string before substitution.
A given variable MAY appear in more than one expansion within a single IRI Pattern; if so, it MUST substitute the same value each time.
If a value for a given variable is not defined, and the var is required, the processor SHOULD raise an error and MUST NOT generate an IRI. If not defined, and the var is not required, substitute the default value if provided, otherwise omit the var from the production entirely.
If no operator is declared, join the substituted values within the expansion with no intermediate characters. For example, the IRI Pattern "{a!,b,c=3}" plus the value a=1 would produce the IRI-reference "13".
If the segment operator "/" is declared, join the substituted values within the expansion with "/" as a prefix character for each. For example, the IRI Pattern "foo{/a!,b,c=3}" plus the value a=1 would produce the IRI-reference "foo/1/3".
If the parameter operator ";" is declared, join each varname within the expansion with its substituted value using the "=" character; if the value is equal to the empty string, the "=" character is omitted. Then join each name-value pair with ";" as a prefix character. For example, the IRI Pattern "foo{;a!,b,c=3}" plus the value a=1 would produce the IRI-reference "foo;a=1;c=3".
If the querystring operator "?" is declared, join each varname within the expansion with its substituted value using the "=" character; the "=" character is included even if the value is the empty string. Then join each name-value pair with "&" as an intermediate character. Finally, prefix the entire production with the "?" character. For example, the IRI Pattern "foo{?a!,b,c=3}" plus the value a=1 would produce the IRI-reference "foo?a=1&c=3". If no vars are substituted (i.e. if they are all undefined and no default values are given), the "?" prefix MUST still be output. This allows IRI Patterns of the form "a{?b}c=3" to produce valid IRI's even where b is undefined (that is, "a?c=3", not "ac=3").
In order to assist HTTP caches in reducing variant copies, the order of query string variables in the expanded IRI SHOULD follow the order of variables declared in the expansion. Shoji servers MAY require this.
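The substitution rules above can be sketched in Python as follows. This is an illustrative reading of the prose, not a reference implementation: the names expand and _parse_var are hypothetical, raising KeyError for a missing required variable is this sketch's choice of error, a var written as "name=" is treated as having an empty-string default, and no percent-encoding of values is attempted.

```python
import re

# One expansion: "{" [ op ] vars "}"
_EXPANSION = re.compile(r"\{([^}]+)\}")


def _parse_var(text):
    """Split one var into (name, required, default); default is None if absent."""
    if text.endswith("!"):
        return text[:-1], True, None
    if "=" in text:
        name, default = text.split("=", 1)
        return name, False, default
    return text, False, None


def expand(pattern, values):
    """Substitute every expansion in an IRI Pattern, returning an IRI reference."""

    def substitute(match):
        body = match.group(1)
        op = ""
        if body[0] in "/;?":
            op, body = body[0], body[1:]
        pairs = []
        for text in body.split(","):
            name, required, default = _parse_var(text)
            if name in values:
                value = str(values[name])  # always substitute a string
            elif required:
                raise KeyError("required variable %r is not defined" % name)
            elif default is not None:
                value = default
            else:
                continue  # undefined, optional, no default: omit the var
            pairs.append((name, value))
        if op == "/":
            return "".join("/" + v for _, v in pairs)
        if op == ";":
            # "=" is omitted when the value is the empty string
            return "".join(";" + n + ("=" + v if v else "") for n, v in pairs)
        if op == "?":
            # "?" is emitted even when no pairs were substituted
            return "?" + "&".join(n + "=" + v for n, v in pairs)
        return "".join(v for _, v in pairs)

    return _EXPANSION.sub(substitute, pattern)
```

Because the pairs are collected in declaration order, the query string variables come out in the order given in the expansion, as recommended above.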
IRI Pattern Matching
Shoji servers MAY employ IRI Patterns to match IRI's to handlers. This section describes an algorithm (based on POSIX Extended Regular Expressions) for doing so. All backslashes ("\") are literal.
Create two empty lists for storing discovered (name, default) pairs. Find expansions within the IRI Pattern using "\{[^}]+\}". Replace each one with a regular expression according to the first character after the "{". If the first character is one of the "op" characters, pop and store it. Split the rest of the expansion by the "," character, then split each atom on the "=" character, and store the resultant (name, default) pairs in order, one list for "?" arguments and another for all other operators, or no operator. If no default is given, store None or another sentinel. Depending on whether the variable is required (i.e. ends with the "!" character) or not, replace the expansion according to the following table:
Operator      Required (!)      Not Required
---------------------------------------------
/             (/[^/]*)          (/[^/]*)?
;             (;%s=?[^/;]*)     (;%s=?[^/;]*)?
?             Empty string      Empty string
No operator   (.+)              (.+)?
The "%s" in either ";" expression in the above table must be replaced with the (escaped) name of the variable.
Once all expansions have been substituted, the resultant regular expression can be used to match the path portion of incoming Request-URI's. The query string portion can be matched against the stored list for "?" expansions in an order-neutral fashion. Shoji servers MAY require that query string substitutions be performed in the order given in the IRI Pattern, in order to reduce HTTP cache variants and to facilitate mapping substituted IRI Pattern values to positional arguments in the matching handler.
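A rough Python sketch of the matching algorithm follows. It makes several assumptions beyond the text: it uses Python's re module rather than a POSIX ERE engine, it emits one regular-expression fragment per variable when an expansion declares several, it escapes the literal text between expansions, and it anchors the result with "^" and "$". The name compile_pattern is hypothetical.

```python
import re

_EXPANSION = re.compile(r"\{([^}]+)\}")


def compile_pattern(pattern):
    """Return (regex, path_vars, query_vars) for an IRI Pattern.

    path_vars and query_vars are lists of (name, default) pairs in
    declaration order; default is None when none was given.
    """
    path_vars, query_vars = [], []
    pieces, last = [], 0
    for match in _EXPANSION.finditer(pattern):
        # Literal text between expansions is matched verbatim.
        pieces.append(re.escape(pattern[last:match.start()]))
        last = match.end()
        body = match.group(1)
        op = ""
        if body[0] in "/;?":
            op, body = body[0], body[1:]
        for atom in body.split(","):
            name, _, default = atom.partition("=")
            required = name.endswith("!")
            if required:
                name = name[:-1]
            pair = (name, default if "=" in atom else None)
            if op == "?":
                # "?" expansions contribute nothing to the path regex.
                query_vars.append(pair)
                continue
            path_vars.append(pair)
            if op == "/":
                fragment = "(/[^/]*)"
            elif op == ";":
                fragment = "(;%s=?[^/;]*)" % re.escape(name)
            else:
                fragment = "(.+)"
            pieces.append(fragment if required else fragment + "?")
    pieces.append(re.escape(pattern[last:]))
    return re.compile("^" + "".join(pieces) + "$"), path_vars, query_vars
```

The returned regular expression is meant for the path portion of a Request-URI only; the query string would be checked separately against query_vars.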
Reserved IRI members
Shoji objects may employ a variety of reserved object member names to declare that a given member's value is an IRI, an IRI-reference, or an array or object of IRI-references. Shoji objects MUST NOT include members with these reserved names with different semantics. Individual API documentation MAY denote additional IRI's with other reserved names or in other ways.
self
Shoji objects may employ a reserved object member to declare the IRI of the object itself:
shojiSelfIRI = DQUOTE "self" DQUOTE name-separator shojiIRI
The object member name "self" is reserved for this purpose. The value MUST be a shojiIRI. Note that the definition of "IRI" excludes relative references.
When a Shoji object is relocated, migrated, syndicated, republished, exported, or imported, the content of its self-member MUST NOT change. Put another way, a self member pertains to all instantiations of a particular Shoji entity or catalog; revisions retain the same content in their self members. It is suggested that the self-member be stored along with the associated resource.
The content of a shojiSelfIRI member MUST be created in a way that assures uniqueness.
Application authors should be aware of the many details of relative URI resolution described in Section 5.2 of [RFC2396]; in particular, that "all but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (rightmost) slash character, if any, are excluded." The two URI's "/users" and "/users/", for example, are different, may point to different resources, and will be combined with relative URI's differently.
For Shoji documents, this implies two different approaches depending on whether trailing slashes are preferred. If catalog resources employ trailing slashes, then their links to subcatalogs, entities, views, and orders may be simpler. For example, the Catalog resource at "/users/" may include a "bills" subcatalog. But if the Catalog is instead exposed at the URI "/users" (no trailing slash), then the subcatalog IRI-reference must be written as "users/bills" in order to be resolved correctly. Both forms are permitted by this specification; application authors are advised to use HTTP redirects to help enforce one or the other as canonical.
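The effect of the trailing slash can be checked with Python's standard urllib.parse.urljoin, which resolves a relative reference against a base. The host example.org and the helper name resolve are illustrative only.

```python
from urllib.parse import urljoin


def resolve(catalog_iri, reference):
    """Resolve a relative IRI-reference against the catalog's own IRI."""
    return urljoin(catalog_iri, reference)


# With a trailing slash, the last path segment of the base is kept.
with_slash = resolve("http://example.org/users/", "bills/")

# Without it, "users" is the last segment and is dropped during resolution,
# so the reference must repeat it.
without_slash = resolve("http://example.org/users", "bills/")
repeated = resolve("http://example.org/users", "users/bills/")
```

Here with_slash and repeated both resolve to "http://example.org/users/bills/", while without_slash resolves to "http://example.org/bills/".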
catalogs
Shoji Catalog and Entity objects may employ a reserved object member to declare the IRI's of related catalogs:
shojiCatalogs = DQUOTE "catalogs" DQUOTE name-separator shojiIRIPatternObject
A GET on any of the dereferenced IRI's in the object MAY result in a Shoji Catalog Object, or MAY result in a response with any other media type. Implementers are encouraged to use Shoji Catalog objects to gain their benefits where possible; however, not all resources are a good fit for the Shoji media type. The presence of an IRI in a "catalogs" member is merely an indication to the consumer that the IRI is considered to refer to a collection that is related to the current object.
views
Shoji Catalog and Entity objects may employ a reserved object member to declare the IRI's of related views:
shojiViews = DQUOTE "views" DQUOTE name-separator shojiIRIPatternObject
A GET on any of the dereferenced IRI's in the object MAY result in a Shoji View Object or any other media type. The inclusion of the IRI in the "views" member is a hint to the processor that the identified resource is a projection of data found elsewhere, and therefore should not be subject to the same expectations of atomicity, cache coherence, or mutability. Unlike body members (section 2.1.7), which are designed mostly to return a scalar or an object as a single attribute value, views are preferred for returning tabular data; that is, arrays of arrays.
orders
Shoji Catalog objects may employ a reserved object member to declare the IRI's of related orders:
shojiOrders = DQUOTE "orders" DQUOTE name-separator shojiIRIPatternObject
A GET on any of the dereferenced IRI's in the object MAY result in a Shoji Order Object or any other media type.
Note that the Catalog does not reserve a member for a single "default order" for either a link or an embedded Order object. Applications are free to define their own as desired; however, they should carefully weigh the benefits of embedding a potentially large order by default (which some applications may ignore), as well as the eventual pressure to provide multiple orders. Constraining the design to follow links in the "orders" object from the beginning eases these concerns immensely.
graph
Shoji Catalog and Order objects employ a reserved object member to declare the graph of the order they represent:
shojiGroupName = string
shojiGroup = begin-object
shojiGroupName name-separator shojiGroupArray
end-object
shojiGroupMember = string / shojiGroup
shojiGroupArray =
begin-array
*1([ shojiGroupMember *( value-separator shojiGroupMember ) ])
end-array
shojiGraph = DQUOTE "graph" DQUOTE name-separator shojiGroupArray
For example, the order represented by the following directed acyclic graph:
  +------+--------+
  V      V        V
  f     {X}      {Y}
         |        |
         |    +---+---+
         V    V   V   V
         d    b  {Q}  a
                  |
              +---+---+
              V       V
              e       c
...where the set members a, b, c, d, e, and f are various members of groups X, Y, and Q, is represented in a Shoji Order as:
{
  "element": "shoji:order",
  "graph": [
    "f",
    {"X": ["d"]},
    {"Y": [
      "b",
      {"Q": ["e", "c"]},
      "a"
    ]}
  ]
}
The "graph" member starts with an array of "top-level" members of the order. This allows orders with multiple root nodes to avoid creating a "dummy" root node, and also allows orders that have a single root node to name that node. Strings from the original set are included as is. Groups are represented as an object containing one member: the name of the group mapped to a further group array. Both may be interleaved in the graph as desired.
This is purposely generic in order to facilitate multiple orders over the same set of identifiers. For example, applications often allow each user to maintain their own order over a given set of entities. When an interface grows from one fixed order to multiple orders, it is easier to make the transition with a separate Order object.
Orders MAY be partial over a subset of a larger set as applications see fit. Members of the larger set which are not present in the order MAY be ignored, included in a "default" group, or follow some other processing rule; this specification does not govern such rules. Note that such members will have an undefined order, even if included in a group by default.
This specification does not define whether identifiers or groups of identifiers may belong to multiple groups within a given order. Applications MAY impose additional constraints on this.
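To show how a client might walk a "graph" member, here is a small Python sketch that flattens a group array into (identifier, group path) pairs in document order. The function name flatten and the choice to report the enclosing group names as a tuple are assumptions of this sketch, not part of the format.

```python
def flatten(graph, path=()):
    """Yield (identifier, group_path) pairs in depth-first document order.

    Strings are members of the ordered set; single-member objects are
    groups whose value is a nested group array.
    """
    for member in graph:
        if isinstance(member, str):
            yield member, path
        elif isinstance(member, dict) and len(member) == 1:
            (name, children), = member.items()
            yield from flatten(children, path + (name,))
        else:
            raise ValueError("group must be an object with exactly one member")


# The example order from this section.
example = ["f", {"X": ["d"]}, {"Y": ["b", {"Q": ["e", "c"]}, "a"]}]
flat = list(flatten(example))
```

For the example graph, flat lists the members in the order f, d, b, e, c, a, each paired with the names of the groups that enclose it.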
body
Shoji Catalog and Entity objects may employ a reserved object member to declare a separation of data from metadata in order to reduce namespace collisions and ease parsing:
shojiBody = DQUOTE "body" DQUOTE name-separator shojiTuple
In general, when a body member is present in a document, processors SHOULD expect the attributes present within the body to be potentially mutable (via an HTTP PUT, PATCH, or POST, for example) and SHOULD expect any members outside the body member to be immutable: potentially writable on insert but not on update. Servers are free, of course, to allow or disallow mutability of any part of the resource. But the presence of a "body" member carries with it strong hints regarding the mutability of data both within and without that member.
Shoji Object Definitions
The "Catalog" Object
The "shojiCatalog" object acts as a container for metadata and data associated with the catalog. Its members consist of:
- a "self" member,
- other metadata members,
- an optional catalogs object (with references to related catalog documents),
- an optional views object (with references to related view documents),
- an optional orders object (with references to related order documents),
- an optional body object (with attributes of the catalog itself),
- an optional index object (with references to its collected entity documents), and
- an optional graph member (providing a default order to the index):
shojiCatalogElement = DQUOTE "element" DQUOTE name-separator "shoji:catalog"
shojiCatalog =
    begin-object
    shojiCatalogElement
    value-separator shojiSelfIRI
    *1( value-separator shojiCatalogs )
    *1( value-separator shojiViews )
    *1( value-separator shojiOrders )
    *1( value-separator shojiBody )
    *1( value-separator shojiIndex )
    *1( value-separator shojiGraph )
    *( value-separator member )
    end-object
Shoji catalog objects MAY follow the order of members given above; however, the JSON specification defines the "object" type as an unordered set of members, and Shoji Processors MUST be prepared to find self, index, catalogs, views, orders, or other members in any order.
The "other metadata members" MAY include schemas for, or prototypes of, Shoji Entity objects, or IRI's identifying such. Their syntax is not mandated nor constrained by this specification.
Note in particular that the "index" member is optional. A common use of catalogs is as a collection of sub-catalogs or views, without itself referring to any entities directly.
index
Shoji Catalog objects may employ a reserved object member to declare the IRI's of collected entities, together with attributes that index those entities according to various arrangements:
shojiIndexMember = shojiIdentifier name-separator shojiTuple
shojiIndexObject =
begin-object
*1([ shojiIndexMember *( value-separator shojiIndexMember ) ])
end-object
shojiIndex = DQUOTE "index" DQUOTE name-separator (shojiIndexObject / null)
Catalog objects MAY collect zero, one, or any number of resources by presenting their identifiers in the "index" object. A dereference of any of the IRI's in the index object MAY result in a Shoji Entity Object, or MAY result in a response with any other media type. Implementers are encouraged to use Shoji Entity objects to gain their benefits where possible; however, not all resources are a good fit for the Shoji media type. Indeed, part of the motivation for placing attributes in each "index" tuple is to allow a Catalog to decorate a resource of a different media type with attributes in the Shoji format.
Each IRI-reference in the index maps to a shojiTuple. The attributes in these tuples are intended to be used for comparison and ordering by applications, in order to offer to the user or user-agent an interface for selecting one or more entities and navigating to their individual resources. For example, a collection of Project entities might include "name" and "department" attributes in an index, so that one application may display the list of projects grouped by department, while another application displays them in alphabetical order by name. The user or user-agent then selects a particular Project based on these attributes and dereferences its IRI to reach the next state.
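As a sketch of how two applications might arrange the same index differently without fetching any entity, consider the following Python helpers. The function names and the sample project data are hypothetical.

```python
from collections import defaultdict


def group_by(index, attribute):
    """Group index IRI-references by the value of one tuple attribute."""
    groups = defaultdict(list)
    for iri, attrs in index.items():
        groups[attrs.get(attribute)].append(iri)
    return dict(groups)


def sort_by(index, attribute):
    """Return index IRI-references ordered by one tuple attribute."""
    return sorted(index, key=lambda iri: index[iri][attribute])


# Hypothetical "index" member of a projects catalog.
projects = {
    "1/": {"name": "Zeta", "department": "R&D"},
    "2/": {"name": "Alpha", "department": "Sales"},
    "3/": {"name": "Mid", "department": "R&D"},
}
by_department = group_by(projects, "department")
by_name = sort_by(projects, "name")
```

Only after the user picks a project from one of these arrangements does the client need to dereference that project's IRI.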
This specification assigns no significance to the order of references within the index object. Applications are free to order and employ their collected attributes for their own purposes. Additional members or resources may also be used to provide order for the references in the index.
The use of null in a shojiIndex is only meaningful for use with HTTP PATCH (see 2.2.1.4 below). If spuriously present in a GET or PUT, the receiver SHOULD treat the given index member as if it were not present.
Collection versus containment
Although Catalog objects often "contain" entities, in the sense that the entities are considered to have no existence outside of the containing catalog, this is not always the case. In many circumstances, multiple catalogs MAY refer to the same entity. In other situations, a catalog will refer to one or more entities without containment; that is, the entity resources are considered to exist whether the referring catalog exists or not. We say the catalog "collects" the resources rather than "containing" them. The entity IRI might not share any path segments, or even host or scheme, with the catalog IRI. This difference can have profound effects on the design of an API.
For example, a "containing" catalog might be designed such that an HTTP POST of a Shoji Entity object to the catalog IRI results in a new entity resource being created, and its IRI appended to the catalog's "index" member. However, a different "collecting" catalog may be designed to be updated directly with an HTTP PATCH to the catalog IRI to add a new reference to a resource created via HTTP PUT or some other means. Others may be updated internally with no user-facing interface to alter them at all. A catalog merely indicates that the entities which it references are related somehow. Catalogs should indicate the process by which a client updates their index and the referenced entities (for example, in a "description" member documented by the API), but such mechanisms are outside the scope of this specification.
Representing objects at scale
A server or application may combine attributes from the index and attributes from an entity in order to obtain a unified internal representation of an "object" if it desires, but attributes that are duplicated complicate this immensely, especially in the presence of caching. By keeping them separate, clients can also remain unaware of whether the entities referenced by a given catalog reside on the same system or not, and the server is then free to distribute them to multiple partitions as it sees fit.
When a catalog contains only a few entities, all managed by the same host, then it seems simple to return a JSON object which maps ids to complete entities. But when the number of entities grows large, then the design must be modified to divide the entities among separate URIs, often on separate hosts.
However, it then quickly becomes prohibitively expensive for the server or client to gather data from each entity, especially if the entities are distributed among multiple hosts. A catalog of even 100 entities might then refer to 100 different hosts. The catalog resource's server-side implementation, or worse, the calling application, could try to poll every entity in every partition to obtain additional data, but that quickly degrades performance when the number of partitions grows large.
A server might instead try duplicating entity attributes in its own index, but that duplicated information then gets out of sync with the actual entities, especially in the presence of aggressive cache networks. An application might attempt to align catalog partitions with implementation partitions, but this is a design choice that is not universally applicable, such as the case where multiple catalogs are used to implement additional alternate indices. Instead, including attributes in the index allows the Catalog to function much like an "index-only query" in a relational database, where the request can be served in a single read rather than a separate read per object.
Although the catalog's "index" contains tuples and the entity's "body" member contains tuples, it is important for scale-agnostic applications that catalog tuples do not duplicate any attributes found in the referenced entities' tuples. This is in contrast to the typical use of an index in a relational database: there, one may use transactions to maintain integrity as the entity and index are updated with the same information. Although both two-phase commit and "eventually consistent" confirmation protocols do exist for web APIs, it is much easier to avoid the synchronization problem completely simply by not duplicating data. The IRI itself is the only exception to this strategy since it functions as the "primary key"; all the IRI's together constitute the primary index.
Using a Catalog with HTTP PATCH
The Shoji Catalog object is its own HTTP PATCH [RFC5789] format. When a server allows the PATCH method on a resource, the Catalog MAY be an acceptable payload with the following intent:
* This specification does not govern whether PATCH is allowed on a
non-existent resource or not. The server MAY allow this (according
to the rules below) or not as it sees fit.
* Only the "index", "body", or "graph" member is considered when mutating
the stored representation. No other members shall be altered using this
format. Other members MAY be sent in the request payload but SHOULD be
ignored.
* Any index tuple present in the stored representation but not in the
request payload is not altered.
* Any index tuple present in both the stored representation and the
request payload has its stored attributes overwritten by those in
the payload. This MAY result in simultaneous updates to an associated
Entity resource or other side-effects as the server desires. Any
attributes not mentioned in the request payload are not altered.
There is no facility for removing individual attributes.
There is no facility for updating only partial contents of an attribute.
* Any attribute present in the "body" tuple of the stored representation
but not in the request payload is not altered.
* Any attribute present in the "body" tuple of the request payload but not
in the stored representation shall be added. The server MUST accept any
attributes in the tuple which it expects, and SHOULD store any attributes
it does not expect, in order to allow clients to evolve independently.
The server is free to provide default values for attributes which are
not mentioned when PATCH operates on a null representation.
* Any body attribute present in both the stored representation and
the request payload is overwritten by the value in the payload.
There is no facility for removing individual attributes.
There is no facility for updating only partial contents of an attribute.
Adding and removing entities from a Catalog requires two different strategies depending on whether the Catalog is expected to contain its referenced entities or merely collect them (see 2.2.1.2 above).
If the Catalog collects its referenced entities, the following additional rules are applied to the index sent via PATCH:
* Any index tuple present in the request payload but not in the stored
representation shall be added. The server MUST accept any attributes
in the tuple which it understands, and SHOULD store any attributes it
does not understand, in order to allow clients to evolve independently.
The server is free to provide default values for attributes which are
not mentioned. This addition MAY result in additional Entity resources
being simultaneously created, or other side-effects as the server desires.
* If the value of the member in the request payload is JSON null instead
of a JSON object, then the entire tuple is removed from the stored
representation. This removal MAY result in the removal of associated
Entity resources, or other side-effects as the server desires.
The server SHOULD NOT raise an error if no such tuple exists
(use ETags with If-Match to manage lost updates in this case instead).
If the Catalog contains its referenced entities, add a new entity by POSTing a Shoji Entity object to the catalog URI rather than by sending PATCH. The server is free to remove any attributes from the submitted Entity and publish them in the Catalog index instead.
Note that the HTTP PATCH method is atomic, as per [RFC5789]. If a normative requirement is violated by a Shoji document sent with PATCH, or if an operation is not successful, evaluation of the document SHOULD terminate and application of the entire patch document SHALL NOT be deemed successful.
See [RFC5789], Section 2.2 for considerations regarding handling errors when Shoji is used with the HTTP PATCH method, including suggested status codes to use to indicate various conditions.
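The index rules above, for a Catalog that collects its referenced entities, can be sketched as a merge over plain dictionaries. This is only an illustration under stated assumptions: the function name `patch_catalog_index` is hypothetical, the index is assumed to be already parsed from JSON, and side-effects on Entity resources, default values, and ETag handling are left out. Working on a copy is one simple way to respect the atomicity requirement, since a rejected payload leaves the stored index untouched.

```python
import copy


def patch_catalog_index(stored_index, payload_index):
    """Return a new index with the PATCH payload applied.

    Hypothetical helper: both arguments are dicts mapping an identifier
    to a tuple (dict) or, in the payload only, to None (JSON null).
    The stored index is never mutated, so a failure changes nothing.
    """
    result = copy.deepcopy(stored_index)
    for key, patch_tuple in payload_index.items():
        if patch_tuple is None:
            # JSON null removes the whole tuple; a missing tuple is not an error.
            result.pop(key, None)
        elif not isinstance(patch_tuple, dict):
            # Anything else violates the format; reject the entire patch.
            raise ValueError("index member %r must be an object or null" % key)
        elif key in result:
            # Overwrite only the attributes mentioned in the payload.
            result[key].update(copy.deepcopy(patch_tuple))
        else:
            # Tuple absent from the stored index: add it as sent.
            result[key] = copy.deepcopy(patch_tuple)
    return result


stored = {"1/": {"name": "Ann", "status": "active"}, "2/": {"name": "Bob"}}
payload = {"1/": {"status": "inactive"}, "2/": None, "3/": {"name": "Cy"}}
patched = patch_catalog_index(stored, payload)
```

In this usage, tuple "1/" keeps its "name" attribute and has only "status" overwritten, tuple "2/" is removed, and tuple "3/" is added.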
The "Entity" Object
The "shojiEntity" object acts as a container for data associated with the entity; that is, its contained attributes, sub-catalog identifiers, fragment identifiers, and view identifiers:
shojiEntityElement = DQUOTE "element" DQUOTE
name-separator "shoji:entity"
shojiEntity = begin-object
shojiEntityElement
value-separator shojiSelfIRI
*1( value-separator shojiCatalogs )
*1( value-separator shojiFragments )
*1( value-separator shojiViews )
*( value-separator member )
*1( value-separator shojiBody )
end-object
fragments
Shoji Entity objects may employ a reserved object member to declare the IRIs of fragment entities:
shojiFragments = DQUOTE "fragments" DQUOTE name-separator shojiIRIPatternObject
A GET on any of the dereferenced IRIs in the object MAY result in a Shoji Entity Object, or MAY result in a response with any other media type. The exact semantics of what constitutes a "fragment" of an entity of any media type are not defined by this specification.
Shoji fragment entities are entities that contain a subset of the attributes for a parent Entity. For example, if a single "user" Entity has public and private attributes, a parent Entity at /users/1234/ might contain the public attributes and possess a fragment at /users/1234/private/ containing private attributes; in this example, the parent Entity would possess a "fragments" member with the value {"Private members": "private/"}.
Parent Entities MAY include tuple values which are complete fragment Entity objects, or MAY include none, any or all of the attributes of a fragment in their own body tuple. However, implementers should be aware of the cache-invalidation problems which arise when duplicating data in multiple resources, and trade simplicity for convenience sparingly.
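The "user" example above can be sketched as follows. The host name, the attribute values, and the helper `fragment_iris` are hypothetical, and the sketch assumes that fragment references contain no pattern expansions and are resolved relative to the parent's "self" IRI.

```python
from urllib.parse import urljoin

# Hypothetical parent Entity holding only the public attributes.
parent = {
    "element": "shoji:entity",
    "self": "http://example.com/users/1234/",
    "fragments": {"Private members": "private/"},
    "body": {"login_name": "ann"},
}


def fragment_iris(entity):
    """Map each fragment name to an absolute IRI, resolved against "self"."""
    base = entity["self"]
    fragments = entity.get("fragments", {})
    return {name: urljoin(base, ref) for name, ref in fragments.items()}


resolved = fragment_iris(parent)
```

A client would then GET the resolved IRI only when it needs the private attributes, which keeps the parent Entity small and independently cacheable.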
Using an Entity with HTTP PATCH
The Shoji Entity object is its own HTTP PATCH [RFC5789] format. When a server allows the PATCH method on a resource, the Entity MAY be an acceptable payload with the following intent:
* This specification does not govern whether PATCH is allowed on a
non-existent resource or not. The server MAY allow this (according
to the rules below) or not as it sees fit.
* Only the "body" member is considered when mutating the stored
representation. No other members shall be altered using this format.
Other members MAY be sent in the request payload but SHOULD be ignored.
* Any attribute present in the "body" tuple of the stored representation
but not in the request payload is not altered.
* Any attribute present in the "body" tuple of the request payload but not
in the stored representation shall be added. The server MUST accept any
attributes in the tuple which it expects, and SHOULD store any attributes
it does not expect, in order to allow clients to evolve independently.
The server is free to provide default values for attributes which are
not mentioned when PATCH operates on a null representation.
* Any body attribute present in both the stored representation and
the request payload is overwritten by the value in the payload.
There is no facility for removing individual attributes.
There is no facility for updating only partial contents of an attribute.
Note that the HTTP PATCH method is atomic, as per [RFC5789]. If a normative requirement is violated by a Shoji document sent with PATCH, or if an operation is not successful, evaluation of the document SHOULD terminate and application of the entire patch document SHALL NOT be deemed successful.
See [RFC5789], Section 2.2 for considerations regarding handling errors when Shoji is used with the HTTP PATCH method, including suggested status codes to use to indicate various conditions.
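The Entity rules above reduce to a shallow merge of the "body" tuple. The sketch below is illustrative only: `patch_entity` is a hypothetical name, both documents are assumed to be already parsed from JSON, and server-supplied defaults are omitted. As with any atomic update, it builds the result on a copy so that a rejected payload leaves the stored representation unchanged.

```python
import copy


def patch_entity(stored, payload):
    """Return a new Entity with the payload's "body" merged in.

    Hypothetical helper: members other than "body" in the payload
    are ignored, and the stored Entity is never mutated.
    """
    result = copy.deepcopy(stored)
    body = payload.get("body")
    if body is None:
        # Nothing to apply; every stored attribute is left as-is.
        return result
    if not isinstance(body, dict):
        raise ValueError('"body" must be a JSON object')
    merged = result.get("body") or {}
    # Attributes in the payload are added or overwritten; others are kept.
    merged.update(copy.deepcopy(body))
    result["body"] = merged
    return result


stored_entity = {
    "element": "shoji:entity",
    "self": "http://example.com/users/1234/",
    "body": {"login_name": "ann", "address": "1 Main St"},
}
request = {
    "self": "http://example.com/users/9999/",  # ignored: not part of "body"
    "body": {"address": "2 Oak Ave", "nickname": "A"},
}
updated = patch_entity(stored_entity, request)
```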
The "View" Object
The "shojiView" object acts as a container for a projection of data from multiple catalogs or entities:
shojiViewElement = DQUOTE "element" DQUOTE
name-separator "shoji:view"
shojiView = begin-object
shojiViewElement
value-separator shojiSelfIRI
*1( value-separator shojiViews )
*( value-separator member )
*1( value-separator shojiValue )
end-object
value
Shoji View objects may employ a reserved object member to declare a separation of data from metadata in order to reduce namespace collisions and ease parsing:
shojiValue = DQUOTE "value" DQUOTE name-separator value
Views generally contain generated or calculated output, and therefore are not designed to be used with HTTP PUT, POST, PATCH, or DELETE. The use of a "value" member in a View is simply to allow documents to distinguish between data and metadata, along whatever lines the domain requires.
The "value" member of a Shoji View Object contains a single JSON value, although that value may be an arbitrarily complex object or array. It MAY contain embedded Catalog or Entity objects; however, authors MUST still follow the "self" requirement for these objects and should keep in mind that Catalog and Entity member values carry with them implied independent IRIs of their own. If an embedded Catalog or Entity is not intended to "stand on its own" outside the View (that is, answer to various HTTP methods on "self" and its member values' IRIs), authors are encouraged to emit tabular data rather than lists of Catalog or Entity objects, or to move attributes into the appropriate Catalog index or Fragment body. The lack of a common idiom for returning arrays of objects is intentional, because such arrays lead to implementations that heavily duplicate data and rely on fine-grained queries for network efficiency. Shoji is designed to rely on caching for network efficiency.
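As one possible reading of "tabular data", a View might carry column names and rows rather than a list of embedded Entity objects. The "columns" and "rows" member names, the function `hours_view`, and the example IRIs below are all assumptions made for illustration; this specification does not define a table layout.

```python
def hours_view(self_iri, hours_by_employee):
    """Build a View whose "value" is a small table (hypothetical layout)."""
    columns = ["employee", "hours_worked"]
    # One row per employee IRI, sorted so the output is deterministic.
    rows = [[iri, hours] for iri, hours in sorted(hours_by_employee.items())]
    return {
        "element": "shoji:view",
        "self": self_iri,
        "value": {"columns": columns, "rows": rows},
    }


view = hours_view(
    "http://example.com/views/hours/",
    {
        "http://example.com/employees/2/": 12.5,
        "http://example.com/employees/1/": 40,
    },
)
```

Each row refers to an employee by IRI only, so no Entity attributes are duplicated inside the View.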
The "Order" Object
The "shojiOrder" object provides an order over a set of strings using named subsets. It is primarily intended to provide an ordering over a catalog's index keys, such as "tagging" the entities to produce an ordered tree of groups. However, it may be used to order any set of string identifiers as an application sees fit.
shojiOrderElement = DQUOTE "element" DQUOTE
name-separator "shoji:order"
shojiOrder = begin-object
shojiOrderElement
*1( value-separator shojiSelfIRI )
*( value-separator member )
value-separator shojiGraph
end-object
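To illustrate how a "graph" might be consumed, the sketch below walks a group array and yields each string identifier together with the path of group names that encloses it. The function name `walk_graph` and the sample identifiers are hypothetical; the structure follows the shojiGroupArray rule, where each member is either a string or a single-member group object.

```python
def walk_graph(graph, path=()):
    """Yield (group_path, identifier) pairs in document order."""
    for member in graph:
        if isinstance(member, str):
            # A plain string is an identifier at the current level.
            yield path, member
        else:
            # A group object maps one group name to a nested array.
            for name, children in member.items():
                yield from walk_graph(children, path + (name,))


graph = ["1/", {"Staff": ["2/", {"Managers": ["3/"]}]}, "4/"]
ordered = list(walk_graph(graph))
```

The resulting list preserves the order given by the document, so a client could render it directly as an ordered tree of groups.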
IANA Considerations
The "application/shoji+json" media-type
A Shoji Document, when serialized as JSON, MAY be identified with the following media type:
- MIME media type name: application
- MIME subtype name: shoji+json
- Mandatory parameters: None.
- Optional parameters: None.
- Encoding considerations: Identical to those of "application/json" as described in [RFC4627], Section 3.
- Security considerations: As defined in this specification.
- Interoperability considerations: There are no known interoperability issues.
- Published specification: This specification.
- Applications that use this media type: No known applications currently use this media type.
Additional information:
- Magic number(s): None.
- File extension: .shoji
- Base URI: As specified in [RFC2396], Section 6.
- Macintosh File Type code: TEXT
- Person and email address to contact for further information: Robert Brewer fumanchu@aminus.org
- Intended usage: COMMON
- Restrictions on usage: None.
- Change controller: Robert Brewer fumanchu@aminus.org
Security Considerations
URIs
Shoji Processors handle URIs. See Section 7 of [RFC3986].
IRIs
Shoji Processors handle IRIs. See Section 8 of [RFC3987].
Spoofing
Shoji Processors should be aware of the potential for spoofing attacks where the attacker publishes a Shoji document with self or index members, perhaps with a falsified shojiIRI value duplicating the identity of another document. For example, a Shoji Processor could suppress the display of duplicate entries by displaying only one entry from a set of entries with identical self values. In that situation, the Shoji Processor might also take steps to determine whether the entries originated from the same publisher before considering them duplicates.
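One way a Processor might apply this advice is sketched below. It treats two entries as duplicates only when they share both a "self" value and a publisher, and it approximates the publisher by the scheme and host of the URL each document was fetched from. That approximation, the function names, and the example URLs are assumptions; a real Processor may need a stronger notion of publisher identity.

```python
from urllib.parse import urlsplit


def origin(url):
    """Approximate the publisher by (scheme, host[:port]) of the source URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc)


def suppress_duplicates(entries):
    """Keep the first entry per (publisher, self) pair.

    `entries` is an iterable of (source_url, document) pairs, where
    source_url is where the document was actually retrieved from.
    """
    seen = set()
    kept = []
    for source_url, doc in entries:
        key = (origin(source_url), doc.get("self"))
        if key in seen:
            continue
        seen.add(key)
        kept.append((source_url, doc))
    return kept


claimed = {"element": "shoji:entity", "self": "http://example.com/users/1/"}
entries = [
    ("http://example.com/users/1/", claimed),
    ("http://example.com/users/1/?copy", claimed),
    ("http://attacker.example/fake", claimed),  # same "self", other publisher
]
kept = suppress_duplicates(entries)
```

Here the second entry is suppressed as a true duplicate, while the third is kept separate so that it cannot silently replace the first.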
Shoji Document Parsing
Shoji Processors MUST NOT use JavaScript eval() or similar mechanisms to parse Shoji Documents; doing so could expose them to malicious code from untrusted sources or XSS vectors. JSON requires shipped documents to be of top-most type "object" or "array", which ameliorates these vectors somewhat.
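A minimal sketch of safe parsing uses a dedicated JSON parser, which never executes its input, and then checks the document's shape. The function name `parse_shoji` and the check on the "element" prefix are illustrative choices rather than requirements of this specification.

```python
import json


def parse_shoji(text):
    """Parse a Shoji document with a real JSON parser, never eval()."""
    doc = json.loads(text)  # raises json.JSONDecodeError on non-JSON input
    if not isinstance(doc, dict):
        raise ValueError("a Shoji document must be a JSON object")
    element = doc.get("element")
    if not isinstance(element, str) or not element.startswith("shoji:"):
        raise ValueError('missing or unrecognized "element" member')
    return doc


doc = parse_shoji('{"element": "shoji:entity", "self": "http://example.com/users/1/"}')
```

Input that is executable code rather than JSON is rejected by the parser itself, before any Shoji-specific check runs.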
References
Normative References
- [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
- [RFC2396] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.
- [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
- [RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media Types", RFC 3023, January 2001.
- [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
- [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.
- [RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006.
- [RFC5789] Dusseault, L. and J. Snell, "PATCH Method for HTTP", RFC 5789, March 2010.
Informative References
- [RFC2434] Narten, T. and H. T. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.
- [RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005.
Contributors
The following people contributed to preliminary versions of this document:
- Robert Brewer
Collected ABNF
This appendix is informative.
shojiIRI = DQUOTE IRI DQUOTE
shojiIdentifier = DQUOTE IRI-reference DQUOTE
iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
op = "/" / ";" / "?"
varname = 1*( iunreserved )
required = "!"
vardefault = *( iunreserved / pct-encoded )
var = varname [ required / ( "=" vardefault ) ]
vars = var *( "," var )
expansion = "{" [ op ] vars "}"
shojiIRIPattern = DQUOTE IRI-Pattern DQUOTE
shojiIRIPatternArray =
begin-array
*1([ shojiIRIPattern *( value-separator shojiIRIPattern ) ])
end-array
shojiIRIPatternMember = string name-separator shojiIRIPattern
shojiIRIPatternObject =
begin-object
*1([ shojiIRIPatternMember *( value-separator shojiIRIPatternMember ) ])
end-object
shojiAttribute = string
shojiAttributeValue = value
shojiAttributeMember = shojiAttribute name-separator shojiAttributeValue
shojiTuple =
begin-object
*1([ shojiAttributeMember *( value-separator shojiAttributeMember ) ])
end-object
shojiIndexMember = shojiIdentifier name-separator shojiTuple
shojiIndexObject =
begin-object
*1([ shojiIndexMember *( value-separator shojiIndexMember ) ])
end-object
shojiSelfIRI = DQUOTE "self" DQUOTE name-separator shojiIRI
shojiCatalogs = DQUOTE "catalogs" DQUOTE name-separator shojiIRIPatternObject
shojiIndex = DQUOTE "index" DQUOTE name-separator (shojiIndexObject / null)
shojiFragments = DQUOTE "fragments" DQUOTE name-separator shojiIRIPatternObject
shojiViews = DQUOTE "views" DQUOTE name-separator shojiIRIPatternObject
shojiOrders = DQUOTE "orders" DQUOTE name-separator shojiIRIPatternObject
namespace = *char
local_name = *char
qualified_name = DQUOTE namespace ":" local_name DQUOTE
element_member = DQUOTE "element" DQUOTE name-separator qualified_name
shojiViewElement = DQUOTE "element" DQUOTE
name-separator "shoji:view"
shojiValue = DQUOTE "value" DQUOTE name-separator value
shojiView = begin-object
shojiViewElement
value-separator shojiSelfIRI
*1( value-separator shojiViews )
*( value-separator member )
*1( value-separator shojiValue )
end-object
shojiEntityElement = DQUOTE "element" DQUOTE
name-separator "shoji:entity"
shojiBody = DQUOTE "body" DQUOTE name-separator shojiTuple
shojiEntity = begin-object
shojiEntityElement
value-separator shojiSelfIRI
*1( value-separator shojiCatalogs )
*1( value-separator shojiFragments )
*1( value-separator shojiViews )
*( value-separator member )
*1( value-separator shojiBody )
end-object
shojiOrderElement = DQUOTE "element" DQUOTE
name-separator "shoji:order"
shojiGroupName = string
shojiGroup = begin-object
shojiGroupName name-separator shojiGroupArray
end-object
shojiGroupMember = string / shojiGroup
shojiGroupArray =
begin-array
*1([ shojiGroupMember *( value-separator shojiGroupMember ) ])
end-array
shojiGraph = DQUOTE "graph" DQUOTE name-separator shojiGroupArray
shojiOrder = begin-object
shojiOrderElement
*1( value-separator shojiSelfIRI )
*( value-separator member )
value-separator shojiGraph
end-object
shojiCatalogElement = DQUOTE "element" DQUOTE
name-separator "shoji:catalog"
shojiOrders = DQUOTE "orders" DQUOTE name-separator shojiIRIPatternObject
shojiCatalog = begin-object
shojiCatalogElement
value-separator shojiSelfIRI
*1( value-separator shojiCatalogs )
*1( value-separator shojiViews )
*1( value-separator shojiOrders )
*1( value-separator shojiBody )
*1( value-separator shojiIndex )
*1( value-separator shojiGraph )
*( value-separator member )
end-object