Open
Description
CBOR (RFC 7049, http://cbor.io/) is often considered a binary version of JSON; however, it implements a superset of functionality, including native dates, byte (octet) strings (JSON strings are Unicode text), integers, URIs, and different storage formats for floating-point numbers and for fixed- and variable-sized integers.
For draft-5, we need not add features specific to CBOR, but we should consider the ways that CBOR might be used, and make sure there are no definitions in outright opposition to this goal.
Activity
yoshuawuyts commented on Aug 30, 2015
This sounds reasonable.
handrews commented on Sep 25, 2016
As I understand it, there would be only a few things involved in basic "support" for CBOR
It would be good to have even that very basic support in draft 05 so that people are encouraged to start playing with the concept.
The media type is particularly important to claim: if you're building a system that relies on standardized media types, you need one declared so that you can reasonably use it.
epoberezkin commented on Nov 22, 2016
I don't understand what the value is. If CBOR is used only as an alternative way to encode JSON, why does it need to be mentioned in the spec in any way? Why not YAML then? Anything that maps to JSON can be used instead of it, so why bother mentioning it in the spec?
awwright commented on Nov 23, 2016
YAML doesn't have a formal standard that we're able to reference - we'd have to reference the webpage, but I think if I picked one over another, CBOR has a well defined mapping to JSON, and it also demonstrates diversity a bit better because it's binary.
epoberezkin commented on Nov 23, 2016
@awwright you are missing the point. I was just using YAML as an equally nonsensical example. Why include in the standard things that have nothing to do with it?
awwright commented on Nov 23, 2016
As it relates to this issue, it helps avoid a monoculture -- JSON is a lot better than other media types for a lot of reasons, but other file formats might be better for different use cases.
Some might support comments, or be more human readable in general.
Some might be more compact or faster to parse, like a binary representation.
Some might be even more compact -- The EXI people have been in contact and interested in applying JSON Schema to JSON, where EXI is normally only used for XML.
So there's a variety of reasons you might want to support an alternate form of JSON-encodable data.
Why reference CBOR in particular? Because I think it's appropriate to issue an example that shows the breadth of what is supported, and CBOR is a standardized media type explicitly similar to JSON, that targets a different audience with a binary encoding.
epoberezkin commented on Nov 23, 2016
@awwright I completely understand the desire to use other formats to represent JSON data. I disagree with the need to include it in the JSON Schema spec. JSON Schema is JSON data. Users can use any format they wish that maps to JSON. It's a much more general question than needs to be discussed in this spec. Why not keep it simple? Everybody seems to like simple ...
epoberezkin commented on Nov 24, 2016
To clarify: is there any aspect of using CBOR etc. to represent JSON Schema that makes it different from using it to represent any other JSON data? If there is, then it belongs in this spec. If there isn't, it would just litter the spec with trivial and general observations.
handrews commented on Nov 24, 2016
@epoberezkin I think the change is to indicate the applicability of JSON Schema to media types other than JSON by citing the closest related interesting media type rather than encoding JSON Schema in CBOR (which needs no explanation).
I work with/have worked with teams that are very sensitive to performance constraints. Most of the time when I get people to actually measure JSON "overhead", it turns out to be insignificant. But occasionally it is a significant factor, or the environment is so constrained that nearly any improvement is significant (which is basically what CBOR was designed for).
I generally try to push people towards CBOR rather than protobuf or other RPC-oriented serialization formats, and I usually have to show a lot of evidence of it as a broadly accepted media type for JSON-compatible binary environments.
So for me, the presence of CBOR in the JSON Schema spec strengthens my hand when making that case, so this change is one that I consider very valuable. And it's a tiny statement that costs us basically nothing.
epoberezkin commented on Nov 24, 2016
It increases the word count without changing anything. These applicability statements belong to separate publications rather than to the spec, because there are hundreds of other use cases and because they don't change anything from the spec perspective. But I'll leave this argument.
handrews commented on Feb 15, 2017
@mkovatsc thanks for the catch, I updated the comment
@awwright perhaps we should have a separate issue for encoding JSON Schema in CBOR? This seemed both important and reasonable at the WoT conference.
@mkovatsc could you file it as a separate issue and mention that use case? It's good to get things filed from people other than the usual suspects here :-)
mkovatsc commented on Feb 24, 2017
I created #259 for concise encodings of JSON Schema, e.g., CBOR.
The use case I mentioned is for this issue here: JSON Schema used to describe CBOR documents.
awwright commented on Mar 3, 2020
Now that JSON Schema is a bit more mature with vocabularies, I think I can narrow our considerations down to two things:
(1) Some parts of CBOR are cosmetic. For example, like how JSON allows you to represent the same number multiple ways (`4000` or `4e3`), CBOR lets you encode numbers in different ways as well. JSON Schema could consider these purely cosmetic differences, and not enforce a difference between them. Most values in CBOR can be represented in JSON this way; some exceptions exist:
We would have to decide if implementations are allowed to accept a superset of JSON values for a "type" keyword, or if you must use a new keyword (since broadening the range of a type could cause problems—applications expect "number" to be real, and exclude NaN/inf).
(2) For the cases where the distinction really does matter for some strange reason, for example, a value must use a specific tag; we can build a $vocabulary and/or meta-schema that describes the requirement.
Also, CDDL (https://tools.ietf.org/html/rfc8610) is now an RFC.
jtbandes commented on Dec 15, 2022
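To illustrate the JSON side of point (1), here is a minimal Python sketch using only the standard `json` module. It shows that the two spellings parse to numerically equal values, and that NaN and infinity are rejected when strict JSON output is requested. The helper name `json_can_encode_number` is made up for this example; how a CBOR-aware validator should treat these cases is still the open question above.

```python
import json
import math

# Two textual spellings of the same number in JSON.
a = json.loads("4000")  # Python parses this as an int
b = json.loads("4e3")   # Python parses this as a float

# The parsed types differ, but the values compare equal as numbers.
# This is the kind of "cosmetic" difference a validator would ignore.
same_value = (a == b)


def json_can_encode_number(x: float) -> bool:
    """Return True if x can be written as a strict JSON number.

    Hypothetical helper for illustration: with allow_nan=False the
    standard encoder raises ValueError for NaN and +/-infinity.
    """
    try:
        json.dumps(x, allow_nan=False)
        return True
    except ValueError:
        return False
```

Under these assumptions, `json_can_encode_number(math.nan)` is false, which is why widening "number" to cover every CBOR float could surprise applications that expect real values only.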
Are there any common best practices today for representing CBOR-specific types (such as binary data) in JSON Schema?
gregsdennis commented on Dec 15, 2022
@jtbandes at the moment, I think the best you have is using the `content*` keywords, which will produce annotations. You'd then need to read those annotations and deal with them in your application.
jtbandes commented on Dec 15, 2022
Yes, we've considered that. However, the spec states that the `content*` keywords are for data encoded as strings, which CBOR binary is not.
gregsdennis commented on Dec 15, 2022
I imagine if you have CBOR data, and you can get it into the JSON data model, any existing validator could handle it. (You don't need to translate the CBOR binary to JSON text, just get it into the data model in memory.) Outside of that, I don't see it being done without some special handling.
gregsdennis commented on Dec 15, 2022
(But CBOR is a bit out of my knowledge space. I'm just reading up on it.)
jtbandes commented on Dec 15, 2022
We don't need to get it into the JSON data model, really. The context is a web application that can parse and visualize data in many different encodings, of which JSON is one, but also Protobuf, FlatBuffers, CDR, and soon CBOR. Because JavaScript is such a dynamic language, a schema is not strictly required for all data, but the application has several features which are made available when schemas are known in advance of the data itself. (Currently we do not perform explicit validation using the schemas, but we assume the decoded data conforms to the given schema. In the case of JSON data with JSON Schema, we can also use the schema to know that we should treat a string with `contentEncoding: base64` as a binary buffer.) We support schema representations appropriate to each encoding, and are trying to determine if there is a schema description format that's appropriate for CBOR. JSON Schema seems like a good candidate except that it doesn't have `type`s to represent some of CBOR's features.
gregsdennis commented on Dec 15, 2022
You can always write a vocabulary with new keywords to support what isn't already. For example, you could have a `cborType` keyword that gives you what you need. You'd need to be working with an implementation that supports vocabs (or custom keywords at a minimum, but preferably `$vocabulary`).
awwright commented on Mar 28, 2023
I think this can be closed out by #1390 if we simply specify that a non-JSON format can be validated against a schema, if it defines an instance equality function that maps the format's non-JSON values to JSON values.
Note that once non-JSON formats get involved e.g. CBOR, sometimes instances that are not considered equal in CBOR will be considered equal in JSON instance equality, and this may be counter-intuitive (so think of it less like "equals" and more like "is distinguishable").
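A rough Python sketch of that idea, under stated assumptions: `to_json_model` and `instance_equal` are hypothetical names, not part of any spec or library, and the choice to map byte strings to base64 text is just one possible mapping. It shows how values that a format like CBOR encodes differently can become indistinguishable once mapped to JSON values.

```python
import base64
import math


def to_json_model(value):
    """Hypothetical mapping from decoded non-JSON values to JSON values."""
    if isinstance(value, bytes):
        # Assumption: byte strings are mapped to base64 text.
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("NaN and infinities have no JSON equivalent")
    if isinstance(value, list):
        return [to_json_model(v) for v in value]
    if isinstance(value, dict):
        # Assumption: non-string map keys are stringified.
        return {str(k): to_json_model(v) for k, v in value.items()}
    return value


def json_equal(a, b):
    """Compare two JSON values; booleans never equal numbers."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return a == b


def instance_equal(a, b):
    """Equality after mapping, i.e. 'is distinguishable' in JSON terms."""
    return json_equal(to_json_model(a), to_json_model(b))
```

With this mapping, `instance_equal(4000, 4000.0)` holds even though an integer and a float are encoded differently in CBOR, and `instance_equal(b"hi", "aGk=")` holds even though one is a byte string and the other is text. That is the counter-intuitive part: the function answers whether a schema can tell two instances apart, not whether the original encodings match.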