Using serde to produce Schema-like type Information. #345

eqv · 2016-05-27T13:48:17Z

Hey, is it possible to use the information serde already has to produce a type description?

E.g. let's say we have the following enum:

enum Foo {
   Bar{a: u64},
   Bla{b: u64}, 
}

If we encode Foo::Bla{b: 123} with msgpack we get something like [1,[123]].
I would like to get all the information that I would need to recover type and field names in a dynamic language. For example by calling something like describe!(Foo) which would yield something like:

{
    type: "enum", 
    name: "Foo", 
    cases:{
         0 => {
                      type: "enum_val", 
                      name: "Bar", 
                      fields:[{name: "a", type: "u64"}], 
                 },
         1 => {
                      type: "enum_val", 
                      name: "Bla", 
                      fields:[{name: "b", type: "u64"}], 
                 },
       } 
}

Is there an easy way of doing this with serde?

oli-obk · 2016-05-27T14:03:35Z

Is there an easy way of doing this with serde?

Nope. Serde works with values of concrete types. You want to work with the concrete types themselves. Usually it's easier to simply wrap a macro around your type definitions. If you want to do it for types you don't have access to, you need to go into rustc-plugin territory.
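
As an illustration of the "wrap a macro around your type definitions" suggestion, here is a minimal sketch. The `described_enum!` macro, the `VariantInfo` alias and the `describe()` function are hypothetical names invented for this example, not part of serde. The macro emits the enum unchanged and, from the same tokens, a function listing variant indices, variant names and field names/types.

```rust
/// (variant index, variant name, [(field name, field type)])
type VariantInfo = (usize, &'static str, Vec<(&'static str, &'static str)>);

// Hypothetical macro, not part of serde: defines the enum and a `describe()`
// function generated from the same variant and field tokens.
macro_rules! described_enum {
    (enum $name:ident { $($variant:ident { $($field:ident : $ty:ty),* $(,)? }),* $(,)? }) => {
        #[allow(dead_code)]
        #[derive(Debug)]
        enum $name {
            $($variant { $($field: $ty),* }),*
        }

        impl $name {
            /// Returns the enum name and one entry per variant, in declaration order.
            fn describe() -> (&'static str, Vec<VariantInfo>) {
                let variants: Vec<(&'static str, Vec<(&'static str, &'static str)>)> = vec![
                    $((stringify!($variant), vec![$((stringify!($field), stringify!($ty))),*])),*
                ];
                (
                    stringify!($name),
                    variants
                        .into_iter()
                        .enumerate()
                        .map(|(index, (name, fields))| (index, name, fields))
                        .collect(),
                )
            }
        }
    };
}

described_enum! {
    enum Foo {
        Bar { a: u64 },
        Bla { b: u64 },
    }
}
```

Calling `Foo::describe()` yields the name "Foo" plus entries for Bar (index 0) and Bla (index 1). This only works for types whose definitions you can wrap, and it ignores any serde attributes on them.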

You can take inspiration from the serde_codegen crate (+ serde_macros) on how to work with aster to take away the heavy lifting.

eqv · 2016-05-27T14:12:40Z

Thank you! Too bad, this would make interaction with other languages A LOT easier.

oli-obk · 2016-05-27T14:14:05Z

If we encode Foo::Bla{b: 123} with msgpack we get something like [1,[123]].
I would like to get all the information that I would need to recover type and field names in a dynamic language.

That's just a msgpack thing. You can easily write a serializer with much more information. json and xml are much more verbose.

eqv · 2016-05-27T14:18:12Z

True. I just have the problem that including the names for each field in each and every instance is prohibitively expensive, so maybe that's just my problem.
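
To put rough numbers on that trade-off, here is a small sketch comparing a positional encoding with one that repeats names in every record. The byte layouts are made up for illustration (they are not msgpack or any real format), and both function names are hypothetical.

```rust
/// Positional encoding: one byte for the variant index, then each field value
/// as 8 little-endian bytes. No names are stored.
fn encode_positional(variant: u8, fields: &[u64]) -> Vec<u8> {
    let mut out = vec![variant];
    for value in fields {
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Self-describing encoding: the variant name and every field name are
/// written (length-prefixed) into each record, followed by the value.
fn encode_named(variant: &str, fields: &[(&str, u64)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.push(variant.len() as u8);
    out.extend_from_slice(variant.as_bytes());
    for (name, value) in fields {
        out.push(name.len() as u8);
        out.extend_from_slice(name.as_bytes());
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}
```

For Foo::Bla{b: 123} the positional form takes 9 bytes and the named form 14 in this toy layout. The gap grows with longer names, which is why a schema sent once plus positional records is attractive.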

erickt · 2016-05-29T19:35:17Z

Hi there! I reopened this because I've thought about doing it myself: it might be required for serializing into Avro and maybe Parquet, but I haven't really looked into it yet. Your use case is definitely interesting. If you end up trying to implement this, I think it would be very appropriate for it to land in serde.

dtolnay · 2017-01-24T04:19:25Z

Another use case of schema information is auto-generated documentation: #712.

adamvoss · 2017-03-19T20:33:08Z

I think this would also be useful as a JSON Schema generator.

vitiral · 2017-03-23T21:32:22Z

JSON Schema is missing a few major features, namely enum types and the ability to define custom types that can be composed together.

I have developed a specification for json types which I intend to use for jsoncmd, a specification which enables command line programs to be as simple to interact with as jsonrpc programs (actually easier -- since they are typed!)

The rough draft of the specification can be found here. I think something like the "types" argument is the schema we should pursue. I am very open to feedback on changing it if others think there is a better way to represent types.

I would love to have this feature as part of serde, as it would make implementing jsoncmd type declaration trivial for rust users!

adamvoss · 2017-03-24T02:46:05Z

/Off-topic
@vitiral I am not experienced here, and you are probably right that JSON Schema does not meet your needs. ~~I did want to throw out that JSON Schema does have oneOf which I believe functions in place of enum (which existed in earlier drafts).~~ My understanding of enum and oneOf was incorrect, see my later comment. Or better yet see @handrews' comment since they are much more informed than I am (they contribute to the spec 👍).

vitiral · 2017-03-24T03:05:35Z

@vossad01 I didn't see that! It has allOf anyOf AND oneOf. I honestly can't see the value of the first two but it definitely meets my needs.

I want to point out that oneOf is not in the spec:
http://json-schema.org/latest/json-schema-core.html#rfc.section.5.1

I'd like to know where it's officially defined.

adamvoss · 2017-03-24T19:10:03Z

/Off-topic
@vitiral I stand corrected. enum is still there. oneOf (like allOf and anyOf) is for checking an object against schemas, so I believe allOf could be used to represent type composition.

You linked to the core specification, whereas the members we are discussing are in the validation specification. My previous link was the latest (unreleased) validation specification which is on GitHub.

For further discussion, I'd recommend either opening an issue at json-schema-org/json-schema-spec or starting a conversation on the google group so we don't completely hijack this serde issue.

handrews · 2017-03-24T19:10:54Z

@vitiral that's the core spec. You want the validation spec: http://json-schema.org/latest/json-schema-validation.html We actually also support "enum" directly, with no restriction on the types of the values (they can be the same type or different types, including null).

Note that the current drafts (draft-wright-json-schema*-00) are about to be replaced by draft-wright-json-schema*-01, which will add "const" and "propertyNames" as well as a few more formats.

There is also the hyper-schema spec: http://json-schema.org/latest/json-schema-hypermedia.html but if you are interested in that I really recommend waiting for draft-wright-json-schema-hyperschema-01 as -00 had a number of problems.

vitiral · 2017-03-24T19:59:48Z

@handrews @vossad01 thanks! I would much prefer to use a pre-existing specification. That one looks good and I would love to see it in serde.

gavento · 2017-05-07T11:44:34Z

I would also like to have a similar feature (namely with formats like Parquet and ORC, for schema checking, creation etc.) but the main problem I see here is that the "schema" with custom de/serializers can depend on the values in an arbitrary way, e.g. now you can serialize Option as the tuples "(0,)" and "(1, number)". Wilder things than tagged enums are possible, of course (imagine encoding the number 6 as the depth of: [[[[[[]]]]]]). Another problem of any sufficiently general "schema language" even without these quirks is any kind of recursion which is actually required for any document-like structure.
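
A tiny sketch of the two value-dependent encodings mentioned above, written as plain functions rather than real serde Serialize impls (both function names are hypothetical). The point is that the shape of the output depends on the value, so no fixed schema can be read off the type alone.

```rust
/// Encodes an optional number as a one- or two-element sequence, mirroring
/// the "(0,)" / "(1, number)" example: the length depends on the value.
fn encode_option(value: Option<u64>) -> Vec<u64> {
    match value {
        None => vec![0],
        Some(n) => vec![1, n],
    }
}

/// Encodes a number as the nesting depth of empty lists, e.g. 3 -> "[[[]]]".
fn encode_as_depth(n: usize) -> String {
    format!("{}{}", "[".repeat(n), "]".repeat(n))
}
```

Here `encode_option(None)` has one element while `encode_option(Some(7))` has two, and `encode_as_depth(6)` produces the nested-list example from the text.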

I can imagine a trait declaring a simple "record" type only consisting of primitive types, homogeneous arrays, fixed type tuples and records (with element names?), and possibly simply tagged enums. But think about the many schema specifications and imagine coming up with a new one being compatible with most of them.
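
For concreteness, such a restricted type language could be modelled roughly as below. This is only a sketch under assumptions: `SimpleType`, its variants and the textual notation produced by `render` are invented here and do not follow Hive, Parquet, ORC or any other existing specification.

```rust
/// Hypothetical restricted type language: primitives, homogeneous arrays,
/// fixed tuples, records with named fields, and simply tagged enums.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    U64,
    Str,
    Array(Box<SimpleType>),
    Tuple(Vec<SimpleType>),
    Record(Vec<(String, SimpleType)>),
    Enum(Vec<(String, SimpleType)>),
}

impl SimpleType {
    /// Renders the type in an ad hoc textual notation (not an existing spec).
    fn render(&self) -> String {
        // Shared helper for "name: type" lists in records and enums.
        fn named(items: &[(String, SimpleType)]) -> String {
            items
                .iter()
                .map(|(name, ty)| format!("{}: {}", name, ty.render()))
                .collect::<Vec<_>>()
                .join(", ")
        }
        match self {
            SimpleType::U64 => "u64".to_string(),
            SimpleType::Str => "str".to_string(),
            SimpleType::Array(item) => format!("[{}]", item.render()),
            SimpleType::Tuple(items) => format!(
                "({})",
                items.iter().map(|t| t.render()).collect::<Vec<_>>().join(", ")
            ),
            SimpleType::Record(fields) => format!("{{{}}}", named(fields)),
            SimpleType::Enum(variants) => format!("enum<{}>", named(variants)),
        }
    }
}
```

Note that the type is itself finite and non-recursive per value, which keeps it simple but rules out document-like structures.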

Anything we would invent there would be highly opinionated and more of a project of its own. I am looking at making something in the spirit of Hive types, Parquet types or ORC types.

Another advantage to a new library would be the ability to gear it towards columnar storage, where each "record member" comes from a different stream rather than them all being serialized in one chunk.

handrews · 2017-05-07T16:55:25Z

@gavento

Another problem of any sufficiently general "schema language" even without these quirks is any kind of recursion which is actually required for any document-like structure.

What sort of problems do you see with recursion? JSON Schema supports it via "$ref".
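
To illustrate how a named reference makes recursion expressible, here is a toy sketch. It is not JSON Schema itself: `Schema`, `Value` and `conforms` are hypothetical, and `Schema::Ref` merely plays the role that "$ref" plays in JSON Schema by pointing at a named definition.

```rust
use std::collections::HashMap;

/// Toy data model: a number or a list of values.
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Num(u64),
    List(Vec<Value>),
}

/// Toy schema language with a named reference for recursion.
#[derive(Debug)]
enum Schema {
    Num,
    ListOf(Box<Schema>),
    AnyOf(Vec<Schema>),
    /// Looks up a definition by name, similar in spirit to "$ref".
    Ref(String),
}

/// Checks a value against a schema, resolving references through `defs`.
/// A definition that refers only to itself would loop forever; this sketch
/// does not guard against that.
fn conforms(schema: &Schema, value: &Value, defs: &HashMap<String, Schema>) -> bool {
    match (schema, value) {
        (Schema::Ref(name), _) => defs
            .get(name)
            .map_or(false, |target| conforms(target, value, defs)),
        (Schema::AnyOf(options), _) => options.iter().any(|s| conforms(s, value, defs)),
        (Schema::Num, Value::Num(_)) => true,
        (Schema::ListOf(item), Value::List(items)) => {
            items.iter().all(|v| conforms(item, v, defs))
        }
        _ => false,
    }
}
```

A definition such as "tree" = AnyOf[Num, ListOf(Ref("tree"))] then describes arbitrarily nested lists of numbers with a finite schema.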

gavento · 2017-05-10T23:09:54Z

@handrews

What sort of problems do you see with recursion? JSON Schema supports it via "$ref".

Thanks for the pointer! My point was more that with custom serializers (even possibly depending on the data), no schema may be able to capture the resulting structure. And even if the custom serde output is well-behaved, you need to specify the schema by hand rather than auto-derive it (imagine a type like Option).

My proposal would be to create another library with a trait like "WithSchema" that would let you declare or derive a json schema for your struct (and output it), and that also implies serde traits for it (giving you the serialization code itself almost for free).

This can be independent of serde and anyone can make a similar library for another schema spec. For example, I would be happier with a non-recursive type spec (to keep things simpler), and this way we can both have it.
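
A minimal sketch of what such a trait might look like. Only the name "WithSchema" comes from the proposal; the `json_schema` method, the `Reading` struct and the hand-written impls are assumptions for illustration. A real library would presumably derive the struct impl and build the JSON with a proper JSON library rather than string formatting.

```rust
/// Hypothetical trait: each type renders its own JSON Schema fragment.
trait WithSchema {
    fn json_schema() -> String;
}

impl WithSchema for u64 {
    fn json_schema() -> String {
        r#"{"type":"integer","minimum":0}"#.to_string()
    }
}

impl WithSchema for String {
    fn json_schema() -> String {
        r#"{"type":"string"}"#.to_string()
    }
}

impl<T: WithSchema> WithSchema for Vec<T> {
    fn json_schema() -> String {
        format!(r#"{{"type":"array","items":{}}}"#, T::json_schema())
    }
}

/// Example record; a derive macro could generate the impl below.
#[allow(dead_code)]
struct Reading {
    sensor: String,
    samples: Vec<u64>,
}

impl WithSchema for Reading {
    fn json_schema() -> String {
        format!(
            r#"{{"type":"object","properties":{{"sensor":{},"samples":{}}},"required":["sensor","samples"]}}"#,
            String::json_schema(),
            <Vec<u64>>::json_schema()
        )
    }
}
```

This sketch covers only the schema side; the part of the proposal where the trait also implies the serde traits is not shown.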

radix · 2017-10-10T15:41:13Z

Since this ticket's discussion has gotten bogged down in specific details of specific schema formats, I think the best first step for this would be to have some methods that extract runtime structures describing the struct that had Serialize derived, instead of directly generating some serialized schema format.

The important part is that this runtime information would need to include information about the Serde attributes that were applied to each field/variant/type. This is important because these attributes affect the serialization format which you would need to know when generating a schema or auto-generating client code or whatever. This is why Serde itself probably needs to be the thing that supports this feature, instead of another more general "DeriveGeneric" crate.

At that point, people can write tools that process the type-info of Serde types into whatever schema formats (or auto-generate client code directly in other languages with this data), leaving these tricky format-specific problems separate from the core functionality of reflecting the type information.

An underspecified strawman:

enum SerdeType {
    Struct(SerdeStruct),
    Enum(SerdeEnum),
}
struct SerdeStruct {
    name: String,
    attributes: Vec<SerdeAttribute>,
    fields: Vec<SerdeField>,
}
struct SerdeField {
    name: String,
    // `type` is a reserved word in Rust, so the field is called `ty` here
    ty: TypeDesc, // or maybe SerdeType itself? not sure exactly what this would contain
    attributes: Vec<SerdeFieldAttribute>,
}

then calling Serialize::type_info<MySerializeType>() would return a SerdeType that describes MySerializeType.

oli-obk · 2018-01-30T11:48:25Z

Unfortunately there's no way to add this backwards-compatibly without adding a default implementation. This would mean that all types that implement Serialize without derive would need to return a dummy value.

Not sure if we should return an Option or add an Unimplemented variant to SerdeType.

then calling Serialize::type_info() would return a SerdeType that describes MySerializeType.

Actually that would be MySerializeType::type_info() or <MySerializeType as Serialize>::type_info().
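
A sketch of the Option-returning variant of that idea. The `Describe` trait stands in for serde's Serialize, which has no `type_info` method. `SerdeType` and `SerdeField` are cut-down versions of the strawman above, and `Manual` and `Point` are made-up example types.

```rust
#[derive(Debug, PartialEq)]
struct SerdeField {
    name: String,
    ty: String,
}

#[derive(Debug, PartialEq)]
enum SerdeType {
    Struct { name: String, fields: Vec<SerdeField> },
}

/// Stand-in for `Serialize` with the proposed associated function.
trait Describe {
    /// The default body keeps existing hand-written impls compiling;
    /// they simply report that no type information is available.
    fn type_info() -> Option<SerdeType> {
        None
    }
}

/// A type with a hand-written impl that predates `type_info`.
struct Manual;
impl Describe for Manual {}

/// A type whose impl would be generated by derive.
#[allow(dead_code)]
struct Point {
    x: u64,
    y: u64,
}

impl Describe for Point {
    fn type_info() -> Option<SerdeType> {
        Some(SerdeType::Struct {
            name: "Point".to_string(),
            fields: vec![
                SerdeField { name: "x".to_string(), ty: "u64".to_string() },
                SerdeField { name: "y".to_string(), ty: "u64".to_string() },
            ],
        })
    }
}
```

With this shape, `Manual::type_info()` is None while `<Point as Describe>::type_info()` returns a description, so callers must handle types that carry no information.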

dtolnay · 2018-05-07T08:03:38Z

I would like to see schema-based serialization explored in a high quality Avro or Parquet or other schema-requiring format library first. At that point we can see whether it makes sense to generalize across formats in a way that involves Serde. I would be interested to see a PR but I am not planning to pursue schema-like type information directly.

oooutlk · 2018-07-02T10:19:27Z

The reflection crate includes a test case for generating similar output:

https://github.com/oooutlk/reflection/blob/master/reflection_test/src/lib.rs#L151

FYI.

Ploppz · 2020-01-09T15:57:27Z

This feature would be extremely useful.
It's a pity if it's impossible to add it to serde in a backward-compatible way, because serde already has its nice data model. I predict that a separate effort to make a library for emitting type information might mirror the data model of serde to some extent.

That said, I think it's important to stress that any such effort, whether in serde or outside, should also be fully extensible to implement any format that describes types. For example TypeScript, or what I'm working with right now: typedload in Python.

The approach using reflection seems viable and could be useful for many applications. But there is one limitation that stops me from using it in my project: In the structs that I serialize to JSON which is given to both the Python and JavaScript parts of my application, I use serde attributes liberally. For example #[serde(flatten)] and #[serde(tag = "type")], but as you all know many more attributes are possible. I believe firmly that these attributes really need to be taken into account, and thus schema/type definition generation is intimately tied to serde.
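
To show why attributes matter, the sketch below hand-renders the JSON shape of a struct-like enum variant under serde's default (externally tagged) representation and under #[serde(tag = "...")] (internally tagged). It does not call serde. `Tagging` and `render_variant` are hypothetical helpers, and only struct variants with u64 fields are modelled.

```rust
/// How a variant is tagged in the output. The two cases mirror serde's
/// externally and internally tagged enum representations.
enum Tagging {
    /// Default: {"Variant": {...fields...}}
    External,
    /// With a tag attribute: {"<tag>": "Variant", ...fields...}
    Internal(&'static str),
}

/// Renders one struct variant as compact JSON text for the given tagging.
fn render_variant(tagging: &Tagging, variant: &str, fields: &[(&str, u64)]) -> String {
    let body = fields
        .iter()
        .map(|(key, value)| format!("\"{}\":{}", key, value))
        .collect::<Vec<_>>()
        .join(",");
    match tagging {
        Tagging::External => format!("{{\"{}\":{{{}}}}}", variant, body),
        Tagging::Internal(tag) => {
            if body.is_empty() {
                format!("{{\"{}\":\"{}\"}}", tag, variant)
            } else {
                format!("{{\"{}\":\"{}\",{}}}", tag, variant, body)
            }
        }
    }
}
```

The same Rust value therefore has two different JSON shapes depending on one attribute, so a schema generator that only sees the type definition without its attributes would describe the wrong format.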

ma2bd · 2020-05-01T20:54:55Z

@Ploppz My colleagues and I gave it a shot and just published this crate: https://crates.io/crates/serde-reflection

Ploppz · 2020-05-05T15:06:14Z

Thanks a lot for this effort @MatBD! I really like your approach without proc macros - I had imagined proc macros might be needed but it seems like we can come a long way using the Deserialize trait.
I wonder if we can support #[serde(flatten)] as well.

eqv closed this as completed May 27, 2016

erickt reopened this May 29, 2016

dtolnay added the enhancement label Jun 6, 2016

dtolnay mentioned this issue Dec 24, 2016

Output JSON schema during build process serde-rs/json#176

Closed

oli-obk mentioned this issue Jan 23, 2017

Documenting serde types #712

Closed

adamvoss mentioned this issue Mar 24, 2017

Add specification links as page on website json-schema-org/json-schema-org.github.io#87

Merged

dtolnay closed this as completed May 7, 2018

BelfordZ mentioned this issue Feb 7, 2019

Rpc should implement OpenRPC service discovery spec etclabscore/emerald-vault#2

Closed

Timmmm mentioned this issue Nov 6, 2019

Code generation for other languages #1667

Closed

Ploppz mentioned this issue Apr 27, 2020

Introspection in serde #1785

Closed

therealfrauholle mentioned this issue Jun 2, 2023

Use Cases for a "self describing postcard" jamesmunns/postcard#92

Open

wiiznokes mentioned this issue Mar 20, 2024

[Question] Config File for Cosmic pop-os/cosmic-epoch#216

Open
