Using serde to produce Schema-like type Information. #345

Closed
eqv opened this issue May 27, 2016 · 23 comments

Comments

@eqv

eqv commented May 27, 2016

Hey, is it possible to use the information serde already has to produce a type description?

E.g., let's say we have the following enum:

enum Foo {
    Bar { a: u64 },
    Bla { b: u64 },
}

If we encode Foo::Bla{b: 123} with msgpack we get something like [1,[123]].
I would like to get all the information that I would need to recover type and field names in a dynamic language, for example by calling something like describe!(Foo), which would yield something like:

{
    type: "enum",
    name: "Foo",
    cases: {
        0 => {
            type: "enum_val",
            name: "Bar",
            fields: [{name: "a", type: "u64"}],
        },
        1 => {
            type: "enum_val",
            name: "Bla",
            fields: [{name: "b", type: "u64"}],
        },
    }
}

Is there an easy way of doing this with serde?

@oli-obk
Member

oli-obk commented May 27, 2016

> Is there an easy way of doing this with serde?

Nope. Serde works with values of concrete types, whereas you want to work with the types themselves. Usually it's easier to simply wrap a macro around your type definitions. If you want to do it for types you don't have access to, you need to go into rustc-plugin territory.
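
A minimal sketch of the "wrap a macro around your type definitions" idea, under some loud assumptions: the `describe_enum!` macro, the `Describe` trait, and the descriptor structs are all hypothetical names invented for illustration, none of them are serde API, and the macro only handles enums whose variants are struct-like.

```rust
/// Hypothetical descriptor types; none of this is serde API.
#[derive(Debug, PartialEq)]
pub struct FieldDesc {
    pub name: &'static str,
    pub ty: &'static str,
}

#[derive(Debug, PartialEq)]
pub struct VariantDesc {
    pub index: usize,
    pub name: &'static str,
    pub fields: Vec<FieldDesc>,
}

#[derive(Debug, PartialEq)]
pub struct EnumDesc {
    pub name: &'static str,
    pub variants: Vec<VariantDesc>,
}

/// Hypothetical trait implemented by the macro below.
pub trait Describe {
    fn describe() -> EnumDesc;
}

/// Emits the enum as written and also generates a `Describe` impl.
/// Only struct-like variants are supported in this sketch.
macro_rules! describe_enum {
    (enum $name:ident {
        $( $variant:ident { $( $field:ident : $ty:ty ),* $(,)? } ),* $(,)?
    }) => {
        #[allow(dead_code)]
        enum $name {
            $( $variant { $( $field : $ty ),* } ),*
        }

        impl Describe for $name {
            fn describe() -> EnumDesc {
                // One (variant name, fields) pair per variant, in declaration order.
                let variants = vec![
                    $( (stringify!($variant), vec![
                        $( FieldDesc { name: stringify!($field), ty: stringify!($ty) } ),*
                    ]) ),*
                ];
                EnumDesc {
                    name: stringify!($name),
                    variants: variants
                        .into_iter()
                        .enumerate()
                        .map(|(index, (name, fields))| VariantDesc { index, name, fields })
                        .collect(),
                }
            }
        }
    };
}

describe_enum! {
    enum Foo {
        Bar { a: u64 },
        Bla { b: u64 },
    }
}
```

With this in place, `Foo::describe()` returns the enum name plus, for each variant, its declaration index, name, and field names and types as written in the source. Note that the index is the declaration position, which only matches the wire format if the serializer also numbers variants that way.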

You can take inspiration from the serde_codegen crate (+ serde_macros) for how to use aster to do the heavy lifting.

@eqv
Author

eqv commented May 27, 2016

Thank you! Too bad, this would make interaction with other languages A LOT easier.

@oli-obk
Member

oli-obk commented May 27, 2016

> If we encode Foo::Bla{b: 123} with msgpack we get something like [1,[123]].
> I would like to get all the information that I would need to recover type and field names in a dynamic language.

That's just a msgpack thing. You can easily write a serializer that emits much more information. JSON and XML are much more verbose.

@eqv
Author

eqv commented May 27, 2016

True. I just have the problem that including the names of each field in each and every instance is prohibitively expensive, so maybe that's just my problem.

eqv closed this as completed May 27, 2016
erickt reopened this May 29, 2016
@erickt
Member

erickt commented May 29, 2016

Hi there! I reopened this because I've thought about doing it myself: it might be required for serializing into Avro and maybe Parquet, but I haven't really looked into it yet. Your use case is definitely interesting. If you end up trying to implement this, I think it would be very appropriate for it to land in serde.

@dtolnay
Member

dtolnay commented Jan 24, 2017

Another use case of schema information is auto-generated documentation: #712.

@adamvoss

I think this would also be useful as a JSON Schema generator.

@vitiral

vitiral commented Mar 23, 2017

JSON Schema is missing a few major features, namely enum types and the ability to define custom types that can be composed together.

I have developed a specification for json types which I intend to use for jsoncmd, a specification which enables command line programs to be as simple to interact with as jsonrpc programs (actually easier -- since they are typed!)

The rough draft of the specification can be found here. I think something like the "types" argument is the schema we should pursue. I am very open to feedback on changing it if others think there is a better way to represent types.

I would love to have this feature as part of serde, as it would make implementing jsoncmd type declaration trivial for rust users!

@adamvoss

adamvoss commented Mar 24, 2017

/Off-topic
@vitiral I am not experienced here, and you are probably right that JSON Schema does not meet your needs. I did want to throw out that JSON Schema does have oneOf, which I believe functions in place of enum (which existed in earlier drafts). My understanding of enum and oneOf was incorrect, see my later comment. Or better yet, see @handrews' comment, since they are much more informed than I am (they contribute to the spec 👍).

@vitiral

vitiral commented Mar 24, 2017

@vossad01 I didn't see that! It has allOf, anyOf AND oneOf. I honestly can't see the value of the first two, but it definitely meets my needs.

I want to point out that oneOf is not in the spec:
http://json-schema.org/latest/json-schema-core.html#rfc.section.5.1

I'd like to know where it's officially defined.

@adamvoss

/Off-topic
@vitiral I stand corrected. enum is still there. oneOf (like allOf and anyOf) is for checking an object against schemas, so I believe allOf could be used to represent type composition.

You linked to the core specification, whereas the members we are discussing are in the validation specification. My previous link was the latest (unreleased) validation specification which is on GitHub.

For further discussion, I'd recommend either opening an issue at json-schema-org/json-schema-spec or starting a conversation on the google group so we don't completely hijack this serde issue.

@handrews

@vitiral that's the core spec. You want the validation spec: http://json-schema.org/latest/json-schema-validation.html We actually also support "enum" directly, with no restriction on the types of the values (they can be the same type or different types, including null).

Note that the current drafts (draft-wright-json-schema*-00) are about to be replaced by draft-wright-json-schema*-01, which will add "const" and "propertyNames" as well as a few more formats.

There is also the hyper-schema spec: http://json-schema.org/latest/json-schema-hypermedia.html but if you are interested in that I really recommend waiting for draft-wright-json-schema-hyperschema-01 as -00 had a number of problems.

@vitiral

vitiral commented Mar 24, 2017

@handrews @vossad01 thanks! I would much prefer to use a pre-existing specification. That one looks good and I would love to see it in serde.

@gavento

gavento commented May 7, 2017

I would also like to have a similar feature (namely for formats like Parquet and ORC, for schema checking, creation etc.), but the main problem I see here is that the "schema" with custom de/serializers can depend on the values in an arbitrary way, e.g. right now you can serialize Option as the tuples "(0,)" and "(1, number)". Wilder things than tagged enums are possible, of course (imagine encoding the number 6 as the depth of: [[[[[[]]]]]]). Another problem for any sufficiently general "schema language", even without these quirks, is any kind of recursion, which is actually required for any document-like structure.
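
To make the point about value-dependent output concrete, here is a small sketch. It deliberately does not use serde: the two functions are hypothetical stand-ins for what a custom `Serialize` impl could emit, and they produce text only so the shapes are easy to see.

```rust
/// Encodes an optional number as described above:
/// `None` becomes a 1-tuple, `Some(n)` becomes a 2-tuple.
fn encode_option(value: Option<u64>) -> String {
    match value {
        None => "(0,)".to_string(),
        Some(n) => format!("(1, {})", n),
    }
}

/// Encodes `n` as the nesting depth of empty lists, e.g. 2 becomes "[[]]".
fn encode_as_depth(n: usize) -> String {
    format!("{}{}", "[".repeat(n), "]".repeat(n))
}
```

In both cases the arity or nesting of the output depends on the value being encoded, so no single fixed description of "the" output shape can be derived from the Rust type alone.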

I can imagine a trait declaring a simple "record" type only consisting of primitive types, homogeneous arrays, fixed type tuples and records (with element names?), and possibly simply tagged enums. But think about the many schema specifications and imagine coming up with a new one being compatible with most of them.
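
One way such a trait could look, as a rough sketch only: `SimpleType` and `HasSimpleType` are hypothetical names, the set of primitives is an arbitrary pick, and the choice to describe `Option` as a tagged enum is an assumption rather than anything a real format mandates.

```rust
/// Hypothetical type language: primitives, homogeneous arrays,
/// fixed-type tuples, records with named elements, and tagged enums.
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleType {
    Bool,
    U64,
    F64,
    Str,
    Array(Box<SimpleType>),
    Tuple(Vec<SimpleType>),
    Record(Vec<(&'static str, SimpleType)>),
    TaggedEnum(Vec<(&'static str, SimpleType)>),
}

/// Hypothetical trait: a type declares its own description.
pub trait HasSimpleType {
    fn simple_type() -> SimpleType;
}

impl HasSimpleType for bool {
    fn simple_type() -> SimpleType { SimpleType::Bool }
}

impl HasSimpleType for u64 {
    fn simple_type() -> SimpleType { SimpleType::U64 }
}

impl HasSimpleType for f64 {
    fn simple_type() -> SimpleType { SimpleType::F64 }
}

impl HasSimpleType for String {
    fn simple_type() -> SimpleType { SimpleType::Str }
}

impl<T: HasSimpleType> HasSimpleType for Vec<T> {
    fn simple_type() -> SimpleType {
        SimpleType::Array(Box::new(T::simple_type()))
    }
}

impl<A: HasSimpleType, B: HasSimpleType> HasSimpleType for (A, B) {
    fn simple_type() -> SimpleType {
        SimpleType::Tuple(vec![A::simple_type(), B::simple_type()])
    }
}

impl<T: HasSimpleType> HasSimpleType for Option<T> {
    fn simple_type() -> SimpleType {
        // Assumption: `None` carries an empty tuple as its payload.
        SimpleType::TaggedEnum(vec![
            ("None", SimpleType::Tuple(vec![])),
            ("Some", T::simple_type()),
        ])
    }
}
```

Because descriptions are built by composing trait impls, a type like `Vec<(u64, String)>` gets its description for free, while a self-referential type would recurse forever, which is exactly the recursion problem mentioned above.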

Anything we would invent there would be highly opinionated and more of a project of its own. I am looking at making something in the spirit of Hive types, Parquet types or ORC types.

Another advantage to a new library would be the ability to gear it towards columnar storage, where each "record member" comes from a different stream rather than them all being serialized in one chunk.

@handrews

handrews commented May 7, 2017

@gavento

> Another problem of any sufficiently general "schema language" even without these quirks is any kind of recursion which is actually required for any document-like structure.

What sort of problems do you see with recursion? JSON Schema supports it via "$ref".
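
For readers unfamiliar with the idea, the way a named reference breaks the cycle can be sketched in Rust. This is not JSON Schema itself: the `Schema` enum, `tree_definitions`, and `resolve` are hypothetical names, and `Ref` only mimics the role that "$ref" plays.

```rust
use std::collections::BTreeMap;

/// Hypothetical schema language with a by-name reference variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Integer,
    Array(Box<Schema>),
    Object(Vec<(String, Schema)>),
    /// Points at a named definition, playing the role of "$ref".
    Ref(String),
}

/// Describes a recursive `struct Tree { value: u64, children: Vec<Tree> }`.
/// The description stays finite because `children` refers to "Tree" by name.
pub fn tree_definitions() -> BTreeMap<String, Schema> {
    let tree = Schema::Object(vec![
        ("value".to_string(), Schema::Integer),
        (
            "children".to_string(),
            Schema::Array(Box::new(Schema::Ref("Tree".to_string()))),
        ),
    ]);
    let mut defs = BTreeMap::new();
    defs.insert("Tree".to_string(), tree);
    defs
}

/// Follows one level of reference; returns `None` for an unknown name.
pub fn resolve<'a>(
    defs: &'a BTreeMap<String, Schema>,
    schema: &'a Schema,
) -> Option<&'a Schema> {
    match schema {
        Schema::Ref(name) => defs.get(name),
        other => Some(other),
    }
}
```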

@gavento

gavento commented May 10, 2017

@handrews

> What sort of problems do you see with recursion? JSON Schema supports it via "$ref".

Thanks for the pointer! My point was more that with custom serializers (even possibly depending on the data), no schema may be able to capture the resulting structure. And even if the custom serde output is well-behaved, you need to specify the schema by hand rather than auto-derive it (imagine a type like Option).

My proposal would be to create another library with a trait like "WithSchema" that would let you declare or derive a json schema for your struct (and output it), and that also implies serde traits for it (giving you the serialization code itself almost for free).
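
A rough sketch of what such a trait might look like. Only the name `WithSchema` comes from the proposal above; the `json_schema` method, the hand-written impls, and the `Point` struct are assumptions for illustration. The link to the serde traits (for instance a `Serialize` supertrait bound) is left out so the snippet stays self-contained.

```rust
/// Hypothetical trait; the method name and signature are assumptions.
pub trait WithSchema {
    /// Returns a JSON Schema document describing the serialized form.
    fn json_schema() -> String;
}

impl WithSchema for u64 {
    fn json_schema() -> String {
        r#"{"type":"integer","minimum":0}"#.to_string()
    }
}

impl WithSchema for String {
    fn json_schema() -> String {
        r#"{"type":"string"}"#.to_string()
    }
}

/// Example struct; a derive macro could generate the impl below.
#[allow(dead_code)]
pub struct Point {
    x: u64,
    label: String,
}

impl WithSchema for Point {
    fn json_schema() -> String {
        // `{{` and `}}` are literal braces in a format string.
        format!(
            r#"{{"type":"object","properties":{{"x":{},"label":{}}},"required":["x","label"]}}"#,
            u64::json_schema(),
            String::json_schema()
        )
    }
}
```

Here the schema is declared next to the type, independent of any serializer, which is what allows the library to live outside serde. The cost is that nothing checks the declared schema against what a custom `Serialize` impl actually emits.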

This can be independent of serde and anyone can make a similar library for another schema spec. For example, I would be happier with a non-recursive type spec (to keep things simpler), and this way we can both have it.

@radix

radix commented Oct 10, 2017

Since this ticket's discussion has gotten bogged down in specific details of specific schema formats, I think the best first step for this would be to have some methods that extract runtime structures describing the struct that had Serialize derived, instead of directly generating some serialized schema format.

The important part is that this runtime information would need to include information about the Serde attributes that were applied to each field/variant/type. This is important because these attributes affect the serialization format which you would need to know when generating a schema or auto-generating client code or whatever. This is why Serde itself probably needs to be the thing that supports this feature, instead of another more general "DeriveGeneric" crate.

At that point, people can write tools that process the type-info of Serde types into whatever schema formats (or auto-generate client code directly in other languages with this data), leaving these tricky format-specific problems separate from the core functionality of reflecting the type information.

An underspecified strawman:

enum SerdeType {
    Struct(SerdeStruct),
    Enum(SerdeEnum),
}
struct SerdeStruct {
    name: String,
    attributes: Vec<SerdeAttribute>,
    fields: Vec<SerdeField>,
}
struct SerdeField {
    name: String,
    // `type` is a reserved word in Rust, hence `ty`.
    ty: TypeDesc, // or maybe SerdeType itself? not sure exactly what this would contain
    attributes: Vec<SerdeFieldAttribute>,
}

then calling Serialize::type_info<MySerializeType>() would return a SerdeType that describes MySerializeType.

@oli-obk
Member

oli-obk commented Jan 30, 2018

Unfortunately there's no way to add this backwards-compatibly without adding a default implementation. This would mean that all types that implement Serialize without derive would need to return a dummy value.

Not sure if we should return an Option or add an Unimplemented variant to SerdeType.

> then calling Serialize::type_info<MySerializeType>() would return a SerdeType that describes MySerializeType.

Actually that would be MySerializeType::type_info() or <MySerializeType as Serialize>::type_info().
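
A sketch of the default-implementation variant that returns an `Option`. The `Serialize` trait below is a local stand-in, not serde's: the real trait has a required `serialize` method and no `type_info` method. The descriptor types are a trimmed-down version of the strawman, and `HandWritten` and `Derived` are hypothetical example types.

```rust
/// Trimmed-down version of the strawman types from earlier in the thread.
#[derive(Debug, PartialEq)]
pub struct SerdeField {
    pub name: String,
    pub ty: String, // placeholder for a richer type description
}

#[derive(Debug, PartialEq)]
pub struct SerdeStruct {
    pub name: String,
    pub fields: Vec<SerdeField>,
}

#[derive(Debug, PartialEq)]
pub enum SerdeType {
    Struct(SerdeStruct),
}

/// Stand-in for serde's `Serialize` (assumption: simplified, see above).
pub trait Serialize {
    /// Default body, so existing manual impls keep compiling unchanged.
    fn type_info() -> Option<SerdeType> {
        None
    }
}

/// A type with a hand-written impl that knows nothing about `type_info`.
pub struct HandWritten;

impl Serialize for HandWritten {}

/// What a derive might generate for `struct Derived { id: u64 }`.
#[allow(dead_code)]
pub struct Derived {
    id: u64,
}

impl Serialize for Derived {
    fn type_info() -> Option<SerdeType> {
        Some(SerdeType::Struct(SerdeStruct {
            name: "Derived".to_string(),
            fields: vec![SerdeField {
                name: "id".to_string(),
                ty: "u64".to_string(),
            }],
        }))
    }
}
```

Both call forms mentioned above work with this shape: `HandWritten::type_info()` yields `None`, and `<Derived as Serialize>::type_info()` yields the description.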

@dtolnay
Member

dtolnay commented May 7, 2018

I would like to see schema-based serialization explored in a high quality Avro or Parquet or other schema-requiring format library first. At that point we can see whether it makes sense to generalize across formats in a way that involves Serde. I would be interested to see a PR but I am not planning to pursue schema-like type information directly.

dtolnay closed this as completed May 7, 2018
@oooutlk

oooutlk commented Jul 2, 2018

The reflection crate includes a test case that generates similar output:

https://github.com/oooutlk/reflection/blob/master/reflection_test/src/lib.rs#L151

FYI.

@Ploppz

Ploppz commented Jan 9, 2020

This feature would be extremely useful.
It's a pity if it's impossible to add it to serde in a backward-compatible way, because serde already has its nice data model. I predict that a separate effort to make a library for emitting type information might mirror the data model of serde to some extent.

That said, I think it's important to stress that any such effort, whether in serde or outside, should also be fully extensible to implement any format that describes types. For example TypeScript, or what I'm working with right now: typedload in Python.

The approach using reflection seems viable and could be useful for many applications. But there is one limitation that stops me from using it in my project: in the structs that I serialize to JSON, which is given to both the Python and JavaScript parts of my application, I use serde attributes liberally. For example #[serde(flatten)] and #[serde(tag = "type")], but as you all know many more attributes are possible. I believe firmly that these attributes really need to be taken into account, and thus schema/type definition generation is intimately tied to serde.

@ma2bd

ma2bd commented May 1, 2020

@Ploppz My colleagues and I gave it a shot and just published this crate: https://crates.io/crates/serde-reflection

@Ploppz

Ploppz commented May 5, 2020

Thanks a lot for this effort @MatBD! I really like your approach without proc macros - I had imagined proc macros might be needed but it seems like we can come a long way using the Deserialize trait.
I wonder if we can support #[serde(flatten)] as well.
