Using serde to produce Schema-like type Information. #345
Comments
Nope. Serde works with values of concrete types, whereas you want to work with the types themselves. Usually it's easier to simply wrap a macro around your type definitions. If you want to do it for types you don't control, you need to go into rustc-plugin territory. You can take inspiration from the
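To illustrate the macro route, here is a minimal sketch assuming a hypothetical `described_struct!` macro (not part of serde). It re-emits the struct unchanged and adds a `describe()` function built from the same tokens with `stringify!`.

```rust
// Hypothetical macro, not part of serde: it re-emits the struct definition
// and adds a `describe()` function generated from the same tokens.
macro_rules! described_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        #[derive(Debug)]
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            /// Returns the type name and (field name, type name) pairs,
            /// exactly as written in the source.
            fn describe() -> (&'static str, Vec<(&'static str, &'static str)>) {
                (stringify!($name), vec![$((stringify!($field), stringify!($ty))),*])
            }
        }
    };
}

described_struct!(Bla { a: u32, label: String });
```

Because the macro only sees tokens, it knows nothing about `#[serde(...)]` attributes. It also cannot describe types defined in other crates, which is where the rustc-plugin remark comes in.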
Thank you! Too bad, this would make interaction with other languages a lot easier.
That's just a msgpack thing. You can easily write a serializer that emits much more information.
True. The catch is that including the names of every field in each and every instance is prohibitively expensive, so maybe that's specific to my use case.
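Here is a toy illustration of that trade-off. It uses a hand-rolled text encoding, not msgpack or serde. The field names are sent once as a header, followed by positional rows. This keeps the per-instance cost down while still letting a dynamic-language reader recover the names. `Point`, `encode_named` and `encode_positional` are made up for this sketch.

```rust
// Toy record type for the comparison.
struct Point {
    x: i32,
    y: i32,
}

// Every instance carries its field names, like a self-describing format.
fn encode_named(rows: &[Point]) -> String {
    rows.iter()
        .map(|p| format!("{{\"x\":{},\"y\":{}}}", p.x, p.y))
        .collect::<Vec<_>>()
        .join(",")
}

// Field names are sent once as a header; each row is positional only.
fn encode_positional(rows: &[Point]) -> String {
    let header = "[\"x\",\"y\"]";
    let body: Vec<String> = rows.iter().map(|p| format!("[{},{}]", p.x, p.y)).collect();
    format!("{}|{}", header, body.join(","))
}
```

With three points the named form is 41 bytes and the header-plus-rows form is 27. The gap widens with more rows, because the header is paid once while the named form repeats every key in every instance.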
Hi there! I reopened this because I've thought about doing this myself: it might be required for serializing into Avro and maybe Parquet, but I haven't really looked into it yet. Your use case is definitely interesting. If you end up trying to implement this, I think it would be very appropriate for it to land in serde.
Another use case of schema information is auto-generated documentation: #712. |
I think this would also be useful for a JSON Schema generator.
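Here is a rough sketch of what such a generator could do once field names and types are available. The `(name, Rust type)` input format and the type-mapping table are assumptions. The output targets serde_json's default representation of a struct, which is a JSON object keyed by field name.

```rust
// Maps a few Rust primitive type names to JSON Schema types.
// The table is an assumption for this sketch; a real generator would need
// to cover nested types, Option, Vec, enums, and serde attributes.
fn json_type(rust_ty: &str) -> &'static str {
    match rust_ty {
        "u8" | "u16" | "u32" | "u64" | "i8" | "i16" | "i32" | "i64" => "integer",
        "f32" | "f64" => "number",
        "bool" => "boolean",
        "String" => "string",
        other => panic!("no JSON Schema mapping for `{}` in this sketch", other),
    }
}

// Builds an object schema whose properties are the struct's fields,
// all marked as required.
fn json_schema_for(title: &str, fields: &[(&str, &str)]) -> String {
    let props: Vec<String> = fields
        .iter()
        .map(|(name, ty)| format!("\"{}\":{{\"type\":\"{}\"}}", name, json_type(ty)))
        .collect();
    let required: Vec<String> = fields.iter().map(|(name, _)| format!("\"{}\"", name)).collect();
    format!(
        "{{\"title\":\"{}\",\"type\":\"object\",\"properties\":{{{}}},\"required\":[{}]}}",
        title,
        props.join(","),
        required.join(",")
    )
}
```

For example, `json_schema_for("Point", &[("x", "i32"), ("y", "f64")])` yields an object schema with an integer `x` and a number `y`, both required.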
JSON Schema is missing a few major features, namely enum types and defining custom types that can be composed together. I have developed a specification for JSON types which I intend to use for jsoncmd, a specification that enables command-line programs to be as simple to interact with as jsonrpc programs (actually easier, since they are typed!). The rough draft of the specification can be found here. I think something like the "types" argument is the schema we should pursue. I am very open to feedback on changing it if others think there is a better way to represent types. I would love to have this feature as part of serde, as it would make implementing jsoncmd type declarations trivial for Rust users!
/Off-topic |
@vossad01 I didn't see that! It has allOf, anyOf AND oneOf. I honestly can't see the value of the first two, but it definitely meets my needs. I want to point out that oneOf is not in the spec, though: I'd like to know where it's officially defined.
/Off-topic
You linked to the
For further discussion, I'd recommend either opening an issue at json-schema-org/json-schema-spec or starting a conversation on the Google group so we don't completely hijack this serde issue.
@vitiral that's the core spec. You want the validation spec: http://json-schema.org/latest/json-schema-validation.html We actually also support
Note that the current drafts (draft-wright-json-schema*-00) are about to be replaced by draft-wright-json-schema*-01, which will add
There is also the hyper-schema spec: http://json-schema.org/latest/json-schema-hypermedia.html but if you are interested in that I really recommend waiting for draft-wright-json-schema-hyperschema-01, as -00 had a number of problems.
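To connect oneOf to the enum question, here is a sketch. serde's default (externally tagged) enum representation serializes a unit variant as a bare string such as "Baz", and a struct variant as a one-key object such as {"Bla":{"a":123}}. oneOf can express that union. `Variant`, `variant_schema`, `enum_schema` and the unit variant `Baz` are hypothetical stand-ins, not serde API.

```rust
// Hypothetical, simplified description of one enum variant.
enum Variant<'a> {
    Unit(&'a str),
    Struct(&'a str, &'a [(&'a str, &'a str)]),
}

// Deliberately tiny primitive mapping for this sketch.
fn json_type(rust_ty: &str) -> &'static str {
    match rust_ty {
        "u8" | "u16" | "u32" | "u64" | "i8" | "i16" | "i32" | "i64" => "integer",
        "f32" | "f64" => "number",
        "bool" => "boolean",
        "String" => "string",
        other => panic!("no JSON Schema mapping for `{}` in this sketch", other),
    }
}

// Schema for one variant in serde's default, externally tagged JSON form.
fn variant_schema(v: &Variant) -> String {
    match v {
        // Unit variants serialize as a bare string, e.g. "Baz".
        Variant::Unit(name) => format!("{{\"enum\":[\"{}\"]}}", name),
        // Struct variants serialize as a one-key object, e.g. {"Bla":{"a":123}}.
        Variant::Struct(name, fields) => {
            let props: Vec<String> = fields
                .iter()
                .map(|(f, ty)| format!("\"{}\":{{\"type\":\"{}\"}}", f, json_type(ty)))
                .collect();
            let inner = format!("{{\"type\":\"object\",\"properties\":{{{}}}}}", props.join(","));
            format!(
                "{{\"type\":\"object\",\"properties\":{{\"{}\":{}}},\"required\":[\"{}\"]}}",
                name, inner, name
            )
        }
    }
}

// The whole enum: a value must match exactly one variant schema.
fn enum_schema(variants: &[Variant]) -> String {
    let parts: Vec<String> = variants.iter().map(variant_schema).collect();
    format!("{{\"oneOf\":[{}]}}", parts.join(","))
}
```

A stricter generator would also add "additionalProperties": false to the object branches, so that no value can match more than one branch.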
@handrews @vossad01 thanks! I would much prefer to use a pre-existing specification. That one looks good and I would love to see it in serde. |
I would also like to have a similar feature (namely for formats like Parquet and ORC, for schema checking, creation, etc.), but the main problem I see here is that the "schema" with custom de/serializers can depend on the values in an arbitrary way, e.g. you can serialize Option as the tuples "(0,)" and "(1, number)". Wilder things than tagged enums are possible, of course (imagine encoding the number 6 as the nesting depth of [[[[[[]]]]]]). Another problem for any sufficiently general "schema language", even without these quirks, is recursion, which is actually required for any document-like structure. I can imagine a trait declaring a simple "record" type consisting only of primitive types, homogeneous arrays, fixed-type tuples and records (with element names?), and possibly simply tagged enums. But think about the many schema specifications and imagine coming up with a new one that is compatible with most of them. Anything we would invent there would be highly opinionated and more of a project of its own. I am looking at making something in the spirit of Hive types, Parquet types or ORC types. Another advantage of a new library would be the ability to gear it towards columnar storage, where each "record member" comes from a different stream rather than all of them being serialized in one chunk.
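Here is a small sketch of the two value-dependent encodings mentioned in the comment: the nesting-depth encoding of a number, and Option written as (0,) / (1, number) tag tuples. The function names are made up. The point is that the shape of the output is decided by the value, not by the type.

```rust
// Encodes n as n levels of nested empty arrays: 6 -> "[[[[[[]]]]]]".
fn encode_depth(n: usize) -> String {
    "[".repeat(n) + &"]".repeat(n)
}

// Decodes the nesting depth back, rejecting unbalanced input.
fn decode_depth(s: &str) -> Option<usize> {
    let n = s.chars().take_while(|&c| c == '[').count();
    // The leading '[' are single bytes, so byte index n is a char boundary.
    if s.len() == 2 * n && s[n..].chars().all(|c| c == ']') {
        Some(n)
    } else {
        None
    }
}

// Option<i64> written as a tag tuple: "(0,)" for None, "(1, x)" for Some(x).
fn encode_option(v: Option<i64>) -> String {
    match v {
        None => "(0,)".to_string(),
        Some(x) => format!("(1, {})", x),
    }
}
```

The best a schema can say about `encode_depth`'s output is "arbitrarily nested empty arrays". The actual meaning lives entirely in the values.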
What sort of problems do you see with recursion? JSON Schema supports it via
Thanks for the pointer! My point was more that with custom serializers (possibly even depending on the data), no schema may be able to capture the resulting structure. And even if the custom serde output is well-behaved, you need to specify the schema by hand rather than auto-derive it (imagine a type like Option). My proposal would be to create another library with a trait like "WithSchema" that would let you declare or derive a JSON schema for your struct (and output it), and that also implies the serde traits for it (giving you the serialization code itself almost for free). This can be independent of serde, and anyone can make a similar library for another schema spec. For example, I would be happier with a non-recursive type spec (to keep things simpler), and this way we can both have it.
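Here is a minimal sketch of the "WithSchema" idea in its recursive flavour. It handles recursion by registering named definitions and referring to them by name. Only the trait name comes from the comment. The `Schema` enum, the `defs` registry and the placeholder trick are assumptions. The `Node` impl is written by hand, where a derive would presumably generate it.

```rust
use std::collections::BTreeMap;

// Hypothetical type language: primitives, homogeneous arrays, optional
// values, records, and references to named definitions (for recursion).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Int,
    Str,
    Optional(Box<Schema>),
    Array(Box<Schema>),
    Record(Vec<(String, Schema)>),
    Ref(String),
}

// Hypothetical trait; nothing like this exists in serde itself.
trait WithSchema {
    // Returns the schema for Self, registering named definitions in `defs`.
    fn schema(defs: &mut BTreeMap<String, Schema>) -> Schema;
}

impl WithSchema for i64 {
    fn schema(_: &mut BTreeMap<String, Schema>) -> Schema {
        Schema::Int
    }
}

impl WithSchema for String {
    fn schema(_: &mut BTreeMap<String, Schema>) -> Schema {
        Schema::Str
    }
}

impl<T: WithSchema> WithSchema for Option<T> {
    fn schema(defs: &mut BTreeMap<String, Schema>) -> Schema {
        Schema::Optional(Box::new(T::schema(defs)))
    }
}

impl<T: WithSchema> WithSchema for Vec<T> {
    fn schema(defs: &mut BTreeMap<String, Schema>) -> Schema {
        Schema::Array(Box::new(T::schema(defs)))
    }
}

// A recursive, document-like type.
#[allow(dead_code)]
struct Node {
    label: String,
    children: Vec<Node>,
}

impl WithSchema for Node {
    fn schema(defs: &mut BTreeMap<String, Schema>) -> Schema {
        let name = "Node".to_string();
        if !defs.contains_key(&name) {
            // Register a placeholder first so the recursive call below
            // returns a reference instead of recursing forever.
            defs.insert(name.clone(), Schema::Record(Vec::new()));
            let record = Schema::Record(vec![
                ("label".to_string(), String::schema(defs)),
                ("children".to_string(), Vec::<Node>::schema(defs)),
            ]);
            defs.insert(name.clone(), record);
        }
        Schema::Ref(name)
    }
}
```

A non-recursive spec, as preferred in the comment, would drop `Ref` and reject types like `Node` instead.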
Since this ticket's discussion has gotten bogged down in specific details of specific schema formats, I think the best first step for this would be to have some methods that extract runtime structures describing the struct that had Serialize derived, instead of directly generating some serialized schema format. The important part is that this runtime information would need to include information about the Serde attributes that were applied to each field/variant/type. This is important because these attributes affect the serialization format, which you would need to know when generating a schema or auto-generating client code or whatever. This is why Serde itself probably needs to be the thing that supports this feature, instead of another more general "DeriveGeneric" crate. At that point, people can write tools that process the type-info of Serde types into whatever schema formats (or auto-generate client code directly in other languages with this data), leaving these tricky format-specific problems separate from the core functionality of reflecting the type information. An underspecified strawman:

```rust
// Placeholders: the strawman leaves their contents unspecified.
struct SerdeEnum;
struct SerdeAttribute;
struct SerdeFieldAttribute;
struct TypeDesc;

enum SerdeType {
    Struct(SerdeStruct),
    Enum(SerdeEnum),
}

struct SerdeStruct {
    name: String,
    attributes: Vec<SerdeAttribute>,
    fields: Vec<SerdeField>,
}

struct SerdeField {
    name: String,
    // Named `ty` because `type` is a reserved keyword in Rust.
    ty: TypeDesc, // or maybe SerdeType itself? not sure exactly what this would contain
    attributes: Vec<SerdeFieldAttribute>,
}
```

then calling
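To make the point about attributes concrete, here is a minimal sketch of one consumer. It assumes hypothetical `FieldDesc`/`FieldAttr` types shaped loosely like the strawman's `SerdeField` and `SerdeFieldAttribute`. The two attribute variants mirror serde's real `#[serde(rename = "...")]` and `#[serde(skip)]`. A generator that looked only at Rust field names would get two of the three fields below wrong.

```rust
// Hypothetical, trimmed-down stand-ins for the strawman's field types.
// The variants mirror serde's #[serde(rename = "...")] and #[serde(skip)].
enum FieldAttr {
    Rename(&'static str),
    Skip,
}

struct FieldDesc {
    name: &'static str,
    attrs: Vec<FieldAttr>,
}

// The key a consumer will actually see on the wire, or None if skipped.
fn wire_name(field: &FieldDesc) -> Option<&'static str> {
    let mut name = field.name;
    for attr in &field.attrs {
        match attr {
            FieldAttr::Skip => return None,
            FieldAttr::Rename(new_name) => name = *new_name,
        }
    }
    Some(name)
}

// Wire names for a whole struct, in declaration order.
fn wire_names(fields: &[FieldDesc]) -> Vec<&'static str> {
    fields.iter().filter_map(wire_name).collect()
}
```

For a struct with fields `user_id` (renamed to `userId`), `password` (skipped) and `name`, the wire names are `["userId", "name"]`.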
Unfortunately there's no way to add this backwards-compatibly without adding a default implementation. This would mean that all types that implement
Not sure if we should return an
Actually that would be
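Here is a minimal sketch of the default-implementation point. It uses a hypothetical `Describe` trait standing in for a widely implemented trait, not serde's `Serialize`. A newly added method with a default body keeps existing impls compiling. Returning `Option` lets callers tell "no information" apart from "no fields".

```rust
// Hypothetical trait standing in for an existing, widely implemented trait.
trait Describe {
    // Pre-existing method that every current impl already provides.
    fn type_name() -> &'static str;

    // Newly added method. Because it has a default body, impls written
    // before it existed keep compiling; they just report `None`.
    fn field_names() -> Option<&'static [&'static str]> {
        None
    }
}

// An impl written before `field_names` was added: still valid.
struct Legacy;

impl Describe for Legacy {
    fn type_name() -> &'static str {
        "Legacy"
    }
}

// An impl that opts in to the new method.
#[allow(dead_code)]
struct Point {
    x: i32,
    y: i32,
}

const POINT_FIELDS: &[&str] = &["x", "y"];

impl Describe for Point {
    fn type_name() -> &'static str {
        "Point"
    }

    fn field_names() -> Option<&'static [&'static str]> {
        Some(POINT_FIELDS)
    }
}
```

The cost of the default is that every type which has not been updated silently reports `None`, so schema tooling has to handle missing information everywhere.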
I would like to see schema-based serialization explored in a high-quality Avro or Parquet or other schema-requiring format library first. At that point we can see whether it makes sense to generalize across formats in a way that involves Serde. I would be interested to see a PR, but I am not planning to pursue schema-like type information directly.
The reflection crate includes a test case that generates similar output: https://github.com/oooutlk/reflection/blob/master/reflection_test/src/lib.rs#L151 FYI.
This feature would be extremely useful. That said, I think it's important to stress that any such effort, whether in serde or outside, should also be fully extensible to any format that describes types: for example TypeScript, or what I'm working with right now, typedload in Python. The approach using
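One way to keep such tooling extensible to any type-describing format is to separate the format-neutral description from pluggable emitters. Here is a sketch assuming a hypothetical `TypeEmitter` trait with a TypeScript backend and a Python dataclass backend (typedload can load data into dataclasses). None of these names come from serde or typedload.

```rust
// Hypothetical format-neutral description of a record field.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Prim {
    Int,
    Float,
    Str,
    Bool,
}

struct FieldDesc {
    name: &'static str,
    ty: Prim,
}

// Each output language only has to implement this trait.
trait TypeEmitter {
    fn prim(&self, p: Prim) -> &'static str;
    fn record(&self, name: &str, fields: &[FieldDesc]) -> String;
}

struct TypeScript;

impl TypeEmitter for TypeScript {
    fn prim(&self, p: Prim) -> &'static str {
        match p {
            Prim::Int | Prim::Float => "number",
            Prim::Str => "string",
            Prim::Bool => "boolean",
        }
    }
    fn record(&self, name: &str, fields: &[FieldDesc]) -> String {
        let body: String = fields
            .iter()
            .map(|f| format!("  {}: {};\n", f.name, self.prim(f.ty)))
            .collect();
        format!("interface {} {{\n{}}}", name, body)
    }
}

// Emits a dataclass body; the `from dataclasses import dataclass` line
// is left to the caller in this sketch.
struct PythonDataclass;

impl TypeEmitter for PythonDataclass {
    fn prim(&self, p: Prim) -> &'static str {
        match p {
            Prim::Int => "int",
            Prim::Float => "float",
            Prim::Str => "str",
            Prim::Bool => "bool",
        }
    }
    fn record(&self, name: &str, fields: &[FieldDesc]) -> String {
        let body: String = fields
            .iter()
            .map(|f| format!("    {}: {}\n", f.name, self.prim(f.ty)))
            .collect();
        format!("@dataclass\nclass {}:\n{}", name, body)
    }
}
```

Adding another target language then means one more `TypeEmitter` impl, without touching how the type information is extracted.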
@Ploppz My colleagues and I gave it a shot and just published this crate: https://crates.io/crates/serde-reflection |
Thanks a lot for this effort @MatBD! I really like your approach without proc macros; I had imagined proc macros might be needed, but it seems like we can come a long way using the
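As I understand it, the tracing idea behind serde-reflection is to feed types through custom serializer/deserializer implementations that record what they are asked to handle, rather than through a proc macro. The toy below imitates that idea with made-up traits (`Format`, `ToySerialize`, `Tracer`) instead of serde's real ones. It shows how the same serialization code can drive both a real format and a tracer.

```rust
// Toy stand-in for a serialization data model (NOT serde's real traits):
// a type describes itself by calling methods on a format object.
trait Format {
    fn u32_field(&mut self, name: &'static str, value: u32);
    fn str_field(&mut self, name: &'static str, value: &str);
}

trait ToySerialize {
    fn serialize<F: Format>(&self, f: &mut F);
}

struct Bla {
    a: u32,
    note: String,
}

// Written once, e.g. by a derive.
impl ToySerialize for Bla {
    fn serialize<F: Format>(&self, f: &mut F) {
        f.u32_field("a", self.a);
        f.str_field("note", &self.note);
    }
}

// An ordinary format: positional output, like the msgpack example.
#[derive(Default)]
struct Positional {
    out: Vec<String>,
}

impl Format for Positional {
    fn u32_field(&mut self, _name: &'static str, value: u32) {
        self.out.push(value.to_string());
    }
    fn str_field(&mut self, _name: &'static str, value: &str) {
        self.out.push(format!("{:?}", value));
    }
}

// A tracer is just another format: it ignores the values and records
// the field names and types it is asked to handle.
#[derive(Default)]
struct Tracer {
    fields: Vec<(&'static str, &'static str)>,
}

impl Format for Tracer {
    fn u32_field(&mut self, name: &'static str, _value: u32) {
        self.fields.push((name, "u32"));
    }
    fn str_field(&mut self, name: &'static str, _value: &str) {
        self.fields.push((name, "str"));
    }
}
```

A caveat that ties back to the value-dependence concern: tracing with a sample value only reveals the code paths that particular sample exercises.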
Hey, is it possible to use the information serde already has to produce a type description?
e.g. let's say we have the following enum:
If we encode Foo::Bla{a: 123} with msgpack, we get something like [1,[123]]. I would like to get all the information I would need to recover type and field names in a dynamic language, for example by calling something like describe!(Foo), which would yield something like:
Is there an easy way of doing this with serde?
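To show what such a description would buy on the receiving side, here is a sketch that turns the positional [1,[123]] back into named form. The enum definition from the question did not survive on this page. The stand-in `Foo` (with a unit variant `Baz` at index 0, so that `Bla` sits at index 1) is hypothetical. So are `Value`, `VariantInfo` and `name_fields`.

```rust
// Minimal dynamic value, roughly what a msgpack decoder could yield.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Array(Vec<Value>),
}

// Hypothetical output of a `describe!(Foo)`-style call: for each variant
// index, the variant name and its field names in declaration order.
struct VariantInfo {
    name: &'static str,
    fields: &'static [&'static str],
}

// Expects [variant_index, [field values...]] and renders it with names.
fn name_fields(type_name: &str, variants: &[VariantInfo], v: &Value) -> Option<String> {
    let Value::Array(outer) = v else { return None };
    let [Value::Int(idx), Value::Array(vals)] = outer.as_slice() else { return None };
    let info = variants.get(usize::try_from(*idx).ok()?)?;
    if vals.len() != info.fields.len() {
        return None;
    }
    let parts: Vec<String> = info
        .fields
        .iter()
        .zip(vals)
        .map(|(f, val)| match val {
            Value::Int(n) => format!("{}: {}", f, n),
            other => format!("{}: {:?}", f, other),
        })
        .collect();
    Some(format!("{}::{} {{ {} }}", type_name, info.name, parts.join(", ")))
}
```

With the stand-in description, the encoded [1,[123]] renders as `Foo::Bla { a: 123 }`. An out-of-range variant index yields `None`.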