Expose user-defined meta-information via introspection API in form of directives #300

Open


With the growing popularity of IDL as a way to define a GraphQL schema, I think it would be quite beneficial to expose directive information via the introspection API.

From what I can tell, the directive information is the only missing piece of information that is not obtainable via introspection API. For example in this schema definition:

type User {
  id: ID!
  name: String
  admin: Boolean! @important
}

type Query {
  user: User
}

The @important directive is only available at schema materialization time, and is discarded as soon as the schema is materialized from the AST definitions.

One can see directives as a way to instruct the server to construct the schema in a specific way. But I would argue that directives also have a lot of potential as a way to expose additional meta-information about the schema. This may include things like: field cost/complexity (the use case I'm quite interested in), auth information, caching characteristics, and in general any kind of structured meta-information that is specific to a particular application or subset of applications. I think a lot of interesting use-cases and scenarios can emerge as soon as this meta-information is exposed via introspection. I can also imagine the community working together on a set of common directives that may be adopted by client tooling and frameworks like Apollo and Relay. These common directives (exposed via the introspection API) may provide deeper insights into the schema, relations between fields and objects, etc.

I described this feature in context of IDL, but in fact it's quite independent from it (though integrates with it very naturally). I was thinking about different ways how this kind of user-defined meta-information can be exposed via introspection API and I feel that directive-based approach is the most natural and well integrated way of doing it.

I would love to hear your opinions on this topic!

calebmer

field cost/complexity (the use case I'm quite interested in)

I had some experiments on this that I didn’t release. I’d love to hear your thoughts 😊

I agree that being able to put arbitrary information into introspection is incredibly powerful, but I don’t think that we should be translating directives one-to-one into the introspection. Directives are meta instructions for the tools which consume the IDL. Making them a first class part of introspection reveals too many implementation details. It would also be very tough to type the directive arguments well.

I’d rather see tools that can interpret directives and then translate that to fields in the introspection 😊. For example, a server could:

extend type __Field {
  important: Boolean
}

…and then no matter where you define your schema whether it be in the GraphQL IDL, GraphQL.js, or some other GraphQL server framework this flag can be set.

I don’t like the idea of making the IDL the one source of truth when creating a GraphQL server, but I do really like the idea of allowing users to extend the introspection types with arbitrary extra information.

smolinari

I don’t like the idea of making the IDL the one source of truth when creating a GraphQL server

Amen!

Scott

OlegIlyenko

Contributor, Author

I don’t think that we should be translating directives one-to-one into the introspection

I agree, the subset of directives that are used in the IDL may be completely different from the subset of directives that are exposed via introspection (they may not even overlap).

I don’t like the idea of making the IDL the one source of truth when creating a GraphQL server

100% agree on this one. The whole idea is quite unrelated to IDL schema definition. Though if meta-information is exposed in a directive format, then some interesting scenarios can emerge. For example this end-to-end scenario falls naturally out of it:

"Internal Service 1" may use a completely different set of directives at creation time than the directives that are exposed via introspection to assemble the "Gateway" IDL. But using directives is quite convenient, since they are easily translated to/from IDL.

IDL aside, directives have the advantage that they are also easily introspectable through the API. But in general I don't have a very strong opinion on the actual meta-information format. My main motivation is to somehow expose additional user-defined meta-information through the introspection API.

Though I have a concern about the format you proposed:

extend type __Field {
  important: Boolean
}

If it is not defined with IDL, then the server implementation somehow needs to provide a way for the user to define additional meta-fields on all GraphQL schema-related classes, and to put the type information about these fields somewhere on the schema. I think it can become a bit complex, especially for type-safe languages, also considering that with directives one can already cover the second part (directives already provide the type definitions for this meta-information, so there is no need to introduce a new concept for it).

smolinari

I think I know where you are heading with this, and I agree wholeheartedly, however, the solution isn't in GraphQL's own metadata injection system. Trying to extend it to cover more business use cases is the wrong direction. Up to this point, I've heard suggestions on authorization, validation, and of course, the data modeling itself (since it is part of GraphQL it is why so many are looking to GraphQL solutions to solve business domain problems).

I am going to go out on a limb here. The way I see it, Facebook has offered us a really cool way to build a gateway into our data. However, I am almost certain, they are only telling a partial story. I am convinced that they are doing metadata based development, where the metadata is the business logic itself, and GraphQL only offers (to those who should see it) access to that particular kind of data. When I see Lee Byron push back on suggestions like this and others, it is sort of dawning on me that Facebook is coming from another world of thinking and IMHO, it can only be metadata driven development.

Why is metadata driven development good? Because it puts the power of any change to business logic in the hands of the business.

In other words, once the metadata is set and known, then getting the business model (the domain model) together, programmatically, is a matter of building it from the metadata. Tools can be offered to non-programmers to change the metadata. The same build-from-metadata goes for GraphQL endpoints too. In other words, metadata is the driver, not the GraphQL schema. From the metadata, it would be a matter of translation into definitions for GraphQL, protobuffers, etc. The single source of truth is then the one set of metadata.

So, I guess what I am trying to say is, instead of trying to stuff all kinds of metadata inside GraphQL, we should be thinking about how we can let the metadata drive defining GraphQL schema.

Scott

rmosolgo

👍 I like the idea, I've had half a mind to implement it in Ruby anyways, since the IDL isn't showing signs of ever leaving RFC stage 😆

Thanks for sharing those thoughts about metadata-driven development. That's something interesting to think about, as the Ruby library grows, the Ruby API for schema definition is becoming more a hindrance than a help.

My thought has been to make the GraphQL schema be the metadata. Otherwise I have to invent yet another metadata structure which maps to GraphQL 😖

rmosolgo

I worried about portability, since different schema parsers might handle these inputs differently, but I thought I could just include a function to parse a schema then dump it without the custom directives.

smolinari

the Ruby API for schema definition is becoming more a hindrance than a help.

Yeah, it seems many people would like to turn their GraphQL system into a "God API", whereas it clearly should only be a relatively simple, but totally awesome gateway into the business logic layer of the application.

My thought has been to make the GraphQL schema be the metadata. Otherwise I have to invent yet another metadata structure which maps to GraphQL.

Yes, but the metadata can be the source of truth for the whole application (or applications), including the API. Think about validation, authorization, workflow, models, and whatever else is business driven. And, your answer tells me you are also still thinking in the wrong direction. The GraphQL API would be modeled after the metadata, not the other way around. 😄

Loopback does something similar to what I am talking about with its "API modeling" according to the modeled data.

Scott

OlegIlyenko

Contributor, Author

@smolinari you brought up some very interesting points. Though my original intention was more about exposing additional information, rather than a way to aid the data modeling. I would definitely agree, directives indeed expose domain concerns. Even if we generate a GraphQL schema based on some other data modeling tool, I think it's still very helpful to be able to expose some meta-information via the introspection API. Let's stick to this example with a gateway. Recently there was a great video by @bryan-coursera from Coursera on this topic. In particular, I found the "Linking the resources" part quite interesting:

https://youtu.be/r-SmpTaqpDM?t=1479

If I understood it correctly, their internal services expose additional meta-information about relations between different models. I think directives can be quite useful in this respect for assembler/gateway service. For example schema of 2 internal services can look like this (I used IDL for demonstration, but it would be accessed via introspection in the actual app):

# "courses" service

type Course {
  id: ID!
  name: String
  subscriber: [ID] @reference(service: "users", rootField: "users", type: "User")
}

# "users" service

type Query {
  users(ids: [ID]): [User]
}

Gateway service then will discover these schemas via introspection API and expose Course type like this (with knowledge on how to resolve it correctly and efficiently using 2 other services):

# "gateway" service

type Course {
  id: ID!
  name: String
  subscriber: [User]
}
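The rewrite step from the "courses" schema to the "gateway" schema could be sketched roughly like this. It is only a minimal illustration, assuming the applied directives have already been fetched and are available as plain dictionaries; the `Field` shape and the `rewrite_field` helper are hypothetical and not part of any GraphQL library:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Field:
    name: str
    type: str
    # Hypothetical shape: directive name -> argument dict, as a gateway
    # might receive it if applied directives were introspectable.
    directives: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def rewrite_field(f: Field) -> Field:
    """Replace an ID-typed reference field with the referenced type."""
    ref = f.directives.get("reference")
    if ref is None:
        return f
    target = ref["type"]
    # Preserve list wrapping: [ID] becomes [User], ID becomes User.
    new_type = f"[{target}]" if f.type.startswith("[") else target
    return Field(name=f.name, type=new_type)


course_fields = [
    Field("id", "ID!"),
    Field("name", "String"),
    Field(
        "subscriber",
        "[ID]",
        {"reference": {"service": "users", "rootField": "users", "type": "User"}},
    ),
]

gateway_fields = [rewrite_field(f) for f in course_fields]
for f in gateway_fields:
    print(f"{f.name}: {f.type}")
```

A real gateway would also use the `service` and `rootField` arguments to decide which service to call and how to batch the IDs; that part is left out here.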

When it comes to data modeling, I think GraphQL IDL syntax can be a very good candidate for it. Over the years I saw quite a few tools and formats to declaratively model the data and the domain. Though it looks like there is no tool that has seen very widespread adoption. I feel that MDD (Model-Driven Development) has its own set of challenges. I saw it taken quite a bit too far (in quite big production setups) where it becomes a real pain to deal with (instead of working application code, people are writing generator/tool code which adds additional layers of indirection and frustration). I feel that declarative data modeling fits quite well where the domain itself is very well established and understood.

Recently I saw several examples where GraphQL IDL is used in very interesting data modeling scenarios. The first example is graphql-up. Given this IDL type definition:

type User {
  id: ID!
  name: String
}

It will create a GraphQL schema that will contain User input/output types, relay API to read users and create/update new one, etc. So the IDL that you provide to graphql-up and a GraphQL schema that you end up with are very different. Using GraphQL IDL syntax to model the data in this case (actually any other syntax/language will do the trick in this scenario) has quite a few advantages:

There is already a huge amount of tooling available for GraphQL, so it's easy to work with it (especially programmatically), visualize it and do other interesting things to it
The syntax is familiar and well established, so the learning curve is much shorter, especially considering how nicely it correlates with the end result

Another very interesting adoption of GraphQL IDL syntax is contraband (Contraband is a description language for your datatypes and APIs, currently targeting Java and Scala. It would be part of the next version of scala build tool). As you can see, they adopted the IDL syntax, but changed it in a few interesting ways (including introduction of namespaces, YAY :)).

I see these two examples as a good validation of an idea that GraphQL IDL can be a viable tool for data modeling.

smolinari

Though my original intention was more about exposing additional information, rather than a way to aid the data modeling.

I understand. My intention also isn't really about aiding data modelling, but rather automatic generation of the API from a set of metadata. If you have that kind of control over the metadata, and the metadata is also persisted in some manner, you can also control as much or as little introspection of any "view" of any data as you want. I realize this is getting quite esoteric, but try to think inside-out or rather, think that the API is something far, far away from a single source of truth. The API should be a window into the application's business layer in that it is only modelled after the domain models, which are (must be) defined elsewhere in the application. I am not saying this translation of metadata is easier, but overall, it is a lot easier than bending the API to all our business needs.

Right now, GraphQL is so cool and allows for so much, it is so flexible, people are starting to want to "model" everything in it, including the logic of what users can introspect. 😉 Whereas these decisions of what to see or not (no matter what is being controlled) are basically authorization logic, and that is 100% business logic. Thus, it has or should have nothing to do with the internal workings of the API, except that there could be models burnt in metadata for the authorization too, which can also be generated as GraphQL schema, which can be made introspective (or not, since we'll hopefully be able to generate the schema/the API automatically).

My simple and hard to fathom point is, the single source of truth cannot be the API/the schema itself. It should only be fashioned after the application's single source of truth, and that is the business/domain logic.

I know I have butted into similar discussions in other places about this. I might be getting on people's nerves because of it (who are also definitely loving GraphQL and its scene/ community). So, I think I've clarified my point as best I can here. I'll bow out now and let the conversation continue. Just let me warn everyone that making the API "too smart" is dumb and unnecessary. The hard work needs to go somewhere else in the depths of the server stack, which in the end, will make working with GraphQL overall, much easier. 😄

Scott

OlegIlyenko

Contributor, Author

@smolinari thanks a lot for a very detailed explanation! I think I can now better understand your point. I would definitely agree with it, there is much more to business logic of an application than what API can/should expose. I think it's also a good point to keep in mind as discussion around this issue progresses.

wincent

Contributor

Interesting discussion. Thanks for starting it @OlegIlyenko. As you know, the role of directives as currently defined in the spec is pretty narrow; they are intended to:

[D]escribe alternate runtime execution and type validation behavior in a GraphQL document.

Exposing them via introspection (beyond __schema { directives { ... } }) would be a pretty large extension which we would want to evaluate carefully. My initial instinct is that exposing them like this would be overloading their purpose in a way that would increase the conceptual burden in an undesirable way, and I'd like to see some more exploration of specific use cases where having schema directives exposed via introspection would make things that are currently very difficult (or impossible) to do via other means significantly easier (or possible).

@OlegIlyenko: for example, you mentioned "field cost/complexity". Can you tell us more about that? We've certainly built tooling around that internally at FB, but it exists outside the schema (consuming the schema, developer queries/fragments, and runtime metrics as inputs).

IvanGoncharov

Member

Expose IDL directive information via introspection API

@OlegIlyenko IMHO, IDL word in the title makes people think that the only way to expose this meta-information will be defining it inside IDL document. But nothing prevents you from specifying applied directives if you define the schema in the source code (with support from the server-side lib). So how about renaming it to:

Expose values of applied directives via introspection API

or something similar?

My initial instinct is that exposing them like this would be overloading their purpose in a way that would increase the conceptual burden in an undesirable way

@wincent I think it's a good solution to spec bloat. For example, according to the graphql-js implementation, you can deprecate a field by using the @deprecated directive, but in introspection, it is exposed through the isDeprecated and deprecationReason fields. That means if I decide to have something like @deprecationDate, I am forced to define new fields inside introspection, e.g. deprecationDate. The only way to safely achieve this would be to push such directives and fields into the spec, and this will lead to spec bloat.

To sum it up: GraphQL introspection should support a mechanism for vendor extensions, and exposing applied directive values is a good solution for that.

I'd like to see some more exploration of specific use cases where having schema directives exposed via introspection would make things that are currently very difficult (or impossible) to do via other means significantly easier (or possible).

Here are a few examples from the top of my head:

@localizeName for enum values. I like that the spec limits such names to ASCII, but at the same time, there should be a possibility to specify localized names and use them on the client.
@relayMaxSliceSize, which specifies the maximum number you can pass to first/last. It will allow implementing zero-config pagination.
@examples for field arguments, which can be used to generate better documentation (e.g. show them somewhere in GraphiQL when you type field arguments).

calebmer

@OlegIlyenko have you considered introducing only a single directive in the IDL that maps well to introspection that would allow users to provide metadata? Something like @metadata. Users could then define (or extend) a __FieldMetadata type, or __FieldMetadata could be a scalar which accepts any JSON object. This could be represented in the IDL as:

type __FieldMetadata { important: Boolean }
# Or...
scalar __FieldMetadata

# We may also have a `__TypeMetadata` perhaps.
directive @metadata(field: __FieldMetadata)

type User {
  id: ID!
  name: String
  admin: Boolean! @metadata(field: { important: true })
}

(I may be getting the directive syntax wrong, feel free to edit this comment if it is wrong)

Or in the introspection query this would be modeled as:

{
  __type(name: "User") {
    fields {
      metadata { important }
    }
  }
}

This balances the need for attaching metadata to a GraphQL schema with the desire not to introduce special behavior around all directives in the IDL.

OlegIlyenko

changed the title from "Expose IDL directive information via introspection API" to "Expose user-defined meta-information via introspection API in form of directives"

on May 3, 2017

OlegIlyenko

Contributor, Author

@wincent

would be a pretty large extension which we would want to evaluate carefully

I definitely agree with this! Seeing all these great comments made me think a lot about the concept and its soundness :) Now I discovered some new interesting perspectives on it.

you mentioned "field cost/complexity". Can you tell us more about that?

Assuming that the complexity calculation is a simple and static algorithm (like the one I used), it can be replicated in client-side tooling, given that the information about the complexity of individual fields is available in some way (ideally through the introspection API).

This feature has already saved us several times from unintentionally expensive client queries. But when we start a dialog about why a query was rejected by the server and what query complexity/cost means, people always get confused, since from a client perspective it's hard to predict (at least in non-trivial cases) the overall complexity of the query in advance without communicating with the server (and then tweak it in order to ensure that the complexity is below the threshold). I believe that by making this information more transparent we can avoid a lot of confusion around complexity estimation and help developers to write efficient queries. If this information is available through the introspection API, then the complexity calculation can be implemented as a query validation rule which can then be used by existing linting tools (no modification is necessary to the tool itself). If we take this idea even further, one can develop a GraphiQL plugin that shows the complexity of individual fields and field + nested selection set on mouseover. I think these kinds of insights will be very helpful to client and server developers.
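To make the client-side idea a bit more concrete, here is a minimal sketch of a static complexity estimate. It assumes a simple additive scheme (each field costs a fixed amount, nested selections are summed), which is not necessarily the algorithm referred to above. The `FIELD_COST` and `FIELD_TYPE` tables, the `friends` field, and the threshold are all hypothetical stand-ins for data a client would obtain from introspection:

```python
from typing import Dict, Optional

# Hypothetical per-field costs, as a client might obtain them if they were
# exposed through introspection (for example via a complexity directive).
FIELD_COST: Dict[str, float] = {
    "Query.user": 10.0,
    "User.id": 1.0,
    "User.name": 1.0,
    "User.friends": 25.0,
}

# Hypothetical map from "Type.field" to the type that field returns.
FIELD_TYPE: Dict[str, str] = {"Query.user": "User", "User.friends": "User"}


def complexity(parent_type: str, selection: Dict[str, Optional[dict]],
               default_cost: float = 1.0) -> float:
    """Sum the cost of every selected field, recursing into sub-selections."""
    total = 0.0
    for name, sub_selection in selection.items():
        key = f"{parent_type}.{name}"
        total += FIELD_COST.get(key, default_cost)
        if sub_selection:
            total += complexity(FIELD_TYPE[key], sub_selection, default_cost)
    return total


# Selection set for: { user { id name friends { name } } }
query = {"user": {"id": None, "name": None, "friends": {"name": None}}}

MAX_COMPLEXITY = 30.0  # assumed server threshold
total = complexity("Query", query)
rejected = total > MAX_COMPLEXITY
print(total, rejected)
```

With this kind of data available, a linter or GraphiQL plugin could warn about the `friends` selection before the query ever reaches the server.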

this would be overloading their purpose

I also share this concern. I think directives are convenient since after this change it would be very easy to fully represent the introspection data in IDL syntax. I'm open to a different syntax/encoding of this information. My main goal in this particular issue is to prove/disprove my hypothesis that it is useful/beneficial to expose user-defined meta-information via the introspection API and that the benefits are worth the added complexity. I just thought that it would be helpful to have some concrete syntax in examples.

@IvanGoncharov

It's an excellent point about deprecation! I haven't thought about it in this way, but now that you mentioned it, it makes perfect sense. Also if we want to, for instance, add a deprecation feature on other things, we can just update the directive itself without any changes to the introspection API. E.g.:

- directive @deprecated(reason: String) on FIELD_DEFINITION | ENUM_VALUE
+ directive @deprecated(reason: String) on FIELD_DEFINITION | ENUM_VALUE | OBJECT

I also like your other examples. I think they all are valid use-cases. Totally agree about the title, I think it caused quite a bit of confusion. I updated it to better reflect the original motivation.

@calebmer

I think it is an interesting idea and definitely worth considering. Though I personally would prefer not to mix disjointed concerns in a single type. With this approach we can end up with type like this one:

type __FieldMetadata {
  localizedName: LocalizedString
  complexity: Float
  example: String
}

I would rather see these as independent entities (like with the directives). The metadata-type approach would also require the introduction of 11 new types (__FieldMetadata, __EnumMetadata, __EnumValueMetadata, __ScalarMetadata, etc.).

163 remaining items

benjie

Member

One issue with the defaultValue stringified literal approach is that it requires a GraphQL-capable parser to make sense of the data (JSON.parse() is not sufficient, and adding a parser will increase bundle size for web clients), and even then that's not sufficient since custom scalars can have their own parse/deparse rules which are not encoded via the schema. There's @specifiedBy but that currently doesn't provide a machine-readable parser. defaultValue itself also suffers from this issue, but it's extremely rare that any application client will actually use the defaultValue for anything; typically it's just rendered in docs/GraphiQL/etc. and the fact of "has default" is enough for most application logic.
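The "JSON parser is not sufficient" point is easy to check. The sketch below uses Python's standard `json` module in place of `JSON.parse()`; the literal strings are just illustrative GraphQL values, and `RED` is a made-up enum value:

```python
import json

# GraphQL input object literal: keys are unquoted, so this is not valid JSON.
graphql_literal = '{ a: [0, 1, 2], b: 3.14, c: "foo" }'

try:
    json.loads(graphql_literal)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# A bare enum value is another GraphQL literal with no JSON equivalent.
try:
    json.loads("RED")
    enum_parsed_as_json = True
except json.JSONDecodeError:
    enum_parsed_as_json = False

print(parsed_as_json, enum_parsed_as_json)
```

Both attempts fail, so a client that wants to read such a stringified value needs a GraphQL value parser, not just a JSON one.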

I definitely agree that sharing validation logic is one of the use cases of metadata directives 👍

martinbonnin

Contributor

Have there been any talks about modeling GraphQL values as JSON objects in introspection?

GraphQL value:

{ a: [0, 1, 2], b: 3.14, c: "foo" }

Introspection result:

{
  "__typename": "__ObjectValue",
  "value": {
    "a": {
      "__typename": "__ListValue",
      "value": [
        { "__typename":  "__IntValue", "value":  0},
        { "__typename":  "__IntValue", "value":  1},
        { "__typename":  "__IntValue", "value":  2}
      ]
    },
    "b": {
      "__typename": "__FloatValue",
      "value": 3.14
    },
    "c": {
      "__typename": "__StringValue",
      "value": "foo"
    }
  }
}

Sure it's verbose but JSON is already verbose so it might be ok? And clients that don't need the actual values but just the presence of a value could just not request it.
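For illustration, the mapping from a plain value to this boxed representation could be sketched as below. This is only a rough sketch: the `box` helper is hypothetical, and the `__BooleanValue` name is an assumption, since only the object, list, int, float and string cases appear in the example above:

```python
from typing import Any, Dict


def box(value: Any) -> Dict[str, Any]:
    """Wrap a plain value in the {"__typename", "value"} shape shown above."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"__typename": "__BooleanValue", "value": value}  # assumed name
    if isinstance(value, int):
        return {"__typename": "__IntValue", "value": value}
    if isinstance(value, float):
        return {"__typename": "__FloatValue", "value": value}
    if isinstance(value, str):
        return {"__typename": "__StringValue", "value": value}
    if isinstance(value, list):
        return {"__typename": "__ListValue", "value": [box(v) for v in value]}
    if isinstance(value, dict):
        return {
            "__typename": "__ObjectValue",
            "value": {k: box(v) for k, v in value.items()},
        }
    raise TypeError(f"unsupported value: {value!r}")


boxed = box({"a": [0, 1, 2], "b": 3.14, "c": "foo"})
print(boxed["value"]["b"])
```

Every leaf carries its own type tag, which is where the verbosity comes from.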

benjie

Member

@martinbonnin You're essentially talking about "boxed types" here, or, arguably, an AST with type annotations.

I hope that the struct RFC solves this in a more elegant fashion: https://github.com/graphql/graphql-wg/blob/main/rfcs/Struct.md

Essentially it allows the defaultValue of { a: [0, 1, 2], b: 3.14, c: "foo" } to be output as the JSON object { "a": [0, 1, 2], "b": 3.14, "c": "foo" } and clients can still interpret it type-safely since they know the underlying types from the schema. This is much more GraphQL-y than an AST approach IMO, since it's more what the client would desire to deal with.