GraphQL doing unnecessary/repetitious requests/resolves. #111
The GraphQL execution engine on its own does not do any caching for you, but simply provides the hooks to determine what level of caching is appropriate for your application. For example, the production GraphQL server at Facebook does not talk to a database directly, but instead talks to a read-through cache.
Even still, caching isn't the only issue here: what about query execution planning/simplification (which pretty much any database will do for you)? And, just to be sure, are you familiar with which parts of the GraphQL whitepaper I'm referring to (in regard to the 'selling')? Also, what in particular does GraphQL do to mitigate N+1 scenarios (as claimed)? Further, it does seem like GraphQL is essentially the front-end to a database in the traditional sense, and thus it would make the most sense to do many (if not all) of these things there, so are there any plans to move in this direction? [Updated]
GraphQL is more of a structured API surface. I would put GraphQL closer to REST than to SQL. It's designed to be a very thin layer of structure atop arbitrary application code, and thus there is often not enough information for the GraphQL execution engine itself to perform query simplification for you. Database front-ends which do this are able to make assumptions about the underlying storage and fetching (e.g. SQL) that GraphQL simply cannot do. What GraphQL can do, however, is provide the structured query to your application before and during an execution so that these hooks may be used for query planning and caching. GraphQL-JS itself is very young and so while these hooks exist, not much has been built to take advantage of them yet. One exciting example of this exploration is the reducing executor (#26) which seeks to help plan efficient SQL-style queries from a GraphQL input.
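To make the "hooks" idea concrete, here is a minimal sketch (my illustration, not code from this thread) of a resolver that reads the structured query handed to it and plans a single backend fetch. It assumes a graphql-js style resolver and a hypothetical `context.db.select` helper; newer graphql-js releases expose the field AST as `info.fieldNodes` (very early releases called it `fieldASTs`).

```js
// Minimal sketch: use the query AST passed to a resolver to plan one backend
// fetch for exactly the sub-fields the client asked for.
// `context.db.select` is a hypothetical helper, not part of graphql-js.
const resolvePartner = (source, args, context, info) => {
  // info.fieldNodes holds the AST node(s) for this field, including its selection set.
  const requestedColumns = info.fieldNodes[0].selectionSet.selections
    .filter((sel) => sel.kind === 'Field') // ignore fragments in this sketch
    .map((sel) => sel.name.value);

  // Plan a single query that fetches only those columns.
  return context.db.select('partners', { id: args.id, columns: requestedColumns });
};
```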
Sorry, I'm not totally certain, no.
Oh, my apologies: it looks like it was in the announcement/introduction of Relay+GraphQL found here, under the first bullet labeled "Performance":

This version of it mentioned "N+1" in particular: So, apparently this 'performance'-centric portion of the system falls under Relay's purview rather than GraphQL's? Anyway, great work! I'm finding the GraphQL language itself to be very well thought out/designed (and surely it will be easy to extend the reference (or other) implementation(s) with these additional features as desired).
@johanatan I believe that the performance "selling point" refers to the communication/requests between client and server (e.g. a mobile app using Relay <--> an API server with GraphQL as an "interface"), so it's not just Relay's pro, but Relay + GraphQL's. Still, I believe that there's definitely a "market" for a generic solution that would sit between the GraphQL server and the database/cache layer, optimizing the queries. The thing is, this seems to be very project-specific, so such a solution needs to be easy to plug in; at least that's my intuition.
@iyn sums this up well. GraphQL itself is designed to be a thin mapping between a query and functions executed on the server. It has inherent possibility for exploiting parallelism and reaping great performance, however GraphQL does not do query planning for you as this will be different depending on your particular backend and product constraints. I think there is ripe opportunity for the development of libraries to assist in this kind of query planning for various backends that people care about. Unfortunately these don't exist yet, since the GraphQL community is still under 2 months old! We've built some of these at Facebook, but unfortunately they're in PHP and back our custom infrastructure.
I understand your point, however there are very basic optimizations available (if you can call them that-- they're really just about avoiding pathological behavior). Due to the semantics of the GraphQL language itself, which necessarily involve interaction with a graph (regardless of which backend stores the data), there are high-level optimizations available-- e.g., detection of multiple identical or partially overlapping cycles in the request and recognition of the fact that you've already traversed the cycle (or parts of it) at least once, returning the same results as that first traversal. This is but one example of many I could give. I would encourage you to think in abstract terms and see what optimizations are available in that abstract, apart from any storage mechanics (and I can assure you that there are many).

It honestly caught me by surprise that this was not included in the language's design, as it seems implied by its mere existence to me. Pushing this off on another layer is fine but in my view is just a side-step and a huge missed opportunity; in my opinion, the earliest point that something is possible (and I mean possible in the abstract-- i.e., for all possible downstreams) is where that something should occur.
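As an illustration of the kind of backend-agnostic analysis being argued for here, the following rough sketch (my illustration, not from the thread) scans a parsed query for identical sub-selections, which an executor could then resolve once and reuse. It uses graphql-js's `parse` and `print` utilities; the dedup policy itself is hypothetical.

```js
// Rough sketch: spot repeated identical sub-selections in a query before
// executing it, purely from the query text (no knowledge of the backend).
const { parse, print } = require('graphql/language');

function findRepeatedSelections(queryText) {
  const seen = new Map(); // "fieldName + printed selection set" -> occurrence count
  const doc = parse(queryText);

  const walk = (selectionSet) => {
    if (!selectionSet) return;
    for (const sel of selectionSet.selections) {
      if (sel.kind === 'Field' && sel.selectionSet) {
        const key = sel.name.value + ' ' + print(sel.selectionSet);
        seen.set(key, (seen.get(key) || 0) + 1);
        walk(sel.selectionSet);
      }
    }
  };

  doc.definitions.forEach((def) => walk(def.selectionSet));
  // Anything seen more than once is a candidate for resolving once and reusing.
  return [...seen.entries()].filter(([, count]) => count > 1);
}
```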
To be clear - I'm not saying that GraphQL should not contain these optimizations, just that at present it is premature to include them and potentially accidentally make some backend integrations impossible. Again, GraphQL is early technology and at this point in its life we're expecting experimentation around backend integration and optimization strategies - as some of these become clear, we'll want to consider integrating them into the core library.
The biggest problem with optimization currently is that not all parts of the tree are known beforehand, e.g. due to the way typed fragments work with interface or union resolving. We did this full optimization before graphql-js was released, but we lacked such features in our type system. I believe that it's possible to do some optimization beforehand, though, but I think a more reliable solution at this point is optimization through a query proxy, as described, e.g., here: https://www.reddit.com/r/reactjs/comments/3flgnu/building_a_graphql_server_with_nodejs_and_sql/ctqudkn
@leebyron The problem with that approach as I see it is that the translation from GraphQL syntax to some number of parallel .resolve() calls to the backend is destructive-- it is impossible to deduce from a number of in-flight [parallel] calls to a backend what query was input which resulted in those calls being made (which is why I suggested making these optimizations in the front end [at the earliest possible place] "in the abstract-- for all possible backends").
@freiksenet Would that approach not introduce some inherent latency (while the proxy waits to see if more values of a particular entity are required in close proximity to others)? Otherwise one would need some sort of
@johanatan You can sort-of (ab)use the fact that looping over the fields is synchronous, so by the time the proxy gets control, all the queries from the current set of child fields will already be in the 'queue', so there are no big latency problems.
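A minimal sketch of that queue idea (my illustration, not code from the thread): resolvers enqueue the ids they need during the synchronous loop over sibling fields, and a single batched backend call is issued on the next tick. `batchFetchPrincipals` is a hypothetical backend function that accepts many ids at once.

```js
// Minimal sketch of "queue during the synchronous field loop, flush on the next tick".
function makeBatchQueue(batchFetchPrincipals) {
  let pending = [];     // { id, resolve, reject } collected during the current tick
  let scheduled = false;

  return function load(id) {
    return new Promise((resolve, reject) => {
      pending.push({ id, resolve, reject });
      if (!scheduled) {
        scheduled = true;
        // All sibling resolvers run synchronously before this callback fires,
        // so every id they need is already in `pending` when we flush.
        process.nextTick(async () => {
          const batch = pending;
          pending = [];
          scheduled = false;
          try {
            const results = await batchFetchPrincipals(batch.map((p) => p.id));
            batch.forEach((p, i) => p.resolve(results[i]));
          } catch (err) {
            batch.forEach((p) => p.reject(err));
          }
        });
      }
    });
  };
}

// Usage inside a field resolver (sketch; db.getPrincipalsByIds is hypothetical):
// const loadPrincipal = makeBatchQueue((ids) => db.getPrincipalsByIds(ids));
// resolve: (partner) => loadPrincipal(partner.principalId)
```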
What graphql-js could do without adding too much app-specific semantics is to provide something like a map option on field definitions:

```js
var Partner = new GraphQLObjectType({
  name: 'Partner',
  fields: {
    principal: {
      type: Principal,
      map: (partnerList) => {
        // Fetch all principals for the list of partners in one request and map
        // them appropriately (when a principal is null, just fill that slot in
        // the mapped list with null)
      }
    }
  }
})
```

When in a non-list context, you could still pass a list of one item to map. We plan to try this approach in https://github.com/webonyx/graphql-php some time soon.
In fact, I don't think it should!
First of all, there is no "you" or "me" modeled in that snippet. And the snippet captures the following:

Thus for any

I suppose it would be possible to diverge from this [obvious] interpretation of that type language snippet, however (i.e., for the 1-to-1 side to not be reflected in the 1-to-many side), but it would be highly unorthodox and extremely confusing to any readers/consumers.

In fact, this may point to a gap in the GraphQL specification itself-- namely, the lack of explicit handling of bi-directional cardinality specifiers-- leaving it to each individual author to do what is normal/expected/orthodox without enforcing it.

Regarding what is a temporary relationship or not-- I think it would be safe to assume that the underlying database is constant for the duration of a single GraphQL query request. So, your temporary principal at the beginning of the query execution is also your principal at the end of the query (conceptually, of course). Otherwise there would seem to be all sorts of inherent race conditions (at such time as the subqueries are issued in parallel) and data integrity issues (regardless of parallelism).
Also, not to mention, GraphQL must be (and is) aware of the ID of each of the entities it is requesting [ID is the value passed into [Notice the
In fact I'm not talking about your case details; I was just saying that in those kinds of recursive fetches you may have different kinds of resolves... In GraphQL there is a context dependency that you didn't have in REST. In REST, when you ask for a resource, you give its id and it gives you data; it's idempotent... I'm just saying that I don't think GraphQL has to know about those things, and it's your execution context that matters. It's a new way of thinking, and I don't think it is GraphQL's role to deal with this kind of optimization, but your query executor's. For example, MySQL handles a cache... if you do the same query twice, the second one will be immediate. Relay does have this kind of consideration too... that's why it forces you to have global ids, and handles a store... I don't think that would be done in GraphQL... I think GraphQL gives you a way to access resources in a given context, not a way to optimize how you query/cache your data.
I think you're missing my second point: for two equal contexts, the same resolve call will produce the same result (so there is no need to execute it twice).

Also, my case details are just an example-- try thinking about a graph as a static structure for the duration of a single query and I'm sure you can see how cycles are possible.
I'm not saying it's not possible... Yes, you are right, this kind of thing could happen... But what I'm saying is that keeping track of every resolve call with its results and context, and deeply checking context equality to avoid the risk of calling the same resolve function twice... I'm not so sure that would be an optimization... Moreover, these kinds of cycles are easy to track with a query executor... and I'm pretty sure your DB already does! :) Still, I think this thread is interesting because it clarifies the borders of GraphQL, and points out why context is the big value GraphQL provides, besides graph queries...
@johanatan it sounds like you're talking about a memoization strategy on a per-type basis. This is a fine suggestion and I've seen it employed before. Just as a word of caution, it is a trade-off between computation and memory usage. When used generically across all resolver functions, in my experience this becomes a negative trade. Most resolver functions are very cheap and do not benefit from memoization. There's also the real concern within JavaScript of the objects and arrays often used being technically mutable. If you're wrapping over database queries, this may never bite you, but if you're wrapping over some ORM library you could see weird effects. For resolver functions which truly are expensive - such as those which need to resolve to a database, I highly recommend using the memoization/caching layer of indirection you're talking about. The reason GraphQL does not bake this in for you is because there are so many different techniques for memoization, each with their own tradeoffs. If you happen to stumble upon one which could be universally beneficial, I'd love to discuss the merits over a pull request.
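A minimal sketch of the per-request memoization described here, under the assumption that the cache lives on the per-request context object (so it is thrown away when the query finishes) and that `fetchUserFromDb` is a hypothetical expensive backend call:

```js
// Minimal sketch: memoize an expensive resolver per request. The cache lives on
// the per-request `context`, so it never outlives a single query execution.
// `fetchUserFromDb` is a hypothetical expensive backend call.
function memoizeOnContext(fetchUserFromDb) {
  return (source, args, context) => {
    if (!context._userCache) context._userCache = new Map();
    const key = JSON.stringify(args);
    if (!context._userCache.has(key)) {
      // Cache the promise itself so concurrent resolvers share one in-flight fetch.
      context._userCache.set(key, fetchUserFromDb(args));
    }
    return context._userCache.get(key);
  };
}
```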
Thanks, Lee. Good info. Note that memoization wasn't the only (or original) suggestion here-- I still think that high-level semantic analysis of what's really being requested from the graph would provide ample opportunities for optimization (i.e., never calling 'resolve' in the first place when it is obvious from abstract analysis that it isn't required). It will be interesting to see if the project heads in that direction or not. If you would welcome a pull request on that angle, I would definitely be interested in giving it a try.
It's definitely easier to discuss these ideas more concretely over code. I'm not sure what form this kind of analysis might take, but I'm interested to see your idea take shape.
@johanatan you may be interested in https://github.com/facebook/dataloader which was released today. It's intended to make your GraphQL type definitions easier to write while solving the primary issue you brought up here.
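For reference, a short sketch of how DataLoader addresses the Partner/principal example discussed above: a loader created per request batches all the partner ids requested in one tick into a single backend call and caches repeats. `db.getPrincipalsByPartnerIds` is a hypothetical function that must resolve to one result (or null) per id, in the same order as the input ids.

```js
// Sketch: the Partner -> principal lookup via DataLoader
// (https://github.com/facebook/dataloader).
const DataLoader = require('dataloader');

function createLoaders(db) {
  return {
    principalByPartnerId: new DataLoader((partnerIds) =>
      // Must return a promise for an array aligned with partnerIds.
      db.getPrincipalsByPartnerIds(partnerIds)
    ),
  };
}

// Build a fresh set of loaders per request, put them on the context, then:
// resolve: (partner, args, context) =>
//   context.loaders.principalByPartnerId.load(partner.id)
```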
Ahh, very nice! Thx!
On the subject, I have written an article about the use of DataLoader to batch GraphQL requests.
Hi,
Given the following type language schema:
and the following query:
we see the following requests for data on the backend:
[uids are generated per unique call to resolve]. These repetitious requests came as a surprise given that one of GraphQL's biggest selling points is the elimination of N+1 queries and the introduction of intelligent caching/simplification/reduction, etc.
Is this a known issue?