Description
Context
Hi there. We're using graphql-js and serving subscriptions over WebSocket via graphql-ws (as recommended by Apollo for both server and client).
In our subscriptions' subscribe methods, we always return an AsyncIterable pretty much right away. We typically do this either by defining our methods via async generator functions (async function*), or by calling graphql-redis-subscriptions's asyncIterator method. Our subscribe methods effectively never throw an error when merely providing an AsyncIterable.
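The reason such subscribe methods effectively never throw is a general property of async generator functions: calling one only creates the generator object, and its body (including any throw) runs on the first next() call. A minimal sketch, where onTick and TickEvent are hypothetical names used only for illustration:

```typescript
// Hypothetical event payload, used only for illustration.
interface TickEvent {
  tick: number;
}

// A subscribe-style method written as an async generator function.
// Calling onTick(...) only creates the generator object: none of the body
// runs until the consumer calls next() for the first time.
async function* onTick(limit: number): AsyncGenerator<TickEvent> {
  if (limit < 0) {
    // Deferred: this rejects the first next() call instead of being
    // thrown synchronously by onTick(...) itself.
    throw new Error("limit must be non-negative");
  }
  for (let tick = 1; tick <= limit; tick++) {
    yield { tick };
  }
}
```

So onTick(-1) returns normally, and the failure only surfaces once the stream is read, i.e. during iteration rather than from subscribe itself.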
However, we occasionally hit errors while actually streaming subscription events, when graphql-js calls our AsyncIterable's next() method. E.g. Redis could be momentarily down, or an upstream producer/generator could fail/throw. So we sometimes throw errors during iteration. And importantly, this can happen mid-stream.
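A mid-stream failure can be simulated with a hypothetical flakySource generator standing in for a Redis-backed iterator. The drain helper (also hypothetical) shows that events delivered before the failure still arrive, and the failure then surfaces as an exception from the for await loop:

```typescript
// Hypothetical stand-in for an upstream source (e.g. Redis-backed pub/sub)
// that delivers some messages and then fails mid-stream.
async function* flakySource(messages: string[], failAfter: number): AsyncGenerator<string> {
  for (let i = 0; i < messages.length; i++) {
    if (i === failAfter) {
      // Simulates e.g. Redis being momentarily unavailable.
      throw new Error("upstream unavailable");
    }
    yield messages[i];
  }
}

// Consumes a stream like a simple consumer would. The source's failure is
// thrown out of the for await loop; this helper captures it together with
// the events that were received before it.
async function drain<T>(source: AsyncIterable<T>): Promise<{ received: T[]; error?: unknown }> {
  const received: T[] = [];
  try {
    for await (const item of source) {
      received.push(item);
    }
    return { received };
  } catch (error) {
    return { received, error };
  }
}
```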
Problem
graphql-js does not try/catch/handle errors when iterating over an AsyncIterable:
graphql-js/src/execution/mapAsyncIterable.ts (lines 38 to 40 in 2aedf25)
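Since the referenced lines are not reproduced here, the following is a simplified sketch (not the actual graphql-js source) of what mapping over an async iterable without error handling looks like: a rejection from the source's next() passes straight through to whoever consumes the mapped iterator. The name mapAsyncIterableSketch is ours.

```typescript
// Simplified illustration, not the graphql-js source: map each value of an
// async iterable through `fn`. There is no try/catch around iterator.next(),
// so a rejection from the source propagates to the consumer unchanged.
function mapAsyncIterableSketch<T, U>(
  iterable: AsyncIterable<T>,
  fn: (value: T) => U,
): AsyncIterableIterator<U> {
  const iterator = iterable[Symbol.asyncIterator]();

  return {
    async next(): Promise<IteratorResult<U>> {
      // If the source throws here, the error escapes as-is.
      const result = await iterator.next();
      if (result.done) {
        return { value: undefined, done: true };
      }
      return { value: fn(result.value), done: false };
    },
    async return(): Promise<IteratorResult<U>> {
      // Forward early termination to the source, if it supports it.
      await iterator.return?.();
      return { value: undefined, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```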
There's even a test case today that explicitly expects these errors to be re-thrown:
graphql-js/src/execution/__tests__/subscribe-test.ts (lines 1043 to 1047 in 8a95335)
graphql-ws doesn't try/catch/handle errors thrown during iteration either.
As a result, when occasional errors happen like this, the entire underlying WebSocket connection is closed.
This is obviously not good! 😅 This interrupts every other subscription the client may be subscribed to at that moment, adds reconnection overhead, drops events, etc. And if we're experiencing some downtime on a specific subscription/source stream, this'll result in repeated disconnect-reconnect thrash, because the client also has no signal as to which subscription has failed!!
Inconsistency
You could argue that graphql-ws should try/catch these errors and send back an error message itself. The author of graphql-ws believes this is the domain of graphql-js, though (enisdenjo/graphql-ws#333), and I agree.
That's because graphql-js already try/catches and handles errors both earlier in the execution of a subscription and later:
- Errors producing an AsyncIterable in the first place (the synchronous result of calling the subscription's subscribe method, AKA producing a source event stream in the spec) are caught, and returned as a {data: null, errors: ...} result: graphql-js/src/execution/execute.ts (lines 1784 to 1793 in 2aedf25)
- Errors mapping iteration results to response events (the result of calling the subscription's resolve method) are caught, and sent back to the client as a {value: {data: null, errors: ...}, done: false} event: graphql-js/src/execution/execute.ts (lines 1726 to 1735 in 2aedf25)
So it's only iterating over the AsyncIterable (the "middle" step of execution) where graphql-js doesn't catch errors and convert them to {data: null, errors: ...} objects.
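To make the inconsistency concrete, here is a toy model of the three stages. It is a hypothetical simplification (subscribeSketch, mapEvents, ResultSketch and toErrorResult are our names), not the graphql-js implementation; it only mirrors the catch/no-catch behaviour described in the bullets above.

```typescript
// Hypothetical, simplified stand-in for GraphQL's { data, errors } result.
interface ResultSketch {
  data: unknown;
  errors?: { message: string }[];
}

function toErrorResult(error: unknown): ResultSketch {
  const message = error instanceof Error ? error.message : String(error);
  return { data: null, errors: [{ message }] };
}

// Stages 2 and 3: iterate the source stream and map each event via resolve.
async function* mapEvents<T>(
  source: AsyncIterable<T>,
  resolve: (event: T) => unknown,
): AsyncGenerator<ResultSketch> {
  // Stage 2: an error thrown by the source's next() escapes this loop.
  for await (const event of source) {
    let result: ResultSketch;
    try {
      // Stage 3: an error thrown by resolve becomes an error *event*.
      result = { data: await resolve(event) };
    } catch (error) {
      result = toErrorResult(error);
    }
    yield result;
  }
}

// Stage 1: an error creating the source stream becomes an error *result*
// that is returned instead of a stream.
function subscribeSketch<T>(
  subscribe: () => AsyncIterable<T>,
  resolve: (event: T) => unknown,
): AsyncGenerator<ResultSketch> | ResultSketch {
  let source: AsyncIterable<T>;
  try {
    source = subscribe();
  } catch (error) {
    return toErrorResult(error);
  }
  return mapEvents(source, resolve);
}
```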
This seems neither consistent nor desirable, right?
Alternatives
We can change our code to:
- Have our AsyncIterable never throw in next() (try/catch every iteration ourselves)
  - Have it instead always return a wrapper type, mimicking {data, errors}
- Define a resolve method just to unwrap this type (even if we have no need for custom resolving otherwise)
  - And have this resolve method throw any errors or return data if no errors
Doing this would obviously be pretty manual, though, and we'd have to do it for every subscription we have.
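A sketch of that manual workaround, assuming a hypothetical Wrapped<T> type and helper names of our own (wrapErrors, unwrapResolve); none of these come from graphql-js or graphql-redis-subscriptions. unwrapResolve is meant to be used as the field's resolve, which in graphql-js receives the event payload as its first argument:

```typescript
// Hypothetical wrapper type mimicking { data, errors }.
type Wrapped<T> = { ok: true; data: T } | { ok: false; error: unknown };

// Wraps a source so that next() never rejects: a source failure is yielded
// as a final { ok: false } item, after which the wrapper completes.
async function* wrapErrors<T>(source: AsyncIterable<T>): AsyncGenerator<Wrapped<T>> {
  const iterator = source[Symbol.asyncIterator]();
  let sourceDone = false;
  try {
    while (true) {
      let next: IteratorResult<T>;
      try {
        next = await iterator.next();
      } catch (error) {
        sourceDone = true;
        yield { ok: false, error };
        return;
      }
      if (next.done) {
        sourceDone = true;
        return;
      }
      yield { ok: true, data: next.value };
    }
  } finally {
    // If the consumer stops early (e.g. unsubscribes), release the source.
    if (!sourceDone) await iterator.return?.();
  }
}

// A resolve function that only unwraps: it rethrows the wrapped error so it
// is reported for this event, or returns the data.
function unwrapResolve<T>(payload: Wrapped<T>): T {
  if (!payload.ok) {
    throw payload.error;
  }
  return payload.data;
}
```

Once an async generator has thrown it cannot resume, so the wrapper ends the stream after reporting the failure. It still has to be applied to each subscription by hand.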
Relation to spec
Given the explicit test case, I wasn't sure at first if this was an intentional implementation/interpretation of the spec.
I'm not clear from reading the spec, and it looks like at least one other person wasn't either: graphql/graphql-spec#995.
But I think my own interpretation is that the spec doesn't explicitly say to re-throw errors. It just doesn't say what to do.
And I believe that graphql-js is inconsistent in its handling of errors, as shown above. The spec also doesn't seem to clearly specify how to handle errors creating source event streams, yet graphql-js (nicely) handles them.
I hope you'll consider handling errors iterating over source event streams too! Thank you.
Activity
yaacovCR commented on Mar 19, 2024
I agree that the spec is agnostic, and that it would be useful for graphql-js to be consistent and provide explanatory errors. I think the spec should also be improved.
For context, it seems from #918 that prior to that PR, all subscribe errors threw, and that the argument was made there that explanatory errors would be helpful in some cases. The parts of the PR that I skimmed through don't seem to indicate why explanatory errors to the client would not be helpful with iteration errors; my suspicion is that the PR was attacking the low-hanging fruit, and the authors/reviewers there would not necessarily object to even more explanatory errors. :)
@robzhu @leebyron
yaacovCR commented on Mar 29, 2024
I think the next step would be to raise this topic at a working group meeting. @aseemk are you interested in championing this there? (I am potentially dangerously assuming that this hasn’t happened already…)
yaacovCR commented on Oct 14, 2024
graphql/graphql-spec#1099 has editorial changes to the event stream that I am not sure are 100% clear on this point. The way forward I think still goes through a discussion at a WG meeting.
yaacovCR commented on Nov 7, 2024
This was discussed at the November 2024 WG:
My interpretation of the conclusions:
graphql-js should never throw, but that would have to be reflected in the spec, which is currently being worked on with respect to event streams at Editorial changes for Event Streams graphql-spec#1099 => so that this change in graphql-js may be able to move forward together with the spec language clarification there.

benjie commented on Nov 13, 2024
I, personally, think that event streams that raise an error during execution (typically after yielding results from the stream) should cause the client stream to be terminated via emitting an error - we should not terminate successfully with a GraphQL error payload. The graphql() function itself should basically never throw, but successfully returning an async iterable which later yields an error is not the same.

It's already possible (I think?) for users to wrap the async iterables that they return from subscribe() in such a way that errors are absorbed and the iterable terminates cleanly if such behavior is desired.

yaacovCR commented on Nov 14, 2024
Fwiw, the original poster @aseemk suggested this but was trying to avoid a per-subscription solution.
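The kind of wrapping benjie describes (absorbing errors so the iterable terminates cleanly) can at least be applied once across all subscriptions rather than per subscription. A sketch, assuming each subscribe returns an AsyncIterable synchronously as in the original post; endOnError, wrapAllSubscribes and the resolver-map shape are hypothetical:

```typescript
// Hypothetical helper: an error thrown by the source ends the stream
// cleanly (the consumer just sees completion) instead of rejecting next().
async function* endOnError<T>(
  source: AsyncIterable<T>,
  onError: (error: unknown) => void,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  let sourceDone = false;
  try {
    while (true) {
      let next: IteratorResult<T>;
      try {
        next = await iterator.next();
      } catch (error) {
        sourceDone = true;
        onError(error); // e.g. log it server-side
        return;
      }
      if (next.done) {
        sourceDone = true;
        return;
      }
      yield next.value;
    }
  } finally {
    // If the consumer stops early (e.g. the client unsubscribes), release the source.
    if (!sourceDone) await iterator.return?.();
  }
}

// Deliberately loose signature so typed subscribe functions are accepted.
type SubscribeFn = (...args: any[]) => AsyncIterable<unknown>;
type SubscriptionMap = Record<string, { subscribe: SubscribeFn }>;

// Applies endOnError to every subscribe in a hypothetical resolver map.
function wrapAllSubscribes(
  subscriptions: SubscriptionMap,
  onError: (field: string, error: unknown) => void,
): SubscriptionMap {
  const wrapped: SubscriptionMap = {};
  for (const [field, def] of Object.entries(subscriptions)) {
    wrapped[field] = {
      ...def,
      subscribe: (...args) => endOnError(def.subscribe(...args), (error) => onError(field, error)),
    };
  }
  return wrapped;
}
```

The trade-off is the one benjie raises: the client sees a normal completion and gets no error signal for that stream.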
To your preference:
@benjie would you be able to elaborate more on your reasoning for this preference?
More specifically, I think the proposal would be to return a final { errors: <GraphQLError> } as the form in which the iterator completes with an error... rather than simply throwing.

I think it's important to consider another failure mode. What if the CreateSourceEventStream() algorithm succeeds and produces the event stream, but the ExecuteSubscriptionEvent() algorithm throws a "request error," emitting an errors payload of this form? If that did happen, I would think the service processing this stream should stop as soon as the first "request error" is generated.

What might be an example of a "request error" of this type? Well, it would not happen with failure of variable coercion, because the variables have already been coerced; that's one of the main differences between ExecuteSubscriptionEvent() and ExecuteQuery(). But it could be that the queryType doesn't exist (although that's pretty contrived). If the service did not stop on that request error, it would receive a "request error" for every event, and it would make sense to stop right away. [Not sure why I forgot that for subscriptions even ExecuteSubscriptionEvent() runs the regular field resolvers from the subscription root type also, not the query root type, meaning this type of request error cannot exist.]

I can't think of a practical example of a "request error" for ExecuteSubscriptionEvent() [As @benjie points out later, it doesn't seem like a request error can be made from ExecuteSubscriptionEvent() at all!], but actually for exactly that reason I think it's safe to dictate that a "request error" of any type is basically equivalent to the response stream "completing with error," and we do not have to reserve a separate failure mode to signify that the response stream has in fact "completed with error."

Gaming out what might be driving your preference, maybe you are suggesting that even if that is the case now, it would be prudent to reserve the ability for services processing the response stream to distinguish between these two types of events, and so we should preserve the distinct failure modes. I think that's fair, but I would love to hear more about your exact motivation.
I would say that the way this should be handled should not be on a per-subscription basis, but just by response stream processing services like graphql-ws catching the errors and reporting them more cleanly. The original poster actually suggested that services like graphql-ws handle this, but it was argued that this might violate the spec. I'm not sure why that would be. Maybe @enisdenjo could chime in, I know it's been a while since this was first raised, but I am not sure if I have the answer from the comment over here. It sounds like graphql-ws would like to report the error as an event => is that limited in any way by what graphql-js decides to do?

benjie commented on Nov 21, 2024
My concern was that completing a stream successfully (but with the final payload having errors) and completing it with error (due to an underlying stream error) implicitly have different meanings, and I was concerned that a final payload with just { errors } wouldn't be sufficient to differentiate this. However, more careful scrutiny of ExecuteSubscriptionEvent reveals that data is always set (even if set to null in the case of ExecuteSelectionSet handling an error), and thus not setting data in the final payload would be a clear signal this relates to the stream itself rather than the selection set, so I think this would be an acceptable (albeit potentially breaking) change.

All that said, in the event of an error in a single subscription stream across a multiplexed protocol (such as graphql-ws), only that single stream in the multiplex should be terminated. This seems to align with the Error event in the graphql-ws protocol, so I'm surprised to hear that the implementation terminates the multiplex and not just the single stream? I think this specific issue is a separate (but related) concern.

benjie commented on Nov 21, 2024
Here's my first punt at this: graphql/graphql-spec#1126

Here's a go at making this change on top of Lee's editorial changes: sourceStream errors, yield a { errors: [...] } response (graphql/graphql-spec#1127)

benjie commented on Nov 21, 2024
One important note here is that internal errors will still result in the stream closing with an error - this should still not terminate the entire multiplex, only the individual stream within it.
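A sketch of the behaviour discussed in these comments: an error from the source stream becomes a final payload with errors but no data, after which the response stream completes, while errors from elsewhere (standing in for internal errors) still reject the stream. It assumes a hypothetical executeEvent callback that always returns a result with data, as benjie notes ExecuteSubscriptionEvent does. It is neither the graphql-js implementation nor the exact wording of graphql/graphql-spec#1127.

```typescript
// Hypothetical shapes: per-event results always carry `data` (possibly
// null); a stream-level failure carries only `errors`.
type EventResult = { data: unknown; errors?: { message: string }[] };
type StreamErrorResult = { errors: { message: string }[] };
type ResponseItem = EventResult | StreamErrorResult;

async function* mapWithStreamErrors<T>(
  source: AsyncIterable<T>,
  executeEvent: (event: T) => Promise<EventResult>,
): AsyncGenerator<ResponseItem> {
  const iterator = source[Symbol.asyncIterator]();
  let sourceDone = false;
  try {
    while (true) {
      let next: IteratorResult<T>;
      try {
        next = await iterator.next();
      } catch (error) {
        // Source stream failed: emit a final payload without `data`, then complete.
        sourceDone = true;
        const message = error instanceof Error ? error.message : String(error);
        yield { errors: [{ message }] };
        return;
      }
      if (next.done) {
        sourceDone = true;
        return;
      }
      // Errors thrown by executeEvent are treated as internal and are NOT
      // caught: they still reject the response stream.
      yield await executeEvent(next.value);
    }
  } finally {
    if (!sourceDone) await iterator.return?.();
  }
}

// A consumer can tell the two cases apart by the absence of `data`.
function isStreamError(item: ResponseItem): item is StreamErrorResult {
  return !("data" in item);
}
```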
enisdenjo commented on Nov 26, 2024
Just thinking of cases where internal errors span across the whole GraphQL instance - but couldn't think of any. That being said - I agree. graphql-ws should change to use the "error" message type for internal errors too.
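On the transport side, the per-stream handling benjie and enisdenjo describe could look roughly like this. The message shapes follow the graphql-ws protocol's next, error and complete messages, but pumpSubscription and the send callback are hypothetical, not graphql-ws internals:

```typescript
// Message shapes modelled on the graphql-ws protocol; `send` is a
// hypothetical transport hook for one multiplexed connection.
type ServerMessage =
  | { id: string; type: "next"; payload: unknown }
  | { id: string; type: "error"; payload: { message: string }[] }
  | { id: string; type: "complete" };

// Pumps one subscription of a multiplexed connection. An error thrown inside
// the loop (normally by the source stream) is reported as an "error" message
// for this id only; the socket and its other subscriptions are untouched.
async function pumpSubscription(
  id: string,
  stream: AsyncIterable<unknown>,
  send: (message: ServerMessage) => void,
): Promise<void> {
  try {
    for await (const payload of stream) {
      send({ id, type: "next", payload });
    }
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    send({ id, type: "error", payload: [{ message }] });
    return; // an "error" message terminates the operation; no "complete" follows
  }
  send({ id, type: "complete" });
}
```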