Add WorkflowUpdateRPCTimeoutOrCancelledError #548

Merged: 3 commits merged on Jun 20, 2024

Conversation

cretz
Member

@cretz cretz commented Jun 10, 2024

What was changed

  • Add temporalio.client.RPCTimeoutOrCancelledError, a base exception class for RPC timeout/cancelled errors
  • Add temporalio.client.WorkflowUpdateRPCTimeoutOrCancelledError, which extends RPCTimeoutOrCancelledError and is raised when any update RPC call times out or is cancelled
  • Tests to confirm
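
For illustration only, here is a minimal sketch of how calling code might handle the hierarchy described above. The three exception classes are local stand-ins that mirror the names in this PR so the example runs without the SDK installed; `fake_execute_update` and `run_update_with_fallback` are hypothetical helpers, not SDK functions.

```python
import asyncio


# Local stand-ins mirroring the class names from this PR, so the sketch
# runs without the SDK. In real code these would come from the SDK itself.
class TemporalError(Exception):
    pass


class RPCTimeoutOrCancelledError(TemporalError):
    """Base class for client calls that time out or get cancelled."""


class WorkflowUpdateRPCTimeoutOrCancelledError(RPCTimeoutOrCancelledError):
    """Raised when an update RPC call times out or is cancelled."""


async def fake_execute_update(should_time_out: bool) -> str:
    # Hypothetical stand-in for an update call on a workflow handle.
    if should_time_out:
        raise WorkflowUpdateRPCTimeoutOrCancelledError()
    return "update-result"


async def run_update_with_fallback(should_time_out: bool) -> str:
    try:
        return await fake_execute_update(should_time_out)
    except RPCTimeoutOrCancelledError:
        # Catching the base class also covers the update-specific subclass.
        return "client-gave-up"
```

Catching the base class keeps the handler block small, while code that only cares about updates can catch the subclass instead.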

See temporalio/features#483. I will update the current features repo Python CI branch just before merging this.

Checklist

  1. Closes Wrap GRPC::CANCELED and DEADLINE_EXCEEDED in an SDK Timeout exception for Update #529

@cretz cretz requested a review from a team as a code owner June 10, 2024 21:43
    pass


class WorkflowUpdateRPCTimeoutOrCancelledError(RPCTimeoutOrCancelledError):
Member Author

I would like feedback on whether this is what we want to call this. Options:

  • WorkflowUpdateRPCTimeoutOrCancelledError
  • WorkflowUpdateRPCTimedOutOrCancelledError
  • WorkflowUpdateRPCTimeoutOrCancellationError

There may be some other options, but it's important that we include "rpc" in here so users know it's about the RPC call and not some potential future concept of update timeout/cancellation.

Member

I'd vote TimedOut, since that's consistent with Cancelled

Member Author

👍 Will leave open a bit to get more opinions (and confer w/ team before decision)

@drewhoskins-temporal drewhoskins-temporal Jun 10, 2024

I'd go with Timeout for searchability, despite the grammatical oddness. It's what people in Python expect.

> it's important that we include "rpc" in here so users know it's about the RPC call and not some potential future concept of update timeout/cancellation

👎 For me, the main point of this exercise is to reduce the size and complexity of the handler block and the number of times users need to add logic. Thus we want to capture all timeouts into one exception. If we need to disambiguate in the future for power users, let's add fields.

Taking this idea further, please consider one more option. Given that we have the GRPC::Canceled inner exception, that also lets us keep the name simple. For example, we can just do WorkflowUpdateTimeoutError leaving us the ability to have a Canceled error when we add update cancels. But I don't feel strongly about this one.

@drewhoskins-temporal drewhoskins-temporal Jun 11, 2024

Chad and I chatted offline. He's thinking of when we add a "run timeout" to the update, and I'm convinced that we should distinguish "the client gave up" from "the workflow gave up".

Lemme clarify my thoughts. The thing that snags me about "RPC" specifically is it's a leaky abstraction, at least the way I read it. You could easily implement these patterns by polling or by having two RPC calls (start and then wait vs execute), in which case the meaning of "RPC" changes and could be ambiguous. If I put an rpc_timeout=30 but I have client retries, what does that mean? Does each RPC get 30 seconds, or do I as the user want to wait 30 seconds total?
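
To make the two readings of a timeout concrete, here is a rough sketch using only asyncio. Everything in it is hypothetical (`make_flaky_call`, `retry`, `retry_with_total_budget`); it does not reflect how the SDK applies rpc_timeout, only how the same number can mean two different things once retries are involved.

```python
import asyncio


def make_flaky_call(delay: float, failures: int):
    # Hypothetical stand-in for an RPC: each attempt takes `delay` seconds,
    # and the first `failures` attempts raise a retryable error.
    remaining = failures

    async def call() -> str:
        nonlocal remaining
        await asyncio.sleep(delay)
        if remaining > 0:
            remaining -= 1
            raise ConnectionError("transient failure")
        return "ok"

    return call


async def retry(call, attempts: int, per_attempt_timeout: float) -> str:
    # Reading 1: each attempt gets its own timeout, so the caller may wait
    # up to attempts * per_attempt_timeout in total.
    last_error: Exception = asyncio.TimeoutError()
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(call(), per_attempt_timeout)
        except (ConnectionError, asyncio.TimeoutError) as err:
            last_error = err
    raise last_error


async def retry_with_total_budget(call, attempts: int, budget: float) -> str:
    # Reading 2: all attempts share one overall budget.
    return await asyncio.wait_for(retry(call, attempts, budget), budget)
```

With a call that takes 0.1 seconds and fails twice, a 0.25 second timeout succeeds under the first reading (three attempts, each under the limit) but times out under the second (the third attempt would finish after the shared budget is spent).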

What we really mean, semantically, is "the client gave up" so in the abstract I would prefer terms like ClientTimeoutError. (And maybe that's also general enough to encompass the "client canceled" meaning?) But since we already expose fields like rpc_timeout, we've already crossed that bridge, so I'm fine with the naming you chose if you like it better or feel it's more consistent with other parts of the product.

When we add durable-on-admitted, it'd be nice if we could add a field on this exception like last_wait_stage_confirmed={None, Admitted, Accepted}. (But perhaps people who want that should use start_update(wait_stage=Admitted) and then poll on the handle.)

Member Author

> we've already crossed that bridge

Not necessarily. This is a brand new thing to wrap these kinds of gRPC errors, so we can still change it. We could have ClientTimeoutError and WorkflowUpdateClientTimeoutError, but we need to confirm whether that should also cover Cancelled status codes and asyncio.CancelledError cancellation, or only DeadlineExceeded (I am not sure that "timeout" is a good name for something that includes cancellation).
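
As a rough sketch of the wrapping being discussed, the snippet below folds both a timeout and an asyncio cancellation into one exception type and keeps the original error reachable via `__cause__`. The exception class is a local stand-in named after the one in this PR, and `wait_for_update` is a hypothetical helper; gRPC status codes are left out, so this is not the SDK's actual implementation.

```python
import asyncio


# Local stand-in named after the wrapper exception in this PR.
class WorkflowUpdateRPCTimeoutOrCancelledError(Exception):
    pass


async def wait_for_update(delay: float, timeout: float) -> str:
    # Hypothetical helper: the sleep stands in for waiting on an update.
    try:
        await asyncio.wait_for(asyncio.sleep(delay), timeout)
        return "done"
    except (asyncio.TimeoutError, asyncio.CancelledError) as err:
        # Timeout and cancellation surface as one exception type; the
        # original error stays available through __cause__.
        raise WorkflowUpdateRPCTimeoutOrCancelledError() from err
```

A caller that needs to tell the two cases apart can inspect `__cause__`, which is one way to keep the handler block small without losing detail.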

I would leave off the RPC prefix if the exceptions are already in the client package. Otherwise I have no strong opinion

@@ -4456,6 +4478,24 @@ def cause(self) -> BaseException:
        return self.__cause__


class RPCTimeoutOrCancelledError(temporalio.exceptions.TemporalError):
    """Error that occurs on some client calls that timeout or get cancelled."""
Member

Suggested change
-    """Error that occurs on some client calls that timeout or get cancelled."""
+    """Error that occurs on some client calls when they time out or get cancelled."""

    pass


@cretz cretz force-pushed the timeout-or-canceled branch from c507760 to 55dfc66 Compare June 10, 2024 22:50
drewhoskins-temporal commented Jun 11, 2024 via email

@cretz cretz merged commit 2331aa4 into temporalio:main Jun 20, 2024
12 checks passed
@cretz cretz deleted the timeout-or-canceled branch June 20, 2024 20:25
Development

Successfully merging this pull request may close these issues.

Wrap GRPC::CANCELED and DEADLINE_EXCEEDED in an SDK Timeout exception for Update
4 participants