Add actors for stateful operation #2109
Comments
Strongly behind this idea; I think it should enable new classes of algorithms. I'll get back to you on the questions you've raised here.
Quick question: for the case of a counter, one obvious model would be to hold it on the scheduler, like a Spark "accumulator". That would involve either executing code in the scheduler (bad), or the scheduler making tasks every time the object is invoked, which is fine. Is it the data structure that's the most attractive here, or the potential for non-scheduler data-flow?
These would live on workers. This would be to support workloads that need
non-trivial distributed state, and that want to operate at lower latencies
/ higher bandwidths than the centralized scheduler could manage.
…On Thu, Jul 12, 2018 at 9:40 AM, Martin Durant ***@***.***> wrote:
Quick question: for the case of a counter, one obvious model would be to
hold it on the scheduler, like a spark "accumulator". That would involve
either executing code in the scheduler (bad), or the scheduler making tasks
every time the object is invoked - which is fine. Is it the data structure
that's the most attractive here, or the potential for non-scheduler
data-flow?
Often we want to maintain some state within a task during a computation, and have various other tasks interact with that state.
Currently our approach to handling this is to have long-running tasks that open up connections to other workers and communicate over queues, pub-sub, etc. This works, but can be awkward, and is commonly a source of confusion even among advanced users.
I was playing with Ray recently and really enjoyed their new Actor model. I believe that we should shamelessly steal ideas from it :) (cc @robertnishihara)
User API
This involves creating a class
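For illustration, a class of this kind could be as small as the counter discussed in the comments. The name `Counter` and its `increment` method are assumptions made for this sketch, not something fixed by the proposal:

```python
class Counter:
    """Plain Python class whose instance holds mutable state."""

    def __init__(self):
        self.n = 0  # the state that other tasks would interact with

    def increment(self):
        # Mutate the state in place and report the new value.
        self.n += 1
        return self.n
```

Nothing here is actor-specific yet; the class only becomes an actor once an instance is pinned to a worker.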
And then submitting this somehow to be an actor. Many API approaches here might be good. I'll do something dumb for now.
Functions on this class trigger something slightly lighter weight than a task
And this thing can be passed around to other tasks
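The flow described above (create an instance somewhere, call its methods to get future-like results, hand the handle to other tasks) can be mimicked with the standard library alone. This is only a single-process sketch: `ActorHandle`, `call`, `close`, and `bump` are hypothetical names, a thread stands in for the worker that would own the actor, and no Dask API is implied.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class Counter:
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n


class ActorHandle:
    """Hypothetical stand-in for an actor: one instance owned by one thread."""

    def __init__(self, cls, *args, **kwargs):
        self._instance = cls(*args, **kwargs)
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Handle calls one at a time, so the state is never mutated concurrently.
        while True:
            item = self._mailbox.get()
            if item is None:  # sentinel sent by close()
                break
            future, name, args, kwargs = item
            try:
                future.set_result(getattr(self._instance, name)(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, name, *args, **kwargs):
        """Enqueue a method call and return a future for its result."""
        future = Future()
        self._mailbox.put((future, name, args, kwargs))
        return future

    def close(self):
        self._mailbox.put(None)
        self._thread.join()


def bump(handle, times):
    # An ordinary task that received the actor handle as an argument.
    return [handle.call("increment").result() for _ in range(times)]


counter = ActorHandle(Counter)
first = counter.call("increment").result()

# Pass the same handle to several concurrent tasks.
with ThreadPoolExecutor(max_workers=4) as pool:
    tasks = [pool.submit(bump, counter, 10) for _ in range(4)]
    seen = sorted(v for t in tasks for v in t.result())

final = counter.call("increment").result()
counter.close()
```

Because every call goes through one mailbox, the four tasks never observe the same counter value twice, which is the property a stateful actor is meant to provide.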
Limitations
The introduction of state on a worker is powerful, but also limiting. Functionality would probably differ from tasks in the following way:
Implementation
We might first enable workers to directly ask each other to execute tasks for them. No dependencies here, just "run this function for me please and hold onto the result until I or someone else asks for it".
ActorFutures would be like normal futures except that the scheduler would have no record of them, and they would also include the address of the worker on which they live and the ID of the actor to whom they belong.
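A rough in-memory sketch of these two ideas follows: a worker that runs a call on request and holds the result, and a future that records where that result lives. Everything here is an assumption for illustration, including the `key` field, the `run_on_actor` and `fetch` methods, the example address, the dictionary that stands in for the network, and the choice to drop a result once it has been fetched.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorFuture:
    worker_address: str  # worker on which the result lives
    actor_id: str        # actor to which the result belongs
    key: str             # hypothetical lookup key on that worker


class Counter:
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n


class Worker:
    def __init__(self, address):
        self.address = address
        self.actors = {}    # actor_id -> live instance
        self.results = {}   # key -> result, held until someone asks for it
        self._ids = itertools.count()

    def run_on_actor(self, actor_id, method, *args):
        # "Run this for me please and hold onto the result."
        value = getattr(self.actors[actor_id], method)(*args)
        key = f"{actor_id}-{method}-{next(self._ids)}"
        self.results[key] = value
        return ActorFuture(self.address, actor_id, key)

    def fetch(self, key):
        # Assumption: the result is released once it has been collected.
        return self.results.pop(key)


# In-memory routing table standing in for worker-to-worker connections.
workers = {"tcp://127.0.0.1:9000": Worker("tcp://127.0.0.1:9000")}
workers["tcp://127.0.0.1:9000"].actors["counter-1"] = Counter()


def result(future):
    # Resolve the future by going straight to the owning worker;
    # no scheduler is consulted at any point.
    return workers[future.worker_address].fetch(future.key)


fut = workers["tcp://127.0.0.1:9000"].run_on_actor("counter-1", "increment")
value = result(fut)
```

The future carries enough information to be resolved by any holder without a central record, which is why the scheduler's validation checks would need to tolerate data it does not know about.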
We'll need to relax various validation checks that we use in testing to allow for this unexpected data to exist.