With `Queue` and `Variable` we now have two mechanisms to share small data, or futures pointing to actual data, between clients. Both of these systems route data through the central scheduler.
We might consider adding `Pub` and `Sub` to this family, where every entry sent into a `Pub` arrives at all currently subscribed `Sub`s on the same topic. This differs from the current `Queue` case, in which each element is consumed by only one subscriber.
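To make the difference between broadcast and queue semantics concrete, here is a small in-process sketch using only the standard library. `Topic` and `drain` are hypothetical names for illustration, not part of any existing API: each subscriber gets its own buffer and every published item is copied into each one, whereas a plain shared `queue.Queue` hands each item to exactly one consumer.

```python
import queue


class Topic:
    """Hypothetical in-process topic with broadcast semantics: every item
    published is copied to each subscriber registered at publish time."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        # Each subscriber gets its own private buffer.
        buf = queue.Queue()
        self._subscribers.append(buf)
        return buf

    def publish(self, item):
        # Fan out: one copy per *current* subscriber.
        for buf in self._subscribers:
            buf.put(item)


def drain(q):
    """Return everything currently buffered in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


if __name__ == "__main__":
    topic = Topic()
    a, b = topic.subscribe(), topic.subscribe()
    for x in (1, 2):
        topic.publish(x)
    print(drain(a), drain(b))  # [1, 2] [1, 2]

    # A subscriber that joins late sees nothing that was published earlier.
    late = topic.subscribe()
    print(drain(late))  # []

    # Queue semantics: one shared buffer, each item taken by one consumer.
    shared = queue.Queue()
    for x in (1, 2):
        shared.put(x)
    first, second = shared.get_nowait(), shared.get_nowait()
    print(first, second, shared.empty())  # 1 2 True
```

The key design point is that the fan-out happens at publish time, so only subscribers present at that moment receive the item.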
Additionally, we may want to send data directly between workers rather than mediating it through the scheduler. This would let us send large numeric datasets through a pub-sub system with a single network hop and without saturating the scheduler's bandwidth, which would be particularly useful for some advanced machine learning workloads.
As a first implementation, the scheduler would track all clients currently subscribing to or publishing on a particular topic. It would inform all clients on that topic whenever a new client subscribed or published. Clients would then send data directly to each other using their attached workers' servers. This is brittle in a few ways, particularly when a single client is expected to broadcast large amounts of data to many workers, but we may be able to address that later if it becomes a problem.
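The bookkeeping described above could look roughly like the following sketch. It assumes the scheduler only stores client addresses per topic and relays membership changes, while payloads travel peer to peer. `TopicRegistry`, `publish_direct`, the event tuples and the per-client `inbox` (standing in for real network messages) are all hypothetical and do not reflect any existing scheduler API.

```python
from collections import defaultdict


class TopicRegistry:
    """Hypothetical scheduler-side registry: which client addresses publish
    to or subscribe on each topic. Only metadata passes through here."""

    def __init__(self):
        self.publishers = defaultdict(set)
        self.subscribers = defaultdict(set)
        # Per-client list of notifications, standing in for network messages.
        self.inbox = defaultdict(list)

    def _notify(self, topic, event, newcomer):
        # Inform every other client already on this topic about the change.
        members = self.publishers[topic] | self.subscribers[topic]
        for addr in members - {newcomer}:
            self.inbox[addr].append(event)

    def add_publisher(self, topic, addr):
        self.publishers[topic].add(addr)
        self._notify(topic, ("published", topic, addr), addr)

    def add_subscriber(self, topic, addr):
        self.subscribers[topic].add(addr)
        self._notify(topic, ("subscribed", topic, addr), addr)

    def targets(self, topic):
        """Addresses a publisher should send data to directly."""
        return sorted(self.subscribers[topic])


def publish_direct(registry, topic, payload, mailboxes):
    # The registry is consulted only for addresses; the payload itself goes
    # straight to each subscriber (simulated here by appending to a mailbox).
    for addr in registry.targets(topic):
        mailboxes[addr].append(payload)


if __name__ == "__main__":
    reg = TopicRegistry()
    reg.add_publisher("weights", "client-a")
    reg.add_subscriber("weights", "client-b")
    reg.add_subscriber("weights", "client-c")
    print(reg.inbox["client-a"])
    # [('subscribed', 'weights', 'client-b'), ('subscribed', 'weights', 'client-c')]

    mailboxes = defaultdict(list)
    publish_direct(reg, "weights", [0.1, 0.2], mailboxes)
    print(mailboxes["client-b"], mailboxes["client-c"])  # [[0.1, 0.2]] [[0.1, 0.2]]
```

Because `publish_direct` sends one copy per subscriber from the publishing client, that client's outbound traffic grows linearly with the number of subscribers, which is exactly the broadcast weakness noted above.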
So to be clear, there are two topics here:

1. A pub-sub model of inter-client communication
2. Direct communication between workers and clients (in any combination), bypassing the scheduler
cc @MLnick