Description
Timely dataflow exclusively moves owned data around. Although operators are defined on `&Stream` references, the closures and such they take as arguments can rely on getting buffers of owned data. The `map` operator will, no matter what, be handed owned data, and if this means that the contents of a stream need to be cloned, timely will do that.

This seems a bit aggressive in cases where you want to determine the length of some strings, or where you have a stream of large structs (e.g., 75-field DB records) and various consumers want to pick out a few fields that interest them.
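As a minimal sketch of the current behavior (using timely's `example` harness; exact operator signatures may differ across versions), note that `map`'s closure receives an owned `String` even when a borrow would suffice:

```rust
extern crate timely;

use timely::dataflow::operators::{ToStream, Map, Inspect};

fn main() {
    timely::example(|scope| {
        // `map` hands the closure an owned String, so computing a length
        // takes ownership of (and discards) the whole string, even though
        // a `&str` would do.
        vec!["hello".to_owned(), "world".to_owned()]
            .to_stream(scope)
            .map(|text: String| text.len())
            .inspect(|len| println!("length: {}", len));
    });
}
```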
It seems possible (though not obviously a good idea) that this ownership could be promoted so that timely dataflow programmers can control cloning and such by having

- Owned streams, `Stream<Data>`, with methods that consume the stream and provide owned elements of type `Data`.
- Reference streams, `&Stream<Data>`, whose methods provide only access to `&Data` elements, which the observer could then immediately clone if they want to reconstruct the old behavior. Maybe a `.cloned()` method analogous to `Iterator`'s `.cloned()` method?
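A hypothetical sketch of what that split could look like (invented signatures, not timely's actual API; `map_ref` is a made-up name): the owned variant takes `self`, the borrowing variant takes `&self`, and `cloned` recovers today's behavior:

```rust
use std::marker::PhantomData;

/// Hypothetical stream handle; `PhantomData` stands in for the real graph plumbing.
pub struct Stream<D> { _marker: PhantomData<D> }

impl<D: Clone + 'static> Stream<D> {
    /// Consuming variant: takes `self`, and the closure receives owned `D`.
    pub fn map<O>(self, _logic: impl FnMut(D) -> O + 'static) -> Stream<O> {
        Stream { _marker: PhantomData }
    }

    /// Borrowing variant: takes `&self`, and the closure only ever sees `&D`.
    pub fn map_ref<O>(&self, _logic: impl FnMut(&D) -> O + 'static) -> Stream<O> {
        Stream { _marker: PhantomData }
    }

    /// Analogue of `Iterator::cloned`: recover owned elements (today's behavior).
    pub fn cloned(&self) -> Stream<D> {
        self.map_ref(Clone::clone)
    }
}
```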
I think this sounds good in principle, but it involves a lot of careful detail in the innards of timely. Disruption is fine if it fixes or improves things, but there are several consequences of this sort of change. For example,
- Right now `Exchange` pacts require shuffling the data, and this essentially requires owned data when we want to shuffle to other threads in process. It does not require it for single-threaded execution (because the exchange is a no-op) nor for inter-process exchange (because we serialize the data, which we can do from a reference).
- Similarly, exchanged data emerge as "unowned" references when they come out of Abomonation. Most operators will still hit this with a `clone`, though for large value types the compiler can in principle optimize this down. To determine the length of a `String`, the code would almost certainly still clone the string to check its length (it does this already, so this isn't a defect so much as a missed opportunity).
- Anything we do with references needs to happen as the data are produced at their source, or involve `Rc<_>` types wrapping buffers and preventing the transfer of owned data (see the `Rc` sketch after this list). This is because once we enqueue buffers for other operators, we shift the flow of control upward and lose an understanding of whether the references remain valid. This could be fine, because we can register actions the way we currently register "listeners" with the `Tee` pusher, but it would need some thought.
- Operators like `concat` currently just forward batches of owned data. If instead they wanted to move references around, it seems like they would need to act as a proxy for any interested listeners, forwarding their interests upstream to each of their sources (requiring that whatever closure is used to handle the data be cloneable or something, if we want this behavior).
- Such a "register listeners" approach could work very well for operators like `map`, `filter`, `flat_map`, and maybe others, where their behavior would be to wrap a supplied closure with another closure, so that a sequence of such operators turns into a (complicated) closure that the compiler could at least work to optimize down; a fused-closure sketch follows this list.
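On the `Rc<_>` point above, a minimal sketch in plain Rust (no timely types) of what refcounted buffers buy and cost: consumers share `&Data` access cheaply, but nobody can take the buffer as owned data while other handles survive:

```rust
use std::rc::Rc;

fn main() {
    // One owned batch, produced once at the source.
    let batch: Rc<Vec<String>> = Rc::new(vec!["grace".into(), "hopper".into()]);

    // Two "consumers" share the batch; no string is cloned.
    let lengths_view = Rc::clone(&batch);
    let print_view = Rc::clone(&batch);

    let lengths: Vec<usize> = lengths_view.iter().map(|s| s.len()).collect();
    println!("{:?}", lengths);
    for s in print_view.iter() {
        println!("{}", s);
    }

    // The cost: ownership cannot be transferred onward while handles survive.
    // `Rc::try_unwrap` only succeeds once `lengths_view` and `print_view` drop.
    assert!(Rc::try_unwrap(batch).is_err());
}
```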
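And for the closure-wrapping idea in the last bullet, a hypothetical sketch (`Listener`, `map_listener`, and `filter_listener` are invented names, not timely's API): each operator wraps the downstream listener, so a `filter`/`map`/`inspect` chain fuses into one closure that a source drives with borrowed data:

```rust
// A listener consumes borrowed elements of type D.
type Listener<D> = Box<dyn FnMut(&D)>;

// Turn a listener on O into a listener on D by applying `logic` first.
fn map_listener<D: 'static, O: 'static>(
    logic: impl Fn(&D) -> O + 'static,
    mut downstream: Listener<O>,
) -> Listener<D> {
    Box::new(move |d: &D| downstream(&logic(d)))
}

// Only forward elements satisfying the predicate.
fn filter_listener<D: 'static>(
    predicate: impl Fn(&D) -> bool + 'static,
    mut downstream: Listener<D>,
) -> Listener<D> {
    Box::new(move |d: &D| if predicate(d) { downstream(d) })
}

fn main() {
    // Build the fused pipeline from the sink back toward the source:
    // filter(non-empty) -> map(String -> length) -> inspect(print).
    let inspect: Listener<usize> = Box::new(|len| println!("length: {}", len));
    let mapped = map_listener(|s: &String| s.len(), inspect);
    let mut fused = filter_listener(|s: &String| !s.is_empty(), mapped);

    // The "source" drives the fused closure with borrowed data; nothing clones.
    for s in &["hello".to_owned(), String::new(), "world".to_owned()] {
        fused(s);
    }
}
```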
Mostly, shifting around where computation happens, and whether parts of the timely dataflow graph are just fake elements that are actually a stack of closures, is a bit of a disruption for timely dataflow. But it could be that doing it sanely ends up with a system where we do less copying, more per-batch work with data in the cache, eager filtering and projection, and lots of other good stuff.