API shape for creating a WHATWG Stream from a MediaStreamTrack #68

alvestrand · 2021-11-09T11:47:07Z

Two shapes have been proposed:

MediaStreamTrackProcessor: An object that takes a MediaStreamTrack as a constructor argument. https://alvestrand.github.io/mediacapture-transform/#track-processor
An attribute of MediaStreamTrack that is only visible on a worker. https://jan-ivar.github.io/mediacapture-transform/#track-readable

Arguments on which to base a decision are needed.

jan-ivar · 2021-11-09T16:33:05Z

Note this issue is solely over shape: A new interface object with a constructor track argument, vs. a new track attribute:

await new MediaStreamTrackProcessor({track}).readable.pipeThrough(transformer).pipeTo(writable);

vs.

await track.readable.pipeThrough(transformer).pipeTo(writable);

Naming and exposure ([Exposed=DedicatedWorker] only, to reflect consensus) are orthogonal to this discussion.
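
To make the comparison concrete, here is a minimal sketch of the pipeline that both one-liners share. Only the way the readable is obtained differs between the two shapes. The helper names (`makeFakeFrameSource`, `makeTransformer`, `makeCollector`, `pipeFrames`) and the fake frame objects are assumptions for illustration; they are not part of either proposal, and a plain `ReadableStream` stands in for the track-backed one.

```javascript
// Sketch only: the shared pipeline, independent of where `readable` comes from.
// In a browser worker the readable would come from one of the proposed shapes:
//   new MediaStreamTrackProcessor({track}).readable   // shape 1
//   track.readable                                    // shape 2
// Here a plain ReadableStream of fake frame objects stands in for it.
function makeFakeFrameSource(count) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i < count) {
        controller.enqueue({ timestamp: i++ });
      } else {
        controller.close();
      }
    },
  });
}

// Hypothetical transformer: marks each frame as processed.
function makeTransformer() {
  return new TransformStream({
    transform(frame, controller) {
      controller.enqueue({ ...frame, processed: true });
    },
  });
}

// Hypothetical sink: collects frames so the result can be inspected.
function makeCollector(out) {
  return new WritableStream({
    write(frame) {
      out.push(frame);
    },
  });
}

// The part that is identical under both shapes.
async function pipeFrames(readable, transformer, writable) {
  await readable.pipeThrough(transformer).pipeTo(writable);
}
```

Either shape would hand `pipeFrames` its first argument, so the choice under discussion affects the call site, not the stream plumbing.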

jan-ivar · 2021-11-09T17:29:36Z

Audio exposure is also orthogonal (video-only for now to reflect consensus).

alvestrand · 2021-11-10T05:45:57Z

The rest of the platform follows the pattern of "we set a stream's destination to an object":

HTMLMediaElement.srcObject attribute, which takes a (changeable) Stream argument
MediaStreamAudioSourceNode, which takes a Stream argument to its constructor
RTCPeerConnection.addTrack() and friends

In all those cases, a MediaStreamTrack that hasn't been attached generates nothing; a MediaStreamTrack can also be attached to multiple destinations simultaneously, and will generate content for all.

I think it is good to retain the pattern of "a MediaStreamTrack attaches to its destination, it isn't a destination".

(Please open separate issues on exposure and audio, where consensus has not been declared.)

jan-ivar · 2021-11-15T23:21:56Z

I agree a track is neither a source nor a sink. It's a connector. But so are WHATWG readables/writables.

a MediaStreamTrack that hasn't been attached generates nothing

Tracks exist to be "attached". They're square plugs built for the square receptacles in the MST sources and sinks APIs. Developers at a semantic level use them to describe a chain or a tree:

camera -.-> track1 -.-> pc1
        |           '-> pc2
        | 
        '-> track2 -.-> video1
                    '-> video2

Similarly, readables exist to be piped. They're round plugs built for round receptacles (writables). Again, developers use them to describe a chain or a tree:

readable --tee()--> readable1 ---> {writable, readable} ---> writable1
            | 
            '-----> readable2 ---> {writable, readable} ---> writable2
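
The readable tree above can be sketched with the standard Streams API (`tee()`, `pipeThrough()`, `pipeTo()`). The numeric source, the two mapping transforms and the collecting sinks are placeholders chosen for illustration, as are all helper names.

```javascript
// Sketch: one readable fanned out with tee(); each branch is piped through
// its own {writable, readable} pair (a TransformStream) into its own writable.
function numberSource(count) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i < count) controller.enqueue(i++);
      else controller.close();
    },
  });
}

function mapTransform(fn) {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(fn(chunk));
    },
  });
}

function collectInto(out) {
  return new WritableStream({
    write(chunk) {
      out.push(chunk);
    },
  });
}

async function fanOut(readable, out1, out2) {
  const [readable1, readable2] = readable.tee();
  await Promise.all([
    readable1.pipeThrough(mapTransform((x) => x * 2)).pipeTo(collectInto(out1)),
    readable2.pipeThrough(mapTransform((x) => x + 100)).pipeTo(collectInto(out2)),
  ]);
}
```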

So we need an adapter. The question is: is it a brick, or a smart cable?

a MediaStreamTrack can also be attached to multiple destinations simultaneously

Yes, MSTP has this advantage over track.readable (I had a slide on track.createReadable()) — But the fan-out is also arguably a liability. As I believe @aboba observed regarding tee(): why solve it when we have track.clone()? Consider:

camera ---> track1 -.-> MSTP1 ---> readable --> {writable, readable} ---> writable1
                    '-> MSTP2 ---> readable --> {writable, readable} ---> writable2

vs.

camera -.-> track1 ---> readable --> {writable, readable} ---> writable1
        '-> track2 ---> readable --> {writable, readable} ---> writable2

aboba · 2021-11-16T13:36:43Z

@alvestrand To answer your original question, there are some differences which are worth discussing:

MSTP provides an init argument to the constructor, which contains maxBufferSize in addition to track. How would additional arguments like maxBufferSize be set when using track.readable instead? By creating a track property like track.maxBufferSize?
The need to solve the feedback problem remains. Would feedback be provided by adding track.writable?

guidou · 2021-11-16T14:16:45Z

a MediaStreamTrack can also be attached to multiple destinations simultaneously

Yes, MSTP has this advantage over track.readable (I had a slide on track.createReadable()) — But the fan-out is also arguably a liability. As I believe @aboba observed regarding tee(): why solve it when we have track.clone()? Consider:
camera ---> track1 -.-> MSTP1 ---> readable --> {writable, readable} ---> writable1
                    '-> MSTP2 ---> readable --> {writable, readable} ---> writable2
vs.
camera -.-> track1 ---> readable --> {writable, readable} ---> writable1
        '-> track2 ---> readable --> {writable, readable} ---> writable2

What fanout is a liability and why?
Having two MSTPs gives you more flexibility. For example, you can configure them with different buffering policies, which might be useful for some applications (e.g., one with a small buffer for some heavy processing, and one with no buffering for self view).
You can fix this with a createReadable() method instead of a readable field.
A small advantage of MSTP over a method on MST is that it gives us a localized object to add additional fields if necessary (e.g., more configuration options), whereas a field or method in track would require adding them to track (or some other object).

So, to summarize:

I see createReadable() and MSTP as better than a readable field.
I see MSTP with a small potential advantage over createReadable(), but I would in principle be OK with both. One issue in favor of MSTP is that production code is written against it, so I would need to see a clear benefit of createReadable() to prefer it.
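
The per-processor buffering argument can be illustrated with a toy model. `FakeTrack` and `FakeProcessor` below are hypothetical stand-ins, not the real `MediaStreamTrack` or `MediaStreamTrackProcessor`; the sketch assumes that a full buffer drops its oldest frame, and it exposes the buffer directly instead of through a readable to stay short.

```javascript
// Toy model only: none of these classes is the real API.
class FakeTrack {
  constructor() {
    this.sinks = new Set();
  }
  addSink(sink) {
    this.sinks.add(sink);
  }
  // Deliver one frame to every attached sink (1:N fan-out from one track).
  deliver(frame) {
    for (const sink of this.sinks) sink(frame);
  }
}

class FakeProcessor {
  // Mirrors the {track, maxBufferSize} init shape mentioned in the thread.
  constructor({ track, maxBufferSize = 1 }) {
    this.queue = [];
    track.addSink((frame) => {
      this.queue.push(frame);
      // Assumed policy: when over capacity, drop the oldest frame.
      if (this.queue.length > maxBufferSize) this.queue.shift();
    });
  }
  // Frames currently buffered, oldest first.
  buffered() {
    return [...this.queue];
  }
}
```

With one track feeding a processor created with `maxBufferSize: 3` and another with `maxBufferSize: 1`, the first keeps the three most recent frames while the second keeps only the latest, without cloning the track.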

jan-ivar · 2021-11-16T20:49:39Z

What fanout is a liability and why?

See slide. More 1-to-many relationships than necessary is a complexity cost in any API. Track and readable are both connectors, and trivially instantiated with clone or tee already. This begs the question why we need more connective tissue.

For example, you can configure them with different buffering policies

Or just clone the track (which mirrors advice given to avoid tee()). E.g. why have frame resolution control on one object and buffering on another?

Flexibility from a deeper tree has a complexity cost, so if we don't need it, we should avoid it, and KISS.

Counter-argument: cloning a track comes at a cost of having to manage constraints separately, so an application that didn't want that might appreciate the extra object. It also requires not forgetting to call stop() on all clones.

I think this comes down to ergonomics and preference. track.readable as a singular media handle seems simpler to reason about, and in most cases, individual control of constraints seems appropriate, since it will directly affect downstream bytes. Sharing these constraints with a video.srcObject, say, seems like a maintenance bug waiting to happen.

A readable is already lockable, so by default there's no resource overhead until a readable is used. However, I have some concerns about what happens when the readable becomes errored. MSTP doesn't appear to talk about that.

guidou · 2021-11-17T10:27:45Z

What fanout is a liability and why?

See slide. More 1-to-many relationships than necessary is a complexity cost in any API. Track and readable are both connectors, and trivially instantiated with clone or tee already. This begs the question why we need more connective tissue.

I disagree. The "complexity cost" you mention is negligible IMO, and the ability to have different processors with different configurations more than offsets that "cost".

For example, you can configure them with different buffering policies

Or just clone the track (which mirrors advice given to avoid tee()). E.g. why have frame resolution control on one object and buffering on another?

How do you configure it differently with a readable field? Or are you suggesting adding any config params that control the readable field as additional fields in MST?
In the case of using clones you have the extra complexity of having to stop two tracks and two readables. Forcing the application to keep track of additional objects that need cleanup is clearly a much larger complexity cost than the concept of having a 1:N relationship between tracks and readable streams IMO.

Flexibility from a deeper tree has a complexity cost, so if we don't need it, we should avoid it, and KISS.

I don't know if I understand this sentence, but I don't see any deeper tree here. 1:N gives you a single tree of depth 2 if you want multiple readables. 1:1 with clone gives you N trees of depth 2. If anything, the latter is a lot more complex.

Counter-argument: cloning a track comes at a cost of having to manage constraints separately, so an application that didn't want that might appreciate the extra object. It also requires not forgetting to call stop() on all clones.

That's my point.

I think this comes down to ergonomics and preference. track.readable as a singular media handle seems simpler to reason about, and in most cases, individual control of constraints seems appropriate, since it will directly affect downstream bytes. Sharing these constraints with a video.srcObject, say, seems like a maintenance bug waiting to happen.

track.readable has a minor ergonomic advantage in some cases, at the cost of being a lot more complex for other nontrivial but very valid use cases. In terms of preference, I prefer the other point in the tradeoff.

alvestrand · 2021-11-17T14:23:02Z

What fanout is a liability and why?

See slide. More 1-to-many relationships than necessary is a complexity cost in any API. Track and readable are both connectors, and trivially instantiated with clone or tee already. This begs the question why we need more connective tissue.

Flag warning: As we've discussed a lot, tee() is not trivial, and not suitable as-is for video streams. I think @jan-ivar took the token to write (in Javascript) a suitable replacement that would be satisfactory in the video streams context, for possible incorporation into the Streams spec.
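
A small sketch of why the default tee() is awkward for video: it queues every chunk for the branch that is not being read, and both branches receive the same chunk object. The fake frame objects and helper names are assumptions for illustration; real VideoFrame objects are not used here.

```javascript
// Sketch: default tee() behavior with a fast and a slow consumer.
function fakeFrameSource(count) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i < count) controller.enqueue({ timestamp: i++ });
      else controller.close();
    },
  });
}

async function readAll(readable) {
  const chunks = [];
  const reader = readable.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}

async function demoTeeBuffering(count) {
  const [fast, slow] = fakeFrameSource(count).tee();
  // Draining the fast branch pulls every frame from the source...
  const fastFrames = await readAll(fast);
  // ...and each of those frames was queued for the slow branch meanwhile.
  const slowFrames = await readAll(slow);
  return { fastFrames, slowFrames };
}
```

The slow branch ends up holding every frame the fast branch consumed, and each frame is the same object in both branches, so closing a frame in one branch would presumably affect the other.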

alvestrand · 2022-01-07T14:49:06Z

The consensus documented is that we have a MediaStreamTrackProcessor that takes a MediaStreamTrack as a constructor argument and produces a WHATWG stream.

dontcallmedom transferred this issue from w3c/mediacapture-extensions Jan 4, 2022

alvestrand closed this as completed Jan 7, 2022
