API shape for creating a WHATWG Stream from a MediaStreamTrack #68

Closed
alvestrand opened this issue Nov 9, 2021 · 10 comments

Comments

@alvestrand
Contributor

Two shapes have been proposed:

Arguments on which to base a decision are needed.

@jan-ivar
Member

jan-ivar commented Nov 9, 2021

Note this issue is solely over shape: A new interface object with a constructor track argument, vs. a new track attribute:

await new MediaStreamTrackProcessor({track}).readable.pipeThrough(transformer).pipeTo(writable);

vs.

await track.readable.pipeThrough(transformer).pipeTo(writable);

Naming and exposure ([Exposed=DedicatedWorker] only, to reflect consensus) are orthogonal to this discussion.

@jan-ivar
Member

jan-ivar commented Nov 9, 2021

Audio exposure is also orthogonal (video-only for now to reflect consensus).

@alvestrand
Contributor Author

The rest of the platform follows the pattern of "we set a stream's destination to an object":

  • HTMLMediaElement.srcObject attribute, which takes a (changeable) Stream argument
  • MediaStreamAudioSourceNode, which takes a Stream argument to its constructor
  • RTCPeerConnection.addTrack() and friends

In all those cases, a MediaStreamTrack that hasn't been attached generates nothing; a MediaStreamTrack can also be attached to multiple destinations simultaneously, and will generate content for all.

I think it is good to retain the pattern of "a MediaStreamTrack attaches to its destination, it isn't a destination".

(Please open separate issues on exposure and audio, where consensus has not been declared.)

@jan-ivar
Member

I agree a track is neither a source nor a sink. It's a connector. But so are WHATWG readables/writables.

a MediaStreamTrack that hasn't been attached generates nothing

Tracks exist to be "attached". They're square plugs built for the square receptacles in the MST sources and sinks APIs. Developers at a semantic level use them to describe a chain or a tree:

camera -.-> track1 -.-> pc1
        |           '-> pc2
        | 
        '-> track2 -.-> video1
                    '-> video2

Similarly, readables exist to be piped. They're round plugs built for round receptacles (writables). Again, developers use them to describe a chain or a tree:

readable --tee()--> readable1 ---> {writable, readable} ---> writable1
            | 
            '-----> readable2 ---> {writable, readable} ---> writable2
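A minimal sketch of the readable tree above, using only standard WHATWG Streams globals. The helper names (`numberSource`, `mapper`, `collector`, `runTree`) are hypothetical, and plain numbers stand in for media chunks:

```javascript
// readable --tee()--> two branches, each piped through its own
// {writable, readable} transform into its own writable.
function numberSource(count) {
  let n = 0;
  return new ReadableStream({
    pull(controller) {
      if (n < count) controller.enqueue(n++);
      else controller.close();
    },
  });
}

// A {writable, readable} pair that applies fn to every chunk.
function mapper(fn) {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(fn(chunk));
    },
  });
}

// A writable that appends every chunk to the given array.
function collector(sink) {
  return new WritableStream({
    write(chunk) {
      sink.push(chunk);
    },
  });
}

async function runTree() {
  const out1 = [];
  const out2 = [];
  const [readable1, readable2] = numberSource(3).tee();
  await Promise.all([
    readable1.pipeThrough(mapper((x) => x * 2)).pipeTo(collector(out1)),
    readable2.pipeThrough(mapper((x) => x + 100)).pipeTo(collector(out2)),
  ]);
  return { out1, out2 };
}
```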

So we need an adapter. The question is: is it a brick, or a smart cable?

a MediaStreamTrack can also be attached to multiple destinations simultaneously

Yes, MSTP has this advantage over track.readable (I had a slide on track.createReadable()) — But the fan-out is also arguably a liability. As I believe @aboba observed regarding tee(): why solve it when we have track.clone()? Consider:

camera ---> track1 -.-> MSTP1 ---> readable --> {writable, readable} ---> writable1
                    '-> MSTP2 ---> readable --> {writable, readable} ---> writable2

vs.

camera -.-> track1 ---> readable --> {writable, readable} ---> writable1
        '-> track2 ---> readable --> {writable, readable} ---> writable2
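The two diagrams can be sketched as follows. This is an illustration only: `track.readable` is the attribute proposed in this thread, not a shipped API, and the function names and the injectable `Processor` parameter are hypothetical.

```javascript
// Shape 1: one track, n processors. Fan-out happens at the processor.
// Processor defaults to the browser's MediaStreamTrackProcessor when present.
function readablesViaProcessors(track, n, Processor = globalThis.MediaStreamTrackProcessor) {
  return Array.from({ length: n }, () => new Processor({ track }).readable);
}

// Shape 2: n tracks (the original plus clones), one readable each.
// Fan-out happens at the track. `readable` here is the PROPOSED attribute.
function readablesViaClones(track, n) {
  const clones = Array.from({ length: n - 1 }, () => track.clone());
  return [track, ...clones].map((t) => ({ track: t, readable: t.readable }));
}
```

In the second shape the caller receives the cloned tracks as well, since each clone has to be stopped separately.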

@aboba
Contributor

aboba commented Nov 16, 2021

@alvestrand To answer your original question, there are some differences which are worth discussing:

  1. MSTP provides an init argument to the constructor, which contains maxBufferSize in addition to track. How would additional arguments like maxBufferSize be set when using track.readable instead? By creating a track property like track.maxBufferSize?
  2. The need to solve the feedback problem remains. Would feedback be provided by adding track.writable?

@guidou
Contributor

guidou commented Nov 16, 2021

a MediaStreamTrack can also be attached to multiple destinations simultaneously

Yes, MSTP has this advantage over track.readable (I had a slide on track.createReadable()) — But the fan-out is also arguably a liability. As I believe @aboba observed regarding tee(): why solve it when we have track.clone()? Consider:

camera ---> track1 -.-> MSTP1 ---> readable --> {writable, readable} ---> writable1
                    '-> MSTP2 ---> readable --> {writable, readable} ---> writable2

vs.

camera -.-> track1 ---> readable --> {writable, readable} ---> writable1
        '-> track2 ---> readable --> {writable, readable} ---> writable2

What fanout is a liability and why?
Having two MSTPs gives you more flexibility. For example, you can configure them with different buffering policies, which might be useful for some applications (e.g., one with a small buffer for some heavy processing, and one with no buffering for self view).
You can fix this with a createReadable() method instead of a readable field.
A small advantage of MSTP over a method on MST is that it gives us a localized object to add additional fields if necessary (e.g., more configuration options), whereas a field or method in track would require adding them to track (or some other object).

So, to summarize:

  • I see createReadable() and MSTP as better than a readable field.
  • I see MSTP with a small potential advantage over createReadable(), but I would in principle be OK with both. One issue in favor of MSTP is that production code is written against it, so I would need to see a clear benefit of createReadable() to prefer it.
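A minimal sketch of the per-processor configuration described above, assuming the constructor takes `{track, maxBufferSize}` as mentioned earlier in the thread. The function name, the injectable `Processor` parameter, and the buffer sizes are illustrative assumptions:

```javascript
// Two processors on the same track, each with its own buffering policy.
// Processor defaults to the browser's MediaStreamTrackProcessor when present.
function createSelfViewAndAnalysis(track, Processor = globalThis.MediaStreamTrackProcessor) {
  // Self view wants the freshest frame, so keep the buffer as small as possible.
  const selfView = new Processor({ track, maxBufferSize: 1 });
  // Heavy processing tolerates a little latency; a few frames of slack (value is arbitrary).
  const analysis = new Processor({ track, maxBufferSize: 4 });
  return { selfView: selfView.readable, analysis: analysis.readable };
}
```

With a `readable` attribute on the track, the same settings would need another home, which is the question raised about `track.maxBufferSize`.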

@jan-ivar
Member

What fanout is a liability and why?

See slide. More 1-to-many relationships than necessary is a complexity cost in any API. Track and readable are both connectors, and trivially instantiated with clone or tee already. This begs the question why we need more connective tissue.

For example, you can configure them with different buffering policies

Or just clone the track (which mirrors advice given to avoid tee()). E.g. why have frame resolution control on one object and buffering on another?

Flexibility from a deeper tree has a complexity cost, so if we don't need it, we should avoid it, and KISS.

Counter-argument: cloning a track comes at a cost of having to manage constraints separately, so an application that didn't want that might appreciate the extra object. It also requires not forgetting to call stop() on all clones.

I think this comes down to ergonomics and preference. track.readable as a singular media handle seems simpler to reason about, and in most cases, individual control of constraints seems appropriate, since it will directly affect downstream bytes. Sharing these constraints with a video.srcObject, say, seems like a maintenance bug waiting to happen.

A readable is already lockable, so by default there's no resource overhead until a readable is used. However, I have some concerns about what happens when the readable becomes errored. MSTP doesn't appear to talk about that.
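For reference, the locking behavior of a standard WHATWG readable can be sketched as below. It shows only generic `ReadableStream` locking; how a track-backed readable would behave is not specified in this thread, and `lockDemo` is a hypothetical name:

```javascript
// A readable is unlocked until a reader is acquired; while it is locked,
// a second getReader() call throws a TypeError.
function lockDemo() {
  const readable = new ReadableStream({
    pull(controller) {
      controller.enqueue("frame");
    },
  });
  const before = readable.locked;
  const reader = readable.getReader();
  const during = readable.locked;
  let secondReaderError = null;
  try {
    readable.getReader();
  } catch (e) {
    secondReaderError = e;
  }
  reader.releaseLock();
  const after = readable.locked;
  return { before, during, after, secondReaderError };
}
```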

@guidou
Contributor

guidou commented Nov 17, 2021

What fanout is a liability and why?

See slide. More 1-to-many relationships than necessary is a complexity cost in any API. Track and readable are both connectors, and trivially instantiated with clone or tee already. This begs the question why we need more connective tissue.

I disagree. The "complexity cost" you mention is negligible IMO, and the ability to have different processors with different configurations more than offsets that "cost".

For example, you can configure them with different buffering policies

Or just clone the track (which mirrors advice given to avoid tee()). E.g. why have frame resolution control on one object and buffering on another?

How do you configure it differently with a readable field? Or are you suggesting adding any config params that control the readable field as additional fields in MST?
In the case of using clones you have the extra complexity of having to stop two tracks and two readables. Forcing the application to keep track of additional objects that need cleanup is clearly a much larger complexity cost than the concept of having a 1:N relationship between tracks and readable streams IMO.
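A rough sketch of the cleanup burden in the clone-based shape, assuming each branch is a hypothetical `{track, readable}` pair. `track.stop()` and `readable.cancel()` are existing APIs; the `cleanUp` helper and the branch shape are assumptions for illustration:

```javascript
// Every clone needs its own stop(), and every readable its own cancel().
async function cleanUp(branches) {
  await Promise.all(
    branches.map(async ({ track, readable }) => {
      // cancel() rejects on a locked stream, so a locked readable is left
      // to whoever holds its reader.
      if (!readable.locked) await readable.cancel();
      track.stop();
    })
  );
}
```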

Flexibility from a deeper tree has a complexity cost, so if we don't need it, we should avoid it, and KISS.

I don't know if I understand this sentence, but I don't see any deeper tree here. 1:N gives you a single tree of depth 2 if you want multiple readables. 1:1 with clone gives you N trees of depth 2. If anything, the latter is a lot more complex.

Counter-argument: cloning a track comes at a cost of having to manage constraints separately, so an application that didn't want that might appreciate the extra object. It also requires not forgetting to call stop() on all clones.

That's my point.

I think this comes down to ergonomics and preference. track.readable as a singular media handle seems simpler to reason about, and in most cases, individual control of constraints seems appropriate, since it will directly affect downstream bytes. Sharing these constraints with a video.srcObject, say, seems like a maintenance bug waiting to happen.

track.readable has a minor ergonomic advantage in some cases, at the cost of being a lot more complex for other nontrivial but very valid use cases. In terms of preference, I prefer the other point in the tradeoff.

@alvestrand
Contributor Author

What fanout is a liability and why?

See slide. More 1-to-many relationships than necessary is a complexity cost in any API. Track and readable are both connectors, and trivially instantiated with clone or tee already. This begs the question why we need more connective tissue.

Flag warning: As we've discussed a lot, tee() is not trivial, and not suitable as-is for video streams. I think @jan-ivar took the token to write (in JavaScript) a suitable replacement that would be satisfactory in the video streams context, for possible incorporation into the Streams spec.
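One reason tee() is awkward for video can be sketched with plain numbers standing in for frames: the branch that is not being read still queues every chunk the other branch consumes. `teeBufferingDemo` is a hypothetical name, and this shows generic Streams behavior rather than any track-specific API:

```javascript
// Read only from the "fast" branch, then check what piled up for "slow".
async function teeBufferingDemo(frames = 5) {
  let produced = 0;
  const source = new ReadableStream({
    pull(controller) {
      controller.enqueue(produced++);
    },
  });
  const [fast, slow] = source.tee();

  const fastReader = fast.getReader();
  for (let i = 0; i < frames; i++) await fastReader.read();

  // The slow branch was never read, yet every chunk was queued for it.
  const slowReader = slow.getReader();
  const buffered = [];
  for (let i = 0; i < frames; i++) buffered.push((await slowReader.read()).value);

  // Cancelling both branches cancels the underlying source.
  await Promise.all([fastReader.cancel(), slowReader.cancel()]);
  return { produced, buffered };
}
```

With real video frames, that queue would hold frames that a slow consumer has not yet read, and both branches receive the same chunk objects rather than copies.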

@dontcallmedom transferred this issue from w3c/mediacapture-extensions on Jan 4, 2022
@alvestrand
Contributor Author

alvestrand commented Jan 7, 2022

The consensus documented is that we have a MediaStreamTrackProcessor that takes a MediaStreamTrack as a constructor argument and produces a WHATWG stream.
