How to express encoded size vs. visible size vs. natural size #26

Closed
pthatcherg opened this issue Sep 18, 2019 · 8 comments

Comments

@pthatcherg
Contributor

When we express sizes, how do we distinguish between encoded size, visible size, and natural size? From what I understand:

  • coded size: the resolution of the encoded picture, always a multiple of the macroblock size (e.g. 1920x1088).
  • visible region: the region of the coded picture that is valid image data (e.g. 1920x1080@0,0).
  • natural size: the intended display size assuming square display pixels (that is, after applying the pixel aspect ratio).
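A minimal sketch of how the three sizes above relate, using illustrative numbers (the function name and shapes here are mine, not from any spec): the visible region is a rectangle inside the coded picture, and the natural size is the visible size with the pixel aspect ratio (PAR) applied.

```javascript
// Visible region: a rectangle (x, y, width, height) inside the coded picture.
// Natural size: visible size scaled by the PAR, assuming square display pixels.
function naturalSize(visibleWidth, visibleHeight, parNum, parDen) {
  return {
    width: Math.round(visibleWidth * parNum / parDen),
    height: visibleHeight,
  };
}

const coded = { width: 1920, height: 1088 };           // multiple of 16x16 macroblocks
const visible = { x: 0, y: 0, width: 1920, height: 1080 };

// Square pixels (PAR 1:1): natural size equals visible size.
console.log(naturalSize(visible.width, visible.height, 1, 1));
// Anamorphic example: 1440x1080 with PAR 4:3 displays as 1920x1080.
console.log(naturalSize(1440, 1080, 4, 3));
```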
@guest271314
Contributor

The only minimal example I can point to is the encoding of a video MediaStreamTrack at Chromium/Chrome using MediaRecorder (with the VP8 or VP9 codec), which for an unknown reason (potentially involving NaturalSize() usage) outputs a different image at the HTML <video> element in Chromium, the browser that created the file, than in Firefox, where the expected output is displayed using the same file. Evidently that change to the source affected captureStream(), which crashes the tab when executed on an HTML <video> element that contains variable resolution frames. Mozilla browsers do not have those issues.

If the input is 400x300 then 300x150 spanning two seconds, one second each respectively, the API should ensure (by testing) that the output will be 400x300 for one second and 300x150 for one second. Not only 400x300 for the entire two seconds, and not arbitrary output, e.g., Epiphany Technology Preview, which scales larger frames to lower resolutions and smaller frames to higher resolutions when variable resolution frames are encoded in the video track.

The API should not arbitrarily encode the image in such a way as the output will be dissimilar from the input unless explicitly set.

@markafoltz
Contributor

Is this issue addressing settings (expectedWidth/expectedHeight)? Or stats?

@pthatcherg
Contributor Author

expectedWidth/expectedHeight is just for faster codec initialization (it's slower if you wait for the first frame).

I was thinking something more like for decode:

  1. We don't deal with pixel ratios. That would be a problem for rendering, not encode/decode (similar to sample rate changes post decode for audio).

  2. The .size on the image is the visible size, ignoring any extra region that's there to make the encoding work.

  3. We don't deal with coded size/region until someone asks and says they need it (why would they?)

As for encoding, I need to learn more about what one would expect to happen if you passed in a resolution that doesn't fit macro blocks.
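A common answer (a sketch, not a statement about any particular encoder): codecs with 16x16 macroblocks typically round the requested resolution up to the next multiple of 16 for the coded size and record the original resolution as the visible (crop) region. The helper below is illustrative.

```javascript
// Illustrative: derive a coded size by padding each dimension up to the
// next multiple of the macroblock size, as H.264-style codecs do.
const MACROBLOCK = 16;

function codedSize(width, height) {
  const align = (n) => Math.ceil(n / MACROBLOCK) * MACROBLOCK;
  return { width: align(width), height: align(height) };
}

// 1080 is not a multiple of 16, so 1920x1080 is coded as 1920x1088,
// with a 1920x1080 visible region cropping away the 8 padded rows.
console.log(codedSize(1920, 1080)); // { width: 1920, height: 1088 }
```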

@pthatcherg
Contributor Author

> Is this issue addressing settings (expectedWidth/expectedHeight)? Or stats?

It's addressing the metadata of the output frame from the decoder.

@guest271314
Contributor

Do not do this. Chromium does not dispatch a resize event for variable resolution frames, and MediaStreamTrack.getSettings() outputs incorrect width and height values for WebM video output by MediaRecorder at Chromium (arbitrarily changing the output to suit only the perspective of the author of the source code, emitting values that do not reflect the actual underlying encoded frame).

@guest271314
Contributor

Does this issue cover how values are expressed for the decoder portion of WebCodecs?

// Finally the decoders are created and the encoded media is piped through the decoder
// and into the TrackWriters, which convert them into MediaStreamTracks.
// ...
// (Per the draft spec, the constructor takes output/error callbacks and the
// codec string is passed to configure(), rather than to the constructor.)
const videoDecoder = new VideoDecoder({
  output: (frame) => { /* consume decoded VideoFrames */ },
  error: (e) => console.error(e),
});
videoDecoder.configure({codec: 'vp8'});

Kindly clarify whether VideoDecoder() will be independent of the implementer's built-in video decoder(s). That is very important if reliable values are expected to be output for the "size" of the current frame.

@guest271314
Contributor

It must be pointed out here that the size value of the encoded frame will be moot if the decoder does not read each frame when the developer intentionally encodes variable resolution frames intended to be output at an HTML <video> element.

It may be necessary to consider a WebMediaPlayer (with the capability to select and deselect the appropriate decoder) as a substitute for the HTML <video> element (and the implementer's default video decoder) if the underlying encoded frame is expected to be displayed at the pixel dimensions it was input with, instead of being arbitrarily scaled to the pixel dimensions set in the container metadata only.

An implementer of browser video decoders and the HTML <video> element can deliberately ignore potentially variable encoded frames written after the initial pixel dimensions of the first frame. The reason for ignoring variable resolution frames could be unclear, though it may be based on targeting specific devices with maximum screen dimensions ("smart" phones; handhelds; tablets). What is interesting when such decisions are made is that, invariably, when an advertisement or promotion to generate revenue for the parent concern needs to be displayed at an HTML <video> element or custom player as an overlay in the corner or center of the screen, the implementer finds a way to display the exact dimensions of the underlying frames.

@chcunningham
Collaborator

For the root question of "how to expose", please see the solution in draft spec for the VideoFrame interface. I'll give a summary of the current situation below. Please open new bugs for specific issues with this proposal.

  • encoded size -> "coded size" (pixel size of the decoded frame prior to any cropping or scaling)
  • visible size -> "crop size" (cropping applied to the coded size)
  • natural size -> "display size" (scaling applied to the crop size to achieve the final display aspect ratio)
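The mapping above can be pictured as three nested sizes on a frame-like object. This is a sketch with illustrative field names (not necessarily the exact attribute names in the draft spec):

```javascript
// Illustrative frame metadata: coded size is padded for the codec, crop
// size removes the padding, display size applies the (here 1:1) PAR.
function describeFrame(frame) {
  return [
    `coded:   ${frame.codedWidth}x${frame.codedHeight}`,
    `crop:    ${frame.cropWidth}x${frame.cropHeight}`,
    `display: ${frame.displayWidth}x${frame.displayHeight}`,
  ].join('\n');
}

const frame = {
  codedWidth: 1920, codedHeight: 1088,   // macroblock-aligned
  cropWidth: 1920, cropHeight: 1080,     // valid image data
  displayWidth: 1920, displayHeight: 1080, // after PAR scaling
};
console.log(describeFrame(frame));
```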

VideoFrame has all three; self-explanatory.

VideoDecoderConfig has all three. This matches existing generic codec APIs like FFmpeg's, since not all codecs have a provision to describe their size in-band. For codecs that do have that capability, the config values are just an initial hint.

In Chrome, we will use the initial VideoDecoderConfig to pin the pixel aspect ratio (PAR) for the stream, while allowing the display aspect ratio (DAR) to change per frame. This has been Chrome's longstanding behavior.

We will not put any size info in the EncodedVideoChunk. We don't have any use for it.

We will put both display size and crop size in VideoEncoderConfig. Crop size is the critical bit, but display size is also nice to have for encoders that want to describe that in-band. Right now the encoder API only has a vague width and height - we'll fix that (#93).
