How to express encoded size vs. visible size vs. natural size #26

Closed
pthatcherg opened this issue Sep 18, 2019 · 8 comments

Comments

@pthatcherg
Contributor

When we express sizes, how do we distinguish between encoded size, visible size, and natural size? From what I understand:

  • coded size: the resolution of the encoded picture, always a multiple of the macroblock size (e.g. 1920x1088).
  • visible region: the region of the coded picture that is valid image data (e.g. 1920x1080@0,0).
  • natural size: the intended display size assuming square display pixels (that is, after applying the pixel aspect ratio).
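A minimal sketch of how the three sizes above relate, using illustrative numbers (the function name and shapes here are mine, not from any spec): the visible region is a rectangle inside the coded picture, and the natural size is the visible size with the pixel aspect ratio (PAR) applied.

```javascript
// Visible region: a rectangle (x, y, width, height) inside the coded picture.
// Natural size: visible size scaled by the PAR, assuming square display pixels.
function naturalSize(visibleWidth, visibleHeight, parNum, parDen) {
  return {
    width: Math.round(visibleWidth * parNum / parDen),
    height: visibleHeight,
  };
}

const coded = { width: 1920, height: 1088 };           // multiple of 16x16 macroblocks
const visible = { x: 0, y: 0, width: 1920, height: 1080 };

// Square pixels (PAR 1:1): natural size equals visible size.
console.log(naturalSize(visible.width, visible.height, 1, 1));
// Anamorphic example: 1440x1080 with PAR 4:3 displays as 1920x1080.
console.log(naturalSize(1440, 1080, 4, 3));
```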
@guest271314
Contributor

The only minimal example I can point to is the encoding of a video MediaStreamTrack at Chromium/Chrome using MediaRecorder (with the VP8 or VP9 codec), which for an unknown reason (potentially involving NaturalSize() usage) outputs a different image at the HTML <video> element in Chromium, the browser that created the file, than in Firefox, where the expected output is displayed using the same file. Evidently that change to the source affected captureStream(), which crashes the tab when executed on an HTML <video> element that contains variable resolution frames. Mozilla browsers do not have those issues.

If the input is 400x300 then 300x150 spanning two seconds, one second each respectively, the API should ensure (by testing) that the output will be 400x300 for one second and 300x150 for one second. Not only 400x300 for the entire two seconds, and not arbitrary output, e.g., Epiphany Technology Preview, which scales larger frames to lower resolutions and smaller frames to higher resolutions when variable resolution frames are encoded in the video track.

The API should not arbitrarily encode the image in such a way as the output will be dissimilar from the input unless explicitly set.

@markafoltz
Contributor

Is this issue addressing settings (expectedWidth/expectedHeight)? Or stats?

@pthatcherg
Contributor Author

expectedWidth/expectedHeight is just for faster codec initialization (it's slower if you wait for the first frame).

I was thinking something more like for decode:

  1. We don't deal with pixel ratios. That would be a problem for rendering, not encode/decode (similar to sample rate changes post decode for audio).

  2. The .size on the image is the visible size, ignoring any extra region that's there to make the encoding work.

  3. We don't deal with coded size/region until someone asks and says they need it (why would they?)

As for encoding, I need to learn more about what one would expect to happen if you passed in a resolution that doesn't fit macro blocks.
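A common answer (a sketch, not a statement about any particular encoder): codecs with 16x16 macroblocks typically round the requested resolution up to the next multiple of 16 for the coded size and record the original resolution as the visible (crop) region. The helper below is illustrative.

```javascript
// Illustrative: derive a coded size by padding each dimension up to the
// next multiple of the macroblock size, as H.264-style codecs do.
const MACROBLOCK = 16;

function codedSize(width, height) {
  const align = (n) => Math.ceil(n / MACROBLOCK) * MACROBLOCK;
  return { width: align(width), height: align(height) };
}

// 1080 is not a multiple of 16, so 1920x1080 is coded as 1920x1088,
// with a 1920x1080 visible region cropping away the 8 padded rows.
console.log(codedSize(1920, 1080)); // { width: 1920, height: 1088 }
```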

@pthatcherg
Contributor Author

> Is this issue addressing settings (expectedWidth/expectedHeight)? Or stats?

It's addressing the metadata of the output frame from the decoder.

@guest271314
Contributor

Do not do this. Chromium does not dispatch a resize event for variable resolution frames, and MediaStreamTrack.getSettings() outputs incorrect width and height values for WebM video output by MediaRecorder at Chromium (arbitrarily changing the output to suit only the perspective of the author of the source code, emitting values that do not reflect the actual underlying encoded frame).

@guest271314
Contributor

Does this issue cover how values are expressed for the decoder portion of WebCodecs?

// Finally the decoders are created and the encoded media is piped through the decoder
// and into the TrackWriters, which convert them into MediaStreamTracks.
// ...
// (Per the draft spec, the constructor takes output/error callbacks and the
// codec string is passed to configure(), rather than to the constructor.)
const videoDecoder = new VideoDecoder({
  output: (frame) => { /* consume decoded VideoFrames */ },
  error: (e) => console.error(e),
});
videoDecoder.configure({codec: 'vp8'});

Kindly clarify whether VideoDecoder() will be independent of the implementer's built-in video decoder(s). That is very important if reliable values are expected to be output for the "size" of the current frame.

@guest271314
Contributor

It must be pointed out here that the size value of the encoded frame will be moot if the decoder does not read each frame when the developer intentionally encodes variable resolution frames intended to be output at an HTML <video> element.

It may be necessary to consider a WebMediaPlayer (with the capability to select and deselect the appropriate decoder) as a substitute for the HTML <video> element (and the implementer's default video decoder) if the underlying encoded frame is expected to be displayed at the pixel dimensions it was input with, instead of being arbitrarily scaled to the pixel dimensions set in the container metadata only.

An implementer of browser video decoders and the HTML <video> element can deliberately ignore potentially variable encoded frames written after the initial pixel dimensions of the first frame. The reason for ignoring variable resolution frames could be unclear, though it may be based on targeting specific devices with maximum screen dimensions ("smart" phones; handhelds; tablets). What is interesting when such decisions are made is that, invariably, when an advertisement or promotion to generate revenue for the parent concern needs to be displayed at an HTML <video> element or custom player as an overlay in the corner or center of the screen, the implementer finds a way to display the exact dimensions of the underlying frames.

@chcunningham
Collaborator

For the root question of "how to expose", please see the solution in draft spec for the VideoFrame interface. I'll give a summary of the current situation below. Please open new bugs for specific issues with this proposal.

  • encoded size -> "coded size" (pixel size of the decoded frame prior to any cropping or scaling)
  • visible size -> "crop size" (cropping applied to the coded size)
  • natural size -> "display size" (scaling applied to the crop size to achieve the final display aspect ratio)
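The mapping above can be pictured as three nested sizes on a frame-like object. This is a sketch with illustrative field names (not necessarily the exact attribute names in the draft spec):

```javascript
// Illustrative frame metadata: coded size is padded for the codec, crop
// size removes the padding, display size applies the (here 1:1) PAR.
function describeFrame(frame) {
  return [
    `coded:   ${frame.codedWidth}x${frame.codedHeight}`,
    `crop:    ${frame.cropWidth}x${frame.cropHeight}`,
    `display: ${frame.displayWidth}x${frame.displayHeight}`,
  ].join('\n');
}

const frame = {
  codedWidth: 1920, codedHeight: 1088,   // macroblock-aligned
  cropWidth: 1920, cropHeight: 1080,     // valid image data
  displayWidth: 1920, displayHeight: 1080, // after PAR scaling
};
console.log(describeFrame(frame));
```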

VideoFrame has all three; self-explanatory.

VideoDecoderConfig has all three. This matches existing generic codec APIs like FFmpeg's, since not all codecs have a provision to describe their size in-band. For codecs that do have that capability, the config values are just an initial hint.

In Chrome, we will use the initial VideoDecoderConfig to pin the pixel aspect ratio (PAR) for the stream, while allowing the display aspect ratio (DAR) to change per frame. This has been Chrome's longstanding behavior.

We will not put any size info in the EncodedVideoChunk. We don't have any use for it.

We will put both display size and crop size in VideoEncoderConfig. Crop size is the critical bit, but display size is also nice to have for encoders that want to describe that in-band. Right now the encoder API only has a vague width and height - we'll fix that (#93).
