Add AlphaMode to keep/discard alpha channel #274

Merged
merged 11 commits into from
Jul 26, 2021

Conversation

tguilbert-google
Member

@tguilbert-google tguilbert-google commented Jun 8, 2021

This PR adds the hasAlpha attribute to VideoFrame, and the AlphaOption to keep/discard alpha data when initializing or encoding VideoFrames.

Fixes #207


Collaborator

@aboba aboba left a comment

Looking at EncodedVideoChunkMetadata (and recent discussions about extension) makes me wonder whether we shouldn't separate it out better (e.g. put temporalLayerId into its own dictionary).

See: w3c/webrtc-encoded-transform#9

@chcunningham
Collaborator

editors call:

  • is alpha core for the RTC use case as well -> yes
  • should we separate SVC metadata into its own sub-bucket -> sure (svcMetadata)

Collaborator

@chcunningham chcunningham left a comment

Nice work.

index.src.html Outdated
@@ -2819,6 +2842,18 @@
The {{VideoFrame/duration}} getter steps are to return
{{VideoFrame/[[duration]]}}.

: <dfn attribute for=VideoFrame>hasAlpha</dfn>
:: Whether or not the {{VideoFrame}} has an alpha channel. If this is `false`,
the frame is considered to be opaque, and must be rendered as such.
Collaborator

I'd expand more on the contrasting false vs true behavior. Something like

When false

  • the frame must be treated as fully opaque (for example, by any method which takes a CanvasImageSource)
  • for planar formats, an alpha plane must not be present
  • for interleaved formats, blocks of samples will contain an 'X' channel where the alpha channel may otherwise have been. The position of the 'X' channel is defined by [[format]]. The values of the 'X' channel are undefined and must be ignored when rendering.

When true

  • the frame must be considered transparent for the pixels and degrees indicated by the alpha values.
  • for planar formats, the alpha plane must appear as an additional plane following the non-alpha planes defined by [[format]]
  • for interleaved formats, the alpha channel will appear at the 'A' position as defined by [[format]]

Just a draft, could probably use additional editing.
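
To illustrate the interleaved case in the draft above, here is a minimal sketch of how a renderer might pick the alpha value for one 4-byte pixel. The helper name `effectiveAlpha` is hypothetical and not part of WebCodecs, and the four format names are an assumption borrowed from later in this thread. It only demonstrates the rule that an 'X' byte is ignored and the pixel is treated as opaque.

```typescript
// Hypothetical helper, not a WebCodecs API. Format names are assumed.
type InterleavedFormat = "RGBA" | "RGBX" | "BGRA" | "BGRX";

function effectiveAlpha(format: InterleavedFormat, pixel: Uint8Array): number {
  if (pixel.length !== 4) {
    throw new RangeError("expected a 4-byte pixel");
  }
  // In all four assumed layouts the fourth byte is the 'A' or 'X' position.
  const isOpaqueLayout = format === "RGBX" || format === "BGRX";
  // An 'X' byte holds undefined data: ignore it and report full opacity.
  return isOpaqueLayout ? 255 : pixel[3];
}
```

Under these assumptions, the same bytes yield different results depending only on whether the format names the fourth channel 'A' or 'X'.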

Member Author

... for the pixels and degrees ...

Is pixel sufficient? Degrees = U and V for example?

Done. I did make some small changes, not necessarily improvements.

@chcunningham
Collaborator

@aboba wrote

Looking at EncodedVideoChunkMetadata (and recent discussions about extension) makes me wonder whether we shouldn't separate it out better (e.g. put temporalLayerId into its own dictionary).

@tguilbert-google could you also tackle this (i.e. metadata.temporalLayerId -> metadata.svc.temporalLayerId)? Could be a follow up.
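
A rough sketch of what the proposed move could look like to a consumer, assuming the nested dictionary ends up under `svc` as suggested here. The interface names `SvcMetadata` and `ChunkMetadataLike` and the helper `temporalLayerOf` are hypothetical. The fallback to the flat field is only there to show the before and after shapes side by side.

```typescript
// Hypothetical shapes for illustration; not the spec's dictionaries.
interface SvcMetadata {
  temporalLayerId?: number;
}

interface ChunkMetadataLike {
  temporalLayerId?: number; // old, flat location
  svc?: SvcMetadata;        // proposed, nested location
}

// Prefer the nested value and fall back to the flat one when it is absent.
function temporalLayerOf(meta: ChunkMetadataLike): number | undefined {
  return meta.svc?.temporalLayerId ?? meta.temporalLayerId;
}
```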

@sandersdan
Contributor

I see that the plan has changed from using just RGBA and BGRA to also include RGBX and BGRX. This doesn't seem very consistent to me:

  • I420/I420A can only be distinguished by hasAlpha.
  • RGBX/RGBA can be distinguished by name alone.

It seems like if we are including 'X' RGB formats, then we should also include 'A' YUV formats. In that case hasAlpha is not required at all, it's just a convenience attribute that reflects a property that can be derived from the format name alone.

The original plan of not having 'X' formats was also consistent, in the opposite direction.

I don't see any reasoning for this change?

@chcunningham
Collaborator

chcunningham commented Jun 10, 2021

I take your point.

I don't love RGBA + hasAlpha=false. To me that seems confusing given precedent that 'A' means alpha is there while 'X' means it's ignored.

My earlier sense from googling this stuff was that 'X' variants of RGB are perhaps more prevalent than 'A' variants of YUV formats. But maybe that's not the case? It seems at least FFmpeg and GPAC have both. Chrome has both I420 and I420A. I'm open to pursuing 'A' and 'X'. Would this then remove the need for frame.hasAlpha entirely?

@sandersdan
Contributor

sandersdan commented Jun 10, 2021

I don't love RGBA + hasAlpha=false. To me that seems confusing given precedent that 'A' means alpha is there while 'X' means it's ignored.

I don't know how confusing it is. "Opaque" RGBA is quite common while RGBX is not something you are likely to see outside of media. A lot of code is able to handle them identically.

Encoding this information in the format doubles the number of formats, but it also makes the formats more precise. I don't have a strong preference myself.

Chrome's current internal support for I420A but not NV12A does suggest that these are indeed closely-related properties.

My earlier sense from googling this stuff was that 'X' variants of RGB are perhaps more prevalent than 'A' variants of YUV formats.

Likely a similar situation to mulaw; I420A doesn't fit in a fourcc.

Would this then remove the need for frame.hasAlpha entirely?

Yes. It's still a useful convenience, but we would also want other convenience metadata about formats and it's probably not VideoFrame where we would put all of it. (Something like VideoPixelFormat.hasAlpha(format) is more likely.)
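
A minimal sketch of the convenience mentioned above, assuming alpha is always spelled out in the format name. The format list and the function name `formatHasAlpha` are assumptions for illustration; the thread does not settle the final set of formats or where such a helper would live.

```typescript
// Assumed format list for illustration only.
const FORMATS = ["I420", "I420A", "NV12", "RGBA", "RGBX", "BGRA", "BGRX"] as const;
type PixelFormat = (typeof FORMATS)[number];

// Hypothetical stand-in for the `VideoPixelFormat.hasAlpha(format)` idea.
function formatHasAlpha(format: PixelFormat): boolean {
  // In the assumed list, every alpha-carrying format ends in 'A'
  // and no opaque format does.
  return format.charAt(format.length - 1) === "A";
}
```

This only works because of the naming convention in the assumed list. A real implementation would more likely use an explicit lookup table so that a future format name cannot be misclassified.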

@tguilbert-google
Member Author

@aboba wrote

Looking at EncodedVideoChunkMetadata (and recent discussions about extension) makes me wonder whether we shouldn't separate it out better (e.g. put temporalLayerId into its own dictionary).

@tguilbert-google could you also tackle this (i.e. metadata.temporalLayerId -> metadata.svc.temporalLayerId)? Could be a follow up.

Yes I can. I will make this a separate PR.

@tguilbert-google
Member Author

This latest iteration of the spec gets rid of the Discard alpha algorithm, in favor of not exposing alpha when hasAlpha is false, even if the underlying media resource has an alpha channel.

If we got rid of hasAlpha, would we need to be more specific about how to handle differences between the format of the underlying media resource and that of the video frame?

In that case, it would be important to specify which formats map to which with AlphaOption.discard.
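
One way such a mapping could be written down, as a sketch only. The table `OPAQUE_EQUIVALENT` and the function `resolveFormat` are hypothetical names, and the pairings (I420A to I420, RGBA to RGBX, BGRA to BGRX) are an assumption based on the formats discussed in this thread.

```typescript
type FrameFormat = "I420" | "I420A" | "RGBA" | "RGBX" | "BGRA" | "BGRX";
type AlphaOption = "keep" | "discard";

// Assumed pairing of each alpha format with an opaque counterpart.
const OPAQUE_EQUIVALENT: Partial<Record<FrameFormat, FrameFormat>> = {
  I420A: "I420",
  RGBA: "RGBX",
  BGRA: "BGRX",
};

// Returns the format a frame would report for a given source format and option.
function resolveFormat(source: FrameFormat, alpha: AlphaOption): FrameFormat {
  if (alpha === "keep") {
    return source;
  }
  // Formats that are already opaque have no entry and pass through unchanged.
  return OPAQUE_EQUIVALENT[source] ?? source;
}
```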

@chcunningham
Collaborator

I don't know how confusing it is. "Opaque" RGBA is quite common while RGBX is not something you are likely to see outside of media. A lot of code is able to handle them identically.

Encoding this information in the format doubles the number of formats, but it also makes the formats more precise. I don't have a strong preference myself.

I also don't have a strong preference. Considering these points, it seems that this is slightly simplifying for users who don't care about the distinction. They don't need to list 'X' variants in switch statements on pixel format. For planar formats, I can also imagine common code (e.g. if you do copyTo and just assume the returned layout has 3 planes, you're probably fine). Maybe that's enough reason to go with just hasAlpha, no 'X', no I420A, etc.?

@padenot @aboba curious to get your thoughts.

@@ -2515,6 +2558,7 @@
dictionary VideoFrameInit {
unsigned long long duration; // microseconds
long long timestamp; // microseconds
AlphaOption alpha = "keep";
};

dictionary VideoFramePlaneInit {
Collaborator

If the outcome of pending discussion is that we keep hasAlpha, we should probably add hasAlpha to this structure as well as a signal for how to interpret the data (e.g. you now have an extra plane for planar formats)

@chcunningham
Collaborator

Editors call: @padenot @aboba having further look to consider the questions above.

@chcunningham
Collaborator

(Editors call)

Question: If we do it via boolean, do we then allow for alpha combined with all pixel formats? Is that always sane? Does it create need for more elaborate error handling?

Related questions:

  • capabilities signalling: how to know your encoder doesn't support alpha, or a pixel format generally.

@padenot having closer look. @sandersdan, thoughts on the above?

@dalecurtis
Contributor

  • capabilities signalling: how to know your encoder doesn't support alpha, or a pixel format generally.

The alpha field is part of the VideoEncoderConfig, so if alpha == keep is set, isConfigSupported() can reject it. Alpha encoding always uses a separate encoder (generally a software one), so it's mostly an implementation quality thing.
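
A sketch of how an application might probe for this, under stated assumptions. The function `chooseAlphaMode` and the `EncoderConfigLike` and `SupportProbe` types are hypothetical. The probe is passed in as a parameter so the sketch runs outside a browser. It also assumes that an unsupported config is reported as `supported: false` rather than by throwing.

```typescript
type AlphaOption = "keep" | "discard";

// Minimal config shape for illustration; a real VideoEncoderConfig has more fields.
interface EncoderConfigLike {
  codec: string;
  width: number;
  height: number;
  alpha?: AlphaOption;
}

// Stand-in for a support check such as isConfigSupported().
type SupportProbe = (config: EncoderConfigLike) => Promise<{ supported?: boolean }>;

// Tries "keep" first, then falls back to "discard"; null means neither works.
async function chooseAlphaMode(
  probe: SupportProbe,
  base: EncoderConfigLike,
): Promise<AlphaOption | null> {
  for (const alpha of ["keep", "discard"] as const) {
    const result = await probe({ ...base, alpha });
    if (result.supported) {
      return alpha;
    }
  }
  return null;
}
```

In a browser, the probe would presumably wrap `VideoEncoder.isConfigSupported`. The fallback order is a design choice for this sketch, not something the thread prescribes.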

@chcunningham
Collaborator

Question: If we do it via boolean, do we then allow for alpha combined with all pixel formats? Is that always sane? Does it create need for more elaborate error handling?

Discussed this with @sandersdan. He concludes that it is technically sane for any format to have "alpha", but this does create the mentioned headaches around UA support and error handling. This means frame.hasAlpha seems more error prone, and more complex, while not more useful. Hence, we prefer to drop that attribute and go with having alpha always described in the format string. Planar formats that don't suffix with an 'A' must not have alpha (I420 -> no alpha, I420A -> has alpha). Interleaved formats should signal an empty alpha channel using an X, and a non-empty alpha channel using an A (e.g. RGBX vs RGBA).

@padenot LMK if this sounds good and we'll update the PR accordingly.

@padenot
Collaborator

padenot commented Jun 25, 2021

@padenot LMK if this sounds good and we'll update the PR accordingly.

I agree with this assessment. Would we expose all permutations at first, or would we only list what is going to be used in practice? Is everything going to be used? I've never heard of NV12A or anything like that; does that even exist? FFmpeg doesn't have anything like that.

@chcunningham
Collaborator

Would we expose all permutations at first, or would we only list what is going to be used in practice?

I would only expose the useful permutations, with the implication being that implementers must support whatever we expose. The list you've added in #288 looks good for now. I don't think we want to expose the full set of formats in ffmpeg, but that small list should cover most common cases and is probably already supported in every UA.

@padenot
Collaborator

padenot commented Jun 26, 2021

I don't think we want to expose the full set of formats in ffmpeg

For sure. I was just using the ffmpeg format list as a rough check of whether a format even exists at all (e.g. NV12 + alpha), because ffmpeg is known to be quite a general tool.

@chcunningham
Collaborator

Editors call: @aboba no concerns. I'll take another pass today, but probably LGTM

Collaborator

@chcunningham chcunningham left a comment

Nice!

@chcunningham
Collaborator

Nit: re-title to something like "Add AlphaMode to keep/discard alpha channel"

@chcunningham changed the title from "Add VideoFrame.hasAlpha" to "Add AlphaMode to keep/discard alpha channel" on Jul 26, 2021
@chcunningham chcunningham merged commit 9a6e94b into w3c:main Jul 26, 2021
github-actions bot added a commit that referenced this pull request Jul 26, 2021
SHA: 9a6e94b
Reason: push, by @chcunningham

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@tguilbert-google tguilbert-google deleted the alpha branch August 2, 2021 21:39
Development

Successfully merging this pull request may close these issues.

Alpha support
7 participants