Add SSE2 version of Mean16x4 #1814

brianpopow · 2021-11-07T16:47:52Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This PR adds an SSE2 version of Mean16x4, which is used during lossy encoding with mode 1.
Related to #1786

Profile Results

Before:

After:

The overall encoding time was 9002 ms.
It's an improvement, but only a small one. I guess every bit counts.

codecov · 2021-11-07T17:07:08Z

Codecov Report

Merging #1814 (7d8225b) into master (7495a91) will increase coverage by 0.22%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1814      +/-   ##
==========================================
+ Coverage   87.13%   87.35%   +0.22%     
==========================================
  Files         936      936              
  Lines       48128    48154      +26     
  Branches     6037     6038       +1     
==========================================
+ Hits        41934    42063     +129     
+ Misses       5190     5092      -98     
+ Partials     1004      999       -5

Flag	Coverage Δ
unittests	`87.35% <100.00%> (+0.22%)`	⬆️

Impacted Files	Coverage Δ
src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs	`100.00% <100.00%> (ø)`
...rc/ImageSharp/Formats/Webp/Lossy/Vp8EncIterator.cs	`100.00% <100.00%> (ø)`
.../ImageSharp/Formats/Webp/Lossy/WebpLossyDecoder.cs	`97.56% <100.00%> (ø)`
src/ImageSharp/Formats/Webp/Lossy/YuvConversion.cs	`99.33% <100.00%> (+0.05%)`	⬆️
...rc/ImageSharp/Formats/Webp/Lossless/Vp8LEncoder.cs	`97.51% <0.00%> (+0.12%)`	⬆️
.../ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs	`97.54% <0.00%> (+8.85%)`	⬆️
...ageSharp/Formats/Webp/Lossless/PredictorEncoder.cs	`98.29% <0.00%> (+9.07%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

antonfirsov · 2021-11-08T20:14:46Z

src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

+                fixed (byte* inputPtr = input)
+                fixed (ushort* tmpPtr = tmp)


Same recommendations to avoid pinning. Alternatively you can pin YuvIn before the for (k = 0; k < 16; k += 4) loop, and pass pointers to the method.
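
A minimal sketch of the non-pinning alternative, assuming the only goal is to load 16 bytes from a span into a vector. The names `NoPinSketch` and `Load16` are hypothetical and do not come from the PR; `MemoryMarshal.GetReference` and `Unsafe.ReadUnaligned` are standard .NET APIs.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class NoPinSketch
{
    // Hypothetical helper: reads 16 bytes from the start of a span into a
    // vector without a `fixed` block, so nothing has to be pinned.
    public static Vector128<byte> Load16(ReadOnlySpan<byte> input)
    {
        if (input.Length < 16)
        {
            throw new ArgumentException("Need at least 16 bytes.", nameof(input));
        }

        // Take a managed reference to the first element instead of a pointer.
        ref byte start = ref MemoryMarshal.GetReference(input);
        return Unsafe.ReadUnaligned<Vector128<byte>>(ref start);
    }
}
```

The explicit length check replaces the safety that the span indexer would otherwise provide, since `Unsafe.ReadUnaligned` does no bounds checking of its own.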

JimBobSquarePants · 2021-11-09T13:46:47Z

src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

+                ref ushort outputRef = ref MemoryMarshal.GetReference(tmp);
+                Unsafe.As<ushort, Vector128<ushort>>(ref outputRef) = f0.AsUInt16();
+
+                dc[0] = (uint)(tmp[1] + tmp[0]);


It looks to me like if you reverse these span assignments you'll cut out 9 of 12 bounds checks.
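
A small sketch of the reversed access order, assuming a scalar version of the pair sums shown in the diff. The names `BoundsCheckSketch` and `SumPairs` are hypothetical. The idea is that once the highest index of a span has been checked, the JIT can usually prove the smaller constant indices are in range and drop their checks; how many checks actually disappear depends on the runtime version.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Hypothetical scalar pair sums. The highest index of each span
    // (dc[3], tmp[7]) is touched first, so the later, smaller indices
    // are known to be in range.
    public static void SumPairs(ReadOnlySpan<ushort> tmp, Span<uint> dc)
    {
        dc[3] = (uint)(tmp[7] + tmp[6]);
        dc[2] = (uint)(tmp[5] + tmp[4]);
        dc[1] = (uint)(tmp[3] + tmp[2]);
        dc[0] = (uint)(tmp[1] + tmp[0]);
    }
}
```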

antonfirsov · 2021-11-09T13:47:05Z

src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

+                dc[0] = (uint)(tmp[1] + tmp[0]);
+                dc[1] = (uint)(tmp[3] + tmp[2]);
+                dc[2] = (uint)(tmp[5] + tmp[4]);
+                dc[3] = (uint)(tmp[7] + tmp[6]);


Isn't this the same as _mm_hadd_epi16, a.k.a. Ssse3.HorizontalAdd?

I'm afraid 12 span indexer bounds checks have a measurable impact here. All of them seem unnecessary, since tmp is always of size 16 and dc is always of size 4. If we can't find a matching HorizontalAdd for this, maybe we should consider passing tmp as a pointer and indexing dc with Unsafe.* stuff.

Yes, it is the same as Ssse3.HorizontalAdd, good catch.
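
To illustrate the equivalence discussed above, here is a hedged sketch that computes the four pair sums with `Ssse3.HorizontalAdd` and falls back to scalar code elsewhere. The names `HaddSketch` and `PairSums` are hypothetical, and `GetElement` is used only to keep the example readable, not as a recommendation for the hot path.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class HaddSketch
{
    // Hypothetical helper: sums adjacent pairs of eight 16-bit values.
    public static void PairSums(ReadOnlySpan<short> tmp, Span<int> sums)
    {
        if (tmp.Length < 8 || sums.Length < 4)
        {
            throw new ArgumentException("Need 8 inputs and 4 outputs.");
        }

        if (Ssse3.IsSupported)
        {
            Vector128<short> v = Vector128.Create(
                tmp[0], tmp[1], tmp[2], tmp[3], tmp[4], tmp[5], tmp[6], tmp[7]);

            // Lanes 0..3 hold v0+v1, v2+v3, v4+v5, v6+v7.
            // Lanes 4..7 repeat them because both operands are v.
            Vector128<short> h = Ssse3.HorizontalAdd(v, v);

            sums[3] = h.GetElement(3);
            sums[2] = h.GetElement(2);
            sums[1] = h.GetElement(1);
            sums[0] = h.GetElement(0);
        }
        else
        {
            // Scalar fallback with the same 16-bit wraparound as the intrinsic.
            for (int i = 3; i >= 0; i--)
            {
                sums[i] = unchecked((short)(tmp[(2 * i) + 1] + tmp[2 * i]));
            }
        }
    }
}
```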

JimBobSquarePants · 2021-11-09T13:47:46Z

src/ImageSharp/Formats/Webp/Lossy/YuvConversion.cs

+        [MethodImpl(InliningOptions.ShortMethod)]
+        public static void YuvToBgr(int y, int u, int v, Span<byte> bgr)
+        {
+            bgr[0] = (byte)YuvToB(y, u);


Reverse these also.

antonfirsov · 2021-11-09T13:49:06Z

src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

+#if SUPPORTS_RUNTIME_INTRINSICS
+            if (Sse2.IsSupported)
+            {
+                tmp.Clear();


Is this really needed? We override the contents in the end.

Yeah, I think you are right, it's not needed.

src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

Reverse access to dc. Co-authored-by: James Jackson-South

JimBobSquarePants

LGTM 👍

antonfirsov · 2021-11-09T15:01:39Z

src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

+                dc[3] = (uint)lower.GetElement(3);
+                dc[2] = (uint)lower.GetElement(2);
+                dc[1] = (uint)lower.GetElement(1);
+                dc[0] = (uint)lower.GetElement(0);


I'm not sure what GetElement compiles to, but it can be terribly inefficient. dc[0...3] is 64 bits if I'm not mistaken:

Suggested change

-                dc[3] = (uint)lower.GetElement(3);
-                dc[2] = (uint)lower.GetElement(2);
-                dc[1] = (uint)lower.GetElement(1);
-                dc[0] = (uint)lower.GetElement(0);
+                Unsafe.As<uint, Vector64<short>>(ref MemoryMarshal.GetReference(dc)) = lower;

No, sorry, it's actually 128 bits. It needs some widening with intrinsics then.

I think this GetLower is not needed at all, now that I think about it again.

What about this: 1452ba0

@antonfirsov so you think we should try to avoid GetElement entirely?

I think it's good practice to avoid GetElement whenever possible. It looks like it compiles to VPEXTRW, extracting each value from the SIMD register into a scalar register, followed by a scalar copy for each value. We can get much better results with shuffling:
SharpLab

I think we are looking for _mm_unpacklo_epi16 / Sse2.UnpackLow, interleaving lower with zeros to get the equivalent uint values, but I'm not sure how this would behave with negative shorts. Can hadd end up producing negative values?

Sse2.UnpackLow(lower, Vector128<short>.Zero)

That does indeed look much better. I don't think there can be negative values.

Thanks for your help!

Using Sse2.Store looks even a tiny bit better. Should I go for that?

dc is stackalloc'ed, so you can pass it as a pointer easily if you want to, though I'm not sure if it's worth the trouble.
The main benefit would be the removal of one Slice() at the call site; the disadvantage is the inconsistency of the method signature (input is a span, dc is a pointer).

I think it's not worth it. Let's keep the method signature consistent and leave it as it is.
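
The UnpackLow widening discussed in this thread can be sketched as follows, assuming `lower` holds the four pair sums in its low 16-bit lanes. The names `WidenSketch` and `StoreLow4` are hypothetical and the PR's final code may differ. Interleaving with zeros is a zero extension, so a lane holding -1 would be stored as 65535, whereas `(uint)lower.GetElement(i)` would sign-extend it. The two approaches agree only when the lanes are non-negative, which is what the reviewers expect here.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class WidenSketch
{
    // Hypothetical helper: writes the four low 16-bit lanes of `lower`
    // into dc[0..3] as zero-extended 32-bit values with a single store.
    public static void StoreLow4(Vector128<short> lower, Span<uint> dc)
    {
        if (dc.Length < 4)
        {
            throw new ArgumentException("Need room for 4 values.", nameof(dc));
        }

        if (Sse2.IsSupported)
        {
            // Interleave each low lane with a zero lane: (x0, 0, x1, 0, x2, 0, x3, 0).
            // Reinterpreted as uint on little-endian x86, that is x0..x3 zero-extended.
            Vector128<uint> wide = Sse2.UnpackLow(lower, Vector128<short>.Zero).AsUInt32();

            ref byte dst = ref Unsafe.As<uint, byte>(ref MemoryMarshal.GetReference(dc));
            Unsafe.WriteUnaligned(ref dst, wide);
        }
        else
        {
            // Scalar fallback with the same zero-extension semantics.
            for (int i = 3; i >= 0; i--)
            {
                dc[i] = unchecked((ushort)lower.GetElement(i));
            }
        }
    }
}
```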

brianpopow added 4 commits November 7, 2021 16:13

Add SSE2 version of Mean16x4

765f5a2

Make Mean16x4 static and move to LossyUtils

8b8871b

Move yuv related methods to YuvConversion class

984971e

Add Mean16x4 sse tests

0c96e37

brianpopow added area:performance formats:webp labels Nov 7, 2021

Merge branch 'master' into bp/meansse

e8c0d2c

antonfirsov reviewed Nov 8, 2021

brianpopow and others added 3 commits November 9, 2021 11:21

Avoid pinning

3c9c1bb

Merge branch 'master' into bp/meansse

0ca9d43

Merge master branch

9ab9e75

JimBobSquarePants reviewed Nov 9, 2021

antonfirsov reviewed Nov 9, 2021

JimBobSquarePants reviewed Nov 9, 2021

antonfirsov reviewed Nov 9, 2021

brianpopow and others added 4 commits November 9, 2021 14:58

Remove not need Clear of tmp buffer

1418e53

Merge branch 'master' into bp/meansse

9e143ef

Use Ssse3.HorizontalAdd

3cfa040

Reverse access to bgr

84732bf

JimBobSquarePants reviewed Nov 9, 2021

Update src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs

50013d7

Reverse access to dc. Co-authored-by: James Jackson-South

JimBobSquarePants approved these changes Nov 9, 2021

antonfirsov reviewed Nov 9, 2021

brianpopow added 3 commits November 9, 2021 16:41

Change IsSupported check from SSE2 to Ssse3

f0cb89e

Remove not needed GetLower

1452ba0

Use UnpackLow to set the dc values

7d8225b

brianpopow merged commit 255226b into master Nov 9, 2021

brianpopow deleted the bp/meansse branch November 9, 2021 22:46

