Improvements to the "Sum" SIMD algorithm #1112

tannergooding · 2018-10-02T00:04:59Z

Does some cleanup so that we have a single "Sum" algorithm (rather than one for aligned and one for unaligned inputs).

For inputs with fewer elements than can fit in the Vector type, it falls back to scalar code.
For inputs that are not naturally aligned (the address is not a multiple of 4), it does exclusively unaligned loads.
For all other inputs, it will do at most two unaligned loads (one each for any leading/trailing unaligned elements) and all other loads will be aligned.
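
The three cases above can be sketched in portable C++ roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `Lanes`, `MaskedLoad`, `Accumulate`, and `SumSketch` are hypothetical names, a 4-float array stands in for a 128-bit vector, and a real implementation would use SSE/AVX loads and bitwise masks rather than per-lane loops.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<float, 4>;  // stand-in for a 128-bit vector of 4 floats

// Emulates "load 4 floats, then mask": lanes where keep(i) is false become zero.
template <typename Keep>
static Lanes MaskedLoad(const float* p, Keep keep) {
    Lanes v{};
    for (std::size_t i = 0; i < 4; ++i) v[i] = keep(i) ? p[i] : 0.0f;
    return v;
}

static void Accumulate(Lanes& acc, const Lanes& v) {
    for (std::size_t i = 0; i < 4; ++i) acc[i] += v[i];
}

float SumSketch(const float* src, std::size_t length) {
    if (length < 4) {  // fewer elements than fit in one vector: scalar fallback
        float s = 0.0f;
        for (std::size_t i = 0; i < length; ++i) s += src[i];
        return s;
    }

    const auto all = [](std::size_t) { return true; };
    Lanes acc{};
    const float* cur = src;
    const float* end = src + length;

    const std::uintptr_t misalignment = reinterpret_cast<std::uintptr_t>(src) % 16;
    if (misalignment % 4 != 0) {
        // Not naturally aligned: a 16-byte boundary is never reached,
        // so every full block would be an unaligned load.
        for (; cur + 4 <= end; cur += 4) Accumulate(acc, MaskedLoad(cur, all));
    } else {
        if (misalignment != 0) {
            // One unaligned load; keep only the elements before the boundary.
            const std::size_t lead = 4 - static_cast<std::size_t>(misalignment / 4);
            Accumulate(acc, MaskedLoad(cur, [lead](std::size_t i) { return i < lead; }));
            cur += lead;
        }
        // From here on every full block starts on a 16-byte boundary (aligned loads).
        for (; cur + 4 <= end; cur += 4) Accumulate(acc, MaskedLoad(cur, all));
    }

    const std::size_t remainder = static_cast<std::size_t>(end - cur);
    if (remainder != 0) {
        // Move back so one unaligned load reads to the end of the array,
        // then mask out the elements that were already processed.
        const std::size_t skip = 4 - remainder;
        Accumulate(acc, MaskedLoad(end - 4, [skip](std::size_t i) { return i >= skip; }));
    }

    return (acc[0] + acc[1]) + (acc[2] + acc[3]);  // horizontal sum of the lanes
}
```

Because the leading and trailing blocks overlap elements handled elsewhere, the masks are what keep each element from being counted twice.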

tannergooding · 2018-10-02T00:05:15Z

CC. @danmosemsft, @eerhardt, @Anipik

tannergooding · 2018-10-02T00:05:46Z

This is a simple example of #836

tannergooding · 2018-10-02T01:38:37Z

Results in some minor perf improvements for the Microsoft.ML.CpuMath.PerformanceTests.

Before:

Method	Mean	Error	StdDev
Avx.SumU	149.6 us	0.9997 us	0.9351 us
Native.SumU	262.2 us	2.064 us	1.931 us
Sse.SumU	263.1 us	1.504 us	1.407 us

After:

Method	Mean	Error	StdDev
Avx.Sum	129.7 us	1.103 us	0.9775 us
Native.Sum	261.2 us	1.013 us	0.8983 us
Sse.Sum	255.3 us	1.521 us	1.422 us

tannergooding · 2018-10-02T01:56:49Z

This can serve as the basis for the other algorithms as well. Generally the only tweaking that needs to happen is when dealing with leading/trailing elements, where you may need additional masking/etc to get the elements lined up correctly. For example, Scale requires you to ensure the masked out elements are the original value, rather than zero (which requires a couple additional instructions).
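
To illustrate the Scale case mentioned above, here is a minimal C++ sketch of a trailing block that must leave already-processed elements untouched. It is an assumption-laden illustration, not code from this PR: `ScaleTailSketch` is a hypothetical helper, it assumes `length >= 4` and `1 <= remainder <= 3`, and the per-lane conditional stands in for the extra blend/mask instructions a SIMD version would need.

```cpp
#include <cstddef>

// Scales the last `remainder` elements of `dst` in place, assuming every earlier
// element has already been scaled. Works on one overlapping 4-wide block that
// ends exactly at dst + length, mirroring a full-width load and store.
void ScaleTailSketch(float* dst, std::size_t length, std::size_t remainder, float scale) {
    float* p = dst + length - 4;             // back up so the block ends at the array end
    const std::size_t skip = 4 - remainder;  // leading lanes that were already processed

    float block[4];
    for (std::size_t i = 0; i < 4; ++i) {
        const float original = p[i];
        // Masked-out lanes must carry the original value, not zero:
        // the full-width store below would otherwise overwrite them.
        block[i] = (i >= skip) ? original * scale : original;
    }
    for (std::size_t i = 0; i < 4; ++i) p[i] = block[i];  // full-width store
}
```

For Sum, zeroing the masked-out lanes is enough because adding zero leaves the accumulator unchanged; a store-back operation such as Scale does not have that luxury.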

tannergooding · 2018-10-02T03:59:14Z

The test failure is in NormalizerTests.LpGcNormAndWhiteningWorkout and is due to a baseline diff: the last couple of digits of the result may differ because the elements are summed in a different order depending on the alignment of the input data (for floating point, a + b + c can produce a different result than a + c + b).
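
The ordering effect can be reproduced in isolation with a small C++ sketch. The function names `SumAbThenC` and `SumAcThenB` are hypothetical, and the values used below are chosen to exaggerate the effect; the baseline differences in this PR are far smaller.

```cpp
// Two ways of adding the same three floats; only the order of operations differs.
float SumAbThenC(float a, float b, float c) { return (a + b) + c; }
float SumAcThenB(float a, float b, float c) { return (a + c) + b; }
```

With a = 1e30f, b = -1e30f, and c = 1.0f, the first ordering yields 1 and the second yields 0, because 1.0f is lost to rounding when it is added to 1e30f before the cancellation happens.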

tannergooding · 2018-10-02T04:00:09Z

Baseline	Diff	Delta
`-0.176903129`	`-0.1769031`	`-0.000000029`
`0.114987023`	`0.114987031`	`-0.000000008`
`-0.153417692`	`-0.153417677`	`-0.000000015`
`-0.109801926`	`-0.109801918`	`-0.000000008`
`-0.0158602837`	`-0.0158602744`	`-0.0000000093`
`0.0344656035`	`0.0344656147`	`-0.0000000112`
`0.160775661`	`0.160775676`	`-0.000000015`
`0.169217348`	`0.169217363`	`-0.000000015`
`0.122788094`	`0.122788109`	`-0.000000015`
`0.17765902`	`0.177659035`	`-0.000000015`

Full Baseline:

#@ TextLoader{
#@   sep=tab
#@   col=lpnorm:R4:0-10
#@   col=gcnorm:R4:11-21
#@   col=whitened:R4:22-32
#@ }
-0.686319232	0.192169383	-0.152238086	0.03493989	0.346903175	0.09483684	-0.132272437	-0.124785319	-0.5315855	-0.0973325446	0.114802495	-0.626524031	0.289601743	-0.0695612058	0.125636056	0.4509648	0.188099176	-0.0487401523	-0.04093227	-0.465160966	-0.0123033375	0.208920211	-2.604605	0.829638362	-0.5992434	0.19860521	1.33247662	0.369197041	-0.5760094	-0.5490271	-1.94509208	-0.393351972	0.507488966
-0.20306389	-0.1231699	-0.039946992	0.183090389	-0.3328916	0.279628932	-0.0066578323	0.432759076	-0.0798939839	-0.1664458	-0.7057302	-0.137441739	-0.055349838	0.0301625486	0.259335726	-0.270841062	0.3585301	0.0643675	0.5158729	-0.0108833946	-0.09981628	-0.653936446	-0.5923902	-0.324390084	-0.114805378	0.6855182	-1.055579	0.8767955	-0.0392023772	1.21807373	-0.160801888	-0.47570774	-2.22817
-0.268398017	-0.28734377	0.571529865	0.006315247	-0.246294647	-0.445224941	-0.344181	-0.20524554	0.284186125	-0.116832078	-0.06946772	-0.176903129	-0.19703348	0.715542555	0.114987023	-0.153417692	-0.3647864	-0.257424533	-0.109801926	0.410232216	-0.0158602837	0.0344656035	-0.9132714	-0.911281645	1.814283	0.07471426	-0.8969923	-1.44387519	-1.19571114	-0.6542767	0.887983143	-0.4604767	-0.17543222
0.117021732	0.438831449	-0.100304335	0.125380427	-0.413755417	0.0794076	0.133739114	-0.397038	-0.497342378	-0.2632989	0.313451052	0.160775661	0.485780418	-0.0587080531	0.169217348	-0.3752711	0.122788094	0.17765902	-0.358387738	-0.459687948	-0.223320842	0.3591552	0.236966148	1.004758	-0.233154371	0.3862052	-1.02724624	0.240614042	0.299898773	-1.03102541	-1.13852251	-0.6675951	0.766793966

Full Diff:

#@ TextLoader{
#@   sep=tab
#@   col=lpnorm:R4:0-10
#@   col=gcnorm:R4:11-21
#@   col=whitened:R4:22-32
#@ }
-0.686319232	0.192169383	-0.152238086	0.03493989	0.346903175	0.09483684	-0.132272437	-0.124785319	-0.5315855	-0.0973325446	0.114802495	-0.626524031	0.289601743	-0.0695612058	0.125636056	0.4509648	0.188099176	-0.0487401523	-0.04093227	-0.465160966	-0.0123033375	0.208920211	-2.604605	0.829638362	-0.5992434	0.19860521	1.33247662	0.369197041	-0.5760094	-0.5490271	-1.94509208	-0.393351972	0.507488966
-0.20306389	-0.1231699	-0.039946992	0.183090389	-0.3328916	0.279628932	-0.0066578323	0.432759076	-0.0798939839	-0.1664458	-0.7057302	-0.137441739	-0.055349838	0.0301625486	0.259335726	-0.270841062	0.3585301	0.0643675	0.5158729	-0.0108833946	-0.09981628	-0.653936446	-0.5923902	-0.324390084	-0.114805378	0.6855182	-1.055579	0.8767955	-0.0392023772	1.21807373	-0.160801888	-0.47570774	-2.22817
-0.268398017	-0.28734377	0.571529865	0.006315247	-0.246294647	-0.445224941	-0.344181	-0.20524554	0.284186125	-0.116832078	-0.06946772	-0.1769031	-0.19703348	0.715542555	0.114987031	-0.153417677	-0.3647864	-0.257424533	-0.109801918	0.410232216	-0.0158602744	0.0344656147	-0.9132714	-0.911281645	1.814283	0.07471426	-0.8969923	-1.44387519	-1.19571114	-0.6542767	0.887983143	-0.4604767	-0.17543222
0.117021732	0.438831449	-0.100304335	0.125380427	-0.413755417	0.0794076	0.133739114	-0.397038	-0.497342378	-0.2632989	0.313451052	0.160775676	0.485780418	-0.0587080531	0.169217363	-0.3752711	0.122788109	0.177659035	-0.358387738	-0.459687948	-0.223320842	0.3591552	0.236966148	1.004758	-0.233154371	0.3862052	-1.02724624	0.240614042	0.299898773	-1.03102541	-1.13852251	-0.6675951	0.766793966

danmoseley · 2018-10-05T23:15:01Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

-                    result128 = Sse.AddScalar(result128, Sse.LoadScalarVector128(pSrcCurrent));
-                    pSrcCurrent++;
+                    // Handle any trailing elements that don't fit into a 128-bit block by moving back so that the next
+                    // unaligned load will read to the end of the array and then mask out any elements already processed


Should this be "next aligned load"?

tannergooding · 2018-10-06

No, we are moving back from an aligned address to an unaligned one.

tannergooding · 2018-10-23T19:19:39Z

CC. @eerhardt, @Anipik for review.

danmoseley · 2018-10-23T21:21:08Z

test/Microsoft.ML.TestFramework/DataPipe/TestDataPipeBase.cs

        {
            // bitwise comparison is needed because Abs(Inf-Inf) and Abs(NaN-NaN) are not 0s.
            return FloatUtils.GetBits(x) == FloatUtils.GetBits(y) || Math.Abs(x - y) < DoubleEps;
        }

+        private const float SingleEps = 1e-6f;
+
+        private static bool EqualWithEpsSingle(float x, float y)


Nit (not a new issue): it would be nice if the code consistently used either all C# or all .NET names for built-in types (float, Double, etc.).

tannergooding · 2018-10-23

Right. I am following the .NET framework guidelines for names here. Ideally we would fix up the rest of the names to match (as has already been done in most of the public surface area).
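
The hunk above shows only the signature of `EqualWithEpsSingle`, not its body. Assuming it mirrors the double-precision comparison shown just before it, the idea can be sketched in C++ as below. `FloatBits`, `kSingleEps`, and `EqualWithEpsSingleSketch` are hypothetical names for this illustration, and the sketch assumes a 32-bit `float`.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

constexpr float kSingleEps = 1e-6f;  // same tolerance as SingleEps in the diff

// Reinterprets the bits of a float without violating aliasing rules.
static std::uint32_t FloatBits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

bool EqualWithEpsSingleSketch(float x, float y) {
    // The bitwise comparison handles Inf and NaN, where |x - y| is NaN
    // and would never compare below the epsilon.
    return FloatBits(x) == FloatBits(y) || std::fabs(x - y) < kSingleEps;
}
```

Under this tolerance the baseline differences listed earlier in the conversation (on the order of 1e-8) would compare as equal.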

danmoseley · 2018-10-23T21:28:46Z

src/Microsoft.ML.CpuMath/SseIntrinsics.cs

@@ -1061,29 +1061,123 @@ public static unsafe void MulElementWiseU(ReadOnlySpan<float> src1, ReadOnlySpan
            }
        }

-        public static unsafe float SumU(ReadOnlySpan<float> src)
+        public static unsafe float Sum(ReadOnlySpan<float> src)


Assuming you're running on a machine that supports AVX, unit tests would not hit this unless you ran them with the env variable set?

I assume not, since @fiigii's change hasn't gone in yet.

tannergooding · 2018-10-23

We have unit/perf tests that explicitly call these methods/code-paths.

eerhardt

Anipik

LGTM

tannergooding · 2018-10-24T21:20:15Z

Rebased to resolve conflicts.

tannergooding commented Oct 2, 2018 (src/Microsoft.ML.CpuMath/AvxIntrinsics.cs)

eerhardt reviewed Oct 2, 2018 (src/Microsoft.ML.CpuMath/SseIntrinsics.cs, src/Microsoft.ML.CpuMath/AvxIntrinsics.cs, src/Native/CpuMathNative/Sse.cpp)

danmoseley requested a review from Anipik October 5, 2018 23:06

danmoseley reviewed Oct 5, 2018

danmoseley reviewed Oct 23, 2018

danmoseley approved these changes Oct 23, 2018

Anipik reviewed Oct 23, 2018 (src/Microsoft.ML.CpuMath/SseIntrinsics.cs, src/Microsoft.ML.CpuMath/AvxIntrinsics.cs)

eerhardt approved these changes Oct 24, 2018

Anipik approved these changes Oct 24, 2018

Commit 33d2fec: Improvements to the "Sum" SIMD algorithm

tannergooding merged commit 76d1203 into dotnet:master Oct 25, 2018

ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
