Add AVX and FMA intrinsics in Factorization Machine #3785

pkumar07 · 2019-05-29T00:26:46Z

Added AVX and FMA C++ intrinsics to factorizationmachinenative.dll, which currently implements only SSE C++ code, as suggested in #3000.

dnfclas · 2019-05-29T00:26:59Z

All CLA requirements met.

wschin · 2019-05-30T16:54:23Z

...crosoft.ML.StandardTrainers/FactorizationMachine/FactorizationMachineInterface.netcoreapp.cs

@@ -0,0 +1,125 @@
+using System.Runtime.CompilerServices;


The license header is missing. Also, why do we need this file?

wschin · 2019-05-30T16:55:00Z

...rosoft.ML.StandardTrainers/FactorizationMachine/FactorizationMachineInterface.netstandard.cs

@@ -30,11 +30,11 @@ private static bool Compat(AlignedArray a)
        }

        [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
-        public static extern void CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices,
+        public static extern void CalculateIntermediateVariablesNativeSSE(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices,


CalculateIntermediateVariablesNativeSSE ---> CalculateIntermediateVariablesSse

wschin · 2019-05-30T16:55:24Z

...rosoft.ML.StandardTrainers/FactorizationMachine/FactorizationMachineInterface.netstandard.cs

            float* /*const*/ featureValues, float* /*const*/ linearWeights, float* /*const*/ latentWeights, float* latentSum, float* response);

        [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
-        public static extern void CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight,
+        public static extern void CalculateGradientAndUpdateNativeSSE(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight,



CalculateGradientAndUpdateNativeSSE ---> CalculateGradientAndUpdateSse

The same comment applies to other similar places.

wschin · 2019-05-30T16:57:50Z

src/Native/FactorizationMachineNative/FactorizationMachineCoreAVX.cpp

+    float * phv = latentAccumulatedSquaredGrads;
+
+    const __m256 _wei = _mm256_set1_ps(weight);
+    const __m256 _s = _mm256_set1_ps(slope);


Does our memory alignment meet AVX's requirement?

Thanks for pointing it out. I will take a look at the memory alignment for AVX/FMA.

wschin · 2019-05-30T17:01:25Z

src/Native/FactorizationMachineNative/FactorizationMachineCoreSSE.cpp

@@ -8,10 +8,10 @@
 #include <limits>
 #include <pmmintrin.h>

-// Compute the output value of the field-aware factorization, as the sum of the linear part and the latent part. 
+// Compute the output value of the field-aware factorization, as the sum of the linear part and the latent part.


Given the difficulty of reading AVX code, please add a comment:
// This function implements Algorithm 1 in https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf

Please add the same line to the SSE and FMA counterparts.

wschin · 2019-05-30T17:01:56Z

src/Native/FactorizationMachineNative/FactorizationMachineCore.cpp

 // The /*const*/ comment on the parameters of the function means that their values should not get altered by this function.
-EXPORT_API(void) CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, int count,


Given the difficulty of reading AVX code, please add a comment:
// This function implements Algorithm 2 in https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf

Please add the same line to the SSE and FMA counterparts.

wschin · 2019-05-30T17:04:35Z

src/Native/FactorizationMachineNative/FactorizationMachineCoreFMA.cpp

+                if (fprime != f)
+                   _g = _mm256_fmadd_ps(_sx, _q, _g);
+                else
+                   _g = _mm256_fmadd_ps(_sx, _mm256_sub_ps(_q, _mm256_mul_ps(_v, _x)), _g);


Not sure if we can avoid this potential branch. At least, I failed when I tried a long time ago. :)

wschin · 2019-05-30T17:10:33Z

build/Dependencies.props

@@ -7,6 +7,7 @@
    <SystemCollectionsImmutableVersion>1.5.0</SystemCollectionsImmutableVersion>
    <SystemMemoryVersion>4.5.1</SystemMemoryVersion>
    <SystemReflectionEmitLightweightPackageVersion>4.3.0</SystemReflectionEmitLightweightPackageVersion>
+    <SystemRuntimeCompilerServices>4.5.2</SystemRuntimeCompilerServices>
    <SystemThreadingTasksDataflowPackageVersion>4.8.0</SystemThreadingTasksDataflowPackageVersion>


Is there a reason we need this line? If so, could you add a comment explaining it?

Hi Wei-Sheng! Thanks for reviewing. It isn't needed for now, so I will remove it.

wschin · 2019-05-30T17:10:46Z

src/Microsoft.ML.CpuMath/CpuMathUtils.netcoreapp.cs

@@ -9,6 +9,7 @@

 namespace Microsoft.ML.Internal.CpuMath
 {
+    [BestFriend]



Nice!

wschin · 2019-05-30T17:15:22Z

src/Microsoft.ML.StandardTrainers/Microsoft.ML.StandardTrainers.csproj

@@ -1,7 +1,8 @@
 <Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
-    <TargetFramework>netstandard2.0</TargetFramework>
+    <TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework>


@eerhardt, could you please take a look? I am not quite familiar with the build settings. Thank you!

eerhardt · 2019-05-30T17:18:25Z

src/Microsoft.ML.CpuMath/CpuMathUtils.netcoreapp.cs

@@ -9,6 +9,7 @@

 namespace Microsoft.ML.Internal.CpuMath
 {
+    [BestFriend]


Why is this change necessary?

Hi Eric! Accessing CpuMathUtils from FactorizationMachineInterface results in an error when targeting netcoreapp3.0 (see issue #3654).

wschin · 2019-05-30T17:19:31Z

src/Native/FactorizationMachineNative/CMakeLists.txt

@@ -1,12 +1,16 @@
 project (FactorizationMachineNative)

 set(SOURCES
-    FactorizationMachineCore.cpp
+    FactorizationMachineCoreSSE.cpp


Are we compiling all of source files? Are all of them cross-platform?

Yes, we are compiling all the source files, but not all of them are cross-platform. Checks such as Avx.IsSupported dispatch to the appropriate implementation at run time.

eerhardt · 2019-05-30T17:20:09Z

src/Native/FactorizationMachineNative/FactorizationMachineCoreAVX.cpp

+#include <limits>
+#include <immintrin.h>
+//check loads
+EXPORT_API(void) CalculateIntermediateVariablesNativeAVX(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues,


Why write these functions in C++ when they are only called from netcoreapp3.0? Why not write them in C# like the rest of our intrinsics in netcoreapp3.0?

That’s a good point. I will try implementing the same in C# and will close this PR for now.

eerhardt · 2019-05-30T17:25:00Z

src/Microsoft.ML.StandardTrainers/Microsoft.ML.StandardTrainers.csproj

@@ -1,7 +1,8 @@
 <Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
-    <TargetFramework>netstandard2.0</TargetFramework>
+    <TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework>
+    <TargetFrameworks Condition="'$(UseIntrinsics)' == 'true'">netstandard2.0;netcoreapp3.0</TargetFrameworks>


The NuGet package is not going to work the way this is currently written. See #534 for more information. In short, once you introduce a single assembly in the lib\netcoreapp3.0 folder of the NuGet package, ALL the assemblies need to be in the lib\netcoreapp3.0 folder of the package.

It would probably be easiest if the new AVX/FMA were put in the CpuMath assembly/package instead.

wschin · 2019-05-31T17:58:46Z

@pkumar07, why did you close it?

pkumar07 · 2019-05-31T21:47:01Z

@wschin
As @eerhardt pointed out, these native functions are only called from netcoreapp3.0, so it would be better to write them in C#. I can try implementing these functions in C#. What do you suggest?

Added AVX intrinsics to Factorization Machine

3695734

MacOS fix

3235100

wschin reviewed May 30, 2019

wschin requested a review from eerhardt May 30, 2019 17:17

eerhardt reviewed May 30, 2019

wschin reviewed May 30, 2019

eerhardt reviewed May 30, 2019

pkumar07 closed this May 30, 2019

pkumar07 mentioned this pull request Jul 1, 2019

Add AVX and FMA intrinsics in Factorization Machine #3940

Merged

ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
