-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add AVX and FMA intrinsics in Factorization Machine #3785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -0,0 +1,125 @@ | |||
using System.Runtime.CompilerServices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license. And why do we need this file?
@@ -30,11 +30,11 @@ private static bool Compat(AlignedArray a) | |||
} | |||
|
|||
[DllImport(NativePath), SuppressUnmanagedCodeSecurity] | |||
public static extern void CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices, | |||
public static extern void CalculateIntermediateVariablesNativeSSE(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalculateIntermediateVariablesNativeSSE ---> CalculateIntermediateVariablesSse
float* /*const*/ featureValues, float* /*const*/ linearWeights, float* /*const*/ latentWeights, float* latentSum, float* response); | ||
|
||
[DllImport(NativePath), SuppressUnmanagedCodeSecurity] | ||
public static extern void CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, | ||
public static extern void CalculateGradientAndUpdateNativeSSE(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalculateGradientAndUpdateNativeSSE [](start = 34, length = 35)
CalculateGradientAndUpdateNativeSSE ---> CalculateGradientAndUpdateSse
The same comment are applicable to other similar places.
float * phv = latentAccumulatedSquaredGrads; | ||
|
||
const __m256 _wei = _mm256_set1_ps(weight); | ||
const __m256 _s = _mm256_set1_ps(slope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does our memory alignment meet AVX's requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out. I will take a look at the memory alignment for AVX/FMA.
@@ -8,10 +8,10 @@ | |||
#include <limits> | |||
#include <pmmintrin.h> | |||
|
|||
// Compute the output value of the field-aware factorization, as the sum of the linear part and the latent part. | |||
// Compute the output value of the field-aware factorization, as the sum of the linear part and the latent part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the difficulty of reading AVX code, please add a comment:
// This function implements Algorithm 1 in https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf
.
Please add the same line to SSE and FMA counterparts.
// The /*const*/ comment on the parameters of the function means that their values should not get altered by this function. | ||
EXPORT_API(void) CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, int count, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the difficulty of reading AVX code, please add a comment:
// This function implements Algorithm 2 in https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf
.
Please add the same line to SSE and FMA counterparts.
if (fprime != f) | ||
_g = _mm256_fmadd_ps(_sx, _q, _g); | ||
else | ||
_g = _mm256_fmadd_ps(_sx, _mm256_sub_ps(_q, _mm256_mul_ps(_v, _x)), _g); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we can avoid this potential branch. As least, I failed a long time ago. :)
@@ -7,6 +7,7 @@ | |||
<SystemCollectionsImmutableVersion>1.5.0</SystemCollectionsImmutableVersion> | |||
<SystemMemoryVersion>4.5.1</SystemMemoryVersion> | |||
<SystemReflectionEmitLightweightPackageVersion>4.3.0</SystemReflectionEmitLightweightPackageVersion> | |||
<SystemRuntimeCompilerServices>4.5.2</SystemRuntimeCompilerServices> | |||
<SystemThreadingTasksDataflowPackageVersion>4.8.0</SystemThreadingTasksDataflowPackageVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we need this line? Could you add a comment to add some details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wei-Sheng! Thanks for reviewing. It’s not needed for now and I will remove it.
@@ -9,6 +9,7 @@ | |||
|
|||
namespace Microsoft.ML.Internal.CpuMath | |||
{ | |||
[BestFriend] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BestFriend [](start = 5, length = 10)
Nice!
@@ -1,7 +1,8 @@ | |||
<Project Sdk="Microsoft.NET.Sdk"> | |||
|
|||
<PropertyGroup> | |||
<TargetFramework>netstandard2.0</TargetFramework> | |||
<TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eerhardt, could you please take a look? I am not quite familiar with the build settings. Thank you!
@@ -9,6 +9,7 @@ | |||
|
|||
namespace Microsoft.ML.Internal.CpuMath | |||
{ | |||
[BestFriend] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Eric! Access to CpuMathUtils from FactorizationMachineInterface gives an error when using netcoreapp3.0. Issue #3654
@@ -1,12 +1,16 @@ | |||
project (FactorizationMachineNative) | |||
|
|||
set(SOURCES | |||
FactorizationMachineCore.cpp | |||
FactorizationMachineCoreSSE.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we compiling all of source files? Are all of them cross-platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are compiling all the source files. Not all of them are cross-platform. Avx.isSupported (and similar) dispatches the code as needed.
#include <limits> | ||
#include <immintrin.h> | ||
//check loads | ||
EXPORT_API(void) CalculateIntermediateVariablesNativeAVX(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why write these functions in C++ when they are only called from netcoreapp3.0? Why not write them in C# like the rest of our intrinsics in netcoreapp3.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s a good point. I will try implementing the same in C# and will close this PR for now.
@@ -1,7 +1,8 @@ | |||
<Project Sdk="Microsoft.NET.Sdk"> | |||
|
|||
<PropertyGroup> | |||
<TargetFramework>netstandard2.0</TargetFramework> | |||
<TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework> | |||
<TargetFrameworks Condition="'$(UseIntrinsics)' == 'true'">netstandard2.0;netcoreapp3.0</TargetFrameworks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NuGet package is not going to work the way this is currently written. See #534 for more information. Basically, once you introduce a single assembly in lib\netcoreapp3.0
of the NuGet package, ALL the assemblies need to be in the lib\netcoreapp3.0
folder of the package.
It would probably be easiest if the new AVX/FMA were put in the CpuMath assembly/package instead.
@pkumar07, why did you close it? |
Added AVX and FMA C++ intrinsics in factorizationmachinenative.dll which currently implements C++ SSE code as suggested in #3000.