Skip to content

Add AVX and FMA intrinsics in Factorization Machine #3785

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

pkumar07
Copy link

Added AVX and FMA C++ intrinsics in factorizationmachinenative.dll which currently implements C++ SSE code as suggested in #3000.

@dnfclas
Copy link

dnfclas commented May 29, 2019

CLA assistant check
All CLA requirements met.

@@ -0,0 +1,125 @@
using System.Runtime.CompilerServices;
Copy link
Member

@wschin wschin May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license. And why do we need this file?

@@ -30,11 +30,11 @@ private static bool Compat(AlignedArray a)
}

[DllImport(NativePath), SuppressUnmanagedCodeSecurity]
public static extern void CalculateIntermediateVariablesNative(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices,
public static extern void CalculateIntermediateVariablesNativeSSE(int fieldCount, int latentDim, int count, int* /*const*/ fieldIndices, int* /*const*/ featureIndices,
Copy link
Member

@wschin wschin May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CalculateIntermediateVariablesNativeSSE ---> CalculateIntermediateVariablesSse

float* /*const*/ featureValues, float* /*const*/ linearWeights, float* /*const*/ latentWeights, float* latentSum, float* response);

[DllImport(NativePath), SuppressUnmanagedCodeSecurity]
public static extern void CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight,
public static extern void CalculateGradientAndUpdateNativeSSE(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight,
Copy link
Member

@wschin wschin May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CalculateGradientAndUpdateNativeSSE [](start = 34, length = 35)

CalculateGradientAndUpdateNativeSSE ---> CalculateGradientAndUpdateSse

The same comment are applicable to other similar places.

float * phv = latentAccumulatedSquaredGrads;

const __m256 _wei = _mm256_set1_ps(weight);
const __m256 _s = _mm256_set1_ps(slope);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does our memory alignment meet AVX's requirement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I will take a look at the memory alignment for AVX/FMA.

@@ -8,10 +8,10 @@
#include <limits>
#include <pmmintrin.h>

// Compute the output value of the field-aware factorization, as the sum of the linear part and the latent part.
// Compute the output value of the field-aware factorization, as the sum of the linear part and the latent part.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the difficulty of reading AVX code, please add a comment:
// This function implements Algorithm 1 in https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf
.

Please add the same line to SSE and FMA counterparts.

// The /*const*/ comment on the parameters of the function means that their values should not get altered by this function.
EXPORT_API(void) CalculateGradientAndUpdateNative(float lambdaLinear, float lambdaLatent, float learningRate, int fieldCount, int latentDim, float weight, int count,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the difficulty of reading AVX code, please add a comment:
// This function implements Algorithm 2 in https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf
.

Please add the same line to SSE and FMA counterparts.

if (fprime != f)
_g = _mm256_fmadd_ps(_sx, _q, _g);
else
_g = _mm256_fmadd_ps(_sx, _mm256_sub_ps(_q, _mm256_mul_ps(_v, _x)), _g);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we can avoid this potential branch. As least, I failed a long time ago. :)

@@ -7,6 +7,7 @@
<SystemCollectionsImmutableVersion>1.5.0</SystemCollectionsImmutableVersion>
<SystemMemoryVersion>4.5.1</SystemMemoryVersion>
<SystemReflectionEmitLightweightPackageVersion>4.3.0</SystemReflectionEmitLightweightPackageVersion>
<SystemRuntimeCompilerServices>4.5.2</SystemRuntimeCompilerServices>
<SystemThreadingTasksDataflowPackageVersion>4.8.0</SystemThreadingTasksDataflowPackageVersion>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we need this line? Could you add a comment to add some details?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Wei-Sheng! Thanks for reviewing. It’s not needed for now and I will remove it.

@@ -9,6 +9,7 @@

namespace Microsoft.ML.Internal.CpuMath
{
[BestFriend]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BestFriend [](start = 5, length = 10)

Nice!

@@ -1,7 +1,8 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework>
Copy link
Member

@wschin wschin May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerhardt, could you please take a look? I am not quite familiar with the build settings. Thank you!

@wschin wschin requested a review from eerhardt May 30, 2019 17:17
@@ -9,6 +9,7 @@

namespace Microsoft.ML.Internal.CpuMath
{
[BestFriend]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Eric! Access to CpuMathUtils from FactorizationMachineInterface gives an error when using netcoreapp3.0. Issue #3654

@@ -1,12 +1,16 @@
project (FactorizationMachineNative)

set(SOURCES
FactorizationMachineCore.cpp
FactorizationMachineCoreSSE.cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we compiling all of source files? Are all of them cross-platform?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are compiling all the source files. Not all of them are cross-platform. Avx.isSupported (and similar) dispatches the code as needed.

#include <limits>
#include <immintrin.h>
//check loads
EXPORT_API(void) CalculateIntermediateVariablesNativeAVX(int fieldCount, int latentDim, int count, _In_ int * fieldIndices, _In_ int * featureIndices, _In_ float * featureValues,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why write these functions in C++ when they are only called from netcoreapp3.0? Why not write them in C# like the rest of our intrinsics in netcoreapp3.0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a good point. I will try implementing the same in C# and will close this PR for now.

@@ -1,7 +1,8 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework>
<TargetFrameworks Condition="'$(UseIntrinsics)' == 'true'">netstandard2.0;netcoreapp3.0</TargetFrameworks>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NuGet package is not going to work the way this is currently written. See #534 for more information. Basically, once you introduce a single assembly in lib\netcoreapp3.0 of the NuGet package, ALL the assemblies need to be in the lib\netcoreapp3.0 folder of the package.

It would probably be easiest if the new AVX/FMA were put in the CpuMath assembly/package instead.

@pkumar07 pkumar07 closed this May 30, 2019
@wschin
Copy link
Member

wschin commented May 31, 2019

@pkumar07, why did you close it?

@pkumar07
Copy link
Author

@wschin
As @eerhardt mentioned that these native functions are only being called from netcoreapp3.0, it would be better to write them in C#. I can try implementing these functions in C#. What do you suggest?

@ghost ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants