-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Improvements to the "Sum" SIMD algorithm #1112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a simple example of #836 |
Results in some minor perf improvements for the Before:
After:
|
This can serve as the basis for the other algorithms as well. Generally the only tweaking that needs to happen is when dealing with leading/trailing elements, where you may need additional masking/etc to get the elements lined up correctly. For example, |
Test failure is for |
Full Baseline
Full Diff:
|
result128 = Sse.AddScalar(result128, Sse.LoadScalarVector128(pSrcCurrent)); | ||
pSrcCurrent++; | ||
// Handle any trailing elements that don't fit into a 128-bit block by moving back so that the next | ||
// unaligned load will read to the end of the array and then mask out any elements already processed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be "next aligned load"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we are moving back from an aligned address to an unaligned one.
{ | ||
// bitwise comparison is needed because Abs(Inf-Inf) and Abs(NaN-NaN) are not 0s. | ||
return FloatUtils.GetBits(x) == FloatUtils.GetBits(y) || Math.Abs(x - y) < DoubleEps; | ||
} | ||
|
||
private const float SingleEps = 1e-6f; | ||
|
||
private static bool EqualWithEpsSingle(float x, float y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, not a new issue, it would be nice if the code consistently used all C# or all .NET names for built-in types. float, Double etc..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I am following the .NET framework guidelines for names here. We would ideally fixup the rest of the names to be the same (as has already been done in most of the public surface area).
@@ -1061,29 +1061,123 @@ public static unsafe void MulElementWiseU(ReadOnlySpan<float> src1, ReadOnlySpan | |||
} | |||
} | |||
|
|||
public static unsafe float SumU(ReadOnlySpan<float> src) | |||
public static unsafe float Sum(ReadOnlySpan<float> src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you're running on a machine supporting AVX -- unit tests would not hit this -- unless you ran them with the env variable set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume not since @fiigii change didn't go in yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have unit/perf tests that explicitly call these methods/code-paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Rebased to resolve conflicts. |
Does some cleanup so that we have a single "Sum" algorithm (rather than one for aligned and one for unaligned inputs).
For inputs with fewer elements than can fit in the
Vector
type, it falls back to scalar code.For inputs that are not naturally aligned (the alignment is not a multiple of 4), it does exclusively unaligned loads
For all other inputs, it will do at most two unaligned loads (one each for any leading/trailing unaligned elements) and all other loads will be aligned.