Update loops in CpuMath to be more efficient #1177
Did you run any benchmarks to make sure it brings speed improvements?
@@ -431,7 +431,7 @@ public static unsafe void AddScalarU(float scalar, Span<float> dst)

 Vector128<float> scalarVector128 = Sse.SetAllVector128(scalar);

-if (pDstCurrent + 4 <= pDstEnd)
+if (pDstCurrent <= pDstEnd - 4)
Creating a temp for pDstEnd - 4 is probably better, as that would better ensure that each loop iteration isn't computing pDstEnd - 4 before the comparison.
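A minimal sketch of the suggestion above, assuming a simplified stand-in for `AddScalarU`. The names `HoistedBoundSketch`, `AddScalar`, and `pVectorizationEnd` are hypothetical, and four scalar adds stand in for one `Vector128<float>` operation so the example does not depend on SSE hardware. It is not the CpuMath implementation.

```csharp
using System;

public static class HoistedBoundSketch
{
    // Hypothetical helper (not the CpuMath code): adds `scalar` to every element of `dst`.
    public static unsafe void AddScalar(float scalar, Span<float> dst)
    {
        fixed (float* pDst = dst)
        {
            float* pDstCurrent = pDst;
            float* pDstEnd = pDst + dst.Length;

            // Guard: for fewer than 4 elements, pDstEnd - 4 would point before the
            // buffer (and would wrap around for an empty span, whose pointer is null).
            if (dst.Length >= 4)
            {
                // Computed once, outside the loop. Note the type is float*, not int.
                float* pVectorizationEnd = pDstEnd - 4;

                // Main loop: four floats per iteration
                // (a stand-in for one Vector128<float> load/add/store).
                while (pDstCurrent <= pVectorizationEnd)
                {
                    pDstCurrent[0] += scalar;
                    pDstCurrent[1] += scalar;
                    pDstCurrent[2] += scalar;
                    pDstCurrent[3] += scalar;
                    pDstCurrent += 4;
                }
            }

            // Tail loop: the remaining 0 to 3 elements.
            while (pDstCurrent < pDstEnd)
            {
                *pDstCurrent += scalar;
                pDstCurrent++;
            }
        }
    }
}
```

Compiling this requires unsafe code to be enabled (for example `AllowUnsafeBlocks` in the project file). Whether the hoisted bound is measurably faster than `pDstCurrent + 4 <= pDstEnd` depends on what the JIT already does, so it is worth benchmarking rather than assuming.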
I don't believe so; both PRs have the same problem, which is that each iteration of the loop may do a subtraction/addition as part of the comparison. As commented on #1177 (comment), creating a temp ...
@@ -417,7 +417,7 @@ public static unsafe void AddScalarU(float scalar, Span<float> dst)

 Vector128<float> scalarVector = Sse.SetAllVector128(scalar);

-while (pDstCurrent + 4 <= pDstEnd)
+while (pDstCurrent <= pDstEnd - 4)
Could you update the SSE case as well?
@@ -22,6 +22,8 @@ internal static class AvxIntrinsics

 private const int Vector256Alignment = 32;

+private const int destinationEnd = pDstEnd - 4;
Where is pDstEnd defined? I wouldn't expect that this could be a constant...
@tannergooding does this look OK now?
Nice perf improvement @jwood803, thank you!
@@ -417,6 +417,8 @@ public static unsafe void AddScalarU(float scalar, Span<float> dst)
 {
 float* pDstEnd = pdst + dst.Length;
 float* pDstCurrent = pdst;
+int destinationEnd = pDstEnd - 4;
+
nit: stray newline here
This should be updated to remove the extra new line.
LGTM
This was a bad change/merge
float* pDstEnd = pdst + dst.Length;
float* pDstCurrent = pdst;
int destinationEnd = pDstEnd - 4;
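For context on why the quoted lines break the build: in C#, `pDstEnd - 4` is a `float*`, and there is no implicit conversion from a pointer to `int`. A small sketch of the typing rules follows; `PointerTypesSketch`, `ElementsBeforeLastVector`, and `pLastVectorStart` are hypothetical names used only for illustration.

```csharp
using System;

public static class PointerTypesSketch
{
    // Hypothetical helper: number of elements between the start of the buffer
    // and the last position where 4 floats still fit. Assumes buffer.Length >= 4.
    public static unsafe long ElementsBeforeLastVector(float[] buffer)
    {
        fixed (float* pStart = buffer)
        {
            float* pEnd = pStart + buffer.Length;

            // int destinationEnd = pEnd - 4;   // does not compile: float* is not implicitly convertible to int
            float* pLastVectorStart = pEnd - 4;  // pointer minus integer yields a pointer

            // Pointer minus pointer yields a long count of elements.
            return pLastVectorStart - pStart;
        }
    }
}
```

Declaring the temp as `float*` keeps the loop comparison pointer-to-pointer, which is what the review comments were asking for.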
The ... @shauheen @danmosemsft - maybe now would be a good time to get a ...
I've opened #1495 for the build break.
Fixes issue #835