Same implementation for Sparse Multiplication for aligned and unaligned arrays #1274
@tannergooding @eerhardt can you please take a look at this?
test/Microsoft.ML.CpuMath.PerformanceTests/SsePerformanceTests.cs
while (ppos < pposEnd)
{
    int col = *ppos;
    Vector128<float> x1 = Sse.SetVector128(pm3[col], pm2[col], pm1[col], pm0[col]);
Note to self: I want to check the codegen of this and ensure that it is being emitted "optimally" (two loads and three unpacks, with two of the loads folded, rather than four loads and three unpacks).
@tannergooding did you get a chance to look at the codegen for this?
No, not yet.
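For readers following the thread, here is a minimal sketch of what the loop under review computes: four row/sparse-vector dot products accumulated in one 128-bit register. It is an illustration, not the PR's code. The class and method names are hypothetical, the matrix is assumed to be row-major, and `Vector128.Create` is used in place of the preview `Sse.SetVector128` call quoted above (note that `Vector128.Create` takes the lowest element first, the reverse of the argument order in the snippet). It says nothing about the codegen question raised above.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class FourRowGatherSketch
{
    // Hypothetical helper: dot products of 4 consecutive rows (starting at firstRow)
    // of a row-major matrix with a sparse vector. 'positions' lists the columns that
    // hold nonzero entries; 'src' is the source vector indexed by column.
    public static float[] MultiplyFourRows(float[] mat, int ccol, int firstRow, int[] positions, float[] src)
    {
        // Offsets of the four rows, playing the role of pm0..pm3 in the snippet above.
        int pm0 = firstRow * ccol;
        int pm1 = pm0 + ccol;
        int pm2 = pm1 + ccol;
        int pm3 = pm2 + ccol;
        var result = new float[4];

        if (Sse.IsSupported)
        {
            Vector128<float> acc = Vector128<float>.Zero;
            foreach (int col in positions)
            {
                // Gather the same column from four rows; element 0 is row 0.
                Vector128<float> x1 = Vector128.Create(mat[pm0 + col], mat[pm1 + col], mat[pm2 + col], mat[pm3 + col]);
                // Broadcast the source value for this column to all four lanes.
                Vector128<float> x2 = Vector128.Create(src[col]);
                acc = Sse.Add(acc, Sse.Multiply(x1, x2));
            }
            for (int i = 0; i < 4; i++)
                result[i] = acc.GetElement(i);
        }
        else
        {
            // Scalar fallback for hardware without SSE.
            foreach (int col in positions)
            {
                result[0] += mat[pm0 + col] * src[col];
                result[1] += mat[pm1 + col] * src[col];
                result[2] += mat[pm2 + col] * src[col];
                result[3] += mat[pm3 + col] * src[col];
            }
        }
        return result;
    }
}
```

Each iteration touches one column in four different rows, so the four gathered floats are `ccol` elements apart in memory. That is why the vector has to be assembled from individual scalar loads rather than one contiguous load.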
{
    int col1 = *ppos;
    int col2 = col1 + 4 * ccol;
    Vector256<float> x1 = Avx.SetVector256(pm3[col2], pm2[col2], pm1[col2], pm0[col2],
Don't we have a helper method for this?
No, it's different. The helper we have takes its indices from consecutive entries of an index array:
return Avx.SetVector256(src[idx[7]], src[idx[6]], src[idx[5]], src[idx[4]], src[idx[3]], src[idx[2]], src[idx[1]], src[idx[0]]);
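The difference between the two access patterns can be sketched in scalar form. The names below are hypothetical and the matrix is assumed to be row-major. `GatherIndexed` mirrors the shape of the existing helper quoted above, while `GatherColumn` mirrors the new code, which reads one column from eight consecutive rows (`col1` for rows 0 to 3 and `col2 = col1 + 4 * ccol` for rows 4 to 7).

```csharp
using System;

static class GatherPatternSketch
{
    // Indexed gather, as in the existing helper: element i is src[idx[i]].
    public static float[] GatherIndexed(float[] src, int[] idx)
    {
        var result = new float[8];
        for (int i = 0; i < 8; i++)
            result[i] = src[idx[i]];
        return result;
    }

    // Column gather, as in the new code: element i is the value at
    // row (firstRow + i), column col, of a row-major matrix with ccol columns.
    public static float[] GatherColumn(float[] mat, int ccol, int firstRow, int col)
    {
        var result = new float[8];
        for (int i = 0; i < 8; i++)
            result[i] = mat[(firstRow + i) * ccol + col];
        return result;
    }
}
```

The column gather could be routed through the indexed helper by first filling an index array with `(firstRow + i) * ccol + col`, but that adds an extra array write and read per column, which is presumably why the helper is not reused here.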
… place, span overloads and function for common code
Before
After
As the matrix becomes denser, the new algorithm becomes faster. cc @danmosemsft @eerhardt @tannergooding
@tannergooding I have resolved the conflicts and addressed the feedback.
I have restarted the queue and the new build passed successfully.
Overall LGTM.
Working towards #1018