Skip to content

Conversation

Anipik
Copy link
Contributor

@Anipik Anipik commented Oct 16, 2018

Working towards #1018

@Anipik
Copy link
Contributor Author

Anipik commented Oct 17, 2018

@tannergooding @eerhardt can you please take a look at this ?

while (ppos < pposEnd)
{
int col = *ppos;
Vector128<float> x1 = Sse.SetVector128(pm3[col], pm2[col], pm1[col], pm0[col]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: I want to check the codegen of this and ensure that it is being emitted "optimally" (two loads and three unpack with two folded loads; rather than as four loads and three unpack).

@eerhardt
Copy link
Member

#endif

It looks like this whole block can be removed as well. It is no longer used.


Refers to: src/Native/CpuMathNative/Sse.cpp:79 in db669af. [](commit_id = db669af, deletion_comment = False)

eerhardt
eerhardt previously approved these changes Oct 18, 2018
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@eerhardt eerhardt dismissed their stale review October 18, 2018 17:24

Need to fix the CMake file

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Anipik
Copy link
Contributor Author

Anipik commented Oct 18, 2018

@tannergooding did u get the chance to look at the codegen for this ?

@tannergooding
Copy link
Member

No, not yet.

{
int col1 = *ppos;
int col2 = col1 + 4 * ccol;
Vector256<float> x1 = Avx.SetVector256(pm3[col2], pm2[col2], pm1[col2], pm0[col2],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a helper method for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its different , the one we have the indexs are continous
return Avx.SetVector256(src[idx[7]], src[idx[6]], src[idx[5]], src[idx[4]], src[idx[3]], src[idx[2]], src[idx[1]], src[idx[0]]);

@Anipik
Copy link
Contributor Author

Anipik commented Oct 22, 2018

Before
Method Avx Sse Native
MatMulPX(S = 10%) 95.28 us 114.00 us 91.8 us
MatMulPX (S = 20%) 146.2 us 181.2 us 131.3 us
After
Method Avx Sse Native
MatMulPX(S = 10%) 96.46 us 106.3 us 92.8 us
MatMulPX (S = 20%) 140 us 171.7 us 131.1 us

As the matrix becomes more dense , the new algorithm becomes faster

cc @danmosemsft @eerhardt @tannergooding

@Anipik
Copy link
Contributor Author

Anipik commented Oct 23, 2018

@tannergooding i have resolved the conflicts and addressed the feedback

@Anipik
Copy link
Contributor Author

Anipik commented Oct 23, 2018

I have restarted the queue and new build passed successfully

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

@Anipik Anipik merged commit 263a67b into dotnet:master Oct 24, 2018
@Anipik Anipik deleted the sparse2 branch October 24, 2018 21:14
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants