SSE optimizations for 5x5 convolution. #241

Merged
1 commit merged into torch:master on Jul 21, 2015

Conversation

zakattacktwitter
Contributor

This is an optimization fairly specific to Twitter's use of 5x5 convolutions, but it could be extended to support more sizes in the future.
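For readers landing here without opening the diff, here is a minimal sketch of the kind of SSE fast path being described: a float-only 5x5 valid-mode kernel pass (no kernel flip) that computes four output pixels per iteration using unaligned loads and a broadcast weight. The function name and memory layout are illustrative assumptions, not the PR's actual code.

```c
/* Minimal sketch of an SSE 5x5 fast path (valid mode, no kernel flip).
 * Illustrative only: conv2d_5x5_sse and its layout are assumptions,
 * not the code in this PR. */
#include <xmmintrin.h>  /* SSE intrinsics for 4-wide float ops */

static void conv2d_5x5_sse(float *out, const float *in, const float *kernel,
                           long ir, long ic)  /* input rows/cols */
{
  long oh = ir - 4, ow = ic - 4;  /* valid-mode output size */
  for (long y = 0; y < oh; y++) {
    long x = 0;
    for (; x + 4 <= ow; x += 4) {  /* 4 output pixels per SSE register */
      __m128 acc = _mm_setzero_ps();
      for (int ky = 0; ky < 5; ky++) {
        const float *row = in + (y + ky) * ic + x;
        for (int kx = 0; kx < 5; kx++) {
          __m128 w = _mm_set1_ps(kernel[ky * 5 + kx]); /* broadcast weight */
          __m128 v = _mm_loadu_ps(row + kx);           /* 4 neighbours    */
          acc = _mm_add_ps(acc, _mm_mul_ps(w, v));
        }
      }
      _mm_storeu_ps(out + y * ow + x, acc);
    }
    for (; x < ow; x++) {  /* scalar tail for widths not divisible by 4 */
      float s = 0.f;
      for (int ky = 0; ky < 5; ky++)
        for (int kx = 0; kx < 5; kx++)
          s += in[(y + ky) * ic + x + kx] * kernel[ky * 5 + kx];
      out[y * ow + x] = s;
    }
  }
}
```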

@zakattacktwitter
Contributor Author

Does anyone have any opinions on this? We also have a fast-path for 3x3 convolutions.

@soumith
Member

soumith commented May 28, 2015

No opinion; it looks good!
Would love to have 3x3 as well.
Thanks for putting it in the generic/simd folder. That makes it clear (unlike THVector.h).

@clementfarabet
Member

Great, let's add the 3x3 kernels in then.

@andresy
Member

andresy commented May 28, 2015

More or less the same comment as Soumith's: looks great, would love to have more like these. The simd directory is indeed a good idea. The only question is: do we want this in the core, or as a package (like simd ;))?

@dominikgrewe
Member

I think it's great too! Some tests would be good. And can we hook this up to Lua somehow?

I don't understand why the code is in the "generic" folder though if it's only for float.
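On the tests point: a bare-bones check in the spirit of what's being asked could just compare the SIMD path against a naive scalar reference on random data. This reuses the hypothetical conv2d_5x5_sse from the sketch above; it is not the project's actual test harness.

```c
/* Hypothetical correctness test: SSE fast path vs. naive reference.
 * Assumes the conv2d_5x5_sse sketch above is in scope. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
  enum { IR = 37, IC = 41, OH = IR - 4, OW = IC - 4 };
  static float in[IR * IC], k[25], ref[OH * OW], out[OH * OW];

  srand(42);
  for (int i = 0; i < IR * IC; i++) in[i] = rand() / (float)RAND_MAX - 0.5f;
  for (int i = 0; i < 25; i++)      k[i]  = rand() / (float)RAND_MAX - 0.5f;

  /* naive scalar reference */
  for (int y = 0; y < OH; y++)
    for (int x = 0; x < OW; x++) {
      float s = 0.f;
      for (int ky = 0; ky < 5; ky++)
        for (int kx = 0; kx < 5; kx++)
          s += in[(y + ky) * IC + x + kx] * k[ky * 5 + kx];
      ref[y * OW + x] = s;
    }

  conv2d_5x5_sse(out, in, k, IR, IC);

  for (int i = 0; i < OH * OW; i++)
    if (fabsf(out[i] - ref[i]) > 1e-4f) {
      printf("mismatch at %d: %f vs %f\n", i, out[i], ref[i]);
      return 1;
    }
  printf("SSE 5x5 path matches reference\n");
  return 0;
}
```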

@soumith
Member

soumith commented May 28, 2015

@dominikgrewe it looks like it covers both Float and Double.

@soumith
Member

soumith commented May 28, 2015

Or extendable to be so, because the SSE instructions are macro-templated as well.

@clementfarabet
Member

The macros are actually Float-only, that's right. I guess we put them in generic because they're loaded by THTensorConv, and we probably want to have a Double version at some point.

Also, we only did 3x3 and 5x5 because those are the only two kernel sizes we use for everything.
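To make the Float/Double discussion concrete, here is one way macro templating can stamp out the same kernel body per element type. TH's actual scheme re-includes files under generic/ with real/Real defined; this condenses that idea into a single illustrative macro, and all names here are hypothetical.

```c
/* Illustrative macro templating: one scalar kernel body, instantiated per
 * element type. The float instantiation is where an SSE specialization
 * (like this PR's) would be dispatched instead. */
#define DEFINE_CONV5X5(real, suffix)                                      \
  static void conv5x5_##suffix(real *out, const real *in, const real *k, \
                               long ir, long ic)                         \
  {                                                                      \
    long oh = ir - 4, ow = ic - 4;                                       \
    for (long y = 0; y < oh; y++)                                        \
      for (long x = 0; x < ow; x++) {                                    \
        real s = 0;                                                      \
        for (int ky = 0; ky < 5; ky++)                                   \
          for (int kx = 0; kx < 5; kx++)                                 \
            s += in[(y + ky) * ic + x + kx] * k[ky * 5 + kx];            \
        out[y * ow + x] = s;                                             \
      }                                                                  \
  }

DEFINE_CONV5X5(float, float)    /* Float version; swap in the SSE path here */
DEFINE_CONV5X5(double, double)  /* the Double version discussed above */
```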

@soumith
Member

soumith commented Jun 14, 2015

I'll give the final merge call on this to @andresy; this will establish our directory and file structure for pushing SIMD optimizations going forward, so it needs a bit of thought.

soumith added a commit that referenced this pull request on Jul 21, 2015: SSE optimizations for 5x5 convolution.
@soumith merged commit fdb3478 into torch:master on Jul 21, 2015
@soumith
Member

soumith commented Jul 21, 2015

These aren't hooked into the Lua side yet, it seems... coming in a later PR?

@soumith mentioned this pull request on Aug 12, 2015
colesbury pushed a commit to colesbury/torch7 that referenced this pull request on Nov 3, 2016