SSE optimizations for 5x5 convolution. #241
Conversation
Does anyone have any opinions on this? We also have a fast-path for 3x3 convolutions.
no opinion. it looks good!
Great, let's add the 3x3 kernels in then.
More or less the same comment as Soumith's: looks great, would love to have more like those. The simd directory is indeed a good idea. The only question is: do we want this in the core, or as a package (like simd ;))?
I think it's great too! Some tests would be good. And can we hook this up to Lua somehow? I don't understand why the code is in the "generic" folder though if it's only for float.
@dominikgrewe it looks like both Float and Double.
Or extendable to be so, because the SSE instructions are macro-templated as well.
The macros are Float-only actually, that's right. I guess we put them in generic because they're loaded by THTensorConv, and we probably want to have a double version at some point. Also, we only did 3x3 and 5x5 because those are the only two kernel sizes we use for everything.
I'll give the final merge call on this to @andresy; this will establish our directory and file structure for pushing SIMD optimizations going forward, so it needs a bit of thought.
These aren't hooked into the Lua side yet, it seems... coming in a later PR?
This is a pretty specific optimization for Twitter's usage of 5x5 kernels, but it could be extended to support more sizes in the future.