SSE optimizations for 5x5 convolution. #241
Conversation
Does anyone have any opinions on this? We also have a fast-path for 3x3 convolutions.
no opinion. it looks good!
Great, let's add the 3x3 kernels in then.
More or less the same comment as Soumith's: looks great, would love to have more like those. The simd directory is indeed a good idea. The only question is: do we want this in the core, or as a package (like simd ;))?
I think it's great too! Some tests would be good. And can we hook this up to Lua somehow? I don't understand why the code is in the "generic" folder though if it's only for float.
@dominikgrewe it looks like both Float and Double.
Or extendable to be so, because the SSE instructions are macro-templated as well.
The macros are Float-only actually, that's right. I guess we put them in generic because they're loaded by THTensorConv, and we probably want to have a double version at some point. Also, we only did 3x3 and 5x5 because those are the only two kernel sizes we use for everything.
I'll give the final merge call on this to @andresy; this will establish our directory and file structure for pushing SIMD optimizations going forward, so it needs a bit of thought.
These aren't hooked into the Lua side yet, it seems... coming in a later PR?
This is a pretty specific optimization for Twitter's usage of 5x5 kernels, but it could be extended to support more sizes in the future.