swap axis for optimization in Tensor3dCopy() #1

freewym · 2015-10-24T20:12:34Z

No description provided.

danpovey · 2015-10-24T22:06:26Z

Thanks-- could you please test whether there is any effect on speed, using the
command line I showed you in the log? If the starting .mdl doesn't exist,
you can just take the 100.mdl or some other model that exists.
Remember to qlogin first.
Dan

On Sat, Oct 24, 2015 at 4:12 PM, Yiming Wang wrote:

You can view, comment on, or merge this pull request online at:

#1
Commit Summary

swap axis for optimization in Tensor3dCopy()

File Changes

M src/ctc/cctc-tombstone.cc
https://github.com/danpovey/kaldi/pull/1/files#diff-0 (37)

Patch Links:

https://github.com/danpovey/kaldi/pull/1.patch

https://github.com/danpovey/kaldi/pull/1.diff


danpovey · 2015-10-24T22:06:48Z

... also the function is getting a little long-- it might be better to
declare a separate function to do the rearrangement, and call it.


freewym · 2015-10-25T04:19:32Z

Pushed the new commit. The speedup on Tensor3dCopy does not seem significant: 8.79502s before vs 8.62674s after, a reduction of about 1.9%, as shown below. The first three significant digits of both timings stay the same over multiple runs (8.79 and 8.62 respectively). Printing out more information shows that every time the swap happens, exactly one ystride (src or dst, but not both) is 1, and x is always swapped with y.

Before optimization:
[cudevice profile]
SymAddMat2 0.484567s
CuMatrix::Resize 0.486771s
MulElements 0.48813s
ApplyLogSoftMaxPerRow 0.552064s
AddMat 0.688837s
CuMatrix::SetZero 0.866473s
AddDiagMatMat 1.04785s
CuMatrixBase::CopyFromMat(from other CuMatrixBase) 1.21041s
AddRows 1.56373s
ApplyHeaviside 2.19864s
CopyRows 4.46078s
Tensor3dCopy 8.79502s
AlphaGeneralFrame 17.1941s
BetaGeneralFrame 18.1611s
AddMatMat 55.2607s
Total GPU time: 116.411s (may involve some double-counting)

After optimization:
[cudevice profile]
SymAddMat2 0.476551s
MulElements 0.489167s
CuMatrix::Resize 0.491286s
ApplyLogSoftMaxPerRow 0.553362s
AddMat 0.687893s
CuMatrix::SetZero 0.86763s
AddDiagMatMat 1.04372s
CuMatrixBase::CopyFromMat(from other CuMatrixBase) 1.21509s
AddRows 1.56267s
ApplyHeaviside 2.20029s
CopyRows 4.46238s
Tensor3dCopy 8.62674s
AlphaGeneralFrame 17.2041s
BetaGeneralFrame 18.174s
AddMatMat 55.1232s
Total GPU time: 116.099s (may involve some double-counting)
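
The swap described in the comment above can be illustrated with a small CPU sketch. This is not the code from the pull request: the `CopyPlan` struct, the `MaybeSwapXY` and `StridedCopy` names, and the exact swap condition are assumptions made for illustration. The idea shown is that when neither x stride is 1 but one of the y strides is, exchanging the roles of x and y makes the unit-stride axis the fastest-varying one, without changing which elements get copied.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical description of one strided 3-D copy; not a Kaldi type.
struct CopyPlan {
  int32_t xdim, ydim, zdim;
  int32_t src_xstride, src_ystride, src_zstride;
  int32_t dst_xstride, dst_ystride, dst_zstride;
};

// Assumed heuristic: if no x stride is 1 but some y stride is, exchange the
// roles of x and y so the unit-stride axis becomes the fastest-varying one.
// Returns true if a swap happened.
static bool MaybeSwapXY(CopyPlan *p) {
  assert(p != nullptr);
  bool x_unit = (p->src_xstride == 1 || p->dst_xstride == 1);
  bool y_unit = (p->src_ystride == 1 || p->dst_ystride == 1);
  if (x_unit || !y_unit) return false;
  std::swap(p->xdim, p->ydim);
  std::swap(p->src_xstride, p->src_ystride);
  std::swap(p->dst_xstride, p->dst_ystride);
  return true;
}

// Reference CPU copy. x is the innermost loop, standing in for the
// fastest-varying thread index of a GPU kernel.
static void StridedCopy(const CopyPlan &p, const float *src, float *dst) {
  for (int32_t z = 0; z < p.zdim; z++)
    for (int32_t y = 0; y < p.ydim; y++)
      for (int32_t x = 0; x < p.xdim; x++)
        dst[x * p.dst_xstride + y * p.dst_ystride + z * p.dst_zstride] =
            src[x * p.src_xstride + y * p.src_ystride + z * p.src_zstride];
}
```

Because the swap only relabels axes, copying with the original plan and with the swapped plan writes the same values to the same destination offsets; only the traversal order differs, which is what could affect memory-access efficiency on a GPU.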

danpovey · 2015-10-25T04:31:36Z

OK, thanks. I'll look at it and maybe merge it tomorrow.
Meanwhile, go through the tutorial at OpenFst.org. You may learn enough to
be able to do one of the other recent 'issues' I put on GitHub.
Dan


danpovey · 2015-10-26T20:28:53Z

src/ctc/cctc-tombstone.cc

@@ -24,11 +24,45 @@
 namespace kaldi {
 namespace ctc {

+void SwapDimsForX(int32& xdim, int32& ydim, int32& zdim,


Sorry, it's against the Google style guide to use non-const references in function parameters. These should be pointers.

Declared static, and moved the comments from the header file to the .cc file.

danpovey · 2015-10-26T21:29:18Z

src/ctc/cctc-tombstone.cc

@@ -24,11 +24,45 @@
 namespace kaldi {
 namespace ctc {



There should be a comment here briefly explaining what the function does; and since this function is not exported, it's good practice to declare it 'static'.
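
The two review points above (pointer parameters instead of non-const references, and a commented `static` helper) might look roughly like the sketch below. The helper name `SwapDimAndStride` and its parameter list are hypothetical; the full signature of `SwapDimsForX` is cut off in the diff excerpt and is not reproduced here. The `int32` typedef is only a stand-in for the type used in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

typedef int32_t int32;  // stand-in for the int32 type that appears in the diff

// Exchanges two (dimension, stride) pairs in place.
// Output arguments are pointers rather than non-const references, so every
// call site shows '&' on each argument that may be modified.
// Declared static because it is used only inside this .cc file.
static void SwapDimAndStride(int32 *dim_a, int32 *stride_a,
                             int32 *dim_b, int32 *stride_b) {
  assert(dim_a != nullptr && stride_a != nullptr &&
         dim_b != nullptr && stride_b != nullptr);
  std::swap(*dim_a, *dim_b);
  std::swap(*stride_a, *stride_b);
}
```

A call then reads `SwapDimAndStride(&xdim, &xstride, &ydim, &ystride);`, which makes it visible at the call site that all four variables can change.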


freewym force-pushed the opt branch from 2bb74b1 to 9bb3382 Compare October 25, 2015 03:43

danpovey reviewed Oct 26, 2015

freewym force-pushed the opt branch from 9bb3382 to 6a0d643 Compare October 26, 2015 20:48

danpovey reviewed Oct 26, 2015

swap axis for optimization in Tensor3dCopy()

c831186

freewym force-pushed the opt branch from 6a0d643 to c831186 Compare October 26, 2015 22:19

danpovey added a commit that referenced this pull request Oct 26, 2015

Merge pull request #1 from freewym/opt

49f91ab

swap axis for optimization in Tensor3dCopy()

danpovey merged commit 49f91ab into danpovey:tombstone Oct 26, 2015

danpovey pushed a commit that referenced this pull request Feb 23, 2016

Merge pull request #1 from david-ryan-snyder/xvector

33206a0

Xvector

danpovey pushed a commit that referenced this pull request Feb 23, 2016

Merge pull request #1 from pegahgh/xvector-for-trunk

1f943b0

Xvector for trunk

danpovey pushed a commit that referenced this pull request May 2, 2016

Merge pull request #1 from jcsilva/ai-master

bb500d0

Add ivector support to online nnet3 decoder

danpovey pushed a commit that referenced this pull request May 2, 2016

Merge pull request #1 from ialmajai/ialmajai-patch-1

31cd3cd

Update run_lstm.sh

danpovey pushed a commit that referenced this pull request Nov 7, 2019

Merge pull request #1 from kaldi-asr/master

240f0e4

from master

csukuangfj pushed a commit to csukuangfj/kaldi that referenced this pull request Jan 4, 2021

Merge pull request danpovey#1 from csukuangfj/fangjun-deterministic-fst

4822927

Wrap fstext/deterministic-fst.h
