Xvector Egs, Etc #8

david-ryan-snyder · 2016-02-17T22:06:05Z

For nnet3-xvector-get-egs I'm assuming that we don't need to worry about left or right context as we do in other binaries.

Also included in this pull request are a few improvements to the xvector objf/deriv code, such as fixing numerical overflow issues, typos, and mistakes in the comments.

danpovey · 2016-02-17T22:20:49Z

Thanks!
Merging.
Please try to finish the get-egs script, including shuffling of egs.
Add the --max-jobs-run $nj option to the $cmd when shuffling to avoid overwhelming the disk (it will have $num_train_archives jobs).
For extracting the examples for the training subset and validation, you'll have to add an extra option to the python script to make the chunk-sizes deterministic rather than random (note: there will typically be one job, and --num-archives=3). The left and right chunk sizes will be identical, and they will range from min-chunk-size to max-chunk-size in a geometric pattern as you go from the first to the last archive.
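A minimal sketch of the deterministic, geometric chunk-size pattern described above. The function name and its arguments are hypothetical (they are not taken from the actual python script), and returning the minimum size when there is only one archive is an arbitrary assumption.

```python
def deterministic_chunk_sizes(min_chunk_size, max_chunk_size, num_archives):
    """Return one chunk size per archive, spaced geometrically.

    Hypothetical helper: the first archive gets min_chunk_size, the last
    gets max_chunk_size, and consecutive sizes differ by a constant ratio.
    """
    if num_archives < 1:
        raise ValueError("num_archives must be at least 1")
    if not 0 < min_chunk_size <= max_chunk_size:
        raise ValueError("need 0 < min_chunk_size <= max_chunk_size")
    if num_archives == 1:
        # Assumption: with a single archive, fall back to the minimum size.
        return [min_chunk_size]
    # Constant ratio between the chunk sizes of consecutive archives.
    ratio = (max_chunk_size / min_chunk_size) ** (1.0 / (num_archives - 1))
    return [int(round(min_chunk_size * ratio ** i)) for i in range(num_archives)]


# Example with --num-archives=3; left and right chunk sizes are identical.
for archive, size in enumerate(deterministic_chunk_sizes(100, 400, 3), start=1):
    left_chunk_size = right_chunk_size = size
    print(archive, left_chunk_size, right_chunk_size)
```

With a minimum of 100, a maximum of 400 and three archives, the sizes come out as 100, 200 and 400.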
For the 'archive-chunk-sizes' file you may have to add some kind of way of specifying a filename suffix or pattern so that we can get separate versions of that file for the training-subset and validation-subset egs.
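One possible way to get separate versions of that file, sketched under assumptions: the helper name, the dot-separated suffix convention, and the example suffixes are all hypothetical, and only the base name 'archive-chunk-sizes' comes from the discussion.

```python
import os


def archive_chunk_sizes_path(egs_dir, suffix=""):
    """Build the path of the 'archive-chunk-sizes' file for one set of egs.

    Hypothetical helper: an empty suffix gives the file for the main
    training egs; a non-empty suffix is appended after a dot.
    """
    name = "archive-chunk-sizes"
    if suffix:
        name += "." + suffix
    return os.path.join(egs_dir, name)


# Hypothetical suffixes for the training-subset and validation-subset egs.
for suffix in ("", "train_subset", "valid"):
    print(archive_chunk_sizes_path("egs", suffix))
```

Each call of the script would then take its own suffix, so the three invocations never overwrite one another's file.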

Dan

david-ryan-snyder · 2016-02-17T22:25:49Z

> Please try to finish the get-egs script, including shuffling of egs.

Will do.

> For the 'archive-chunk-sizes' file you may have to add some kind of way of specifying a filename suffix or pattern so that we can get separate versions of that file for the training-subset and validation-subset egs.

If I understand this correctly, we plan on using the same utterances in train and validation (but of course, different cuts)? Edit: Never mind, I misread that.

danpovey · 2016-02-17T22:48:18Z

No, for validation the utterances are a held-out set; see the get_egs.sh script, which creates that subset.
You'd call that python script two more times: once for the training subset and once for validation. Those two calls use a different, smaller number of frames-per-archive; again, that's drafted in the script.

Dan

david-ryan-snyder and others added 5 commits February 15, 2016 18:31

xvector: Fixing some potential numerical overflow issues. Fixing variable names.

513c1c6

xvector: Adding binary nnet3-xvector-get-egs for getting examples for xvector training

6b6a76a

xvector: Fixing a few typos

2b22bcb

xvector: improving usage message for nnet3-xvector-get-egs

c69f7ed

xvector: removing unnecessary includes

66eb517

danpovey added a commit that referenced this pull request Feb 18, 2016

Merge pull request #8 from david-ryan-snyder/chain

cb4635c

Xvector Egs, Etc

danpovey merged commit cb4635c into danpovey:xvector Feb 18, 2016

danpovey pushed a commit that referenced this pull request Nov 7, 2019

Merge pull request #8 from kaldi-asr/master

6d5e966

merge
