Incremental determinization [cleaned up/rewrite] #3737

Merged
merged 73 commits into from
Dec 2, 2019
240f0e4
Merge pull request #1 from kaldi-asr/master
chenzhehuai Jan 26, 2018
60f2bcf
Merge pull request #2 from kaldi-asr/master
chenzhehuai Feb 6, 2018
c7eb4c5
Merge pull request #4 from kaldi-asr/master
chenzhehuai Mar 14, 2018
25706e7
Merge pull request #5 from kaldi-asr/master
chenzhehuai Mar 22, 2018
ac49815
Merge pull request #6 from kaldi-asr/master
chenzhehuai Mar 23, 2018
6d5e966
Merge pull request #8 from kaldi-asr/master
chenzhehuai Mar 29, 2018
751d8bf
Merge pull request #10 from kaldi-asr/master
chenzhehuai Apr 3, 2018
d8ff7ee
make fst templates inline to eliminate linking errors in other places
chenzhehuai Apr 3, 2018
ada7ea7
Merge pull request #17 from kaldi-asr/master
chenzhehuai Aug 3, 2018
6f366c1
Merge pull request #27 from kaldi-asr/master
chenzhehuai Mar 25, 2019
1f50f06
WIP
Mar 26, 2019
be6fba2
worse wer & ower
Mar 27, 2019
6f92369
clean code
Mar 27, 2019
8080697
this commit is for sanity check
Mar 28, 2019
b302f12
code clean
Mar 28, 2019
7c0f7d7
each time we determinize the piece of lattice, instead of going all …
Mar 29, 2019
bb4e68f
bug fix:
Mar 30, 2019
86595f9
test in libri speech
Mar 30, 2019
080e5b4
clean; without class
Mar 30, 2019
d8907a4
add class LatticeIncrementalDeterminizer
Mar 31, 2019
228d8a2
add config_.determinize_max_active & redeterminize=false
Apr 1, 2019
c350305
update best config; add re-determinization from frame 0 if AppendLatt…
Apr 1, 2019
3844389
1. add time profiling for baseline lattice-faster-decoder for compari…
Apr 2, 2019
90f3ea7
code refine
Apr 2, 2019
9111173
update final weight by extra_cost-alpha, see sheet "ver 3"
Apr 9, 2019
6af8f62
WIP
Apr 10, 2019
40cf7ff
fix bugs and add sanity check
Apr 10, 2019
c5f0a8e
enable det
Apr 10, 2019
467abd8
clean code
Apr 10, 2019
612d398
[experimental] new det algorithm (#31)
chenzhehuai Apr 29, 2019
92ce13c
adding redet frames
May 1, 2019
b4ed30c
add eps removal; 1oco
May 2, 2019
5651200
bug fix when --epsilon-removal=1 --redeterminize-max-frames=10
May 5, 2019
7401fe4
code refine
May 5, 2019
e5cef12
bug fix
May 9, 2019
ecae786
code refine
May 13, 2019
35a7abc
We need to be careful about the case where the start state of the
May 24, 2019
39d4181
code refine
May 30, 2019
8e1648d
Do the following modification. Results can be referred to sheet "ver …
May 30, 2019
9a0873e
make terms consistent with the paper
Jun 11, 2019
4448c1f
code refine according to Hainan's comments
Jun 20, 2019
0a4c9bb
add final-prune-after-determinize
Jul 2, 2019
b4416f5
more comments
Jul 3, 2019
a624b3e
bug fix
Jul 30, 2019
6438a3b
refine
Aug 3, 2019
15cdab7
add online decoder
Aug 5, 2019
9566370
refine
Sep 6, 2019
10a597a
refine
Oct 1, 2019
c8d188f
Merge branch 'incre_det' of https://github.com/chenzhehuai/kaldi into…
danpovey Nov 6, 2019
b079980
Some initial work on rewriting incremental determinization
danpovey Nov 7, 2019
a3fb8ce
Further cleanup
danpovey Nov 7, 2019
629c449
Some intermediate work on incremental-decoder rewrite
danpovey Nov 9, 2019
5f27eb9
Storing some intermediate work
danpovey Nov 9, 2019
86a6bc1
Some more cleanup, working on making it compile
danpovey Nov 11, 2019
a182fe3
Got decoder directory to compile
danpovey Nov 11, 2019
780343c
Fix some compilation issues
danpovey Nov 14, 2019
6836282
Some code simplification in incremental determinization
danpovey Nov 16, 2019
b0cd600
Merge remote-tracking branch 'upstream/master' into chenzhehuai-incre…
danpovey Nov 16, 2019
4faf9bc
Simplify interface in lattice determinization
danpovey Nov 16, 2019
c8ef5ff
Fix compilation errors in incremental decoding
danpovey Nov 16, 2019
8b6a55e
Merge remote-tracking branch 'upstream/master' into incr_det
danpovey Nov 16, 2019
3590656
Some progress on fixing runtime errors
danpovey Nov 16, 2019
075c915
Further progress towards working version
danpovey Nov 17, 2019
017e395
[src] Various fixes in lattice determinization
danpovey Nov 18, 2019
95c7771
[src] Fix to incr-det code
danpovey Nov 18, 2019
19f062c
[src] Code refactor in incr-det; fix assert in CompactLatticeShortest…
danpovey Nov 18, 2019
3ef5544
[src] incr-decoder refactor
danpovey Nov 18, 2019
aaa2484
[src] Further fix to CompactLatticeShortestPath
danpovey Nov 18, 2019
9104f58
[src] Hopefully fix issue with start state
danpovey Nov 18, 2019
6111874
[src] Refactor/cleanup incr-det code
danpovey Nov 18, 2019
66c462f
[src] Bug-fix in incremental decoder
danpovey Nov 22, 2019
c198620
Fix bug in decoder wrapper
danpovey Nov 23, 2019
cfd1e05
Merge master
danpovey Nov 26, 2019
2 changes: 1 addition & 1 deletion src/bin/Makefile
@@ -22,7 +22,7 @@ BINFILES = align-equal align-equal-compiled acc-tree-stats \
matrix-sum build-pfile-from-ali get-post-on-ali tree-info am-info \
vector-sum matrix-sum-rows est-pca sum-lda-accs sum-mllt-accs \
transform-vec align-text matrix-dim post-to-smat compile-graph \
- compare-int-vector compute-gop
+ compare-int-vector latgen-incremental-mapped compute-gop


OBJFILES =
183 changes: 183 additions & 0 deletions src/bin/latgen-incremental-mapped.cc
@@ -0,0 +1,183 @@
// bin/latgen-incremental-mapped.cc

// Copyright 2019 Zhehuai Chen

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#include "base/kaldi-common.h"
#include "util/common-utils.h"
#include "tree/context-dep.h"
#include "hmm/transition-model.h"
#include "fstext/fstext-lib.h"
#include "decoder/decoder-wrappers.h"
#include "decoder/decodable-matrix.h"
#include "base/timer.h"

int main(int argc, char *argv[]) {
try {
using namespace kaldi;
typedef kaldi::int32 int32;
using fst::SymbolTable;
using fst::Fst;
using fst::StdArc;

const char *usage =
"Generate lattices, reading log-likelihoods as matrices\n"
" (model is needed only for the integer mappings in its transition-model)\n"
"The lattice determinization algorithm here can operate\n"
"incrementally.\n"
"Usage: latgen-incremental-mapped [options] trans-model-in "
"(fst-in|fsts-rspecifier) loglikes-rspecifier"
" lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ]\n";
ParseOptions po(usage);
Timer timer;
bool allow_partial = false;
BaseFloat acoustic_scale = 0.1;
LatticeIncrementalDecoderConfig config;

std::string word_syms_filename;
config.Register(&po);
po.Register("acoustic-scale", &acoustic_scale,
"Scaling factor for acoustic likelihoods");

po.Register("word-symbol-table", &word_syms_filename,
"Symbol table for words [for debug output]");
po.Register("allow-partial", &allow_partial,
"If true, produce output even if end state was not reached.");

po.Read(argc, argv);

if (po.NumArgs() < 4 || po.NumArgs() > 6) {
po.PrintUsage();
exit(1);
}

std::string model_in_filename = po.GetArg(1), fst_in_str = po.GetArg(2),
feature_rspecifier = po.GetArg(3), lattice_wspecifier = po.GetArg(4),
words_wspecifier = po.GetOptArg(5),
alignment_wspecifier = po.GetOptArg(6);

TransitionModel trans_model;
ReadKaldiObject(model_in_filename, &trans_model);

bool determinize = true;
CompactLatticeWriter compact_lattice_writer;
LatticeWriter lattice_writer;
if (!(determinize ? compact_lattice_writer.Open(lattice_wspecifier)
: lattice_writer.Open(lattice_wspecifier)))
KALDI_ERR << "Could not open table for writing lattices: "
<< lattice_wspecifier;

Int32VectorWriter words_writer(words_wspecifier);

Int32VectorWriter alignment_writer(alignment_wspecifier);

fst::SymbolTable *word_syms = NULL;
if (word_syms_filename != "")
if (!(word_syms = fst::SymbolTable::ReadText(word_syms_filename)))
KALDI_ERR << "Could not read symbol table from file " << word_syms_filename;

double tot_like = 0.0;
kaldi::int64 frame_count = 0;
int num_success = 0, num_fail = 0;

if (ClassifyRspecifier(fst_in_str, NULL, NULL) == kNoRspecifier) {
SequentialBaseFloatMatrixReader loglike_reader(feature_rspecifier);
// Input FST is just one FST, not a table of FSTs.
Fst<StdArc> *decode_fst = fst::ReadFstKaldiGeneric(fst_in_str);
timer.Reset();

{
LatticeIncrementalDecoder decoder(*decode_fst, trans_model, config);

for (; !loglike_reader.Done(); loglike_reader.Next()) {
std::string utt = loglike_reader.Key();
Matrix<BaseFloat> loglikes(loglike_reader.Value());
loglike_reader.FreeCurrent();
if (loglikes.NumRows() == 0) {
KALDI_WARN << "Zero-length utterance: " << utt;
num_fail++;
continue;
}

DecodableMatrixScaledMapped decodable(trans_model, loglikes,
acoustic_scale);

double like;
if (DecodeUtteranceLatticeIncremental(
decoder, decodable, trans_model, word_syms, utt, acoustic_scale,
determinize, allow_partial, &alignment_writer, &words_writer,
&compact_lattice_writer, &lattice_writer, &like)) {
tot_like += like;
frame_count += loglikes.NumRows();
num_success++;
} else {
num_fail++;
}
}
}
delete decode_fst; // delete this only after decoder goes out of scope.
} else { // We have different FSTs for different utterances.
SequentialTableReader<fst::VectorFstHolder> fst_reader(fst_in_str);
RandomAccessBaseFloatMatrixReader loglike_reader(feature_rspecifier);
for (; !fst_reader.Done(); fst_reader.Next()) {
std::string utt = fst_reader.Key();
if (!loglike_reader.HasKey(utt)) {
KALDI_WARN << "Not decoding utterance " << utt
<< " because no loglikes available.";
num_fail++;
continue;
}
const Matrix<BaseFloat> &loglikes = loglike_reader.Value(utt);
if (loglikes.NumRows() == 0) {
KALDI_WARN << "Zero-length utterance: " << utt;
num_fail++;
continue;
}
LatticeIncrementalDecoder decoder(fst_reader.Value(), trans_model, config);
DecodableMatrixScaledMapped decodable(trans_model, loglikes, acoustic_scale);
double like;
if (DecodeUtteranceLatticeIncremental(
decoder, decodable, trans_model, word_syms, utt, acoustic_scale,
determinize, allow_partial, &alignment_writer, &words_writer,
&compact_lattice_writer, &lattice_writer, &like)) {
tot_like += like;
frame_count += loglikes.NumRows();
num_success++;
} else {
num_fail++;
}
}
}

double elapsed = timer.Elapsed();
KALDI_LOG << "Time taken " << elapsed
<< "s: real-time factor assuming 100 frames/sec is "
<< (elapsed * 100.0 / frame_count);
KALDI_LOG << "Done " << num_success << " utterances, failed for " << num_fail;
KALDI_LOG << "Overall log-likelihood per frame is " << (tot_like / frame_count)
<< " over " << frame_count << " frames.";

delete word_syms;
if (num_success != 0)
return 0;
else
return 1;
} catch (const std::exception &e) {
std::cerr << e.what();
return -1;
}
}
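The summary lines at the end of `main` divide by `frame_count`, which stays at zero when every utterance fails, so the log would then show `inf` or `nan`. As a minimal sketch of the same real-time-factor arithmetic with that case handled explicitly, the helper below is hypothetical (it is not part of Kaldi or of this PR) and assumes the 100 frames/sec rate named in the log message:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of Kaldi: real-time factor of a decoding run.
// Equivalent to elapsed * frames_per_second / frame_count, which is the
// expression logged by latgen-incremental-mapped with frames_per_second = 100.
// Returns NaN when no frames were decoded so the caller can skip the log line.
inline double RealTimeFactor(double elapsed_seconds, std::int64_t frame_count,
                             double frames_per_second = 100.0) {
  if (frame_count <= 0)
    return std::numeric_limits<double>::quiet_NaN();
  // Duration of the decoded audio, in seconds.
  double audio_seconds = frame_count / frames_per_second;
  return elapsed_seconds / audio_seconds;
}
```

A caller could test the result with `std::isnan` before logging; a value below 1.0 means decoding ran faster than real time under the assumed frame rate.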
3 changes: 2 additions & 1 deletion src/decoder/Makefile
@@ -7,7 +7,8 @@ TESTFILES =

OBJFILES = training-graph-compiler.o lattice-simple-decoder.o lattice-faster-decoder.o \
lattice-faster-online-decoder.o simple-decoder.o faster-decoder.o \
- decoder-wrappers.o grammar-fst.o decodable-matrix.o
+ decoder-wrappers.o grammar-fst.o decodable-matrix.o \
+ lattice-incremental-decoder.o lattice-incremental-online-decoder.o

LIBNAME = kaldi-decoder

123 changes: 120 additions & 3 deletions src/decoder/decoder-wrappers.cc
@@ -68,7 +68,7 @@ void DecodeUtteranceLatticeFasterClass::operator () () {
success_ = true;
using fst::VectorFst;
if (!decoder_->Decode(decodable_)) {
- KALDI_WARN << "Failed to decode file " << utt_;
+ KALDI_WARN << "Failed to decode utterance with id " << utt_;
success_ = false;
}
if (!decoder_->ReachedFinal()) {
@@ -195,6 +195,92 @@ DecodeUtteranceLatticeFasterClass::~DecodeUtteranceLatticeFasterClass() {
delete decodable_;
}

template <typename FST>
bool DecodeUtteranceLatticeIncremental(
LatticeIncrementalDecoderTpl<FST> &decoder, // not const but is really an input.
DecodableInterface &decodable, // not const but is really an input.
const TransitionModel &trans_model,
const fst::SymbolTable *word_syms,
std::string utt,
double acoustic_scale,
bool determinize,
bool allow_partial,
Int32VectorWriter *alignment_writer,
Int32VectorWriter *words_writer,
CompactLatticeWriter *compact_lattice_writer,
LatticeWriter *lattice_writer,
double *like_ptr) { // puts utterance's like in like_ptr on success.
using fst::VectorFst;
if (!decoder.Decode(&decodable)) {
KALDI_WARN << "Failed to decode utterance with id " << utt;
return false;
}
if (!decoder.ReachedFinal()) {
if (allow_partial) {
KALDI_WARN << "Outputting partial output for utterance " << utt
<< " since no final-state reached\n";
} else {
KALDI_WARN << "Not producing output for utterance " << utt
<< " since no final-state reached and "
<< "--allow-partial=false.\n";
return false;
}
}

// Get lattice
CompactLattice clat = decoder.GetLattice(decoder.NumFramesDecoded(), true);
if (clat.NumStates() == 0)
KALDI_ERR << "Unexpected problem getting lattice for utterance " << utt;

double likelihood;
LatticeWeight weight;
int32 num_frames;
{ // First do some stuff with word-level traceback...
CompactLattice decoded_clat;
CompactLatticeShortestPath(clat, &decoded_clat);
Lattice decoded;
fst::ConvertLattice(decoded_clat, &decoded);

if (decoded.Start() == fst::kNoStateId)
// Shouldn't really reach this point as already checked success.
KALDI_ERR << "Failed to get traceback for utterance " << utt;

std::vector<int32> alignment;
std::vector<int32> words;
GetLinearSymbolSequence(decoded, &alignment, &words, &weight);
num_frames = alignment.size();
KALDI_ASSERT(num_frames == decoder.NumFramesDecoded());
if (words_writer->IsOpen())
words_writer->Write(utt, words);
if (alignment_writer->IsOpen())
alignment_writer->Write(utt, alignment);
if (word_syms != NULL) {
std::cerr << utt << ' ';
for (size_t i = 0; i < words.size(); i++) {
std::string s = word_syms->Find(words[i]);
if (s == "")
KALDI_ERR << "Word-id " << words[i] << " not in symbol table.";
std::cerr << s << ' ';
}
std::cerr << '\n';
}
likelihood = -(weight.Value1() + weight.Value2());
}

// We'll write the lattice without acoustic scaling.
if (acoustic_scale != 0.0)
fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &clat);
Connect(&clat);
compact_lattice_writer->Write(utt, clat);
KALDI_LOG << "Log-like per frame for utterance " << utt << " is "
<< (likelihood / num_frames) << " over "
<< num_frames << " frames.";
KALDI_VLOG(2) << "Cost for utterance " << utt << " is "
<< weight.Value1() << " + " << weight.Value2();
*like_ptr = likelihood;
return true;
}
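The function above reports the best path's log-likelihood as the negated sum of the two weight components and then undoes acoustic scaling before writing the lattice. The sketch below isolates that arithmetic. `PathCost` is a hypothetical stand-in for `LatticeWeight`, assuming the usual Kaldi convention that `Value1()` holds the graph cost and `Value2()` the acoustic cost; none of these helper names exist in Kaldi.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the two-component lattice weight.
// Assumption: graph_cost plays the role of Value1(), acoustic_cost of Value2().
// Both are costs, i.e. negated log-probabilities.
struct PathCost {
  double graph_cost;
  double acoustic_cost;
};

// Log-likelihood of a path: the negated sum of its costs, as in
// likelihood = -(weight.Value1() + weight.Value2()).
inline double PathLogLikelihood(const PathCost &w) {
  return -(w.graph_cost + w.acoustic_cost);
}

// Per-frame figure that the wrapper logs for each utterance.
inline double LogLikePerFrame(const PathCost &w, int num_frames) {
  return PathLogLikelihood(w) / num_frames;
}

// Undo acoustic scaling on the acoustic component only, leaving the graph
// cost untouched. A scale of zero is skipped, as in the wrapper.
inline PathCost UnscaleAcoustic(PathCost w, double acoustic_scale) {
  if (acoustic_scale != 0.0)
    w.acoustic_cost *= 1.0 / acoustic_scale;
  return w;
}
```

Note that in the wrapper the likelihood is computed from the still-scaled weights; only the lattice that gets written has its acoustic costs divided by `acoustic_scale`.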


// Takes care of output. Returns true on success.
template <typename FST>
@@ -215,7 +301,7 @@ bool DecodeUtteranceLatticeFaster(
using fst::VectorFst;

if (!decoder.Decode(&decodable)) {
- KALDI_WARN << "Failed to decode file " << utt;
+ KALDI_WARN << "Failed to decode utterance with id " << utt;
return false;
}
if (!decoder.ReachedFinal()) {
@@ -296,6 +382,37 @@ bool DecodeUtteranceLatticeFaster(
}

// Instantiate the template above for the two required FST types.
template bool DecodeUtteranceLatticeIncremental(
LatticeIncrementalDecoderTpl<fst::Fst<fst::StdArc> > &decoder,
DecodableInterface &decodable,
const TransitionModel &trans_model,
const fst::SymbolTable *word_syms,
std::string utt,
double acoustic_scale,
bool determinize,
bool allow_partial,
Int32VectorWriter *alignment_writer,
Int32VectorWriter *words_writer,
CompactLatticeWriter *compact_lattice_writer,
LatticeWriter *lattice_writer,
double *like_ptr);

template bool DecodeUtteranceLatticeIncremental(
LatticeIncrementalDecoderTpl<fst::GrammarFst> &decoder,
DecodableInterface &decodable,
const TransitionModel &trans_model,
const fst::SymbolTable *word_syms,
std::string utt,
double acoustic_scale,
bool determinize,
bool allow_partial,
Int32VectorWriter *alignment_writer,
Int32VectorWriter *words_writer,
CompactLatticeWriter *compact_lattice_writer,
LatticeWriter *lattice_writer,
double *like_ptr);
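The two `template bool DecodeUtteranceLatticeIncremental(...)` statements above are explicit instantiations: the template is declared in the header, defined in the `.cc` file, and instantiated there for each FST type that callers need, so other translation units can link against it without seeing the definition. A minimal sketch of that pattern follows; `PlainGraph`, `GrammarGraph` and `Describe` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical graph types standing in for fst::Fst<fst::StdArc> and
// fst::GrammarFst. They are illustrations only.
struct PlainGraph   { int NumStates() const { return 3; } };
struct GrammarGraph { int NumStates() const { return 5; } };

// What would live in the header: declaration only.
template <typename FST>
std::string Describe(const FST &graph, const std::string &utt);

// What would live in the .cc file: the definition.
template <typename FST>
std::string Describe(const FST &graph, const std::string &utt) {
  return utt + ":" + std::to_string(graph.NumStates());
}

// Explicit instantiations for the two supported graph types. The template
// argument is deduced from the parameter list, as in the diff above.
template std::string Describe(const PlainGraph &, const std::string &);
template std::string Describe(const GrammarGraph &, const std::string &);
```

With this layout, calling the template with any other type compiles against the header but fails at link time, which is the trade-off for keeping the definition out of the header.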


template bool DecodeUtteranceLatticeFaster(
LatticeFasterDecoderTpl<fst::Fst<fst::StdArc> > &decoder,
DecodableInterface &decodable,
@@ -345,7 +462,7 @@ bool DecodeUtteranceLatticeSimple(
using fst::VectorFst;

if (!decoder.Decode(&decodable)) {
- KALDI_WARN << "Failed to decode file " << utt;
+ KALDI_WARN << "Failed to decode utterance with id " << utt;
return false;
}
if (!decoder.ReachedFinal()) {
18 changes: 18 additions & 0 deletions src/decoder/decoder-wrappers.h
@@ -22,6 +22,7 @@

#include "itf/options-itf.h"
#include "decoder/lattice-faster-decoder.h"
#include "decoder/lattice-incremental-decoder.h"
#include "decoder/lattice-simple-decoder.h"

// This header contains declarations from various convenience functions that are called
@@ -88,6 +89,23 @@ void AlignUtteranceWrapper(
void ModifyGraphForCarefulAlignment(
fst::VectorFst<fst::StdArc> *fst);

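/// Decodes one utterance with the incremental-determinization lattice decoder
/// and takes care of output. Returns true on success, in which case the
/// utterance's log-likelihood is written to *like_ptr. The lattice is written
/// through compact_lattice_writer; the trans_model, determinize and
/// lattice_writer arguments are not used by the current definition in
/// decoder-wrappers.cc.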
template <typename FST>
bool DecodeUtteranceLatticeIncremental(
LatticeIncrementalDecoderTpl<FST> &decoder, // not const but is really an input.
DecodableInterface &decodable, // not const but is really an input.
const TransitionModel &trans_model,
const fst::SymbolTable *word_syms,
std::string utt,
double acoustic_scale,
bool determinize,
bool allow_partial,
Int32VectorWriter *alignment_writer,
Int32VectorWriter *words_writer,
CompactLatticeWriter *compact_lattice_writer,
LatticeWriter *lattice_writer,
double *like_ptr); // puts utterance's likelihood in like_ptr on success.


/// This function DecodeUtteranceLatticeFaster is used in several decoders, and
/// we have moved it here. Note: this is really "binary-level" code as it