[src] Incremental Lattice Determinization for Low-Latency WFST Decoder #3317
…the way to the currently-decoded frame, we go up to, say, frame t-10 (unless this is the end of the utterance), and, in the same way that we put in temporary initial-probs, we also put in temporary final-probs which reflect the on the states at frame t-10 (we remove them later on, of course).
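A minimal sketch of the frame bookkeeping in this commit message, assuming a fixed delay of 10 frames as in the example. `DeterminizeUpTo` and its parameters are hypothetical names, not part of the PR, and the sketch does not model the temporary initial- and final-probs themselves.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the PR): returns the last frame that the
// next incremental-determinization step should cover.
//   last_determinized: frame already covered by previous steps
//   current_frame:     frame the decoder has reached (t)
//   delay:             how far to stay behind the decoder (10 in the example)
int DeterminizeUpTo(int last_determinized, int current_frame, int delay,
                    bool end_of_utterance) {
  assert(delay >= 0 && current_frame >= 0);
  // Mid-utterance, stop at t - delay; at the end, go all the way to t.
  int target = end_of_utterance ? current_frame
                                : std::max(0, current_frame - delay);
  // Never move backwards: already-determinized frames stay determinized.
  return std::max(last_determinized, target);
}
```

For example, `DeterminizeUpTo(0, 57, 10, false)` gives 47, while at the end of the utterance `DeterminizeUpTo(47, 60, 10, true)` gives 60.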
1. In the determinized lattice, there could be multiple final arcs with the same state label. I need to change the logic here.
2. For the first chunk, there could be some final arcs starting from state 0, while for the last chunk, there could be some initial arcs ending in a final state. Hence, I found that we cannot distinguish final and initial arcs simply by "if (s == 0)" or "if (clat.Final(arc_appended.nextstate) != CompactLatticeWeight::Zero())".
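One way to avoid inferring an arc's role from "s == 0" or from the finality of its destination is to encode the role in the arc's label. The sketch below assumes such a scheme; the offsets, `ArcRole`, and the helper names are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labeling scheme: ordinary word labels stay below
// kStateLabelOffset; "initial arcs" carry kStateLabelOffset + (state of the
// previously appended lattice); "final arcs" carry kFinalLabelOffset + id.
constexpr int64_t kStateLabelOffset = 100000000;
constexpr int64_t kFinalLabelOffset = 200000000;

enum class ArcRole { kOrdinary, kInitial, kFinal };

int64_t MakeInitialLabel(int64_t prev_state) {
  assert(prev_state >= 0 && prev_state < kFinalLabelOffset - kStateLabelOffset);
  return kStateLabelOffset + prev_state;
}

int64_t MakeFinalLabel(int64_t boundary_id) {
  assert(boundary_id >= 0);
  return kFinalLabelOffset + boundary_id;
}

// The role is read from the label alone, independent of the arc's source
// state or of whether its destination state is final.
ArcRole RoleOf(int64_t label) {
  if (label >= kFinalLabelOffset) return ArcRole::kFinal;
  if (label >= kStateLabelOffset) return ArcRole::kInitial;
  return ArcRole::kOrdinary;
}

// Recovers the encoded state/boundary id; only valid for non-ordinary arcs.
int64_t DecodeId(int64_t label) {
  ArcRole role = RoleOf(label);
  assert(role != ArcRole::kOrdinary);
  return role == ArcRole::kFinal ? label - kFinalLabelOffset
                                 : label - kStateLabelOffset;
}
```

Because the role travels with the label, a final arc leaving state 0 in the first chunk, or an initial arc entering a final state in the last chunk, is still classified correctly, and several final arcs may share the same encoded id.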
+ grep -H Overall exp_dec/incre.fl.1f/base/ora.base exp_dec/incre.fl.1f/base/ora.den.base exp_dec/incre.fl.1f/incre/ora.base exp_dec/incre.fl.1f/incre/ora.den.base
exp_dec/incre.fl.1f/base/ora.base:LOG (lattice-oracle[5.5.276~4-6f366]:main():lattice-oracle.cc:383) Overall %WER 1.70591 [ 342 / 20048, 109 insertions, 22 deletions, 211 substitutions ]
exp_dec/incre.fl.1f/base/ora.den.base:LOG (lattice-depth[5.5.276~4-6f366]:main():lattice-depth.cc:79) Overall density is 25.1613 over 244027 frames.
exp_dec/incre.fl.1f/incre/ora.base:LOG (lattice-oracle[5.5.276~4-6f366]:main():lattice-oracle.cc:383) Overall %WER 1.80567 [ 362 / 20048, 108 insertions, 25 deletions, 229 substitutions ]
exp_dec/incre.fl.1f/incre/ora.den.base:LOG (lattice-depth[5.5.276~4-6f366]:main():lattice-depth.cc:79) Overall density is 28.1682 over 244027 frames.
+ grep -H WER exp_dec/incre.fl.1f/base/wer exp_dec/incre.fl.1f/incre/wer
exp_dec/incre.fl.1f/base/wer:%WER 12.57 [ 2532 / 20138, 305 ins, 287 del, 1940 sub ]
exp_dec/incre.fl.1f/incre/wer:%WER 12.57 [ 2532 / 20138, 305 ins, 287 del, 1940 sub ]
+ grep real exp_dec/incre.fl.1f/base/log/decode.1.log exp_dec/incre.fl.1f/incre/log/decode.1.log
exp_dec/incre.fl.1f/base/log/decode.1.log:LOG (latgen-faster-mapped[5.5.276~4-6f366]:main():latgen-faster-mapped.cc:164) Time taken 48.4324s: real-time factor assuming 100 frames/sec is 0.912442
exp_dec/incre.fl.1f/incre/log/decode.1.log:LOG (latgen-incremental-mapped[5.5.276~4-6f366]:main():latgen-incremental-mapped.cc:164) Time taken 54.6669s: real-time factor assuming 100 frames/sec is 1.0299
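In these logs both runs give the same 1-best %WER (12.57), while the incremental run has a higher oracle WER (1.81% vs 1.71%), a higher lattice density (28.17 vs 25.16) and a longer decoding time for job 1 (54.67 s vs 48.43 s). The small helper below, a sketch whose names are hypothetical, pulls a number out of such a LOG line and expresses the difference as a relative change.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Returns the number that immediately follows `key` in a Kaldi-style LOG line,
// e.g. key "Time taken " or "Overall density is ".
// Throws std::invalid_argument if the key is missing or no number follows it.
double NumberAfter(const std::string& line, const std::string& key) {
  std::size_t pos = line.find(key);
  if (pos == std::string::npos)
    throw std::invalid_argument("key not found: " + key);
  // std::stod stops at the first character that cannot be part of the number.
  return std::stod(line.substr(pos + key.size()));
}

// Relative change of `incremental` over `baseline`, in percent,
// rounded to one decimal place.
double RelativeChangePercent(double baseline, double incremental) {
  return std::round((incremental / baseline - 1.0) * 1000.0) / 10.0;
}
```

Applied to the numbers above, the incremental run takes roughly 13% more time (the same ratio shows up in the real-time factors, 1.0299 vs 0.912442) and produces lattices roughly 12% denser.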
…son (remove it later). 2. Add determinize-beam-offset. This way, the beam used in lattice determinization is (determinize_beam_offset + lattice_beam).
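A minimal sketch of how the determinization beam is derived from the two options named in this commit message. The struct is hypothetical, and the default values are placeholders rather than the PR's defaults.

```cpp
#include <cassert>

// Hypothetical options struct: the field names follow the commit message,
// the default values are placeholders only.
struct IncrementalDeterminizeOptions {
  float lattice_beam = 10.0f;
  float determinize_beam_offset = 0.0f;

  // Beam actually used for (pruned) lattice determinization of a chunk.
  float DeterminizeBeam() const {
    float beam = determinize_beam_offset + lattice_beam;
    assert(beam > 0.0f);  // a negative offset must not make the beam vanish
    return beam;
  }
};
```

With `lattice_beam = 6` and `determinize_beam_offset = 2`, chunks would be determinized with a beam of 8.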
The new algorithm determinizes again those states in the appended lattice that have final-arcs and also have non-final arcs leaving them.
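The selection step described in this commit message can be sketched as follows, assuming each arc of the appended lattice carries a flag saying whether it is a final-arc. `ChunkArc` and `StatesToRedeterminize` are hypothetical names; the sketch only finds the states and does not re-determinize them.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct ChunkArc {
  int src;
  int dst;
  bool is_final_arc;  // hypothetical flag: arc leads to the chunk's end boundary
};

// Returns the states that have at least one final-arc AND at least one
// non-final arc leaving them.
std::set<int> StatesToRedeterminize(const std::vector<ChunkArc>& arcs) {
  std::set<int> has_final, has_nonfinal;
  for (const ChunkArc& a : arcs)
    (a.is_final_arc ? has_final : has_nonfinal).insert(a.src);
  std::set<int> result;
  for (int s : has_final)
    if (has_nonfinal.count(s) > 0) result.insert(s);
  return result;
}
```

For arcs 0→1 (non-final), 1→2 (final), 1→3 (non-final) and 2→4 (final), only state 1 is selected: state 2 has only a final-arc leaving it.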
I think this is ready to merge.
A couple of small comments in the review... but there are a couple of slightly bigger issues (and I'm not sure whether to just merge this now or to wait). They are:
- IMO the right way to demonstrate the utility of this is to have a version of an online-decoding setup that uses it, since this is mostly useful in online scenarios. E.g. modify online2bin/online2-wav-nnet3-latgen-faster.cc -> online2bin/online2-wav-nnet3-latgen-incremental.cc and have some code in there that calls it the way a real user would call it, i.e. not by calling Decode(), which requires all the input to be available, but by calling AdvanceDecoding() periodically and getting the lattice (possibly with a NULL pointer if it's not needed).
- Someone needs to go over the comments with a fine-tooth comb. Some of them need to be reorganized or moved, and in general we need to make sure they are clear and consistent with how we defined things in the paper.
@hainan-xv do you have any time to do the comment-related part?
I am not sure whether @chenzhehuai still has time to do any work on this, i.e. whether I should ask him or you to do the coding part and the associated testing.
bool GetBestPath(Lattice *ofst, bool use_final_probs = true);

/**
The following function is specifically designed for incremental
It's good that you are making an effort to explain things, but we should separate the explanation of the interface and external behavior from the explanation of the algorithm. It may be better to just refer to the paper for the explanation of the algorithm.
@danpovey Do you mean having the following
BTW, Hainan says he will take some time to refine the comments.
Yes, that's what I mean.
Oh, and your build is failing.
@danpovey Done. Hainan, please review this version.
A few comments.
decoder.AdvanceDecoding();

if (do_endpointing && decoder.EndpointDetected(endpoint_opts)) {
I think there should be a place inside this loop where it keeps the lattice computation up to date, e.g. by calling GetLattice() with a NULL argument. Otherwise it's not doing the online stuff in a meaningful way.
For now I call GetLattice() inside AdvanceDecoding(). Do you think it would be better to move it here?
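A sketch of the calling pattern discussed in this thread: the application feeds audio in pieces, calls AdvanceDecoding() after each piece, and periodically calls GetLattice() with a null output pointer so that determinization keeps up. The decoder below is a hypothetical stub that only counts frames; its method signatures are assumptions, not Kaldi's LatticeIncrementalDecoder interface, and endpointing is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LatticeStub { int num_frames = 0; };  // hypothetical stand-in for a lattice

// Hypothetical stand-in for the decoder: it only tracks frame counts.
class StubIncrementalDecoder {
 public:
  void AcceptFrames(int n) { frames_available_ += n; }
  // Decodes every frame that is currently available.
  void AdvanceDecoding() { frames_decoded_ = frames_available_; }
  // Brings determinization up to date; fills *lat only if it is non-null.
  void GetLattice(bool /*use_final_probs*/, LatticeStub* lat) {
    frames_determinized_ = frames_decoded_;
    if (lat != nullptr) lat->num_frames = frames_determinized_;
  }
  int FramesDeterminized() const { return frames_determinized_; }

 private:
  int frames_available_ = 0, frames_decoded_ = 0, frames_determinized_ = 0;
};

// Online loop: decode after every piece of input, keep determinization
// current every `determinize_every` pieces, and request the actual lattice
// only once, at the end of the utterance.
LatticeStub DecodeOnline(const std::vector<int>& frames_per_piece,
                         int determinize_every) {
  assert(determinize_every > 0);
  StubIncrementalDecoder decoder;
  for (std::size_t i = 0; i < frames_per_piece.size(); ++i) {
    decoder.AcceptFrames(frames_per_piece[i]);
    decoder.AdvanceDecoding();
    if ((i + 1) % determinize_every == 0)
      decoder.GetLattice(false, nullptr);  // update only; no output needed yet
  }
  LatticeStub lat;
  decoder.GetLattice(true, &lat);  // end of utterance: use final-probs
  return lat;
}
```

Calling GetLattice() from the application loop, rather than inside AdvanceDecoding(), would let the caller choose how often to pay for determinization; which placement is preferable is exactly the question raised above.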
@chenzhehuai @hainan-xv I guess you have been super busy, but I don't think there is much to do here.
Added some comments. The PR looks very good overall. I'm talking to Zhehuai offline regarding some code not included in this review.
More comments.
}

template <typename FST, typename Token>
bool LatticeIncrementalDecoderTpl<FST, Token>::GetLattice(bool use_final_probs, |
I think we need a function in the interface that does the incremental-determinization work without actually outputting the lattice, because determinizer_.GetDeterminizedLattice() does extra work to produce the lattice, and we might sometimes want the incremental-determinization work done when we don't need the lattice.
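A sketch of the interface split suggested here: one call that only advances determinization, and a separate call that also builds the output and reuses whatever work has already been done. The class and method names are hypothetical, and "determinizing a chunk" and "building the output" are reduced to bookkeeping so the control flow can be tested.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class DeterminizerSketch {
 public:
  explicit DeterminizerSketch(int chunk_size) : chunk_size_(chunk_size) {
    assert(chunk_size > 0);
  }

  // Incremental-determinization work only: processes chunks up to `frame`
  // and keeps the result internally. A no-op if already up to date.
  void UpdateDeterminization(int frame) {
    while (determinized_up_to_ < frame) {
      int end = std::min(frame, determinized_up_to_ + chunk_size_);
      chunk_ends_.push_back(end);  // stands in for "determinize one chunk"
      ++num_chunks_determinized_;
      determinized_up_to_ = end;
    }
  }

  // Brings determinization up to date, then does the extra work of building
  // an output (here just a copy of the chunk end frames).
  std::vector<int> BuildOutputLattice(int frame) {
    UpdateDeterminization(frame);
    ++num_outputs_built_;
    return chunk_ends_;
  }

  int num_chunks_determinized() const { return num_chunks_determinized_; }
  int num_outputs_built() const { return num_outputs_built_; }

 private:
  int chunk_size_;
  int determinized_up_to_ = 0;
  int num_chunks_determinized_ = 0;
  int num_outputs_built_ = 0;
  std::vector<int> chunk_ends_;
};
```

Calling `UpdateDeterminization(120)` with a chunk size of 50 determinizes three chunks and builds no output; a later `BuildOutputLattice(120)` determinizes nothing new and only pays for the output.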
The work done in this branch is pretty helpful for one of our projects; thanks for working to add it to Kaldi. We've tried using this branch and it generally works well, but we've noticed frequent problems when there are relatively long (e.g. >15-20 seconds) portions of non-speech if determinize_max_active is set to a value >=6 (more often for higher values). We see this issue for several different models/LMs. In these non-speech regions the number of states in the chunk and in the lattice roughly doubles after every chunk. This can quickly make decoding many times slower than real time and consume large amounts of memory, and in many cases the decoder cannot fully decode an audio file. If it makes it through the silence region, it then proceeds as normal with a typical number of states in each new chunk, but the number of states in the lattice remains very large.

The paper only presents results for values of 25-150... how unexpected is it to require such a small value for some audio files? We can reproduce this behavior pretty reliably on TEDLIUM audio, where there are usually ~17 seconds of music and applause near the beginning of each audio file. Using larger chunk sizes also eliminates the problem, but the value required to work around this issue is file-specific, and larger values reduce the benefits of using incremental determinization.

The results below were generated from the audio file "https://s3.amazonaws.com/cobalt-release/perm/824594725093/AimeeMullins_2009P-middle10-first30-middle10.wav" (a TEDLIUM audio file chopped up a little so that the problematic music/applause part is in the middle of a 50-second file) when running the following command with different values for --determinize-period. The number of states in the chunk/lattice at the problematic area of the audio file is shown for each test case (other parameters are defaults).
--determinize-max-active=10 (finishes in 6 seconds for the 50-second file; often a value of 5 is needed to avoid slowdowns on some audio files)
--determinize-max-active=20 (finishes in 8.8 minutes for the 50-second file)
We see this behavior for all of the models we tried (it doesn't seem specific to a particular AM/LM).
Can you please look at branch #3737, which is supposed to replace this? Closing this PR.
Thanks, I'll try it out.
The original lattice determinization algorithm is always performed after we have generated the lattice of the whole utterance, because it consumes the whole-utterance lattice in order to retain only the best output-label sequence (of HMM states) for each input-label sequence (of words).
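To make the "best alignment per word sequence" property concrete, the sketch below computes it on an explicit list of paths. It illustrates only what the result of lattice determinization contains, not how the algorithm works: real determinization operates on the WFST without enumerating paths. All names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct LatticePath {
  std::vector<std::string> words;  // input-label sequence (words)
  std::vector<int> alignment;      // output-label sequence (e.g. HMM states)
  double cost;                     // total cost; lower is better
};

// For every distinct word sequence, keeps the single lowest-cost alignment.
std::map<std::vector<std::string>, LatticePath> BestPathPerWordSequence(
    const std::vector<LatticePath>& paths) {
  std::map<std::vector<std::string>, LatticePath> best;
  for (const LatticePath& p : paths) {
    auto it = best.find(p.words);
    if (it == best.end())
      best.emplace(p.words, p);
    else if (p.cost < it->second.cost)
      it->second = p;
  }
  return best;
}
```

Given two paths for the words "a b" with costs 5.0 and 3.0 and one path for "a", the result has two entries, and "a b" keeps the alignment of the cost-3.0 path.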
The motivation for incremental determinization is to spread the work of determinization out over time, which can be useful for online applications. The method is non-trivial because it determinizes the lattice chunk by chunk while still guaranteeing that, for any input-label sequence, the successful path through all the chunks is unique.
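One way to check this uniqueness property on a small example: if no state has an epsilon arc or two outgoing arcs with the same label, each label sequence has at most one path from the start state. The sketch below assumes label 0 means epsilon; `LabeledArc` and `IsDeterministic` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

struct LabeledArc { int src; int dst; int label; };  // label 0 = epsilon (assumed)

// Returns true if no state has an epsilon arc or two arcs with the same label.
// For such an acceptor, every label sequence has at most one path from the
// start state.
bool IsDeterministic(const std::vector<LabeledArc>& arcs) {
  std::set<std::pair<int, int>> seen;  // (state, label) pairs already used
  for (const LabeledArc& a : arcs) {
    if (a.label == 0) return false;
    if (!seen.insert({a.src, a.label}).second) return false;
  }
  return true;
}
```

Running such a check on the appended lattice after each chunk would flag a violation as soon as it appears.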
Our method decodes the WFST and generates the lattice frame by frame, as the previous method does. We then split the lattice into chunks over time and determinize it chunk by chunk (with specific designs). After that, we append the determinized chunks together incrementally (with specific designs).
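The sketch below shows only the naive part of appending: renumbering a chunk's states past the existing ones and attaching the chunk's initial arcs to boundary states of the lattice built so far. It deliberately omits the "specific designs" (temporary initial/final-probs, re-determinizing boundary states), and all names are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct SpliceArc { int src, dst, label; };

struct SpliceGraph {
  int num_states = 0;
  std::vector<SpliceArc> arcs;
};

struct SpliceChunk {
  int num_states = 0;
  std::vector<SpliceArc> arcs;          // src and dst are chunk-local ids
  std::vector<SpliceArc> initial_arcs;  // src: state in the lattice; dst: chunk-local
};

// Appends `chunk` to `lattice` by offsetting chunk-local state ids by the
// number of states the lattice already has.
void AppendChunk(const SpliceChunk& chunk, SpliceGraph* lattice) {
  const int offset = lattice->num_states;
  for (const SpliceArc& a : chunk.initial_arcs) {
    assert(a.src >= 0 && a.src < offset);  // must be an existing boundary state
    lattice->arcs.push_back({a.src, a.dst + offset, a.label});
  }
  for (const SpliceArc& a : chunk.arcs)
    lattice->arcs.push_back({a.src + offset, a.dst + offset, a.label});
  lattice->num_states += chunk.num_states;
}
```

Appending a two-state chunk to a three-state lattice maps chunk states 0 and 1 to 3 and 4, and an initial arc from boundary state 2 to chunk state 0 becomes an arc 2→3.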
We are working on a write-up summarizing the algorithm and experiments: https://www.overleaf.com/read/qmzpxkjypdvk
@hainan-xv @mahsa7823 @LvHang