Conversation

@SimBe195 (Collaborator) commented Oct 8, 2025

Currently, in LexiconfreeTimesyncBeamSearch and TreeTimesyncBeamSearch, sentence-end is not handled by the LabelScorer; it is only scored by the word-level LM. This PR adds logic to properly handle sentence-end transitions in the new search. The changes consist of the following points:

  • SENTENCE_END is added as a new TransitionType for the LabelScorer.
  • For LexiconfreeTimesyncBeamSearch:
    • A sentence-end index can be specified as a parameter.
    • The inferTransitionType function is adjusted accordingly to assign the new SENTENCE_END transition type.
    • If a sentence-end index is configured, at the end of a segment only hypotheses that have emitted this sentence-end are kept (otherwise the sentence-end fallback is applied).
  • For TreeTimesyncBeamSearch:
    • The CtcTreeBuilder is modified to include the sentenceEndLemma in the tree if it exists and has pronunciations.
    • A set of finalStates is added to the PersistentStateTree. This is used in the search to determine which states are considered valid at segment end. If sentence-end is included in the tree, only the sentence-end sink state is added as a final state.
    • A sentenceEndLabelIndex_ is added as a member to the search algorithm and inferred from the lexicon.
    • The inferTransitionType function is also adjusted to produce the SENTENCE_END transition type.
    • Second-order exits are added to word-end hypotheses in decodeStep. This is needed because when the sentenceEndLemma has an empty pronunciation (i.e., it should only be scored by the LM and not the LabelScorer), a hypothesis may need to take a normal word-end exit and the sentence-end exit back-to-back in the same decode step.

Depends on changes to the transition types from #138.
Still requires testing.

Here are some plots of the new tree structure including sentence-end:

Tree without sentence-end lemma in lexicon:
(image: no_eos)

Tree with sentence-end lemma with empty pronunciation in lexicon:
(image: with_eos_no_pron)

Tree with sentence-end lemma and non-empty pronunciation in lexicon:
(image: with_eos_with_pron)

Base automatically changed from tdp_label_scorer to master October 8, 2025 12:59
// Add optional blank after the sentence-end lemma
if (allowBlankAfterSentenceEnd_) {
    StateId blankAfter = extendState(sentenceEndSink_, blankDesc_);
    addExit(blankAfter, sentenceEndSink_, lexicon_.specialLemma("blank")->id());
}
Contributor:
Do we also need to add a loop for this blank state?

addTransition(blankAfter, blankAfter);

Contributor:
I don't think so, because if you take the exit at this state, you are transferred to the root again (tr=2), which has only this one state as successor. So you effectively already have the loop blank-state -> root -> blank-state -> root -> ...
The only aspect we might need to think about is that this blank always counts as a word-end hypothesis. However, I think this is fine because we actually are at a word end; blanks between words also count as word-end hypotheses.

Contributor:
Right, that is a loop already.
Now that you mention that this blank should be a word-end hypothesis, I agree, but it seems that's not the case right now: reachedSentenceEnd is set to true only when the next token is SENTENCE_END, and there is no SENTENCE_END-to-BLANK transition type for which we'd also want to set it to true.

Contributor:
The idea is to set reachedSentenceEnd to true once we have a SENTENCE_END transition; afterwards it can't be set to false again. But you're right, I also see a problem there now: in the LabelHypothesis of the lexiconfree search, reachedSentenceEnd is also set to true for many other transition types. I think this is indeed a bug.

Collaborator (author):
Ah right, this was a bug. Fixed now.

hypIndex};

auto const sts = lemma->syntacticTokenSequence();
if (sts.size() != 0) {
Contributor:
I think I have seen more than one syntacticTokenSequence in a lemma. Do we need this assertion?

Contributor:
I introduced that in PR #145. First of all, we need to make sure that we have at least one syntacticTokenSequence; otherwise the lemma should not be scored by the LM. We currently don't support multiple syntacticTokenSequences in general, therefore we require exactly one. Alternatively we would have to pick one, probably the first, anyway. I would say this is an aspect for future work.

Collaborator (author):
Yes, we will implement support for multi-token sequences when we actually need it. I haven't seen this case so far, and it would complicate the logic a bit: we currently delay the history update until after pruning, which we can't do if multiple tokens in a row need to be scored.

@hannah220 (Contributor):
In the graph above, t is time, m is the emission index, and tr is the traceback?

@larissakl (Contributor):
@hannah220 With m you're correct, it's the emission index, in this case (monophones) it's just the output index (=the position in the lexicon), so

m=0 -> </s>
m=1 -> A
m=2 -> B
m=3 -> _

t is the transition index (it doesn't matter here), and tr is the transition state of an exit: when you are at a word end, you transition to this state. For example, after predicting </s> you have tr=2, which means you go to the root state with ID 2, i.e. the new "sentence-end root state", while with all "normal" exits you go to the "normal" root with ID 1 because of tr=1.
