Hi, when the content of kwlist.xml is
<kw kwid="KW">
<kwtext>&lt;WORD&gt; word second_word</kwtext>
</kw>
and the rttm file is
LEXEME utt 1 0 0.39 <WORD> <NA> <NA> <NA>
LEXEME utt 1 0.39 0.15 word <NA> <NA> <NA>
NON-LEX utt 1 0.54 0.05 <eps> <NA> <NA> <NA>
LEXEME utt 1 0.59 0.17 second_word <NA> <NA> <NA>
then the alignment procedure will not map these two things together (no entry in alignment.csv).
However, when I manually edit the rttm to contain this
LEXEME utt 1 0 0.39 &lt;WORD&gt; <NA> <NA> <NA>
LEXEME utt 1 0.39 0.15 word <NA> <NA> <NA>
NON-LEX utt 1 0.54 0.05 <eps> <NA> <NA> <NA>
LEXEME utt 1 0.59 0.17 second_word <NA> <NA> <NA>
the mapping will be created as expected.
I would expect the XML entities (apos, lt, gt, quot and amp) to be decoded/normalized before the keyword text is compared with the rttm, because the XML specification requires these characters to be written in the "encoded" form, i.e. it is not up to the user how these strings appear in the file.
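
To illustrate the behaviour I would expect, here is a minimal Python sketch. It is not the tool's actual code: the function names (keyword_tokens, rttm_lexemes, contains_sequence) are made up for this example, and the <kwlist> root element is added only as an assumption so that the snippet parses as a complete document. The point is that a standard XML parser already returns the decoded text, so the keyword tokens match the rttm tokens directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs modelled on the report; the <kwlist> wrapper is assumed.
KWLIST_XML = """<kwlist>
<kw kwid="KW">
<kwtext>&lt;WORD&gt; word second_word</kwtext>
</kw>
</kwlist>"""

RTTM = """\
LEXEME utt 1 0 0.39 <WORD> <NA> <NA> <NA>
LEXEME utt 1 0.39 0.15 word <NA> <NA> <NA>
NON-LEX utt 1 0.54 0.05 <eps> <NA> <NA> <NA>
LEXEME utt 1 0.59 0.17 second_word <NA> <NA> <NA>
"""


def keyword_tokens(kwlist_xml):
    """Map each kwid to its keyword tokens; the parser decodes XML entities."""
    root = ET.fromstring(kwlist_xml)
    return {kw.get("kwid"): kw.findtext("kwtext").split() for kw in root.iter("kw")}


def rttm_lexemes(rttm_text):
    """Return the orthography field (6th column) of every LEXEME record."""
    tokens = []
    for line in rttm_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "LEXEME":
            tokens.append(fields[5])
    return tokens


def contains_sequence(haystack, needle):
    """True if needle occurs as a contiguous run of tokens in haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


if __name__ == "__main__":
    keywords = keyword_tokens(KWLIST_XML)
    lexemes = rttm_lexemes(RTTM)
    for kwid, tokens in keywords.items():
        print(kwid, tokens, contains_sequence(lexemes, tokens))
```

With the entities decoded on the kwlist side, the rttm can keep the plain <WORD> token and no manual editing of the rttm is needed.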