Commit 8a05b51

BuyuanCui, jimregan, eginhard, vsl9, and pre-commit-ci[bot] authored
ZH sentence-level TN (#112)
* Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <[email protected]> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <[email protected]> * whitespace fixes Signed-off-by: Jim O'Regan <[email protected]> * also fix in the verbaliser Signed-off-by: Jim O'Regan <[email protected]> * Update Jenkinsfile Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <[email protected]> Signed-off-by: Alex Cui <[email protected]> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <[email protected]> * Remove unused imports Signed-off-by: anand-nv <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <[email protected]> --------- Signed-off-by: ealbasiri <[email protected]> Signed-off-by: anand-nv <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add inits Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * add country codes from hu (#77) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix electronic case for username (#75) * fix electronic username w/o . 
Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable sv tests Signed-off-by: Evelina <[email protected]> * disable sv tests Signed-off-by: Evelina <[email protected]> * fix ar test Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable sv tests Signed-off-by: Evelina <[email protected]> * update ci dirs, enable sv tests Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * 0.1.8 release (#79) Signed-off-by: Evelina <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Codeswitched ES/EN ITN (#78) * Initial commit for ES-EN codeswitched ITN Signed-off-by: Anand Joseph <[email protected]> * Enable export for es_en codeswitched ITN Signed-off-by: Anand Joseph <[email protected]> * Add whitelist, update weights Signed-off-by: Anand Joseph <[email protected]> * Add tests for en_es, zone tagged separately in es Signed-off-by: Anand Joseph <[email protected]> * Fix path to test data for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update Jenkinsfile - enable ES/EN tests Signed-off-by: Anand Joseph <[email protected]> * Add __init__.py files Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix issues with failed docker build - due to archiving of debian and issues with re2 Signed-off-by: Anand Joseph <[email protected]> * Remove unused imports and variables Signed-off-by: Anand Joseph <[email protected]> * Update date Signed-off-by: Anand Joseph <[email protected]> * Enable NBSP in sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update 
copyrights Signed-off-by: Anand Joseph <[email protected]> * Update cache path in for ES/EN CI/CD Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * electronic verbalizer fallback (#81) * 0.1.8 release Signed-off-by: Evelina <[email protected]> * add elec fallback Signed-off-by: Evelina <[email protected]> * update ci Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * minor normalize.py edit for usability (#84) * electronic verbalizer fallback (#81) * 0.1.8 release Signed-off-by: Evelina <[email protected]> * add elec fallback Signed-off-by: Evelina <[email protected]> * update ci Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Linnea Pari Leaver <[email protected]> * documentation edits for grammar/clarity Signed-off-by: Linnea Pari Leaver <[email protected]> * added --output_field flag for command line interface Signed-off-by: Linnea Pari Leaver <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <[email protected]> Signed-off-by: Linnea Pari Leaver <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Linnea Pari Leaver <[email 
protected]> Signed-off-by: Alex Cui <[email protected]> * Swedish ITN (#40) * force two digits for month Signed-off-by: Jim O'Regan <[email protected]> * put it in a function, because I reject the garbage pre-commit.ci came up with Signed-off-by: Jim O'Regan <[email protected]> * wrap some more pieces Signed-off-by: Jim O'Regan <[email protected]> * add graph pieces Signed-off-by: Jim O'Regan <[email protected]> * delete junk Signed-off-by: Jim O'Regan <[email protected]> * my copyright Signed-off-by: Jim O'Regan <[email protected]> * add date verbaliser (copy from es) Signed-off-by: Jim O'Regan <[email protected]> * tweaks Signed-off-by: Jim O'Regan <[email protected]> * add date verbaliser Signed-off-by: Jim O'Regan <[email protected]> * add right tokens Signed-off-by: Jim O'Regan <[email protected]> * some tweaks, more needed Signed-off-by: Jim O'Regan <[email protected]> * basic test cases Signed-off-by: Jim O'Regan <[email protected]> * tweaks to TN date tagger Signed-off-by: Jim O'Regan <[email protected]> * tweaks to ITN date tagger Signed-off-by: Jim O'Regan <[email protected]> * tweaks to TN date tagger Signed-off-by: Jim O'Regan <[email protected]> * remove duplicate Signed-off-by: Jim O'Regan <[email protected]> * moved to tagger Signed-off-by: Jim O'Regan <[email protected]> * nothing actually fixed here Signed-off-by: Jim O'Regan <[email protected]> * now most tests pass Signed-off-by: Jim O'Regan <[email protected]> * electronic Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fractions Signed-off-by: Jim O'Regan <[email protected]> * extend Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bare fractions is a bit of an overreach Signed-off-by: Jim O'Regan <[email protected]> * whitelist Signed-off-by: Jim O'Regan <[email protected]> * just inverting 
the TN whitelist tagger will not work/be useful Signed-off-by: Jim O'Regan <[email protected]> * copy from English Signed-off-by: Jim O'Regan <[email protected]> * overwrite with version from en Signed-off-by: Jim O'Regan <[email protected]> * add basic test case Signed-off-by: Jim O'Regan <[email protected]> * fix call Signed-off-by: Jim O'Regan <[email protected]> * swap tsv sides Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * add optional_era variable Signed-off-by: Jim O'Regan <[email protected]> * add test case Signed-off-by: Jim O'Regan <[email protected]> * make deterministic default, like most of the others Signed-off-by: Jim O'Regan <[email protected]> * also add lowercase versions Signed-off-by: Jim O'Regan <[email protected]> * replacing NEMO_SPACE does not work either Signed-off-by: Jim O'Regan <[email protected]> * increasing weight... did not work last time Signed-off-by: Jim O'Regan <[email protected]> * tweaking test cases, in case it was a sentence splitting issue. It was not Signed-off-by: Jim O'Regan <[email protected]> * put the full stops back Signed-off-by: Jim O'Regan <[email protected]> * add filler words Signed-off-by: Jim O'Regan <[email protected]> * try splitting this out to see if it makes a difference Signed-off-by: Jim O'Regan <[email protected]> * aha, this part should be non-deterministic only Signed-off-by: Jim O'Regan <[email protected]> * single line only Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "increasing weight... did not work last time" This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996. Signed-off-by: Jim O'Regan <[email protected]> * disabling ITN here makes TN work again(?) 
Signed-off-by: Jim O'Regan <[email protected]> * Revert "disabling ITN here makes TN work again(?)" This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f. Signed-off-by: Jim O'Regan <[email protected]> * changing the variable name fixes norm tests Signed-off-by: Jim O'Regan <[email protected]> * change the variable names Signed-off-by: Jim O'Regan <[email protected]> * add missing test tooling Signed-off-by: Jim O'Regan <[email protected]> * copy telephone fixes from hu Signed-off-by: Jim O'Regan <[email protected]> * copy telephone fixes from hu Signed-off-by: Jim O'Regan <[email protected]> * add a piece for area codes for ITN Signed-off-by: Jim O'Regan <[email protected]> * add country codes from hu Signed-off-by: Jim O'Regan <[email protected]> * extend any_read_digit for ITN Signed-off-by: Jim O'Regan <[email protected]> * country/area codes for ITN Signed-off-by: Jim O'Regan <[email protected]> * first attempt Signed-off-by: Jim O'Regan <[email protected]> * add to t&c Signed-off-by: Jim O'Regan <[email protected]> * add to t&c Signed-off-by: Jim O'Regan <[email protected]> * remove country codes for the time being, makes things ambiguous Signed-off-by: Jim O'Regan <[email protected]> * basic test cases Signed-off-by: Jim O'Regan <[email protected]> * fix Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove trailing whitespace Signed-off-by: Jim O'Regan <[email protected]> * Update __init__.py Signed-off-by: Jim O’Regan <[email protected]> * fix comment Signed-off-by: Jim O'Regan <[email protected]> * fix comment Signed-off-by: Jim O'Regan <[email protected]> * basic transform of TN tests Signed-off-by: Jim O'Regan <[email protected]> * basic transformation of TN decimal tests Signed-off-by: Jim O'Regan <[email protected]> * slight changes to date Signed-off-by: Jim O'Regan <[email protected]> * tweak Signed-off-by: Jim O'Regan <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * include space Signed-off-by: Jim O'Regan <[email protected]> * problem with tusen Signed-off-by: Jim O'Regan <[email protected]> * problem with tusen was not that Signed-off-by: Jim O'Regan <[email protected]> * add functions from hu Signed-off-by: Jim O'Regan <[email protected]> * respect my own copyright xD Signed-off-by: Jim O'Regan <[email protected]> * move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage Signed-off-by: Jim O'Regan <[email protected]> * move data loading, this has been an oddity before Signed-off-by: Jim O'Regan <[email protected]> * try changing this year declaration Signed-off-by: Jim O'Regan <[email protected]> * add year + era Signed-off-by: Jim O'Regan <[email protected]> * eliminate more module-level data loading Signed-off-by: Jim O'Regan <[email protected]> * Revert "eliminate more module-level data loading" This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a. 
Signed-off-by: Jim O'Regan <[email protected]> * expose variables Signed-off-by: Jim O'Regan <[email protected]> * extra param for itn mode Signed-off-by: Jim O'Regan <[email protected]> * change call Signed-off-by: Jim O'Regan <[email protected]> * change comment Signed-off-by: Jim O'Regan <[email protected]> * change comment Signed-off-by: Jim O'Regan <[email protected]> * move data loading Signed-off-by: Jim O'Regan <[email protected]> * fix parens Signed-off-by: Jim O'Regan <[email protected]> * move data loading Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * adapt comments Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adapt/extend tests Signed-off-by: Jim O'Regan <[email protected]> * fix dict init/change keys to something useful Signed-off-by: Jim O'Regan <[email protected]> * initial stab at prefixed numbers Signed-off-by: Jim O'Regan <[email protected]> * some adapting Signed-off-by: Jim O'Regan <[email protected]> * insert kl. if absent Signed-off-by: Jim O'Regan <[email protected]> * fix comments Signed-off-by: Jim O'Regan <[email protected]> * the relative prefixed times Signed-off-by: Jim O'Regan <[email protected]> * + comments Signed-off-by: Jim O'Regan <[email protected]> * enable time Signed-off-by: Jim O'Regan <[email protected]> * space in both directions Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix comment Signed-off-by: Jim O'Regan <[email protected]> * fix hours to Signed-off-by: Jim O'Regan <[email protected]> * split by before/after Signed-off-by: Jim O'Regan <[email protected]> * delete, not insert Signed-off-by: Jim O'Regan <[email protected]> * fix if Signed-off-by: Jim O'Regan <[email protected]> * kl. 
9 Signed-off-by: Jim O'Regan <[email protected]> * copy from en Signed-off-by: Jim O'Regan <[email protected]> * keep only get_abs_path Signed-off-by: Jim O'Regan <[email protected]> * imports Signed-off-by: Jim O'Regan <[email protected]> * add trimmed file Signed-off-by: Jim O'Regan <[email protected]> * fix imports Signed-off-by: Jim O'Regan <[email protected]> * two abs_paths... could be fun Signed-off-by: Jim O'Regan <[email protected]> * minutes/seconds Signed-off-by: Jim O'Regan <[email protected]> * suffix Signed-off-by: Jim O'Regan <[email protected]> * delete, not insert Signed-off-by: Jim O'Regan <[email protected]> * one optional Signed-off-by: Jim O'Regan <[email protected]> * export variable Signed-off-by: Jim O'Regan <[email protected]> * kl. or one of suffix/zone Signed-off-by: Jim O'Regan <[email protected]> * already disambiguated Signed-off-by: Jim O'Regan <[email protected]> * closure Signed-off-by: Jim O'Regan <[email protected]> * do not insert kl. Signed-off-by: Jim O'Regan <[email protected]> * fix test case Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix spelling Signed-off-by: Jim O'Regan <[email protected]> * Delete measure.py Signed-off-by: Jim O’Regan <[email protected]> * Delete money.py Signed-off-by: Jim O’Regan <[email protected]> * remove unused pieces Signed-off-by: Jim O'Regan <[email protected]> * remove unused pieces Signed-off-by: Jim O'Regan <[email protected]> * remove unused test pieces Signed-off-by: Jim O'Regan <[email protected]> * copy from es Signed-off-by: Jim O'Regan <[email protected]> * add SV ITN Signed-off-by: Jim O'Regan <[email protected]> * add/update __init__ Signed-off-by: Jim O'Regan <[email protected]> * blank line Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix comment Signed-off-by: Jim 
O'Regan <[email protected]> * fix lang Signed-off-by: Jim O'Regan <[email protected]> * fix decimal verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix Signed-off-by: Jim O'Regan <[email protected]> * remove year, conflicts with cardinal Signed-off-by: Jim O'Regan <[email protected]> * space before, not after Signed-off-by: Jim O'Regan <[email protected]> * fix cardinal tests Signed-off-by: Jim O'Regan <[email protected]> * spurious deletion Signed-off-by: Jim O'Regan <[email protected]> * fix comment Signed-off-by: Jim O'Regan <[email protected]> * unused imports Signed-off-by: Jim O'Regan <[email protected]> * re-enable SV TN; enable SV ITN Signed-off-by: Jim O'Regan <[email protected]> * Revert "re-enable SV TN; enable SV ITN" This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b. Signed-off-by: Jim O'Regan <[email protected]> * fix singulras Signed-off-by: Jim O'Regan <[email protected]> * add an export Signed-off-by: Jim O'Regan <[email protected]> * change integer graph Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move spaces Signed-off-by: Jim O'Regan <[email protected]> * use cdrewrite Signed-off-by: Jim O'Regan <[email protected]> * just EOS/BOS Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <[email protected]> * omit en/ett, because they are also articles Signed-off-by: Jim O'Regan <[email protected]> * uncomment Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused Signed-off-by: Jim O'Regan <[email protected]> * strip spaces from decimal part Signed-off-by: Jim O'Regan <[email protected]> * export Signed-off-by: Jim O'Regan <[email 
protected]> * partial fix, not what I wanted Signed-off-by: Jim O'Regan <[email protected]> * move comment Signed-off-by: Jim O'Regan <[email protected]> * en/ett cannot work in itn case Signed-off-by: Jim O'Regan <[email protected]> * be more deliberate in graph construction Signed-off-by: Jim O'Regan <[email protected]> * accept both Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * +2 tests Signed-off-by: Jim O'Regan <[email protected]> * (try to) accept singular quantities for plurals Signed-off-by: Jim O'Regan <[email protected]> * retry Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * oops Signed-off-by: Jim O'Regan <[email protected]> * replace Signed-off-by: Jim O'Regan <[email protected]> * arcmap Signed-off-by: Jim O'Regan <[email protected]> * version without ones Signed-off-by: Jim O'Regan <[email protected]> * add another test Signed-off-by: Jim O'Regan <[email protected]> * change graph Signed-off-by: Jim O'Regan <[email protected]> * simplify Signed-off-by: Jim O'Regan <[email protected]> * get rid of this, this is where it goes wrong Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more tests Signed-off-by: Jim O'Regan <[email protected]> * add a test Signed-off-by: Jim O'Regan <[email protected]> * multiple states from both ones, try removing and readding Signed-off-by: Jim O'Regan <[email protected]> * remove ones, see if that fixes at least the bare quantities Signed-off-by: Jim O'Regan <[email protected]> * works in the repl, dunno why it still breaks Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate Signed-off-by: Jim O'Regan 
<[email protected]> * move definition Signed-off-by: Jim O'Regan <[email protected]> * simplify Signed-off-by: Jim O'Regan <[email protected]> * tweak Signed-off-by: Jim O'Regan <[email protected]> * another test Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * local declaration, seems to not be working Signed-off-by: Jim O'Regan <[email protected]> * more tests Signed-off-by: Jim O'Regan <[email protected]> * match verbaliser Signed-off-by: Jim O'Regan <[email protected]> * fix last two failing tests Signed-off-by: Jim O'Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing tests for telephone and word Signed-off-by: Jim O'Regan <[email protected]> * remove unused variable Signed-off-by: Jim O'Regan <[email protected]> * remove unused imports Signed-off-by: Jim O'Regan <[email protected]> * fix comment Signed-off-by: Jim O'Regan <[email protected]> * get rid of convert_space, tests fail Signed-off-by: Jim O'Regan <[email protected]> * put convert_spaces back, change test file; pytest fails Signed-off-by: Jim O'Regan <[email protected]> * Revert "put convert_spaces back, change test file; pytest fails" This reverts commit a7bb7489137b8026aab02aff64df39e874630043. 
Signed-off-by: Jim O'Regan <[email protected]> * put convert_spaces back, change test file; pytest fails, take 2 Signed-off-by: Jim O'Regan <[email protected]> * deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk Signed-off-by: Jim O'Regan <[email protected]> * try converting the non-breaking spaces in the shell script Signed-off-by: Jim O'Regan <[email protected]> * wrong place Signed-off-by: Jim O'Regan <[email protected]> * fix typo Signed-off-by: Jim O'Regan <[email protected]> * fix path Signed-off-by: Jim O'Regan <[email protected]> * export Signed-off-by: Jim O'Regan <[email protected]> * export Signed-off-by: Jim O'Regan <[email protected]> * remove unused Signed-off-by: Jim O'Regan <[email protected]> * Update date.py Signed-off-by: Jim O’Regan <[email protected]> * Update time.py Signed-off-by: Jim O’Regan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix comment Signed-off-by: Jim O’Regan <[email protected]> * trim comments Signed-off-by: Jim O’Regan <[email protected]> * remove commented line Signed-off-by: Jim O’Regan <[email protected]> * en halv Signed-off-by: Jim O’Regan <[email protected]> * Update test_sparrowhawk_inverse_text_normalization.sh Signed-off-by: Jim O’Regan <[email protected]> --------- Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Italian_TN (#67) * add TN italian Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix init Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix LOCATION Signed-off-by: 
GiacomoLeoneMaria <[email protected]> * modify graph_utils Signed-off-by: GiacomoLeoneMaria <[email protected]> * correct decimals Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix electronic Signed-off-by: Giacomo Cavallini <[email protected]> * fix electronic Signed-off-by: Giacomo Cavallini <[email protected]> * fix measure Signed-off-by: Giacomo Cavallini <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Signed-off-by: Giacomo Cavallini <[email protected]> Signed-off-by: Mariana <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Mariana <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Zh itn (#74) * Add ZH ITN Signed-off-by: Anand Joseph <[email protected]> * Fix copyrights and code cleanup Signed-off-by: Anand Joseph <[email protected]> * Remove invalid tests Signed-off-by: Anand Joseph <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve CodeQL issues Signed-off-by: Anand Joseph <[email protected]> * Cleanup Signed-off-by: Anand Joseph <[email protected]> * Fix missing 'zh' option for ITN and correct comment Signed-off-by: Anand Joseph <[email protected]> * Update __init__.py Change to zh instead of en for the imports. 
Signed-off-by: Buyuan(Alex) Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update for decimal test data Signed-off-by: BuyuanCui <[email protected]> * update for langauge import Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update for Chinese punctuations Signed-off-by: BuyuanCui <[email protected]> * a new class for whitelist Signed-off-by: BuyuanCui <[email protected]> * PYNINI_AVAILABLE = False Signed-off-by: BuyuanCui <[email protected]> * recreated due to file import format issue Signed-off-by: BuyuanCui <[email protected]> * recreated due to format issue Signed-off-by: BuyuanCui <[email protected]> * caught duplicates, removed Signed-off-by: BuyuanCui <[email protected]> * removed duplicates, arranges for CHInese Yuan updates Signed-off-by: BuyuanCui <[email protected]> * updates accordingly to the comments from last PR. Recreated some of the files due to format issues Signed-off-by: BuyuanCui <[email protected]> * removed the hours_to and minute_to files used for back counting. ALso removed am and pm suffix files according to the last PR. Recreated some of them for format issue Signed-off-by: BuyuanCui <[email protected]> * re-added this file to avoid data file import error Signed-off-by: BuyuanCui <[email protected]> * updated gramamr according to last PR. Removed the acceptance of 千 Signed-off-by: BuyuanCui <[email protected]> * updates Signed-off-by: BuyuanCui <[email protected]> * updated according to last PR. Removed comma after decimal points Signed-off-by: BuyuanCui <[email protected]> * gramamr for Fraction Signed-off-by: BuyuanCui <[email protected]> * gramamr for money and updated according to last PR. Plus process of 元 Signed-off-by: BuyuanCui <[email protected]> * ordinal grammar. 
updates due to the updates in cardinal grammar Signed-off-by: BuyuanCui <[email protected]> * updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression Signed-off-by: BuyuanCui <[email protected]> * arrangements Signed-off-by: BuyuanCui <[email protected]> * added whitelist grammar Signed-off-by: BuyuanCui <[email protected]> * word grammar for non-classified items Signed-off-by: BuyuanCui <[email protected]> * updated cardinal, decimal, time, itn data Signed-off-by: BuyuanCui <[email protected]> * updates according to last PR Signed-off-by: BuyuanCui <[email protected]> * updates according to the updates for cardinal grammar Signed-off-by: BuyuanCui <[email protected]> * updates for more Mandarin punctuations Signed-off-by: BuyuanCui <[email protected]> * updated accordingly to last PR. removing am pm Signed-off-by: BuyuanCui <[email protected]> * adjustment on the weight Signed-off-by: BuyuanCui <[email protected]> * updated accordingly to the targger updates Signed-off-by: BuyuanCui <[email protected]> * updated accordingly to the time tagger Signed-off-by: BuyuanCui <[email protected]> * updates according to changes in tagger on am and pm Signed-off-by: BuyuanCui <[email protected]> * verbalizer for fraction Signed-off-by: BuyuanCui <[email protected]> * added for mandarin grammar Signed-off-by: BuyuanCui <[email protected]> * kept this file because using English utils results in data namin error Signed-off-by: BuyuanCui <[email protected]> * merge conflict Signed-off-by: BuyuanCui <[email protected]> * removed unsed imports Signed-off-by: BuyuanCui <[email protected]> * deleted unsed import os Signed-off-by: BuyuanCui <[email protected]> * deleted unsed variables Signed-off-by: BuyuanCui <[email protected]> * removed unsed imports Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates and edits based on pr checks 
Signed-off-by: BuyuanCui <[email protected]> * updates and edits based on pr checks Signed-off-by: BuyuanCui <[email protected]> * format issue, reccreated Signed-off-by: BuyuanCui <[email protected]> * format issue recreated Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed codeing style/format Signed-off-by: BuyuanCui <[email protected]> * fixed coding style and format Signed-off-by: BuyuanCui <[email protected]> * removed duplicated graph for 毛 Signed-off-by: BuyuanCui <[email protected]> * removed the comment Signed-off-by: BuyuanCui <[email protected]> * removed the comment Signed-off-by: BuyuanCui <[email protected]> * removing unnecessary comments Signed-off-by: BuyuanCui <[email protected]> * unnecessary comment removed Signed-off-by: BuyuanCui <[email protected]> * test file updated for more cases Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated with a comment explaining why this file is kept Signed-off-by: BuyuanCui <[email protected]> * updated the file explaining why this file is kept Signed-off-by: BuyuanCui <[email protected]> * added Mandarin as zh Signed-off-by: BuyuanCui <[email protected]> * removing for dplication Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused NEMO objects Signed-off-by: BuyuanCui <[email protected]> * removed duplicates Signed-off-by: BuyuanCui <[email protected]> * removing unsed imports Signed-off-by: BuyuanCui <[email protected]> * updates to fix test file failures Signed-off-by: BuyuanCui <[email protected]> * updates to fix file failtures Signed-off-by: BuyuanCui <[email protected]> * updates to resolve test case failture Signed-off-by: BuyuanCui <[email protected]> * updates to resolve test case 
failure Signed-off-by: BuyuanCui <[email protected]> * updates to resolve test case failure Signed-off-by: BuyuanCui <[email protected]> * updates to resolve test case failure Signed-off-by: BuyuanCui <[email protected]> * updates to adap to cardinal grammar changes Signed-off-by: BuyuanCui <[email protected]> * updates to adapt to grammar changes Signed-off-by: BuyuanCui <[email protected]> * updates to adopt to cardinal grammar changes Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix style Signed-off-by: BuyuanCui <[email protected]> * fix style Signed-off-by: BuyuanCui <[email protected]> * fix style Signed-off-by: BuyuanCui <[email protected]> * fix style Signed-off-by: BuyuanCui <[email protected]> * fixing pr checks Signed-off-by: BuyuanCui <[email protected]> * removed // for zhtn/itn cache Signed-off-by: BuyuanCui <[email protected]> * Update inverse_normalize.py Added zh as a selection to pass Jenkins checks. 
Signed-off-by: Buyuan(Alex) Cui <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Buyuan(Alex) Cui <[email protected]> Signed-off-by: BuyuanCui <[email protected]> Co-authored-by: Alex Cui <[email protected]> Co-authored-by: Anand Joseph <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * updated pynini_export.py file to create far files (#88) Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * readd Swedish (#87) Signed-off-by: Jim O'Regan <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Zh tn 0712 (#89) * updates Signed-off-by: BuyuanCui <[email protected]> * updates and fixings according to document on natonal gideline Signed-off-by: BuyuanCui <[email protected]> * Decimal grammar added Signed-off-by: BuyuanCui <[email protected]> * fraction updated Signed-off-by: BuyuanCui <[email protected]> * money updated Signed-off-by: BuyuanCui <[email protected]> * ordinal grammar added Signed-off-by: BuyuanCui <[email protected]> * punctuation grammar added Signed-off-by: BuyuanCui <[email protected]> * time gramamr updated Signed-off-by: BuyuanCui <[email protected]> * tokenizaer updated Signed-off-by: BuyuanCui <[email protected]> * updates on certificate Signed-off-by: BuyuanCui <[email protected]> * data updated and added due to updates and chanegs to the existing grammar Signed-off-by: BuyuanCui <[email protected]> * cardinal updated Signed-off-by: BuyuanCui <[email protected]> * date grammar changed Signed-off-by: BuyuanCui <[email protected]> * decimal grammar added Signed-off-by: BuyuanCui <[email protected]> * grammar updated Signed-off-by: BuyuanCui <[email protected]> * grammar updated Signed-off-by: BuyuanCui <[email protected]> * grammar added Signed-off-by: BuyuanCui <[email protected]> * grammar updates Signed-off-by: BuyuanCui <[email protected]> * test 
data added Signed-off-by: BuyuanCui <[email protected]> * test python file edits Signed-off-by: BuyuanCui <[email protected]> * updates for tn1.0 and previous tn grammar from contribution Signed-off-by: BuyuanCui <[email protected]> * test cases updated Signed-off-by: BuyuanCui <[email protected]> * coding style fixed Signed-off-by: BuyuanCui <[email protected]> * dates updated for init files Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated the date for zh Signed-off-by: BuyuanCui <[email protected]> * removed unsed imports Signed-off-by: BuyuanCui <[email protected]> * removed comments Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added back the itn tests Signed-off-by: BuyuanCui <[email protected]> * added back measure and math from previou TN Signed-off-by: BuyuanCui <[email protected]> * updated for tests reruns Signed-off-by: BuyuanCui <[email protected]> * updats Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated weights Signed-off-by: BuyuanCui <[email protected]> --------- Signed-off-by: BuyuanCui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Zh tn char (#95) * file name change Signed-off-by: BuyuanCui <[email protected]> * file name change Signed-off-by: BuyuanCui <[email protected]> * file name change Signed-off-by: BuyuanCui <[email protected]> * file name change Signed-off-by: BuyuanCui <[email protected]> * file name change Signed-off-by: BuyuanCui <[email protected]> * file name Signed-off-by: BuyuanCui <[email protected]> * file name Signed-off-by: BuyuanCui <[email protected]> * file name Signed-off-by: BuyuanCui <[email 
protected]> * file name Signed-off-by: BuyuanCui <[email protected]> * file name Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * code stle Signed-off-by: BuyuanCui <[email protected]> * fixed import error Signed-off-by: BuyuanCui <[email protected]> --------- Signed-off-by: BuyuanCui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * audio-based TN fix for empty pred_text/text (#92) * fix for empty pred_text Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add unittests Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix path Signed-off-by: Evelina <[email protected]> * fix path Signed-off-by: Evelina <[email protected]> * fix pytest Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * pip 1.2.0 Signed-off-by: Evelina <[email protected]> Signed-off-by: Alex Cui <[email protected]> * French tn (#91) * add tests for fr tn Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add fr tn for cardinals, decimals, fractions and ordinals Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * delete it far files from tools Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add languages to run_evaluate Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * remove ambiguous spacing Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * enable sh 
testing for fr tn Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix bug with ordinals Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update jenkinsfile cache date Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix test for ordinals Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update tn cache for fr Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * resolve codeql issues Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Add whitelist_tech.tsv (#96) Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Zhitn 0727 (#93) * updates on itn grammar to pass sparrowhawk tests Signed-off-by: BuyuanCui <[email protected]> * updats for sparrowhawk tests Signed-off-by: BuyuanCui <[email protected]> * updates fro sparrowhawk tests Signed-off-by: BuyuanCui <[email protected]> * coding style fix Signed-off-by: BuyuanCui <[email protected]> * updates for coding style and sparrowhawk test Signed-off-by: BuyuanCui <[email protected]> * updated classes for tests on whitelist and word grammar Signed-off-by: BuyuanCui <[email protected]> * added for tests on whitelist Signed-off-by: BuyuanCui <[email protected]> * added for test on word Signed-off-by: BuyuanCui <[email protected]> * added to run test on whitelist Signed-off-by: BuyuanCui <[email protected]> * added to run test on word Signed-off-by: BuyuanCui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_word.py Removed unused import. 
Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update test_word.py Removed imports according to CodeQL Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update test_whitelist.py Removing imports according to CodeQL Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update test_whitelist.py Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update Jenkinsfile changed zh cache to 07-27-23 as it is the latest update. Signed-off-by: Buyuan(Alex) Cui <[email protected]> --------- Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Buyuan(Alex) Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Es tn romans fix (#98) * fix es tn roman exceptions Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update jenkinsfile Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update eval script for ITN Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * codeql fix Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Change docker image (#102) Change docker image to one including sparrowhawk Signed-off-by: anand-nv <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Print warning instead exception (#97) * raise text Signed-off-by: Nikolay Karpov <[email protected]> * text arg Signed-off-by: Nikolay Karpov <[email protected]> * Failed text Signed-off-by: Nikolay Karpov <[email protected]> * add logger Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm raise Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * logger Signed-off-by: Nikolay Karpov <[email protected]> * 
NeMo-text-processing Signed-off-by: Nikolay Karpov <[email protected]> * info level Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm raise Signed-off-by: Nikolay Karpov <[email protected]> * verbose Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Normalizer.select_verbalizer Signed-off-by: Nikolay Karpov <[email protected]> * Exception Signed-off-by: Nikolay Karpov <[email protected]> * verbose Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restart ci Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nikolay Karpov <[email protected]> Co-authored-by: Evelina <[email protected]> Signed-off-by: Alex Cui <[email protected]> * warning regardless of verbose flag (#107) * warning Signed-off-by: Nikolay Karpov <[email protected]> * self.verbose Signed-off-by: Nikolay Karpov <[email protected]> --------- Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Unpin setuptools (#106) Signed-off-by: Peter Plantinga <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fixed warnings: File is not always closes. 
(#113) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Alex Cui <[email protected]> * fix bug #111 (ar currencies) (#117) * fix bug #111 (ar currencies) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update ci folder Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * Logging clean up + IT TN fix (#118) * fix utils and it TN Signed-off-by: Evelina <[email protected]> * clean up Signed-off-by: Evelina <[email protected]> * fix logging Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix format Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix format Signed-off-by: Evelina <[email protected]> * fix format Signed-off-by: Evelina <[email protected]> * add IT TN to CI Signed-off-by: Evelina <[email protected]> * update patch Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * Time_IT_TN (#105) * add time verbalizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * add time tagger and verba Signed-off-by: GiacomoLeoneMaria <[email protected]> * add pytest time Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeQL Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix numbers with eight Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: 
GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <[email protected]> * IT TN improvement on tests (#120) * add missing test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * fix bug with time tests Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update ci date Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * add sentence test cases Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * refine shortest path for irregular cardinals Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update ci date Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * add single letter exception for roman numerals (#121) * add single letter exception for roman numerals Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update ci dir Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Alex Cui <[email protected]> * rewrote tokenizer Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * removed the file and replaced it with char in 1.8 Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * jenkins file update Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * to fix tn bug@ xuesong Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * tn bug Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * fixeds and updates Signed-off-by: BuyuanCui <[email 
protected]> Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <[email protected]> * adjustments Signed-off-by: BuyuanCui <[email protected]> Signed-off-by: Alex Cui <[email protected]> * testing commit Signed-off-by: Alex Cui <[email protected]> * removing unsed file Signed-off-by: Alex Cui <[email protected]> * updated test cases Signed-off-by: Alex Cui <[email protected]> * updating etst cases Signed-off-by: Alex Cui <[email protected]> * updates adapting to graphs Signed-off-by: Alex Cui <[email protected]> * updated cases for SH tests Signed-off-by: Alex Cui <[email protected]> * updated cases Signed-off-by: Alex Cui <[email protected]> * added some sentences Signed-off-by: Alex Cui <[email protected]> * test cases update Signed-off-by: Alex Cui <[email protected]> * solving rebase issue, repushing changes Signed-off-by: Alex Cui <[email protected]> * resolving conflict Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixings according to ci Signed-off-by: Alex Cui <[email protected]> * fixings according to the ci Signed-off-by: Alex Cui <[email protected]> * removed not used Signed-off-by: Alex Cui <[email protected]> * notused removing Signed-off-by: Alex Cui <[email protected]> * format issue Signed-off-by: Alex Cui <[email protected]> * formt issue Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unused files Signed-off-by: Alex Cui <[email protected]> * removing unused files Signed-off-by: Alex Cui <[email protected]> * remiving unsed files; Signed-off-by: Alex Cui <[email protected]> * removing unsed files Signed-off-by: Alex Cui <[email protected]> * removing unsed files Signed-off-by: Alex Cui <[email protected]> * added 
sentences as test cases Signed-off-by: Alex Cui <[email protected]> * added senetnces as test cases Signed-off-by: Alex Cui <[email protected]> * removed commentyed out tests Signed-off-by: Alex Cui <[email protected]> * updating dates Signed-off-by: Alex Cui <[email protected]> * attemps to fix bug Signed-off-by: Alex Cui <[email protected]> * inprocess of fixing the bug Signed-off-by: Alex Cui <[email protected]> * fixing existing issue Signed-off-by: Alex Cui <[email protected]> * updated graph_utils, tokenize and classify, and word graphs Signed-off-by: Alex Cui <[email protected]> * added bacl the ppostprocessor far creation Signed-off-by: Alex Cui <[email protected]> * updated NEMO_NOT_ALPHA as a new variable Signed-off-by: Alex Cui <[email protected]> * far files Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * combiedn into measure Signed-off-by: Alex Cui <[email protected]> * removing and combined to meaasure Signed-off-by: Alex Cui <[email protected]> * removing, not used Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates to fix space issue Signed-off-by: Alex Cui <[email protected]> * updates to fix space issue Signed-off-by: Alex Cui <[email protected]> * updates to fix space issue Signed-off-by: Alex Cui <[email protected]> * updates to solve the space issue Signed-off-by: Alex Cui <[email protected]> * resolving sh issue Signed-off-by: Alex Cui <[email protected]> * resolving sh test issue Signed-off-by: Alex Cui <[email protected]> * adding anands updates Signed-off-by: Alex Cui <[email protected]> * data updated for measure and whitelist Signed-off-by: Alex Cui <[email protected]> * updates Signed-off-by: Alex Cui <[email protected]> * updates Signed-off-by: Alex Cui <[email protected]> * updates Signed-off-by: Alex Cui <[email protected]> * 
removing fraction and math part Signed-off-by: Alex Cui <[email protected]> * removing comments Signed-off-by: Alex Cui <[email protected]> * removing preprocessor, updating measure, adding shitelist cases Signed-off-by: Alex Cui <[email protected]> * removing processor, modification for sp test, shitelist and word Signed-off-by: Alex Cui <[email protected]> * updating zh date Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * realized itn being cvommented out, adding back Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * trying to run zh tn separately because it takes long time to run Signed-off-by: Alex Cui <[email protected]> * modification to ru zh tn separately Signed-off-by: Alex Cui <[email protected]> * independent zh tnitn tests for more time Signed-off-by: Alex Cui <[email protected]> * adding lines to save far file Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates for reducing testing time Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * for ounct graph Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing used graphs Signed-off-by: Alex Cui <[email protected]> * format and removing used comments Signed-off-by: Alex Cui <[email protected]> * removing this one, not used Signed-off-by: Alex Cui <[email protected]> * remove unused commentss Signed-off-by: Alex Cui <[email protected]> * removing unsed comments Signed-off-by: Alex Cui <[email protected]> * removing unsed comments Signed-off-by: Alex Cui <[email protected]> * removing comments Signed-off-by: 
Alex Cui <[email protected]> * Delete tools/text_processing_deployment/zh directory Removing far files. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * updates according to the github comments Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing comments Signed-off-by: Alex Cui <[email protected]> * punct grammar Signed-off-by: Alex Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_cases_cardinal.txt Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update Dockerfile Copied from main branch ( which included Anand's updates) Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update launch.sh Found differences in the file. Fixing it back. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update test_word.py Saw word ITN being commented out. Adding it back. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update money.py Found cardinal grammar not accepting suffix. Fixed it. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update Jenkinsfile Removed duplicated zh test from line 230s Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update utils.py Addressing bug raised in bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update graph_utils.py Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162. 
Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update measure.py Fixing code style, removing unused imports Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update word.py Fixing code style, removing unused imports Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update measure.py Removing unused import. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update post_processing.py Removing unused imports Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update post_processing.py Removing unused import Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update word.py Removing unused imports Signed-off-by: Buyuan(Alex) Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update cardinal.py Deleting unused graph Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update word.py Removing import pynini Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update word.py removing pynini import Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update verbalize.py removing pynutil import Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update post_processing.py removing punct graph imported Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update test_sparrowhawk_normalization.sh Update on test issue for Docker file locations Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update test_ordinal.py Fixing style. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py Removing because it's not one of the semiotic classes. Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py Removing because it's not one of the semiotic classes. 
Signed-off-by: Buyuan(Alex) Cui <[email protected]> * Update Jenkinsfile Updating Jenkins date Sign…
1 parent 0f67969 commit 8a05b51

65 files changed: 1006 additions and 1622 deletions
Jenkinsfile

Lines changed: 24 additions & 12 deletions
@@ -22,7 +22,7 @@ pipeline {
     RU_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
     VI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
     SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
-    ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-27-23-0'
+    ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/04-30-24-0'
     IT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-26-23-0'
     HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
     MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
@@ -189,7 +189,7 @@ pipeline {
       }
     }
 
-    stage('L0: Create RU TN/ITN Grammars & SV & PT & ZH') {
+    stage('L0: Create RU TN/ITN Grammars & SV & PT') {
       when {
         anyOf {
           branch 'main'
@@ -228,16 +228,6 @@ pipeline {
             sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=pt --text="dez " --cache_dir ${PT_TN_CACHE}'
           }
         }
-        stage('L0: ZH TN grammars') {
-          steps {
-            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=zh --text="你" --cache_dir ${ZH_TN_CACHE}'
-          }
-        }
-        stage('L0: ZH ITN grammars') {
-          steps {
-            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=zh --text="二零零二年一月二十八日 " --cache_dir ${ZH_TN_CACHE}'
-          }
-        }
       }
     }
 
@@ -267,9 +257,31 @@ pipeline {
         }
       }
     }
+    stage('L0: Create ZH TN/ITN Grammar') {
+      when {
+        anyOf {
+          branch 'main'
+          changeRequest target: 'main'
+        }
+      }
+      failFast true
+      parallel {
+        stage('L0: ZH ITN grammars') {
+          steps {
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=zh --text="你" --cache_dir ${ZH_TN_CACHE}'
+          }
+        }
+        stage('L0: ZH TN grammars') {
+          steps {
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=zh --text="6" --cache_dir ${ZH_TN_CACHE}'
+          }
+        }
+      }
+    }
 
 
     // L1 Tests starts here
+
     stage('L1: TN/ITN Tests CPU') {
       when {
         anyOf {
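To try the two ZH grammar-build stages outside Jenkins, the shell commands from the new `L0: Create ZH TN/ITN Grammar` stage can be assembled in Python. The following is a minimal sketch, not part of the commit: `grammar_build_command` and the `/tmp/zh_grammars` cache path are hypothetical, while the script paths, flags, and sample inputs are copied from the hunk above.

```python
import shlex

# Script paths as they appear in the Jenkinsfile hunk above.
SCRIPTS = {
    "tn": "nemo_text_processing/text_normalization/normalize.py",
    "itn": "nemo_text_processing/inverse_text_normalization/inverse_normalize.py",
}


def grammar_build_command(direction, text, cache_dir, lang="zh"):
    """Return the argv list for one grammar-build stage (hypothetical helper)."""
    return [
        "python",
        SCRIPTS[direction],
        f"--lang={lang}",
        f"--text={text}",
        "--cache_dir",
        cache_dir,
    ]


# Assumed local cache directory; the CI job uses ${ZH_TN_CACHE} instead.
cache = "/tmp/zh_grammars"
for direction, text in (("itn", "你"), ("tn", "6")):
    # Print a copy-pasteable shell line for each stage.
    print(shlex.join(grammar_build_command(direction, text, cache)))
```

Passing the list to `subprocess.run` with `env={**os.environ, "CUDA_VISIBLE_DEVICES": ""}` would mirror the CPU-only setting used in CI, provided the command is run from the repository root.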

nemo_text_processing/inverse_text_normalization/zh/graph_utils.py

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@
 from pynini.export import export
 from pynini.lib import byte, pynutil, utf8
 
+from nemo_text_processing.inverse_text_normalization.zh.utils import load_labels
+
 NEMO_CHAR = utf8.VALID_UTF8_CHAR
 NEMO_DIGIT = byte.DIGIT
 NEMO_HEX = pynini.union(*string.hexdigits).optimize()

nemo_text_processing/inverse_text_normalization/zh/utils.py

Lines changed: 14 additions & 0 deletions
@@ -60,3 +60,17 @@ def get_various_formats(text: str) -> List[str]:
         result.append(t.upper())
         result.append(t.capitalize())
     return result
+
+
+def load_labels(abs_path):
+    """
+    loads relative path file as dictionary
+
+    Args:
+        abs_path: absolute path
+
+    Returns dictionary of mappings
+    """
+    with open(abs_path, encoding="utf-8") as label_tsv:
+        labels = list(csv.reader(label_tsv, delimiter="\t"))
+    return labels
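The behaviour of the added `load_labels` can be checked in isolation. The sketch below copies the function body from the hunk above and assumes `csv` is imported at the top of `utils.py`, which the hunk does not show; `write_demo_tsv` and the sample rows are hypothetical. Note that, as written, the function returns a list of rows (each a list of column strings) rather than the dictionary its docstring mentions.

```python
import csv
import os
import tempfile


def load_labels(abs_path):
    """Read a tab-separated file and return its rows as a list of lists."""
    with open(abs_path, encoding="utf-8") as label_tsv:
        labels = list(csv.reader(label_tsv, delimiter="\t"))
    return labels


def write_demo_tsv(rows):
    """Hypothetical helper: write rows to a temporary TSV and return its path."""
    fd, path = tempfile.mkstemp(suffix=".tsv")
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
    return path


# Sample rows modelled on entries in units_en.tsv.
demo_path = write_demo_tsv([["kw", "千瓦"], ["ng", "纳克"]])
labels = load_labels(demo_path)
os.remove(demo_path)
print(labels)
```

A caller that wants a mapping can build one with `dict(labels)` when every row has exactly two columns; rows with a single column, such as a unit with no expansion, would need separate handling.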

nemo_text_processing/text_normalization/zh/data/char/punctuations_zh.tsv

Lines changed: 2 additions & 0 deletions
@@ -70,3 +70,5 @@
 
 
 
+<
+>

nemo_text_processing/text_normalization/zh/data/math/symbol.tsv

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@
 ×
 ÷
 °
+-

nemo_text_processing/text_normalization/zh/data/measure/units_en.tsv

Lines changed: 0 additions & 25 deletions
@@ -1,7 +1,5 @@
 amu 原子质量
 bar
-°
-º
 °c 摄氏度
 °C 摄氏度
 ºc 摄氏度
@@ -40,23 +38,6 @@ kw 千瓦
 kW 千瓦
 lb
 lbs
-m2 平方米
-平方米
-m3 立方米
-立方米
-mbps 兆比特每秒
-mg 毫克
-mhz 兆赫兹
-mi2 平方英里
-mi² 平方英里
-mi 英里
-min 分钟哦
-ml 毫升
-mm2 平方毫米
-mm² 平方毫米
-mol 摩尔
-mpa 兆帕
-mph 英里每小时
 ng 纳克
 nm 纳米
 ns 纳秒
@@ -80,13 +61,7 @@ gb 吉字节
 gpa 吉帕斯卡
 gy 戈瑞
 ha 公顷
-m
-mm 毫米
-ms 毫秒
-mv 毫伏
-mw 毫瓦
 pg 皮克
 ps 皮秒
 s
-ms 毫秒
 g
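The last hunk above removes `ms 毫秒` twice, which means the old file listed that unit on two separate lines. A quick check for repeated keys in a label file might look like the sketch below. It is not part of the commit: `duplicate_keys` is a hypothetical helper, and a tab separator is assumed because the file is a `.tsv`.

```python
from collections import Counter


def duplicate_keys(tsv_lines):
    """Return the first-column keys that occur on more than one line, sorted."""
    keys = [line.split("\t")[0] for line in tsv_lines if line.strip()]
    return sorted(key for key, count in Counter(keys).items() if count > 1)


# Rows modelled on the removed lines above; tabs are assumed as the separator.
sample = ["mm\t毫米", "ms\t毫秒", "mv\t毫伏", "mw\t毫瓦", "ms\t毫秒"]
print(duplicate_keys(sample))
```

Running such a check over each data file before building grammars is one way to catch repeated entries early; whether duplicates are harmful depends on how the rows are consumed downstream.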

nemo_text_processing/text_normalization/zh/data/measure/units_zh.tsv

Lines changed: 0 additions & 211 deletions
This file was deleted.

nemo_text_processing/text_normalization/zh/data/money/currency_major.tsv

Lines changed: 0 additions & 1 deletion
@@ -168,7 +168,6 @@ Ft 匈牙利福林
 以色列谢克尔
 J$ 牙买加元
 лв 哈萨克斯坦腾格
-朝鲜园
 лв 吉尔吉斯斯坦索姆
 老挝基普
 ден 马其顿代纳尔
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+1
+2
+3
+4
+5
+6
+7
+8
+9
