
Commit edd2288

mgrafu, folivoramanh, pre-commit-ci[bot], and anand-nv authored
Staging vi tn to main (#338)
* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
* Add Vietnamese text normalization for cardinal semiotic class
Signed-off-by: folivoramanh <[email protected]>
* Add missing init file
Signed-off-by: folivoramanh <[email protected]>
* Fix Cardinal and optimize logic
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Ordinal and Decimal for Vietnamese TN (#290)
* Add Vietnamese text normalization for ordinal and decimal semiotic classes
Signed-off-by: folivoramanh <[email protected]>
* update sparrowhawk
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor decimal code and docstring
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Vietnamese TN - Fraction (#296)
* Fraction class for Vietnamese TN
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove irrelevant test case
Signed-off-by: folivoramanh <[email protected]>
* Remove irrelevant test case
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Date Semiotic Class for Vietnamese TN (#298)
* Date for Vietnamese TN
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add roman support and correct copyright header
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change header to current year
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change header time
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Time - semiotic class for Vietnamese TN (#302)
* Time - semiotic class for Vietnamese TN
Signed-off-by: folivoramanh <[email protected]>
* remove irrelevant import and comment
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add comment and refactor pattern
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
Signed-off-by: folivoramanh <[email protected]>
* Change the spaces to NEMO_SPACE for maintenance.
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
- remove quote
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)
* Add Vietnamese TN support for Money and Range semiotic classes
- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* modify illogical test cases
Signed-off-by: folivoramanh <[email protected]>
* refactor and simplify word and punctuation to avoid hardcoding
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor code money range
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Vietnamese measure text normalization support (#307)
* Add Vietnamese measure text normalization support
- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: folivoramanh <[email protected]>
* add test case for range measure
Signed-off-by: folivoramanh <[email protected]>
* additional support for cardinal and remove duplicate test case
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor cardinal and add test cases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove duplicate lines in run_eval file
Signed-off-by: folivoramanh <[email protected]>
* refactor minor code
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add measure support for unit per unit cases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Vietnamese MRC 1.0 fix case (#312)
* fix and add cases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Fix Jenkinsfile for CI (#325) (#327)
* Fix Jenkinsfile for CI
* Fix requirements for test
* Update paths and docker
* Fix docker name
* Fix click version
* Change path of grammars for sparrowhawk tests
* Update paths in sh_test.sh
* Update paths
* Revert paths
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Fix word range (#334)
* fix range and quote
Signed-off-by: folivoramanh <[email protected]>
* fix quote in post process
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix quote and range
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Date time itn (#333)
* improve numeric semiotic classes
Signed-off-by: folivoramanh <[email protected]>

* Fix Jenkinsfile for CI (#325)
* Fix Jenkinsfile for CI
Signed-off-by: Anand Joseph <[email protected]>
* Fix requirements for test
Signed-off-by: Anand Joseph <[email protected]>
* Update paths and docker
Signed-off-by: Anand Joseph <[email protected]>
* Fix docker name
Signed-off-by: Anand Joseph <[email protected]>
* Fix click version
Signed-off-by: Anand Joseph <[email protected]>
* Change path of grammars for sparrowhawk tests
Signed-off-by: Anand Joseph <[email protected]>
* Update paths in sh_test.sh
Signed-off-by: Anand Joseph <[email protected]>
* Update paths
Signed-off-by: Anand Joseph <[email protected]>
* Revert paths
Signed-off-by: Anand Joseph <[email protected]>
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: folivoramanh <[email protected]>
* revert old codes
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* revert not inherit
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* improve date time
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix pynini union instead of union operator
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* improve measure, telephone, electronic
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change union operator to pynini union
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Staging vi tn signed off (#339)

* Fix Jenkinsfile for CI (#325)
* Fix Jenkinsfile for CI
Signed-off-by: Anand Joseph <[email protected]>
* Fix requirements for test
Signed-off-by: Anand Joseph <[email protected]>
* Update paths and docker
Signed-off-by: Anand Joseph <[email protected]>
* Fix docker name
Signed-off-by: Anand Joseph <[email protected]>
* Fix click version
Signed-off-by: Anand Joseph <[email protected]>
* Change path of grammars for sparrowhawk tests
Signed-off-by: Anand Joseph <[email protected]>
* Update paths in sh_test.sh
Signed-off-by: Anand Joseph <[email protected]>
* Update paths
Signed-off-by: Anand Joseph <[email protected]>
* Revert paths
Signed-off-by: Anand Joseph <[email protected]>
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
* Add Vietnamese text normalization for cardinal semiotic class
Signed-off-by: folivoramanh <[email protected]>
* Add missing init file
Signed-off-by: folivoramanh <[email protected]>
* Fix Cardinal and optimize logic
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Ordinal and Decimal for Vietnamese TN (#290)
* Add Vietnamese text normalization for ordinal and decimal semiotic classes
Signed-off-by: folivoramanh <[email protected]>
* update sparrowhawk
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor decimal code and docstring
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Vietnamese TN - Fraction (#296)
* Fraction class for Vietnamese TN
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove irrelevant test case
Signed-off-by: folivoramanh <[email protected]>
* Remove irrelevant test case
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Date Semiotic Class for Vietnamese TN (#298)
* Date for Vietnamese TN
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add roman support and correct copyright header
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change header to current year
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change header time
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Time - semiotic class for Vietnamese TN (#302)
* Time - semiotic class for Vietnamese TN
Signed-off-by: folivoramanh <[email protected]>
* remove irrelevant import and comment
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add comment and refactor pattern
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
Signed-off-by: folivoramanh <[email protected]>
* Change the spaces to NEMO_SPACE for maintenance.
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
- remove quote
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)
* Add Vietnamese TN support for Money and Range semiotic classes
- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* modify illogical test cases
Signed-off-by: folivoramanh <[email protected]>
* refactor and simplify word and punctuation to avoid hardcoding
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor code money range
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Add Vietnamese measure text normalization support (#307)
* Add Vietnamese measure text normalization support
- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: folivoramanh <[email protected]>
* add test case for range measure
Signed-off-by: folivoramanh <[email protected]>
* additional support for cardinal and remove duplicate test case
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor cardinal and add test cases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove duplicate lines in run_eval file
Signed-off-by: folivoramanh <[email protected]>
* refactor minor code
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add measure support for unit per unit cases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Vietnamese MRC 1.0 fix case (#312)
* fix and add cases
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Fix word range (#334)
* fix range and quote
Signed-off-by: folivoramanh <[email protected]>
* fix quote in post process
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix quote and range
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>

* Date time itn (#333)
* improve numeric semiotic classes
Signed-off-by: folivoramanh <[email protected]>

* Fix Jenkinsfile for CI (#325)
* Fix Jenkinsfile for CI
Signed-off-by: Anand Joseph <[email protected]>
* Fix requirements for test
Signed-off-by: Anand Joseph <[email protected]>
* Update paths and docker
Signed-off-by: Anand Joseph <[email protected]>
* Fix docker name
Signed-off-by: Anand Joseph <[email protected]>
* Fix click version
Signed-off-by: Anand Joseph <[email protected]>
* Change path of grammars for sparrowhawk tests
Signed-off-by: Anand Joseph <[email protected]>
* Update paths in sh_test.sh
Signed-off-by: Anand Joseph <[email protected]>
* Update paths
Signed-off-by: Anand Joseph <[email protected]>
* Revert paths
Signed-off-by: Anand Joseph <[email protected]>
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: folivoramanh <[email protected]>
* revert old codes
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* revert not inherit
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* improve date time
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix pynini union instead of union operator
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* improve measure, telephone, electronic
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change union operator to pynini union
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <[email protected]>
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Comma bugfix for En electronics (#332)
* fix bug with commas and electronics
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* update jenkins
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
---------
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* remove unused import (#340)
Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Update Jenkinsfile (#341)
Only mount TestData from path
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] pre-commit suggestions (#335)
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](https://github.com/PyCQA/flake8/compare/7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](https://github.com/PyCQA/isort/compare/6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](https://github.com/psf/black-pre-commit-mirror/compare/25.1.0...25.9.0)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkins cache
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* fill missing lang in arg run (#347)
Signed-off-by: folivoramanh <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Staging vi tn DCO fixed (#354)

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
* Add Vietnamese text normalization for cardinal semiotic class
Signed-off-by: folivoramanh <[email protected]>
* Add missing init file
Signed-off-by: folivoramanh <[email protected]>
* Fix Cardinal and optimize logic
Signed-off-by: folivoramanh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Ordinal and Decimal for Vietnamese TN (#290)
* Add Vietnamese text normalization for ordinal and decimal semiotic classes
Signed-off-by: Mai Anh <[email protected]>
* update sparrowhawk
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor decimal code and docstring
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Vietnamese TN - Fraction (#296)
* Fraction class for Vietnamese TN
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove irrelevant test case
Signed-off-by: Mai Anh <[email protected]>
* Remove irrelevant test case
Signed-off-by: Mai Anh <[email protected]>
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Date Semiotic Class for Vietnamese TN (#298)
* Date for Vietnamese TN
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add roman support and correct copyright header
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change header to current year
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change header time
Signed-off-by: Mai Anh <[email protected]>
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Time - semiotic class for Vietnamese TN (#302)
* Time - semiotic class for Vietnamese TN
Signed-off-by: Mai Anh <[email protected]>
* remove irrelevant import and comment
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add comment and refactor pattern
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
Signed-off-by: Mai Anh <[email protected]>
* Change the spaces to NEMO_SPACE for maintenance.
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
- remove quote
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)
* Add Vietnamese TN support for Money and Range semiotic classes
- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* modify illogical test cases
Signed-off-by: Mai Anh <[email protected]>
* refactor and simplify word and punctuation to avoid hardcoding
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor code money range
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Add Vietnamese measure text normalization support (#307)
* Add Vietnamese measure text normalization support
- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Mai Anh <[email protected]>
* add test case for range measure
Signed-off-by: Mai Anh <[email protected]>
* additional support for cardinal and remove duplicate test case
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor cardinal and add test cases
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove duplicate lines in run_eval file
Signed-off-by: Mai Anh <[email protected]>
* refactor minor code
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add measure support for unit per unit cases
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Vietnamese MRC 1.0 fix case (#312)
* fix and add cases
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Fix Jenkinsfile for CI (#325) (#327)
* Fix Jenkinsfile for CI
* Fix requirements for test
* Update paths and docker
* Fix docker name
* Fix click version
* Change path of grammars for sparrowhawk tests
* Update paths in sh_test.sh
* Update paths
* Revert paths
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Fix word range (#334)
* fix range and quote
Signed-off-by: Mai Anh <[email protected]>
* fix quote in post process
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix quote and range
Signed-off-by: Mai Anh <[email protected]>
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Date time itn (#333)
* improve numeric semiotic classes
Signed-off-by: Mai Anh <[email protected]>

* Fix Jenkinsfile for CI (#325)
* Fix Jenkinsfile for CI
Signed-off-by: Anand Joseph <[email protected]>
* Fix requirements for test
Signed-off-by: Anand Joseph <[email protected]>
* Update paths and docker
Signed-off-by: Anand Joseph <[email protected]>
* Fix docker name
Signed-off-by: Anand Joseph <[email protected]>
* Fix click version
Signed-off-by: Anand Joseph <[email protected]>
* Change path of grammars for sparrowhawk tests
Signed-off-by: Anand Joseph <[email protected]>
* Update paths in sh_test.sh
Signed-off-by: Anand Joseph <[email protected]>
* Update paths
Signed-off-by: Anand Joseph <[email protected]>
* Revert paths
Signed-off-by: Anand Joseph <[email protected]>
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Mai Anh <[email protected]>
* revert old codes
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* revert not inherit
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* improve date time
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix pynini union instead of union operator
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* improve measure, telephone, electronic
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change union operator to pynini union
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Co-authored-by: anand-nv <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* Staging vi tn signed off (#339)

* Fix Jenkinsfile for CI (#325)
* Fix Jenkinsfile for CI
Signed-off-by: Anand Joseph <[email protected]>
* Fix requirements for test
Signed-off-by: Anand Joseph <[email protected]>
* Update paths and docker
Signed-off-by: Anand Joseph <[email protected]>
* Fix docker name
Signed-off-by: Anand Joseph <[email protected]>
* Fix click version
Signed-off-by: Anand Joseph <[email protected]>
* Change path of grammars for sparrowhawk tests
Signed-off-by: Anand Joseph <[email protected]>
* Update paths in sh_test.sh
Signed-off-by: Anand Joseph <[email protected]>
* Update paths
Signed-off-by: Anand Joseph <[email protected]>
* Revert paths
Signed-off-by: Anand Joseph <[email protected]>
---------
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Mai Anh <[email protected]>

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
* Add Vietnamese text normalization for cardinal semiotic class
Signed-off-by: Mai Anh <[email protected]>
* Add missing init file
Signed-off-by: Mai Anh <[email protected]>
* Fix Cardinal and optimize logic
Signed-off-by: Mai Anh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mai Anh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <[email protected]>

* Ordinal and Decimal for Vietnamese
TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: Mai Anh <[email protected]> * update sparrowhawk Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: Mai Anh <[email protected]> * Remove irrelavant test case Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: Mai Anh <[email protected]> * remove irrelevant import and comment Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <[email protected]> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * modify illogical test cases Signed-off-by: Mai Anh <[email protected]> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <[email protected]> * add test case for range measure Signed-off-by: Mai Anh <[email protected]> * additional support for cardinal and remove duplicate test case Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: Mai Anh <[email protected]> * refractor minor code Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure 
support for unit per unit cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Fix word range (#334) * fix range and quote Signed-off-by: Mai Anh <[email protected]> * fix quote in post process Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: Mai Anh <[email protected]> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <[email protected]> * Fix requirements for test Signed-off-by: Anand Joseph <[email protected]> * Update paths and docker Signed-off-by: Anand Joseph <[email protected]> * Fix docker name Signed-off-by: Anand Joseph <[email protected]> * Fix click version Signed-off-by: Anand Joseph <[email protected]> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <[email protected]> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <[email protected]> * Update paths Signed-off-by: Anand Joseph <[email protected]> * 
Revert paths Signed-off-by: Anand Joseph <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <[email protected]> * revert old codes Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: Mai Anh <[email protected]> Co-authored-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Comma 
bugfix for En electronics (#332) * fix bug with commas and electronics Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update jenkins Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> --------- Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * remove unuse import (#340) Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * Update Jenkinsfile (#341) Only mount TestData from path Signed-off-by: anand-nv <[email protected]> Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] pre-commit suggestions (#335) updates: - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0) - [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](https://github.com/PyCQA/flake8/compare/7.2.0...7.3.0) - [github.com/PyCQA/isort: 6.0.1 → 6.1.0](https://github.com/PyCQA/isort/compare/6.0.1...6.1.0) - https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror - [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](https://github.com/psf/black-pre-commit-mirror/compare/25.1.0...25.9.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <[email protected]> * update jenkins cache Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> * fill missing lang in arg run (#347) Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: folivoramanh <[email 
protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * update vi cache date Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * Refactor Vietnamese (#357) * Refactor Vietnamese ITN taggers: modularize date, add data files, improve naming - Modularize date.py year components for better readability - Add weights to prevent non-deterministic behavior in insert operations - Remove redundant YEAR_WEIGHT constant (use inline weights) - Create zero_prefix.tsv and digit_special.tsv data files - Rename delete_extra_space to delete_single_space in electronic.py for clarity - Add delete_single_space to graph_utils for reuse Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Vietnamese: PSA follow Signed-off-by: Mai Anh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * delete unuse import (#358) Signed-off-by: Mai Anh <[email protected]> --------- Signed-off-by: folivoramanh <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Anand Joseph <[email protected]> Signed-off-by: anand-nv <[email protected]> Signed-off-by: Mai Anh <[email protected]> Signed-off-by: Mariana <[email protected]> Co-authored-by: Mai Anh <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <[email protected]> Co-authored-by: Mai Anh <[email protected]>
1 parent fcebf16 commit edd2288


128 files changed (+5263 / -454 lines)


Jenkinsfile

Lines changed: 7 additions & 2 deletions
@@ -19,7 +19,7 @@ pipeline {
     HU_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/07-16-24-0'
     PT_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
     RU_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
-    VI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
+    VI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-29-25-0'
     SV_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
     ZH_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/11-13-24-0'
     IT_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/08-22-24-0'
@@ -170,7 +170,7 @@ pipeline {
       }
     }
 
-    stage('L0: Create FR TN/ITN & VI ITN & HU TN & IT TN') {
+    stage('L0: Create FR TN/ITN & VI TN/ITN & HU TN & IT TN') {
       when {
         anyOf {
           branch 'main'
@@ -196,6 +196,11 @@ pipeline {
             sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=vi --text="một ngàn " --cache_dir ${VI_TN_CACHE}'
           }
         }
+        stage('L0: VI TN grammars') {
+          steps {
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=vi --text="100" --cache_dir ${VI_TN_CACHE}'
+          }
+        }
         stage('L0: HU TN grammars') {
           steps {
             sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=hu --text="100" --cache_dir ${HU_TN_CACHE}'
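To reproduce the new `L0: VI TN grammars` smoke test outside Jenkins, its `sh` step can be mirrored from Python. This is a sketch, not code from the repository: `smoke_test_command` and `smoke_test_env` are hypothetical helpers, and the only command-line flags they use are the ones visible in the step (`--lang`, `--text`, `--cache_dir`).

```python
import os
from typing import Dict, List

# Script path taken verbatim from the Jenkins step (relative to the repository root).
NORMALIZE_SCRIPT = "nemo_text_processing/text_normalization/normalize.py"


def smoke_test_command(lang: str, text: str, cache_dir: str) -> List[str]:
    """Build the argument list that the Jenkins `sh` step passes to normalize.py.

    The shell strips the quotes in --text="100", so the script receives --text=100.
    """
    return [
        "python",
        NORMALIZE_SCRIPT,
        f"--lang={lang}",
        f"--text={text}",
        "--cache_dir",
        cache_dir,
    ]


def smoke_test_env() -> Dict[str, str]:
    """Copy the current environment and hide all GPUs, like CUDA_VISIBLE_DEVICES="" in the step."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

From a repository checkout, `subprocess.run(smoke_test_command("vi", "100", cache_dir), env=smoke_test_env(), check=True)` should run the same check, with `cache_dir` pointing at a local directory of your choosing; the `/home/jenkins/TestData/...` cache paths appear to be specific to the CI host.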

nemo_text_processing/inverse_text_normalization/vi/data/currency.tsv

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ $ đô la mỹ
 won
 uôn
 RM ringgit
-đồng
+£ bảng anh

nemo_text_processing/inverse_text_normalization/vi/data/electronic/symbols.tsv

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 - gạch
 _ gạch dưới
 _ shift gạch
+_ shift trừ
 _ síp gạch
 ! chấm than
 # thăng
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+. chấm
+- gạch
+- gạch ngang
+_ gạch dưới
+_ shift gạch
+_ shift trừ
+_ síp gạch
+/ sẹc
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+mốt 1
+4
+lăm 5
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+linh 0
+lẻ 0
+không 0
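The data files in this commit are small tab-separated tables; the grammars compile such tables with `pynini.string_file` (as `cardinal.py` in this commit does for `data/numbers/*.tsv`), mapping the first column to the second. As a rough pure-Python picture of that mapping, the sketch below loads a table into a dict of alternatives. It assumes tab separators (the page rendering shows spaces), treats one-column lines as identity entries, and `load_tsv_mapping` / `invert_mapping` are hypothetical helpers, not part of the codebase.

```python
from collections import defaultdict
from typing import Dict, List


def load_tsv_mapping(text: str) -> Dict[str, List[str]]:
    """Map each first-column string to every second-column string listed for it.

    Repeated keys (e.g. "-" -> "gạch" and "-" -> "gạch ngang") become alternatives,
    much as they would become parallel paths in the compiled FST.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) == 1:
            mapping[fields[0]].append(fields[0])  # one column: identity entry
        else:
            mapping[fields[0]].append(fields[1])
    return dict(mapping)


def invert_mapping(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Swap direction, e.g. digit -> spoken forms instead of spoken form -> digit."""
    inverted: Dict[str, List[str]] = defaultdict(list)
    for src, outputs in mapping.items():
        for dst in outputs:
            inverted[dst].append(src)
    return dict(inverted)


# Contents of the three-line linh/lẻ/không table and part of the new symbol table above.
zero_words = load_tsv_mapping("linh\t0\nlẻ\t0\nkhông\t0\n")
symbols = load_tsv_mapping("-\tgạch\n-\tgạch ngang\n/\tsẹc\n")
```

Keeping every alternative, rather than letting later lines overwrite earlier ones, matches how several spoken forms (`linh`, `lẻ`, `không`) can all denote the same digit.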

nemo_text_processing/inverse_text_normalization/vi/graph_utils.py

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,7 @@
 NEMO_SPACE = " "
 NEMO_WHITE_SPACE = pynini.union(" ", "\t", "\n", "\r", "\u00a0").optimize()
 NEMO_NOT_SPACE = pynini.difference(NEMO_CHAR, NEMO_WHITE_SPACE).optimize()
-NEMO_NOT_QUOTE = pynini.difference(NEMO_CHAR, r'"').optimize()
+NEMO_NOT_QUOTE = pynini.difference(NEMO_CHAR, '"').optimize()
 
 NEMO_PUNCT = pynini.union(*map(pynini.escape, string.punctuation)).optimize()
 NEMO_GRAPH = pynini.union(NEMO_ALNUM, NEMO_PUNCT).optimize()
@@ -47,6 +47,7 @@
 delete_space = pynutil.delete(pynini.closure(NEMO_WHITE_SPACE))
 insert_space = pynutil.insert(" ")
 delete_extra_space = pynini.cross(pynini.closure(NEMO_WHITE_SPACE, 1), " ")
+delete_single_space = pynutil.delete(NEMO_SPACE)
 
 # French frequently compounds numbers with hyphen.
 delete_hyphen = pynutil.delete(pynini.closure("-", 0, 1))
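`graph_utils.py` now offers three whitespace deleters with different contracts. The pure-Python analogue below is not pynini, and it is greedy where the FSTs are nondeterministic, but it shows what each one consumes and emits at a given position: `delete_space` takes zero or more whitespace characters and emits nothing, `delete_extra_space` requires at least one and emits a single space, and the new `delete_single_space` consumes exactly one `NEMO_SPACE`. The `(emitted, remaining_input)` return convention is an assumption of this sketch. It also confirms that the `NEMO_NOT_QUOTE` edit is behavior-preserving, since a raw-string prefix only changes how backslashes are read.

```python
from typing import Optional, Tuple

# Same characters as NEMO_WHITE_SPACE in graph_utils.py.
WHITE_SPACE = " \t\n\r\u00a0"
NEMO_SPACE = " "

# (emitted text, remaining input), or None when the rule cannot match here.
Result = Optional[Tuple[str, str]]


def delete_space(s: str) -> Result:
    """Consume zero or more whitespace characters and emit nothing (never fails)."""
    i = 0
    while i < len(s) and s[i] in WHITE_SPACE:
        i += 1
    return "", s[i:]


def delete_extra_space(s: str) -> Result:
    """Consume one or more whitespace characters and emit exactly one space."""
    _, rest = delete_space(s)
    if rest == s:  # nothing consumed: at least one whitespace character is required
        return None
    return " ", rest


def delete_single_space(s: str) -> Result:
    """Consume exactly one NEMO_SPACE and emit nothing; tabs, newlines, etc. do not match."""
    if s.startswith(NEMO_SPACE):
        return "", s[len(NEMO_SPACE):]
    return None


# The NEMO_NOT_QUOTE change only drops a raw prefix; both literals are the same string.
assert r'"' == '"'
```

The stricter `delete_single_space` is useful where exactly one separator is expected, because it fails instead of silently swallowing tabs or runs of spaces.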

nemo_text_processing/inverse_text_normalization/vi/taggers/cardinal.py

Lines changed: 51 additions & 48 deletions
@@ -36,117 +36,120 @@ class CardinalFst(GraphFst):
 
     def __init__(self):
         super().__init__(name="cardinal", kind="classify")
-        graph_zero = pynini.string_file(get_abs_path("data/numbers/zero.tsv"))
-        graph_digit = pynini.string_file(get_abs_path("data/numbers/digit.tsv"))
         graph_ties = pynini.string_file(get_abs_path("data/numbers/ties.tsv"))
         graph_teen = pynini.string_file(get_abs_path("data/numbers/teen.tsv"))
 
+        thousand_words = pynini.union("ngàn", "nghìn")
+        negative_words = pynini.union("âm", "trừ")
+
+        graph_hundred = pynini.cross("trăm", "")
+        graph_ten = pynini.cross("mươi", "")
+        zero = pynini.cross(pynini.union("linh", "lẻ"), "0")
+
+        graph_zero = pynini.string_file(get_abs_path("data/numbers/zero.tsv"))
+        graph_digit = pynini.string_file(get_abs_path("data/numbers/digit.tsv"))
         graph_one = pynini.cross("mốt", "1")
         graph_four = pynini.cross("tư", "4")
         graph_five = pynini.cross("lăm", "5")
         graph_half = pynini.cross("rưỡi", "5")
-        graph_hundred = pynini.cross("trăm", "")
-        graph_ten = pynini.cross("mươi", "")
-        zero = pynini.cross(pynini.union("linh", "lẻ"), "0")
 
         optional_ten = pynini.closure(delete_space + graph_ten, 0, 1)
         last_digit_exception = pynini.project(pynini.cross("năm", "5"), "input")
-        last_digit = pynini.union(
+        self.last_digit = pynini.union(
             (pynini.project(graph_digit, "input") - last_digit_exception.arcsort()) @ graph_digit,
             graph_one,
             graph_four,
             graph_five,
         )
-
-        graph_hundred_ties_component = (graph_digit | graph_zero) + delete_space + graph_hundred
-        graph_hundred_ties_component += delete_space
-        graph_hundred_ties_component += pynini.union(
+        last_digit = self.last_digit
+        # Build hundreds component (e.g., "một trăm", "hai trăm")
+        graph_hundreds_component = (graph_digit | graph_zero) + delete_space + graph_hundred
+        graph_hundreds_component += delete_space
+        graph_hundreds_component += pynini.union(
             graph_teen,
-            (graph_half | graph_four | graph_one) + pynutil.insert("0"),
-            graph_ties + optional_ten + ((delete_space + last_digit) | pynutil.insert("0")),
-            zero + delete_space + (graph_digit | graph_four),
-            pynutil.insert("00"),
-        )
-        graph_hundred_ties_component |= (
+            (graph_half | graph_four | graph_one) + pynutil.insert("0", weight=0.1),
+            graph_ties + optional_ten + ((delete_space + last_digit) | pynutil.insert("0", weight=0.1)),
+            zero + delete_space + (graph_digit | graph_four | graph_five),
+            pynutil.insert("00", weight=0.1),
+        ).optimize()
+        graph_hundreds_component |= (
             pynutil.insert("0")
             + delete_space
             + pynini.union(
                 graph_teen,
                 graph_ties + optional_ten + delete_space + last_digit,
-                graph_ties + delete_space + graph_ten + pynutil.insert("0"),
-                zero + delete_space + (graph_digit | graph_four),
-            )
+                graph_ties + delete_space + graph_ten + pynutil.insert("0", weight=0.1),
+                zero + delete_space + (graph_digit | graph_four | graph_five),
+            ).optimize()
+        )
+        graph_hundred_component = graph_hundreds_component | (
+            pynutil.insert("00", weight=0.1) + delete_space + graph_digit
         )
-        graph_hundred_component = graph_hundred_ties_component | (pynutil.insert("00") + delete_space + graph_digit)
 
         graph_hundred_component_at_least_one_none_zero_digit = graph_hundred_component @ (
             pynini.closure(NEMO_DIGIT) + (NEMO_DIGIT - "0") + pynini.closure(NEMO_DIGIT)
         )
         self.graph_hundred_component_at_least_one_none_zero_digit = (
-            graph_hundred_component_at_least_one_none_zero_digit
+            graph_hundred_component_at_least_one_none_zero_digit.optimize()
         )
-        graph_hundred_ties_zero = graph_hundred_ties_component | pynutil.insert("000")
+        graph_hundreds_zero = graph_hundreds_component | pynutil.insert("000", weight=0.1)
 
         graph_thousands = pynini.union(
-            graph_hundred_component_at_least_one_none_zero_digit
-            + delete_space
-            + pynutil.delete(pynini.union("nghìn", "ngàn")),
+            graph_hundred_component_at_least_one_none_zero_digit + delete_space + pynutil.delete(thousand_words),
             pynutil.insert("000", weight=0.1),
-        )
-
-        graph_ten_thousand = pynini.union(
-            graph_hundred_component_at_least_one_none_zero_digit + delete_space + pynutil.delete("vạn"),
-            pynutil.insert("0000", weight=0.1),
-        )
-
-        graph_ten_thousand_suffix = pynini.union(
-            graph_digit + delete_space + pynutil.delete(pynini.union("nghìn", "ngàn")),
-            pynutil.insert("0", weight=0.1),
-        )
+        ).optimize()
 
         graph_million = pynini.union(
             graph_hundred_component_at_least_one_none_zero_digit + delete_space + pynutil.delete("triệu"),
             pynutil.insert("000", weight=0.1),
-        )
+        ).optimize()
         graph_billion = pynini.union(
             graph_hundred_component_at_least_one_none_zero_digit
             + delete_space
             + pynutil.delete(pynini.union("tỉ", "tỷ")),
             pynutil.insert("000", weight=0.1),
-        )
+        ).optimize()
 
+        # Main graph combining all magnitude levels
         graph = pynini.union(
+            # Full format: billion + million + thousand + hundred
             graph_billion
             + delete_space
             + graph_million
             + delete_space
             + graph_thousands
             + delete_space
-            + graph_hundred_ties_zero,
-            graph_ten_thousand + delete_space + graph_ten_thousand_suffix + delete_space + graph_hundred_ties_zero,
+            + graph_hundreds_zero,
+            # Special thousand format with last digit or "rưỡi" (half)
             graph_hundred_component_at_least_one_none_zero_digit
             + delete_space
-            + pynutil.delete(pynini.union("nghìn", "ngàn"))
+            + pynutil.delete(thousand_words)
             + delete_space
-            + (((last_digit | graph_half) + pynutil.insert("00")) | graph_hundred_ties_zero),
+            + pynini.union(
+                pynini.union(last_digit, graph_half) + pynutil.insert("00", weight=0.1), graph_hundreds_zero
+            ),
+            # Single digits (for non-exception cases)
             graph_digit,
             graph_zero,
         )
 
-        graph = graph @ pynini.union(
-            pynutil.delete(pynini.closure("0")) + pynini.difference(NEMO_DIGIT, "0") + pynini.closure(NEMO_DIGIT),
-            "0",
+        graph = (
+            graph
+            @ pynini.union(
+                pynutil.delete(pynini.closure("0")) + pynini.difference(NEMO_DIGIT, "0") + pynini.closure(NEMO_DIGIT),
+                "0",
+            ).optimize()
         )
 
         # don't convert cardinals from zero to nine inclusive
-        graph_exception = pynini.project(pynini.union(graph_digit, graph_zero), "input")
+        single_digits = pynini.project(pynini.union(graph_digit, graph_zero), "input").optimize()
 
         self.graph_no_exception = graph
 
-        self.graph = (pynini.project(graph, "input") - graph_exception.arcsort()) @ graph
+        self.graph = pynini.difference(pynini.project(graph, "input"), single_digits) @ graph
 
         optional_minus_graph = pynini.closure(
-            pynutil.insert("negative: ") + pynini.cross(pynini.union("âm", "trừ"), '"-"') + NEMO_SPACE,
+            pynutil.insert("negative: ") + pynini.cross(negative_words, '"-"') + NEMO_SPACE,
             0,
             1,
         )
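The refactored `CardinalFst` is easier to follow with a concrete reading of its hundreds component in mind. Below is a deterministic pure-Python sketch of those rules for numbers under 1000; it is not a port of the grammar. `DIGITS`, `TIES`, and the simplified teen handling are a hypothetical mini-lexicon standing in for `digit.tsv`, `ties.tsv`, and `teen.tsv`; the thousand/million/billion levels are omitted, as are the `weight=0.1` penalties on zero-padding insertions (which the commit message says prevent non-deterministic behavior); and where the FST keeps every path, the sketch just picks one. It mirrors these details from the diff: `mốt`/`tư`/`rưỡi` right after `trăm` give 10/40/50; `linh`/`lẻ` mark an empty tens place (now also before `lăm`); `năm` is rejected as a final digit after a tens word (`last_digit_exception`); and single-digit words are excluded from `self.graph`. `strip_leading_zeros` reproduces the composition that removes the zero padding.

```python
from typing import List, Optional

# Hypothetical mini-lexicon standing in for digit.tsv / ties.tsv / teen.tsv.
DIGITS = {"một": 1, "hai": 2, "ba": 3, "bốn": 4, "năm": 5, "sáu": 6, "bảy": 7, "tám": 8, "chín": 9}
TIES = {w: v for w, v in DIGITS.items() if v >= 2}  # "hai mươi" = 20 ... "chín mươi" = 90
VARIANTS = {"mốt": 1, "tư": 4, "lăm": 5}  # graph_one, graph_four, graph_five
SHORT_TENS = {"mốt": 1, "tư": 4, "rưỡi": 5}  # right after "trăm": 10, 40, 50
ZERO_WORDS = {"linh", "lẻ"}  # `zero` in the diff
SINGLE_DIGIT_WORDS = set(DIGITS) | {"không"}


def last_digit(word: str) -> Optional[int]:
    """Unit after a tens word; 'năm' is rejected as in last_digit_exception."""
    if word == "năm":
        return None
    return VARIANTS.get(word, DIGITS.get(word))


def tail_value(words: List[str], after_hundred: bool) -> Optional[int]:
    """Value of the words after 'X trăm', or of a whole number below 100."""
    if not words:
        return 0 if after_hundred else None
    head, rest = words[0], words[1:]
    if not after_hundred and not rest and head in DIGITS:
        return DIGITS[head]  # bare digit word ("00" + digit in the grammar)
    if after_hundred and not rest and head in SHORT_TENS:
        return SHORT_TENS[head] * 10  # trăm mốt / tư / rưỡi
    if head in ZERO_WORDS:  # linh/lẻ + digit, tư or lăm (not mốt)
        if len(rest) == 1 and rest[0] != "mốt":
            return VARIANTS.get(rest[0], DIGITS.get(rest[0]))
        return None
    if head == "mười":  # simplified teen handling
        if not rest:
            return 10
        if len(rest) == 1 and rest[0] != "mốt":
            unit = last_digit(rest[0])
            return None if unit is None else 10 + unit
        return None
    if head in TIES:
        has_muoi = bool(rest) and rest[0] == "mươi"  # optional_ten
        if has_muoi:
            rest = rest[1:]
        if len(rest) == 1:
            unit = last_digit(rest[0])
            return None if unit is None else TIES[head] * 10 + unit
        if not rest and (has_muoi or after_hundred):
            return TIES[head] * 10  # "hai mươi" = 20, "hai trăm hai" = 220
    return None


def read_below_1000(phrase: str) -> Optional[int]:
    words = phrase.split()
    # The grammar also accepts a zero word before "trăm"; omitted here.
    if len(words) >= 2 and words[1] == "trăm" and words[0] in DIGITS:
        tail = tail_value(words[2:], after_hundred=True)
        return None if tail is None else DIGITS[words[0]] * 100 + tail
    return tail_value(words, after_hundred=False)


def strip_leading_zeros(digits: str) -> Optional[str]:
    """Mirror of: delete("0"*) + [1-9] + [0-9]*  |  "0"  (other all-zero strings fail)."""
    if digits == "0":
        return "0"
    if not digits or any(c not in "0123456789" for c in digits):
        return None
    return digits.lstrip("0") or None


def to_cardinal(phrase: str) -> Optional[str]:
    """Like self.graph: single-digit words stay as words (None); others become digits."""
    if phrase in SINGLE_DIGIT_WORDS:
        return None
    value = read_below_1000(phrase)
    return None if value is None else str(value)
```

For instance, `read_below_1000("một trăm linh năm")` gives 105 and `read_below_1000("hai mươi năm")` is rejected, which is the point of `last_digit_exception`: "hai mươi năm" more naturally reads as "twenty years" than as 25.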
