Releases: PyThaiNLP/pythainlp
PyThaiNLP v5.1.0-beta1
Schedule
- First Beta release: 27 December 2024
- Production release: WIP
PyThaiNLP 5.1 Change Log #900
What's Changed
- Add Thai Universal Dependency Treebank postag by @wannaphong in #916
- Add Thai Discourse Treebank postag by @wannaphong in #910
- Update tone_detector() API description by @bact in #919
- Add save and load for pythainlp.classify.param_free.GzipModel by @wannaphong in #908
- Add Thai G2P v2 Grapheme-to-Phoneme model by @wannaphong in #923
- Bump transformers from 4.36.0 to 4.38.0 by @dependabot in #907
- Add preprocess function to split whitespace before romanize by @pavaris-pm in #924
- Fix collate() to consider tonemark in ordering by @WTFPUn in #926
- test: Add more cases to cover all possible Marttra by @HRNPH in #929
- Bump github/codeql-action from 2 to 3 by @dependabot in #939
- Bump actions/setup-python from 4 to 5 by @dependabot in #940
- Bump peaceiris/actions-gh-pages from 3 to 4 by @dependabot in #937
- Bump conda-incubator/setup-miniconda from 2 to 3 by @dependabot in #936
- Bump actions/stale from 6 to 9 by @dependabot in #938
- Add support for list of strings as input to sent_tokenize() by @ayaan-qadri in #927
- Bump python-crfsuite from 0.9.9 to 0.9.11 by @dependabot in #943
- Tidy up workflow files by @bact in #946
- Upgrade Python in CI to 3.10 by @bact in #947
- Fix nltk.downloader warning by @bact in #949
- Remove unused pytest by @bact in #950
- Unify unit test workflow across OSes by @bact in #951
- Specify a limited test suite by @bact in #952
- Use common warn_deprecation by @bact in #956
- Move sent_tokenize with default crfcut to testx by @bact in #958
- Merge new sent_tokenize test to fix-954 by @bact in #959
- Move more sent_tokenize test by @bact in #960
- Move more sent_tokenize test by @bact in #961
- Fix sent_tokenize(engine="whitespace") return value to be a list of strings by @wannaphong in #957
- Fix maiyamok() expanding the wrong word by @bact in #962
- Add version to deprecation warnings by @bact in #963
- Remove tests with Sonarcloud issue by @bact in #964
- Add test_tools to test suite by @bact in #965
- Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console by @bact in #969
- Make CLI able to handle Unicode characters output on Windows console by @bact in #968
- Split test_tag and testx_tag by @bact in #970
- Add test_tag to init by @bact in #971
- Add test_corpus to init by @bact in #972
- Add test coverage by @bact in #974
- Add test_khavee to test suite by @bact in #967
- Create CHANGELOG.md by @bact in #975
- Add Compact Tests (testc) by @bact in #976
- Add testc_tools (misspell) by @bact in #977
- Fix warnings and types by @bact in #978
- Fix nlpo3.load_dict() never printing an error message on failure by @bact in #979
- Add tests.compact.transliterate (PyICU test) by @bact in #980
- Add documentation about compact install option by @bact in #981
- Bump symspellpy from 6.7.7 to 6.7.8 by @dependabot in #985
- Bump sentencepiece from 0.1.99 to 0.2.0 by @dependabot in #982
- Bump tensorflow from 2.13.1 to 2.18.0 by @dependabot in #988
- Bump bpemb from 0.3.4 to 0.3.6 by @dependabot in #989
- Add nlpo3 to compact install/test by @bact in #987
- Bump h5py from 3.1.0 to 3.12.1 by @dependabot in #991
- Use "build" instead of setup.py + add "[cd build]" build trigger word by @bact in #994
- Add Thai solar date to Thai lunar date conversion by @wannaphong in #998
- Update requests requirement from ==2.31.* to ==2.32.* by @dependabot in #1003
- Bump gensim from 4.3.2 to 4.3.3 by @dependabot in #1009
- Update numpy requirement from ==1.22.* to ==1.26.* by @dependabot in #1007
- Bump epitran from 1.9 to 1.25.1 by @dependabot in #1006
- Bump astral-sh/ruff-action from 1 to 2 by @dependabot in #1010
- Bump spacy-thai from 0.7.1 to 0.7.8 by @dependabot in #1014
- Bump fairseq from 0.10.2 to 0.12.2 by @dependabot in #1013
- Bump transformers from 4.38.0 to 4.47.0 by @dependabot in #1020
- Bump panphon from 0.20.0 to 0.21.2 by @dependabot in #1022
- Remove clause_tokenize by @wannaphong in #1024
- Update warn_deprecation to get deprecated and removal versions by @bact in #1028
- Remove unnecessary enumerate in expand_maiyamok by @bact in #1029
- Add SPDX FileType by @bact in #1032
- Bump spylls from 0.1.5 to 0.1.7 by @dependabot in #1035
- Bump emoji from 0.5.4 to 0.6.0 by @dependabot in #1036
- Bump wtpsplit from 1.0.1 to 1.3.0 by @dependabot in #1037
- Simplify calculate_f_year_f_dev() by @bact in #1031
- Bump sacremoses from 0.0.41 to 0.1.1 by @dependabot in #1034
- Bump protobuf from 3.20.3 to 5.29.1 by @dependabot in #1033
- Bump protobuf from 5.29.1 to 5.29.2 by @dependabot in #1042
- Bump ufal-chu-liu-edmonds from 1.0.2 to 1.0.3 by @dependabot in #1040
- Bump transformers from 4.47.0 to 4.47.1 by @dependabot in #1039
- Bump astral-sh/ruff-action from 2 to 3 by @dependabot in #1044
- Add Thai pangram text by @wannaphong in #1045
- Fixed #1004 by @wannaphong in #1046
- PyThaiNLP v5.1.0-beta1 by @wannaphong in #1047
New Contributors
- @WTFPUn made their first contribution in #926
- @ayaan-qadri made their first contribution in #927
Full Changelog: v5.0.5...v5.1.0-beta1
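The pythainlp.tools.safe_print entry (#969) and the Windows console fix (#968) both deal with consoles that cannot encode Thai characters. The following is a minimal sketch of that idea using only the standard library. It is not the library's implementation: the name safe_print_sketch, its stream parameter, and the choice to substitute "?" are assumptions made for illustration.

```python
import io
import sys


def safe_print_sketch(text: str, stream=None) -> None:
    """Print text, degrading to "?" for characters the stream cannot encode.

    Hypothetical helper for illustration only; not the source of
    pythainlp.tools.safe_print.
    """
    out = stream if stream is not None else sys.stdout
    try:
        print(text, file=out)
    except UnicodeEncodeError:
        # Re-encode with errors="replace" so unencodable characters become "?".
        encoding = getattr(out, "encoding", None) or "ascii"
        print(text.encode(encoding, errors="replace").decode(encoding), file=out)


# Simulate a console limited to ASCII, similar to a legacy console code page.
raw = io.BytesIO()
ascii_console = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
safe_print_sketch("ภาษาไทย ok", stream=ascii_console)
ascii_console.flush()
```

On a UTF-8 console the first print succeeds and the fallback branch is never reached, so the helper only changes behavior where a plain print would have raised.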
PyThaiNLP v5.0.5 Released!
PyThaiNLP v5.0.5 is a bug fix release of PyThaiNLP v5.0.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Add clause_tokenize warnings #1026
- Fix maiyamok() (merge back from #962)
Full Changelog: v5.0.4...v5.0.5
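This release adds warnings to clause_tokenize (#1026), and the 5.1 notes list its removal (#1024) along with deprecation warnings that carry version information (#963, #1028). The sketch below shows how such a warning could be built with the standard warnings module. The function warn_deprecation_sketch, its parameters, the message wording, and old_tokenize are all hypothetical, and the version strings are placeholders; PyThaiNLP's own warn_deprecation may differ.

```python
import warnings


def warn_deprecation_sketch(
    deprecated_func: str,
    replacing_func: str = "",
    deprecated_version: str = "",
    removal_version: str = "",
) -> None:
    """Emit a DeprecationWarning that names the relevant versions.

    Hypothetical helper; the real signature in PyThaiNLP may differ.
    """
    message = f"The '{deprecated_func}' function is deprecated"
    if deprecated_version:
        message += f" since version {deprecated_version}"
    if removal_version:
        message += f" and will be removed in version {removal_version}"
    if replacing_func:
        message += f"; use '{replacing_func}' instead"
    message += "."
    # stacklevel=3 attributes the warning to the caller of the deprecated function.
    warnings.warn(message, DeprecationWarning, stacklevel=3)


def old_tokenize(text: str) -> list:
    """Hypothetical deprecated function kept for one more release."""
    warn_deprecation_sketch("old_tokenize", "new_tokenize", "5.0.5", "5.1")
    return text.split()
```

Python hides DeprecationWarning by default outside of `__main__` and test runners, so users may need `python -W default` or `warnings.simplefilter("default")` to see messages like this one.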
PyThaiNLP v5.0.4 Released!
PyThaiNLP v5.0.4 is a bug fix release of PyThaiNLP v5.0.3.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Fixed #914 by @wannaphong in #917
Full Changelog: v5.0.3...v5.0.4
PyThaiNLP v5.0.3 Released!
PyThaiNLP v5.0.3 is a bug fix release of PyThaiNLP v5.0.2.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Create .editorconfig by @bact in #909
- Fix empty string ('') added (in some cases) when using word_tokenize with join_broken_num=True by @S2P2 in #912
New Contributors
Full Changelog: v5.0.2...v5.0.3
PyThaiNLP v5.0.2 Released!
PyThaiNLP v5.0.2 is a bug fix release of PyThaiNLP v5.0.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Update README and license header by @bact in #902
- Updated crfcut.py by @varunkatiyar819 in #905
New Contributors
- @varunkatiyar819 made their first contribution in #905
Full Changelog: v5.0.1...v5.0.2
Contributors
Thanks all the contributors. (Image made with contributors-img)
PyThaiNLP v5.0.1 Released!
PyThaiNLP v5.0.1 is a bug fix release of PyThaiNLP v5.0.0.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Fixed bug: ImportError pycrfsuite #901
Full Changelog: v5.0.0...v5.0.1
PyThaiNLP v5.0.0 Released!
We are excited to announce PyThaiNLP 5.0, the latest release of PyThaiNLP, a Python library for Thai natural language processing (NLP).
PyThaiNLP 5.0 brings improved performance and accuracy for Thai NLP tasks, along with new functions that make those tasks easier and more efficient.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 5e97e7c
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls, a param-free classification model #821 by @c4n
- Add pythainlp.el, entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
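The "param-free classification model" in the list above (#821) has no trained weights. One common way to build such a classifier, assumed here and not confirmed by these notes, is to compare texts by gzip-compressed size (normalized compression distance) and take the label of the nearest training sample. The sketch below illustrates that technique with the standard library only; the function names and the toy data are hypothetical and are not PyThaiNLP's API.

```python
import gzip


def compressed_size(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: smaller means more shared content."""
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + " " + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def predict(text: str, training_data: list) -> str:
    """Return the label of the training sample nearest to text (1-nearest neighbour)."""
    _, nearest_label = min(training_data, key=lambda pair: ncd(text, pair[0]))
    return nearest_label


# Toy training set of (text, label) pairs, invented for this example.
training_data = [
    ("the cat chased the mouse across the kitchen floor", "animals"),
    ("stock prices fell sharply after the bank raised interest rates", "finance"),
]
label = predict("the cat chased the mouse across the garden", training_data)
```

Because the "model" is only the stored training pairs, saving and loading it amounts to serializing that list, which is presumably why a later release could add save and load without retraining.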
Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details: explanations of the code, model details, and examples. by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words(): Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles(): Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words(): Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words(): Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
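Among the utilities in this release, pythainlp.util.to_idn (#875, listed under New API) concerns internationalized domain names, where a Thai domain is mapped to an ASCII form for DNS. As a rough sketch of what such a conversion involves, the code below uses Python's built-in "idna" codec (IDNA 2003). The function names are hypothetical, and whether PyThaiNLP uses this codec internally is an assumption.

```python
def to_idn_sketch(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (Punycode) form.

    Hypothetical helper built on the standard library "idna" codec.
    """
    return domain.encode("idna").decode("ascii")


def from_idn_sketch(ascii_domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


ascii_name = to_idn_sketch("ไทย.com")  # labels with non-ASCII text gain an "xn--" prefix
round_trip = from_idn_sketch(ascii_name)
```

Each dot-separated label is converted independently, so the ".com" part stays unchanged while the Thai label is rewritten.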
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Full Changelog: v4.0.2...v5.0.0
PyThaiNLP v5.0.0-beta1
Schedule
- First Beta release: 5 February 2024
- Production release: 10 February 2024
See 5.0 Milestone.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 5e97e7c
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls, a param-free classification model #821 by @c4n
- Add pythainlp.el, entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details: explanations of the code, model details, and examples. by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words(): Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles(): Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words(): Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words(): Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
PyThaiNLP v5.0.0-dev2
What's Changed
- Add pythainlp.morpheme by @wannaphong in #896
Full Changelog: v5.0.0-dev1...v5.0.0-dev2
PyThaiNLP v5.0.0-dev1
What's Changed
- Add Thai word list from Volubilis dictionary by @konbraphat51 in #870
- Add Thai word list from Thai Wikipedia titles by @konbraphat51 in #869
- Switch PyThaiNLP source code to SPDX license ID by @pavaris-pm in #876
- Add pythainlp.util.to_idn by @wannaphong in #875
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Add Thai word list from ICU BreakIterator dictionary by @pavaris-pm in #879
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- Add PhayaThaiBERT engine with new features [WIP] by @pavaris-pm in #873
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- ruff: docstring-code-format = true by @bact in #892
- Add pythainlp.util.morse by @wannaphong in #891
Full Changelog: v5.0.0-dev0...v5.0.0-dev1