Releases: PyThaiNLP/pythainlp
PyThaiNLP v2.3.2 Release!
PyThaiNLP v2.3.2 is a bug fix release of PyThaiNLP 2.3.
Bug fixes
- Fixed clause_tokenize returning an empty list. #609
Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade using pip install -U pythainlp
PyThaiNLP v2.4.0-dev0
PyThaiNLP v2.4.0-dev0 is the first development release of PyThaiNLP 2.4 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.4 change log #545
News
Starting with PyThaiNLP 2.4, we will end support for Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.1.
We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend retraining your model.
Deprecation and other API changes
- #550 Deprecated syllable_tokenize. syllable_tokenize is deprecated; use subword_tokenize instead.
- 701fb3a pythainlp.tag.named_entity.ThaiNameTagger is changed to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 2.5.
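A deprecation like the ones above is commonly implemented as a thin wrapper that warns and then forwards to the replacement. The following is only a minimal sketch of that general pattern, not PyThaiNLP's actual code; old_tokenize and new_tokenize are hypothetical stand-ins, and the character-splitting body is a placeholder.

```python
import warnings


def new_tokenize(text):
    """Hypothetical replacement function (placeholder logic: one token per character)."""
    return list(text)


def old_tokenize(text):
    """Hypothetical deprecated function that forwards to the replacement."""
    warnings.warn(
        "old_tokenize is deprecated, use new_tokenize instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    return new_tokenize(text)
```

Callers keep working during the deprecation period, and running with warnings enabled (for example python -W default) shows where the old name is still used.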
Augment
- #580 Add Thai Text Augmentation
Corpus
- #557 Fix lots of misspellings in dictionary (words_th.txt)
- #576 Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to default_db.json, and you no longer need to download the latest thainer model from the Internet.
Tag
- #599 Add tltk (pos_tag and ner) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
- #600 Add NER class - NER class for named-entity recognition tasks.
Translate
Tokenization
- #585 Fix token_max_len bug that makes it always zero
- #562 Tokenize repeating dots and commas from numbers (fix #461)
- #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- 3144110 Add SEFR CUT to pythainlp
- #599 Add tltk (sentence_tokenize and word_tokenize) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
Transliterate
- #566 Refactor Royin Transliterate: avoid nested if blocks and simplify consonant replacement operations
- #585 Manually merge update-royin branch with dev branch to add O-ANG rule
- #599 Add tltk (g2p and ipa) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
Word Vector
Spell
- #591 Add more spelling engines
- #599 Add tltk (spell) - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
Generate
- #579 Add pythainlp.generate
Other
- #599 Add tltk - add tltk wrapper to pythainlp functions, e.g. ner, word_tokenize, and more.
PyThaiNLP v2.3.1 Release!
PyThaiNLP v2.3.1 is a bug fix release of PyThaiNLP 2.3.
Bug fixes
- Fix gensim #546
Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade using pip install -U pythainlp
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set the version parameter of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
  - All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
- #478 Update: ORCHID POS tags documentation
Name Entity Tagging
Transliterate
Text Summarize
- #523 Add mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
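To illustrate the idea behind a keyboard distance such as the one added in #513, here is a minimal sketch that computes the Euclidean distance between two keys from their row and column positions. It is not the library's implementation: LAYOUT is a small, assumed two-row excerpt of a key grid, and key_position and keyboard_distance are hypothetical names used only for this example.

```python
import math

# Assumed, partial key grid for illustration only (two rows, five keys each).
LAYOUT = [
    "ๆไำพะ",
    "ฟหกดเ",
]


def key_position(char):
    """Return the (row, column) of a character in LAYOUT."""
    for row, keys in enumerate(LAYOUT):
        col = keys.find(char)
        if col != -1:
            return (row, col)
    raise ValueError(f"{char!r} is not in the sketch layout")


def keyboard_distance(a, b):
    """Euclidean distance between the grid positions of two characters."""
    (r1, c1), (r2, c2) = key_position(a), key_position(b)
    return math.hypot(r1 - r2, c1 - c2)
```

Neighboring keys on the same row are at distance 1, and a diagonal neighbor is at the square root of 2. A distance like this can be used, for instance, to rank spelling candidates by how likely a typing slip is.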
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
PyThaiNLP v2.3.1-dev0
PyThaiNLP v2.3.1-dev0 is the development release of PyThaiNLP 2.3.1 (for development only).
Bug fixes
- Fix gensim #546
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
PyThaiNLP v2.3.0 Release!
PyThaiNLP v2.3.0 is the production release of PyThaiNLP 2.3.
Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
You can install or upgrade using pip install -U pythainlp
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set the version parameter of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
  - All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
- #478 Update: ORCHID POS tags documentation
Name Entity Tagging
Transliterate
Text Summarize
- #523 Add mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
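For readers unfamiliar with what removing a word from a trie involves (#483 above), the sketch below shows one common approach: unmark the end-of-word flag, then prune nodes that no longer lead to any word. This is a generic illustration under assumed names (SketchTrie, add, remove), not the PyThaiNLP Trie class or its API.

```python
class SketchTrie:
    """Minimal trie used only to illustrate word removal."""

    class Node:
        def __init__(self):
            self.children = {}
            self.end = False  # True if a stored word ends at this node

    def __init__(self, words=()):
        self.root = SketchTrie.Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, SketchTrie.Node())
        node.end = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word):
        # Walk down the trie, keeping the path so empty branches can be pruned.
        path = [self.root]
        for ch in word:
            node = path[-1].children.get(ch)
            if node is None:
                return  # word is not stored; nothing to do
            path.append(node)
        if not path[-1].end:
            return  # only a prefix of other words, not a stored word
        path[-1].end = False
        # Prune upward from the last node while nodes are unused.
        for i in range(len(word) - 1, -1, -1):
            child = path[i + 1]
            if child.end or child.children:
                break  # still needed by another word
            del path[i].children[word[i]]
```

Removing "กา" from a trie holding "กา" and "กาว" keeps the shared nodes, because "กาว" still needs them; removing "กาว" afterwards prunes the whole branch.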
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
PyThaiNLP v2.3.0-dev1
PyThaiNLP v2.3.0-dev1 is the development release of PyThaiNLP 2.3 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set the version parameter of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
  - All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
- #478 Update: ORCHID POS tags documentation
Name Entity Tagging
Transliterate
Text Summarize
- #523 Add mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
PyThaiNLP v2.3.0-beta1
PyThaiNLP v2.3.0-beta1 is the first beta release of PyThaiNLP 2.3.
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set the version parameter of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
  - All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
- #478 Update: ORCHID POS tags documentation
Name Entity Tagging
Transliterate
Text Summarize
- #523 Add mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
Links
- Website: https://pythainlp.github.io
- Docs: https://pythainlp.github.io/dev-docs/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
Thanks to all the contributors.
We build Thai NLP.
PyThaiNLP
v2.3.0-dev0
PyThaiNLP v2.3.0-dev0 is the first development release of PyThaiNLP 2.3 (for development only).
Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 2.3 change log #445
Deprecation and other API changes
- NER now uses the ThaiNER 1.5 model instead of ThaiNER 1.4. If you need the ThaiNER 1.4 model, set the version parameter of the ThaiNameTagger class:
pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')
(Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
Tokenizer
- #484 Add: model option for attacut.tokenize()
- #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
- #503 Add: NERCut tokenization engine
Corpus
- License change:
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0).
  - All language models created by PyThaiNLP project are released under Creative Commons Attribution 4.0 International Public License (CC-by).
- #449 Fix: remove instances with [ or ] from etcc.txt
- #467 Add: corpus.common.provinces() can now return romanized names
- #476 Add: thai_family_names() to get a set of Thai family names
- #487 Fix: thailand_provinces_th.csv not found issue
- #492 Fix: remove erroneous AITT tag from ORCHID to UD table -- thanks @c4n for the fix
POS Tagger
- #464 Add: LST20 language model for part-of-speech tagging
- #468 Add: port PerceptronTagger from NLTK. POS tagging no longer needs NLTK as a dependency.
- #478 Update: ORCHID POS tags documentation
Name Entity Tagging
Transliterate
Text Summarize
- #523 Add mT5 text summarization to pythainlp.summarize
Chunk parser
- #524 Add pythainlp.tag.chunk
Util
- #481 Fix: remove_repeat_vowels() bug that removed spaces between different vowels
- #483 Add: remove() method to remove a word from a trie -- thanks @korakot
- #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
- #513 Add: thai_keyboard_dist() to calculate Euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
PyThaiNLP 2.2.6
PyThaiNLP 2.2.6 Released!
This release is a bug fix release.
- Update pythainlp.tag docs #492
- thai_strftime: Normalize output for unsupported directive #490
- Port pickle to JSON and add lst20 postag model to pythainlp.corpus #488
Thanks to the following contributors to 2.2.6: @c4n
Thanks to other contributors listed here: https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md
You can install or upgrade using pip install -U pythainlp
- GitHub Releases: https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.6
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team
PyThaiNLP 2.2.5
PyThaiNLP 2.2.5 Released!
This release is a bug fix release.
- Fix: not found file for pythainlp.corpus #486
https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.5
You can install or upgrade using pip install -U pythainlp
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team