Releases: PyThaiNLP/pythainlp

PyThaiNLP v2.3.2 Release!

25 Aug 02:45
ef3503a

PyThaiNLP v2.3.2 is a bug fix release of PyThaiNLP 2.3.

Bug fixes

  • Fixed clause_tokenize returning an empty list. #609

Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

PyThaiNLP v2.4.0-dev0

01 Aug 06:59
648a608
Pre-release

PyThaiNLP v2.4.0-dev0 is the first development release of PyThaiNLP 2.4 (for development only).

Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.4 change log #545

News

Starting with PyThaiNLP 2.4, we are ending support for Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.1.
We have updated the dictionary and rules for newmm. If your model uses newmm for word tokenization, we recommend that you retrain it.

Deprecation and other API changes

  • #550 Deprecated syllable_tokenize; use subword_tokenize instead.
  • 701fb3a pythainlp.tag.named_entity.ThaiNameTagger has been changed to pythainlp.tag.thainer.ThaiNameTagger. The old class will be deprecated in PyThaiNLP version 2.5.
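
The deprecation entries above follow a common pattern: the old name keeps working for a while but warns and forwards to the new one. The sketch below illustrates that pattern only. It is not PyThaiNLP's code; deprecated_alias, split_subwords, and split_syllables are hypothetical names invented for this example.

```python
import functools
import warnings


def deprecated_alias(old_name, new_func):
    """Build a wrapper that warns, then forwards to new_func (hypothetical helper)."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return new_func(*args, **kwargs)

    return wrapper


def split_subwords(text):
    """Stand-in for a new API; it only splits on hyphens."""
    return text.split("-")


# The old name stays importable but now emits a DeprecationWarning.
split_syllables = deprecated_alias("split_syllables", split_subwords)
```

Callers that still use the old name get the same result plus a warning, which gives them a release cycle to migrate.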

Augment

  • #580 Add Thai Text Augmentation

Corpus

  • #557 Fix lots of misspellings in dictionary (words_th.txt)
  • #576 Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to default_db.json, and the latest thainer model no longer has to be loaded from the Internet.

Tag

  • #599 Add tltk (pos_tag and ner): tltk wrappers for pythainlp functions such as ner, word_tokenize, and more.
  • #600 Add NER class - NER class for Named-entity recognizer tasks.

Translate

  • #589 Add pythainlp.translate.Translate Class
  • #588 Add Chinese-Thai Machine Translation

Tokenization

  • #585 Fix token_max_len bug that makes it always zero
  • #562 Tokenize repeating dots and commas from numbers (fix #461)
  • #594 Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
  • 3144110 Add SEFR CUT to pythainlp
  • #599 Add tltk (sentence_tokenize and word_tokenize): tltk wrappers for pythainlp functions such as ner, word_tokenize, and more.
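
One reading of the #562 entry above is that a number written with dots and commas should stay a single token. The sketch below shows that idea with a plain regular expression. It is an assumption-laden illustration, not the newmm implementation; TOKEN and split_keep_numbers are hypothetical names.

```python
import re

# A number is a run of digits, optionally followed by more digit groups
# joined by "." or ",". Whitespace and everything else are separate tokens.
TOKEN = re.compile(r"\d+(?:[.,]\d+)*|\s+|[^\d\s]+")


def split_keep_numbers(text):
    """Split text into number, whitespace, and other chunks (illustrative only)."""
    return TOKEN.findall(text)
```

With this pattern a string such as "1,234.50" is returned as one chunk, while a stray separator that is not followed by a digit ends up in its own chunk.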

Transliterate

  • #566 Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
  • #585 Manually merge update-royin branch with dev branch to add O-ANG rule
  • #599 Add tltk (g2p and ipa): tltk wrappers for pythainlp functions such as ner, word_tokenize, and more.

Word Vector

  • #573 Fix token_max_len bug that makes it always zero
  • #583 Add pythainlp.word_vector.WordVector

Spell

  • #591 Add more spelling engines
  • #599 Add tltk (spell): tltk wrappers for pythainlp functions such as ner, word_tokenize, and more.

Generate

  • #579 Add pythainlp.generate

Other

  • #599 Add tltk: tltk wrappers for pythainlp functions such as ner, word_tokenize, and more.

PyThaiNLP v2.3.1 Release!

04 Apr 17:29
449e9b0

PyThaiNLP v2.3.1 is a bug fix release of PyThaiNLP 2.3.

Bug Fixed

Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removes spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
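
The thai_keyboard_dist() entry above describes a Euclidean distance between key positions. The sketch below shows the calculation on a tiny hand-written grid. The grid is an assumption made for illustration (the first five keys of two rows of a Thai Kedmanee layout, with row stagger ignored); it is not the table the library uses, and ROWS, POSITIONS, and keyboard_distance are hypothetical names.

```python
import math

# Assumed partial layout: two keyboard rows, five keys each, no stagger.
ROWS = ["ๆไำพะ", "ฟหกดเ"]

# Map each character to its (row, column) coordinate on the grid.
POSITIONS = {
    ch: (r, c)
    for r, row in enumerate(ROWS)
    for c, ch in enumerate(row)
}


def keyboard_distance(a, b):
    """Euclidean distance between the grid positions of two characters."""
    (r1, c1), (r2, c2) = POSITIONS[a], POSITIONS[b]
    return math.hypot(r1 - r2, c1 - c2)
```

Neighbouring keys on the same row are one unit apart, and diagonal neighbours are the square root of two apart. Characters missing from the assumed grid raise KeyError.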

Thanks to all the contributors.


We build Thai NLP.

PyThaiNLP

PyThaiNLP v2.3.1-dev0

04 Apr 11:43
Pre-release

PyThaiNLP v2.3.1-dev0 is the development release of PyThaiNLP 2.3.1 (for development only).

Bug Fixed

Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

PyThaiNLP v2.3.0 Release!

31 Mar 13:17
cb81a2f

PyThaiNLP v2.3.0 is the production release of PyThaiNLP 2.3.

Documentation: https://pythainlp.github.io/docs/2.3/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

You can install or upgrade using pip install -U pythainlp

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removes spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development
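
The remove() entry (#483) in the list above concerns deleting a word from a trie. The sketch below shows one way such a removal can work, pruning nodes that no other word needs. It is a minimal stand-in written for this note, not pythainlp's Trie class; the class layout and method names are assumptions.

```python
class Trie:
    """Minimal character trie (illustrative stand-in)."""

    class _Node:
        __slots__ = ("children", "end")

        def __init__(self):
            self.children = {}
            self.end = False  # True when a stored word ends at this node

    def __init__(self, words=()):
        self.root = Trie._Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Trie._Node())
        node.end = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word):
        """Remove word if present and prune branches that become unused."""
        path = [self.root]
        for ch in word:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return  # word was never stored
            path.append(nxt)
        if not path[-1].end:
            return  # only a prefix of another word, nothing to remove
        path[-1].end = False
        # Walk back towards the root, deleting childless non-terminal nodes.
        for i in range(len(word) - 1, -1, -1):
            child = path[i + 1]
            if child.children or child.end:
                break
            del path[i].children[word[i]]
```

Removing a word that is a prefix of a longer word only clears its end marker, so the longer word stays reachable.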

Thanks to all the contributors.


We build Thai NLP.

PyThaiNLP

PyThaiNLP v2.3.0-dev1

23 Mar 10:54
Pre-release

PyThaiNLP v2.3.0-dev1 is a development release of PyThaiNLP 2.3 (for development only).

Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removes spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development

PyThaiNLP v2.3.0-beta1

23 Mar 11:21
Pre-release

PyThaiNLP v2.3.0-beta1 is the first beta release of PyThaiNLP 2.3.

Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removes spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development

Thanks to all the contributors.

We build Thai NLP.

PyThaiNLP

v2.3.0-dev0

16 Mar 07:02
2d912bd
Pre-release

PyThaiNLP v2.3.0-dev0 is the first development release of PyThaiNLP 2.3 (for development only).

Documentation: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

See PyThaiNLP 2.3 change log #445

Deprecation and other API changes

Tokenizer

  • #484 Add: model option for attacut.tokenize()
  • #502 Add: corpus.util.revise_wordset() to revise tokenization dictionary
  • #503 Add: NERCut tokenization engine

Corpus

POS Tagger

  • #464 Add: LST20 language model for part-of-speech tagging
  • #468 Add: port PerceptronTagger from NLTK. POS tagging no longer depends on NLTK.
  • #478 Update: ORCHID POS tags documentation

Named Entity Tagging

  • #526 Update ThaiNER 1.4 to ThaiNER 1.5
  • #538 Add ThaiNameTagger version and add ThaiNER 1.4 support

Transliterate

  • #485 Fix: romanize failing on some examples
  • #511 Add Thai W2P (Thai Word-to-Phoneme converter)

Text Summarize

  • #523 Add mT5 text summarization to pythainlp.summarize

Chunk parser

  • #524 Add pythainlp.tag.chunk

Util

  • #481 Fix: remove_repeat_vowels() bug that removes spaces between different vowels
  • #483 Add: add remove() method to remove a word from a trie -- thanks @korakot
  • #490 Fix: thai_strftime() - normalize output for unsupported directive (running in glibc and musl should produce the same output)
  • #512 Add: emoji_to_thai() to convert emoji to Thai description -- thanks @ppirch for the development
  • #513 Add: thai_keyboard_dist() to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks @ppirch for the development

PyThaiNLP 2.2.6

13 Dec 12:20
590e24d

PyThaiNLP 2.2.6 Released!
This release is a bug fix release.

  • Update pythainlp.tag docs #492
  • thai_strftime: Normalize output for unsupported directive #490
  • Port pickle files to JSON and add the LST20 POS tag model to pythainlp.corpus #488

Thanks to the following contributors to 2.2.6: @c4n

Thanks to other contributors listed here: https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md

You can install or upgrade using pip install -U pythainlp

We build Thai NLP
PyThaiNLP Team

PyThaiNLP 2.2.5

16 Nov 13:00

PyThaiNLP 2.2.5 Released!
This release is a bug fix release.

  • Fix: file not found error in pythainlp.corpus #486

https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.5

You can install or upgrade using pip install -U pythainlp
Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
Tutorials: https://thainlp.org/pythainlp/tutorials/
GitHub: https://github.com/PyThaiNLP/pythainlp
We build Thai NLP
PyThaiNLP Team