All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Added `UnicodeSentenceTokenizer` that tokenizes sentences following Unicode segmentation rules using the unicode-segmentation crate #66
- Added `PunctuationTokenizer` that tokenizes sentences delimited by punctuation #70
- Updated the Python wrapper to use PyO3 0.10, which in particular raises Rust panics as Python exceptions #69
- Added Python 3.8 wheel generation #65
- Tokenizers can now be pickled in Python #73
- Only Python 3.6+ is now supported in the Python package.
- Renamed `UnicodeSegmentTokenizer` to `UnicodeWordTokenizer`. #75
- Better error handling. In particular `error::VTextError` is replaced by `error::EstimatorErr`. #76
- Josh Bowles
- Josh Levy-Kramer
- Roman Yurchak