RePlay 0.17.0 Release notes
- Highlights
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
Highlights
We are excited to announce the release of RePlay 0.17.0!
The new version fixes serious bugs related to the performance of LabelEncoder and the saving of checkpoints in transformers. In addition, methods have been added to save splitters and SequentialTokenizer without using pickle.
Backwards Incompatible Changes
Changed SequentialDataset behavior
When training transformers on big data, a slowdown was detected that increased the epoch time from 5 minutes to 1 hour. The cause was that, by default, the model trainer saves a checkpoint every 50 steps of an epoch, and each checkpoint implicitly contained not only the model but also the entire training dataset. The behavior was corrected by changing SequentialDataset and the callbacks that use it. As a result, SequentialDataset objects created with older versions can no longer be used; otherwise, no interface changes were required.
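Whatever the exact mechanism inside RePlay, a common way a training dataset ends up inside a checkpoint is PyTorch Lightning's save_hyperparameters() capturing every constructor argument, dataset included. Below is a minimal sketch of that general pitfall and its fix; the class and argument names are illustrative, not RePlay's internals:

```python
import pytorch_lightning as pl
import torch
from torch.utils.data import Dataset


class SequentialModule(pl.LightningModule):
    """Illustrative module, not RePlay's actual class."""

    def __init__(self, model: torch.nn.Module, train_dataset: Dataset):
        super().__init__()
        # Without `ignore`, save_hyperparameters() would pickle `train_dataset`
        # into every .ckpt file, which is exactly the kind of implicit dataset
        # saving described above.
        self.save_hyperparameters(ignore=["model", "train_dataset"])
        self.model = model
        self.train_dataset = train_dataset
```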
Deprecations
A deprecation warning has been added for saving splitters and SequentialTokenizer with pickle. This functionality will be removed in future versions.
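The replacement is the new pickle-free persistence API. A minimal sketch of the intended migration, assuming the new methods follow the usual save(path) / load(path) shape; the method names and the RatioSplitter arguments are assumptions, so check the RePlay documentation for the exact signatures:

```python
import pickle

from replay.splitters import RatioSplitter

splitter = RatioSplitter(test_size=0.2)

# Deprecated: pickle-based serialization will stop working in a future release.
with open("splitter.pkl", "wb") as f:
    pickle.dump(splitter, f)

# Assumed shape of the pickle-free replacement.
splitter.save("./splitter")
restored = RatioSplitter.load("./splitter")
```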
New Features
A new strategy in LabelEncoder
The drop strategy has been added. It drops tokens from the dataset that were not present at the training stage. If this removes all rows, a corresponding warning is raised.
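A hedged sketch of how the new strategy is meant to be used; the handle_unknown_rule parameter name is an assumption here, while the drop behavior itself is what this release adds:

```python
import pandas as pd

from replay.preprocessing import LabelEncoder, LabelEncodingRule

# "drop" tells the rule to remove rows whose tokens were unseen during fit;
# the parameter name is an assumption, see the LabelEncoder docs.
rule = LabelEncodingRule("item_id", handle_unknown_rule="drop")
encoder = LabelEncoder([rule])

encoder.fit(pd.DataFrame({"item_id": ["a", "b", "c"]}))
encoded = encoder.transform(pd.DataFrame({"item_id": ["b", "x"]}))
# The row with the unseen token "x" is dropped; if every row were dropped,
# a warning would be raised.
```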
New Linters
We keep up with the latest trends in code quality control, so the list of linters used to check code quality has been updated. Pylint and PyCodestyle have been removed; Ruff, Black, and toml-sort have been added.
Improvements
PyArrow dependency
The dependency on PyArrow has been adjusted. RePlay can now work with any version greater than 12.0.1.
Bug fixes
Performance fixes at the partial_fit stage in LabelEncoder
The slowdown occurred when using a Pandas DataFrame: the partial_fit stage had quadratic running time. The bug has been fixed, and the running time now grows linearly with the size of the dataset.
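For context, the affected call path is incremental fitting of the encoder on Pandas data. A minimal sketch, assuming the standard fit/partial_fit interface:

```python
import pandas as pd

from replay.preprocessing import LabelEncoder, LabelEncodingRule

encoder = LabelEncoder([LabelEncodingRule("item_id")])
encoder.fit(pd.DataFrame({"item_id": ["a", "b"]}))

# Extend the mapping chunk by chunk; with the fix, each call scales
# linearly with the chunk size rather than quadratically.
for chunk in (pd.DataFrame({"item_id": ["b", "c"]}),
              pd.DataFrame({"item_id": ["c", "d"]})):
    encoder.partial_fit(chunk)
```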
Timestamp tokenization when using SasRec
Fixed an error that occurred when training a SasRec transformer with the ti_modification=True parameter.
Loading a checkpoint with a modified embedding in transformers
The error occurred when a model whose transformer embedding dimensions had previously been changed was loaded on another device. The example of working with embeddings in transformers has been updated.
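Independent of RePlay's internals, the two ingredients of this scenario in plain PyTorch are resizing the embedding layer before loading the weights, and passing an explicit map_location so a checkpoint saved on one device can be loaded on another. A minimal sketch with placeholder names:

```python
import torch
import torch.nn as nn


class TinyTransformer(nn.Module):
    """Placeholder model; only the embedding-resizing pattern matters here."""

    def __init__(self, n_items: int, hidden_size: int = 32):
        super().__init__()
        self.item_embedding = nn.Embedding(n_items, hidden_size)


model = TinyTransformer(n_items=100)
# Grow the embedding table (e.g. after new items appeared) before loading.
model.item_embedding = nn.Embedding(120, 32)

# map_location remaps tensors saved on, say, "cuda:0" onto the device
# available at load time, which is the "another device" case fixed above.
state = torch.load("transformer.ckpt", map_location=torch.device("cpu"))
model.load_state_dict(state["state_dict"], strict=False)
```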