v2.0.0-beta.4
Pre-release
New features:
- parallel training (#892 #905 #913) (Bytedance)
- automatically determine the sel from the training data (#831)
- build low and high precision at the same time (#879)
Performance improvements:
- speed up the tabulate CUDA kernel by reducing shared memory (shm) usage (#830) (Bytedance)
- speed up format_nlist_b (#832 #845)
Enhancements:
- support specifying the CUDA/ROCm root when building the Python package (#834) (Bytedance)
- use cached Session to speed up py tests (#833)
- add message for DecodeError raised when using model compression (#839)
- remove cub include for CUDA>=11 (#866)
- add an error check after every kernel function run and merge redundant code (#855)
- adapt changes to auditwheel directory in manylinux (#889)
- enhance the cli to generate doc json file (#891)
- raise warning before training if sel is not enough (#914)
Bug fixes: