v2.0.0-beta.4

Pre-release

amcadmus released this 04 Aug 08:37

ee0ed99

New features:

parallel training (#892 #905 #913) (Bytedance)
automatically determine sel from the training data (#831)
build the low- and high-precision versions at the same time (#879)
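
To illustrate the idea behind determining sel from data, here is a minimal sketch and not the code from #831. It assumes open boundaries and a brute-force distance search, and the names `max_neighbors_per_type` and `suggest_sel`, the safety `margin`, and rounding up to a `multiple` are all hypothetical choices made for this example.

```python
import math

import numpy as np


def max_neighbors_per_type(coords, types, rcut, ntypes):
    """Largest number of neighbors of each type within rcut of any atom.

    Brute force, open boundaries (no periodic images): an illustrative
    simplification, not the library's neighbor search.
    """
    coords = np.asarray(coords, dtype=float)
    types = np.asarray(types)
    # Pairwise distances; the diagonal is set to inf so an atom never counts itself.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)
    within = dist < rcut
    counts = np.zeros(ntypes, dtype=int)
    for t in range(ntypes):
        # Count type-t neighbors around every atom, then keep the worst case.
        per_atom = within[:, types == t].sum(axis=1)
        counts[t] = per_atom.max() if per_atom.size else 0
    return counts


def suggest_sel(frames, rcut, ntypes, margin=1.1, multiple=4):
    """Suggest one sel value per type from (coords, types) frames.

    margin and multiple are assumptions for this sketch: the observed maximum
    is scaled by margin and rounded up to the next multiple.
    """
    worst = np.zeros(ntypes, dtype=int)
    for coords, types in frames:
        worst = np.maximum(worst, max_neighbors_per_type(coords, types, rcut, ntypes))
    return [int(math.ceil(w * margin / multiple) * multiple) for w in worst]


# Four atoms on a line, two types, cutoff 2.5.
coords = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [10, 0, 0]]
types = [0, 1, 1, 0]
print(suggest_sel([(coords, types)], rcut=2.5, ntypes=2))
```

Comparing a suggestion like this with a manually configured sel is also one way to notice, before training starts, that the configured value is too small for the data.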

Performance improvements:

speed up the tabulate CUDA kernel by reducing shared memory usage (#830) (Bytedance)
speed up format_nlist_b (#832 #845)

Enhancements:

support specifying the CUDA/ROCm root when building the Python package (#834) (Bytedance)
use cached Session to speed up py tests (#833)
add message for DecodeError raised when using model compression (#839)
remove cub include for CUDA>=11 (#866)
add an error check after every kernel function call and merge redundant code (#855)
adapt to changes in the auditwheel directory in manylinux (#889)
enhance the CLI to generate the doc JSON file (#891)
raise a warning before training if sel is too small (#914)

Bug fixes:

fix bug 824 and synchronize the updates to the CUDA code (#828)
fix the empty neighbor distance array in neighbor_stat.py (#882)
fix InvalidArgumentError caused by zero sel and optimize zero matrix (#900)
fix 'NoneType' has no len() in auto_sel (#911)
