v2.0.0-beta.4

Pre-release
@amcadmus released this 04 Aug 08:37
ee0ed99

New features:

  • Parallel training (#892, #905, #913) (Bytedance)
  • Automatically determine sel from the training data (#831)
  • Build the low- and high-precision versions at the same time (#879)
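
These notes do not say how sel is derived from the data. Below is a minimal sketch, assuming the idea is to scan every training frame for the largest number of neighbors of each atom type within the cutoff `rcut`, then pad that count by a safety factor. The function names `max_neighbors_per_type` and `auto_sel`, the `ratio=1.1` default, the rounding up to a multiple of 4, and the open (non-periodic) boundary are all illustrative assumptions, not DeePMD-kit's implementation.

```python
import math

import numpy as np


def max_neighbors_per_type(coords, atom_types, ntypes, rcut):
    """For each atom type t, return the largest number of type-t neighbors
    that any single atom has within rcut in one frame.

    Assumes open boundaries (no periodic images) to keep the sketch short.
    """
    coords = np.asarray(coords, dtype=float)
    atom_types = np.asarray(atom_types)
    # Pairwise distances, shape (natoms, natoms).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    within = dist < rcut
    np.fill_diagonal(within, False)  # an atom is not its own neighbor
    # For each type t: count type-t neighbors of every atom, keep the maximum.
    return np.array(
        [within[:, atom_types == t].sum(axis=1).max() for t in range(ntypes)],
        dtype=int,
    )


def auto_sel(frames, atom_types, ntypes, rcut, ratio=1.1):
    """Hypothetical sel estimate: the maximum neighbor count over all frames,
    multiplied by `ratio` and rounded up to a multiple of 4 (assumed convention)."""
    max_nnei = np.zeros(ntypes, dtype=int)
    for coords in frames:
        max_nnei = np.maximum(
            max_nnei, max_neighbors_per_type(coords, atom_types, ntypes, rcut)
        )
    return [int(math.ceil(n * ratio / 4)) * 4 for n in max_nnei]
```

For example, with one frame `[[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 5, 5]]`, types `[0, 1, 1, 0]` and `rcut=1.5`, the maximum neighbor counts are 1 (type 0) and 2 (type 1), so `auto_sel` returns `[4, 4]`. Padding by a ratio leaves headroom for configurations denser than any seen in training.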

Performance improvement:

  • Speed up the tabulate CUDA kernel by reducing shared memory usage (#830) (Bytedance)
  • Speed up format_nlist_b (#832, #845)

Enhancements:

  • Support specifying the CUDA/ROCm root when building the Python package (#834) (Bytedance)
  • Use a cached Session to speed up the Python tests (#833)
  • Add a message to the DecodeError raised when using model compression (#839)
  • Remove the cub include for CUDA >= 11 (#866)
  • Add error checking after every kernel launch and merge redundant code (#855)
  • Adapt to changes in the auditwheel directory in manylinux (#889)
  • Enhance the CLI to generate the doc JSON file (#891)
  • Raise a warning before training if sel is not large enough (#914)
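
A minimal sketch of a pre-training check in the spirit of #914, assuming the maximum neighbor count of each type is already known (for example, from a neighbor-statistics pass over the training data). `check_sel` is a hypothetical name, and the warning text is illustrative rather than DeePMD-kit's actual message.

```python
import warnings


def check_sel(sel, max_nnei):
    """Warn for every type whose sel is below the observed maximum number of
    neighbors of that type; return the indices of those types."""
    too_small = []
    for t, (s, n) in enumerate(zip(sel, max_nnei)):
        if s < n:
            warnings.warn(
                f"sel[{t}] = {s} is smaller than the maximum number of "
                f"type-{t} neighbors found in the data ({n}); "
                "some neighbors may be ignored during training"
            )
            too_small.append(t)
    return too_small
```

Warning, rather than failing, keeps training usable when a slightly undersized sel is intentional.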

Bug fixes:

  • Fix bug 824 and synchronize updates to the CUDA code (#828)
  • Fix the empty neighbor distance array in neighbor_stat.py (#882)
  • Fix the InvalidArgumentError caused by zero sel and optimize the zero matrix (#900)
  • Fix "'NoneType' has no len()" in auto_sel (#911)