v2.0.0-beta.3

Pre-release
@amcadmus released this 05 Jul 00:42
· 1843 commits to devel since this release
de428e3

New feature:

  • derivatives for deep tensor (#805)

Performance improvement:

  • speed up ROCm kernels that use atomicAdd (#809, #815) (from ByteDance)
  • speed up CUDA kernels that use atomicAdd by reducing global memory writes (#811)
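
The idea behind "reducing global memory writes" can be illustrated with a small model in plain Python. This is only a sketch of the general technique, not the code from #811: the function names are hypothetical, and each `out[i] += ...` stands in for one atomicAdd to global memory. It assumes that contributions to the same output slot arrive consecutively, so they can be summed in a local variable and written once.

```python
def scatter_add_naive(values, index, out_size):
    """Write every contribution straight to the shared output array."""
    out = [0.0] * out_size
    writes = 0
    for v, i in zip(values, index):
        out[i] += v  # stands in for one atomicAdd to global memory
        writes += 1
    return out, writes


def scatter_add_accumulated(values, index, out_size):
    """Sum consecutive contributions to the same slot locally, then write once."""
    out = [0.0] * out_size
    writes = 0
    acc, current = 0.0, None
    for v, i in zip(values, index):
        if current is not None and i != current:
            out[current] += acc  # one write for the whole run of equal indices
            writes += 1
            acc = 0.0
        current = i
        acc += v  # local accumulation (a register, in GPU terms)
    if current is not None:
        out[current] += acc  # flush the last run
        writes += 1
    return out, writes


values = [1.0, 2.0, 3.0, 4.0, 5.0]
index = [0, 0, 0, 2, 2]
naive_out, naive_writes = scatter_add_naive(values, index, 3)
acc_out, acc_writes = scatter_add_accumulated(values, index, 3)
```

Both functions produce the same output array; the second one issues one write per run of equal indices instead of one write per value.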

Enhancement:

  • add type-embedding developer doc (#762)
  • add model compression support for models with exclude_types feature (#754)
  • improve the doc and user interface of model compression (#772)
  • allow c++ tests to run without internet (#785)
  • support converting models generated by v1.3 to be compatible with v2.0 (#725)
  • give a default value to T and support converting models from v1.2 to be compatible with v2.0 (#789)
  • improve the documentation for conda (#798)
  • report an error message if the TensorFlow runtime is incompatible (#797)
  • catch out-of-memory (OOM) errors and print a debug message (#801)

Bug fixes:

  • fix multiple initialization of custom ops (#812)
  • fix handling of empty input in gelu.cu (#800)
  • fix model compression bug with a zero environment matrix (#808)
  • box.npy is no longer required for nopbc (#810)
  • fill rij with zero (#818)
  • NOT_LOADABLE should be tuple (#825)
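
For context on the last item: one common reason such a constant must be a tuple is that Python's `except` clause accepts an exception class or a tuple of classes, but not a list. The sketch below assumes NOT_LOADABLE is used that way, which these notes do not state; the contents of the tuple and the helper names are hypothetical.

```python
# Hypothetical contents; the real definition is not shown in these notes.
NOT_LOADABLE = (ImportError, OSError)


def try_load(loader, not_loadable):
    """Call loader() and return None if it fails with a listed exception."""
    try:
        return loader()
    except not_loadable:  # must be an exception class or a tuple of classes
        return None


def failing_loader():
    raise OSError("library not found")


# With a tuple, the failure is caught and None is returned.
result_with_tuple = try_load(failing_loader, NOT_LOADABLE)

# With a list, Python raises TypeError while matching the except clause.
try:
    try_load(failing_loader, [ImportError, OSError])
    list_raises_type_error = False
except TypeError:
    list_raises_type_error = True
```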