v2.0.0-beta.3

Pre-release

amcadmus released this 05 Jul 00:42

de428e3

New feature:

derivatives for deep tensor (#805)

Performance improvement:

speed up ROCm kernels that use atomicAdd (#809, #815) (from ByteDance)
speed up CUDA kernels that use atomicAdd by reducing global memory writes (#811)

Enhancement:

add type-embedding developer doc (#762)
add model compression support for models with exclude_types feature (#754)
improve the doc and user interface of model compression (#772)
allow C++ tests to run without internet access (#785)
support converting models generated by v1.3 to be compatible with v2.0 (#725)
give a default value to T and support converting models from v1.2 to be compatible with v2.0 (#789)
improve the documentation for conda (#798)
throw an error message if the TensorFlow runtime is incompatible (#797)
catch out-of-memory (OOM) errors and print a debug message (#801)

Bug fixes:

fix repeated initialization of custom ops (#812)
fix handling of empty input in gelu.cu (#800)
fix model compression bug with a zero environment matrix (#808)
box.npy is not necessary for nopbc (#810)
fill rij with zero (#818)
NOT_LOADABLE should be tuple (#825)
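
The NOT_LOADABLE entry (#825) does not say what went wrong, so the following is only a general Python sketch of why a constant meant to be a tuple may need fixing. It assumes the common pitfall of writing a one-element tuple without a trailing comma. The names `NOT_LOADABLE_STR`, `NOT_LOADABLE_TUPLE`, `is_not_loadable`, and the library name are hypothetical and not taken from the DeePMD-kit source.

```python
# Hypothetical names, for illustration only; not the DeePMD-kit code.
NOT_LOADABLE_STR = ("libexample.so")     # parentheses only: this is a str
NOT_LOADABLE_TUPLE = ("libexample.so",)  # trailing comma: a one-element tuple


def is_not_loadable(name, not_loadable):
    """Return True if `name` is found in `not_loadable` via the `in` operator."""
    return name in not_loadable


print(type(NOT_LOADABLE_STR).__name__)    # type of the parenthesized string
print(type(NOT_LOADABLE_TUPLE).__name__)  # type of the one-element tuple

# With a str, `in` is a substring test; with a tuple, it is an element test.
print(is_not_loadable("example", NOT_LOADABLE_STR))
print(is_not_loadable("example", NOT_LOADABLE_TUPLE))
```

With a plain string, membership checks and iteration work on characters and substrings rather than whole entries, which is why such a constant is usually declared as a tuple.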
