Release v3.1.0 · huggingface/text-generation-inference

Important changes

DeepSeek R1 is fully supported on both AMD and Nvidia GPUs!

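# Host directory shared with the container so weights are not re-downloaded on every run
# (example path, adjust as needed)
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.1.0 --model-id deepseek-ai/DeepSeek-R1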
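
Once the container is up, the server can be queried over HTTP. The following is a minimal sketch, assuming the server is reachable on localhost:8080 (the -p 8080:80 mapping above) and using TGI's /generate endpoint. The helper names tgi_payload and tgi_generate and the TGI_URL variable are hypothetical and exist only for illustration.

```shell
# Assumption: the container from the docker run command is listening on localhost:8080.
TGI_URL="http://localhost:8080"

# Hypothetical helper: build the JSON body for /generate.
# $1: prompt text (this sketch does no JSON escaping, so avoid quotes and backslashes)
# $2: maximum number of new tokens (defaults to 64)
tgi_payload() {
    printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "${2:-64}"
}

# Hypothetical helper: POST the prompt to the running server and print the raw JSON response.
tgi_generate() {
    curl -s "$TGI_URL/generate" \
        -X POST \
        -H 'Content-Type: application/json' \
        -d "$(tgi_payload "$1" "$2")"
}

# Usage (requires the server to be running):
# tgi_generate "What is deep learning?" 32
```

Keeping the payload construction separate from the curl call makes it easy to inspect the request body before sending it. For prompts containing quotes or newlines, a JSON-aware tool would be a safer way to build the body than printf.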

What's Changed

Attempt to remove AWS S3 flaky cache for sccache by @mfuntowicz in #2953
Update to attention-kernels 0.2.0 by @danieldk in #2950
fix: Telemetry by @Hugoch in #2957
Fixing the oom maybe with 2.5.1 change. by @Narsil in #2958
Add backend name to telemetry by @Hugoch in #2962
Add fp8 support moe models by @mht-sharma in #2928
Update to moe-kernels 0.8.0 by @danieldk in #2966
Hotfixing intel-cpu (not sure how it was working before). by @Narsil in #2967
Add deepseekv3 by @Narsil in #2968
doc: Update TRTLLM deployment doc. by @Hugoch in #2960
Update moe-kernel to 0.8.2 for rocm by @mht-sharma in #2977
Prepare for release 3.1.0 by @Narsil in #2972

Full Changelog: v3.0.2...v3.1.0
