Support AWQ quantization with bias
When the AWQ quantizer was used with a layer that has a bias, the
bias tensor was not passed through correctly. Instead, the boolean
`bias is not None` (i.e. `True`, which broadcasts as `1.0`) was added
to the output of the linear transformation.

Correctly pass through the bias when it is not `None`.

Fixes #2106.
danieldk committed Jun 25, 2024
1 parent fc9c315 commit f50c401
Showing 2 changed files with 2 additions and 2 deletions.
@@ -35,7 +35,7 @@ def __init__(self, w_bit, group_size, qweight, qzeros, scales, bias):
         self.qweight = qweight
         self.qzeros = qzeros
         self.scales = scales
-        if bias:
+        if bias is not None:
             self.bias = bias
         else:
             self.bias = None
server/text_generation_server/layers/linear.py (1 addition, 1 deletion)

@@ -217,7 +217,7 @@ def get_linear(weight, bias, quantize):
                 qweight=weight.qweight,
                 qzeros=weight.qzeros,
                 scales=weight.scales,
-                bias=bias is not None,
+                bias=bias,
             )
         except ImportError:
             raise NotImplementedError(
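The effect of the bug can be sketched in plain Python (no framework; the `linear` helper below is a simplified stand-in for the quantized layer, not the actual implementation): because Python's `True` behaves as the integer `1`, passing the flag `bias is not None` instead of the tensor silently shifts every output by `1.0`.

```python
def linear(x, w, bias=None):
    # y = x . w (+ bias), for a 1-D input x and 2-D weights w
    # (one row of w per output feature).
    y = [sum(xi * wi for xi, wi in zip(x, row)) for row in w]
    if bias is not None:
        return [yi + b for yi, b in zip(y, bias)]
    return y

x = [1.0, 2.0]
w = [[1.0, 1.0], [2.0, 0.0]]
bias = [0.5, 0.5]

# Buggy behaviour: the flag `bias is not None` evaluates to True,
# which Python treats as 1, so 1.0 is added to every output.
buggy = [yi + (bias is not None) for yi in linear(x, w)]

# Fixed behaviour: the bias tensor itself is passed through.
correct = linear(x, w, bias)

print(buggy)    # [4.0, 3.0] -- every output shifted by 1.0
print(correct)  # [3.5, 2.5]
```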
