doc: clarify that --quantize is not needed for pre-quantized models
danieldk authored Sep 19, 2024
1 parent c103760 commit abd24dd
Showing 3 changed files with 9 additions and 2 deletions.
4 changes: 3 additions & 1 deletion docs/source/reference/launcher.md
@@ -55,7 +55,9 @@ Options:
## QUANTIZE
```shell
--quantize <QUANTIZE>
-          Whether you want the model to be quantized
+          Quantization method to use for the model. It is not necessary to specify this option for pre-quantized models, since the quantization method is read from the model configuration.
+
+          Marlin kernels will be used automatically for GPTQ/AWQ models.

[env: QUANTIZE=]

1 change: 1 addition & 0 deletions flake.nix
@@ -157,6 +157,7 @@
pyright
pytest
pytest-asyncio
+redocly
ruff
syrupy
]);
6 changes: 5 additions & 1 deletion launcher/src/main.rs
@@ -367,7 +367,11 @@ struct Args {
#[clap(long, env)]
num_shard: Option<usize>,

-/// Whether you want the model to be quantized.
+/// Quantization method to use for the model. It is not necessary to specify this option
+/// for pre-quantized models, since the quantization method is read from the model
+/// configuration.
+///
+/// Marlin kernels will be used automatically for GPTQ/AWQ models.
#[clap(long, env, value_enum)]
quantize: Option<Quantization>,

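The help text updated in this commit implies that a pre-quantized checkpoint can be served without passing `--quantize` at all. A minimal launch sketch (the model id and port below are illustrative examples, not taken from this commit):

```shell
# No --quantize flag: the quantization method (here GPTQ) is read from
# the model configuration, and Marlin kernels are selected automatically
# for GPTQ/AWQ models.
text-generation-launcher \
    --model-id TheBloke/Llama-2-7B-GPTQ \
    --port 8080
```

Passing `--quantize gptq` explicitly would still work, but per the new documentation it is redundant for models that already ship a quantization config.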
