Update doc
Signed-off-by: Adrien Gallouët <[email protected]>
angt committed Feb 7, 2025
1 parent b77d05d commit d96a777
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/backends/llamacpp.md
@@ -101,8 +101,10 @@ The table below summarizes key options:
| `--split-mode` | Split the model across multiple GPUs |
| `--defrag-threshold` | Defragment the KV cache if holes/size > threshold |
| `--numa` | Enable NUMA optimizations |
| `--use-mmap` | Use memory mapping for the model |
| `--use-mlock` | Use memory locking to prevent swapping |
| `--offload-kqv` | Enable offloading of KQV operations to the GPU |
| `--flash-attention` | Enable flash attention for faster inference |
| `--type-k` | Data type used for K cache |
| `--type-v` | Data type used for V cache |
| `--validation-workers` | Number of tokenizer workers used for payload validation and truncation |