LLaMA.CPP is an open-source project that enables inference of Large Language Models (LLMs) like LLaMA on various hardware. Written in C/C++, it boasts minimal dependencies and supports diverse platforms, from Apple Silicon to NVIDIA GPUs. Notably, it excels in quantization techniques, reducing model sizes and accelerating inference speeds. LLaMA.CPP democratizes access to powerful AI capabilities, allowing users to run sophisticated language models on consumer-grade devices.
LLaMA.CPP uses `n_predict` instead of `max_tokens`; however, you can safely use `max_tokens` because it will be converted automatically. To use embeddings, you will also need to start your web server with the `--embedding` argument and an appropriate model. The expected port is `8080`.
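For reference, here is a minimal sketch of what that conversion targets: a direct call to the llama.cpp server's native `/completion` endpoint using `n_predict` (endpoint and response field per the llama.cpp server README; requires Node 18+ for the global `fetch`, and assumes a server already running on the default port).

```javascript
// Minimal sketch: call a local llama.cpp server's /completion endpoint directly.
// Assumes a server is already listening on the default port 8080.
(async () => {
  const res = await fetch('http://localhost:8080/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt: 'Explain the importance of low latency LLMs.',
      n_predict: 128, // the native equivalent of max_tokens
    }),
  });
  const data = await res.json();
  console.log(data.content); // 'content' holds the generated text
})();
```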
Interface name: `llamacpp`
```javascript
const { LLMInterface } = require('llm-interface');

// LLaMA.CPP does not use a traditional API key; see the note at the end of this page.
LLMInterface.setApiKey({ llamacpp: process.env.LLAMACPP_API_KEY });

async function main() {
  try {
    const response = await LLMInterface.sendMessage(
      'llamacpp',
      'Explain the importance of low latency LLMs.'
    );
    console.log(response.results);
  } catch (error) {
    console.error(error);
    throw error;
  }
}

main();
```
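Options can be passed as a third argument to `sendMessage`; the sketch below (argument order per the llm-interface README) sends `max_tokens`, which is converted to `n_predict` automatically:

```javascript
const { LLMInterface } = require('llm-interface');

async function main() {
  // max_tokens is translated to llama.cpp's native n_predict parameter.
  const response = await LLMInterface.sendMessage(
    'llamacpp',
    'Explain the importance of low latency LLMs.',
    { max_tokens: 150 }
  );
  console.log(response.results);
}

main();
```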
The following model aliases are provided for this provider.

Chat:

- `default`: gpt-3.5-turbo
- `large`: gpt-3.5-turbo
- `small`: gpt-3.5-turbo
- `agent`: openhermes

Embeddings:

- `default`: none
- `large`: none
- `small`: none
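If the `model` option works as described in the llm-interface documentation (an assumption here, not something this page states), an alias can be selected per request:

```javascript
const { LLMInterface } = require('llm-interface');

(async () => {
  // Assumption: 'model' is accepted in the options object, per the
  // llm-interface docs; 'agent' resolves to openhermes via the alias list above.
  const response = await LLMInterface.sendMessage('llamacpp', 'Hello!', {
    model: 'agent',
  });
  console.log(response.results);
})();
```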
The following parameters can be passed through `options`.
Details for these parameters are not available here; please refer to the LLM provider documentation:

`cache_prompt`, `dynatemp_exponent`, `dynatemp_range`, `frequency_penalty`, `grammar`, `id_slot`, `ignore_eos`, `image_data`, `json_schema`, `logit_bias`, `max_tokens`, `min_keep`, `min_p`, `mirostat`, `mirostat_eta`, `mirostat_tau`, `n_keep`, `n_probs`, `penalize_nl`, `penalty_prompt`, `presence_penalty`, `repeat_last_n`, `repeat_penalty`, `samplers`, `seed`, `stop`, `stream`, `system_prompt`, `temperature`, `tfs_z`, `top_k`, `top_p`, `typical_p`
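As a hedged sketch, here are several of these options combined in one request; the values are illustrative only, and each name is expected to pass through to the llama.cpp server parameter of the same name:

```javascript
const { LLMInterface } = require('llm-interface');

(async () => {
  // Illustrative sampling settings; consult the llama.cpp server docs for
  // the exact semantics of each parameter.
  const response = await LLMInterface.sendMessage(
    'llamacpp',
    'Write a haiku about quantization.',
    {
      temperature: 0.7,
      top_k: 40,
      top_p: 0.9,
      repeat_penalty: 1.1,
      seed: 42,
      stop: ['\n\n'],
    }
  );
  console.log(response.results);
})();
```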
This provider supports the following features:

- Streaming
- Embeddings
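For embeddings, the sketch below assumes the `LLMInterface.embeddings` helper described in the llm-interface README (an assumption, so verify the exact method name there) and a server started with `--embedding`:

```javascript
const { LLMInterface } = require('llm-interface');

(async () => {
  // Requires the server to have been started with the --embedding argument.
  // LLMInterface.embeddings is assumed per the llm-interface README.
  const embedding = await LLMInterface.embeddings(
    'llamacpp',
    'Explain the importance of low latency LLMs.'
  );
  console.log(embedding.results);
})();
```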
No API Key (Local URL): This is not a traditional API, so no API key is required. However, a URL is required to use this service. (Ensure you have the matching models installed locally.)
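As a loudly hedged sketch of how that URL might be wired up, reusing the `setApiKey` call from the example above (whether the URL is supplied through this call is an assumption; check the llm-interface docs):

```javascript
const { LLMInterface } = require('llm-interface');

// Assumption: the local llama.cpp server URL is supplied in place of an API
// key; 'http://localhost:8080' matches the expected default port noted above.
LLMInterface.setApiKey({ llamacpp: 'http://localhost:8080' });
```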
LLaMA.CPP documentation is available in the project's GitHub repository (ggerganov/llama.cpp).