LLaMA.CPP is an open-source project that enables inference of Large Language Models (LLMs) like LLaMA on various hardware. Written in C/C++, it boasts minimal dependencies and supports diverse platforms, from Apple Silicon to NVIDIA GPUs. Notably, it excels in quantization techniques, reducing model sizes and accelerating inference speeds. LLaMA.CPP democratizes access to powerful AI capabilities, allowing users to run sophisticated language models on consumer-grade devices.
LLaMA.CPP uses `n_predict` instead of `max_tokens`; however, you can safely use `max_tokens` because it will be converted automatically. To use embeddings, you will also need to start your web server with the `--embedding` argument and an appropriate model. The expected port is `8080`.
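For reference, here is a minimal sketch of what that conversion targets: a direct call to the llama.cpp server's native `/completion` endpoint using `n_predict` (endpoint and response field per the llama.cpp server README; requires Node 18+ for the global `fetch`, and assumes a server already running on the default port).

```javascript
// Minimal sketch: call a local llama.cpp server's /completion endpoint directly.
// Assumes a server is already listening on the default port 8080.
(async () => {
  const res = await fetch('http://localhost:8080/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt: 'Explain the importance of low latency LLMs.',
      n_predict: 128, // the native equivalent of max_tokens
    }),
  });
  const data = await res.json();
  console.log(data.content); // 'content' holds the generated text
})();
```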
Interface name: `llamacpp`
```javascript
const { LLMInterface } = require('llm-interface');

// LLaMA.CPP does not use a traditional API key; see the note at the end of this page.
LLMInterface.setApiKey({ llamacpp: process.env.LLAMACPP_API_KEY });

async function main() {
  try {
    const response = await LLMInterface.sendMessage(
      'llamacpp',
      'Explain the importance of low latency LLMs.'
    );
    console.log(response.results);
  } catch (error) {
    console.error(error);
    throw error;
  }
}

main();
```
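Options can be passed as a third argument to `sendMessage`; the sketch below (argument order per the llm-interface README) sends `max_tokens`, which is converted to `n_predict` automatically:

```javascript
const { LLMInterface } = require('llm-interface');

async function main() {
  // max_tokens is translated to llama.cpp's native n_predict parameter.
  const response = await LLMInterface.sendMessage(
    'llamacpp',
    'Explain the importance of low latency LLMs.',
    { max_tokens: 150 }
  );
  console.log(response.results);
}

main();
```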
The following model aliases are provided for this provider.

Chat:

- `default`: gpt-3.5-turbo
- `large`: gpt-3.5-turbo
- `small`: gpt-3.5-turbo
- `agent`: openhermes

Embeddings:

- `default`: none
- `large`: none
- `small`: none
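If the `model` option works as described in the llm-interface documentation (an assumption here, not something this page states), an alias can be selected per request:

```javascript
const { LLMInterface } = require('llm-interface');

(async () => {
  // Assumption: 'model' is accepted in the options object, per the
  // llm-interface docs; 'agent' resolves to openhermes via the alias list above.
  const response = await LLMInterface.sendMessage('llamacpp', 'Hello!', {
    model: 'agent',
  });
  console.log(response.results);
})();
```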
The following parameters can be passed through `options`.
Details for these parameters are not available here; please refer to the LLM provider documentation:

`cache_prompt`, `dynatemp_exponent`, `dynatemp_range`, `frequency_penalty`, `grammar`, `id_slot`, `ignore_eos`, `image_data`, `json_schema`, `logit_bias`, `max_tokens`, `min_keep`, `min_p`, `mirostat`, `mirostat_eta`, `mirostat_tau`, `n_keep`, `n_probs`, `penalize_nl`, `penalty_prompt`, `presence_penalty`, `repeat_last_n`, `repeat_penalty`, `samplers`, `seed`, `stop`, `stream`, `system_prompt`, `temperature`, `tfs_z`, `top_k`, `top_p`, `typical_p`
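As a hedged sketch, here are several of these options combined in one request; the values are illustrative only, and each name is expected to pass through to the llama.cpp server parameter of the same name:

```javascript
const { LLMInterface } = require('llm-interface');

(async () => {
  // Illustrative sampling settings; consult the llama.cpp server docs for
  // the exact semantics of each parameter.
  const response = await LLMInterface.sendMessage(
    'llamacpp',
    'Write a haiku about quantization.',
    {
      temperature: 0.7,
      top_k: 40,
      top_p: 0.9,
      repeat_penalty: 1.1,
      seed: 42,
      stop: ['\n\n'],
    }
  );
  console.log(response.results);
})();
```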
This provider supports the following features:

- Streaming
- Embeddings
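For embeddings, the sketch below assumes the `LLMInterface.embeddings` helper described in the llm-interface README (an assumption, so verify the exact method name there) and a server started with `--embedding`:

```javascript
const { LLMInterface } = require('llm-interface');

(async () => {
  // Requires the server to have been started with the --embedding argument.
  // LLMInterface.embeddings is assumed per the llm-interface README.
  const embedding = await LLMInterface.embeddings(
    'llamacpp',
    'Explain the importance of low latency LLMs.'
  );
  console.log(embedding.results);
})();
```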
No API Key (Local URL): This is not a traditional API, so no API key is required. However, a URL is required to use this service. (Ensure you have the matching models installed locally.)
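As a loudly hedged sketch of how that URL might be wired up, reusing the `setApiKey` call from the example above (whether the URL is supplied through this call is an assumption; check the llm-interface docs):

```javascript
const { LLMInterface } = require('llm-interface');

// Assumption: the local llama.cpp server URL is supplied in place of an API
// key; 'http://localhost:8080' matches the expected default port noted above.
LLMInterface.setApiKey({ llamacpp: 'http://localhost:8080' });
```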
LLaMA.CPP documentation is available in the project's GitHub repository (ggerganov/llama.cpp).