An external provider for Llama Stack allowing for the use of RamaLama for inference.
You can install ramalama-stack from PyPI via:
pip install ramalama-stack
This will also install Llama Stack and RamaLama if they are not already installed.
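For example, a minimal install sketch, assuming a recent Python 3 is available (the virtual environment is optional and the directory name is arbitrary):

```bash
# Optional: isolate the install in a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install ramalama-stack (pulls in Llama Stack and RamaLama if missing)
pip install ramalama-stack
```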
Warning
The following workaround is currently needed to run this provider - see #53 for more details:
curl --create-dirs --output ~/.llama/providers.d/remote/inference/ramalama.yaml https://raw.githubusercontent.com/containers/ramalama-stack/refs/tags/v0.2.5/src/ramalama_stack/providers.d/remote/inference/ramalama.yaml
curl --create-dirs --output ~/.llama/distributions/ramalama/ramalama-run.yaml https://raw.githubusercontent.com/containers/ramalama-stack/refs/tags/v0.2.5/src/ramalama_stack/ramalama-run.yaml
- First, you will need a RamaLama server running - see the RamaLama project docs for more information.
- Ensure you set the INFERENCE_MODEL environment variable to the name of the model you have running via RamaLama.
- You can then run the RamaLama external provider (see the combined sketch below) via
  llama stack run ~/.llama/distributions/ramalama/ramalama-run.yaml
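Taken together, these steps might look like the following sketch. The model name is illustrative; substitute whatever model you are serving, and see the RamaLama docs for the exact serve options on your setup:

```bash
# Serve a model with RamaLama (listens on port 8080 by default)
ramalama serve llama3.2 &

# Point the provider at the model RamaLama is serving
export INFERENCE_MODEL=llama3.2

# Start Llama Stack with the RamaLama external provider
llama stack run ~/.llama/distributions/ramalama/ramalama-run.yaml
```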
Note
You can also run the RamaLama external provider inside a container via Podman:
podman run \
--net=host \
--env RAMALAMA_URL=http://0.0.0.0:8080 \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
quay.io/ramalama/llama-stack
This will start a Llama Stack server, which will use port 8321 by default. You can test that it works by configuring the Llama Stack Client to run against this server and sending a test request.
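Before configuring the client, you can optionally confirm the server is reachable with a plain HTTP request. A minimal sketch, assuming the default port 8321 (the health route shown here may vary between Llama Stack versions):

```bash
# Basic liveness check against the Llama Stack server
curl http://localhost:8321/v1/health
```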
- If your client is running on the same machine as the server, you can run
llama-stack-client configure --endpoint http://0.0.0.0:8321 --api-key none
- If your client is running on a different machine, you can run
llama-stack-client configure --endpoint http://<hostname>:8321 --api-key none
- The client should give you a message similar to
Done! You can now use the Llama Stack Client CLI with endpoint <endpoint>
- You can then test the server by running
llama-stack-client inference chat-completion --message "tell me a joke"
which should return something like
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='A man walked into a library and asked the librarian, "Do you have any books on Pavlov\'s dogs
        and Schrödinger\'s cat?" The librarian replied, "It rings a bell, but I\'m not sure if it\'s here or not."',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=[
        Metric(metric='prompt_tokens', value=14.0, unit=None),
        Metric(metric='completion_tokens', value=63.0, unit=None),
        Metric(metric='total_tokens', value=77.0, unit=None)
    ]
)
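If the request fails, a quick way to troubleshoot is to ask the server which models it has registered; the model you set in INFERENCE_MODEL should appear in the output:

```bash
# List the models known to the Llama Stack server
llama-stack-client models list
```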
Llama Stack includes an experimental user interface; check it out here.
To deploy the UI, run this:
podman run -d --rm --network=container:ramalama --name=streamlit quay.io/redhat-et/streamlit_client:0.1.0
Note
If running on macOS (not Linux), --network=host doesn't work. You'll need to publish the additional ports 8321:8321 and 8501:8501 with the ramalama serve command, then run with --network=container:ramalama. If running on Linux, use --network=host or -p 8501:8501 instead. The Streamlit container will be able to access the ramalama endpoint with either.
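For reference, a minimal sketch of the Linux variant described above, reusing the image and container name from the deploy command (the UI is served on port 8501):

```bash
# Linux: host networking lets the UI reach the Llama Stack server on localhost:8321
podman run -d --rm --network=host --name=streamlit quay.io/redhat-et/streamlit_client:0.1.0

# The Streamlit UI should now respond on port 8501
curl -I http://localhost:8501
```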