
ramalama client #539

Open
ericcurtin opened this issue Jan 7, 2025 · 3 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

ericcurtin commented Jan 7, 2025

This would serve as a test tool for "ramalama serve" (and would also be useful as a follow-on feature for connecting to remote endpoints such as OpenAI, Perplexity, etc.).

It should behave identically to "ramalama run", except that it does not run inference within the client process; instead it sends a request to the server process and streams the results back.

It could exist here as Python, or it could be written in C++ and eventually contributed back to llama.cpp upstream.

ericcurtin added the enhancement and good first issue labels on Jan 7, 2025
ericcurtin commented:

This probably makes more sense as a ramalama-only tool written in Python, since the official OpenAI client library is in Python:

https://github.com/openai/openai-python
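
For illustration, a minimal sketch of such a client using the openai-python library, assuming "ramalama serve" exposes an OpenAI-compatible endpoint at http://localhost:8080/v1 (the URL, port, model name, and dummy API key here are assumptions, not ramalama's actual defaults):

```python
import sys

from openai import OpenAI

# Point the OpenAI client at the local server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

prompt = " ".join(sys.argv[1:]) or "Hello!"

# Stream the response back token by token, like "ramalama run" does.
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

The same client could talk to remote endpoints such as OpenAI or Perplexity by changing only base_url and api_key.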


ericcurtin commented Jan 7, 2025

Since this will rely on a Python library outside the standard library, by default we should execute it inside the container.
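
A minimal sketch of that dispatch, assuming podman is available on the host and assuming (hypothetically) that the client script ships as /usr/share/ramalama/ramalama-client.py inside the quay.io/ramalama/ramalama image:

```python
import subprocess
import sys


def run_client_in_container(args):
    # Run the client inside the container so the openai dependency does not
    # need to be installed on the host; --network host lets it reach the
    # local "ramalama serve" endpoint.
    cmd = [
        "podman", "run", "--rm", "-i", "--network", "host",
        "quay.io/ramalama/ramalama",
        "python3", "/usr/share/ramalama/ramalama-client.py", *args,
    ]
    return subprocess.call(cmd)


if __name__ == "__main__":
    sys.exit(run_client_in_container(sys.argv[1:]))
```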

ericcurtin commented:

This will also help us test vLLM, since vLLM only operates as a server.
