
ramalama client #539

Open
ericcurtin opened this issue Jan 7, 2025 · 3 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

ericcurtin commented Jan 7, 2025

This would serve as a test tool for "ramalama serve" (and would also be useful as a follow-on feature for connecting to remote endpoints such as OpenAI, Perplexity, etc.).

It should behave identically to "ramalama run", except that it does not run inference within the client process; instead it sends a request to the server process and streams the results back.

It could exist here as Python, or it could be written in C++ and eventually contributed back to llama.cpp upstream.

ericcurtin added the enhancement and good first issue labels on Jan 7, 2025
ericcurtin commented:

This probably makes more sense as a ramalama-only tool written in Python, since the official OpenAI client library is in Python:

https://github.com/openai/openai-python
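
For illustration, a minimal sketch of such a client using the openai-python library, assuming "ramalama serve" exposes an OpenAI-compatible endpoint at http://localhost:8080/v1 (the URL, port, model name, and dummy API key here are assumptions, not ramalama's actual defaults):

```python
import sys

from openai import OpenAI

# Point the OpenAI client at the local server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

prompt = " ".join(sys.argv[1:]) or "Hello!"

# Stream the response back token by token, like "ramalama run" does.
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

The same client could talk to remote endpoints such as OpenAI or Perplexity by changing only base_url and api_key.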


ericcurtin commented Jan 7, 2025

Since this will rely on a Python library outside the standard library, by default we should execute it inside the container.
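
A minimal sketch of that dispatch, assuming podman is available on the host and assuming (hypothetically) that the client script ships as /usr/share/ramalama/ramalama-client.py inside the quay.io/ramalama/ramalama image:

```python
import subprocess
import sys


def run_client_in_container(args):
    # Run the client inside the container so the openai dependency does not
    # need to be installed on the host; --network host lets it reach the
    # local "ramalama serve" endpoint.
    cmd = [
        "podman", "run", "--rm", "-i", "--network", "host",
        "quay.io/ramalama/ramalama",
        "python3", "/usr/share/ramalama/ramalama-client.py", *args,
    ]
    return subprocess.call(cmd)


if __name__ == "__main__":
    sys.exit(run_client_in_container(sys.argv[1:]))
```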

ericcurtin commented:

This will also help us test vLLM, since vLLM only operates as a server.
