This would serve as a test tool for "ramalama serve" (and would also be useful as a follow-on feature for connecting to remote endpoints such as OpenAI, Perplexity, etc.).

It should behave identically to "ramalama run", except that it does not run inference within the client process; instead, it sends a request to the server process and streams the results back.

It could live here as Python, or it could be written in C++ and eventually contributed back to llama.cpp upstream.
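As a rough illustration of the request/stream behavior described above, here is a minimal Python sketch. It assumes "ramalama serve" exposes a llama.cpp-style, OpenAI-compatible chat endpoint on localhost:8080 with server-sent-event streaming; the URL, port, and payload shape are assumptions, not confirmed defaults.

```python
import json
import sys

import requests  # third-party; stdlib urllib would also work

# Assumed endpoint for a local "ramalama serve" instance.
URL = "http://localhost:8080/v1/chat/completions"


def stream_chat(prompt: str) -> None:
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    with requests.post(URL, json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            # SSE framing: each chunk arrives as a "data: {...}" line.
            if not line or not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data.strip() == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content", "")
            sys.stdout.write(delta)
            sys.stdout.flush()
    print()


if __name__ == "__main__":
    stream_chat(" ".join(sys.argv[1:]) or "Hello!")
```

The same loop would work against remote OpenAI-compatible endpoints by swapping the URL and adding an Authorization header, which is what makes this a natural stepping stone to the remote-endpoint follow-on mentioned above.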