OpenAI API refactoring + Functions calling #2210
Conversation
Hey, @FlorianJoncour. Thanks for your work! I have tested this, but it doesn't work when using OpenAI's official example, because the current API server doesn't support messages containing an object, which is what the second request in that example sends.
For the min_p typo error, I think you could commit it in a separate PR.
I'm not sure which example you're referring to exactly, so I'm not sure if I've solved your issue, but there was indeed an error when generation was in stream mode: it should return an array. Btw, it should work now (using the official OpenAI library). Prompt:
Batch mode:
Stream mode:
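(The prompt and the batch/stream outputs referred to above aren't preserved in this thread. As a rough sketch of that kind of check, assuming a local vLLM server at a placeholder address, the official openai client can be exercised in both modes:)

```python
# Minimal sketch (not the original test): exercise batch and stream modes through
# the official openai client pointed at a local vLLM OpenAI-compatible server.
# base_url, api_key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
messages = [{"role": "user", "content": "Hello!"}]

# Batch mode: the full completion is returned in a single response.
resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(resp.choices[0].message.content)

# Stream mode: the completion arrives as a sequence of chunks.
stream = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
```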
May I add an example, since there are already examples to test the API?
@FlorianJoncour This code is copied from https://platform.openai.com/docs/guides/function-calling:
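(The pasted snippet isn't reproduced in this thread. Abridged, the flow from that guide looks roughly like the sketch below, with a placeholder weather tool; the key point is that the second request appends the assistant message object plus a tool-result message:)

```python
# Abridged sketch of the two-step flow from the OpenAI function-calling guide
# (not the exact code posted here). The second request appends the assistant
# message object plus a "tool" message, which is what the server rejected.
import json
from openai import OpenAI

client = OpenAI()

def get_current_weather(location, unit="celsius"):
    # Dummy implementation used in the guide's example.
    return json.dumps({"location": location, "temperature": "22", "unit": unit})

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Paris"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]

# First request: the model may answer with tool_calls instead of plain content.
first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages,
                                       tools=tools, tool_choice="auto")
assistant_msg = first.choices[0].message

if assistant_msg.tool_calls:
    messages.append(assistant_msg)  # message *object*, not a plain dict
    for call in assistant_msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "tool_call_id": call.id,
            "role": "tool",
            "name": call.function.name,
            "content": get_current_weather(**args),
        })
    # Second request: the model turns the tool results into a final answer.
    second = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    print(second.choices[0].message.content)
```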
Can you execute this example smoothly? The initial request is fine, but the second one isn't supported.
So you were right in your first message! Thank you. I added all the types for ChatCompletionRequest.messages as described in the OpenAI docs, and requests don't crash anymore. It should work now.
Hey, @FlorianJoncour. I tested the latest commit again. The
I use the default template. When I display the prompt server side right after tokenizer.apply_chat_template, I get this:
And so the final response is:
Don't mind the model="gpt-3.5-turbo"; I set this name for simplicity when working with various tools.
@FlorianJoncour I made some modifications to your code, and it works now. I might post my changes in the future. For LLM models, I think codellama-sft models are better (like phind-codellama).
Has the merge been completed? @FlorianJoncour @esmeetu
schema = json.dumps(tool.function.parameters,
                    indent=4)
text_inject += f"```\njsonschema\n{schema}\n```"
text_inject += (
Should this be hard coded? It seems like depending on how the model was trained for function calling, the specific injected text could have a large impact on performance of the model. Seems like a good idea to allow this to be configurable IMO.
Following what was suggested by @Tostino, I think it could be very handy to incorporate a tool_template.json path to personalize the tool response (probably keeping the current one as the default).
In particular, the way json_schema_params is constructed seems OK to me; I wouldn't change it.
Nevertheless, incorporating a tool_template.json like:
{
    "prefix": "str",
    "suffix": "str",
    "order": "top",
    "func_call_token": "str"
}
could provide enough freedom with regard to model adaptation when inject_prompt is performed. (For ease of understanding, please refer to the image below.)
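(The referenced image isn't reproduced in this thread. As a rough stand-in, here is a hypothetical sketch of how inject_prompt could consume such a template; the helper name build_tool_text and the use of only prefix/suffix are assumptions, not code from this PR:)

```python
# Hypothetical sketch of consuming a tool_template.json during prompt injection.
# Field names ("prefix", "suffix", "func_call_token") follow the JSON example above;
# build_tool_text is illustrative and not part of this PR.
import json

def build_tool_text(template_path: str, schema: str) -> str:
    with open(template_path) as f:
        tpl = json.load(f)
    # Wrap the tools' JSON schema with model-specific wording; func_call_token would
    # be the marker the model is asked to emit when it decides to call a tool.
    return f"{tpl['prefix']}\n{schema}\n{tpl['suffix']}"
```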
What do you think, @simon-mo?
It's a good idea and easy to implement.
If we need to implement templates, it means that ideally we should remove all text instructions in serving.py. So, we need to define a path to store a default tool_template.json.
The examples/ folder is not suitable.
There are four options:
- Hardcode it as a string in serving.py (which I don't like)
- The vllm/entrypoints/openai/ folder (I'm undecided)
- Define a vllm/data/ folder or something similar (I'm undecided)
- Add a command-line argument to allow users to define a configuration directory (or $HOME/.config/vllm/openai by default) and store data there.
I prefer the last option, knowing that a general-purpose directory might be used for other types of templates.
Besides function calls, LLMs (starting with ChatGPT) can be extended with embeddings (which could potentially be implemented in vLLM in the future), a dynamic Python interpreter, etc.
Again, I am not sure if this is the primary goal of vLLM, but having an OpenAI-compatible server with most features ready to use seems as important as inference performance!
The OpenAI's documentation gives an idea of what could potentially be implemented.
https://platform.openai.com/docs/assistants/tools
Edit: Another option could be a command-line argument to define the path to a single file, as is already the case with chat-template. Maybe that would be preferred, but I think it's less general, and if we take this approach, in a year we might end up with 4 or 5 similar arguments of the same kind.
I'd follow the same approach as chat-template: a command-line arg with a path.
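(For concreteness, a sketch of what that could look like; the --tool-template flag name is hypothetical and simply mirrors the existing --chat-template argument:)

```python
# Hypothetical sketch: expose the tool template the same way the chat template is
# exposed today. --tool-template does not exist in this PR; --chat-template does.
import argparse

parser = argparse.ArgumentParser(description="OpenAI-compatible vLLM server (excerpt)")
parser.add_argument("--chat-template", type=str, default=None,
                    help="Path to the chat template (already supported).")
parser.add_argument("--tool-template", type=str, default=None,
                    help="Path to a tool_template.json used when injecting tool prompts.")
args = parser.parse_args()
```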
vllm/entrypoints/openai/protocol.py (Outdated)
@@ -52,9 +52,60 @@ class UsageInfo(BaseModel):
    completion_tokens: Optional[int] = 0

class FunctionCall(BaseModel):
It would probably be cleaner to use the same class names as OAI for both FunctionCall and ToolCallsMessage.
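(For reference, a rough sketch of what aligning with the OpenAI naming could look like; field names follow the OpenAI API reference, and the exact definitions in this PR may differ:)

```python
# Rough sketch only: pydantic models named after the OpenAI API objects.
from pydantic import BaseModel

class FunctionCall(BaseModel):
    name: str
    arguments: str  # JSON-encoded arguments string, as in the OpenAI API

class ChatCompletionMessageToolCall(BaseModel):
    id: str
    type: str = "function"
    function: FunctionCall
```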
I apologize for the delay, but I am French and vacations are important! I have significantly improved the parsing to capture function calls even if the model doesn't start its message with !function_call, which greatly reduces the number of false negatives. I also added a usage example based on OpenAI's documentation (mentioned by @esmeetu), extended to use two functions. Finally, I also renamed some types to match the types defined in OpenAI's implementation, as suggested by @AguirreNicolas. The current implementation seems good to me, but I'm not sure if it will be merged, as it's not the primary goal of vLLM. If it's not merged, I will publish it in a separate repository.
Thank you for your contribution! I will make a pass soon.
Yes, I will make sure this gets into vLLM. In fact, it is one of our priorities.
I would agree with @esmeetu's suggestion here. Can you keep this PR to just the function call implementation? It is complex enough to be reviewed. The refactoring makes it hard to see the diff brought by the function call.
I think vLLM can definitely ship with a default set of templates if needed, as part of the package data. A user-configurable override should then always be possible. Note that other models tuned for tool use (ChatGLM) have a different format: https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md#tool-calling. Airoboros also has a slightly different input format: https://github.com/cpacker/MemGPT/blob/febee38db315b17da393bc0849392225ee5cd7b4/memgpt/local_llm/llm_chat_completion_wrappers/airoboros.py#L293-L316
OK, so I'll close this PR and do it in multiple stages. Edit: #2360
Thank you ❤️
This PR should help in generating function calls that are always valid according to their schema: #2105: Add grammars
@FlorianJoncour It is worth noting that when trying this out yesterday with Autogen, I had to manually add "tool_choice": "auto" to my llm_config using the latest autogen 0.2.6 in order to get it to pick up the tools and inject them into the prompt. It seems like Autogen doesn't pass a tool choice by default, so the if request.tool_choice is not None check is never satisfied:
# Had to add tool_choice for Autogen 0.2.6
llm_config = {
    "seed": 42,
    "config_list": config_list,
    "temperature": 0.0,
    "timeout": 3000,
    "tool_choice": "auto",
}
@FlorianJoncour I was facing the same problem, but using the ChatOpenAI class in LangChain. Until @wybartel's comment (btw, thanks a lot!) I couldn't find the reason why your tool feature was not being called.
def inject_prompt(self, request: ChatCompletionRequest):
    """ Tested with:
    https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B/discussions/3 """
    if request.tool_choice is not None and request.tools is not None and request.tool_choice == "auto":
It would be nice to have support for tool_choice with a function name, to force the model to call that function. It'd probably need a different prompt for that.
The tool_choice definition from https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice:
Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function.
none is the default when no functions are present. auto is the default if functions are present.
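(For illustration, a request of that forced-function shape would look roughly like this; sketch with the official openai client, where the base_url, api_key, and tool definition are placeholders:)

```python
# Sketch of the forced-function case quoted above: tool_choice names a specific function.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"}},
                       "required": ["location"]},
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
```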
@Pernekhan @AguirreNicolas @wybartel The new PR fixes all these issues: #2488
Hello,
I needed function call handling, so I implemented it directly in vLLM.
To do this, there are two major changes.
First, the OpenAI API layer has been refactored to separate the server from the generation logic. This makes the whole thing a bit clearer and also allows an OpenAI server to be created outside of Uvicorn.
Next, I implemented something similar to the "tools" in the OpenAI API. (Tools are more general replacements for direct function calls, which are now deprecated).
For now, only function calls are supported.
The system (like OpenAI apparently) works by injecting a prompt and capturing the result.
https://platform.openai.com/docs/guides/function-calling
I developed with the NeuralHermes-2.5-Mistral-7B model, and it works very well.
Some notes:
Improvement points:
Most current models are not fine-tuned for making function calls, and I haven't found a reference for a specific token indicating a function call by the model.
So, I defined something that should work well.
But since prompts now use templates, maybe a future update could use them to define a function call token and thus force future models to also use one, which should make capturing function calls more reliable.
And maybe the injected prompt could also be put into a template.
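(To make the capture idea concrete, here is a rough illustrative sketch, not the PR's actual code, of detecting the function-call marker in the generated text and parsing the payload that follows:)

```python
# Illustrative sketch of the capture step described above, not the PR's implementation.
# The "!function_call" marker matches the one mentioned earlier in this thread; the
# expected payload shape is assumed to be {"name": "...", "arguments": {...}}.
import json
from typing import Optional

FUNC_CALL_TOKEN = "!function_call"

def try_parse_function_call(generated_text: str) -> Optional[dict]:
    text = generated_text.strip()
    # Search anywhere in the output, not just at the start, to reduce false negatives.
    marker = text.find(FUNC_CALL_TOKEN)
    if marker == -1:
        return None  # plain assistant message, no tool call
    payload = text[marker + len(FUNC_CALL_TOKEN):].strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None
```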