Describe the issue as clearly as possible:

A vLLM request crashes with a 500 Internal Server Error if the cfg request parameter is specified.

Steps/code to reproduce the bug:

Follow the vLLM tutorial: https://outlines-dev.github.io/outlines/reference/vllm/

Since outlines[serve] installs old versions, install vLLM 0.2.6 and the latest Outlines, plus Pydantic 2.0, using pip. (See the actual list of package versions below.)
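The install step was roughly along these lines (illustrative only; the exact version pins may differ):

pip install vllm==0.2.6 "pydantic>=2.0"
pip install --upgrade outlines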
Host this model (24GB VRAM): deepseek-ai/deepseek-coder-6.7b-instruct

Command:
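My exact launch command is not captured here; per the tutorial, the server is started with an invocation of roughly this form (model name substituted):

python -m outlines.serve.serve --model deepseek-ai/deepseek-coder-6.7b-instruct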
Request URL: http://127.0.0.1:8000/generate

Send a request with this body:

{
"prompt": "You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n\nYou are a helpful AI assistant. You give concise answers. If you do not know something, then say so.\n### Instruction:\nWrite down the first 10 prime numbers as a comma separated list.\n\n### Response:\n",
"n": 1,
"best_of": 1,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"repetition_penalty": 1.0,
"temperature": 0.0,
"top_p": 1.0,
"top_k": -1,
"min_p": 0.0,
"use_beam_search": false,
"length_penalty": 1.0,
"early_stopping": false,
"stop": [],
"stop_token_ids": [],
"include_stop_str_in_output": false,
"ignore_eos": false,
"max_tokens": 50,
"logprobs": null,
"prompt_logprobs": null,
"skip_special_tokens": true,
"spaces_between_special_tokens": true,
"cfg": "\\\n?start: DIGIT+ ( \",\" DIGIT+ )* _WS?\n%import common.DIGIT\n%import common.WS -> _WS\n"
}
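For reference, the body can be POSTed from Python as follows (a minimal reproduction sketch; it assumes the JSON above has been saved as request.json and that the requests package is installed):

import json

import requests

# Load the request body shown above and send it to the local Outlines/vLLM server.
with open("request.json") as f:
    body = json.load(f)

resp = requests.post("http://127.0.0.1:8000/generate", json=body)
print(resp.status_code)  # 500 as soon as the "cfg" field is included
print(resp.text)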
The above request crashes with a 500 Internal Server Error; see the server-side exception below.
The same grammar has been tested OK with Lark without vLLM. The grammar includes optional whitespace at the end, because this model seems to require it in order to stop.
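The standalone check looked roughly like this (independent of vLLM/Outlines; the leading backslash line continuation from the JSON string is dropped here):

from lark import Lark

# The grammar from the "cfg" field above, written out plainly.
GRAMMAR = r"""
?start: DIGIT+ ( "," DIGIT+ )* _WS?

%import common.DIGIT
%import common.WS -> _WS
"""

parser = Lark(GRAMMAR)  # the default start rule is "start"

# A conforming answer parses; the optional trailing whitespace is accepted too.
tree = parser.parse("2,3,5,7,11,13,17,19,23,29\n")
print(tree.pretty())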
Also, please rename the cfg request parameter to grammar or lark_grammar, thanks!
Error message:
INFO: 192.168.1.70:56873 - "POST /generate HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/viktor/env/outlines/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/viktor/env/outlines/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/routing.py", line 762, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/routing.py", line 782, in app
await route.handle(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/home/viktor/env/outlines/lib/python3.10/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
File "/home/viktor/env/outlines/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/home/viktor/env/outlines/lib/python3.10/site-packages/outlines/serve/serve.py", line 75, in generate
sampling_params = SamplingParams(
TypeError: SamplingParams.__init__() got an unexpected keyword argument 'cfg'
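The traceback shows that outlines/serve/serve.py splats the whole request body into vLLM's SamplingParams, which has no cfg keyword. A rough sketch of the kind of separation that would avoid the crash (hypothetical helper, not the actual Outlines code; the grammar would still need to be wired into a logits processor):

from vllm import SamplingParams

def split_request(request_dict: dict) -> tuple[str | None, SamplingParams]:
    """Pull Outlines-specific fields out of the body before building SamplingParams."""
    request_dict = dict(request_dict)        # avoid mutating the caller's dict
    request_dict.pop("prompt", None)         # handled separately by the endpoint
    grammar = request_dict.pop("cfg", None)  # must not reach SamplingParams(**...)
    return grammar, SamplingParams(**request_dict)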
Outlines/Python version information:

Context for the issue:

Due to this bug it is not possible to use a grammar with vLLM at all.

I can only use vLLM, because it has the best throughput and correctness. The GBNF grammar seems to have problems in llama.cpp, and the throughput there is ~10x worse.

Grammar support would help avoid repeated queries caused by LLM output that does not conform to the expected format, which is an essential efficiency improvement in most real-world tasks.