[BUG] Running with TransformersModel does not work #414

Open
danielkorat opened this issue Jan 29, 2025 · 6 comments

Labels
bug Something isn't working

@danielkorat
Describe the bug
When replacing HfApiModel with TransformersModel in examples/benchmark.ipynb, the eval results for meta-llama/Llama-3.1-8B-Instruct (and various other published models) are far worse than the published numbers (scores below 5).
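For context, the swap in question looks roughly like this (a simplified sketch, not the exact notebook code; the prompt is an arbitrary example):

```python
from smolagents import CodeAgent, HfApiModel, TransformersModel

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Original notebook: remote inference via the HF Inference API
# model = HfApiModel(model_id)

# This issue: local inference via transformers
model = TransformersModel(model_id)

agent = CodeAgent(tools=[], model=model)
print(agent.run("How many seconds are there in a leap year?"))
```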

Code to reproduce the error
https://github.com/danielkorat/smolagents/blob/transformers/examples/benchmark-transformers.ipynb

Error logs (if any)
A big part of the problem seems to be the parsing of the LLM output (specifically the assistant role):

(screenshot of the malformed assistant-role output omitted)

Also, the regex parsing error arises in nearly all examples.

Expected behavior
The results for meta-llama/Llama-3.1-8B-Instruct should be reproducible as published in the original notebook:

(screenshot of the published benchmark results omitted)

Package versions:

>>> smolagents.__version__
'1.5.0.dev'

Additional context

accelerate==1.3.0
datasets==3.1.0
matplotlib==3.10.0
matplotlib-inline==0.1.7
numpy==1.26.4
seaborn==0.13.2
sentence-transformers==3.3.0
sympy==1.13.1
transformers==4.48.1
danielkorat added the bug label on Jan 29, 2025
@danielkorat
Author

danielkorat commented Jan 29, 2025

Hi @aymeric-roucher 👋
Note that this means smolagents cannot currently be used with local deployments.

danielkorat changed the title from "[BUG] benchmarking with TransformersModel does not work" to "[BUG] Running with TransformersModel does not work" on Jan 29, 2025
@ryantzr1

hi @aymeric-roucher @danielkorat 👋

I'm also facing this issue when trying the text_to_sql.py example with TransformersModel() instead of HfApiModel(). The agent fails with the same regex error when generating the SQL query.

Code to reproduce the error
https://github.com/ryantzr1/smolagents/blob/test-sql-example/examples/text_to_sql.py
Error Log
(screenshot of the regex parsing error omitted)

I'm currently trying to build a LlamaCppModels class to allow users to work with llama.cpp models via llama-cpp-python. If I find a fix for TransformersModel(), I'll update.

@nickvdw

nickvdw commented Jan 30, 2025

Quoting @ryantzr1's comment above (the same regex error with text_to_sql.py and TransformersModel()):

I've seen such code parsing errors before while using TransformersModel(). For me, passing a max_new_tokens argument (e.g., 4096) to TransformersModel() seemed to help, as suggested in #201 (comment). However, I'm not sure whether it will also help in your case.
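For reference, a minimal sketch of that workaround (a hedged example; the model ID and task are assumptions, not taken from the benchmark):

```python
from smolagents import CodeAgent, TransformersModel

# Assumption: any locally runnable instruct model works here.
model = TransformersModel(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    max_new_tokens=4096,  # raise the low transformers default, which truncates agent output
)

agent = CodeAgent(tools=[], model=model)
agent.run("Which number is larger, 9.11 or 9.8?")
```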

As a side note, it'd be great to have a LlamaCppModels class among others (e.g., vLLM, ONNX Runtime, etc.).

@aymeric-roucher
Collaborator

Thank you folks for reporting!
I agree with @nickvdw that the generation was interrupted, which a higher max_new_tokens parameter should prevent!

@danielkorat
Author

Regarding the suggestion to raise max_new_tokens: FYI, I tried that, but it still did not solve the issue.

@matfrei

matfrei commented Feb 6, 2025

I encountered the same issue today. After a little digging, the problem seems to be that TransformersModel does not set a default value for max_new_tokens, so the transformers default of 20 tokens is used, which is far too low for any agentic task.

Explicitly passing max_new_tokens to the TransformersModel() constructor, as @aymeric-roucher and @nickvdw have suggested, certainly helps, but to keep other people from running into this, it might be nice to set a default value of, say, 4096 tokens here (or maybe in self.kwargs["max_tokens"] in the constructor, though that's a bit less transparent), as sketched below.
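A rough illustration of what that default could look like (a simplified sketch, not the actual smolagents source; the attribute names are assumptions):

```python
class TransformersModel(Model):
    def __init__(self, model_id: str, max_new_tokens: int = 4096, **kwargs):
        # Give max_new_tokens a generous default instead of inheriting the
        # transformers default of 20, which truncates agent steps.
        super().__init__(**kwargs)
        self.model_id = model_id
        self.max_new_tokens = max_new_tokens
```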

Happy to submit a small pull request if needed.
