Add support for gemma3-text by thewh1teagle · Pull Request #70 · huggingface/optimum-onnx

thewh1teagle · 2025-10-04T21:42:21Z

Added support for gemma3-text following the code in:

[Gemma3] Add VLM support (need help) #50

Also added a working example with gemma3-270m-instruct.

Will update and improve as needed.

thewh1teagle · 2025-10-04T22:12:58Z

Happy to add more examples if they're welcome. For example, this one is hugely useful for models fine-tuned with LoRA:


"""Simple example: Export Gemma3 270M with LoRA adapter to ONNX and generate text.

Usage:
    uv pip install onnxruntime peft
    uv run examples/gemma3.py
"""

import time

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

from optimum.exporters.onnx import onnx_export_from_model
from optimum.onnxruntime import ORTModelForCausalLM


# Load base model and merge with LoRA adapter
base_model_id = "google/gemma-3-270m-it"  # The base model for your LoRA
adapter_id = "thewh1teagle/gemma3-heb-g2p"

base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()  # Merge LoRA weights into base model

tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Export merged model to ONNX
print("Exporting to ONNX...")
output_dir = "gemma3_onnx"
onnx_export_from_model(
    model=model,
    output=output_dir,
    task="text-generation-with-past"
)

# Save tokenizer to the same directory
tokenizer.save_pretrained(output_dir)

# Load the exported ONNX model
ort_model = ORTModelForCausalLM.from_pretrained(output_dir)

# Chat with instruction-tuned model
system_message = """Given the following Hebrew sentence, convert it to IPA phonemes.
Input Format: A Hebrew sentence.
Output Format: A string of IPA phonemes.
"""

user_prompt = "אז מה דעתך, האם אתה יודע לדבר עברית גם כמו שאני יודע לדבר או שאתה לא?"

conversation = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]

prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate with sampling parameters similar to the working Ollama script
start_time = time.time()
outputs = ort_model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,  # temperature/top_p/top_k only take effect when sampling is enabled
    temperature=0.9,
    top_p=0.95,
    top_k=64,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.convert_tokens_to_ids(["<end_of_turn>", "</s>"]),
)

# Decode only the newly generated tokens so the prompt is not echoed back.
# (With skip_special_tokens=True the turn markers are stripped, so splitting on
# "<start_of_turn>model" would not reliably isolate the reply.)
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

print(response.strip())

print(f"Time taken: {time.time() - start_time:.2f} seconds")

bil-ash · 2025-10-05T02:14:18Z

Looking forward to gemma3n multimodal support

IlyasMoutawwakil · 2025-10-05T07:11:33Z

Thanks for the addition! I don't think an example script is the best way. Maybe it would be better to add the snippet to the documentation under a relevant section; turning it into a notebook could also be very useful!
Can you please add the model types gemma3 and gemma3_text to the testing files (tests/onnxruntime/test_decoder.py, tests/onnxruntime/testing_utils.py and tests/exporters/onnx), along with a tiny model id from the Hub for testing?
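
As a rough illustration of the request above, the sketch below keeps a mapping from model type to a tiny test checkpoint and reports any supported model type that lacks one. All names here (`SUPPORTED_MODEL_TYPES`, `TINY_TEST_MODELS`, `missing_test_models`) and the repository ids are placeholders invented for this sketch; the actual structures live in the test files named above and may look different.

```python
# Placeholder data for illustration only; the repository ids are not real Hub repos.
SUPPORTED_MODEL_TYPES = ["gemma", "gemma3", "gemma3_text"]

TINY_TEST_MODELS = {
    "gemma": "your-org/tiny-random-gemma",
    "gemma3_text": "your-org/tiny-random-gemma3-text",
}


def missing_test_models(supported, tiny_models):
    """Return the supported model types that have no tiny test checkpoint."""
    return sorted(set(supported) - set(tiny_models))


# "gemma3" is supported but has no test checkpoint yet, so it is reported.
print(missing_test_models(SUPPORTED_MODEL_TYPES, TINY_TEST_MODELS))

# Registering a checkpoint for it closes the gap.
TINY_TEST_MODELS["gemma3"] = "your-org/tiny-random-gemma3"
print(missing_test_models(SUPPORTED_MODEL_TYPES, TINY_TEST_MODELS))
```

A check like this is cheap to run before opening a PR and makes a forgotten test entry obvious.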

IlyasMoutawwakil · 2025-10-05T11:27:13Z

            CohereRotaryEmbedding.forward = self.original_forward
+
+
+class Gemma3LMModelPatcher(DecoderModelPatcher):


Did you try exporting without this patcher? (It might not be necessary for text-only generation.)

Yup, you're right.
I removed them and the tests are working.

IlyasMoutawwakil · 2025-10-05T11:27:42Z

no need for the lock file 🤗

thewh1teagle · 2025-10-15T03:28:39Z

@IlyasMoutawwakil
I think a lot of people would benefit from an examples folder with sample scripts. It's a big win for dev experience. Why not add it?
I don't have much time to keep working on this PR, so just checking in advance: does it require any more work beyond what you mentioned in your last comment?

IlyasMoutawwakil · 2025-10-16T07:19:34Z

@thewh1teagle

> I think a lot of people would benefit from an examples folder with sample scripts.

Yeah, of course, feel free. My proposal is simply to put it in the docs for better visibility.

> does it require any more work beyond what you mentioned in your last comment?

Yes, it needs tests (you can see how they were added in https://github.com/huggingface/optimum-onnx/pull/43/files 🤗)

fosple · 2025-10-16T21:54:57Z

@thewh1teagle
I just found this. Maybe this helps with the implementation:
Convert_Gemma_3_270M_to_ONNX.ipynb

Build script from Xenova:
build_gemma.py


thewh1teagle · 2025-10-18T03:28:59Z

@IlyasMoutawwakil
Since optimum-onnx doesn't have an official website (maybe some docs on HF), having an examples folder is great and gives good visibility. People tend to look for one, in my experience.

thewh1teagle · 2025-10-18T03:37:53Z

Added tests and verified with

uv run --extra tests --extra onnxruntime pytest tests/onnxruntime/test_decoder.py -k "gemma3" -v

thewh1teagle · 2025-10-18T03:49:11Z

@IlyasMoutawwakil
Also, I added a mention of the examples folder to the main README. Feel free to modify anything as you prefer.

IlyasMoutawwakil · 2025-10-18T07:10:13Z

> Since optimum-onnx doesn't have an official website (maybe some docs on HF), having an examples folder is great and gives good visibility. People tend to look for one, in my experience.

We do have docs 😥 The repo's main page links to them right under the description, and also in the README if you click on "Documentation":
https://huggingface.co/docs/optimum-onnx/en/quickstart

HuggingFaceDocBuilderDev · 2025-10-18T07:12:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

thewh1teagle · 2025-10-19T01:55:27Z

@IlyasMoutawwakil
Maybe it's not about optimum-onnx itself; the Hugging Face docs template is just not friendly in general.

Many of the navigation buttons are unclear.

IlyasMoutawwakil · 2025-10-21T09:57:41Z

@thewh1teagle what's not clear exactly ? btw it's open source so contributions are welcome https://github.com/huggingface/doc-builder

IlyasMoutawwakil · 2025-10-21T10:00:24Z

@bot \style

geraldstanje1 · 2025-10-22T01:01:54Z

hi @thewh1teagle does that also work with gemma3 4b model?

thewh1teagle · 2025-10-22T01:20:00Z

@IlyasMoutawwakil my feedback was just to let you know so you can potentially improve it. not a complaint! :) I really appreciate the work you’re doing on this open source library (and open source projects in general from HF)

IlyasMoutawwakil · 2025-10-22T05:41:35Z

+@register_tasks_manager_onnx("gemma3", *COMMON_TEXT_GENERATION_TASKS)
+@register_tasks_manager_onnx("gemma3_text", *COMMON_TEXT_GENERATION_TASKS)
+class Gemma3OnnxConfig(GemmaOnnxConfig):
+    MIN_TRANSFORMERS_VERSION = version.parse("4.52.0")


any reason why not 4.50.0 ?
https://github.com/huggingface/transformers/blob/v4.50.0/src/transformers/models/gemma3/modeling_gemma3.py

I bumped to 4.53.0 and added a comment about why
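
For readers unfamiliar with the pattern in the hunk above, here is a minimal sketch of how stacked registration decorators and a minimum-version gate can work together. It is not optimum's implementation: `register`, `lookup`, `parse_version`, `TEXT_TASKS_SKETCH` and the `*Sketch` classes are hypothetical names, and `parse_version` is a simplified stand-in for `packaging.version.parse` that only handles plain `X.Y.Z` strings. The `4.53.0` value is the one mentioned in the reply above.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: model type -> (config class, supported tasks).
_REGISTRY: Dict[str, Tuple[type, Tuple[str, ...]]] = {}

# Placeholder task list for this sketch only.
TEXT_TASKS_SKETCH = ("text-generation", "text-generation-with-past")


def parse_version(text: str) -> Tuple[int, ...]:
    """Simplified stand-in for packaging.version.parse (plain X.Y.Z only)."""
    return tuple(int(part) for part in text.split("."))


def register(model_type: str, *tasks: str) -> Callable[[type], type]:
    """Class decorator that records a config class under a model type."""
    def decorator(config_cls: type) -> type:
        _REGISTRY[model_type] = (config_cls, tasks)
        return config_cls  # returning the class lets decorators be stacked
    return decorator


class GemmaConfigSketch:
    """Illustrative base config."""


@register("gemma3", *TEXT_TASKS_SKETCH)
@register("gemma3_text", *TEXT_TASKS_SKETCH)
class Gemma3ConfigSketch(GemmaConfigSketch):
    MIN_TRANSFORMERS_VERSION = parse_version("4.53.0")


def lookup(model_type: str, task: str, installed: str) -> type:
    """Resolve a config class, refusing unsupported tasks or old versions."""
    config_cls, tasks = _REGISTRY[model_type]
    if task not in tasks:
        raise ValueError(f"{model_type} does not support task {task!r}")
    if parse_version(installed) < config_cls.MIN_TRANSFORMERS_VERSION:
        raise RuntimeError(f"{model_type} needs a newer transformers than {installed}")
    return config_cls


print(lookup("gemma3_text", "text-generation", "4.53.0").__name__)
```

Stacking the decorator registers the same class under both model types, which is why one config can serve `gemma3` and `gemma3_text`.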

IlyasMoutawwakil · 2025-10-22T06:57:53Z


 @register_tasks_manager_onnx("gemma", *[*COMMON_TEXT_GENERATION_TASKS, "text-classification"])
-class GemmaOnnxConfig(LlamaOnnxConfig):
+class GemmaOnnxConfig(TextDecoderOnnxConfig):


I discovered that gemma models in general don't need the position ids argument

@echarlaix wdyt ? this also removes the need for position ids from gpt_oss and nemotron
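
The thread does not spell out why the argument can be dropped. One plausible reading is that, when `position_ids` is not passed, decoder implementations commonly fall back to consecutive positions starting at the number of cached tokens. The sketch below mimics that fallback with plain Python lists and compares it with positions derived from an attention mask. Both helpers are hypothetical illustrations, not functions from optimum or transformers, and the comparison assumes nothing about how any particular model handles padding.

```python
def default_position_ids(past_length, new_tokens):
    """Consecutive positions for the tokens fed in this step."""
    return list(range(past_length, past_length + new_tokens))


def positions_from_mask(attention_mask):
    """Positions derived from a mask: running count of real tokens minus one,
    clamped at 0 so padding slots do not go negative."""
    positions, seen = [], 0
    for bit in attention_mask:
        seen += bit
        positions.append(max(seen - 1, 0))
    return positions


# Prefill of a 5-token prompt, then one decoding step with 5 cached tokens.
prefill = default_position_ids(0, 5)
step = default_position_ids(5, 1)
print(prefill, step)

# Without padding, the fallback and the mask-derived positions agree.
print(positions_from_mask([1, 1, 1, 1, 1]) == prefill)

# With left padding they differ, which is the case to test before
# relying on the fallback.
print(positions_from_mask([0, 0, 1, 1, 1]))
```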

IlyasMoutawwakil

Thanks a lot for the contribution! I made some changes to make sure all tests pass with the minimum transformers version!

thewh1teagle force-pushed the feat/gemma3-text branch from 97a3e2c to 3f08a26 on October 4, 2025 22:00

thewh1teagle mentioned this pull request Oct 4, 2025

Exporting google/gemma-3n-e4b-it language_model (decoder) into ONNX format #56

Closed

IlyasMoutawwakil reviewed Oct 5, 2025


thewh1teagle mentioned this pull request Oct 15, 2025

Add onnx inference thewh1teagle/gemma3-g2p#1

Closed

thewh1teagle force-pushed the feat/gemma3-text branch from 3f08a26 to e663633 on October 15, 2025 03:23

IlyasMoutawwakil mentioned this pull request Oct 17, 2025

Exporting Gemma3 4B? #75

Closed

thewh1teagle added 2 commits October 18, 2025 06:24

feat: add support for gemma3-text

9744167

simplify the example

make uv lock like main

9824aec

thewh1teagle force-pushed the feat/gemma3-text branch from 7c35fdf to 9824aec on October 18, 2025 03:24

add gemma3 text tests

9a2a513

thewh1teagle added 2 commits October 18, 2025 06:45

remove patcher for gemma3 text

4f43910

Update README.md to include examples section for ONNX model usage

9de0ce9

IlyasMoutawwakil reviewed Oct 21, 2025


Comment thread tests/exporters/onnx/utils_tests.py Outdated

Apply suggestion from @IlyasMoutawwakil

1d70ef1

IlyasMoutawwakil reviewed Oct 22, 2025


Comment thread optimum/exporters/onnx/model_configs.py Outdated

Apply suggestion from @IlyasMoutawwakil

ab9669c

IlyasMoutawwakil reviewed Oct 22, 2025


IlyasMoutawwakil and others added 5 commits October 22, 2025 07:44

Add gemma3 and gemma3_text to model list

82c5a7d

Update optimum/exporters/onnx/model_configs.py

14cec9f

Merge branch 'main' into pr/thewh1teagle/70

7f1249f

gemma models don't really need position ids

801afdb

hybrid cache exception

7c86839

IlyasMoutawwakil reviewed Oct 22, 2025


gpt_oss and nemotron follow

94b7645

IlyasMoutawwakil approved these changes Oct 22, 2025


IlyasMoutawwakil merged commit f5df6b5 into huggingface:main Oct 22, 2025
29 of 36 checks passed

This was referenced Oct 22, 2025

feat: Add native Gemma3 support for ONNX export #87

Closed

Exporting Gemma-3 models to ONNX is broken #45

Closed

[Gemma3] Add VLM support (need help) #50

Open
