Describe the bug
On Xeon 6, when Gemma3 is executed in fp16, it does not generate any output tokens. The same issue is also observed on an NVIDIA GPU when run in Colab.
To Reproduce
The issue can be reproduced with this Colab notebook: https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/core/keras_inference.ipynb by enabling mixed precision before loading the model:
import keras
import keras_hub

keras.mixed_precision.set_global_policy("mixed_float16")

gemma_lm = keras_hub.models.Gemma3CausalLM.from_preset(
    "gemma3_instruct_4b"
)
gemma_lm.generate("what is keras in 3 bullet points?", max_length=64)

The generate call returns without producing any new output tokens beyond the prompt.
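For comparison, here is a minimal sanity check (a sketch, assuming nothing else in the notebook is changed): running the same call under the default float32 policy, where generation is expected to work.

import keras
import keras_hub

# Default full-precision policy; generation works here,
# which isolates the failure to mixed_float16.
keras.mixed_precision.set_global_policy("float32")

gemma_lm = keras_hub.models.Gemma3CausalLM.from_preset(
    "gemma3_instruct_4b"
)
print(gemma_lm.generate("what is keras in 3 bullet points?", max_length=64))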
Expected behavior
It should generate relevant output in response to the prompt.
Additional context
When changes similar to those in the HF PR huggingface/transformers#36832 are applied to the keras-hub Gemma3 model, it does generate new output. Failing that, fp16 generation needs to be fixed some other way, so that relevant output is produced on hardware that supports fp16; a sketch of the general pattern follows.
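As an illustration of the kind of fix involved, one common pattern for this class of fp16 bug is to upcast numerically sensitive ops to float32 and cast the result back, similar in spirit to what the linked transformers PR does. The sketch below applies the idea to a generic Gemma-style RMSNorm layer; the class name, signature, and epsilon value are illustrative assumptions, not the actual keras-hub implementation:

import keras
from keras import ops

class Float32RMSNorm(keras.layers.Layer):
    """Hypothetical RMSNorm that computes in float32 under mixed_float16."""

    def __init__(self, epsilon=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def build(self, input_shape):
        # Gemma-style scale, applied as (1 + scale) in call().
        self.scale = self.add_weight(
            name="scale", shape=(input_shape[-1],), initializer="zeros"
        )

    def call(self, x):
        input_dtype = x.dtype
        # The mean of squares can overflow or lose precision in float16,
        # so do the normalization math in float32.
        x = ops.cast(x, "float32")
        variance = ops.mean(ops.square(x), axis=-1, keepdims=True)
        x = x * ops.rsqrt(variance + self.epsilon)
        x = x * (1.0 + ops.cast(self.scale, "float32"))
        # Cast back so the surrounding float16 compute is unaffected.
        return ops.cast(x, input_dtype)

Where exactly the upcast is needed (norm layers, attention logits, or the final logit computation) is the part that has to be ported from the transformers fix.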
Would you like to help us fix it?
I ported the changes from HF to keras-hub and created a PR for reference.