[DeepSeek R1] Qwen2.5 Distillations #2236
Conversation
Here's a small suggestion: you could try evaluating performance on a math dataset. If the final results are comparable to those achieved by vLLM, we can ignore this error.
@DavidLandup0 sounds like what we really need here is the ability to combine a QwenBackbone with a DeepSeek tokenizer? If so, I think we might be able to relax our requirements so that a high-level task (e.g. QwenCausalLM) could use a tokenizer from DeepSeek. This is something I have thought we probably need anyway. I'll try to make a PR showing the basic loading changes, but let me know what you think conceptually!
Fundamentally, yes. We're looking to switch the tokenizer for an existing workflow/backbone. Having a general builder that accepts arbitrary tokenizers and backbones would be beneficial across the board, since it's not uncommon for people to mix and match tokenizers with models. To unblock this PR: what do you think about going forward with the API we have right now, and then updating this once we allow mix-and-match? As-is, we only have a single file (the DeepSeekQwen2 model) duplicated, so it will be easy to switch once we support the new feature.
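For reference, the mix-and-match construction discussed here might look roughly like the following. This is a hypothetical sketch using stand-in classes that mirror keras_hub's "task = backbone + preprocessor" pattern; the class names and constructors are illustrative, not the actual keras_hub API.

```python
# Hypothetical sketch of mixing a Qwen backbone with a DeepSeek tokenizer.
# All classes below are stand-ins, NOT the real keras_hub classes.
class QwenBackbone:
    pass

class DeepSeekR1QwenTokenizer:
    pass

class QwenCausalLMPreprocessor:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

class QwenCausalLM:
    def __init__(self, backbone, preprocessor):
        self.backbone = backbone
        self.preprocessor = preprocessor

# A high-level task assembled from a Qwen backbone and a DeepSeek tokenizer,
# without needing a duplicated model class.
lm = QwenCausalLM(
    backbone=QwenBackbone(),
    preprocessor=QwenCausalLMPreprocessor(tokenizer=DeepSeekR1QwenTokenizer()),
)
```

The point of the relaxed requirement is exactly this: the task stops asserting that its tokenizer class matches its backbone family.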
/gemini review |
Code Review
This PR introduces a new family of Qwen2.5 models distilled from DeepSeek-R1, including separate tokenizer and preprocessor classes. The code includes new model, preprocessor, and tokenizer files for the DeepSeek-R1-Distill-Qwen models. There are duplicate imports in keras_hub/api/models/__init__.py and keras_hub/api/tokenizers/__init__.py that should be addressed.
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1Qwen2CausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1QwenCausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1Qwen2CausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1QwenCausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)
There appear to be duplicate imports with different aliases. DeepSeekR1QwenCausalLM, DeepSeekR1QwenCausalLMPreprocessor, and DeepSeekR1QwenTokenizer are each imported twice, once with an alias ending in 2 and once without. This can lead to confusion and potential errors if the wrong alias is used. Consider removing the duplicate imports or ensuring the aliases are used consistently.
Suggested change — replace:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1Qwen2CausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1QwenCausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1Qwen2CausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1QwenCausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)

with single, unaliased imports:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)
Similar to the models/__init__.py file, there appear to be duplicate imports with different aliases. DeepSeekR1QwenTokenizer is imported twice, once with an alias ending in 2 and once without. This can lead to confusion and potential errors if the wrong alias is used. Consider removing the duplicate imports or ensuring the aliases are used consistently.
Suggested change — replace:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)

with a single, unaliased import:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer,
)
This PR adds a distinct family of Qwen2.5 models, distilled from DeepSeek-R1:
While technically distillations, DeepSeek's configurations make changes to the tokenizer config and preprocessing flow. To avoid the flag-based slippery slope of adding overriding configs to existing Qwen models, as well as to complement #2171, we separate the tokenizer and preprocessor, adding the distinct changes introduced with DeepSeek-R1's distillation as separate classes and files.
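As a rough illustration of the separation described above, the distillation-specific pieces can live in subclasses while the backbone is reused unchanged. The class and token names below are illustrative placeholders, not the actual keras_hub implementation or DeepSeek's real special-token strings:

```python
# Illustrative sketch: the Qwen backbone is reused as-is, while the
# DeepSeek-R1 distillation overrides tokenizer-level configuration in a
# subclass instead of adding flags to the existing Qwen classes.
# Token strings here are placeholders, not the real special tokens.
class QwenTokenizer:
    start_token = "<qwen-start>"
    end_token = "<qwen-end>"

class DeepSeekR1QwenTokenizer(QwenTokenizer):
    # DeepSeek-R1 distillations ship their own tokenizer config, so the
    # special tokens are overridden here rather than flag-switched on Qwen.
    start_token = "<deepseek-start>"
    end_token = "<deepseek-end>"

# The tokenizer-facing interface is unchanged; only the configuration differs.
assert issubclass(DeepSeekR1QwenTokenizer, QwenTokenizer)
```

This keeps the Qwen classes free of DeepSeek-specific configuration while the distilled variants remain drop-in compatible.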
Example Usage
Google Colab
2-line setup/prompt on Google Colab:
Keras-Hub
HuggingFace Equivalent
Numerical Equivalency
Currently, there seems to be some noise in the weights when converting naively; I'm still looking into why this happens. That said, the two are generally comparable. For example, taking the mean across the first axis of the lm_head (called token_embedding in KerasHub), we see a fairly similar profile, but not numerical equivalency. This doesn't seem to affect the outputs much, as seen in the responses above. I'll investigate further into why these discrepancies arise, since the weights should be loaded directly as-is into the structure of the model's components.