[DSV4] DeepSeekV4 Pro#2858
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed. |
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughAdds a new standalone example script ChangesDeepSeek V4 Pro PTQ Example
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsLinked repositories: Your configuration references 1 linked repositories, but your current plan allows 0. Analyzed ``, skipped Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new example script, deepseek_v4_pro_example.py, for quantizing the DeepSeek-V4 model using llmcompressor with FP8 and NVFP4 schemes. Feedback on the changes highlights several critical improvements to make the script reusable and functional: avoiding hardcoded absolute local paths for the model and offload directories, loading the tokenizer dynamically from the model ID, removing commented-out debug code, and uncommenting the recipe parameter in the oneshot function so that the quantization scheme is actually applied.
| # MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" | ||
| MODEL_ID = "/mnt/nvme-data/engine/kylesayrs/DeepSeek-V4-Pro-BF16" | ||
| # MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" |
There was a problem hiding this comment.
Avoid hardcoding absolute local paths specific to a user environment and clean up duplicate commented-out lines. Consider using a Hugging Face model ID or a configurable environment variable/parameter.
| # MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" | |
| MODEL_ID = "/mnt/nvme-data/engine/kylesayrs/DeepSeek-V4-Pro-BF16" | |
| # MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" | |
| MODEL_ID = "deepseek-ai/DeepSeek-V4" # Replace with your model ID or local path |
| MODEL_ID, | ||
| device_map="auto_offload", | ||
| max_memory={}, | ||
| offload_folder="/mnt/nvme-data/engine/kylesayrs/offload_folder", |
| model=model, | ||
| processor=tokenizer, | ||
| dataset=ds, | ||
| # recipe=recipe, |
| # kluge for the way I saved the decompressed checkpoint | ||
| # mds = model.model.layers[-1].self_attn.wq_a._hf_hook.weights_map.dataset.index | ||
| # mds["model.hc_head.base"] = mds['model.hc_head.hc_base'] | ||
| # mds["model.hc_head.fn"] = mds['model.hc_head.hc_fn'] | ||
| # mds["model.hc_head.scale"] = mds['model.hc_head.hc_scale'] |
| # mds["model.hc_head.fn"] = mds['model.hc_head.hc_fn'] | ||
| # mds["model.hc_head.scale"] = mds['model.hc_head.hc_scale'] | ||
|
|
||
| tokenizer = AutoTokenizer.from_pretrained("RedHatAI/DeepSeek-V4-Flash-BF16") |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (3)
examples/quantizing_moe/deepseek_v4_pro_example.py (3)
81-88: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider adding a comment to explain
add_special_tokens=False.For educational clarity, it would help users to understand why
add_special_tokens=Falseis used here (since special tokens like BOS/EOS were already added in thepreprocessfunction).📝 Suggested improvement
def tokenize(sample): return tokenizer( sample["text"], padding=False, max_length=MAX_SEQUENCE_LENGTH, truncation=True, - add_special_tokens=False, + add_special_tokens=False, # Special tokens already added in preprocess() )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 81 - 88, In the tokenize function, add a comment above or next to the add_special_tokens=False parameter to explain why special tokens are not being added here. The comment should clarify that special tokens like BOS/EOS were already added during preprocessing in the preprocess function, so they should not be added again during tokenization to avoid duplication.Source: Path instructions
30-30: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider removing the empty
max_memoryparameter or add explanation.The
max_memory={}parameter is empty and appears to have no effect. For educational clarity, either:
- Remove it if not needed
- Provide an example value with a comment explaining how users can limit per-device memory allocation
Example with explanation:
max_memory={0: "40GB", 1: "40GB", "cpu": "100GB"}, # Optional: limit per-device memory🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/quantizing_moe/deepseek_v4_pro_example.py` at line 30, The `max_memory={}` parameter being passed in the function call is empty and has no practical effect. Either remove this parameter entirely since an empty dictionary serves no purpose, or replace it with a meaningful example that demonstrates how users can limit per-device memory allocation (such as specifying device IDs and memory limits like {0: "40GB", 1: "40GB", "cpu": "100GB"}) and add a comment explaining the parameter's purpose for educational clarity.Source: Path instructions
104-121: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider minor clarifications to the recipe configuration.
The quantization recipe is well-structured, but could benefit from small improvements:
Line 109: The target
r"re:.*attn\.compressor\.indexer\.q_b_proj$"is not mentioned in the comments above (lines 96-102). Adding it to the comments would help users understand the full scope.Line 120: The explicit
ignore=[]parameter is unnecessary when empty and can be removed for cleaner code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 104 - 121, Add a clarifying comment above the QuantizationModifier to document the compressor indexer target pattern `r"re:.*attn\.compressor\.indexer\.q_b_proj$"` that appears in the attention config_groups, explaining its purpose alongside the existing attention projection targets. Additionally, remove the empty `ignore=[]` parameter from the QuantizationModifier initialization as it is unnecessary when not explicitly needed.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/quantizing_moe/deepseek_v4_pro_example.py`:
- Around line 16-19: The comment for the
DeepseekV4PreTrainedModel._keep_in_fp32_modules_strict workaround lacks
sufficient context for educational purposes. Enhance the existing comment to
include a comprehensive explanation of the upstream bug being worked around,
link to relevant GitHub issues or pull requests that document this bug, and add
a TODO comment specifying the conditions or library version when this workaround
can be removed. This will help users understand why the workaround is necessary
and when it becomes obsolete.
- Around line 34-38: Remove the commented-out "kluge" workaround code block and
its explanatory comment from the example file. Delete the entire section
starting with the comment "# kluge for the way I saved the decompressed
checkpoint" and all subsequent commented-out lines that follow it which
manipulate the mds weights_map dataset index. This developer-specific workaround
code is not relevant to users learning from the example and should be removed to
keep the example clean and production-ready.
- Around line 126-139: The recipe parameter in the oneshot() function call is
commented out, preventing the quantization recipe from being applied during the
calibration process. Uncomment the recipe=recipe, parameter in the oneshot()
function call to ensure the quantization recipe defined earlier is actually
used. Additionally, improve the comments above and around the
propagate_error=False parameter to clearly explain what the workaround addresses
and when users might encounter similar issues with transformers cache behavior.
- Around line 21-32: Replace the hardcoded local file system paths with publicly
accessible alternatives in the model loading section. Change the MODEL_ID
variable from the local path to use the publicly available Hugging Face model
identifier that is already commented out, and replace the hardcoded
offload_folder path in the AutoModelForCausalLM.from_pretrained call with either
a generic temporary directory path or make it configurable through a variable
that users can easily modify. This will make the example script runnable for end
users without requiring access to specific developer machine paths.
- Around line 59-75: In the preprocess function, add a comment above or within
the assistant role handling block (the elif role == "assistant": section) to
explain that the </think> token is part of DeepSeek-V4's chain-of-thought
reasoning format for educational clarity. Additionally, add error handling to
the function to validate that each message in example["messages"] contains both
"role" and "content" fields before accessing them, either by wrapping the
message processing loop in a try-except block or by adding explicit validation
checks to handle missing or malformed message fields gracefully.
- Line 40: The AutoTokenizer.from_pretrained call hardcodes a model path
"RedHatAI/DeepSeek-V4-Flash-BF16" that differs from the model being used in the
example. Either replace the hardcoded string with the MODEL_ID variable to
ensure consistency throughout the example, or if the Flash and Pro model
tokenizers are intentionally the same, add a clear comment explaining why
different model identifiers are used for the model and tokenizer respectively.
---
Nitpick comments:
In `@examples/quantizing_moe/deepseek_v4_pro_example.py`:
- Around line 81-88: In the tokenize function, add a comment above or next to
the add_special_tokens=False parameter to explain why special tokens are not
being added here. The comment should clarify that special tokens like BOS/EOS
were already added during preprocessing in the preprocess function, so they
should not be added again during tokenization to avoid duplication.
- Line 30: The `max_memory={}` parameter being passed in the function call is
empty and has no practical effect. Either remove this parameter entirely since
an empty dictionary serves no purpose, or replace it with a meaningful example
that demonstrates how users can limit per-device memory allocation (such as
specifying device IDs and memory limits like {0: "40GB", 1: "40GB", "cpu":
"100GB"}) and add a comment explaining the parameter's purpose for educational
clarity.
- Around line 104-121: Add a clarifying comment above the QuantizationModifier
to document the compressor indexer target pattern
`r"re:.*attn\.compressor\.indexer\.q_b_proj$"` that appears in the attention
config_groups, explaining its purpose alongside the existing attention
projection targets. Additionally, remove the empty `ignore=[]` parameter from
the QuantizationModifier initialization as it is unnecessary when not explicitly
needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1f787e13-b61c-4a72-bac3-e71a8952c37c
📒 Files selected for processing (1)
examples/quantizing_moe/deepseek_v4_pro_example.py
| # Upstream BUG: norms should be loaded in float32, but usually aren't due to the base | ||
| # model having a quant_config which overrides this. Loading in float32 actually | ||
| # breaks the model definition (it expects bfloat16). Let's force load in bfloat16. | ||
| DeepseekV4PreTrainedModel._keep_in_fp32_modules_strict = set() |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚖️ Poor tradeoff
Add detailed explanation or remove the fragile workaround.
Modifying a private class attribute (_keep_in_fp32_modules_strict) in an example script is problematic for educational purposes. Users learning from this example won't understand:
- Why this workaround is necessary
- What "upstream BUG" it addresses
- When they need to apply it
- Whether it will work with future library versions
Either provide a comprehensive explanation with links to relevant issues/PRs, or if this is truly a temporary workaround for incomplete dependencies (as noted in PR objectives), add a clear TODO comment explaining when this can be removed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 16 - 19, The
comment for the DeepseekV4PreTrainedModel._keep_in_fp32_modules_strict
workaround lacks sufficient context for educational purposes. Enhance the
existing comment to include a comprehensive explanation of the upstream bug
being worked around, link to relevant GitHub issues or pull requests that
document this bug, and add a TODO comment specifying the conditions or library
version when this workaround can be removed. This will help users understand why
the workaround is necessary and when it becomes obsolete.
Source: Path instructions
| # Select model and load it. | ||
| # MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" | ||
| MODEL_ID = "/mnt/nvme-data/engine/kylesayrs/DeepSeek-V4-Pro-BF16" | ||
| # MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" | ||
|
|
||
| with load_context(): | ||
| model = AutoModelForCausalLM.from_pretrained( | ||
| MODEL_ID, | ||
| device_map="auto_offload", | ||
| max_memory={}, | ||
| offload_folder="/mnt/nvme-data/engine/kylesayrs/offload_folder", | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win
Replace hardcoded local paths with publicly accessible model IDs.
The example uses developer-specific local paths that make it non-runnable for users:
- Line 23:
/mnt/nvme-data/engine/kylesayrs/DeepSeek-V4-Pro-BF16 - Line 31:
/mnt/nvme-data/engine/kylesayrs/offload_folder
For an example script, these should use publicly available Hugging Face model IDs (like the commented-out "RedHatAI/DeepSeek-V4-Flash-BF16") and a generic offload folder path or make it configurable.
📝 Suggested fix
-# Select model and load it.
-# MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16"
-MODEL_ID = "/mnt/nvme-data/engine/kylesayrs/DeepSeek-V4-Pro-BF16"
-# MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16"
+# Select model and load it. You can also use local paths if you have the model downloaded.
+MODEL_ID = "RedHatAI/DeepSeek-V4-Flash-BF16" # or "deepseek-ai/DeepSeek-V4-Pro"
+OFFLOAD_FOLDER = "./offload_folder" # Directory for offloading model layers
with load_context():
model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
device_map="auto_offload",
max_memory={},
- offload_folder="/mnt/nvme-data/engine/kylesayrs/offload_folder",
+ offload_folder=OFFLOAD_FOLDER,
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 21 - 32,
Replace the hardcoded local file system paths with publicly accessible
alternatives in the model loading section. Change the MODEL_ID variable from the
local path to use the publicly available Hugging Face model identifier that is
already commented out, and replace the hardcoded offload_folder path in the
AutoModelForCausalLM.from_pretrained call with either a generic temporary
directory path or make it configurable through a variable that users can easily
modify. This will make the example script runnable for end users without
requiring access to specific developer machine paths.
Source: Path instructions
| # kluge for the way I saved the decompressed checkpoint | ||
| # mds = model.model.layers[-1].self_attn.wq_a._hf_hook.weights_map.dataset.index | ||
| # mds["model.hc_head.base"] = mds['model.hc_head.hc_base'] | ||
| # mds["model.hc_head.fn"] = mds['model.hc_head.hc_fn'] | ||
| # mds["model.hc_head.scale"] = mds['model.hc_head.hc_scale'] |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Remove commented-out "kluge" code.
These commented lines describe a developer-specific workaround ("kluge") that has no educational value for users learning the library. Example scripts should be clean and production-ready to demonstrate best practices.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 34 - 38,
Remove the commented-out "kluge" workaround code block and its explanatory
comment from the example file. Delete the entire section starting with the
comment "# kluge for the way I saved the decompressed checkpoint" and all
subsequent commented-out lines that follow it which manipulate the mds
weights_map dataset index. This developer-specific workaround code is not
relevant to users learning from the example and should be removed to keep the
example clean and production-ready.
Source: Path instructions
| # mds["model.hc_head.fn"] = mds['model.hc_head.hc_fn'] | ||
| # mds["model.hc_head.scale"] = mds['model.hc_head.hc_scale'] | ||
|
|
||
| tokenizer = AutoTokenizer.from_pretrained("RedHatAI/DeepSeek-V4-Flash-BF16") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use consistent model ID for tokenizer or explain the mismatch.
The tokenizer is loaded from "RedHatAI/DeepSeek-V4-Flash-BF16" while the model uses a different path. For clarity and correctness in an example:
- If the tokenizers are identical, use the same
MODEL_IDvariable - If they differ intentionally, add a comment explaining why (e.g., if Flash and Pro models share the same tokenizer)
🔧 Suggested fix
-tokenizer = AutoTokenizer.from_pretrained("RedHatAI/DeepSeek-V4-Flash-BF16")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tokenizer = AutoTokenizer.from_pretrained("RedHatAI/DeepSeek-V4-Flash-BF16") | |
| tokenizer = AutoTokenizer.from_pretrained(MODEL_ID) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantizing_moe/deepseek_v4_pro_example.py` at line 40, The
AutoTokenizer.from_pretrained call hardcodes a model path
"RedHatAI/DeepSeek-V4-Flash-BF16" that differs from the model being used in the
example. Either replace the hardcoded string with the MODEL_ID variable to
ensure consistency throughout the example, or if the Flash and Pro model
tokenizers are intentionally the same, add a clear comment explaining why
different model identifiers are used for the model and tokenizer respectively.
Source: Path instructions
| def preprocess(example): | ||
| # DeepSeek-V4 does not have a traditional chat template. | ||
| # Encode manually per https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/main/encoding | ||
| BOS = "<|begin▁of▁sentence|>" | ||
| EOS = "<|end▁of▁sentence|>" | ||
| text = BOS | ||
| for message in example["messages"]: | ||
| role = message["role"] | ||
| content = message["content"] | ||
| if role == "system": | ||
| text += content | ||
| elif role == "user": | ||
| text += f"<|User|>{content}" | ||
| elif role == "assistant": | ||
| text += f"<|Assistant|></think>{content}{EOS}" | ||
|
|
||
| return {"text": text} |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Add explanation for the </think> token and consider error handling.
The preprocessing function has a few educational clarity issues:
-
Unexplained token: Line 73 includes
</think>without explanation. Users should understand this is part of DeepSeek-V4's chain-of-thought format. -
No error handling: The function assumes all messages have "role" and "content" fields. Consider adding validation or a try-except for robustness.
-
Ambiguous characters: The fullwidth vertical lines (|) in special tokens are correct for DeepSeek-V4's format and can be safely ignored from the Ruff warnings.
📚 Suggested improvements
def preprocess(example):
- # DeepSeek-V4 does not have a traditional chat template.
- # Encode manually per https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/main/encoding
+ """
+ DeepSeek-V4 does not have a traditional chat template.
+ Encode manually per https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/main/encoding
+
+ Note: DeepSeek-V4 uses chain-of-thought reasoning. The </think> token marks
+ the end of the model's internal reasoning before providing the final answer.
+ """
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
text = BOS
for message in example["messages"]:
role = message["role"]
content = message["content"]
if role == "system":
text += content
elif role == "user":
text += f"<|User|>{content}"
elif role == "assistant":
text += f"<|Assistant|></think>{content}{EOS}"
+ else:
+ # Skip unknown roles
+ continue
return {"text": text}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def preprocess(example): | |
| # DeepSeek-V4 does not have a traditional chat template. | |
| # Encode manually per https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/main/encoding | |
| BOS = "<|begin▁of▁sentence|>" | |
| EOS = "<|end▁of▁sentence|>" | |
| text = BOS | |
| for message in example["messages"]: | |
| role = message["role"] | |
| content = message["content"] | |
| if role == "system": | |
| text += content | |
| elif role == "user": | |
| text += f"<|User|>{content}" | |
| elif role == "assistant": | |
| text += f"<|Assistant|></think>{content}{EOS}" | |
| return {"text": text} | |
| def preprocess(example): | |
| """ | |
| DeepSeek-V4 does not have a traditional chat template. | |
| Encode manually per https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/main/encoding | |
| Note: DeepSeek-V4 uses chain-of-thought reasoning. The </think> token marks | |
| the end of the model's internal reasoning before providing the final answer. | |
| """ | |
| BOS = "<|begin▁of▁sentence|>" | |
| EOS = "<|end▁of▁sentence|>" | |
| text = BOS | |
| for message in example["messages"]: | |
| role = message["role"] | |
| content = message["content"] | |
| if role == "system": | |
| text += content | |
| elif role == "user": | |
| text += f"<|User|>{content}" | |
| elif role == "assistant": | |
| text += f"<|Assistant|></think>{content}{EOS}" | |
| else: | |
| # Skip unknown roles | |
| continue | |
| return {"text": text} |
🧰 Tools
🪛 Ruff (0.15.18)
[warning] 62-62: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 62-62: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 63-63: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 63-63: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 71-71: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 71-71: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 73-73: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
[warning] 73-73: String contains ambiguous | (FULLWIDTH VERTICAL LINE). Did you mean | (VERTICAL LINE)?
(RUF001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 59 - 75, In
the preprocess function, add a comment above or within the assistant role
handling block (the elif role == "assistant": section) to explain that the
</think> token is part of DeepSeek-V4's chain-of-thought reasoning format for
educational clarity. Additionally, add error handling to the function to
validate that each message in example["messages"] contains both "role" and
"content" fields before accessing them, either by wrapping the message
processing loop in a try-except block or by adding explicit validation checks to
handle missing or malformed message fields gracefully.
Sources: Path instructions, Linters/SAST tools
| oneshot( | ||
| model=model, | ||
| processor=tokenizer, | ||
| dataset=ds, | ||
| # recipe=recipe, | ||
| pipeline="sequential", | ||
| max_seq_length=MAX_SEQUENCE_LENGTH, | ||
| num_calibration_samples=NUM_CALIBRATION_SAMPLES, | ||
| sequential_targets=["DeepseekV4DecoderLayer"], | ||
| batch_size=1, | ||
| shuffle_calibration_samples=True, | ||
| propagate_error=False, # work around reliance on transformers cache | ||
| # something weird happens with the cache and propagation | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win
CRITICAL: The recipe parameter is commented out - quantization will not be applied.
Line 130 has # recipe=recipe, commented out, which means the quantization recipe defined in lines 104-121 is never used. Without this parameter, the oneshot() call will not apply any quantization, defeating the entire purpose of this example.
Additionally:
- Line 137-138: The workaround comments for
propagate_error=Falseare incomplete and don't help users understand when or why they might need this parameter.
🐛 Required fix
oneshot(
model=model,
processor=tokenizer,
dataset=ds,
- # recipe=recipe,
+ recipe=recipe,
pipeline="sequential",
max_seq_length=MAX_SEQUENCE_LENGTH,
num_calibration_samples=NUM_CALIBRATION_SAMPLES,
sequential_targets=["DeepseekV4DecoderLayer"],
batch_size=1,
shuffle_calibration_samples=True,
- propagate_error=False, # work around reliance on transformers cache
- # something weird happens with the cache and propagation
+ # Note: propagate_error=False works around an issue with transformers cache
+ # during sequential quantization of DeepSeek-V4 models
+ propagate_error=False,
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/quantizing_moe/deepseek_v4_pro_example.py` around lines 126 - 139,
The recipe parameter in the oneshot() function call is commented out, preventing
the quantization recipe from being applied during the calibration process.
Uncomment the recipe=recipe, parameter in the oneshot() function call to ensure
the quantization recipe defined earlier is actually used. Additionally, improve
the comments above and around the propagate_error=False parameter to clearly
explain what the workaround addresses and when users might encounter similar
issues with transformers cache behavior.
Source: Path instructions
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
The quality checks have failed. Please run |
Prerequisites