[model] support LFM2.5-Audio with liquid_audio integration #9733
Conversation
Summary of Changes
Hello @vovanphuc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the framework's capabilities by introducing comprehensive support for the LiquidAI LFM2.5-Audio model.
Code Review
This pull request adds support for the LFM2.5-Audio model, including a new data plugin, chat template, and model loader. The implementation is well-structured and follows the existing patterns in the codebase. I've identified a potential correctness issue in how audio sequence lengths are calculated, which might lead to incorrect placeholder expansion for variable-length audios. Additionally, I've noticed the use of magic numbers for token IDs in the model wrapper and a test case that could be strengthened to cover more scenarios. My detailed feedback and suggestions are in the comments below.
src/llamafactory/data/mm_plugin.py (Outdated)
if hasattr(features, "shape"):
    seq_len = (features.shape[-1] - 1) // 8 + 1
    mm_inputs["audio_seq_lengths"] = [seq_len] * len(audios)
The current implementation for calculating audio_seq_lengths assumes all audios in a batch have the same length. It computes a single seq_len based on the padded feature length and applies it to all audio files. This can lead to an incorrect number of placeholder tokens for shorter audio files in a batch with variable-length audios.
The fallback path for Hugging Face's feature_extractor is more robust as it uses the attention_mask to determine the actual length of each audio. I recommend a similar approach here. Please check if the liquid_audio processor can return an attention mask or a list of lengths. If not, you might need to compute the sequence lengths based on the lengths of the audios_regularized list before they are padded and passed to the audio_processor.
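For illustration, a minimal sketch of that per-audio approach is shown below. The audios_regularized name and the divide-by-8 rule come from the snippet above; the assumption that each unpadded entry exposes its frame count in its last dimension is not confirmed against the liquid_audio processor.

    def compute_audio_seq_lengths(audios_regularized, downsample_factor=8):
        """Sketch: one placeholder count per audio, derived from unpadded lengths."""
        seq_lengths = []
        for feats in audios_regularized:
            # assumption: the last dimension holds this clip's unpadded length
            num_frames = feats.shape[-1]
            # same ceiling division by 8 as the padded-batch calculation above
            seq_lengths.append((num_frames - 1) // downsample_factor + 1)
        return seq_lengths

    mm_inputs["audio_seq_lengths"] = compute_audio_seq_lengths(audios_regularized)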
self.generation_config = GenerationConfig(
    eos_token_id=config.eos_token_id if hasattr(config, "eos_token_id") else 7,
    pad_token_id=config.pad_token_id if hasattr(config, "pad_token_id") else 0,
)
The GenerationConfig is initialized with hardcoded fallback values for eos_token_id (7) and pad_token_id (0). Using such magic numbers is not ideal as it can lead to subtle bugs if the model's actual token IDs are different. The PR description mentions special tokens with IDs 17, 128, and 129, but not 7 or 0, which makes these defaults more concerning.
It would be more robust to ensure these values are correctly populated from the model's configuration. If these are indeed fixed values for this model family, consider defining them as named constants with explanatory comments.
Suggested change:
- self.generation_config = GenerationConfig(
-     eos_token_id=config.eos_token_id if hasattr(config, "eos_token_id") else 7,
-     pad_token_id=config.pad_token_id if hasattr(config, "pad_token_id") else 0,
- )
+ self.generation_config = GenerationConfig(
+     eos_token_id=getattr(config, "eos_token_id", None),
+     pad_token_id=getattr(config, "pad_token_id", None),
+ )
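If 7 and 0 really are fixed for this model family, the named-constant variant the review alludes to could look roughly like the sketch below. The constant names and the build_generation_config helper are illustrative only; the token meanings (7 = <|im_end|>, 0 = <unk>) are taken from the follow-up commit rather than verified against the tokenizer.

    from transformers import GenerationConfig

    # Fallback token IDs for LFM2.5-Audio; values mirror the original snippet,
    # documented here instead of being left as inline magic numbers.
    LFM2_AUDIO_FALLBACK_EOS_TOKEN_ID = 7  # reportedly <|im_end|>
    LFM2_AUDIO_FALLBACK_PAD_TOKEN_ID = 0  # reportedly <unk>

    def build_generation_config(config):
        """Prefer IDs from the model config; fall back to documented constants."""
        return GenerationConfig(
            eos_token_id=getattr(config, "eos_token_id", LFM2_AUDIO_FALLBACK_EOS_TOKEN_ID),
            pad_token_id=getattr(config, "pad_token_id", LFM2_AUDIO_FALLBACK_PAD_TOKEN_ID),
        )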
- Fix audio seq_lengths calculation to handle variable-length audios (previously assumed all audios had the same length)
- Add comments documenting magic number token IDs (7 = <|im_end|>, 0 = <unk>)
- Improve test coverage with 3 additional test cases:
  - Multiple audio placeholders
  - Text-only messages
  - get_mm_inputs with no processor
Handle tied weights in depth_embeddings when saving merged model. The embedding.weight and to_logits.weight are shared in each depth embedding layer, causing save_pretrained to fail without this fix.
Add detection of merged/exported models (safetensors format) and load them by first creating base model structure from liquid_audio, then applying the merged weights from safetensors files.
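As a rough illustration of the tied-weights fix described in the commit above: the depth_embeddings attribute and the embedding/to_logits names are taken from the commit message, while the exact module layout inside liquid_audio is an assumption, so this is a sketch rather than the PR's actual code.

    import torch

    def untie_depth_embeddings(model):
        """Sketch: give to_logits its own copy of the shared tensor so that
        save_pretrained / safetensors does not reject duplicated storage."""
        for layer in getattr(model, "depth_embeddings", []):
            if layer.to_logits.weight.data_ptr() == layer.embedding.weight.data_ptr():
                layer.to_logits.weight = torch.nn.Parameter(
                    layer.embedding.weight.detach().clone()
                )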
PR Description
Summary
Add support for LiquidAI's LFM2.5-Audio model with full training capability via liquid_audio package integration.
- LFM2AudioPlugin for audio placeholder handling with correct token boundaries
- lfm2_audio template with ChatML-style formatting
- Model loader built on liquid_audio.LFM2AudioModel
Model Information
Requirements
LFM2.5-Audio requires the liquid_audio package for model loading:
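The install command itself is not shown in this excerpt; a typical step would look like the line below, where the exact PyPI distribution name is an assumption, so check LiquidAI's release instructions for the authoritative command.

    pip install liquid-audio  # assumption: the published package name may differ from the import name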