
Conversation

@vovanphuc
Contributor

PR Description

Summary

Add support for LiquidAI's LFM2.5-Audio model, with full training capability via integration with the liquid_audio package.

  • Add LFM2AudioPlugin for audio placeholder handling with correct token boundaries
  • Add lfm2_audio template with ChatML-style formatting (see the sketch after this list)
  • Add custom model loader wrapper for liquid_audio.LFM2AudioModel
  • Add model group registration in constants
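
For orientation, template registration in LLaMA-Factory goes through _register_template in src/llamafactory/data/template.py. A hedged sketch of what the lfm2_audio entry might look like follows; the slot strings are generic ChatML markers and the plugin wiring is assumed, not copied from this PR:

# Sketch only: the real registration is part of this PR and may use
# different special tokens and formatters.
from llamafactory.data.formatter import StringFormatter
from llamafactory.data.mm_plugin import get_mm_plugin
from llamafactory.data.template import _register_template

_register_template(
    name="lfm2_audio",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
    stop_words=["<|im_end|>"],
    mm_plugin=get_mm_plugin(name="lfm2_audio", audio_token="<audio>"),
)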

Model Information

Model: LiquidAI/LFM2.5-Audio-1.5B
Architecture: FastConformer encoder + LFM2-1.2B backbone
Modality: Audio-to-text (ASR / instruction following)
Parameters: 1.5B

Requirements

LFM2.5-Audio requires the liquid_audio package for model loading:

pip install liquid-audio
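
Because the dependency is optional at install time, the loader presumably needs to fail fast when the package is missing. A minimal guard of that kind might look like this (the helper name is hypothetical):

import importlib.util

def require_liquid_audio() -> None:
    # Hypothetical helper: surface a clear install hint instead of a bare ModuleNotFoundError.
    if importlib.util.find_spec("liquid_audio") is None:
        raise ImportError("LFM2.5-Audio requires liquid_audio. Install it with: pip install liquid-audio")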

Token Structure

The plugin correctly handles LFM2.5-Audio's audio boundary markers:
<|audio_start|><|reserved_1|><|text_start|>
- <|audio_start|> (token 128): Audio region start
- <|reserved_1|> (token 17): Audio placeholder token
- <|text_start|> (token 129): Audio region end / text start
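
A minimal sketch of the expansion this implies (the <audio> placeholder string and the helper name follow LLaMA-Factory conventions and are assumptions; the actual logic lives in LFM2AudioPlugin):

AUDIO_START = "<|audio_start|>"  # token 128
AUDIO_PAD = "<|reserved_1|>"     # token 17, repeated once per audio frame
TEXT_START = "<|text_start|>"    # token 129

def expand_audio_placeholders(content: str, seq_lengths: list[int]) -> str:
    # Replace each <audio> marker with the boundary tokens plus one
    # placeholder token per encoder frame, so text and audio features align.
    for seq_len in seq_lengths:
        content = content.replace("<audio>", AUDIO_START + AUDIO_PAD * seq_len + TEXT_START, 1)
    return content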

Validation

- make style - passed
- make quality - passed
- Plugin unit tests - passed
- LoRA training smoke test - passed (100 steps, loss: 10.07 → 0.0004)

Training Verification

llamafactory-cli train \
  --model_name_or_path LiquidAI/LFM2.5-Audio-1.5B \
  --template lfm2_audio \
  --finetuning_type lora \
  --lora_target q_proj,k_proj,v_proj,out_proj,w1,w2,w3 \
  --dataset mllm_audio_demo \
  --output_dir saves/lfm2-audio

Training output:
trainable params: 5,550,080 || all params: 1,459,055,744 || trainable%: 0.3804
train_loss: 1.69 (100 steps)

Related

- LFM2.5 text model: #9726
- LFM2.5-VL vision model: #9729
- LFM2.5-Audio-1.5B model card: https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B
- liquid-audio package: https://pypi.org/project/liquid-audio/

@gemini-code-assist
Contributor

Summary of Changes

Hello @vovanphuc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the framework's capabilities by introducing comprehensive support for the LiquidAI LFM2.5-Audio model. It integrates the liquid_audio package to facilitate audio processing and model loading, allowing users to train and fine-tune this audio-to-text model within the existing ecosystem. The changes ensure proper handling of audio input, tokenization, and model configuration, making the LFM2.5-Audio model a first-class citizen for multimodal tasks.

Highlights

  • LFM2.5-Audio Model Integration: Added comprehensive support for the LiquidAI LFM2.5-Audio model, enabling full training and inference capabilities through integration with the liquid_audio package.
  • New Audio Plugin: Introduced LFM2AudioPlugin to manage audio input processing, including validation, feature extraction (prioritizing liquid_audio's processor), and intelligent replacement of audio placeholders with appropriate boundary tokens.
  • ChatML-style Template: Registered a new lfm2_audio template, defining ChatML-style formatting for user, assistant, system, function, and observation messages, ensuring consistent interaction with the LFM2.5-Audio model.
  • Custom Model Loader: Implemented a custom model loader wrapper (LFM2AudioModelForCausalLM) to seamlessly integrate liquid_audio.LFM2AudioModel with the HuggingFace PreTrainedModel and GenerationMixin interfaces, facilitating its use within the framework's training and generation pipelines (a rough skeleton follows this list).
  • Model Group Registration: Registered the LFM2.5-Audio-1.5B model within the system's constants, linking it to the lfm2_audio template and marking it as multimodal for improved discoverability and configuration.
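
A wrapper of that shape might look roughly like the skeleton below. This is hedged: only the class name and liquid_audio.LFM2AudioModel come from the PR description; the constructor, forward signature, and delegation details are assumptions:

from transformers import GenerationMixin, PretrainedConfig, PreTrainedModel

class LFM2AudioModelForCausalLM(PreTrainedModel, GenerationMixin):
    # Skeleton only: the real wrapper is defined in this PR.
    def __init__(self, config: PretrainedConfig):
        super().__init__(config)
        from liquid_audio import LFM2AudioModel  # imported lazily so the dependency stays optional

        self.model = LFM2AudioModel(config)  # constructor assumed; liquid_audio's actual API may differ

    def forward(self, *args, **kwargs):
        # Delegate to the wrapped model so HF Trainer and generate() interact
        # with a familiar PreTrainedModel interface.
        return self.model(*args, **kwargs)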


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the LFM2.5-Audio model, including a new data plugin, chat template, and model loader. The implementation is well-structured and follows the existing patterns in the codebase. I've identified a potential correctness issue in how audio sequence lengths are calculated, which might lead to incorrect placeholder expansion for variable-length audios. Additionally, I've noticed the use of magic numbers for token IDs in the model wrapper and a test case that could be strengthened to cover more scenarios. My detailed feedback and suggestions are in the comments below.

Comment on lines 2223 to 2225
if hasattr(features, "shape"):
    seq_len = (features.shape[-1] - 1) // 8 + 1
    mm_inputs["audio_seq_lengths"] = [seq_len] * len(audios)

Severity: high

The current implementation for calculating audio_seq_lengths assumes all audios in a batch have the same length. It computes a single seq_len based on the padded feature length and applies it to all audio files. This can lead to an incorrect number of placeholder tokens for shorter audio files in a batch with variable-length audios.

The fallback path for Hugging Face's feature_extractor is more robust as it uses the attention_mask to determine the actual length of each audio. I recommend a similar approach here. Please check if the liquid_audio processor can return an attention mask or a list of lengths. If not, you might need to compute the sequence lengths based on the lengths of the audios_regularized list before they are padded and passed to the audio_processor.
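
A hedged sketch of the per-audio variant this suggests, as a drop-in for the snippet above (the mel hop size and the semantics of audios_regularized are assumptions; only the (n - 1) // 8 + 1 downsampling rule is taken from the reviewed code):

# Derive each audio's own feature length before padding, then apply the
# downsampling rule per item instead of once for the padded batch.
hop_length = 160  # assumed mel hop size; read it from the processor in practice
frame_counts = [1 + len(audio) // hop_length for audio in audios_regularized]
mm_inputs["audio_seq_lengths"] = [(n - 1) // 8 + 1 for n in frame_counts]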

Comment on lines +82 to +85
self.generation_config = GenerationConfig(
    eos_token_id=config.eos_token_id if hasattr(config, "eos_token_id") else 7,
    pad_token_id=config.pad_token_id if hasattr(config, "pad_token_id") else 0,
)

Severity: medium

The GenerationConfig is initialized with hardcoded fallback values for eos_token_id (7) and pad_token_id (0). Using such magic numbers is not ideal as it can lead to subtle bugs if the model's actual token IDs are different. The PR description mentions special tokens with IDs 17, 128, and 129, but not 7 or 0, which makes these defaults more concerning.

It would be more robust to ensure these values are correctly populated from the model's configuration. If these are indeed fixed values for this model family, consider defining them as named constants with explanatory comments.

Suggested change

- self.generation_config = GenerationConfig(
-     eos_token_id=config.eos_token_id if hasattr(config, "eos_token_id") else 7,
-     pad_token_id=config.pad_token_id if hasattr(config, "pad_token_id") else 0,
- )
+ self.generation_config = GenerationConfig(
+     eos_token_id=getattr(config, "eos_token_id", None),
+     pad_token_id=getattr(config, "pad_token_id", None),
+ )

Follow-up commits:

- Fix the audio seq_lengths calculation to handle variable-length audios (it previously assumed all audios in a batch had the same length).
- Add comments documenting the magic-number token IDs (7 = <|im_end|>, 0 = <unk>).
- Improve test coverage with three additional test cases: multiple audio placeholders, text-only messages, and get_mm_inputs with no processor.
- Handle tied weights in depth_embeddings when saving a merged model: embedding.weight and to_logits.weight are shared in each depth-embedding layer, which caused save_pretrained to fail without this fix (see the sketch below).
- Add detection of merged/exported models (safetensors format) and load them by first creating the base model structure from liquid_audio, then applying the merged weights from the safetensors files.
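
One common way to declare such weight sharing is the _tied_weights_keys hook on PreTrainedModel. A hedged sketch follows; the key pattern is guessed from the commit message above, not taken from the PR:

from transformers import GenerationMixin, PreTrainedModel

class LFM2AudioModelForCausalLM(PreTrainedModel, GenerationMixin):
    # Declare the shared depth-embedding tensors so save_pretrained serializes
    # a single copy instead of failing on duplicated storage under safetensors.
    _tied_weights_keys = ["depth_embeddings.*.to_logits.weight"]  # pattern assumed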
