feat: add MiniMax as alternative LLM provider for data preprocessing#39

Open
octo-patch wants to merge 1 commit into PKU-YuanGroup:main from octo-patch:feature/add-minimax-provider
Conversation

@octo-patch

Summary

Add configurable LLM provider support to the data preprocessing captioning scripts. Users can now use MiniMax (MiniMax-M2.7, MiniMax-M2.5) as an alternative to OpenAI GPT-4V for frame and video captioning, or point to any OpenAI-compatible API.

Changes

  • New file data_preprocess/llm_provider.py: Shared provider configuration module with PROVIDER_PRESETS, create_client(), get_model_name(), and clamp_temperature()
  • Modified 3 captioning scripts: step2_1_GPT4V_frame_caption.py, step3_1_GPT4V_video_caption_concise.py, step3_1_GPT4V_video_caption_detail.py — all now accept --provider, --base_url, --model arguments
  • Updated run.sh: Added PROVIDER variable for easy switching
  • Updated README: Added LLM Provider section with MiniMax usage docs
  • 9 files changed, 556 additions, 36 deletions

Usage

# Set MiniMax API key
export MINIMAX_API_KEY="your-api-key"

# Use MiniMax for frame captioning
python data_preprocess/step2_1_GPT4V_frame_caption.py \
    --provider minimax \
    --image_directories ./step_1

# Use MiniMax for video captioning
python data_preprocess/step3_1_GPT4V_video_caption_concise.py \
    --provider minimax \
    --input_file ./2_2_final_useful_gpt_frames_caption.json
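Any other OpenAI-compatible endpoint can be targeted the same way via --base_url and --model; the URL and model name below are placeholders, not values from this PR:

```
# Use a custom OpenAI-compatible API
python data_preprocess/step2_1_GPT4V_frame_caption.py \
    --base_url "https://your-endpoint.example.com/v1" \
    --model "your-model-name" \
    --image_directories ./step_1
```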

Backward Compatibility

  • Default provider remains openai — no changes needed for existing workflows
  • The --api_key argument still works as before
  • Environment variable OPENAI_API_KEY is used as a fallback for any provider

Test Plan

  • 26 unit tests covering provider presets, argument parsing, client creation, model resolution, temperature clamping, and function signatures
  • 3 integration tests verifying actual MiniMax API connectivity for text captioning
  • All 29 tests passing

Commit message

Add configurable LLM provider support to the data preprocessing scripts
(frame captioning and video captioning). Users can now choose between
OpenAI (default) and MiniMax via the --provider flag, or use any
OpenAI-compatible API via --base_url and --model.

Changes:
- Add data_preprocess/llm_provider.py: shared provider config module
  with PROVIDER_PRESETS, create_client(), get_model_name()
- Modify step2_1_GPT4V_frame_caption.py: use configurable client/model
- Modify step3_1_GPT4V_video_caption_concise.py: same
- Modify step3_1_GPT4V_video_caption_detail.py: same
- Update run.sh: add PROVIDER variable
- Add 26 unit tests + 3 integration tests
- Update README with MiniMax usage docs