MindSpore ONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

News

  • [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
  • [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
  • [2024.11.06] v0.2.0 is released

Quick tour

To install v0.3.0, first install MindSpore 2.5.0, then run `pip install mindone`.

Alternatively, to install the latest version from the master branch, run:

git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 medium checkpoint in half precision
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The pipeline returns a tuple; the first element holds the generated images
image = pipe(prompt)[0][0]
image.save("sd3.png")

Run hf diffusers on MindSpore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.5.0 on Ascend Atlas 800T A2 machines.
  • compatible with hf diffusers 0.32.2
| component | features | count |
|---|---|---|
| pipeline | support text-to-image, text-to-video, and text-to-audio tasks | 160+ |
| models | support autoencoder & transformer base models, same as hf diffusers | 50+ |
| schedulers | support diffusion schedulers (e.g., DDPM and DPM-Solver), same as hf diffusers | 35+ |

supported models under mindone/examples

| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Image-to-Video | hunyuanvideo-i2v | 🔥🔥 | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | wan2.1 | 🔥🔥🔥 | ✖️ | ✖️ | Alibaba |
| Text/Image/Speech-to-Video | wan2.2 | 🔥🔥🔥 | ✖️ | ✖️ | Alibaba |
| Text-to-Image | cogview4 | 🔥🔥🔥 | ✖️ | ✖️ | Zhipuai |
| Text-to-Video | step_video_t2v | 🔥🔥 | ✖️ | ✖️ | StepFun |
| Image-Text-to-Text | qwen2_vl | 🔥🔥🔥 | ✖️ | ✖️ | Alibaba |
| Any-to-Any | janus | 🔥🔥🔥 | | | DeepSeek |
| Any-to-Any | emu3 | 🔥🔥 | | | BAAI |
| Class-to-Image | var | 🔥🔥 | | | ByteDance |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | 🔥🔥 | | | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | 🔥🔥 | | | Zhipu |
| Text-to-Video | open sora plan 1.3 | 🔥🔥 | | | PKU |
| Text-to-Video | hunyuanvideo | 🔥🔥 | | | Tencent |
| Text-to-Video | movie gen 30B | 🔥🔥 | | | Meta |
| Video-Encode-Decode | magvit | | | | Google |
| Text-to-Image | story_diffusion | | ✖️ | ✖️ | ByteDance |
| Image-to-Video | dynamicrafter | | ✖️ | ✖️ | Tencent |
| Video-to-Video | venhancer | | ✖️ | ✖️ | Shanghai AI Lab |
| Text-to-Video | t2v_turbo | | | | Google |
| Image-to-Video | svd | | | | Stability AI |
| Text-to-Video | animate diff | | | | CUHK |
| Text/Image-to-Video | video composer | | | | Alibaba |
| Text-to-Image | flux | 🔥 | | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | 🔥 | | ✖️ | Stability AI |
| Text-to-Image | kohya_sd_scripts | | | ✖️ | kohya |
| Text-to-Image | stable diffusion xl | | | | Stability AI |
| Text-to-Image | stable diffusion | | | | Stability AI |
| Text-to-Image | hunyuan_dit | | | | Tencent |
| Text-to-Image | pixart_sigma | | | | Huawei |
| Text-to-Image | fit | | | | Shanghai AI Lab |
| Class-to-Video | latte | | | | Shanghai AI Lab |
| Class-to-Image | dit | | | | Meta |
| Text-to-Image | t2i-adapter | | | | Shanghai AI Lab |
| Text-to-Image | ip adapter | | | | Tencent |
| Text-to-3D | mvdream | | | | ByteDance |
| Image-to-3D | instantmesh | | | | Tencent |
| Image-to-3D | sv3d | | | | Stability AI |
| Text/Image-to-3D | hunyuan3d-1.0 | | | | Tencent |

supported captioner

| task | model | inference | finetune | pretrain | features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | 🔥 | ✖️ | ✖️ | support video and image captioning |