The gradio loads but the model does not output anything SOLVED VRAM memory issues SOLVED #129

Open
Zirgite opened this issue Feb 2, 2025 · 3 comments

Zirgite commented Feb 2, 2025

Microsoft Windows [Version 10.0.19045.5371]
(c) Microsoft Corporation. All rights reserved.

D:\Janus\Janus>myenv\Scripts\activate

(myenv) D:\Janus\Janus>python demo/app_januspro.py
Python version is above 3.10, patching the collections module.
D:\Janus\Janus\myenv\Lib\site-packages\transformers\models\auto\image_processing_auto.py:590: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
warnings.warn(
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.06s/it]
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Some kwargs in processor config are unused and will not have any effect: ignore_id, add_special_token, image_tag, num_image_tokens, mask_prompt, sft_format.
INFO: Could not find files for the given pattern(s).

To create a public link, set share=True in launch().


Zirgite commented Feb 3, 2025

I have found the solution; it is contained here:
Tutorial on how to Install and Run the Model on Windows - Not an issue #117
I asked the AI to summarize all the steps I took to get it running.
Even after all those efforts, only the multimodal understanding was working, not the image generation. The model was consuming 24 GB of VRAM and still stalling, so I had to change the script (see the next comment).

🚀 How to Install and Run Janus-Pro on Windows

This guide provides step-by-step instructions to successfully install and run Janus-Pro on Windows without running into errors.


1. Install System Dependencies

Before setting up Janus-Pro, ensure you have the following installed:

🔹 Install Microsoft Visual Studio (C++ Build Tools)

  1. Download Visual Studio 2022 Build Tools from:
    🔗 https://visualstudio.microsoft.com/visual-cpp-build-tools/
  2. During installation, select:
    • ✅ "Desktop development with C++"
    • ✅ "Windows 10 SDK"
    • ✅ "C++ CMake tools for Windows"

🔹 Install NVIDIA CUDA Toolkit

  1. Download CUDA 12.3 from:
    🔗 https://developer.nvidia.com/cuda-downloads
  2. Install it and restart your computer.

🔹 Install Python 3.10+ (if not already installed)

  1. Download Python from:
    🔗 https://www.python.org/downloads/windows/
  2. Ensure you check ✅ "Add Python to PATH" during installation.

2. Clone the Janus-Pro Repository

  1. Open Command Prompt (CMD) and navigate to your desired installation folder:

    cd D:\Janus
  2. Clone the Janus-Pro repository:

    git clone https://github.com/deepseek-ai/Janus.git
  3. Navigate into the cloned directory:

    cd Janus

3. Set Up a Virtual Environment

  1. Create a virtual environment:

    python -m venv janus_env
  2. Activate the virtual environment:

    janus_env\Scripts\activate

    ✅ You should now see (janus_env) before your command line.


4. Install Required Python Packages

  1. Upgrade pip and install dependencies:

    pip install --upgrade pip
    pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  2. Remove PyTorch from requirements.txt and pyproject.toml to prevent a CPU-only installation:

    • Open requirements.txt and delete any line that installs torch.
    • Open pyproject.toml and delete any line mentioning torch.
    • Save both files.
  3. Install Janus-Pro dependencies:

    pip install -e .
  4. Install Gradio support:

    pip install -e .[gradio]

    If this fails, install gradio manually:

    pip install gradio

5. Run Janus-Pro

  1. Ensure the virtual environment is activated:

    janus_env\Scripts\activate
  2. Start Janus-Pro:

    python demo/app_januspro.py

✅ If everything is installed correctly, Janus-Pro should now be running! 🚀
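
If you want a public link instead of a local-only one, the startup log already hints at it ("To create a public link, set share=True in launch()"). A minimal change at the bottom of demo/app_januspro.py would be:

    demo.launch(share=True)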


Troubleshooting

🔹 "Torch not compiled with CUDA enabled" Error

If you see:

AssertionError: Torch not compiled with CUDA enabled

Make sure you manually install PyTorch first before installing requirements:

pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Then remove torch from requirements.txt and pyproject.toml, then reinstall dependencies:

pip install -e .
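
To confirm the CUDA build of PyTorch is actually installed before launching the demo, a quick sanity check (generic PyTorch, nothing Janus-specific) is:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

It should print a CUDA-tagged version (for example one ending in +cu121) followed by True.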

🔹 Virtual Environment Activation Fails

If activate doesn’t work in CMD, try using PowerShell:

janus_env\Scripts\Activate.ps1

If you see a security error, run this command in Administrator PowerShell:

Set-ExecutionPolicy Unrestricted -Scope Process

🔹 Gradio or Transformers Module Not Found

If gradio or transformers is missing:

pip install gradio transformers

6. Automating Virtual Environment Activation

Since the virtual environment must be activated every time before running Janus-Pro, create a startup script.

Windows Batch Script (start_janus.bat)

@echo off
cd /d D:\Janus\Janus
call janus_env\Scripts\activate
python demo/app_januspro.py
pause

Bash Script (start_janus.sh) for Git Bash or WSL

#!/bin/bash
cd "$(dirname "$0")"
source janus_env/Scripts/activate
python demo/app_januspro.py

Run this script whenever you need to start Janus-Pro automatically.


🎯 Final Notes

✔️ Always clone the repository before setting up the virtual environment.
✔️ Manually install PyTorch before other dependencies.
✔️ Remove torch from requirements.txt and pyproject.toml to prevent CPU-only installation.
✔️ Install dependencies in the correct order (install base first, then Gradio).
✔️ Activate the virtual environment before running the script.
✔️ If issues persist, try restarting your PC after installation.

🚀 Enjoy!


Zirgite commented Feb 3, 2025

I had issues with image generation, so I needed to optimize app_januspro.py to be easier on the system.
This solved the memory issues, and I got image generation running.

import gradio as gr
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images
from PIL import Image

import numpy as np
import os
import time
# import spaces  # Import spaces for ZeroGPU compatibility


# Load model and processor
model_path = "deepseek-ai/Janus-Pro-7B"
config = AutoConfig.from_pretrained(model_path)
language_config = config.language_config
language_config._attn_implementation = 'eager'
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path,
                                             language_config=language_config,
                                             trust_remote_code=True)
if torch.cuda.is_available():
    vl_gpt = vl_gpt.to(torch.bfloat16).cuda()
else:
    vl_gpt = vl_gpt.to(torch.float16)

vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
cuda_device = 'cuda' if torch.cuda.is_available() else 'cpu'

@torch.inference_mode()
# @spaces.GPU(duration=120) 
# Multimodal Understanding function
def multimodal_understanding(image, question, seed, top_p, temperature):
    # Clear CUDA cache before generating
    torch.cuda.empty_cache()
    
    # set seed
    torch.manual_seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed(seed)
    
    conversation = [
        {
            "role": "<|User|>",
            "content": f"<image_placeholder>\n{question}",
            "images": [image],
        },
        {"role": "<|Assistant|>", "content": ""},
    ]
    
    pil_images = [Image.fromarray(image)]
    prepare_inputs = vl_chat_processor(
        conversations=conversation, images=pil_images, force_batchify=True
    ).to(cuda_device, dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float16)
    
    
    inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
    
    outputs = vl_gpt.language_model.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=prepare_inputs.attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        do_sample=False if temperature == 0 else True,
        use_cache=True,
        temperature=temperature,
        top_p=top_p,
    )
    
    answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
    return answer


def generate(input_ids,
             width,
             height,
             temperature: float = 1,
             num_images: int = 5,
             cfg_weight: float = 5,
             image_token_num: int = 576,
             patch_size: int = 16):
    """
    Sequentially generate images one at a time to minimize VRAM usage.
    """
    all_generated_tokens = []  # List to store generated tokens for each image

    for _ in range(num_images):
        # Prepare tokens for classifier-free guidance (2 branches: conditioned and unconditioned)
        tokens = torch.zeros((2, len(input_ids)), dtype=torch.int).to(cuda_device)
        for i in range(2):
            tokens[i, :] = input_ids
            if i == 1:
                tokens[i, 1:-1] = vl_chat_processor.pad_id

        # Get the input embeddings for the tokens
        inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
        generated_tokens = torch.zeros((image_token_num,), dtype=torch.int).to(cuda_device)
        pkv = None  # Past key values for autoregressive generation

        # Generate tokens sequentially for this image
        for i in range(image_token_num):
            outputs = vl_gpt.language_model.model(
                inputs_embeds=inputs_embeds,
                use_cache=True,
                past_key_values=pkv
            )
            pkv = outputs.past_key_values
            hidden_states = outputs.last_hidden_state
            logits = vl_gpt.gen_head(hidden_states[:, -1, :])

            # Separate conditioned and unconditioned logits
            logit_cond = logits[0, :]
            logit_uncond = logits[1, :]

            # Apply classifier-free guidance
            combined_logits = logit_uncond + cfg_weight * (logit_cond - logit_uncond)
            probs = torch.softmax(combined_logits / temperature, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            generated_tokens[i] = next_token.squeeze(dim=-1)

            # Prepare embeddings for the next token for both branches
            next_token_cat = next_token.repeat(2)
            img_embeds = vl_gpt.prepare_gen_img_embeds(next_token_cat)
            inputs_embeds = img_embeds.unsqueeze(dim=1)

        all_generated_tokens.append(generated_tokens)
        torch.cuda.empty_cache()  # Free VRAM after each image generation

    # Stack tokens for all images and decode to patches
    all_generated_tokens = torch.stack(all_generated_tokens, dim=0)
    patches = vl_gpt.gen_vision_model.decode_code(
        all_generated_tokens.to(dtype=torch.int),
        shape=[num_images, 8, width // patch_size, height // patch_size]
    )

    return all_generated_tokens.to(dtype=torch.int), patches

def unpack(dec, width, height, parallel_size=5):
    dec = dec.to(torch.float32).cpu().numpy().transpose(0, 2, 3, 1)
    dec = np.clip((dec + 1) / 2 * 255, 0, 255)

    visual_img = np.zeros((parallel_size, width, height, 3), dtype=np.uint8)
    visual_img[:, :, :] = dec

    return visual_img



@torch.inference_mode()
def generate_image(prompt, seed=None, guidance=5, t2i_temperature=1.0):
    # Clear CUDA cache and set seed for reproducibility
    torch.cuda.empty_cache()
    if seed is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        np.random.seed(seed)
    
    # Define dimensions and patch size
    width, height = 384, 384
    patch_size = 16  # Ensure dimensions are multiples of patch_size
    num_images = 1   # Generate one image at a time; increase to produce more images sequentially

    with torch.no_grad():
        messages = [
            {'role': '<|User|>', 'content': prompt},
            {'role': '<|Assistant|>', 'content': ''}
        ]
        text = vl_chat_processor.apply_sft_template_for_multi_turn_prompts(
            conversations=messages,
            sft_format=vl_chat_processor.sft_format,
            system_prompt=''
        )
        text += vl_chat_processor.image_start_tag

        input_ids = torch.LongTensor(tokenizer.encode(text))
        
        # Adjust dimensions to be multiples of patch_size
        adjusted_width = (width // patch_size) * patch_size
        adjusted_height = (height // patch_size) * patch_size
        
        # Call the updated sequential generate function
        output, patches = generate(
            input_ids,
            width=adjusted_width,
            height=adjusted_height,
            temperature=t2i_temperature,
            num_images=num_images,  # Using sequential generation
            cfg_weight=guidance
        )
        
        images = unpack(patches, adjusted_width, adjusted_height, parallel_size=num_images)
        # Resize each image to your desired output dimensions
        return [Image.fromarray(images[i]).resize((768, 768), Image.LANCZOS) for i in range(num_images)]


        

# Gradio interface
with gr.Blocks() as demo:
    gr.Markdown(value="# Multimodal Understanding")
    with gr.Row():
        image_input = gr.Image()
        with gr.Column():
            question_input = gr.Textbox(label="Question")
            und_seed_input = gr.Number(label="Seed", precision=0, value=42)
            top_p = gr.Slider(minimum=0, maximum=1, value=0.95, step=0.05, label="top_p")
            temperature = gr.Slider(minimum=0, maximum=1, value=0.1, step=0.05, label="temperature")
        
    understanding_button = gr.Button("Chat")
    understanding_output = gr.Textbox(label="Response")

    examples_inpainting = gr.Examples(
        label="Multimodal Understanding examples",
        examples=[
            [
                "explain this meme",
                "images/doge.png",
            ],
            [
                "Convert the formula into latex code.",
                "images/equation.png",
            ],
        ],
        inputs=[question_input, image_input],
    )
    
        
    gr.Markdown(value="# Text-to-Image Generation")

    
    
    with gr.Row():
        cfg_weight_input = gr.Slider(minimum=1, maximum=10, value=5, step=0.5, label="CFG Weight")
        t2i_temperature = gr.Slider(minimum=0, maximum=1, value=1.0, step=0.05, label="temperature")

    prompt_input = gr.Textbox(label="Prompt. (Prompt in more detail can help produce better images!)")
    seed_input = gr.Number(label="Seed (Optional)", precision=0, value=12345)

    generation_button = gr.Button("Generate Images")

    image_output = gr.Gallery(label="Generated Images", columns=2, rows=2, height=300)

    examples_t2i = gr.Examples(
        label="Text to image generation examples.",
        examples=[
            "Master shifu racoon wearing drip attire as a street gangster.",
            "The face of a beautiful girl",
            "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
            "A glass of red wine on a reflective surface.",
            "A cute and adorable baby fox with big brown eyes, autumn leaves in the background enchanting,immortal,fluffy, shiny mane,Petals,fairyism,unreal engine 5 and Octane Render,highly detailed, photorealistic, cinematic, natural colors.",
            "The image features an intricately designed eye set against a circular backdrop adorned with ornate swirl patterns that evoke both realism and surrealism. At the center of attention is a strikingly vivid blue iris surrounded by delicate veins radiating outward from the pupil to create depth and intensity. The eyelashes are long and dark, casting subtle shadows on the skin around them which appears smooth yet slightly textured as if aged or weathered over time.\n\nAbove the eye, there's a stone-like structure resembling part of classical architecture, adding layers of mystery and timeless elegance to the composition. This architectural element contrasts sharply but harmoniously with the organic curves surrounding it. Below the eye lies another decorative motif reminiscent of baroque artistry, further enhancing the overall sense of eternity encapsulated within each meticulously crafted detail. \n\nOverall, the atmosphere exudes a mysterious aura intertwined seamlessly with elements suggesting timelessness, achieved through the juxtaposition of realistic textures and surreal artistic flourishes. Each component\u2014from the intricate designs framing the eye to the ancient-looking stone piece above\u2014contributes uniquely towards creating a visually captivating tableau imbued with enigmatic allure.",
        ],
        inputs=prompt_input,
    )
    
    understanding_button.click(
        multimodal_understanding,
        inputs=[image_input, question_input, und_seed_input, top_p, temperature],
        outputs=understanding_output
    )
    
    generation_button.click(
        fn=generate_image,
        inputs=[prompt_input, seed_input, cfg_weight_input, t2i_temperature],
        outputs=image_output
    )

demo.launch()
# demo.queue(concurrency_count=1, max_size=10).launch(server_name="0.0.0.0", server_port=37906, root_path="/path")


Zirgite commented Feb 3, 2025

Here is the summarized documentation of the changes.
Documentation for VRAM Optimization and Sequential Image Generation in Janus-Pro-7B

1. Introduction

This document details the changes made to the image generation component of the Janus-Pro-7B project. Originally, the image generation process used a parallel approach that led to excessive VRAM usage and stalled operations. To resolve this, the code has been refactored to generate images sequentially, thereby reducing memory pressure. The multimodal understanding (chat) functionality remains unaffected.
2. Overview

The primary adjustments include:

Removal of the Old Parallel Generation:
The previous generate function, which relied on a parallel_size parameter to generate images in parallel, has been removed.

Introduction of Sequential Image Generation:
A new version of the generate function now accepts a num_images parameter and processes image generation one image at a time. This change significantly reduces VRAM usage by clearing the GPU memory after each image is generated (see the outline after this list).

Updates to the generate_image Function:
The generate_image function has been modified to call the sequential generate function and properly adjust image dimensions. This function now handles input prompt processing, dimension adjustment (to ensure multiples of the patch size), and final image resizing.

Memory Management:
Strategic calls to torch.cuda.empty_cache() have been added after generating each image to free VRAM and prevent memory stalls.
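
In outline, the sequential path follows this pattern; the helper names prepare_cfg_tokens and sample_image_tokens are illustrative stand-ins for the inline code inside the generate function above, not real functions in the script:

    all_generated_tokens = []
    for _ in range(num_images):
        tokens = prepare_cfg_tokens(input_ids)      # build conditioned + unconditioned branches (illustrative helper)
        generated = sample_image_tokens(tokens)     # autoregressive loop over image_token_num steps (illustrative helper)
        all_generated_tokens.append(generated)
        torch.cuda.empty_cache()                    # release cached VRAM before starting the next image
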
3. Detailed Modifications

3.1. Removed Parallel Generation

Old Function Removed:
The previous version of the generate function that used a parallel_size parameter has been completely removed to avoid naming conflicts and ensure that only the sequential version is active.

3.2. Sequential Image Generation Function

New Function Signature:
The updated generate function now accepts num_images instead of parallel_size.

Sequential Loop:
The function iterates over the range of num_images, generating tokens for one image per iteration.

Classifier-Free Guidance:
Tokens for the conditioned and unconditioned branches are generated and combined using classifier-free guidance (the combination formula is restated after this list).

Memory Clearance:
After each image is generated, torch.cuda.empty_cache() is called to release unused VRAM.

Token Decoding:
Generated tokens are stacked and decoded into image patches using vl_gpt.gen_vision_model.decode_code.
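
For reference, the guidance step in the code above combines the two branches' logits before sampling; this simply restates the existing lines, not new behavior:

    combined_logits = logit_uncond + cfg_weight * (logit_cond - logit_uncond)
    probs = torch.softmax(combined_logits / temperature, dim=-1)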

3.3. Updated generate_image Function

Input Processing:
The function processes the text prompt, applies the appropriate SFT template, and appends an image start tag.

Dimension Adjustment:
The width and height are adjusted to ensure they are multiples of the patch size (e.g., 16) before passing them to the generate function.

Output Resizing:
The output images are resized (e.g., to 768×768) for presentation.

3.4. Memory Management

VRAM Optimization:
Calls to torch.cuda.empty_cache() are strategically placed after the generation of each image to mitigate VRAM issues.
4. Function Descriptions

4.1. multimodal_understanding

Purpose:
Processes a given image and a text-based question using the Janus-Pro-7B model.
Key Operations:
    Sets the seed for reproducibility.
    Prepares input embeddings for multimodal understanding.
    Generates a response by decoding the output tokens.
Usage:
This function is used for chat-based interactions and remains unchanged.

4.2. generate (Sequential Version)

Purpose:
Generates image tokens sequentially to minimize VRAM usage.
Parameters:
    input_ids: Encoded input tokens from the text prompt.
    width, height: Dimensions for the generated image.
    temperature: Sampling temperature.
    num_images: Number of images to generate (sequentially).
    cfg_weight: Classifier-free guidance weight.
    image_token_num: Number of tokens per image.
    patch_size: Patch size for the vision model.
Key Operations:
    Iterates over num_images, generating one image at a time.
    Applies classifier-free guidance for token generation.
    Clears GPU memory after generating each image.
    Decodes tokens into image patches.

4.3. generate_image

Purpose:
Converts a text prompt into an image using the sequential generate function.
Key Operations:
    Processes and encodes the text prompt.
    Adjusts image dimensions to be multiples of the patch size.
    Calls the updated sequential generate function.
    Resizes and returns the generated images.
Usage:
This is the function called by the Gradio interface to generate images.

4.4. unpack

Purpose:
Converts decoded image patches (tensor) into a NumPy array and then into image format.
Key Operations:
    Transposes and scales the tensor data.
    Returns formatted images ready for display.
5. How to Run

Setup:
Ensure you have the required dependencies installed, including Gradio, PyTorch, Transformers, and the Janus libraries.

Launch:
Run the code using your preferred Python interpreter. The Gradio interface will launch, allowing you to interact with both the multimodal chat and image generation functionalities.

Testing:
If VRAM issues persist, reduce the value of num_images (e.g., test with 1 image) or lower the resolution temporarily.

Monitoring:
Monitor your GPU’s memory usage to confirm that the new sequential generation method is effectively managing VRAM (a minimal snippet for this follows).
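
A minimal sketch for spot-checking VRAM from inside the script, assuming a CUDA device (these are standard PyTorch calls, added here only for illustration):

    import torch

    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3     # memory used by live tensors
        reserved = torch.cuda.memory_reserved() / 1024**3       # memory held by the caching allocator
        peak = torch.cuda.max_memory_allocated() / 1024**3      # high-water mark since the last reset
        print(f"allocated {allocated:.2f} GiB | reserved {reserved:.2f} GiB | peak {peak:.2f} GiB")

Alternatively, watch nvidia-smi in a separate terminal while generating.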

6. Conclusion

The implemented changes allow the Janus-Pro-7B model to generate images sequentially, significantly reducing VRAM usage compared to the previous parallel method. This solution maintains the multimodal understanding capabilities while ensuring that image generation does not stall due to memory constraints. The code is now more robust and suitable for environments with limited GPU resources.
