A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
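As a rough illustration of what "structurally aligning visual and textual embeddings" typically involves (this is a minimal sketch assuming a LLaVA-style design, not this repository's actual architecture), a small projector can map frozen vision-encoder features into the language model's token-embedding space so image tokens and text tokens share one sequence:

```python
# Hedged sketch: LLaVA-style visual-textual alignment (an assumption,
# not the linked repository's code). A small MLP projects vision-encoder
# patch features into the LLM's embedding space; the dimensions and the
# two-layer projector are illustrative choices.
import torch
import torch.nn as nn


class VisionToTextProjector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)


# Usage: project dummy vision features and concatenate with text embeddings.
projector = VisionToTextProjector()
vision_feats = torch.randn(2, 256, 1024)   # e.g. ViT patch features
text_embeds = torch.randn(2, 32, 4096)     # LLM token embeddings
visual_tokens = projector(vision_feats)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)  # (2, 288, 4096)
```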
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Streamlit App Combining Vision, Language, and Audio AI Models
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
Socratic models for multimodal reasoning & image captioning
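The Socratic Models pattern composes pretrained models through language: a zero-shot vision-language model turns the image into text, and a text-only LLM reasons over that description. The sketch below is an assumption-laden illustration rather than the linked repository's implementation; the model name, candidate captions, and image path are placeholders.

```python
# Hedged sketch of the Socratic Models pattern: CLIP ranks candidate
# image descriptions, and the best one is handed to a text-only LLM as
# a language prompt. Illustration only; not the repository's code.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image path
candidates = [
    "a photo of a dog playing in a park",
    "a photo of a plate of food",
    "a photo of a city street at night",
]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # (1, num_candidates)
best_caption = candidates[logits_per_image.argmax().item()]

# The visual description becomes a prompt for a text-only LLM; the LLM
# call itself is omitted to avoid assuming a specific API.
prompt = f"Image description: {best_caption}\nQuestion: What is happening here?\nAnswer:"
print(prompt)
```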