A sophisticated AI-powered chatbot that can analyze images and engage in multi-language conversations using OpenAI's GPT-4 Vision model, Google Translate API, and MongoDB for chat history persistence.
Watch the project in action:
- 🖼️ Image Analysis: Upload and analyze images using OpenAI's GPT-4 Vision model
- 💬 Conversational AI: Engage in natural conversations about uploaded images or general topics
- 🌍 Multi-language Support: Translate responses to Hindi and Bengali using Google Translate API
- 📚 Chat History: Persistent conversation history stored in MongoDB
- 🎨 Streamlit Interface: User-friendly web interface built with Streamlit
- 🔄 Context Awareness: Maintains conversation context for coherent interactions
- Frontend: Streamlit
- AI Model: OpenAI GPT-4 Vision (gpt-4o)
- Translation: Google Cloud Translate API
- Database: MongoDB
- Image Processing: PIL (Pillow)
- Environment Management: python-dotenv
Before running this project, ensure you have:
- Python 3.7 or higher
- OpenAI API key
- Google Cloud credentials (for translation services)
- MongoDB connection string
- Required Python packages (see requirements.txt)
-
Clone the repository:
git clone https://github.com/rajibsalui/Conversational-Image-Chatbot.git cd Conversational-Image-Chatbot -
Install dependencies:
pip install -r requirements.txt
-
Set up environment variables: Create a
.envfile in the project root and add:OPENAI_API_KEY=your_openai_api_key_here MONGO_URI=your_mongodb_connection_string_here
-
Set up Google Cloud credentials:
- Place your Google Cloud service account credentials file as
credentials.jsonin the project root - Ensure the service account has access to the Google Translate API
- Place your Google Cloud service account credentials file as
-
Start the application:
streamlit run main.py
-
Open your web browser and navigate to the displayed local URL (typically
http://localhost:8501) -
Using the chatbot:
- Upload an image using the sidebar file uploader
- Type your question or prompt in the input field
- Select your preferred language for the response
- Click "Ask Solution" to get AI-generated responses
- Use "Clear Chat" to reset the conversation history
AI_Image_Chatbot/
│
├── main.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── credentials.json # Google Cloud service account credentials
├── README.md # Project documentation
├── 2025-09-13 16-17-37.mp4 # Demo video
└── .env # Environment variables (create this)
The application uses MongoDB to store chat history. Ensure your MongoDB instance is running and accessible via the connection string in your .env file.
The chatbot uses OpenAI's GPT-4 Vision model (gpt-4o) for image analysis and conversation. Make sure you have sufficient API credits and the correct API key.
Translation features require Google Cloud Translate API access. Set up a service account with appropriate permissions and download the credentials JSON file.
- Supports JPG, JPEG, and PNG formats
- Converts images to base64 for API processing
- Provides detailed analysis and answers questions about uploaded images
- Automatic translation to Hindi and Bengali
- Preserves original English responses
- Uses Google Cloud Translate for accurate translations
- Persistent storage in MongoDB
- Displays recent conversations
- Includes both original and translated responses
- Timestamps for conversation tracking
- Fork the repository
- Create a feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions:
- Check the Issues section
- Create a new issue with detailed information about your problem
- Include error messages, screenshots, and steps to reproduce
- OpenAI for the powerful GPT-4 Vision model
- Google Cloud for translation services
- Streamlit for the excellent web framework
- MongoDB for reliable data storage
- Support for more image formats
- Additional language support
- Voice input/output capabilities
- Image generation features
- Export chat history functionality
- User authentication system
Built with ❤️ by Rajib Salui