A Python application that analyzes screenshots using Google's Gemini AI and delivers answers either by typing them out or via text-to-speech. I recommend modifying the prompts to better fit your own needs. If you found this project useful, please drop a ⭐; it means a lot!
- ALT+Q: General question (types answer)
- ALT+C: Code completion question (types answer)
- ALT+M: Multiple choice question (speaks answer)
- ALT+T: Translation to English (speaks answer)
- ALT+E: Detailed text explanation (speaks answer)
- ALT+R: Repeats last TTS response
- ALT+ESC: Exits application
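
The hotkey list above can be read as a small lookup table: each combination selects a prompt and an output channel. The sketch below is only an illustration of that idea; the names `Action`, `HOTKEYS`, `resolve`, and the mode strings are hypothetical and are not taken from the project's source. The control keys (ALT+R, ALT+ESC) are left out because they do not send a prompt.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    mode: str    # which prompt would accompany the screenshot (hypothetical names)
    output: str  # "type" to type the answer, "speak" to read it aloud


# Hypothetical table mirroring the hotkey list in this README.
HOTKEYS = {
    "alt+q": Action("general", "type"),
    "alt+c": Action("code_completion", "type"),
    "alt+m": Action("multiple_choice", "speak"),
    "alt+t": Action("translate_to_english", "speak"),
    "alt+e": Action("explain", "speak"),
}


def resolve(hotkey: str) -> Optional[Action]:
    """Return the action for a hotkey, or None if it is not mapped."""
    return HOTKEYS.get(hotkey.strip().lower())


if __name__ == "__main__":
    for combo in ("ALT+Q", "ALT+M", "ALT+Z"):
        print(combo, "->", resolve(combo))
```

Keeping the mapping in one table like this makes it easy to adjust prompts or switch a hotkey between typing and speaking without touching the rest of the logic.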
- Install dependencies:
pip install -r requirements.txt
- Create a .env file with your Gemini API key (check images for help)
- Run the application
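
To check that the .env file is being picked up before running the application, a small standalone script along these lines may help. It is a minimal sketch that parses simple `KEY=VALUE` lines using only the standard library; the variable name `GEMINI_API_KEY` and the helper names `read_env_file` and `get_api_key` are assumptions, so use whatever name the project's code actually reads.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(path=".env", name="GEMINI_API_KEY"):
    """Return the key stored under `name` (assumed name), or raise if absent."""
    key = read_env_file(path).get(name, "")
    if not key:
        raise RuntimeError(f"{name} not found in {path}")
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

The script only reports whether a key is present and never prints the key itself, which keeps it safe to run while screen sharing.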
- Requires internet connection
- Beep sounds indicate startup and shutdown
- Warning: long text-to-speech answers can take up to 20 seconds to load
- To make it actually invisible (no console window), change the script extension to .pyw
Inspired by Cluely AI.