DocSpeak is an intelligent document analysis and presentation assistance tool powered by Azure OpenAI and Azure TTS services. It automatically analyzes web content, generates presentation outlines, and provides real-time speech synthesis capabilities.
- 🔍 Intelligent Web Content Analysis
- 📑 Automatic PPT Outline Generation
- 🎯 Key Points Extraction
- 🗣️ Speech Script Generation
- 🔊 Real-time Text-to-Speech
- 💾 Content Caching
- 🎨 Intuitive User Interface
- 🔄 Smooth Presentation Control
- Python 3.11+
- Flask
- Azure OpenAI Service
- Azure Text-to-Speech Service
- BeautifulSoup4
- HTML5
- CSS3
- JavaScript (Vanilla)
- Azure OpenAI Service Subscription
- Azure Cognitive Services (Speech Service) Subscription
- Python 3.11 or higher
- Clone the repository:
git clone https://github.com/LuBu0505/DocSpeak.git
cd DocSpeak- Install dependencies:
pip install -r requirements.txt- Run the application:
python app.pyConfigure in the application settings panel:
- API URL
- API Key
Configure in the application settings panel:
- Region
- API Key
- Voice Selection (Multiple Chinese voices supported)
- After startup, visit
http://localhost:5000 - Click the settings button (⚙️) to configure Azure services
- Enter the URL of the webpage you want to analyze
- Click "Analyze Content"
- The system will automatically generate PPT outline and speech script
- Use the playback controller for presentation
- Left/Right arrows to switch pages
- Play button to start voice playback
- Auto-play mode for continuous content playback
DocSpeak/
├── app.py # Main application
├── azure_openai_client.py # Azure OpenAI client
├── azure_tts_client.py # Azure TTS client
├── requirements.txt # Project dependencies
├── static/ # Static resources
│ └── styles/
│ └── main.css
├── templates/ # HTML templates
│ └── index.html
├── data/ # Cache data
└── audio_cache/ # Audio cache
- Web analysis results are cached in the
data/directory - Speech synthesis results are cached in the
audio_cache/directory - Cache uses URL MD5 hash as identifier
- Flask debug mode enabled in development
- Hot reload supported
- Detailed logging in console
- Ensure sufficient Azure quota
- Web scraping may be limited by target websites
- Recommended to use modern browsers
- Audio files are temporarily stored on the server
[Add appropriate license]
- Fork the project
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
If you find any bugs or have suggestions for improvements, please submit an issue.
- Support for more languages
- Additional voice options
- Optimize script generation algorithm
- PDF document analysis support
- Export functionality
POST /analyze
Headers:
- X-OpenAI-Key: Your Azure OpenAI API key
- X-OpenAI-URL: Your Azure OpenAI endpoint URL
Body:
{
"url": "webpage-url-to-analyze"
}
POST /text-to-speech
Headers:
- X-TTS-Key: Your Azure TTS API key
- X-TTS-URL: Your Azure TTS region
- X-TTS-Voice: Voice name (default: zh-CN-XiaoxiaoNeural)
Body:
{
"text": "Text to convert to speech"
}
- Content caching to reduce API calls
- Audio caching to improve playback speed
- Optimized web scraping with BeautifulSoup
- Efficient JSON processing for large content
- API keys stored client-side in localStorage
- No sensitive data stored on server
- Input validation for URLs
- Rate limiting for API requests
python app.pyRecommended to use with:
- Gunicorn
- Nginx
- HTTPS enabled
- Environment variables for configuration
For support:
- Check existing issues
- Create new issues
- Contact project maintainers
- Azure OpenAI Service
- Azure Cognitive Services
- Flask community
- Contributors and testers