DocSpeak for Azure

DocSpeak is an intelligent document analysis and presentation assistance tool powered by Azure OpenAI and Azure TTS services. It automatically analyzes web content, generates presentation outlines, and provides real-time speech synthesis capabilities.

Features

🔍 Intelligent Web Content Analysis
📑 Automatic PPT Outline Generation
🎯 Key Points Extraction
🗣️ Speech Script Generation
🔊 Real-time Text-to-Speech
💾 Content Caching
🎨 Intuitive User Interface
🔄 Smooth Presentation Control

Tech Stack

Backend

Python 3.11+
Flask
Azure OpenAI Service
Azure Text-to-Speech Service
BeautifulSoup4

Frontend

HTML5
CSS3
JavaScript (Vanilla)

Prerequisites

Azure OpenAI Service Subscription
Azure Cognitive Services (Speech Service) Subscription
Python 3.11 or higher

Installation

Clone the repository:

git clone https://github.com/LuBu0505/DocSpeak.git
cd DocSpeak

Install dependencies:

pip install -r requirements.txt

Run the application:

python app.py

Configuration

Azure OpenAI Setup

Configure in the application settings panel:

API URL
API Key

Azure TTS Setup

Configure in the application settings panel:

Region
API Key
Voice Selection (Multiple Chinese voices supported)

Usage Guide

After startup, visit http://localhost:5000
Click the settings button (⚙️) to configure Azure services
Enter the URL of the webpage you want to analyze
Click "Analyze Content"
The system automatically generates a PPT outline and a speech script
Use the playback controller for presentation
- Left/Right arrows to switch pages
- Play button to start voice playback
- Auto-play mode for continuous content playback

Project Structure

DocSpeak/
├── app.py                 # Main application
├── azure_openai_client.py # Azure OpenAI client
├── azure_tts_client.py    # Azure TTS client
├── requirements.txt       # Project dependencies
├── static/               # Static resources
│   └── styles/
│       └── main.css
├── templates/            # HTML templates
│   └── index.html
├── data/                # Cache data
└── audio_cache/         # Audio cache

Caching Mechanism

Web analysis results are cached in the data/ directory
Speech synthesis results are cached in the audio_cache/ directory
Each cache entry is identified by the MD5 hash of the analyzed URL
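
The following is a minimal sketch of how a URL could be turned into an MD5-based cache identifier and a file path under data/. The helper names (cache_key, cache_path) and the ".json" suffix are assumptions made for illustration and are not taken from the project's code.

```python
import hashlib
from pathlib import Path

# "data" is the cache directory named in the project structure.
DATA_DIR = Path("data")


def cache_key(url: str) -> str:
    """Return the hex MD5 digest of the URL, used here as the cache identifier."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def cache_path(url: str, cache_dir: Path = DATA_DIR, suffix: str = ".json") -> Path:
    """Map a URL to a file inside the cache directory (the suffix is an assumption)."""
    return cache_dir / f"{cache_key(url)}{suffix}"
```

Because the digest depends only on the URL string, two URLs that differ in a trailing slash or query parameter would map to separate cache entries under this scheme.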

Development Notes

Flask debug mode enabled in development
Hot reload supported
Detailed logging in console

Important Notes

Ensure sufficient Azure quota
Web scraping may be limited by target websites
A modern browser is recommended
Audio files are temporarily stored on the server

License

[Add appropriate license]

Contributing

Fork the project
Create your feature branch
Commit your changes
Push to the branch
Create a Pull Request

Issue Reporting

If you find any bugs or have suggestions for improvements, please submit an issue.

Future Plans

API Documentation

Endpoints

1. Content Analysis

POST /analyze
Headers:
  - X-OpenAI-Key: Your Azure OpenAI API key
  - X-OpenAI-URL: Your Azure OpenAI endpoint URL
Body:
  {
    "url": "webpage-url-to-analyze"
  }
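
As an illustration, a client request to this endpoint might be assembled as sketched below, using only the Python standard library. The base address comes from the Usage Guide; the function name build_analyze_request and the Content-Type header are assumptions, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default local address from the Usage Guide


def build_analyze_request(page_url: str, openai_key: str, openai_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request with the documented headers."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed; the body is JSON
            "X-OpenAI-Key": openai_key,
            "X-OpenAI-URL": openai_url,
        },
    )
```

With the application running, the request could be sent with urllib.request.urlopen(request) and the response body inspected by hand.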

2. Text-to-Speech Conversion

POST /text-to-speech
Headers:
  - X-TTS-Key: Your Azure TTS API key
  - X-TTS-URL: Your Azure TTS region
  - X-TTS-Voice: Voice name (default: zh-CN-XiaoxiaoNeural)
Body:
  {
    "text": "Text to convert to speech"
  }
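
A hedged sketch of a client call for this endpoint follows. It passes the region in X-TTS-URL as listed above and falls back to the documented default voice. The function name build_tts_request, the Content-Type header, and the base address parameter are assumptions; the format of the returned audio is not specified in this document.

```python
import json
import urllib.request

DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"  # default voice named in the endpoint description


def build_tts_request(
    text: str,
    tts_key: str,
    tts_region: str,
    voice: str = DEFAULT_VOICE,
    base_url: str = "http://localhost:5000",
) -> urllib.request.Request:
    """Build (but do not send) a POST /text-to-speech request."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/text-to-speech",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed; the body is JSON
            "X-TTS-Key": tts_key,
            "X-TTS-URL": tts_region,  # the header carries the region, per the list above
            "X-TTS-Voice": voice,
        },
    )
```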

Performance Optimization

Content caching to reduce API calls
Audio caching to improve playback speed
Optimized web scraping with BeautifulSoup
Efficient JSON processing for large content

Security Considerations

API keys stored client-side in localStorage
No sensitive data stored on server
Input validation for URLs
Rate limiting for API requests
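
To make the URL validation point concrete, here is one possible check, sketched with the standard library. It is not the project's actual validation logic; the function name is_valid_url and the choice to accept only http and https are assumptions.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}  # assumed policy: only web pages are analyzed


def is_valid_url(url: object) -> bool:
    """Return True if url is a string with an http(s) scheme and a host name."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url.strip())
        host = parsed.hostname
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. unbalanced IPv6 brackets
        return False
    return parsed.scheme in ALLOWED_SCHEMES and bool(host)
```

A check like this only filters malformed or non-web input. It does not stop requests to internal addresses, which would need a separate allow or deny rule on the resolved host.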

Deployment

Development

python app.py

Production

For production, the following setup is recommended:

Gunicorn
Nginx
HTTPS enabled
Environment variables for configuration
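
One way to combine Gunicorn with environment-variable configuration is a Python config file, sketched below. The file name, the DOCSPEAK_* variable names, and the default values are all assumptions for illustration; bind, workers, and timeout are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file; the variable names below are illustrative)
import multiprocessing
import os

# Listen on localhost so that Nginx can terminate HTTPS and proxy to this port.
bind = os.environ.get("DOCSPEAK_BIND", "127.0.0.1:8000")

# Worker count: taken from the environment, else a common 2 * CPUs + 1 rule of thumb.
workers = int(os.environ.get("DOCSPEAK_WORKERS", multiprocessing.cpu_count() * 2 + 1))

# Analysis and speech requests call external services, so allow a longer timeout (assumed value).
timeout = int(os.environ.get("DOCSPEAK_TIMEOUT", "120"))
```

Assuming the Flask object in app.py is named app, the server could then be started with gunicorn -c gunicorn.conf.py app:app.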

Support

For support:

Check existing issues
Create new issues
Contact project maintainers

Acknowledgments

Azure OpenAI Service
Azure Cognitive Services
Flask community
Contributors and testers
