Skip to content

DocSpeak for Azure is an intelligent content transformation initiative designed for developers who rely on Azure documentation. The project leverages large AI models to automatically convert technical text from official Azure Docs into structured, easy-to-understand video explanations

Notifications You must be signed in to change notification settings

LuBu0505/DocSpeak

Repository files navigation

DocSpeak for Azure

DocSpeak is an intelligent document analysis and presentation assistance tool powered by Azure OpenAI and Azure TTS services. It automatically analyzes web content, generates presentation outlines, and provides real-time speech synthesis capabilities.

Features

  • 🔍 Intelligent Web Content Analysis
  • 📑 Automatic PPT Outline Generation
  • 🎯 Key Points Extraction
  • 🗣️ Speech Script Generation
  • 🔊 Real-time Text-to-Speech
  • 💾 Content Caching
  • 🎨 Intuitive User Interface
  • 🔄 Smooth Presentation Control

Tech Stack

Backend

  • Python 3.11+
  • Flask
  • Azure OpenAI Service
  • Azure Text-to-Speech Service
  • BeautifulSoup4

Frontend

  • HTML5
  • CSS3
  • JavaScript (Vanilla)

Prerequisites

  • Azure OpenAI Service Subscription
  • Azure Cognitive Services (Speech Service) Subscription
  • Python 3.11 or higher

Installation

  1. Clone the repository:
git clone https://github.com/LuBu0505/DocSpeak.git
cd DocSpeak
  1. Install dependencies:
pip install -r requirements.txt
  1. Run the application:
python app.py

Configuration

Azure OpenAI Setup

Configure in the application settings panel:

  • API URL
  • API Key

Azure TTS Setup

Configure in the application settings panel:

  • Region
  • API Key
  • Voice Selection (Multiple Chinese voices supported)

Usage Guide

  1. After startup, visit http://localhost:5000
  2. Click the settings button (⚙️) to configure Azure services
  3. Enter the URL of the webpage you want to analyze
  4. Click "Analyze Content"
  5. The system will automatically generate PPT outline and speech script
  6. Use the playback controller for presentation
    • Left/Right arrows to switch pages
    • Play button to start voice playback
    • Auto-play mode for continuous content playback

Project Structure

DocSpeak/
├── app.py                 # Main application
├── azure_openai_client.py # Azure OpenAI client
├── azure_tts_client.py    # Azure TTS client
├── requirements.txt       # Project dependencies
├── static/               # Static resources
│   └── styles/
│       └── main.css
├── templates/            # HTML templates
│   └── index.html
├── data/                # Cache data
└── audio_cache/         # Audio cache

Caching Mechanism

  • Web analysis results are cached in the data/ directory
  • Speech synthesis results are cached in the audio_cache/ directory
  • Cache uses URL MD5 hash as identifier

Development Notes

  • Flask debug mode enabled in development
  • Hot reload supported
  • Detailed logging in console

Important Notes

  • Ensure sufficient Azure quota
  • Web scraping may be limited by target websites
  • Recommended to use modern browsers
  • Audio files are temporarily stored on the server

License

[Add appropriate license]

Contributing

  1. Fork the project
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Issue Reporting

If you find any bugs or have suggestions for improvements, please submit an issue.

Future Plans

  • Support for more languages
  • Additional voice options
  • Optimize script generation algorithm
  • PDF document analysis support
  • Export functionality

API Documentation

Endpoints

1. Content Analysis

POST /analyze
Headers:
  - X-OpenAI-Key: Your Azure OpenAI API key
  - X-OpenAI-URL: Your Azure OpenAI endpoint URL
Body:
  {
    "url": "webpage-url-to-analyze"
  }

2. Text-to-Speech Conversion

POST /text-to-speech
Headers:
  - X-TTS-Key: Your Azure TTS API key
  - X-TTS-URL: Your Azure TTS region
  - X-TTS-Voice: Voice name (default: zh-CN-XiaoxiaoNeural)
Body:
  {
    "text": "Text to convert to speech"
  }

Performance Optimization

  • Content caching to reduce API calls
  • Audio caching to improve playback speed
  • Optimized web scraping with BeautifulSoup
  • Efficient JSON processing for large content

Security Considerations

  • API keys stored client-side in localStorage
  • No sensitive data stored on server
  • Input validation for URLs
  • Rate limiting for API requests

Deployment

Development

python app.py

Production

Recommended to use with:

  • Gunicorn
  • Nginx
  • HTTPS enabled
  • Environment variables for configuration

Support

For support:

  • Check existing issues
  • Create new issues
  • Contact project maintainers

Acknowledgments

  • Azure OpenAI Service
  • Azure Cognitive Services
  • Flask community
  • Contributors and testers

About

DocSpeak for Azure is an intelligent content transformation initiative designed for developers who rely on Azure documentation. The project leverages large AI models to automatically convert technical text from official Azure Docs into structured, easy-to-understand video explanations

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published