diff --git a/LLM_PROVIDERS.md b/LLM_PROVIDERS.md
new file mode 100644
index 0000000..07e3e59
--- /dev/null
+++ b/LLM_PROVIDERS.md
@@ -0,0 +1,190 @@
+# OpenAI API Compatible LLM Support
+
+Vinci Clips now supports multiple LLM providers through an OpenAI-compatible API interface. This allows you to use various AI services for video transcription and analysis.
+
+## Supported Providers
+
+### 1. Google Gemini (Default)
+- **API Key**: `GEMINI_API_KEY`
+- **Models**: `gemini-1.5-flash`, `gemini-2.5-flash`, etc.
+- **Features**: Audio transcription + Text analysis
+- **Cost**: Free tier available
+
+### 2. OpenAI
+- **API Key**: `OPENAI_API_KEY`
+- **Models**: `gpt-3.5-turbo`, `gpt-4`, `gpt-4-turbo`, etc.
+- **Features**: Text analysis only (transcription still requires Gemini)
+- **Cost**: Pay per use
+
+### 3. OpenAI-Compatible APIs
+- **API Key**: `LLM_API_KEY`
+- **Base URL**: `LLM_BASE_URL`
+- **Examples**: Perplexity, local APIs, gpt4free proxies
+- **Features**: Text analysis only
+- **Cost**: Varies by provider
+
+## Configuration
+
+### Environment Variables
+
+Add these to your `backend/.env` file:
+
+```env
+# Primary LLM Provider
+LLM_PROVIDER=gemini  # or 'openai'
+
+# Gemini Configuration (recommended for full functionality)
+GEMINI_API_KEY=your_gemini_api_key_here
+LLM_MODEL=gemini-1.5-flash
+
+# OR OpenAI Configuration
+OPENAI_API_KEY=your_openai_api_key_here
+LLM_MODEL=gpt-3.5-turbo
+
+# OR Custom OpenAI-Compatible API
+LLM_API_KEY=your_api_key_here
+LLM_BASE_URL=https://api.example.com/v1
+LLM_MODEL=custom-model-name
+```
+
+### Provider Selection Logic
+
+1. **Primary Provider**: Set by the `LLM_PROVIDER` environment variable
+2. **Automatic Fallback**: If the primary provider fails, the system automatically tries available alternatives
+3. **Provider Detection**: Automatically detects which providers are configured based on available API keys
+
+## Usage Examples
+
+### Using Gemini (Recommended)
+```env
+LLM_PROVIDER=gemini
+GEMINI_API_KEY=AIza...your_key
+LLM_MODEL=gemini-1.5-flash
+```
+
+### Using OpenAI
+```env
+LLM_PROVIDER=openai
+OPENAI_API_KEY=sk-...your_key
+LLM_MODEL=gpt-3.5-turbo
+```
+
+### Using Perplexity AI
+```env
+LLM_PROVIDER=openai
+LLM_API_KEY=pplx-...your_key
+LLM_BASE_URL=https://api.perplexity.ai
+LLM_MODEL=llama-3.1-sonar-small-128k-online
+```
+
+### Using Local/Custom API
+```env
+LLM_PROVIDER=openai
+LLM_API_KEY=your_local_key
+LLM_BASE_URL=http://localhost:1234/v1
+LLM_MODEL=local-model
+```
+
+## API Endpoints
+
+### Check Provider Status
+```bash
+curl http://localhost:8080/clips/llm/provider-info
+```
+
+Response:
+```json
+{
+  "status": "success",
+  "data": {
+    "provider": "gemini",
+    "available": ["gemini", "openai"],
+    "model": "gemini-1.5-flash"
+  }
+}
+```
+
+## Important Notes
+
+### Audio Transcription Limitation
+- **Audio transcription currently only works with Gemini** due to its file upload API
+- OpenAI Whisper integration is planned for future releases
+- For now, you can use OpenAI for analysis while keeping Gemini for transcription
+
+### Fallback Behavior
+- If your primary provider fails, the system automatically tries other configured providers
+- This ensures high availability even if one service is down
+
+### Cost Optimization
+- **Gemini**: Free tier with generous limits, best for getting started
+- **OpenAI**: Pay per use, higher quality but more expensive
+- **Alternatives**: Often cheaper or free options available
+
+## Getting API Keys
+
+### Google Gemini (Free)
+1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
+2. Click "Create API key"
+3. Copy the key to your `.env` file
+
+### OpenAI (Paid)
+1. Visit [OpenAI API](https://platform.openai.com/api-keys)
+2. Create a new secret key
+3. Copy the key to your `.env` file
+
+### Perplexity AI (Paid)
+1. Visit [Perplexity API](https://www.perplexity.ai/settings/api)
+2. Generate an API key
+3. Set `LLM_BASE_URL=https://api.perplexity.ai`
+
+## Troubleshooting
+
+### "No LLM provider configured"
+- Ensure you have set at least one of: `GEMINI_API_KEY`, `OPENAI_API_KEY`, or `LLM_API_KEY`
+- Check that your API keys are valid and not expired
+
+### Provider Info Shows an Empty Available Array
+- This means no valid API keys were detected
+- Verify that your environment variables are loaded correctly
+- Check API key format and validity
+
+### Analysis Works but Transcription Fails
+- This is expected when using only OpenAI/custom providers
+- Keep `GEMINI_API_KEY` set for transcription functionality
+- The system will use Gemini for transcription and your chosen provider for analysis
+
+## Migration Guide
+
+### From a Gemini-Only Setup
+Your existing setup will continue working without changes. To add OpenAI:
+
+```env
+# Keep existing Gemini configuration
+GEMINI_API_KEY=your_existing_key
+
+# Add OpenAI as secondary provider
+OPENAI_API_KEY=your_openai_key
+
+# Optional: Switch primary provider
+LLM_PROVIDER=openai
+```
+
+### For New Installations
+Choose your preferred configuration:
+
+**Option 1: Gemini Only (Recommended for beginners)**
+```env
+LLM_PROVIDER=gemini
+GEMINI_API_KEY=your_gemini_key
+LLM_MODEL=gemini-1.5-flash
+```
+
+**Option 2: Hybrid Setup (Best of both worlds)**
+```env
+LLM_PROVIDER=openai
+GEMINI_API_KEY=your_gemini_key  # For transcription
+OPENAI_API_KEY=your_openai_key  # For analysis
+LLM_MODEL=gpt-3.5-turbo
+```
+
+This setup gives you free transcription with Gemini and high-quality analysis with OpenAI.
\ No newline at end of file
diff --git a/backend/.env.example b/backend/.env.example
index 4d498a9..3861c1d 100644
--- a/backend/.env.example
+++ b/backend/.env.example
@@ -1,6 +1,21 @@
 # Port for the backend server
 PORT=8080
 
-# Gemini API Key
+# LLM Provider Configuration
+# Primary provider: 'gemini' (default) or 'openai'
+LLM_PROVIDER=gemini
+
+# Gemini API Key (Google)
 GEMINI_API_KEY="ENTER YOUR API KEY HERE"
-LLM_MODEL=gemini-2.5-flash
+
+# OpenAI API Configuration
+# Use OPENAI_API_KEY for the official OpenAI API
+OPENAI_API_KEY="ENTER YOUR OPENAI API KEY HERE"
+
+# Alternative: Use LLM_API_KEY for OpenAI-compatible APIs (e.g., Perplexity, local APIs)
+# LLM_API_KEY="ENTER YOUR API KEY HERE"
+# LLM_BASE_URL="https://api.perplexity.ai"  # Custom base URL for OpenAI-compatible APIs
+
+# Model selection (provider-specific)
+LLM_MODEL=gemini-2.5-flash  # For Gemini: gemini-1.5-flash, gemini-2.5-flash, etc.
+# LLM_MODEL=gpt-3.5-turbo   # For OpenAI: gpt-3.5-turbo, gpt-4, gpt-4-turbo, etc.
diff --git a/backend/package-lock.json b/backend/package-lock.json
index c485a21..e7a68af 100644
--- a/backend/package-lock.json
+++ b/backend/package-lock.json
@@ -19,6 +19,7 @@
         "express": "^4.18.2",
         "fluent-ffmpeg": "^2.1.2",
         "multer": "^2.0.1",
+        "openai": "^5.23.1",
         "redis": "^5.6.0",
         "uuid": "^9.0.1",
         "winston": "^3.17.0",
@@ -5444,6 +5445,27 @@
         "url": "https://github.com/sponsors/sindresorhus"
       }
     },
+    "node_modules/openai": {
+      "version": "5.23.1",
+      "resolved": "https://registry.npmjs.org/openai/-/openai-5.23.1.tgz",
+      "integrity": "sha512-APxMtm5mln4jhKhAr0d5zP9lNsClx4QwJtg8RUvYSSyxYCTHLNJnLEcSHbJ6t0ori8Pbr9HZGfcPJ7LEy73rvQ==",
+      "license": "Apache-2.0",
+      "bin": {
+        "openai": "bin/cli"
+      },
+      "peerDependencies": {
+        "ws": "^8.18.0",
+        "zod": "^3.23.8"
+      },
+      "peerDependenciesMeta": {
+        "ws": {
+          "optional": true
+        },
+        "zod": {
+          "optional": true
+        }
+      }
+    },
     "node_modules/p-limit": {
       "version": "3.1.0",
       "resolved": "https://registry.npmjs.org/p-limit/-/p-limit-3.1.0.tgz",
diff --git a/backend/package.json b/backend/package.json
index 3214a98..2983b5b 100644
--- a/backend/package.json
+++ b/backend/package.json
@@ -19,9 +19,10 @@
     "dotenv": "^16.3.1",
     "express": "^4.18.2",
     "fluent-ffmpeg": "^2.1.2",
-    "uuid": "^9.0.1",
     "multer": "^2.0.1",
+    "openai": "^5.23.1",
     "redis": "^5.6.0",
+    "uuid": "^9.0.1",
     "winston": "^3.17.0",
     "winston-daily-rotate-file": "^5.0.0"
   },
diff --git a/backend/src/routes/analyze.js b/backend/src/routes/analyze.js
index 3f0169a..01d1f20 100644
--- a/backend/src/routes/analyze.js
+++ b/backend/src/routes/analyze.js
@@ -1,7 +1,7 @@
 const express = require('express');
 const router = express.Router();
 const Transcript = require('../models/Transcript');
-const { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } = require('@google/generative-ai');
+const llmService = require('../services/llmService');
 
 router.post('/:transcriptId', async (req, res) => {
     try {
@@ -13,97 +13,10 @@ router.post('/:transcriptId', async (req, res) => {
 
         // Join transcript segments into a single string for analysis by the LLM
         const fullTranscriptText = transcriptDoc.transcript.map(segment => segment.text).join(' ');
 
-        const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
-        const model = genAI.getGenerativeModel({
-            model: process.env.LLM_MODEL || 'gemini-1.5-flash',
-        });
-
-        const videoDurationText = transcriptDoc.duration ? ` The video is ${Math.floor(transcriptDoc.duration / 60)}:${String(Math.floor(transcriptDoc.duration % 60)).padStart(2, '0')} long.` : '';
-        const maxTimeFormatted = Math.floor(transcriptDoc.duration / 60) + ':' + String(Math.floor(transcriptDoc.duration % 60)).padStart(2, '0');
-        const prompt = `Given the following transcript, propose 3-5 video clips that would make engaging short content.${videoDurationText}
-
-CRITICAL CONSTRAINTS:
-- Video duration is EXACTLY ${videoDurationText ? maxTimeFormatted : 'unknown'} - DO NOT suggest any timestamps beyond this
-- Each clip should be 30-90 seconds total duration
-- All timestamps must be in MM:SS format and within 0:00 to ${maxTimeFormatted}
-
-You can suggest two types of clips:
-
-1. SINGLE SEGMENT clips: One continuous segment from start time to end time
-2. MULTI-SEGMENT clips: Multiple segments that when combined tell a coherent story
-
-For single segments: provide 'start' and 'end' times in MM:SS format.
-For multi-segments: provide an array of segments in 'segments' field, each with 'start' and 'end' times.
-
-VALIDATION RULES:
-- Every timestamp must be ≤ ${maxTimeFormatted}
-- Total duration must be 30-90 seconds
-- Focus on complete thoughts or exchanges
-- Ensure segments make sense when combined
-
-Output format: JSON array where each object has:
-- 'title': descriptive title
-- For single segments: 'start' and 'end' fields
-- For multi-segments: 'segments' array with objects containing 'start' and 'end'
-
-Transcript: ${fullTranscriptText}`;
-
-        const result = await model.generateContent({
-            contents: [{
-                role: 'user',
-                parts: [{ text: prompt }],
-            }],
-            generationConfig: {
-                responseMimeType: 'application/json',
-                responseSchema: {
-                    type: 'ARRAY',
-                    items: {
-                        type: 'OBJECT',
-                        properties: {
-                            title: { type: 'STRING' },
-                            start: { type: 'STRING' },
-                            end: { type: 'STRING' },
-                            segments: {
-                                type: 'ARRAY',
-                                items: {
-                                    type: 'OBJECT',
-                                    properties: {
-                                        start: { type: 'STRING' },
-                                        end: { type: 'STRING' },
-                                    },
-                                    required: ['start', 'end'],
-                                },
-                            },
-                        },
-                        required: ['title'],
-                        propertyOrdering: ['title', 'start', 'end', 'segments'],
-                    },
-                },
-            },
-            safetySettings: [
-                {
-                    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
-                    threshold: HarmBlockThreshold.BLOCK_NONE,
-                },
-                {
-                    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
-                    threshold: HarmBlockThreshold.BLOCK_NONE,
-                },
-                {
-                    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
-                    threshold: HarmBlockThreshold.BLOCK_NONE,
-                },
-                {
-                    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
-                    threshold: HarmBlockThreshold.BLOCK_NONE,
-                },
-            ],
-        });
-
-        const response = await result.response;
-        const suggestedClips = JSON.parse(response.text());
+        // Rebuild the MM:SS duration cap that the analysis prompt relies on
+        const maxTimeFormatted = Math.floor(transcriptDoc.duration / 60) + ':' + String(Math.floor(transcriptDoc.duration % 60)).padStart(2, '0');
+
+        // Use the LLM service for analysis
+        const suggestedClips = await llmService.analyzeTranscript(fullTranscriptText, transcriptDoc.duration, maxTimeFormatted);
 
         // Convert MM:SS time format to seconds for database storage
         const convertTimeToSeconds = (timeString) => {
diff --git a/backend/src/routes/import.js b/backend/src/routes/import.js
index 8701af3..0d1e141 100644
--- a/backend/src/routes/import.js
+++ b/backend/src/routes/import.js
@@ -5,8 +5,7 @@
 const fs = require('fs');
 const path = require('path');
 const { exec } = require('child_process');
 const Transcript = require('../models/Transcript');
-const { GoogleGenerativeAI } = require('@google/generative-ai');
-const { GoogleAIFileManager } = require('@google/generative-ai/server');
+const llmService = require('../services/llmService');
 
 const router = express.Router();
 
@@ -296,51 +295,7 @@ router.post('/url', async (req, res) => {
         });
 
         // Start transcription
-        const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
-        const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY);
-
-        const uploadResult = await fileManager.uploadFile(mp3DestPath, {
-            mimeType: 'audio/mpeg',
-            displayName: mp3FileName
-        });
-
-        const model = genAI.getGenerativeModel({
-            model: process.env.LLM_MODEL || 'gemini-1.5-flash',
-        });
-
-        const audioPart = { fileData: { mimeType: uploadResult.file.mimeType, fileUri: uploadResult.file.uri } };
-
-        const prompt = "Transcribe the provided audio with word-level timestamps and identify the speaker for each word. Format the output as a JSON array of objects, where each object represents a single word with precise millisecond timing. Each object should have 'start' (in format MM:SS:mmm), 'end' (in format MM:SS:mmm), 'text' (single word), and 'speaker' fields. For example: [{'start': '00:00:000', 'end': '00:00:450', 'text': 'Hello', 'speaker': 'Speaker 1'}, {'start': '00:00:450', 'end': '00:00:890', 'text': 'world', 'speaker': 'Speaker 1'}]";
-
-        const result = await model.generateContent({
-            contents: [{
-                role: 'user',
-                parts: [
-                    { text: prompt },
-                    audioPart,
-                ],
-            }],
-            generationConfig: {
-                responseMimeType: 'application/json',
-                responseSchema: {
-                    type: 'ARRAY',
-                    items: {
-                        type: 'OBJECT',
-                        properties: {
-                            start: { type: 'STRING' },
-                            end: { type: 'STRING' },
-                            text: { type: 'STRING' },
-                            speaker: { type: 'STRING' },
-                        },
-                        required: ['start', 'end', 'text', 'speaker'],
-                        propertyOrdering: ['start', 'end', 'text', 'speaker'],
-                    },
-                },
-            },
-        });
-
-        const response = await result.response;
-        const transcriptContent = JSON.parse(response.text());
+        const transcriptContent = await llmService.transcribeAudio(mp3DestPath, mp3FileName);
 
         // Update transcript with transcription and mark as completed
         const finalTranscript = await Transcript.findByIdAndUpdate(transcript._id, {
diff --git a/backend/src/routes/index.js b/backend/src/routes/index.js
index 7f35889..9362eee 100644
--- a/backend/src/routes/index.js
+++ b/backend/src/routes/index.js
@@ -11,6 +11,7 @@
 const importRoutes = require('./import');
 const retryRoutes = require('./retry-transcription');
 const reframeRoutes = require('./reframe');
 const streamerRoutes = require('./streamer');
+const llmInfoRoutes = require('./llm-info');
 
 // Mount specific routes. Order matters for wildcard routes.
 
@@ -24,5 +25,6 @@
 router.use('/reframe', reframeRoutes); // AI-powered video reframing for social
 router.use('/streamer', streamerRoutes); // Streamer & gameplay video processing
 router.use('/retry', retryRoutes); // Retry failed operations
 router.use('/admin', fixStatusRoutes); // Admin routes for fixing data issues
+router.use('/llm', llmInfoRoutes); // LLM provider information and configuration
 
 module.exports = router;
\ No newline at end of file
diff --git a/backend/src/routes/llm-info.js b/backend/src/routes/llm-info.js
new file mode 100644
index 0000000..72e0ade
--- /dev/null
+++ b/backend/src/routes/llm-info.js
@@ -0,0 +1,22 @@
+const express = require('express');
+const router = express.Router();
+const llmService = require('../services/llmService');
+
+// Get LLM provider information
+router.get('/provider-info', async (req, res) => {
+    try {
+        const providerInfo = llmService.getProviderInfo();
+        res.status(200).json({
+            status: 'success',
+            data: providerInfo
+        });
+    } catch (error) {
+        console.error('Error getting provider info:', error);
+        res.status(500).json({
+            status: 'error',
+            error: 'Failed to get LLM provider information'
+        });
+    }
+});
+
+module.exports = router;
\ No newline at end of file
diff --git a/backend/src/routes/retry-transcription.js b/backend/src/routes/retry-transcription.js
index aefc5e9..322c2b8 100644
--- a/backend/src/routes/retry-transcription.js
+++ b/backend/src/routes/retry-transcription.js
@@ -1,7 +1,6 @@
 const express = require('express');
 const Transcript = require('../models/Transcript');
-const { GoogleGenerativeAI } = require('@google/generative-ai');
-const { GoogleAIFileManager } = require('@google/generative-ai/server');
+const llmService = require('../services/llmService');
 const { Storage } = require('@google-cloud/storage');
 const fs = require('fs');
 const path = require('path');
@@ -62,62 +61,13 @@ router.post('/retry/:transcriptId', async (req, res) => {
             return res.status(500).json({ error: 'Failed to download MP3 for transcription' });
         }
 
-        // Initialize Gemini API
-        const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
-        const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY);
-
-        // Upload the MP3 file to Gemini File API
-        console.log('Uploading MP3 to Gemini API...');
-        const uploadResult = await fileManager.uploadFile(tempMp3Path, {
-            mimeType: 'audio/mpeg',
-            displayName: transcript.originalFilename.replace(/\.mp4$/, '.mp3')
-        });
-
-        const model = genAI.getGenerativeModel({
-            model: process.env.LLM_MODEL || 'gemini-1.5-flash',
-        });
+        // Use LLM service for transcription
+        console.log('Starting transcription with LLM service...');
 
-        const audioPart = { fileData: { mimeType: uploadResult.file.mimeType, fileUri: uploadResult.file.uri } };
-
-        const prompt = "Transcribe the provided audio into segments with start and end times, and identify the speaker for each segment. Format the output as a JSON array of objects, where each object has 'start', 'end', 'text', and 'speaker' fields. For example: [{'start': '00:00', 'end': '00:05', 'text': 'Hello world.', 'speaker': 'Speaker 1'}]";
-
-        console.log('Sending transcription request to Gemini...');
-
-        // Add timeout to the API call
-        const timeoutPromise = new Promise((_, reject) => {
-            setTimeout(() => reject(new Error('Transcription timeout after 5 minutes')), 5 * 60 * 1000);
-        });
-
-        const transcriptionPromise = model.generateContent({
-            contents: [{
-                role: 'user',
-                parts: [
-                    { text: prompt },
-                    audioPart,
-                ],
-            }],
-            generationConfig: {
-                responseMimeType: 'application/json',
-                responseSchema: {
-                    type: 'ARRAY',
-                    items: {
-                        type: 'OBJECT',
-                        properties: {
-                            start: { type: 'STRING' },
-                            end: { type: 'STRING' },
-                            text: { type: 'STRING' },
-                            speaker: { type: 'STRING' },
-                        },
-                        required: ['start', 'end', 'text', 'speaker'],
-                        propertyOrdering: ['start', 'end', 'text', 'speaker'],
-                    },
-                },
-            },
-        });
-
-        const result = await Promise.race([transcriptionPromise, timeoutPromise]);
-        const response = await result.response;
-        const transcriptContent = JSON.parse(response.text());
+        const transcriptContent = await llmService.transcribeAudio(
+            tempMp3Path,
+            transcript.originalFilename.replace(/\.mp4$/, '.mp3')
+        );
 
         console.log(`Transcription completed with ${transcriptContent.length} segments`);
 
diff --git a/backend/src/routes/upload.js b/backend/src/routes/upload.js
index 8453c84..257eb67 100644
--- a/backend/src/routes/upload.js
+++ b/backend/src/routes/upload.js
@@ -5,8 +5,7 @@
 const { exec } = require('child_process');
 const fs = require('fs');
 const path = require('path');
 const Transcript = require('../models/Transcript');
-const { GoogleGenerativeAI } = require('@google/generative-ai');
-const { GoogleAIFileManager } = require('@google/generative-ai/server');
+const llmService = require('../services/llmService');
 
 // The application will now use Application Default Credentials (ADC) in all environments.
 // For local development, authenticate by running `gcloud auth application-default login`.
@@ -115,52 +114,8 @@ router.post('/file', upload.single('video'), async (req, res) => {
 
         transcript = await Transcript.findByIdAndUpdate(transcript._id, transcript, { new: true });
         console.log(`Updated transcript ${transcript._id} status: transcribing`);
 
-        const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
-        const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY);
-
-        // Upload the local MP3 file directly to Gemini File API
-        const uploadResult = await fileManager.uploadFile(mp3DestPath, {
-            mimeType: 'audio/mpeg',
-            displayName: mp3FileName
-        });
-
-        const model = genAI.getGenerativeModel({
-            model: process.env.LLM_MODEL || 'gemini-1.5-flash',
-        });
-
-        const audioPart = { fileData: { mimeType: uploadResult.file.mimeType, fileUri: uploadResult.file.uri } };
-
-        const prompt = "Transcribe the provided audio with word-level timestamps and identify the speaker for each word. Format the output as a JSON array of objects, where each object represents a single word with precise millisecond timing. Each object should have 'start' (in format MM:SS:mmm), 'end' (in format MM:SS:mmm), 'text' (single word), and 'speaker' fields. For example: [{'start': '00:00:000', 'end': '00:00:450', 'text': 'Hello', 'speaker': 'Speaker 1'}, {'start': '00:00:450', 'end': '00:00:890', 'text': 'world', 'speaker': 'Speaker 1'}]";
-
-        const result = await model.generateContent({
-            contents: [{
-                role: 'user',
-                parts: [
-                    { text: prompt },
-                    audioPart,
-                ],
-            }],
-            generationConfig: {
-                responseMimeType: 'application/json',
-                responseSchema: {
-                    type: 'ARRAY',
-                    items: {
-                        type: 'OBJECT',
-                        properties: {
-                            start: { type: 'STRING' },
-                            end: { type: 'STRING' },
-                            text: { type: 'STRING' },
-                            speaker: { type: 'STRING' },
-                        },
-                        required: ['start', 'end', 'text', 'speaker'],
-                        propertyOrdering: ['start', 'end', 'text', 'speaker'],
-                    },
-                },
-            },
-        });
-
-        const response = await result.response;
-        const transcriptContent = JSON.parse(response.text());
+        // Use the LLM service for transcription
+        const transcriptContent = await llmService.transcribeAudio(mp3DestPath, mp3FileName);
 
         // Update existing transcript with all data and mark as completed
         transcript.transcript = transcriptContent;
diff --git a/backend/src/services/llmService.js b/backend/src/services/llmService.js
new file mode 100644
index 0000000..d8fa480
--- /dev/null
+++ b/backend/src/services/llmService.js
@@ -0,0 +1,255 @@
+// Note: GoogleAIFileManager is exported from the SDK's server entry point,
+// not from the package root.
+const { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } = require('@google/generative-ai');
+const { GoogleAIFileManager } = require('@google/generative-ai/server');
+const OpenAI = require('openai');
+
+class LLMService {
+    constructor() {
+        this.provider = process.env.LLM_PROVIDER || 'gemini';
+        this.initializeProviders();
+    }
+
+    initializeProviders() {
+        // Initialize Gemini (default/fallback)
+        if (process.env.GEMINI_API_KEY) {
+            this.geminiClient = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
+            // Don't initialize the file manager here - create it when needed
+        }
+
+        // Initialize OpenAI-compatible clients
+        if (process.env.OPENAI_API_KEY || process.env.LLM_API_KEY) {
+            const config = {
+                apiKey: process.env.OPENAI_API_KEY || process.env.LLM_API_KEY,
+            };
+
+            // Support custom base URL for OpenAI-compatible APIs
+            if (process.env.LLM_BASE_URL) {
+                config.baseURL = process.env.LLM_BASE_URL;
+            }
+
+            this.openaiClient = new OpenAI(config);
+        }
+    }
+
+    async analyzeTranscript(transcriptText, videoDuration, maxTimeFormatted) {
+        const prompt = `Given the following transcript, propose 3-5 video clips that would make engaging short content. The video is ${Math.floor(videoDuration / 60)}:${String(Math.floor(videoDuration % 60)).padStart(2, '0')} long.
+
+CRITICAL CONSTRAINTS:
+- Video duration is EXACTLY ${maxTimeFormatted} - DO NOT suggest any timestamps beyond this
+- Each clip should be 30-90 seconds total duration
+- All timestamps must be in MM:SS format and within 0:00 to ${maxTimeFormatted}
+
+You can suggest two types of clips:
+
+1. SINGLE SEGMENT clips: One continuous segment from start time to end time
+2. MULTI-SEGMENT clips: Multiple segments that when combined tell a coherent story
+
+For single segments: provide 'start' and 'end' times in MM:SS format.
+For multi-segments: provide an array of segments in 'segments' field, each with 'start' and 'end' times.
+
+VALIDATION RULES:
+- Every timestamp must be ≤ ${maxTimeFormatted}
+- Total duration must be 30-90 seconds
+- Focus on complete thoughts or exchanges
+- Ensure segments make sense when combined
+
+Output format: JSON array where each object has:
+- 'title': descriptive title
+- For single segments: 'start' and 'end' fields
+- For multi-segments: 'segments' array with objects containing 'start' and 'end'
+
+Transcript: ${transcriptText}`;
+
+        try {
+            if (this.provider === 'openai' && this.openaiClient) {
+                return await this.analyzeWithOpenAI(prompt);
+            } else if (this.provider === 'gemini' && this.geminiClient) {
+                return await this.analyzeWithGemini(prompt);
+            } else {
+                // Fallback to available provider
+                if (this.geminiClient) {
+                    return await this.analyzeWithGemini(prompt);
+                } else if (this.openaiClient) {
+                    return await this.analyzeWithOpenAI(prompt);
+                } else {
+                    throw new Error('No LLM provider configured. Please set GEMINI_API_KEY or OPENAI_API_KEY/LLM_API_KEY');
+                }
+            }
+        } catch (error) {
+            console.error(`LLM analysis failed with ${this.provider}:`, error);
+            // Try fallback if primary fails
+            if (this.provider === 'openai' && this.geminiClient) {
+                console.log('Falling back to Gemini...');
+                return await this.analyzeWithGemini(prompt);
+            } else if (this.provider === 'gemini' && this.openaiClient) {
+                console.log('Falling back to OpenAI...');
+                return await this.analyzeWithOpenAI(prompt);
+            }
+            throw error;
+        }
+    }
+
+    async analyzeWithGemini(prompt) {
+        const model = this.geminiClient.getGenerativeModel({
+            model: process.env.LLM_MODEL || 'gemini-1.5-flash',
+        });
+
+        const result = await model.generateContent({
+            contents: [{
+                role: 'user',
+                parts: [{ text: prompt }],
+            }],
+            generationConfig: {
+                responseMimeType: 'application/json',
+                responseSchema: {
+                    type: 'ARRAY',
+                    items: {
+                        type: 'OBJECT',
+                        properties: {
+                            title: { type: 'STRING' },
+                            start: { type: 'STRING' },
+                            end: { type: 'STRING' },
+                            segments: {
+                                type: 'ARRAY',
+                                items: {
+                                    type: 'OBJECT',
+                                    properties: {
+                                        start: { type: 'STRING' },
+                                        end: { type: 'STRING' },
+                                    },
+                                    required: ['start', 'end'],
+                                },
+                            },
+                        },
+                        required: ['title'],
+                        propertyOrdering: ['title', 'start', 'end', 'segments'],
+                    },
+                },
+            },
+            safetySettings: [
+                {
+                    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
+                    threshold: HarmBlockThreshold.BLOCK_NONE,
+                },
+                {
+                    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
+                    threshold: HarmBlockThreshold.BLOCK_NONE,
+                },
+                {
+                    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
+                    threshold: HarmBlockThreshold.BLOCK_NONE,
+                },
+                {
+                    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
+                    threshold: HarmBlockThreshold.BLOCK_NONE,
+                },
+            ],
+        });
+
+        const response = await result.response;
+        return JSON.parse(response.text());
+    }
+
+    async analyzeWithOpenAI(prompt) {
+        const response = await this.openaiClient.chat.completions.create({
+            model: process.env.LLM_MODEL || 'gpt-3.5-turbo',
+            messages: [
+                {
+                    role: 'system',
+                    content: 'You are a video editing assistant that analyzes transcripts and suggests engaging clips. Always respond with valid JSON only.'
+                },
+                {
+                    role: 'user',
+                    content: prompt
+                }
+            ],
+            response_format: { type: 'json_object' },
+            temperature: 0.7,
+        });
+
+        const content = response.choices[0].message.content;
+        const parsed = JSON.parse(content);
+
+        // OpenAI might return an object with a clips array; normalize to an array
+        return Array.isArray(parsed) ? parsed : (parsed.clips || []);
+    }
+
+    async transcribeAudio(audioFilePath, originalFilename) {
+        // Note: Audio transcription is more complex as it requires file handling.
+        // For now, this only works with Gemini due to its file upload API.
+        // OpenAI Whisper requires different handling and is typically used for pure transcription.
+
+        if (!this.geminiClient) {
+            throw new Error('Audio transcription currently requires Gemini API. Please set GEMINI_API_KEY.');
+        }
+
+        return await this.transcribeWithGemini(audioFilePath, originalFilename);
+    }
+
+    async transcribeWithGemini(audioFilePath, originalFilename) {
+        // Create the file manager when needed
+        const geminiFileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY);
+
+        // Upload the audio file to the Gemini File API
+        const uploadResult = await geminiFileManager.uploadFile(audioFilePath, {
+            mimeType: 'audio/mpeg',
+            displayName: originalFilename || 'audio.mp3'
+        });
+
+        const model = this.geminiClient.getGenerativeModel({
+            model: process.env.LLM_MODEL || 'gemini-1.5-flash',
+        });
+
+        const audioPart = { fileData: { mimeType: uploadResult.file.mimeType, fileUri: uploadResult.file.uri } };
+
+        const prompt = "Transcribe the provided audio with word-level timestamps and identify the speaker for each word. Format the output as a JSON array of objects, where each object represents a single word with precise millisecond timing. Each object should have 'start' (in format MM:SS:mmm), 'end' (in format MM:SS:mmm), 'text' (single word), and 'speaker' fields. For example: [{'start': '00:00:000', 'end': '00:00:450', 'text': 'Hello', 'speaker': 'Speaker 1'}, {'start': '00:00:450', 'end': '00:00:890', 'text': 'world', 'speaker': 'Speaker 1'}]";
+
+        const result = await model.generateContent({
+            contents: [{
+                role: 'user',
+                parts: [
+                    { text: prompt },
+                    audioPart,
+                ],
+            }],
+            generationConfig: {
+                responseMimeType: 'application/json',
+                responseSchema: {
+                    type: 'ARRAY',
+                    items: {
+                        type: 'OBJECT',
+                        properties: {
+                            start: { type: 'STRING' },
+                            end: { type: 'STRING' },
+                            text: { type: 'STRING' },
+                            speaker: { type: 'STRING' },
+                        },
+                        required: ['start', 'end', 'text', 'speaker'],
+                        propertyOrdering: ['start', 'end', 'text', 'speaker'],
+                    },
+                },
+            },
+        });
+
+        const response = await result.response;
+        return JSON.parse(response.text());
+    }
+
+    // Helper method to check which providers are available
+    getAvailableProviders() {
+        const providers = [];
+        if (this.geminiClient) providers.push('gemini');
+        if (this.openaiClient) providers.push('openai');
+        return providers;
+    }
+
+    // Get current provider info
+    getProviderInfo() {
+        return {
+            provider: this.provider,
+            available: this.getAvailableProviders(),
+            model: process.env.LLM_MODEL || (this.provider === 'gemini' ? 'gemini-1.5-flash' : 'gpt-3.5-turbo')
+        };
+    }
+}
+
+// Export singleton instance
+module.exports = new LLMService();
\ No newline at end of file
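
A minimal usage sketch of the new service (illustrative only, not part of the diff): it mirrors the call pattern `backend/src/routes/analyze.js` uses after this change, and assumes a loaded `backend/.env`, a require path relative to the repo root, and a `transcriptDoc` shaped like the `Transcript` model above.

```js
// Illustrative sketch - mirrors how the analyze route consumes the singleton.
require('dotenv').config();
const llmService = require('./backend/src/services/llmService');

async function suggestClips(transcriptDoc) {
    // Join the transcript segments exactly as the analyze route does
    const fullText = transcriptDoc.transcript.map(segment => segment.text).join(' ');

    // MM:SS cap used by the prompt to keep timestamps inside the video
    const maxTime = Math.floor(transcriptDoc.duration / 60) + ':' +
        String(Math.floor(transcriptDoc.duration % 60)).padStart(2, '0');

    // Honors LLM_PROVIDER and falls back to any other configured provider on failure
    return llmService.analyzeTranscript(fullText, transcriptDoc.duration, maxTime);
}

// Quick check of which providers were detected from the environment
console.log(llmService.getProviderInfo());
```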