This module enables Lyra to analyze LinkedIn drafts that include images. It extracts structured context from images, fuses it with draft text, and attributes prediction signals to their source.
This is NOT a captioning feature. It is a signal extraction + fusion feature for recruiting intelligence.
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Draft Text    │────▶│   buildFused     │────▶│  ML Prediction  │
└─────────────────┘     │   InputText()    │     │   (existing)    │
                        └──────────────────┘     └─────────────────┘
┌─────────────────┐            │                          │
│    Images[]     │────▶┌──────▼──────────┐               │
└─────────────────┘     │ extractImage    │               │
                        │ Context()       │               │
                        │ (Gemini Vision) │               │
                        └─────────────────┘               │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │ attributeSignals│
                                                 │       ()        │
                                                 └─────────────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │ Final Response  │
                                                 │ with Attribution│
                                                 └─────────────────┘
Request:
{
  "draft_text": "Excited to share our new feature...",
  "images": [
    {
      "id": "img1",
      "mime": "image/png",
      "data": "<base64-encoded-image>"
    },
    {
      "id": "img2",
      "mime": "image/jpeg",
      "url": "https://example.com/image.jpg"
    }
  ]
}
Response:
{
  "fused_input_text": "[DRAFT TEXT]\nExcited to share...\n\n[IMAGE CONTEXT]\n- Image img1: Screenshot of product dashboard showing metrics...",
  "image_context_objects": [
    {
      "image_id": "img1",
      "high_level_type": "product_ui",
      "primary_subjects": ["dashboard", "metrics"],
      "setting": ["office", "technology"],
      "brand_markers": ["company logo"],
      "visual_tone": "professional",
      "engineering_signals": {
        "contains_code": false,
        "contains_architecture_diagram": false,
        "contains_metrics_or_dashboards": true,
        "contains_dev_tools": false,
        "contains_open_source_markers": false
      },
      "recruiting_signals": {
        "contains_hiring_language": false,
        "contains_role_titles": [],
        "contains_compensation_or_benefits": false,
        "contains_company_values": [],
        "contains_event_or_booth": false
      },
      "risk_signals": {
        "contains_political_content": false,
        "contains_sensitive_or_inflammatory_language": false,
        "contains_personal_attack_or_harassment": false,
        "contains_unsafe_or_illegal_activity": false,
        "contains_sexual_content": false
      },
      "image_text_snippets": ["Q4 Results"],
      "keywords": ["dashboard", "metrics", "analytics", "growth"],
      "one_line_context_summary": "Product dashboard screenshot showing quarterly metrics and analytics"
    }
  ],
  "predictions": {
    "role_distribution_top5": [
      { "role": "Product Manager", "pct": 28.5 },
      { "role": "Engineering Manager", "pct": 22.3 }
    ],
    "narratives": [
      { "name": "product_launch", "prob": 0.72 },
      { "name": "thought_leadership", "prob": 0.45 }
    ],
    "risk": {
      "risk_class": "Helpful",
      "risk_probs": { "Harmful": 0.05, "Helpful": 0.72, "Harmless": 0.23 },
      "risk_level": "Low",
      "primary_risk_reason": "Product announcement with clear value"
    }
  },
  "attribution": {
    "text_driven_signals": ["narrative:product_launch", "risk:Helpful"],
    "image_driven_signals": ["signal:engineering-context"],
    "mixed_signals": ["role:Product Manager"]
  }
}
types/image-analysis.ts: TypeScript interfaces for all image analysis types.
lib/image-analysis/vision-adapter.ts: Vision API adapter using Gemini 2.0 Flash.
Key function:
extractImageContext(image: ImageInput): Promise<ImageContext>
Extracts structured JSON from an image:
- high_level_type: screenshot, code, infographic, etc.
- engineering_signals: code, architecture, metrics, dev-tools
- recruiting_signals: hiring language, role titles, events
- risk_signals: political, inflammatory, harassment, unsafe
- keywords: semantic keywords for attribution
- one_line_context_summary: factual description
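A vision model's reply is free-form text, so the structured JSON listed above usually has to be parsed defensively. The following is a minimal sketch under stated assumptions, not the module's actual parser: `parseImageContextReply`, `PartialImageContext`, and `stringList` are hypothetical names, only a subset of the fields is handled, and falling back to neutral defaults on a bad reply is an assumed policy.

```typescript
// Hypothetical subset of the extracted context; field names follow the list above.
interface PartialImageContext {
  image_id: string;
  high_level_type: string;
  keywords: string[];
  one_line_context_summary: string;
}

// Keep only string entries and cap the list length.
function stringList(value: unknown, max: number): string[] {
  if (!Array.isArray(value)) return [];
  return value.filter((v): v is string => typeof v === "string").slice(0, max);
}

// Parse a raw model reply, returning neutral defaults when it is not usable JSON.
// (Assumed fallback behaviour; the real adapter may handle errors differently.)
function parseImageContextReply(imageId: string, reply: string): PartialImageContext {
  const fallback: PartialImageContext = {
    image_id: imageId,
    high_level_type: "other",
    keywords: [],
    one_line_context_summary: "",
  };
  // Models sometimes wrap JSON in extra text, so take the outermost braces.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return fallback;
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return fallback;
  }
  if (typeof parsed !== "object" || parsed === null) return fallback;
  const obj = parsed as Record<string, unknown>;
  return {
    image_id: imageId,
    high_level_type: typeof obj.high_level_type === "string" ? obj.high_level_type : "other",
    keywords: stringList(obj.keywords, 10), // the data model caps keywords at 10
    one_line_context_summary:
      typeof obj.one_line_context_summary === "string" ? obj.one_line_context_summary : "",
  };
}
```

Defaulting to "other" and an empty keyword list keeps a failed extraction from contributing any image-driven signal downstream.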
lib/image-analysis/fusion.ts: Builds fused input text for ML consumption.
Key function:
buildFusedInputText(draftText: string, imageContexts: ImageContext[]): string
Format:
[DRAFT TEXT]
{draft_text}

[IMAGE CONTEXT]
- Image {id}: {one_line_context_summary}
  - type: {high_level_type}
  - engineering: {key signals}
  - recruiting: {key signals}
  - risk: {key signals}
  - keywords: {top 5}
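The format above can be sketched as a small string builder. This is an illustrative approximation, not the code in fusion.ts: `buildFusedInputTextSketch`, `FusionImageContext`, and `activeFlags` are hypothetical names, each signal group is simplified to boolean flags, and rendering "key signals" as the names of the true flags (with the `contains_` prefix dropped, sub-items indented two spaces) is an assumption.

```typescript
// Simplified, hypothetical view of an image context: each signal group is
// reduced to boolean flags (the full type also carries string arrays).
type Flags = Record<string, boolean>;

interface FusionImageContext {
  image_id: string;
  high_level_type: string;
  engineering_signals: Flags;
  recruiting_signals: Flags;
  risk_signals: Flags;
  keywords: string[];
  one_line_context_summary: string;
}

// List the flags that are true, dropping the "contains_" prefix; "none" if empty.
function activeFlags(flags: Flags): string {
  const names = Object.keys(flags)
    .filter((name) => flags[name] === true)
    .map((name) => name.replace(/^contains_/, ""));
  return names.length > 0 ? names.join(", ") : "none";
}

// Assemble the [DRAFT TEXT] / [IMAGE CONTEXT] layout described above.
function buildFusedInputTextSketch(
  draftText: string,
  imageContexts: FusionImageContext[]
): string {
  const lines: string[] = ["[DRAFT TEXT]", draftText.trim()];
  if (imageContexts.length === 0) return lines.join("\n");
  lines.push("", "[IMAGE CONTEXT]");
  for (const ctx of imageContexts) {
    lines.push(`- Image ${ctx.image_id}: ${ctx.one_line_context_summary}`);
    lines.push(`  - type: ${ctx.high_level_type}`);
    lines.push(`  - engineering: ${activeFlags(ctx.engineering_signals)}`);
    lines.push(`  - recruiting: ${activeFlags(ctx.recruiting_signals)}`);
    lines.push(`  - risk: ${activeFlags(ctx.risk_signals)}`);
    lines.push(`  - keywords: ${ctx.keywords.slice(0, 5).join(", ")}`); // top 5 only
  }
  return lines.join("\n");
}
```

Keeping the output as plain text lets the existing ML prediction service consume it without any schema change.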
lib/image-analysis/attribution.ts: Attributes prediction signals to text, image, or both.
Key function:
attributeSignals(draftText, imageContexts, predictions): SignalAttribution
Rules:
- Signal keyword only in image → image_driven_signals
- Signal keyword only in text → text_driven_signals
- Signal keyword in both → mixed_signals
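The three rules above amount to a presence check on each side. Below is a minimal sketch of that idea, not the implementation in attribution.ts. The names `attributeSignalsSketch` and `signalKeywords` are hypothetical. Three behaviours are assumed: keywords are derived by splitting the signal label, matching is by lowercase substring, and signals found on neither side are left unattributed.

```typescript
interface SignalAttribution {
  text_driven_signals: string[];
  image_driven_signals: string[];
  mixed_signals: string[];
}

// Hypothetical keyword derivation: "narrative:product_launch" -> ["product", "launch"].
function signalKeywords(signal: string): string[] {
  const name = signal.includes(":") ? signal.slice(signal.indexOf(":") + 1) : signal;
  return name
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
}

// Classify each signal by where its keywords appear (assumed substring match).
function attributeSignalsSketch(
  draftText: string,
  imageKeywords: string[],
  signals: string[]
): SignalAttribution {
  const text = draftText.toLowerCase();
  const imageText = imageKeywords.join(" ").toLowerCase();
  const result: SignalAttribution = {
    text_driven_signals: [],
    image_driven_signals: [],
    mixed_signals: [],
  };
  for (const signal of signals) {
    const words = signalKeywords(signal);
    const inText = words.some((word) => text.includes(word));
    const inImage = words.some((word) => imageText.includes(word));
    if (inText && inImage) result.mixed_signals.push(signal);
    else if (inImage) result.image_driven_signals.push(signal);
    else if (inText) result.text_driven_signals.push(signal);
    // Found in neither source: left unattributed in this sketch.
  }
  return result;
}
```

Substring matching is deliberately crude; a real implementation would likely need stemming or a curated keyword map per signal.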
lib/image-analysis/index.ts: Main orchestration.
Key function:
analyzeWithImages(request: AnalyzeWithImagesRequest): Promise<AnalyzeWithImagesResponse>
interface ImageContext {
  image_id: string
  high_level_type: 'screenshot' | 'infographic' | 'code' | 'product_ui' |
                   'people_photo' | 'office_photo' | 'event_photo' |
                   'meme' | 'document' | 'chart' | 'other'
  primary_subjects: string[]
  setting: string[]
  brand_markers: string[]
  visual_tone: 'professional' | 'casual' | 'hype' | 'serious' |
               'playful' | 'controversial' | 'unknown'
  engineering_signals: EngineeringSignals
  recruiting_signals: RecruitingSignals
  risk_signals: RiskSignals
  image_text_snippets: string[] // max 3
  keywords: string[] // max 10
  one_line_context_summary: string
}
- Raw images are NEVER logged - only extracted JSON context
- No private attribute inference - no age, ethnicity, etc.
- Conservative risk flags - only set true if clearly present
- Short summaries - factual, neutral, minimal
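One way to honour the first rule above is to convert every image input into a log-safe record before it reaches any logger. This is a sketch under assumptions rather than the module's code: `toLoggableImage` and `LoggableImage` are hypothetical names, the `ImageInput` shape is inferred from the request example, and omitting the URL from logs is an extra conservative choice not stated in the source.

```typescript
// Shape inferred from the request example: each image carries "data" or "url".
interface ImageInput {
  id: string;
  mime: string;
  data?: string; // base64-encoded bytes
  url?: string;
}

// Hypothetical log-safe record: no pixels, no base64, no URL.
interface LoggableImage {
  id: string;
  mime: string;
  source: "inline" | "url";
  approx_bytes?: number;
}

function toLoggableImage(image: ImageInput): LoggableImage {
  if (image.data !== undefined) {
    return {
      id: image.id,
      mime: image.mime,
      source: "inline",
      // base64 uses 4 characters per 3 bytes, so this is an upper-bound estimate.
      approx_bytes: Math.floor((image.data.length * 3) / 4),
    };
  }
  return { id: image.id, mime: image.mime, source: "url" };
}
```

Logging only the record returned by `toLoggableImage`, alongside the extracted JSON context, keeps raw image bytes out of log storage entirely.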
Environment Variables:
GEMINI_API_KEY=your-api-key
NEXT_PUBLIC_ML_API_URL=http://localhost:8000
Vision Model: Gemini 2.0 Flash (fast, cost-effective)
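The two environment variables above could be read through a single loader that fails fast when the API key is missing. This is only a sketch: `loadImageAnalysisConfig` and `ImageAnalysisConfig` are hypothetical names, and defaulting the ML API URL to the localhost value shown above is an assumption.

```typescript
interface ImageAnalysisConfig {
  geminiApiKey: string;
  mlApiUrl: string;
}

// Takes the environment as a parameter so it can be tested without process.env.
function loadImageAnalysisConfig(
  env: Record<string, string | undefined>
): ImageAnalysisConfig {
  const geminiApiKey = env["GEMINI_API_KEY"];
  if (!geminiApiKey) {
    throw new Error("GEMINI_API_KEY is not set");
  }
  // Assumed default: the local development URL from the example above.
  const mlApiUrl = env["NEXT_PUBLIC_ML_API_URL"] ?? "http://localhost:8000";
  // Strip trailing slashes so paths can be appended consistently.
  return { geminiApiKey, mlApiUrl: mlApiUrl.replace(/\/+$/, "") };
}
```

In a Node or Next.js server context this would typically be called as `loadImageAnalysisConfig(process.env)`.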
The VisionAPIAdapter interface allows swapping providers:
interface VisionAPIAdapter {
  extractImageContext(image: ImageInput): Promise<ImageContext>
}
To add OpenAI Vision or Claude:
- Create new adapter file (e.g., openai-adapter.ts)
- Implement the extractImageContext function
- Update import in index.ts
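To illustrate why a single-method interface makes providers interchangeable, here is a small sketch with a stub adapter and a helper that accepts any adapter. Note the assumptions: `stubAdapter`, `extractAll`, and the trimmed `ImageContextSummary` type are hypothetical, and passing the adapter as a parameter is an alternative to the import swap described above, not what index.ts is known to do.

```typescript
interface ImageInput {
  id: string;
  mime: string;
  data?: string;
  url?: string;
}

// Trimmed stand-in for ImageContext, to keep the sketch short.
interface ImageContextSummary {
  image_id: string;
  one_line_context_summary: string;
}

interface VisionAPIAdapter {
  extractImageContext(image: ImageInput): Promise<ImageContextSummary>;
}

// Stub adapter for tests: returns a fixed context without calling any API.
const stubAdapter: VisionAPIAdapter = {
  async extractImageContext(image: ImageInput): Promise<ImageContextSummary> {
    return { image_id: image.id, one_line_context_summary: "stub context" };
  },
};

// Runs extraction for every image with whichever adapter is supplied.
async function extractAll(
  adapter: VisionAPIAdapter,
  images: ImageInput[]
): Promise<ImageContextSummary[]> {
  return Promise.all(images.map((image) => adapter.extractImageContext(image)));
}
```

Any new provider only has to satisfy `VisionAPIAdapter`; the calling code stays unchanged.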
| File | Purpose |
|---|---|
| types/image-analysis.ts | TypeScript interfaces |
| lib/image-analysis/vision-adapter.ts | Gemini Vision extraction |
| lib/image-analysis/fusion.ts | Text + image fusion |
| lib/image-analysis/attribution.ts | Signal attribution |
| lib/image-analysis/index.ts | Main orchestration |
| app/api/analyze-with-images/route.ts | API endpoint |
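A client call to the endpoint listed in the table might look like the sketch below. Several details are assumptions rather than documented behaviour: the route is taken to accept a JSON POST, each image is taken to need exactly one of `data` or `url`, and `analyzeDraft` and `PostJson` are hypothetical names. The HTTP function is injected so the sketch does not depend on DOM or Node typings.

```typescript
interface ImageInput {
  id: string;
  mime: string;
  data?: string;
  url?: string;
}

interface AnalyzeWithImagesRequest {
  draft_text: string;
  images: ImageInput[];
}

// Minimal fetch-like signature (hypothetical) covering only what the sketch uses.
type PostJson = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function analyzeDraft(
  post: PostJson,
  request: AnalyzeWithImagesRequest
): Promise<unknown> {
  for (const image of request.images) {
    // Assumed rule: an image is supplied either inline or by URL, never both.
    if ((image.data === undefined) === (image.url === undefined)) {
      throw new Error(`Image ${image.id} needs exactly one of "data" or "url"`);
    }
  }
  const response = await post("/api/analyze-with-images", {
    method: "POST", // assumed HTTP method
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

In a browser, a wrapper such as `(url, init) => fetch(url, init)` should satisfy `PostJson`; in tests, a fake can return a canned response.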
✅ TypeScript: 0 errors
✅ Production build: successful
✅ API route: /api/analyze-with-images ready