
AI News Pipeline

Mundo edited this page Apr 8, 2026 · 2 revisions

GitHub Trending Pipeline

Overview

Every morning at 8am Sydney time, the GitHub Trending pipeline scrapes trending TypeScript and JavaScript repos, deduplicates against recently sent repos, curates the top picks with an LLM, sends a formatted digest via Telegram, and saves the repos to the database.

Pipeline Flow

Scrape GitHub Trending (TS + JS)
    ↓ all repos
Dedup against DB (last 7 days)
    ↓ new repos only
trendingCuratorAgent (LLM ranks, summarizes & tags)
    ↓ curated repos
trendingTelegramAgent (formatted message)
    ↓
saveTrendingRepos() (PostgreSQL)
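
The flow above can be read as a chain of async stages. The following is a minimal sketch rather than the project's actual code: the repo shapes, the PipelineStages interface, and runTrendingPipeline are hypothetical names, and skipping the later stages when nothing new was scraped is an assumption.

```typescript
// Hypothetical shapes; the real project types may differ.
interface TrendingRepo {
  fullName: string; // e.g. "vercel/ai"
  description: string;
  language: string;
  stars: number;
  starsToday: number;
}

interface CuratedRepo extends TrendingRepo {
  summary: string;
  tags: string[];
}

// Each stage is injected so the flow can run without network, LLM, or DB access.
interface PipelineStages {
  scrape: () => Promise<TrendingRepo[]>;
  dedup: (repos: TrendingRepo[]) => Promise<TrendingRepo[]>;
  curate: (repos: TrendingRepo[]) => Promise<CuratedRepo[]>;
  send: (repos: CuratedRepo[]) => Promise<void>;
  save: (repos: CuratedRepo[]) => Promise<void>;
}

async function runTrendingPipeline(stages: PipelineStages): Promise<CuratedRepo[]> {
  const scraped = await stages.scrape();
  const fresh = await stages.dedup(scraped);
  // Assumption: with no new repos there is nothing to curate, send, or save.
  if (fresh.length === 0) return [];
  const curated = await stages.curate(fresh);
  await stages.send(curated);
  await stages.save(curated);
  return curated;
}
```

Passing the stages in as functions keeps the ordering in one place and makes it easy to substitute stubs when testing.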

Step-by-Step

Step 1: Scrape

The trendingScrapeTool fetches GitHub trending pages for TypeScript and JavaScript:

  • https://github.com/trending/typescript?since=daily
  • https://github.com/trending/javascript?since=daily

It parses the HTML to extract each repo's name, description, language, total stars, and stars gained today. A repo that appears on both lists is kept only once.
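
Two small pieces of this step can be sketched without touching the HTML itself: turning the displayed star counts into numbers, and merging the two language lists. This is an illustrative sketch only; ScrapedRepo, parseCount, and mergeTrendingLists are hypothetical names, and the count parser assumes comma-grouped integers such as "15,234".

```typescript
// Hypothetical shape for one scraped entry.
interface ScrapedRepo {
  fullName: string; // e.g. "vercel/ai"
  description: string;
  language: string;
  stars: number;
  starsToday: number;
}

// Turn display text such as "15,234" or "342 stars today" into a number.
// Assumes plain comma-grouped integers; abbreviated forms like "1.2k" are not handled.
function parseCount(text: string): number {
  const digits = text.replace(/[^0-9]/g, "");
  return digits === "" ? 0 : Number(digits);
}

// Merge the per-language lists, keeping the first occurrence of each repo.
function mergeTrendingLists(...lists: ScrapedRepo[][]): ScrapedRepo[] {
  const seen: Record<string, boolean> = {};
  const merged: ScrapedRepo[] = [];
  for (const list of lists) {
    for (const repo of list) {
      const key = repo.fullName.toLowerCase(); // GitHub treats repo names case-insensitively
      if (!seen[key]) {
        seen[key] = true;
        merged.push(repo);
      }
    }
  }
  return merged;
}
```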

Step 2: Dedup

Before sending to the LLM, repos are checked against the github_trending table. Any repo already saved in the last 7 days is filtered out. This prevents the same trending repo from appearing in multiple digests.

If the DB query fails, all scraped repos are passed through unfiltered so the digest can still go out (graceful degradation), at the cost of possible repeats.
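
A minimal sketch of the 7-day filter with its fallback, assuming a hypothetical fetchNamesSavedSince lookup that returns the repo names stored in github_trending since a given date (the real query and function names are not shown on this page):

```typescript
interface NamedRepo {
  fullName: string;
}

const DEDUP_WINDOW_DAYS = 7;

async function filterNewRepos<T extends NamedRepo>(
  scraped: T[],
  // Hypothetical lookup: names saved to github_trending since the given date.
  fetchNamesSavedSince: (since: Date) => Promise<string[]>,
  now: Date = new Date(),
): Promise<T[]> {
  const since = new Date(now.getTime() - DEDUP_WINDOW_DAYS * 24 * 60 * 60 * 1000);
  try {
    const recent = await fetchNamesSavedSince(since);
    // Keep only repos that were not saved inside the window.
    return scraped.filter((repo) => recent.indexOf(repo.fullName) === -1);
  } catch (err) {
    // Graceful degradation: a failed lookup must not block the digest.
    console.warn("Dedup lookup failed; passing all scraped repos through", err);
    return scraped;
  }
}
```

Taking the lookup as a parameter keeps the window arithmetic and the fallback testable without a database connection.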

Step 3: Curate

The trendingCuratorAgent receives the new repos and uses Claude to:

  • Select the top 5-8 most interesting repos
  • Write a 1-2 sentence summary for each explaining why it's interesting
  • Add 3-5 lowercase tags per repo for classification (e.g. ai, framework, bundler, devtools)
  • Rank by relevance to TS/JS developers
  • Preserve all original fields (stars, language, etc.)
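
Because the curation is done by an LLM, it can help to check the output against the rules above before sending. The sketch below is one possible check, not the agent's actual behavior: InputRepo, CuratedRepo, and validateCuration are hypothetical names, and lowering the minimum pick count when fewer than five new repos exist is an assumption.

```typescript
interface InputRepo {
  fullName: string;
  language: string;
  stars: number;
  starsToday: number;
}

interface CuratedRepo extends InputRepo {
  summary: string;
  tags: string[];
}

// Returns a list of human-readable problems; an empty list means the output passed.
function validateCuration(input: InputRepo[], curated: CuratedRepo[]): string[] {
  const problems: string[] = [];
  // Assumption: if fewer than 5 new repos exist, the curator cannot pick 5.
  const minPicks = Math.min(5, input.length);
  if (curated.length < minPicks || curated.length > 8) {
    problems.push(`expected ${minPicks}-8 repos, got ${curated.length}`);
  }
  for (const repo of curated) {
    const source = input.find((r) => r.fullName === repo.fullName);
    if (!source) {
      problems.push(`${repo.fullName}: not in the scraped input`);
      continue;
    }
    if (
      source.stars !== repo.stars ||
      source.starsToday !== repo.starsToday ||
      source.language !== repo.language
    ) {
      problems.push(`${repo.fullName}: original fields were altered`);
    }
    if (repo.tags.length < 3 || repo.tags.length > 5) {
      problems.push(`${repo.fullName}: expected 3-5 tags`);
    }
    if (repo.tags.some((tag) => tag !== tag.toLowerCase())) {
      problems.push(`${repo.fullName}: tags must be lowercase`);
    }
  }
  return problems;
}
```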

Step 4: Send via Telegram

The trendingTelegramAgent formats the curated repos into a Telegram message with:

  • Numbered emoji headers (1️⃣, 2️⃣, 3️⃣, ...)
  • Bold repo names
  • Star count with today's gains and language
  • Italic summaries
  • Hashtag tags
  • Clickable "View on GitHub" links

Example output:

🔥 GitHub Trending — Daily Digest
📅 2026-04-08  ·  TypeScript / JavaScript

━━━━━━━━━━━━━━━━━━━━━━

1️⃣ vercel/ai
⭐ 15,234 (+342 today) · TypeScript
Full-stack AI SDK with streaming, tool calls, and structured outputs
🏷 #ai #framework #typescript #vercel
🔗 View on GitHub

2️⃣ oxc-project/oxc
⭐ 8,901 (+189 today) · Rust
Blazing fast JS/TS linter and parser — drop-in ESLint replacement
🏷 #tooling #linter #performance
🔗 View on GitHub

━━━━━━━━━━━━━━━━━━━━━━
📊 2 repos  ·  Powered by GitHub Trending
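
The layout above maps naturally onto Telegram's HTML parse mode, which supports bold, italic, and link tags. The sketch below formats a single entry under that assumption; this page does not say which parse mode the agent uses, DigestRepo and formatRepoEntry are hypothetical names, and the header and footer lines are omitted.

```typescript
interface DigestRepo {
  fullName: string; // e.g. "vercel/ai"
  language: string;
  stars: number;
  starsToday: number;
  summary: string;
  tags: string[];
}

// Telegram's HTML parse mode requires &, < and > to be escaped in text.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// position is 1-based; the keycap emoji form only exists for single digits.
function formatRepoEntry(repo: DigestRepo, position: number): string {
  const keycap = `${position}\uFE0F\u20E3`;
  const stars = repo.stars.toLocaleString("en-US");
  const gained = repo.starsToday.toLocaleString("en-US");
  const tags = repo.tags.map((tag) => `#${tag}`).join(" ");
  return [
    `${keycap} <b>${escapeHtml(repo.fullName)}</b>`,
    `⭐ ${stars} (+${gained} today) · ${escapeHtml(repo.language)}`,
    `<i>${escapeHtml(repo.summary)}</i>`,
    `🏷 ${escapeHtml(tags)}`,
    `🔗 <a href="https://github.com/${escapeHtml(repo.fullName)}">View on GitHub</a>`,
  ].join("\n");
}
```

The entries would then be joined with blank lines and sent through the Bot API's sendMessage method with parse_mode set to HTML.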

Step 5: Save to Database

Each curated repo is saved to the github_trending table with repo name, URL, description, language, stars, today's stars, summary, tags, sent status, and timestamps.
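
As a rough illustration of how a curated repo could map onto a row, the sketch below builds a parameterized INSERT. Only the github_trending table name comes from this page; the column names, the single created_at timestamp, the TrendingRow shape, and both helper functions are assumptions and may not match the real schema or saveTrendingRepos().

```typescript
interface SavableRepo {
  fullName: string;
  description: string;
  language: string;
  stars: number;
  starsToday: number;
  summary: string;
  tags: string[];
}

// Hypothetical column names for github_trending.
interface TrendingRow {
  repo_name: string;
  url: string;
  description: string;
  language: string;
  stars: number;
  stars_today: number;
  summary: string;
  tags: string[];
  sent: boolean;
  created_at: Date;
}

function toTrendingRow(repo: SavableRepo, sent: boolean, now: Date): TrendingRow {
  return {
    repo_name: repo.fullName,
    url: `https://github.com/${repo.fullName}`,
    description: repo.description,
    language: repo.language,
    stars: repo.stars,
    stars_today: repo.starsToday,
    summary: repo.summary,
    tags: repo.tags,
    sent,
    created_at: now,
  };
}

// Build query text with $1..$n placeholders so values are never interpolated into SQL.
function buildInsert(row: TrendingRow): { text: string; values: unknown[] } {
  const columns = Object.keys(row) as (keyof TrendingRow)[];
  const placeholders = columns.map((_, i) => `$${i + 1}`).join(", ");
  return {
    text: `INSERT INTO github_trending (${columns.join(", ")}) VALUES (${placeholders})`,
    values: columns.map((column) => row[column]),
  };
}
```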

Configuration

Environment Variable    Description
TELEGRAM_BOT_TOKEN      From @BotFather on Telegram
TELEGRAM_CHAT_ID        From @userinfobot on Telegram
DATABASE_URL            PostgreSQL (Neon) connection string
ANTHROPIC_API_KEY       For LLM curation step

No external search API key is needed because the data comes directly from GitHub's public trending pages.
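
Checking these variables once at startup gives a clearer failure than an error halfway through a run. A minimal sketch, assuming all four variables are required; loadPipelineEnv and PipelineEnv are hypothetical names:

```typescript
const REQUIRED_ENV_VARS = [
  "TELEGRAM_BOT_TOKEN",
  "TELEGRAM_CHAT_ID",
  "DATABASE_URL",
  "ANTHROPIC_API_KEY",
] as const;

type PipelineEnv = Record<(typeof REQUIRED_ENV_VARS)[number], string>;

// Reads the required variables from a plain key/value source and fails fast,
// listing every missing name in a single error.
function loadPipelineEnv(source: Record<string, string | undefined>): PipelineEnv {
  const missing = REQUIRED_ENV_VARS.filter((key) => !source[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as PipelineEnv;
  for (const key of REQUIRED_ENV_VARS) {
    env[key] = source[key] as string;
  }
  return env;
}
```

In a Node.js entry point this would typically be called as loadPipelineEnv(process.env) before the first pipeline step.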

Running Manually

pnpm news

Future: Vector Search

The tags column enables future classification and filtering. When ready for semantic search:

  1. Add embeddings via pgvector on Neon
  2. Use tags + embeddings for hybrid search (keyword + semantic)
  3. Build a personal knowledge base of trending repos over time
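
To make the hybrid idea concrete, the sketch below blends a tag-overlap score with cosine similarity in memory. It is illustrative only: with pgvector the similarity would normally be computed in SQL, and IndexedRepo, hybridScore, and the default 50/50 weighting are hypothetical choices.

```typescript
interface IndexedRepo {
  fullName: string;
  tags: string[];
  embedding: number[];
}

// Cosine similarity between two vectors; returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = a.reduce((sum, x) => sum + x * x, 0);
  const normB = b.reduce((sum, x) => sum + x * x, 0);
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Blend keyword and semantic relevance; keywordWeight is between 0 and 1.
function hybridScore(
  repo: IndexedRepo,
  queryTags: string[],
  queryEmbedding: number[],
  keywordWeight = 0.5,
): number {
  const matched = queryTags.filter((tag) => repo.tags.indexOf(tag) !== -1).length;
  const keyword = queryTags.length === 0 ? 0 : matched / queryTags.length;
  const semantic = cosineSimilarity(repo.embedding, queryEmbedding);
  return keywordWeight * keyword + (1 - keywordWeight) * semantic;
}
```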
