Create rss collection with feed syncing #368

Open
wants to merge 12 commits into
base: main
from

Conversation

samwillis
Collaborator

@samwillis samwillis commented Aug 3, 2025

This PR adds RSS and Atom feed collections to TanStack DB, so external feed data can be synced into a collection with automatic polling, deduplication, and full TypeScript support.

Note that the failing "test" in CI is just the preview package, since this package doesn't exist on npm yet.

✨ Key Features

  • 📡 RSS 2.0 & Atom 1.0 Support: Dedicated collection options for both feed formats with automatic type detection
  • 🔄 Intelligent Polling: Configurable polling intervals with automatic error recovery and manual refresh capabilities
  • ✨ Built-in Deduplication: Automatic deduplication based on feed item IDs/GUIDs with configurable memory limits
  • 🔧 Custom Transform Functions: Flexible transformation of feed data to match your application's schema
  • 📝 Full TypeScript Support: Complete type safety with schema inference and generic type support
  • Mutation Handlers: Support for onInsert, onUpdate, and onDelete callbacks
  • ⚡ TanStack DB Integration: Seamless integration with TanStack DB's optimistic update system
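
To illustrate the deduplication bullet above, here is a minimal sketch of ID-based deduplication with a memory limit. `SeenIds` and `filterNewItems` are hypothetical names chosen for this example, not the package's API, and the PR's actual cache may work differently.

```typescript
// Hypothetical helper (not part of @tanstack/rss-db-collection):
// remembers at most `maxSize` item IDs. A Set iterates in insertion
// order, so its first entry is always the oldest remembered ID.
class SeenIds {
  private readonly ids = new Set<string>()
  private readonly maxSize: number

  constructor(maxSize: number) {
    this.maxSize = maxSize
  }

  /** Returns true the first time an ID is seen, false for duplicates. */
  markSeen(id: string): boolean {
    if (this.ids.has(id)) return false
    this.ids.add(id)
    if (this.ids.size > this.maxSize) {
      // Evict the oldest ID so memory stays bounded.
      const oldest = this.ids.values().next().value
      if (oldest !== undefined) this.ids.delete(oldest)
    }
    return true
  }

  get size(): number {
    return this.ids.size
  }
}

// Keep only the items whose ID has not been seen in an earlier poll.
function filterNewItems<T>(
  items: T[],
  getId: (item: T) => string,
  seen: SeenIds,
): T[] {
  return items.filter((item) => seen.markSeen(getId(item)))
}
```

The trade-off of a bounded cache is that an ID evicted from it will be treated as new if the feed serves that item again, so the limit should comfortably exceed the number of items a feed returns per fetch.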

🚀 Quick Start

import { createCollection } from "@tanstack/db"
import { rssCollectionOptions } from "@tanstack/rss-db-collection"

// Assumed shape of the transformed items, inferred from `transform` below.
interface BlogPost {
  id: string
  title: string
  description: string
  link: string
  publishedAt: Date
  author?: string
}

const blogFeed = createCollection({
  ...rssCollectionOptions<BlogPost>({
    feedUrl: "https://blog.example.com/rss.xml",
    pollingInterval: 5 * 60 * 1000, // 5 minutes
    getKey: (item) => item.id,
    transform: (item) => ({
      id: item.guid || item.link || "",
      title: item.title || "",
      description: item.description || "",
      link: item.link || "",
      publishedAt: new Date(item.pubDate || Date.now()),
      author: item.author,
    }),
  }),
})
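
The `publishedAt` line in the Quick Start passes the raw `pubDate` string to `new Date()`, whose handling of non-ISO strings is left to the JavaScript engine. Below is a minimal sketch of a stricter alternative, assuming RSS dates in RFC 822/2822 form and Atom dates in RFC 3339 form. `parseFeedDate` is a hypothetical helper written for this example and is not part of the package.

```typescript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]

// Named zones defined by RFC 822, as offsets from UTC in minutes.
const ZONES: Record<string, number> = {
  UT: 0, GMT: 0, Z: 0,
  EST: -300, EDT: -240, CST: -360, CDT: -300,
  MST: -420, MDT: -360, PST: -480, PDT: -420,
}

// [day-name ","] day month year hour ":" minute [":" second] zone
const RFC822_DATE =
  /^(?:[A-Za-z]{3},\s*)?(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4}|\d{2})\s+(\d{2}):(\d{2})(?::(\d{2}))?\s+([+-]\d{4}|[A-Za-z]{1,5})$/

// RFC 3339 timestamps (used by Atom) in the common "T"-separated form.
const RFC3339_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$/

// Hypothetical helper: returns null instead of an Invalid Date.
function parseFeedDate(input: string | undefined): Date | null {
  const text = (input ?? "").trim()
  if (RFC3339_DATE.test(text)) {
    const ms = Date.parse(text)
    return Number.isNaN(ms) ? null : new Date(ms)
  }
  const m = RFC822_DATE.exec(text)
  if (!m) return null

  const day = Number(m[1])
  const month = MONTHS.indexOf((m[2] ?? "").toLowerCase())
  const yearText = m[3] ?? ""
  let year = Number(yearText)
  // Two-digit years: RFC 2822 maps 00-49 to 20xx and 50-99 to 19xx.
  if (yearText.length === 2) year += year < 50 ? 2000 : 1900
  const hour = Number(m[4])
  const minute = Number(m[5])
  const second = m[6] ? Number(m[6]) : 0
  const zone = m[7] ?? ""

  let offsetMinutes: number
  if (zone.startsWith("+") || zone.startsWith("-")) {
    const sign = zone.startsWith("-") ? -1 : 1
    offsetMinutes = sign * (Number(zone.slice(1, 3)) * 60 + Number(zone.slice(3, 5)))
  } else {
    const named = ZONES[zone.toUpperCase()]
    if (named === undefined) return null // unknown zone name: refuse to guess
    offsetMinutes = named
  }

  if (month < 0 || hour > 23 || minute > 59 || second > 59) return null
  const wallClock = Date.UTC(year, month, day, hour, minute, second)
  // Reject overflowing days such as "31 Feb", which Date.UTC would roll over.
  const check = new Date(wallClock)
  if (check.getUTCMonth() !== month || check.getUTCDate() !== day) return null
  return new Date(wallClock - offsetMinutes * 60_000)
}
```

In a `transform`, this could be used as `publishedAt: parseFeedDate(item.pubDate) ?? new Date()`, which keeps the Quick Start's fallback to the current time while making unparseable dates explicit.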

🔧 Configuration Options

  • Polling Control: Configurable intervals, manual refresh, and automatic start options
  • HTTP Configuration: Custom timeouts, user agents, and headers
  • Memory Management: Configurable deduplication cache limits with automatic cleanup
  • Error Handling: Comprehensive error types and graceful failure recovery
  • Schema Integration: Full support for Standard Schema validation

🧪 Testing

  • 41 comprehensive tests covering RSS and Atom functionality
  • Error scenarios including network failures, parsing errors, and format mismatches
  • Multiple fetch scenarios testing progressive additions and state management
  • Edge cases including malformed feeds and timeout handling

📦 Package Structure

@tanstack/rss-db-collection/
├── src/
│   ├── rss.ts          # Core RSS/Atom collection logic
│   ├── errors.ts       # Comprehensive error types
│   └── index.ts        # Public API exports
├── tests/
│   ├── rss.test.ts     # RSS/Atom functionality tests
│   ├── errors.test.ts  # Error handling tests
│   └── mutations.test.ts # Mutation handler tests
└── README.md           # Comprehensive documentation

Dependencies

  • @tanstack/db - Core collection functionality
  • fast-xml-parser - XML parsing for RSS/Atom feeds
  • debug - Debug logging support
  • @standard-schema/spec - Schema validation support

changeset-bot bot commented Aug 3, 2025

🦋 Changeset detected

Latest commit: 04d2af6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@tanstack/rss-db-collection Patch

@samwillis samwillis marked this pull request as ready for review August 4, 2025 10:43
@samwillis samwillis requested a review from KyleAMathews August 4, 2025 10:44
@KyleAMathews
Collaborator

A few things from o3:

  • rowUpdateMode defaults to "partial". News/blog items are immutable; using "full" avoids needless diffing and surprises.
  • Update vs insert deduplication. A feed item whose GUID is stable but whose content later changes will currently be skipped (the GUID is already in the seen set). Treat that as an update (or expose a hook so the user can decide).
  • Time zone / date parsing. new Date(item.pubDate) falls back to implementation-dependent parsing for RSS date strings; use a strict RFC 2822/3339 parser to avoid invalid (NaN) dates.
  • db‑ivm export changes. Re‑exporting MultiSet, RootStreamBuilder etc. from db‑ivm/src/index.ts is unrelated and should be dropped to keep that package’s public surface stable.
  • All other option creators are named with a CollectionOptions suffix. Keeping that pattern (rssCollectionOptions) is correct, but the docs introduce the term “RSSCollection”, which could confuse users. Rename the doc heading to “RSS Collection Options” to stay consistent.
  • Default polling interval. Five minutes is reasonable for blogs but too aggressive for most podcasts; perhaps derive a smarter default based on the feed’s /sy:updatePeriod if present.
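
The update-versus-insert point above could be handled roughly as follows. This is a minimal sketch, not the PR's code: it assumes the seen set is replaced by a map from GUID to a content fingerprint, and `FeedItemLike`, `fingerprint`, and `classify` are hypothetical names.

```typescript
type Change = "insert" | "update" | "skip"

// Hypothetical minimal item shape used only for this sketch.
interface FeedItemLike {
  guid: string
  title?: string
  description?: string
  link?: string
}

// Serialize the fields that count as "content". The string itself serves
// as the fingerprint; a hash of it could be stored instead to save memory.
function fingerprint(item: FeedItemLike): string {
  return JSON.stringify([item.title ?? "", item.description ?? "", item.link ?? ""])
}

// Decide what to do with an item, given GUID -> fingerprint from earlier polls.
function classify(item: FeedItemLike, seen: Map<string, string>): Change {
  const next = fingerprint(item)
  const previous = seen.get(item.guid)
  seen.set(item.guid, next)
  if (previous === undefined) return "insert" // GUID never seen before
  return previous === next ? "skip" : "update" // same GUID, changed content
}
```

A user-facing hook, as the comment suggests, could receive the `"update"` result and decide whether to apply it or ignore it.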

@samwillis samwillis force-pushed the cursor/create-rss-collection-with-feed-syncing-b402 branch from fd63e59 to 40e4023 Compare August 5, 2025 16:38
@samwillis samwillis force-pushed the cursor/create-rss-collection-with-feed-syncing-b402 branch from 40e4023 to 04d2af6 Compare August 5, 2025 16:41
@@ -5,8 +5,9 @@ RSS/Atom feed collection for TanStack DB - sync data from RSS and Atom feeds wit
## Features

- **📡 RSS & Atom Support**: Dedicated option creators for RSS 2.0 and Atom 1.0 feeds
- **🔄 Automatic Polling**: Configurable polling intervals with intelligent error recovery and manual refresh capability
- **✨ Deduplication**: Built-in deduplication based on feed item IDs/GUIDs
Collaborator

Can we make this README accessible from the website? If that's awkward (links breaking and so on), let's move these docs in with the rest of the docs, and the README can just link to Tanstack.com.

3 participants