overlogic/media-search-demo

Media Search Demo

A media search application built with Next.js and Orama full-text search. It lets users search media items by keyword, credit, and date.

Tech Stack

  • Next.js 16 (App Router, React Compiler enabled)
  • React 19
  • Orama (full-text search)
  • Tailwind CSS
  • TypeScript

Getting Started

Prerequisites

  • Bun (used as package manager and runtime by the commands below)

Install dependencies

bun install

Run the development server

bun run dev

To run with initial data:

INITIAL_ITEMS=./data/media-items.json bun run dev

Build for production

bun run build
bun run start

Solution Overview

High-Level Approach

The solution is a lightweight, in-memory full-text search application built on Next.js 16 and React 19, with Orama as the search engine. The focus is on a polished, functional search experience with a clean architecture rather than on engineering for hypothetical scale.

The solution prioritizes:

  • Correctness over completeness: ship well-implemented features, defer ambiguous requirements
  • Modern idioms: Server Actions, React Compiler, URL-driven state
  • Developer experience: type-safe end-to-end with modern tooling, including Bun as the runtime and Orama as a well-designed in-memory search solution

Assumptions

  • Dataset size: The provided dataset sample is representative of the metadata shape. The solution is designed to handle 10K+ items comfortably, and Orama can scale to hundreds of thousands in-memory.
  • Restrictions field: The exact business requirements for restriction extraction are ambiguous — what constitutes a restriction, how they should be grouped, and what the user expects to filter on is unclear. This was deliberately deferred pending clarification (see "Next Steps").
  • Deployment: Single-process Node.js/Bun deployment. No external database or search service required.
  • Data format: Raw items always arrive in the documented shape (suchtext, bildnummer, fotografen, datum, hoehe, breite) with German-format dates (DD.MM.YYYY).
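
The documented raw shape could be captured as a type with a small runtime guard. This is a minimal, dependency-free sketch: the project itself validates with Zod, and typing every raw field as a string (including hoehe and breite) is an assumption inferred from the string-to-number coercion described under Preprocessing Strategy. The names RawMediaItem and isRawMediaItem are hypothetical.

```typescript
// Field names come from the documented raw format; the all-string typing is an assumption.
interface RawMediaItem {
  suchtext: string;
  bildnummer: string;
  fotografen: string;
  datum: string; // German format, DD.MM.YYYY
  hoehe: string;
  breite: string;
}

const RAW_FIELDS = ["suchtext", "bildnummer", "fotografen", "datum", "hoehe", "breite"] as const;
const GERMAN_DATE = /^\d{2}\.\d{2}\.\d{4}$/;

// Structural check only: every field is a string and datum looks like DD.MM.YYYY.
// It does not verify that the date exists in the calendar.
function isRawMediaItem(value: unknown): value is RawMediaItem {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const datum = record["datum"];
  return (
    RAW_FIELDS.every((field) => typeof record[field] === "string") &&
    typeof datum === "string" &&
    GERMAN_DATE.test(datum)
  );
}
```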

Design Decisions

Server Actions over REST API

The challenge suggests a GET /api/search endpoint. This implementation uses Next.js Server Actions instead — a deliberate choice:

  • Type safety: Server Actions provide end-to-end TypeScript types between client and server with zero boilerplate
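  • Idiomatic: Server Actions are a first-class App Router feature. The Next.js documentation positions them primarily for mutations, so using them for reads is a pragmatic choice here rather than the framework default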
  • Colocation: Search logic lives next to the components that consume it, reducing indirection
  • Equivalent capability: Server Actions are RPC over HTTP POST. For this app's own UI they cover what a REST endpoint would, with better DX; the trade-off is that there is no stable, cacheable GET URL for external consumers

Orama as the Search Engine

Orama is a full-text search engine that runs entirely in-process. This eliminates external dependencies while providing:

  • BM25-based relevance scoring
  • Built-in tokenization and stemming
  • Faceted filtering (enum[] for credits, numeric ranges for dates)
  • Sorting and pagination
  • Upsert support for live data updates
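
To make the feature list concrete, a schema for this data might look like the sketch below. The property names id and description come from the field table in this README; credits and timestamp are hypothetical names, and the project's actual schema may contain more fields.

```typescript
// Sketch of an Orama-style schema object (property name -> Orama type string).
// "credits" and "timestamp" are assumed names, not taken from the project.
const mediaSchema = {
  id: "string",          // bildnummer
  description: "string", // suchtext, the main full-text field
  credits: "enum[]",     // split fotografen values, used for faceted filtering
  timestamp: "number",   // precomputed from datum, used for range filters and sorting
} as const;
```

An object of this shape is what would be passed as the schema option when creating the Orama database.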

URL-Driven State

All search state (query, filters, sort, pagination) is stored in URL search params. This provides:

  • Shareable/bookmarkable search URLs
  • Browser back/forward navigation
  • No client-side state synchronization issues
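
A minimal sketch of how search state could round-trip through URL search params. The parameter names (q, credit, from, to, sort, page) and the SearchState shape are illustrative assumptions, not taken from the project.

```typescript
// Hypothetical search state mirrored in the URL.
interface SearchState {
  query: string;
  credits: string[];
  from: string | null; // ISO date, YYYY-MM-DD
  to: string | null;   // ISO date, YYYY-MM-DD
  sort: "relevance" | "date";
  page: number;
}

// Read state from URL params, falling back to defaults for missing or invalid values.
function parseSearchState(params: URLSearchParams): SearchState {
  const page = Number.parseInt(params.get("page") ?? "1", 10);
  return {
    query: params.get("q") ?? "",
    credits: params.getAll("credit"),
    from: params.get("from"),
    to: params.get("to"),
    sort: params.get("sort") === "date" ? "date" : "relevance",
    page: Number.isInteger(page) && page > 0 ? page : 1,
  };
}

// Write state back to URL params, omitting defaults to keep URLs short.
function serializeSearchState(state: SearchState): URLSearchParams {
  const params = new URLSearchParams();
  if (state.query) params.set("q", state.query);
  for (const credit of state.credits) params.append("credit", credit);
  if (state.from) params.set("from", state.from);
  if (state.to) params.set("to", state.to);
  if (state.sort !== "relevance") params.set("sort", state.sort);
  if (state.page > 1) params.set("page", String(state.page));
  return params;
}
```

Because the URL is the single source of truth, components only need to read the parsed state and push a new URL when the user changes something.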

Search & Relevance Strategy

Fields searched:

| Field | Orama Property | Boost | Purpose |
| --- | --- | --- | --- |
| suchtext | description | 2.0x | Primary search target; contains most metadata |
| bildnummer | id | 0.5x | Allows direct lookup by image number |

Relevance scoring is handled by Orama's built-in BM25 algorithm with the boost weights above. The description field carries four times the weight of id (2.0 vs. 0.5), reflecting that most user queries target descriptive content rather than numeric IDs.

Minimum query length is 3 characters to avoid overly broad matches and reduce noise.
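
The query rules above can be sketched as a small helper that builds the term-related search options. The term, properties, and boost keys follow the options Orama's search accepts; the helper name, trimming before the length check, and returning null for short queries are assumptions about behavior the README does not spell out.

```typescript
const MIN_QUERY_LENGTH = 3;

interface TermParams {
  term: string;
  properties: string[];
  boost: Record<string, number>;
}

// Returns term-related search options, or null when the trimmed query
// is shorter than the minimum length and no search should run.
function buildTermParams(rawQuery: string): TermParams | null {
  const term = rawQuery.trim();
  if (term.length < MIN_QUERY_LENGTH) return null;
  return {
    term,
    properties: ["description", "id"],
    // Boost weights from the field table: description 2.0, id 0.5.
    boost: { description: 2.0, id: 0.5 },
  };
}
```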

Known limitation: fotografen (credits) is stored as enum[] for efficient filtering but is not included in full-text search. This means searching for a photographer name only works via the credit filter, not the search box. Adding it as a searchable text field would be a trivial change.

Filtering

  • Credits: Multi-select filter using containsAny — matches items with any of the selected credits
  • Date range: Numeric between filter on precomputed timestamps — efficient range queries without string parsing at query time
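
The two filters could be combined into a single where clause along these lines. The containsAny and between operators are the ones named above; the property names credits and timestamp, the helper name, and the fallback to gte/lte when only one date bound is set are assumptions.

```typescript
interface FilterInput {
  credits: string[];
  fromMs: number | null; // inclusive lower bound as a timestamp, or null if unset
  toMs: number | null;   // inclusive upper bound as a timestamp, or null if unset
}

interface WhereClause {
  credits?: { containsAny: string[] };
  timestamp?: { between: [number, number] } | { gte: number } | { lte: number };
}

// Build a filter object; unset filters are simply left out of the clause.
function buildWhere(filters: FilterInput): WhereClause {
  const where: WhereClause = {};
  if (filters.credits.length > 0) {
    where.credits = { containsAny: filters.credits };
  }
  const { fromMs, toMs } = filters;
  if (fromMs !== null && toMs !== null) {
    where.timestamp = { between: [fromMs, toMs] };
  } else if (fromMs !== null) {
    where.timestamp = { gte: fromMs };
  } else if (toMs !== null) {
    where.timestamp = { lte: toMs };
  }
  return where;
}
```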

Architecture Overview

┌─────────────────────────────────────────────────┐
│                   Client (React 19)             │
│                                                 │
│  SearchPage ──► URL SearchParams (state)        │
│       │                                         │
│  ┌────┴─────┬──────────┬──────────┬──────────┐  │
│  │SearchInput│CreditFilter│DateRange│SortToggle│ │
│  └──────────┴──────────┴──────────┴──────────┘  │
│       │                                         │
│  ResultsList + Pagination                       │
└────────────────┬────────────────────────────────┘
                 │ Server Action (POST)
┌────────────────▼────────────────────────────────┐
│              Server (Next.js 16)                │
│                                                 │
│  actions.ts ──► orama.ts ──► Orama DB (memory)  │
│                    │                            │
│              analytics.ts (tracking)            │
│                                                 │
│  instrumentation.ts (startup data load)         │
└─────────────────────────────────────────────────┘

Data flow:

  1. Startup: instrumentation.ts reads media-items.json, validates with Zod, converts via convert.ts, upserts into Orama
  2. Search: Client updates URL params → useEffect triggers Server Action → Orama search → results rendered
  3. Ingestion: /insert page accepts raw JSON → validates → upserts into Orama (non-blocking)

Preprocessing Strategy

Preprocessing happens at ingestion time (both startup and runtime insertion):

| Transformation | What | Why |
| --- | --- | --- |
| Date normalization | DD.MM.YYYY → YYYY-MM-DD ISO + Unix timestamp | Enables date sorting and range filtering without runtime parsing |
| Credit splitting | "A / B" → ["A", "B"] | Enables per-credit filtering via enum[] |
| Type coercion | String numbers → number | Enables numeric comparisons for dimensions and dates |
| Zod validation | Schema check on all incoming data | Rejects malformed items at the boundary |
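
The transformations in the table above could be implemented roughly as follows. This is a sketch, not the project's convert.ts: the MediaDocument shape, the helper names, splitting credits on "/", and storing the timestamp as milliseconds at UTC midnight are all assumptions.

```typescript
// Raw shape as documented; treating every field as a string is an assumption.
interface RawMediaItem {
  suchtext: string; bildnummer: string; fotografen: string;
  datum: string; hoehe: string; breite: string;
}

// Hypothetical indexed shape; only "id" and "description" are named in this README.
interface MediaDocument {
  id: string; description: string; credits: string[];
  date: string;      // ISO, YYYY-MM-DD
  timestamp: number; // milliseconds since the Unix epoch, UTC midnight
  height: number; width: number;
}

// DD.MM.YYYY -> ISO date plus UTC timestamp; throws on malformed or impossible dates.
function parseGermanDate(datum: string): { date: string; timestamp: number } {
  const match = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(datum);
  if (match === null) throw new Error(`Invalid date: ${datum}`);
  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);
  const timestamp = Date.UTC(year, month - 1, day);
  const parsed = new Date(timestamp);
  // Date.UTC silently rolls over values such as 31.02., so verify the round trip.
  if (parsed.getUTCFullYear() !== year || parsed.getUTCMonth() !== month - 1 || parsed.getUTCDate() !== day) {
    throw new Error(`Invalid date: ${datum}`);
  }
  return { date: parsed.toISOString().slice(0, 10), timestamp };
}

// "A / B" -> ["A", "B"]; empty segments are dropped.
function splitCredits(fotografen: string): string[] {
  return fotografen.split("/").map((part) => part.trim()).filter((part) => part.length > 0);
}

// Numeric string -> number; throws on empty or non-numeric input.
function toNumber(value: string, field: string): number {
  const parsed = Number(value);
  if (value.trim() === "" || !Number.isFinite(parsed)) throw new Error(`Invalid ${field}: ${value}`);
  return parsed;
}

function convertItem(raw: RawMediaItem): MediaDocument {
  return {
    id: raw.bildnummer,
    description: raw.suchtext,
    credits: splitCredits(raw.fotografen),
    ...parseGermanDate(raw.datum),
    height: toNumber(raw.hoehe, "hoehe"),
    width: toNumber(raw.breite, "breite"),
  };
}
```

Doing this once at ingestion means queries only compare numbers and enum values, never parse strings.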

Delegated to Orama:

  • Tokenization of description field
  • Stop-word handling
  • BM25 index construction and maintenance
  • Stemming

This split is intentional: explicit preprocessing handles domain-specific transformations (e.g., date formats, credit delimiters), while Orama handles general-purpose text indexing (tokenization, stemming, stop words, scoring).

Updating the index: New items are upserted via insertItems(), which runs the same preprocessing pipeline. Orama's upsert is non-blocking and updates the index incrementally — no full rebuild required.
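
The upsert semantics can be illustrated with a plain Map standing in for the index. This is not Orama's implementation, only a sketch of the behavior described above: an item whose id already exists replaces the old entry, and everything else is added without touching the rest. The names index and upsertAll are hypothetical.

```typescript
interface MediaDocument {
  id: string;
  description: string;
}

// Stand-in for the search index: a Map keyed by document id.
const index = new Map<string, MediaDocument>();

// Insert new documents and replace existing ones that share an id.
// Returns how many documents were added versus replaced.
function upsertAll(docs: MediaDocument[]): { inserted: number; updated: number } {
  let inserted = 0;
  let updated = 0;
  for (const doc of docs) {
    if (index.has(doc.id)) {
      updated += 1;
    } else {
      inserted += 1;
    }
    index.set(doc.id, doc);
  }
  return { inserted, updated };
}
```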

Scaling Approach

Current: In-Memory (10K–100K+ items)

Orama runs entirely in-process with no external dependencies. For the challenge's 10K requirement, this is more than sufficient. Orama can handle hundreds of thousands of documents in memory with sub-millisecond query times.

Continuous ingestion is supported via the upsert function — new items are indexed incrementally without blocking ongoing searches. The /insert endpoint demonstrates this mechanism. At one item per minute, this adds negligible overhead.

Production Scale: Millions of Items

For millions of items in a production environment, the architecture would evolve:

  1. Search engine: Replace Orama with a dedicated search service (Elasticsearch, Meilisearch, Typesense, etc.) that provides horizontal scaling, persistence, and replication
  2. Persistent store: PostgreSQL or similar as the source of truth, with the search engine as a derived index
  3. Ingestion pipeline: Message queue (e.g., SQS, Kafka) between data producers and the search index — decouples ingestion rate from indexing throughput

The migration path is clean because the search interface (searchMedia, insertItems) is already abstracted — swapping Orama for an external engine requires changing only orama.ts.
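
One way to picture that abstraction is an engine-agnostic contract with interchangeable backends. The function names searchMedia and insertItems appear in this README, but the signatures, the MediaSearchBackend interface, and the naive substring-matching backend below are assumptions for illustration only.

```typescript
interface MediaDocument {
  id: string;
  description: string;
}

interface SearchResult {
  hits: MediaDocument[];
  total: number;
}

// Hypothetical contract that callers depend on instead of a concrete engine.
interface MediaSearchBackend {
  searchMedia(term: string, limit: number, offset: number): Promise<SearchResult>;
  insertItems(items: MediaDocument[]): Promise<void>;
}

// Naive in-memory backend (case-insensitive substring match), standing in
// for an Orama-backed or external-engine implementation of the same contract.
class InMemoryBackend implements MediaSearchBackend {
  private readonly docs = new Map<string, MediaDocument>();

  async insertItems(items: MediaDocument[]): Promise<void> {
    for (const item of items) this.docs.set(item.id, item);
  }

  async searchMedia(term: string, limit: number, offset: number): Promise<SearchResult> {
    const needle = term.toLowerCase();
    const matches = Array.from(this.docs.values()).filter((doc) =>
      doc.description.toLowerCase().includes(needle),
    );
    return { hits: matches.slice(offset, offset + limit), total: matches.length };
  }
}
```

Swapping engines then means providing another class that implements MediaSearchBackend; code that calls searchMedia and insertItems stays unchanged.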

Next Steps

  • Restrictions extraction: After clarifying requirements, implement regex-based extraction of restriction tokens from suchtext (e.g., /[A-Z]+x[A-Z]+(?:x[A-Z]+)*/g), normalize them, store as a filterable field, and add filter to the UI
  • Credits text search: Add fotografen as a searchable string field alongside the existing enum[] filter field
  • Persistence: Switch to external search engine for production
  • Error boundaries: Add React error boundaries for graceful degradation
  • Accessibility audit: Expand keyboard navigation and screen reader support
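
As a starting point for the restrictions item above, the quoted pattern could be applied like this. The function name and the de-duplication step are assumptions, and the sample token in the usage note is invented for illustration; real restriction formats still need to be clarified.

```typescript
// Pattern quoted in the Next Steps list: uppercase words joined by a lowercase "x".
const RESTRICTION_PATTERN = /[A-Z]+x[A-Z]+(?:x[A-Z]+)*/g;

// Collect all restriction-like tokens from a description, without duplicates,
// in order of first appearance.
function extractRestrictions(suchtext: string): string[] {
  const matches = suchtext.match(RESTRICTION_PATTERN) ?? [];
  return Array.from(new Set(matches));
}
```

For example, extractRestrictions("Fussball PUBLICATIONxINxGERxONLY Berlin") returns ["PUBLICATIONxINxGERxONLY"], and a description without such tokens returns an empty array.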
