Extract Lucene query tests into engine-agnostic JSON test suite by Copilot · Pull Request #2 · infinilabs/lucene

Copilot · 2026-05-15T04:38:00Z

Extracts core Lucene search query tests into a portable, implementation-independent JSON format, decoupling the semantic contract (given documents + query → expected results) from Java/Lucene internals. The suite is designed to validate a Rust-based engine implementation against the same behavioral spec.

Format

Each test file references a reusable dataset + schema, defines queries, and asserts on results:

{
  "dataset": "fuzzy-words",
  "schema": "single-keyword-field",
  "tests": [
    {
      "id": "fuzzy-ordering-bbbbb",
      "description": "Results ordered by edit distance: bbbbb(0), abbbb(1), aabbb(2)",
      "query": {
        "type": "fuzzy",
        "field": "field",
        "value": "bbbbb",
        "max_edits": 2,
        "prefix_length": 0
      },
      "expected": {
        "count": 3,
        "ordered": ["bbbbb", "abbbb", "aabbb"]
      }
    }
  ]
}
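As a sanity check on the example above, the edit distances quoted in the description can be recomputed independently of any engine. The following is a minimal sketch, not part of the PR: it uses plain Levenshtein distance, and the helper name `levenshtein` is hypothetical. Lucene's FuzzyQuery counts a transposition as a single edit by default, which makes no difference for these three terms since none of them is reached by swapping adjacent characters.

```python
import json

# The single test case from the example file above, trimmed to the parts used here.
TEST_CASE = json.loads("""
{
  "id": "fuzzy-ordering-bbbbb",
  "query": {"type": "fuzzy", "field": "field", "value": "bbbbb",
            "max_edits": 2, "prefix_length": 0},
  "expected": {"count": 3, "ordered": ["bbbbb", "abbbb", "aabbb"]}
}
""")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


query = TEST_CASE["query"]
ordered = TEST_CASE["expected"]["ordered"]
distances = [levenshtein(query["value"], term) for term in ordered]

# Every expected hit lies within max_edits, and the expected order is
# non-decreasing in edit distance.
within_budget = all(d <= query["max_edits"] for d in distances)
is_sorted = distances == sorted(distances)
print(distances, within_budget, is_sorted)
```

The same check could be run over every fuzzy test in the suite to catch a mistyped expectation before it is blamed on the engine under test.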

What's included

search-test-suite/ — self-contained at repo root
14 datasets, 8 schemas — reusable across test files
70 test cases across 23 files covering 10 query types: term, boolean, phrase, fuzzy, prefix, wildcard, range, regexp, match_all, match_none
JSON Schema (schema.json) for structural validation of test files
Java reference runner — SearchTestSuiteRunner.java executes the suite against Lucene to verify spec correctness
Rust runner placeholder with type definitions and implementation guide
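The contents of schema.json are not shown here, so the following is only a hand-rolled stand-in for the kind of structural check it performs. The required keys are inferred from the example test file, the query type list is the one given above, and the uniqueness rule for `id` is an assumption. The function name `validate_test_file` is hypothetical.

```python
# Query types listed in the PR description.
QUERY_TYPES = {
    "term", "boolean", "phrase", "fuzzy", "prefix",
    "wildcard", "range", "regexp", "match_all", "match_none",
}


def validate_test_file(doc) -> list:
    """Return a list of structural problems; an empty list means the file looks well-formed."""
    if not isinstance(doc, dict):
        return ["top level must be an object"]
    errors = []
    for key in ("dataset", "schema"):
        if not isinstance(doc.get(key), str):
            errors.append(f"'{key}' must be a string")
    tests = doc.get("tests")
    if not isinstance(tests, list) or not tests:
        errors.append("'tests' must be a non-empty array")
        return errors
    seen_ids = set()
    for i, test in enumerate(tests):
        where = f"tests[{i}]"
        if not isinstance(test, dict):
            errors.append(f"{where} must be an object")
            continue
        test_id = test.get("id")
        if not isinstance(test_id, str):
            errors.append(f"{where}.id must be a string")
        elif test_id in seen_ids:
            # Assumption: ids are unique within a file.
            errors.append(f"{where}.id duplicates '{test_id}'")
        else:
            seen_ids.add(test_id)
        query = test.get("query")
        if not isinstance(query, dict) or not isinstance(query.get("type"), str):
            errors.append(f"{where}.query must be an object with a string 'type'")
        elif query["type"] not in QUERY_TYPES:
            errors.append(f"{where}.query.type '{query['type']}' is not a known query type")
        if not isinstance(test.get("expected"), dict):
            errors.append(f"{where}.expected must be an object")
    return errors


good = {
    "dataset": "fuzzy-words",
    "schema": "single-keyword-field",
    "tests": [{
        "id": "fuzzy-ordering-bbbbb",
        "query": {"type": "fuzzy", "field": "field", "value": "bbbbb"},
        "expected": {"count": 3},
    }],
}
bad = {"dataset": "fuzzy-words", "tests": [{"id": "x", "query": {"type": "span"}}]}

print(validate_test_file(good))
print(validate_test_file(bad))
```

A real runner would more likely hand the file and schema.json to a JSON Schema validator; the point of the sketch is only that structural errors can be reported before any query is executed.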

Extraction criteria

Only tests following the index docs → query → assert hits pattern were extracted. Skipped: equals/hashCode, rewrite internals, scorer/weight, randomized stress tests, codec-specific tests — anything coupled to Lucene implementation details rather than search semantics.

Expected results support

count (exact), count_min (lower bound), ordered (strict relevance order), hits.must_contain / hits.must_not_contain (set membership), match_field (which stored field to check).
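The assertion kinds above can be read as a small checker applied to whatever hit list an engine returns. The sketch below is one possible interpretation, not the PR's runner. It assumes that hits arrive as dictionaries of stored fields in relevance order, that `ordered` must equal the full hit list exactly, that `must_contain` and `must_not_contain` are nested under a `hits` object, and that the caller supplies the field to use when `match_field` is absent. The name `check_expected` and the `default_field` parameter are hypothetical.

```python
def check_expected(expected: dict, hits: list, default_field: str) -> list:
    """Compare engine hits against an 'expected' block; return human-readable failures."""
    field = expected.get("match_field", default_field)
    values = [hit.get(field) for hit in hits]
    failures = []

    if "count" in expected and len(values) != expected["count"]:
        failures.append(f"count: expected {expected['count']}, got {len(values)}")
    if "count_min" in expected and len(values) < expected["count_min"]:
        failures.append(f"count_min: expected at least {expected['count_min']}, got {len(values)}")
    if "ordered" in expected and values != expected["ordered"]:
        # Assumption: strict order means the whole hit list matches exactly.
        failures.append(f"ordered: expected {expected['ordered']}, got {values}")

    hit_rules = expected.get("hits", {})
    present = set(values)
    for wanted in hit_rules.get("must_contain", []):
        if wanted not in present:
            failures.append(f"must_contain: missing {wanted!r}")
    for banned in hit_rules.get("must_not_contain", []):
        if banned in present:
            failures.append(f"must_not_contain: found {banned!r}")
    return failures


expected = {"count": 3, "ordered": ["bbbbb", "abbbb", "aabbb"]}
engine_hits = [{"field": "bbbbb"}, {"field": "abbbb"}, {"field": "aabbb"}]
swapped_hits = [{"field": "abbbb"}, {"field": "bbbbb"}, {"field": "aabbb"}]

print(check_expected(expected, engine_hits, default_field="field"))
print(check_expected(expected, swapped_hits, default_field="field"))
```

Returning a list of failures rather than raising on the first mismatch lets a runner report every broken assertion for a test case in one pass.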

…format

Extract Lucene core query tests into engine-agnostic JSON test suite with:
- 14 reusable datasets covering all query types
- 8 field schemas (keyword, text, positions)
- 70 test cases across 10 query types in 23 test files
- JSON Schema for validation
- Java reference test runner
- Rust runner placeholder with implementation guide

Agent-Logs-Url: https://github.com/infinilabs/lucene/sessions/38e89113-7701-4f8d-ad64-20e8ae5d87ca
Co-authored-by: medcl <64487+medcl@users.noreply.github.com>

Copilot AI assigned Copilot and medcl May 15, 2026

Copilot created this pull request from a session on behalf of medcl May 15, 2026 04:38

Copilot wants to merge 1 commit into main from copilot/test-query-logic-accuracy
