Skip to content

fhello23/needle-haystack-rust

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🪡 Needle Haystack

A fast, concurrent CLI tool to find specific IDs across all columns in your PostgreSQL database.

Ever needed to find where a UUID appears in your database? This tool scans every column of every table, checking if your target IDs exist anywhere.

Features

  • 🔍 Auto-detects ID types - UUID, BigInt, or Text
  • Highly concurrent - Configurable parallelism with semaphore-based rate limiting
  • 🎯 Smart filtering - Restrict by schema, column name, or column pattern
  • 📊 Progress tracking - Live progress bar for large databases
  • 🛡️ Error resilient - Continues scanning even if individual columns fail
  • 🔌 Supabase ready - Works with connection poolers out of the box

Installation

# Clone the repository
git clone https://github.com/yourusername/needle_haystack_rust.git
cd needle_haystack_rust

Setup

  1. Create a .env file with your database connection:
DATABASE_URL=postgres://user:password@host:port/database

Supabase users: Use the connection pooler URL (port 6543), not direct connection (5432)

  1. Create a needles.txt file with IDs to search (one per line):
# Comments are supported
id-example-1
id-example-2
id-example-3

Usage

# Basic usage (searches all schemas)
cargo run --release

# Specify a different IDs file
cargo run --release -- --ids my_ids.txt

# Restrict to public schema only
cargo run --release -- --schema public

# Only check columns named "id"
cargo run --release -- --column-exact id

# Only check columns containing "_id" in their name
cargo run --release -- --column-contains _id

# Stop after first match (faster)
cargo run --release -- --first-only

# Adjust concurrency for Supabase (conservative)
cargo run --release -- --concurrency 8 --pool-size 5

# Adjust concurrency for on-premise (aggressive)
cargo run --release -- --concurrency 32 --pool-size 20

Options

Flag Default Description
--ids needles.txt Path to file containing IDs
--concurrency 16 Max concurrent column checks
--pool-size 10 Max database connections
--schema all Restrict to specific schema
--column-exact - Only check columns with this exact name
--column-contains - Only check columns containing this substring
--first-only false Stop after finding the first match
--timeout-secs 30 Query timeout per column
--pool-timeout-secs 60 Connection pool acquire timeout

Example Output

Connecting with pool_size=10, concurrency=16, timeouts=60s/30s
Loaded 3 IDs of type 'uuid'
Candidate columns to check: 847
⠹ [00:00:12] [################>-----------------------] 342/847 (40%) checking public.users.id
✓ FOUND: public.profiles.user_id
✓ FOUND: public.orders.customer_id
⠹ [00:00:28] [########################################] 847/847 (100%) done

=== Summary ===
Columns checked: 847/847
Matches found: 2
  → public.profiles.user_id
  → public.orders.customer_id

How It Works

  1. Parses IDs and auto-detects their type (UUID, integer, or text)
  2. Queries pg_catalog to find all columns matching the ID type
  3. For each candidate column, runs: SELECT EXISTS(SELECT 1 FROM table WHERE column = ANY($1))
  4. Uses connection pooling and semaphores to control concurrency
  5. Reports all locations where the IDs were found

Performance Tips

  • Supabase: Start with --concurrency 8 --pool-size 5 and increase gradually
  • On-premise: Can go up to --concurrency 64 --pool-size 30 depending on your max_connections
  • Large tables: Increase --timeout-secs if you have tables with millions of rows
  • Quick check: Use --first-only if you just need to confirm the ID exists somewhere

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages