IMDb Data Downloader & Manager — MVP Implementation Plan

Background & Problem

Build a production-grade Python desktop application for downloading, organizing, storing, versioning, and exporting IMDb-related data. The app wraps two public repos (PyMovieDb for data acquisition, imdb-scraper as architectural reference) and adds a complete orchestration layer: GUI, database, versioning, exports, logging, and bootstrap automation.

User Review Required

Important

Database choice for MVP: The brief specifies MariaDB/MySQL via SQLAlchemy. For the MVP, I recommend SQLite via SQLAlchemy so the app runs out of the box with zero database setup. The SQLAlchemy models will be identical, so switching to MariaDB later should amount to changing the connection string and installing a MySQL/MariaDB driver. This substantially lowers the barrier to first-run success.

Important

Python version: The system has Python 3.14.3. PyMovieDb on PyPI may not yet be compatible with 3.14. I'll test during installation and fall back to vendoring the relevant source if needed.

Important

imdb-scraper strategy: Rather than cloning/depending on imdb-scraper at runtime, I'll use it purely as an architectural reference for database patterns. This avoids an unlicensed dependency and a Flask/BeautifulSoup/matplotlib stack the user doesn't need. The MVP will only depend on PyMovieDb for data acquisition.

Proposed Changes

MVP Scope (what will be built now)

Feature	Status
Tkinter GUI (search, download, view, export, log viewer)	✅ Build
PyMovieDb integration (search by name/ID, person lookup)	✅ Build
SQLite database via SQLAlchemy ORM	✅ Build
Record versioning / history snapshots	✅ Build
Folder-based JSON storage	✅ Build
Content-hash duplicate detection	✅ Build
JSON & CSV export	✅ Build
Timestamped file logging + GUI log panel	✅ Build
Bootstrap / dependency checker	✅ Build
Configuration system (INI file)	✅ Build
Comprehensive error handling	✅ Build
Professional README.md	✅ Build

Deferred to v2 (well-commented stubs)

Feature	Notes
MariaDB/MySQL backend	Connection string swap; models ready
SQL dump export	Needs DB-specific tooling
Auto-clone dependency repos from GitHub	Stubs + README instructions
Retry queue for failed fetches	Commented architecture
Migration helpers	Alembic integration stub
Actor/title history browser panel	GUI stub
Configurable endpoint sets	Config stub

Project File Structure

c:\Users\Administrator\GitHub\get-imdb-json\
├── app.py                          # Entry point
├── README.md                       # Professional documentation
├── requirements.txt                # Dependencies
├── config/
│   ├── __init__.py
│   ├── settings.py                 # Config loader / dataclass
│   └── default.ini                 # Default configuration
├── gui/
│   ├── __init__.py
│   ├── main_window.py              # Main Tkinter window
│   ├── dialogs.py                  # Setup wizard, error dialogs
│   └── status_panel.py             # Status bar, log viewer panel
├── services/
│   ├── __init__.py
│   ├── bootstrap_service.py        # Dependency & DB health checks
│   ├── download_service.py         # IMDb data acquisition orchestrator
│   ├── export_service.py           # JSON/CSV export
│   ├── validation_service.py       # Input validation (IMDb IDs, names)
│   └── history_service.py          # Versioning / snapshot logic
├── db/
│   ├── __init__.py
│   ├── engine.py                   # SQLAlchemy engine + session factory
│   └── models.py                   # ORM models (Actor, Title, TitleDetail, TitleHistory)
├── storage/
│   ├── __init__.py
│   ├── folder_manager.py           # Folder hierarchy creation
│   ├── json_writer.py              # JSON read/write with atomic safety
│   └── snapshot_manager.py         # Content-hash snapshots
├── integrations/
│   ├── __init__.py
│   └── pymoviedb_adapter.py        # Adapter wrapping PyMovieDb calls
├── utils/
│   ├── __init__.py
│   ├── logger.py                   # Logging setup (file + GUI handler)
│   ├── hashing.py                  # SHA-256 content hashing
│   ├── paths.py                    # Path constants & helpers
│   └── errors.py                   # Custom exception hierarchy
├── logs/                           # Runtime log files (gitignored)
├── data/                           # Downloaded JSON data (gitignored)
└── tests/                          # Future test suite
    └── __init__.py

Component Details

[NEW] app.py

Entry point. Initializes logging, loads config, runs bootstrap checks, launches the Tkinter GUI.

Config Component

[NEW] settings.py

@dataclass AppConfig with all tunables (DB URL, data dir, log level, etc.)
Loads from default.ini, overridable by environment variables
Creates directories on first access
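
A minimal sketch of how such a loader might look, assuming the section and key names from default.ini. The function name load_config and the environment variable names (IMDB_APP_DB_URL, IMDB_APP_LOG_LEVEL) are hypothetical; the plan does not define them. Directory creation is left out here.

```python
import configparser
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AppConfig:
    db_url: str = "sqlite:///data/imdb_data.db"
    data_dir: Path = Path("data")
    log_dir: Path = Path("logs")
    log_level: str = "INFO"


def load_config(ini_path, env=None):
    """Read the INI file, then let environment variables override it."""
    env = os.environ if env is None else env
    parser = configparser.ConfigParser()
    parser.read(ini_path)  # a missing file is skipped, so defaults apply

    cfg = AppConfig(
        db_url=parser.get("database", "url", fallback=AppConfig.db_url),
        data_dir=Path(parser.get("storage", "data_dir", fallback="data")),
        log_dir=Path(parser.get("storage", "log_dir", fallback="logs")),
        log_level=parser.get("app", "log_level", fallback="INFO"),
    )
    # Hypothetical variable names, used only for illustration.
    cfg.db_url = env.get("IMDB_APP_DB_URL", cfg.db_url)
    cfg.log_level = env.get("IMDB_APP_LOG_LEVEL", cfg.log_level)
    return cfg
```

Passing env explicitly keeps the loader easy to test without touching the real process environment.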

[NEW] default.ini

[database]
url = sqlite:///data/imdb_data.db

[storage]
data_dir = data
log_dir = logs

[app]
log_level = INFO

Database Component

[NEW] engine.py

get_engine(db_url) — creates SQLAlchemy engine
get_session_factory(engine) — returns sessionmaker
init_db(engine) — creates all tables via Base.metadata.create_all
Context manager session_scope() for safe commit/rollback
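
The helpers above could be sketched roughly as follows with SQLAlchemy 2.x. Two things are assumptions: session_scope takes the session factory as an argument (the plan shows it without parameters), and ping is a hypothetical helper showing how a connectivity check might use it.

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker


def get_engine(db_url):
    return create_engine(db_url)


def get_session_factory(engine):
    return sessionmaker(bind=engine)


@contextmanager
def session_scope(session_factory):
    """Commit on success, roll back on any exception, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def ping(session_factory):
    """Hypothetical connectivity check: run a trivial query."""
    with session_scope(session_factory) as session:
        return session.execute(text("SELECT 1")).scalar() == 1
```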

[NEW] models.py

Four SQLAlchemy ORM models matching the ER diagram:

Actor — imdb_id, name, raw_json, created_at, updated_at, is_active
Title — imdb_id, title, type, year, raw_json, actor_id FK, timestamps, is_active
TitleDetail — title_id FK, data_type, raw_json, timestamps, is_active
TitleHistory — title_id FK, snapshot_hash, raw_json, archived_at, reason

Integration Component

[NEW] pymoviedb_adapter.py

Thin adapter around PyMovieDb.IMDB:

search_title(name, year, tv) → parsed dict
get_title_by_id(imdb_id) → parsed dict
get_person_by_name(name) → parsed dict
get_person_by_id(imdb_id) → parsed dict
search_person(name) → parsed dict
All methods return (data, error) tuples, where data is the parsed dict (None on failure) and error is None on success; exceptions never propagate to the caller
JSON parse safety (PyMovieDb returns JSON strings, not dicts)
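
One way the (data, error) convention and the JSON parse safety might be combined is a single wrapper used by every adapter method. This is a sketch only: safe_call is a hypothetical helper, and fetch stands in for whichever PyMovieDb method is being wrapped.

```python
import json


def safe_call(fetch, *args):
    """Call a PyMovieDb-style function and return (data, error)."""
    try:
        raw = fetch(*args)
    except Exception as exc:  # e.g. network failures raised by the library
        return None, f"fetch failed: {exc}"
    try:
        # The library is expected to return a JSON string, not a dict.
        data = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc}"
    if not isinstance(data, dict):
        return None, "unexpected response shape"
    return data, None
```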

Services Component

[NEW] download_service.py

Orchestrates the full download flow:

Validate input
Call adapter
Compute content hash
Check for duplicates
Save to folder storage
Upsert to database (version old record first)
Return result summary
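
The seven steps above can be sketched as one function with its collaborators injected. The names here are illustrative assumptions: fetch is an adapter call returning (data, error), known_hashes maps IMDb IDs to the last stored hash, and save performs the folder write and database upsert.

```python
import hashlib
import json
import re


def download(imdb_id, fetch, known_hashes, save):
    """Run the download flow for one IMDb ID and return a result summary."""
    # 1. Validate input
    if not re.fullmatch(r"(tt|nm)\d{7,}", imdb_id):
        return {"status": "invalid", "imdb_id": imdb_id}
    # 2. Call adapter
    data, error = fetch(imdb_id)
    if error:
        return {"status": "failed", "imdb_id": imdb_id, "error": error}
    # 3. Compute content hash over a key-sorted serialization
    canonical = json.dumps(data, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # 4. Check for duplicates
    if known_hashes.get(imdb_id) == digest:
        return {"status": "unchanged", "imdb_id": imdb_id, "hash": digest}
    # 5-6. Save to folder storage and upsert to the database
    save(imdb_id, data, digest)
    known_hashes[imdb_id] = digest
    # 7. Return result summary
    return {"status": "saved", "imdb_id": imdb_id, "hash": digest}
```

Injecting the collaborators keeps the orchestration testable without network or database access.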

[NEW] history_service.py

archive_record(session, model_instance, reason) — copies current state to TitleHistory
get_history(session, title_id) — returns all historical snapshots
Never deletes — only marks is_active = False

[NEW] export_service.py

export_json(session, output_path, filters) — export DB records to JSON
export_csv(session, output_path, filters) — export to CSV
Respects active/inactive filters
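
A simplified sketch of the two exporters using only the standard library. It assumes records have already been loaded from the session as plain dicts and reduces the filters argument to a single active_only flag; both simplifications are mine, not part of the plan.

```python
import csv
import json


def _select(records, active_only):
    # Records without an is_active key are treated as active.
    return [r for r in records if not active_only or r.get("is_active", True)]


def export_json(records, output_path, active_only=True):
    rows = _select(records, active_only)
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2, ensure_ascii=False)
    return len(rows)


def export_csv(records, output_path, fields, active_only=True):
    rows = _select(records, active_only)
    # newline="" lets the csv module control line endings on every platform.
    with open(output_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```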

[NEW] validation_service.py

validate_imdb_id(text) — checks tt\d{7,} or nm\d{7,} patterns
detect_input_type(text) — classifies as IMDB_ID, PERSON_ID, TITLE_NAME, PERSON_NAME
sanitize_filename(text) — safe filesystem names
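
A possible shape for these three helpers. Since free text alone cannot distinguish a title from a person name, this sketch assumes a person_search flag (for example, taken from the GUI search type dropdown); that parameter, the set of replaced characters, and the max_length default are assumptions.

```python
import re

_IMDB_ID = re.compile(r"(tt|nm)\d{7,}")
# Characters that are invalid in Windows filenames, plus control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def validate_imdb_id(text):
    return bool(_IMDB_ID.fullmatch(text.strip()))


def detect_input_type(text, person_search=False):
    text = text.strip()
    if validate_imdb_id(text):
        return "IMDB_ID" if text.startswith("tt") else "PERSON_ID"
    return "PERSON_NAME" if person_search else "TITLE_NAME"


def sanitize_filename(text, max_length=100):
    cleaned = _UNSAFE.sub("_", text).strip(" .")
    return cleaned[:max_length] or "untitled"
```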

[NEW] bootstrap_service.py

check_dependencies() — verifies PyMovieDb is importable
check_database(db_url) — tests DB connectivity
ensure_directories(config) — creates data/logs folders
Returns structured health report for GUI display
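
The dependency and directory checks might look like the sketch below, which locates modules without importing them. The database check is omitted, and health_report and the report keys are hypothetical names.

```python
import importlib.util
from pathlib import Path


def check_dependencies(modules=("PyMovieDb", "sqlalchemy")):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def ensure_directories(paths):
    for path in paths:
        Path(path).mkdir(parents=True, exist_ok=True)
    return {str(path): Path(path).is_dir() for path in paths}


def health_report(modules, paths):
    """Combine the individual checks into one dict for the GUI to display."""
    deps = check_dependencies(modules)
    dirs = ensure_directories(paths)
    return {
        "dependencies": deps,
        "directories": dirs,
        "ok": all(deps.values()) and all(dirs.values()),
    }
```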

Storage Component

[NEW] folder_manager.py

Creates data/{type}/{imdb_id}/{timestamp}/ structure
Ensures no collisions
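
A sketch of the folder layout with collision handling. The timestamp format and the counter suffix used when two downloads land in the same second are assumptions; the plan only fixes the data/{type}/{imdb_id}/{timestamp}/ shape.

```python
from datetime import datetime, timezone
from pathlib import Path


def create_record_dir(data_dir, record_type, imdb_id, now=None):
    """Create data/{type}/{imdb_id}/{timestamp}/ and return its path."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    base = Path(data_dir) / record_type / imdb_id
    candidate, suffix = base / stamp, 1
    while True:
        try:
            # exist_ok=False makes creation fail instead of reusing a folder.
            candidate.mkdir(parents=True, exist_ok=False)
            return candidate
        except FileExistsError:
            # Same-second collision: append a counter instead of overwriting.
            candidate = base / f"{stamp}_{suffix}"
            suffix += 1
```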

[NEW] json_writer.py

Atomic JSON write (write to temp, rename)
Pretty-print with 2-space indent
Read-back with validation
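
The write-to-temp-then-rename approach could be sketched as follows. The function names are hypothetical, and the validation shown (top-level object or array) is only one plausible reading of "read-back with validation".

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Write JSON to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, ensure_ascii=False)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_json_validated(path):
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, (dict, list)):
        raise ValueError(f"{path}: expected a JSON object or array")
    return data
```

Creating the temp file in the same directory as the target is what keeps the final rename on one filesystem.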

[NEW] snapshot_manager.py

SHA-256 hash of JSON content
Skip-if-duplicate logic
Snapshot metadata sidecar files
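
A sketch of hash-based duplicate skipping with a metadata sidecar. Hashing a key-sorted serialization is my assumption, chosen so that key order alone does not register as a change; the file names record.json and record.meta.json are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def content_hash(data):
    """SHA-256 over a canonical (key-sorted, compact) JSON serialization."""
    canonical = json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_snapshot(folder, data, previous_hash=None):
    """Write the record plus a sidecar; return None if nothing changed."""
    digest = content_hash(data)
    if digest == previous_hash:
        return None  # skip-if-duplicate
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "record.json").write_text(
        json.dumps(data, indent=2), encoding="utf-8"
    )
    meta = {"sha256": digest, "previous_sha256": previous_hash}
    (folder / "record.meta.json").write_text(
        json.dumps(meta, indent=2), encoding="utf-8"
    )
    return digest
```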

GUI Component

[NEW] main_window.py

Tkinter main window with:

Search frame — input field, search type dropdown (Title/Person/IMDb ID), search button
Results frame — treeview showing results, download button
Details frame — display selected record details
Action buttons — Export JSON, Export CSV, Init Database, Refresh
Log panel — scrolled text widget showing live log output
Status bar — current operation status
Dark theme using ttk.Style customization
Threading for non-blocking downloads

[NEW] dialogs.py

SetupWizardDialog — shown when DB is missing, guides through setup
ErrorDialog — friendly error display with technical details expandable
ExportDialog — choose export format and path

[NEW] status_panel.py

Reusable status bar widget
Log viewer with level-based coloring

Utils Component

[NEW] logger.py

Configures root logger with file handler (rotating, timestamped)
Custom TkinterHandler that pushes log records to GUI
Format: 2026-04-02 14:21:09 [ERROR] services.download_service: message
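
A sketch of the GUI handler and the format string. Instead of writing to a widget directly, this version puts formatted lines on a queue, on the assumption that the GUI thread drains it periodically (for example via the widget's after() method), since Tkinter widgets are generally only safe to update from the main thread. QueueLogHandler and attach_gui_handler are hypothetical names.

```python
import logging
import queue

# Produces lines like:
# 2026-04-02 14:21:09 [ERROR] services.download_service: message
LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


class QueueLogHandler(logging.Handler):
    """Put (level, formatted line) pairs on a queue for the GUI to drain."""

    def __init__(self, log_queue):
        super().__init__()
        self.log_queue = log_queue

    def emit(self, record):
        self.log_queue.put((record.levelname, self.format(record)))


def attach_gui_handler(logger, log_queue):
    handler = QueueLogHandler(log_queue)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, DATE_FORMAT))
    logger.addHandler(handler)
    return handler
```

Carrying the level name alongside the line lets the log viewer apply level-based coloring without re-parsing the text.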

[NEW] hashing.py

compute_hash(data: str) -> str — SHA-256

[NEW] paths.py

PROJECT_ROOT, DATA_DIR, LOGS_DIR constants
ensure_dir(path) helper

[NEW] errors.py

IMDbAppError base exception
DownloadError, DatabaseError, ValidationError, BootstrapError subclasses
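
The hierarchy itself is small. In this sketch the optional details attribute is an assumption, intended to carry the technical text that an error dialog could show on demand.

```python
class IMDbAppError(Exception):
    """Base class for all application errors."""

    def __init__(self, message, details=None):
        super().__init__(message)
        self.details = details  # optional technical detail for the GUI


class DownloadError(IMDbAppError):
    pass


class DatabaseError(IMDbAppError):
    pass


class ValidationError(IMDbAppError):
    pass


class BootstrapError(IMDbAppError):
    pass
```

Callers can then catch IMDbAppError once at the GUI boundary while services raise the specific subclass.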

Documentation

[MODIFY] README.md

Complete rewrite with: project overview, features, prerequisites, installation, database setup, usage guide, architecture diagram (Mermaid), folder structure, troubleshooting, licensing notes, contribution guidance.

[NEW] requirements.txt

PyMovieDb>=0.1.0
SQLAlchemy>=2.0

Open Questions

Important

1. SQLite vs MariaDB for MVP: I strongly recommend SQLite for MVP (zero setup, portable). The models are identical either way. Do you agree, or do you want MariaDB from day one?

Important

2. imdb-scraper as reference only — Since it has no license and adds heavy dependencies (Flask, matplotlib, lxml), I plan to use it only as a design reference, not as a runtime dependency. Is that acceptable?

Note

3. Scope for MVP — The plan builds all core features (GUI, DB, versioning, exports, logging) as working code, with v2 features as well-commented stubs. Does this scope feel right?

Verification Plan

Automated Tests

pip install -r requirements.txt completes without errors
python app.py launches the GUI without crashes
Search for a known title (e.g., "The Shawshank Redemption") returns results
Download stores JSON to data/ folder and inserts to SQLite
Re-downloading unchanged data is skipped by the content-hash check (no duplicate record); re-downloading changed data archives the previous version as a history snapshot
Export JSON and CSV produce valid output files
Log file appears in logs/ with correct format

Manual Verification

Visual inspection of GUI layout and dark theme
Test with invalid IMDb IDs to verify error handling
Test with network disabled to verify graceful degradation
