Build a production-grade Python desktop application for downloading, organizing, storing, versioning, and exporting IMDb-related data. The app wraps two public repos (PyMovieDb for data acquisition, imdb-scraper as architectural reference) and adds a complete orchestration layer: GUI, database, versioning, exports, logging, and bootstrap automation.
Important
Database choice for MVP: The brief specifies MariaDB/MySQL via SQLAlchemy. For the MVP, I recommend SQLite via SQLAlchemy so the app runs out of the box with zero database setup. The SQLAlchemy models will be identical, so switching to MariaDB later should mostly come down to changing the connection string and installing a MySQL driver. This lowers the barrier to first-run success.
Important
Python version: The system has Python 3.14.3. PyMovieDb on PyPI may have compatibility nuances with 3.14. I'll test during installation and fall back to vendoring the relevant source if needed.
Important
imdb-scraper strategy: Rather than cloning/depending on imdb-scraper at runtime, I'll use it purely as an architectural reference for database patterns. This avoids an unlicensed dependency and a Flask/BeautifulSoup/matplotlib stack the user doesn't need. The MVP will only depend on PyMovieDb for data acquisition.
| Feature | Status |
|---|---|
| Tkinter GUI (search, download, view, export, log viewer) | ✅ Build |
| PyMovieDb integration (search by name/ID, person lookup) | ✅ Build |
| SQLite database via SQLAlchemy ORM | ✅ Build |
| Record versioning / history snapshots | ✅ Build |
| Folder-based JSON storage | ✅ Build |
| Content-hash duplicate detection | ✅ Build |
| JSON & CSV export | ✅ Build |
| Timestamped file logging + GUI log panel | ✅ Build |
| Bootstrap / dependency checker | ✅ Build |
| Configuration system (INI file) | ✅ Build |
| Comprehensive error handling | ✅ Build |
| Professional README.md | ✅ Build |

| Feature | Notes |
|---|---|
| MariaDB/MySQL backend | Connection string swap; models ready |
| SQL dump export | Needs DB-specific tooling |
| Auto-clone dependency repos from GitHub | Stubs + README instructions |
| Retry queue for failed fetches | Commented architecture |
| Migration helpers | Alembic integration stub |
| Actor/title history browser panel | GUI stub |
| Configurable endpoint sets | Config stub |
c:\Users\Administrator\GitHub\get-imdb-json\
├── app.py # Entry point
├── README.md # Professional documentation
├── requirements.txt # Dependencies
├── config/
│ ├── __init__.py
│ ├── settings.py # Config loader / dataclass
│ └── default.ini # Default configuration
├── gui/
│ ├── __init__.py
│ ├── main_window.py # Main Tkinter window
│ ├── dialogs.py # Setup wizard, error dialogs
│ └── status_panel.py # Status bar, log viewer panel
├── services/
│ ├── __init__.py
│ ├── bootstrap_service.py # Dependency & DB health checks
│ ├── download_service.py # IMDb data acquisition orchestrator
│ ├── export_service.py # JSON/CSV export
│ ├── validation_service.py # Input validation (IMDb IDs, names)
│ └── history_service.py # Versioning / snapshot logic
├── db/
│ ├── __init__.py
│ ├── engine.py # SQLAlchemy engine + session factory
│ └── models.py # ORM models (Actor, Title, TitleDetail, TitleHistory)
├── storage/
│ ├── __init__.py
│ ├── folder_manager.py # Folder hierarchy creation
│ ├── json_writer.py # JSON read/write with atomic safety
│ └── snapshot_manager.py # Content-hash snapshots
├── integrations/
│ ├── __init__.py
│ └── pymoviedb_adapter.py # Adapter wrapping PyMovieDb calls
├── utils/
│ ├── __init__.py
│ ├── logger.py # Logging setup (file + GUI handler)
│ ├── hashing.py # SHA-256 content hashing
│ ├── paths.py # Path constants & helpers
│ └── errors.py # Custom exception hierarchy
├── logs/ # Runtime log files (gitignored)
├── data/ # Downloaded JSON data (gitignored)
└── tests/ # Future test suite
    └── __init__.py
Entry point. Initializes logging, loads config, runs bootstrap checks, launches the Tkinter GUI.
- `@dataclass AppConfig` with all tunables (DB URL, data dir, log level, etc.)
- Loads from `default.ini`, overridable by environment variables
- Creates directories on first access
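A minimal sketch of how such a loader could work, assuming the INI section and key names shown in `default.ini`. The function name `load_config` and the environment variable names `IMDB_APP_DB_URL` and `IMDB_APP_LOG_LEVEL` are hypothetical; the plan does not specify them.

```python
import configparser
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AppConfig:
    db_url: str = "sqlite:///data/imdb_data.db"
    data_dir: Path = Path("data")
    log_dir: Path = Path("logs")
    log_level: str = "INFO"


def load_config(ini_path, env=None):
    """Read the INI file, then let environment variables override it."""
    env = os.environ if env is None else env
    parser = configparser.ConfigParser()
    parser.read(ini_path)  # a missing file is silently ignored
    cfg = AppConfig(
        db_url=parser.get("database", "url", fallback=AppConfig.db_url),
        data_dir=Path(parser.get("storage", "data_dir", fallback="data")),
        log_dir=Path(parser.get("storage", "log_dir", fallback="logs")),
        log_level=parser.get("app", "log_level", fallback="INFO"),
    )
    # Hypothetical variable names, chosen only for this illustration.
    cfg.db_url = env.get("IMDB_APP_DB_URL", cfg.db_url)
    cfg.log_level = env.get("IMDB_APP_LOG_LEVEL", cfg.log_level)
    return cfg
```

Passing `env` explicitly keeps the loader easy to test without touching the real process environment.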
[database]
url = sqlite:///data/imdb_data.db
[storage]
data_dir = data
log_dir = logs
[app]
log_level = INFO

- `get_engine(db_url)`: creates SQLAlchemy engine
- `get_session_factory(engine)`: returns `sessionmaker`
- `init_db(engine)`: creates all tables via `Base.metadata.create_all`
- Context manager `session_scope()` for safe commit/rollback
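The engine helpers and the commit/rollback context manager might look roughly like the sketch below. It is an illustration only: here `session_scope` takes the session factory as an argument, which is an assumption, since the plan lists it without parameters.

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker


def get_engine(db_url="sqlite:///:memory:"):
    # For MariaDB the URL would instead look like
    # "mysql+pymysql://user:password@host/dbname" (requires a driver).
    return create_engine(db_url)


def get_session_factory(engine):
    return sessionmaker(bind=engine)


@contextmanager
def session_scope(session_factory):
    """Commit on success, roll back on any exception, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```

Callers would then write `with session_scope(factory) as session:` and let the context manager decide between commit and rollback.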
Four SQLAlchemy ORM models matching the ER diagram:
- Actor: `imdb_id`, `name`, `raw_json`, `created_at`, `updated_at`, `is_active`
- Title: `imdb_id`, `title`, `type`, `year`, `raw_json`, `actor_id` FK, timestamps, `is_active`
- TitleDetail: `title_id` FK, `data_type`, `raw_json`, timestamps, `is_active`
- TitleHistory: `title_id` FK, `snapshot_hash`, `raw_json`, `archived_at`, `reason`
Thin adapter around `PyMovieDb.IMDB`:
- `search_title(name, year, tv)` → parsed dict
- `get_title_by_id(imdb_id)` → parsed dict
- `get_person_by_name(name)` → parsed dict
- `get_person_by_id(imdb_id)` → parsed dict
- `search_person(name)` → parsed dict
- All methods return `(data, error)` tuples and never raise to the caller
- JSON parse safety (PyMovieDb returns JSON strings, not dicts)
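The `(data, error)` convention could be implemented with a small wrapper like the one sketched here. The helper `safe_call` is hypothetical, and the client is injected so the adapter can be exercised with a stub. The sketch assumes the client exposes a `get_by_id` method that returns a JSON string, as PyMovieDb's `IMDB` class is expected to.

```python
import json


def safe_call(fn, *args, **kwargs):
    """Call fn, parse its JSON-string result, and return (data, error)."""
    try:
        raw = fn(*args, **kwargs)
        data = json.loads(raw) if isinstance(raw, str) else raw
    except Exception as exc:  # network, parsing, or library failures
        return None, f"{type(exc).__name__}: {exc}"
    if not isinstance(data, dict):
        return None, "unexpected response type"
    return data, None


class PyMovieDbAdapter:
    def __init__(self, client):
        # `client` is assumed to be a PyMovieDb.IMDB instance (or a stub
        # with the same method names).
        self._client = client

    def get_title_by_id(self, imdb_id):
        return safe_call(self._client.get_by_id, imdb_id)
```

The remaining adapter methods would follow the same shape, each delegating to `safe_call` so that no exception reaches the GUI layer.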
Orchestrates the full download flow:
- Validate input
- Call adapter
- Compute content hash
- Check for duplicates
- Save to folder storage
- Upsert to database (version old record first)
- Return result summary
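The steps above can be sketched with injected collaborators. Everything here is a stand-in: `fetch` plays the adapter, `store` is a dict in place of folder storage plus the database, and `history` is a list in place of the history table. The ID check is deliberately simplistic.

```python
import hashlib
import json


def download_record(imdb_id, fetch, store, history):
    """Run the download flow against in-memory stand-ins.

    fetch(imdb_id) -> (data, error); store maps imdb_id -> {"hash", "data"};
    history collects superseded versions.
    """
    if not imdb_id.startswith(("tt", "nm")):  # validate input (simplified)
        return {"status": "invalid"}
    data, error = fetch(imdb_id)  # call adapter
    if error:
        return {"status": "error", "detail": error}
    # Hash a canonical form so key order does not change the digest.
    canonical = json.dumps(data, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    current = store.get(imdb_id)
    if current and current["hash"] == digest:  # duplicate check
        return {"status": "unchanged", "hash": digest}
    if current:  # version the old record before replacing it
        history.append({"imdb_id": imdb_id, **current, "reason": "superseded"})
    store[imdb_id] = {"hash": digest, "data": data}  # save and upsert
    return {"status": "stored", "hash": digest}  # result summary
```

Fetching identical data twice reports `unchanged`, while changed data pushes the previous version into `history` before the new one is stored.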
- `archive_record(session, model_instance, reason)`: copies current state to `TitleHistory`
- `get_history(session, title_id)`: returns all historical snapshots
- Never deletes; only marks `is_active = False`
- `export_json(session, output_path, filters)`: export DB records to JSON
- `export_csv(session, output_path, filters)`: export to CSV
- Respects active/inactive filters
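A simplified sketch of the two exporters follows. To stay self-contained it takes a list of dict records instead of a database session, and it reduces `filters` to a single `active_only` flag; both are assumptions for illustration.

```python
import csv
import json


def _select(records, active_only):
    return [r for r in records if r.get("is_active", True) or not active_only]


def export_json(records, output_path, active_only=True):
    rows = _select(records, active_only)
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2, ensure_ascii=False)
    return len(rows)


def export_csv(records, output_path, active_only=True):
    rows = _select(records, active_only)
    fields = sorted({key for row in rows for key in row})
    # newline="" is what the csv module expects for files it writes to.
    with open(output_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Both functions return the number of exported rows, which is convenient for a status bar message.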
- `validate_imdb_id(text)`: checks `tt\d{7,}` or `nm\d{7,}` patterns
- `detect_input_type(text)`: classifies as IMDB_ID, PERSON_ID, TITLE_NAME, PERSON_NAME
- `sanitize_filename(text)`: safe filesystem names
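One possible shape for these validators is sketched below. Free text cannot be classified as a title or a person name from the string alone, so this version accepts a `name_kind` hint (for example from the search-type dropdown); that parameter and the `InputType` enum are assumptions of the sketch.

```python
import re
from enum import Enum

TITLE_ID = re.compile(r"tt\d{7,}")
PERSON_ID = re.compile(r"nm\d{7,}")


class InputType(Enum):
    IMDB_ID = "imdb_id"
    PERSON_ID = "person_id"
    TITLE_NAME = "title_name"
    PERSON_NAME = "person_name"


def validate_imdb_id(text):
    text = text.strip()
    return bool(TITLE_ID.fullmatch(text) or PERSON_ID.fullmatch(text))


def detect_input_type(text, name_kind=InputType.TITLE_NAME):
    text = text.strip()
    if TITLE_ID.fullmatch(text):
        return InputType.IMDB_ID
    if PERSON_ID.fullmatch(text):
        return InputType.PERSON_ID
    return name_kind  # free text: rely on the caller's hint


def sanitize_filename(text, replacement="_"):
    # Replace characters that are unsafe on Windows or POSIX filesystems.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, text).strip(" .")
    return cleaned or "untitled"
```

Using `fullmatch` rejects inputs such as `tt0111161abc` that only begin with a valid ID.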
- `check_dependencies()`: verifies PyMovieDb is importable
- `check_database(db_url)`: tests DB connectivity
- `ensure_directories(config)`: creates data/logs folders
- Returns structured health report for GUI display
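The dependency and directory checks might be sketched as follows, using only the standard library. The report layout and the `health_report` function name are assumptions, and the database connectivity check is left out to keep the example dependency-free.

```python
import importlib.util
from pathlib import Path


def check_dependencies(modules=("PyMovieDb", "sqlalchemy")):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def ensure_directories(*dirs):
    """Create any missing directories and return the ones that were created."""
    created = []
    for d in dirs:
        path = Path(d)
        if not path.exists():
            path.mkdir(parents=True)
            created.append(str(path))
    return created


def health_report(data_dir, log_dir, modules=("PyMovieDb", "sqlalchemy")):
    deps = check_dependencies(modules)
    return {
        "dependencies": deps,
        "created_dirs": ensure_directories(data_dir, log_dir),
        "ok": all(deps.values()),
    }
```

`find_spec` locates a module without importing it, so a broken dependency cannot crash the bootstrap step itself.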
- Creates `data/{type}/{imdb_id}/{timestamp}/` structure
- Ensures no collisions
- Atomic JSON write (write to temp, rename)
- Pretty-print with 2-space indent
- Read-back with validation
- SHA-256 hash of JSON content
- Skip-if-duplicate logic
- Snapshot metadata sidecar files
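The folder layout and the atomic write could be sketched as below. The timestamp format and the function names `snapshot_dir` and `write_json_atomic` are assumptions. The write goes to a temporary file in the target directory and is then renamed over the destination with `os.replace`.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def snapshot_dir(base, record_type, imdb_id, now=None):
    """Build and create {base}/{type}/{imdb_id}/{timestamp}/."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S%fZ")  # assumed timestamp format
    path = Path(base) / record_type / imdb_id / stamp
    path.mkdir(parents=True, exist_ok=False)  # refuse to reuse a folder
    return path


def write_json_atomic(path, data):
    """Write to a temp file in the same directory, then rename over the target."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, ensure_ascii=False)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic when both share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    # Read back to confirm the file parses and round-trips.
    with open(path, encoding="utf-8") as fh:
        if json.load(fh) != data:
            raise ValueError(f"read-back mismatch for {path}")
    return path
```

Creating the temporary file next to the target matters: a rename across filesystems is not atomic.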
Tkinter main window with:
- Search frame — input field, search type dropdown (Title/Person/IMDb ID), search button
- Results frame — treeview showing results, download button
- Details frame — display selected record details
- Action buttons — Export JSON, Export CSV, Init Database, Refresh
- Log panel — scrolled text widget showing live log output
- Status bar — current operation status
- Dark theme using `ttk.Style` customization
- Threading for non-blocking downloads
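Non-blocking downloads in Tkinter are commonly handled with a worker thread that reports through a queue, while the GUI thread polls the queue via `after()`. The sketch below shows that pattern; the function names are hypothetical, and `widget` is any object with a Tk-style `after(ms, func, *args)` method.

```python
import queue
import threading


def start_download(task, results):
    """Run task() on a daemon thread and put (status, payload) on results."""
    def worker():
        try:
            results.put(("ok", task()))
        except Exception as exc:
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(widget, results, on_result, interval_ms=100):
    """Drain the queue on the GUI thread, then reschedule via widget.after()."""
    try:
        while True:
            on_result(*results.get_nowait())
    except queue.Empty:
        pass
    widget.after(interval_ms, poll_results, widget, results, on_result, interval_ms)
```

Only `poll_results` touches widgets, so all GUI updates stay on the main thread, which is what Tkinter expects.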
- `SetupWizardDialog`: shown when DB is missing, guides through setup
- `ErrorDialog`: friendly error display with expandable technical details
- `ExportDialog`: choose export format and path
- Reusable status bar widget
- Log viewer with level-based coloring
- Configures root logger with file handler (rotating, timestamped)
- Custom `TkinterHandler` that pushes log records to GUI
- Format: `2026-04-02 14:21:09 [ERROR] services.download_service: message`
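A logging setup producing that format could look like the sketch below. `QueueTextHandler` is a stand-in for the planned `TkinterHandler`: instead of writing to a widget directly, it puts formatted lines on a queue that the GUI thread can drain. The rotation limits are assumed values.

```python
import logging
import queue
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


class QueueTextHandler(logging.Handler):
    """Puts (level name, formatted line) tuples on a queue for the GUI."""

    def __init__(self, log_queue):
        super().__init__()
        self.log_queue = log_queue

    def emit(self, record):
        self.log_queue.put((record.levelname, self.format(record)))


def setup_logging(log_file, log_queue, level=logging.INFO):
    formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
    file_handler = RotatingFileHandler(
        log_file, maxBytes=1_000_000, backupCount=5, encoding="utf-8"  # assumed limits
    )
    gui_handler = QueueTextHandler(log_queue)
    root = logging.getLogger()
    root.setLevel(level)
    for handler in (file_handler, gui_handler):
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root
```

Carrying the level name alongside the formatted line lets the log viewer pick a color per level without re-parsing the text.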
- `compute_hash(data: str) -> str`: SHA-256
- `PROJECT_ROOT`, `DATA_DIR`, `LOGS_DIR` constants
- `ensure_dir(path)` helper
- `IMDbAppError` base exception
- `DownloadError`, `DatabaseError`, `ValidationError`, `BootstrapError` subclasses
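The exception hierarchy is small enough to sketch in full. The `friendly_message` helper is hypothetical and only illustrates why a shared base class is useful when building error dialogs.

```python
class IMDbAppError(Exception):
    """Base class so callers can catch every application error at once."""


class DownloadError(IMDbAppError):
    pass


class DatabaseError(IMDbAppError):
    pass


class ValidationError(IMDbAppError):
    pass


class BootstrapError(IMDbAppError):
    pass


def friendly_message(exc):
    """Hypothetical helper: turn an exception into text for an error dialog."""
    if isinstance(exc, IMDbAppError):
        return f"{type(exc).__name__}: {exc}"
    return "Unexpected error; see the log file for details."
```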
Complete rewrite with: project overview, features, prerequisites, installation, database setup, usage guide, architecture diagram (Mermaid), folder structure, troubleshooting, licensing notes, contribution guidance.
PyMovieDb>=0.1.0
SQLAlchemy>=2.0
Important
1. SQLite vs MariaDB for MVP: I strongly recommend SQLite for MVP (zero setup, portable). The models are identical either way. Do you agree, or do you want MariaDB from day one?
Important
2. imdb-scraper as reference only — Since it has no license and adds heavy dependencies (Flask, matplotlib, lxml), I plan to use it only as a design reference, not as a runtime dependency. Is that acceptable?
Note
3. Scope for MVP — The plan builds all core features (GUI, DB, versioning, exports, logging) as working code, with v2 features as well-commented stubs. Does this scope feel right?
- `pip install -r requirements.txt` completes without errors
- `python app.py` launches the GUI without crashes
- Search for a known title (e.g., "The Shawshank Redemption") returns results
- Download stores JSON to `data/` folder and inserts to SQLite
- Re-downloading the same data creates a history snapshot (no duplicates)
- Export JSON and CSV produce valid output files
- Log file appears in `logs/` with correct format
- Visual inspection of GUI layout and dark theme
- Test with invalid IMDb IDs to verify error handling
- Test with network disabled to verify graceful degradation