A production-grade Python desktop application for downloading, organizing, storing, versioning, and exporting IMDb-related data. Built with a clean architecture, robust error handling, and a polished dark-themed Tkinter interface.
- Search & Download: Enter an IMDb ID (tt0111161), title name, or person/actor name
- Automatic Input Detection: Auto-classifies input as title ID, person ID, or search query
- Rich Data Display: View downloaded metadata in a sortable treeview with double-click details
- SQLite Database: All records persisted via SQLAlchemy ORM (ready for MariaDB/MySQL swap)
- Record Versioning: Every update archives the previous version to title_history, so no data is ever lost
- Duplicate Detection: SHA-256 content hashing prevents redundant snapshots
- Folder-Based Storage: Organized as data/{type}/{imdb_id}/{timestamp}/data.json
- JSON Export: Full record export with parsed data
- CSV Export: Flat tabular export for spreadsheet analysis
- Offline Archive: Complete folder structure for reliable offline backup and restoration
- Embedded Log Viewer: Live log panel inside the GUI with color-coded levels
- File Logging: Rotating log files with timestamps and stack traces
- Startup Health Checks: Bootstrap verifies PyMovieDb, directories, and database on launch
- Dark Theme: Professional Catppuccin Mocha color scheme
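The Duplicate Detection and Folder-Based Storage features above can be illustrated with a minimal sketch. The function names, the canonical-JSON approach, and the timestamp format are assumptions made for illustration; the project's own code in utils/hashing.py and storage/ may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def content_hash(record: dict) -> str:
    """Return a SHA-256 hex digest that does not depend on key order."""
    # Sorting keys and fixing separators gives one canonical serialization.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_duplicate(record: dict, last_hash: Optional[str]) -> bool:
    """True when the new record hashes to the same value as the last snapshot."""
    return last_hash is not None and content_hash(record) == last_hash


def snapshot_path(base: Path, kind: str, imdb_id: str, when: datetime) -> Path:
    """Build data/{type}/{imdb_id}/{timestamp}/data.json.

    The timestamp format below is an assumption, not taken from the project.
    """
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return base / kind / imdb_id / stamp / "data.json"
```

Hashing a canonical serialization means two downloads with identical content but different key order are still treated as the same snapshot.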
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.12+ | Tested on 3.12, 3.13, 3.14 |
| pip | Latest | Package installer |
| Tkinter | Built-in | Included with standard Python install |
Note: No external database server needed for the MVP. SQLite is included with Python.
git clone https://github.com/filipetorresdecarvalho/get-imdb-json.git
cd get-imdb-json
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
pip install -r requirements.txt
This installs:
- PyMovieDb β IMDb data acquisition library (scrapes IMDb.com)
- SQLAlchemy β ORM for database operations
python app.py
The app will:
- Load configuration from config/default.ini
- Run bootstrap health checks
- Create data/ and logs/ directories
- Initialize the SQLite database
- Open the GUI window
Enter a title ID (e.g. tt0111161) or person ID (e.g. nm0000151) and click Download.
Enter a movie title (e.g. The Shawshank Redemption) or person name (e.g. Morgan Freeman).
Select the appropriate type from the dropdown if auto-detection isn't working.
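Auto-detection of the input type can be sketched roughly as follows. The `InputKind` enum, the `classify_input` function, and the assumption that IMDb IDs are `tt` or `nm` followed by at least seven digits are illustrative and not taken from the project's validation_service.py.

```python
import re
from enum import Enum


class InputKind(Enum):
    TITLE_ID = "title_id"
    PERSON_ID = "person_id"
    QUERY = "query"


# Assumed ID shapes: "tt" or "nm" followed by seven or more digits.
_TITLE_RE = re.compile(r"tt\d{7,}", re.IGNORECASE)
_PERSON_RE = re.compile(r"nm\d{7,}", re.IGNORECASE)


def classify_input(raw: str) -> InputKind:
    """Classify user input as a title ID, a person ID, or a free-text query."""
    text = raw.strip()
    if not text:
        raise ValueError("input is empty")
    if _TITLE_RE.fullmatch(text):
        return InputKind.TITLE_ID
    if _PERSON_RE.fullmatch(text):
        return InputKind.PERSON_ID
    # Anything else is treated as a name search.
    return InputKind.QUERY
```

A pattern like this cannot tell a movie title from a person's name, which is one reason a manual type dropdown is useful as a fallback.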
Double-click any row in the results table to view the full JSON data.
Click Export to choose format (JSON/CSV), scope (Titles/Persons/Everything), and destination.
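As a rough illustration of how nested JSON records could become a flat CSV, here is a minimal sketch. The dotted column names and the choice to serialize lists as JSON strings are assumptions; the project's export_service.py may use a different layout.

```python
import csv
import io
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys; lists are kept as JSON text."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat


def to_csv(records: list) -> str:
    """Render records as CSV text with one column per flattened key."""
    rows = [flatten(r) for r in records]
    # Union of all columns in first-seen order, so sparse records still fit.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping list values as JSON text preserves the data in a single cell, at the cost of needing a second parse step in the spreadsheet.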
┌──────────────────────────────────────────────────────┐
│                     Tkinter GUI                      │
│  ┌──────────────┐  ┌──────────┐  ┌───────────────┐   │
│  │ Search Input │  │ Results  │  │   Log Panel   │   │
│  │ + Type Combo │  │ Treeview │  │  (live logs)  │   │
│  └──────┬───────┘  └────┬─────┘  └───────────────┘   │
└─────────┼───────────────┼────────────────────────────┘
          │               │
   ┌──────▼───────────────▼─────────────┐
   │           Services Layer           │
   │  ┌─────────────────────────┐       │
   │  │    Download Service     │       │
   │  │  (orchestration core)   │       │
   │  └──────┬──────────────────┘       │
   │  ┌──────▼──────┐  ┌─────────┐      │
   │  │ Validation  │  │ History │      │
   │  │   Service   │  │ Service │      │
   │  └─────────────┘  └─────────┘      │
   │  ┌────────────────┐  ┌───────────┐ │
   │  │ Export Service │  │ Bootstrap │ │
   │  └────────────────┘  └───────────┘ │
   └──────┬──────────────┬──────────────┘
          │              │
   ┌──────▼──────┐  ┌────▼────────────┐
   │ Integration │  │     Storage     │
   │  (Adapter)  │  │ ┌─────────────┐ │
   │  PyMovieDb  │  │ │   Folder    │ │
   │             │  │ │   Manager   │ │
   └──────┬──────┘  │ ├─────────────┤ │
          │         │ │ JSON Writer │ │
   ┌──────▼──────┐  │ ├─────────────┤ │
   │  IMDb.com   │  │ │  Snapshot   │ │
   │ (external)  │  │ │   Manager   │ │
   └─────────────┘  │ └─────────────┘ │
                    └─────────────────┘
          │
   ┌──────▼────────────────────────┐
   │       Database (SQLite)       │
   │  ┌────────┐  ┌──────────────┐ │
   │  │ Actors │  │    Titles    │ │
   │  └────────┘  ├──────────────┤ │
   │              │ TitleDetail  │ │
   │              ├──────────────┤ │
   │              │ TitleHistory │ │
   │              └──────────────┘ │
   └───────────────────────────────┘
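The relationship between the Titles and TitleHistory tables in the diagram can be sketched as an archive-then-update step. This example uses the standard-library `sqlite3` module instead of the SQLAlchemy ORM the project actually uses, and the column names and the `upsert_title` function are hypothetical.

```python
import sqlite3

# Hypothetical schema: the real models live in db/models.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS titles (
    imdb_id TEXT PRIMARY KEY,
    payload TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS title_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    imdb_id TEXT NOT NULL,
    payload TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    version INTEGER NOT NULL
);
"""


def upsert_title(conn: sqlite3.Connection, imdb_id: str,
                 payload: str, content_hash: str) -> str:
    """Insert or update a title, archiving the previous version first."""
    row = conn.execute(
        "SELECT payload, content_hash, version FROM titles WHERE imdb_id = ?",
        (imdb_id,),
    ).fetchone()
    with conn:  # one transaction: archive and update succeed or fail together
        if row is None:
            conn.execute(
                "INSERT INTO titles (imdb_id, payload, content_hash) "
                "VALUES (?, ?, ?)",
                (imdb_id, payload, content_hash),
            )
            return "created"
        old_payload, old_hash, version = row
        if old_hash == content_hash:
            return "unchanged"  # identical content: no new history row
        conn.execute(
            "INSERT INTO title_history (imdb_id, payload, content_hash, version) "
            "VALUES (?, ?, ?, ?)",
            (imdb_id, old_payload, old_hash, version),
        )
        conn.execute(
            "UPDATE titles SET payload = ?, content_hash = ?, version = ? "
            "WHERE imdb_id = ?",
            (payload, content_hash, version + 1, imdb_id),
        )
        return "updated"
```

Running the archive and the update in one transaction means a crash between the two statements cannot lose the previous version.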
get-imdb-json/
├── app.py                        # Entry point
├── README.md                     # This file
├── requirements.txt              # Python dependencies
│
├── config/
│   ├── settings.py               # Config loader (INI + env vars)
│   └── default.ini               # Default configuration
│
├── gui/
│   ├── main_window.py            # Main Tkinter window (dark theme)
│   ├── dialogs.py                # Bootstrap, export, detail dialogs
│   └── status_panel.py           # Status bar + log viewer widget
│
├── services/
│   ├── bootstrap_service.py      # Startup health checks
│   ├── download_service.py       # Download orchestration pipeline
│   ├── export_service.py         # JSON/CSV export
│   ├── validation_service.py     # Input validation & classification
│   └── history_service.py        # Record versioning / snapshots
│
├── db/
│   ├── engine.py                 # SQLAlchemy engine + session
│   └── models.py                 # ORM models (Actor, Title, etc.)
│
├── storage/
│   ├── folder_manager.py         # Directory hierarchy management
│   ├── json_writer.py            # Atomic JSON read/write
│   └── snapshot_manager.py       # Content-hash deduplication
│
├── integrations/
│   └── pymoviedb_adapter.py      # PyMovieDb wrapper (safe API)
│
├── utils/
│   ├── logger.py                 # Logging setup + Tkinter handler
│   ├── hashing.py                # SHA-256 utilities
│   ├── paths.py                  # Path constants & helpers
│   └── errors.py                 # Custom exception hierarchy
│
├── logs/                         # Runtime logs (gitignored)
├── data/                         # Downloaded data (gitignored)
└── tests/                        # Future test suite
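The "Atomic JSON read/write" role listed for storage/json_writer.py is commonly implemented with a temporary file followed by a rename. The sketch below shows that general technique; the function names are hypothetical and the project's implementation may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must be in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name,
                                    suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)     # atomic swap on POSIX and Windows
    except BaseException:
        # Remove the partial temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def read_json(path: Path) -> dict:
    """Load a JSON document written by write_json_atomic."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

If the process dies mid-write, the previous data.json is left untouched and only a stray temporary file remains.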
[database]
url = sqlite:///data/imdb_data.db
[storage]
data_dir = data
log_dir = logs
[app]
log_level = INFO
window_title = IMDb Data Downloader & Manager
request_timeout = 30
| Variable | Config Key | Example |
|---|---|---|
| IMDB_APP_DB_URL | database.url | mysql+pymysql://user:pass@localhost/imdb |
| IMDB_APP_DATA_DIR | storage.data_dir | D:\imdb_data |
| IMDB_APP_LOG_DIR | storage.log_dir | D:\imdb_logs |
| IMDB_APP_LOG_LEVEL | app.log_level | DEBUG |
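One way to combine the INI file with the environment variables in the table is sketched below. It assumes that an environment variable, when set, takes precedence over the INI value; the `load_settings` function and `ENV_OVERRIDES` mapping are hypothetical names and not the project's config/settings.py.

```python
import configparser
import os

# Environment variable -> (section, key), mirroring the table above.
ENV_OVERRIDES = {
    "IMDB_APP_DB_URL": ("database", "url"),
    "IMDB_APP_DATA_DIR": ("storage", "data_dir"),
    "IMDB_APP_LOG_DIR": ("storage", "log_dir"),
    "IMDB_APP_LOG_LEVEL": ("app", "log_level"),
}


def load_settings(ini_text: str, environ=None) -> configparser.ConfigParser:
    """Parse INI text, then apply any environment overrides that are set."""
    environ = os.environ if environ is None else environ
    # interpolation=None keeps '%' characters in URLs and paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for var, (section, key) in ENV_OVERRIDES.items():
        value = environ.get(var)
        if value:
            if not parser.has_section(section):
                parser.add_section(section)
            parser.set(section, key, value)
    return parser
```

Passing `environ` explicitly makes the override logic easy to exercise without touching the real process environment.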
The SQLAlchemy models work with any supported backend. To switch:
- Install the MySQL driver:
  pip install pymysql
- Update config/default.ini:
  [database]
  url = mysql+pymysql://user:password@localhost:3306/imdb_db
- Create the database:
  CREATE DATABASE imdb_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
- Restart the app. Tables are created automatically.
This project uses PyMovieDb as the primary IMDb data acquisition layer. PyMovieDb provides:
- get_by_id() / get_by_name(): Movie/TV-series lookup
- person_by_id() / person_by_name(): Person/celebrity lookup
- search(): Search with filters (year, TV, person)
- popular_movies() / popular_tv(): Browse popular content
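A thin wrapper around these calls, in the spirit of integrations/pymoviedb_adapter.py, might look like the sketch below. It assumes the client returns JSON strings and signals a miss with a payload whose `status` is 404; both are assumptions about PyMovieDb's behavior, and `SafeAdapter` and `AdapterError` are hypothetical names.

```python
import json


class AdapterError(Exception):
    """Hypothetical error type; the project's hierarchy is in utils/errors.py."""


class SafeAdapter:
    """Wrap a client exposing get_by_id() and person_by_id()."""

    def __init__(self, client):
        # The client is injected so it can be replaced by a fake in tests.
        self._client = client

    def title_by_id(self, imdb_id: str) -> dict:
        return self._call(self._client.get_by_id, imdb_id)

    def person_by_id(self, imdb_id: str) -> dict:
        return self._call(self._client.person_by_id, imdb_id)

    def _call(self, func, arg: str) -> dict:
        try:
            raw = func(arg)
        except Exception as exc:  # network or parsing failure inside the library
            raise AdapterError(f"lookup failed for {arg!r}: {exc}") from exc
        try:
            data = json.loads(raw) if isinstance(raw, str) else raw
        except json.JSONDecodeError as exc:
            raise AdapterError(f"invalid JSON for {arg!r}") from exc
        # Assumed "not found" convention: a payload with status 404.
        if not isinstance(data, dict) or data.get("status") == 404:
            raise AdapterError(f"no result for {arg!r}")
        return data
```

With the real library installed, the client would presumably be created with `from PyMovieDb import IMDB` and passed in as `SafeAdapter(IMDB())`. Callers then only need to handle one exception type.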
The imdb-scraper project was used as an architectural reference for database storage patterns and scraping workflows, but is not a runtime dependency (no license detected).
| Feature | PyMovieDb | imdb-scraper | This Project |
|---|---|---|---|
| IMDb data scraping | ✅ | ✅ | Via PyMovieDb |
| Desktop GUI | ❌ | ❌ | ✅ Tkinter |
| Record versioning | ❌ | ❌ | ✅ Full history |
| Content deduplication | ❌ | ❌ | ✅ SHA-256 |
| Folder-based archive | ❌ | ❌ | ✅ Timestamped |
| JSON/CSV export | ❌ | ❌ | ✅ Multiple formats |
| Startup health checks | ❌ | ❌ | ✅ Bootstrap |
| Embedded log viewer | ❌ | ❌ | ✅ Live GUI panel |
| Error handling | Basic | Basic | ✅ Full hierarchy |
| Database versioning | ❌ | ❌ | ✅ History table |
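If the bootstrap check reports that PyMovieDb is missing, install it manually:
pip install PyMovieDb
If the database is reported as locked, ensure only one instance of the app is running. The app uses WAL mode for better concurrency.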
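For reference, enabling WAL mode on a SQLite file looks roughly like this with the standard-library `sqlite3` module. The project configures this through SQLAlchemy, so the `open_database` and `journal_mode` functions and the 30-second timeout are illustrative assumptions.

```python
import os
import sqlite3
import tempfile


def open_database(path: str, busy_timeout_s: float = 30.0) -> sqlite3.Connection:
    """Open a SQLite file and switch it to write-ahead logging."""
    # `timeout` makes a connection wait for a lock instead of failing at once.
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    # WAL lets readers continue while a single writer appends to the log.
    # The mode is stored in the database file, so it persists across runs.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def journal_mode(conn: sqlite3.Connection) -> str:
    """Return the journal mode currently in effect for this connection."""
    return conn.execute("PRAGMA journal_mode").fetchone()[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_database(os.path.join(tmp, "imdb_data.db"))
        print(journal_mode(conn))
        conn.close()
```

WAL still allows only one writer at a time, which is why a second running instance can produce lock errors.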
- IMDb may be blocking scraping requests. Try again after a few minutes.
- Check the log panel for specific error details.
- Verify your internet connection.
Ensure your Python installation includes Tkinter (usually bundled on Windows and macOS).
Run the app from a directory where you have write permissions, or set custom paths:
set IMDB_APP_DATA_DIR=D:\my_imdb_data
set IMDB_APP_LOG_DIR=D:\my_imdb_logs
python app.py
This project is licensed under the MIT License.
- PyMovieDb: MIT License
- SQLAlchemy: MIT License
- imdb-scraper: No license detected (used as reference only, not bundled)
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- MariaDB/MySQL as configurable backend
- Auto-clone dependency repos from GitHub on first launch
- Retry queue for failed downloads with exponential back-off
- Actor/title history browser panel in GUI
- Batch download from a list of IDs
- SQL dump export
- Alembic database migrations
- Settings/preferences dialog
- Response caching (in-memory LRU)
- Packaging as standalone executable (PyInstaller)