filipetorresdecarvalho/get-imdb-json

🎬 IMDb Data Downloader & Manager

A production-grade Python desktop application for downloading, organizing, storing, versioning, and exporting IMDb-related data. Built with a clean architecture, robust error handling, and a polished dark-themed Tkinter interface.

Python 3.12+ | License: MIT | SQLAlchemy 2.0 | PyMovieDb


✨ Features

Core Functionality

  • Search & Download: Enter an IMDb ID (e.g. tt0111161), a title name, or a person/actor name
  • Automatic Input Detection: Auto-classifies input as a title ID, a person ID, or a search query
  • Rich Data Display: View downloaded metadata in a sortable treeview; double-click a row for details

Data Management

  • SQLite Database: All records persisted via SQLAlchemy ORM (ready for MariaDB/MySQL swap)
  • Record Versioning: Every update archives the previous version to title_history, so no data is ever lost
  • Duplicate Detection: SHA-256 content hashing prevents redundant snapshots
  • Folder-Based Storage: Organized as data/{type}/{imdb_id}/{timestamp}/data.json
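
The hashing and folder layout listed above could be combined roughly as follows. This is a minimal sketch, not the project's actual snapshot_manager.py: the function names, the timestamp format, and the choice to hash a canonical (sorted-key) JSON encoding are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def content_hash(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_snapshot(root: Path, kind: str, imdb_id: str, record: dict) -> Optional[Path]:
    """Write root/kind/imdb_id/timestamp/data.json unless the content is unchanged."""
    item_dir = root / kind / imdb_id
    # Timestamped folder names sort chronologically, so the last one is the newest.
    existing = sorted(item_dir.glob("*/data.json")) if item_dir.exists() else []
    if existing:
        latest = json.loads(existing[-1].read_text(encoding="utf-8"))
        if content_hash(latest) == content_hash(record):
            return None  # identical to the newest snapshot, nothing to write
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")  # assumed format
    target = item_dir / stamp / "data.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target
```

Hashing a sorted-key encoding means two downloads that differ only in key order count as the same content.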

Export & Backup

  • JSON Export: Full record export with parsed data
  • CSV Export: Flat tabular export for spreadsheet analysis
  • Offline Archive: Complete folder structure for reliable offline backup and restoration

Developer Experience

  • Embedded Log Viewer: Live log panel inside the GUI with color-coded levels
  • File Logging: Rotating log files with timestamps and stack traces
  • Startup Health Checks: Bootstrap verifies PyMovieDb, directories, and database on launch
  • Dark Theme: Professional Catppuccin Mocha color scheme
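
One common way to feed both a rotating log file and a GUI log panel is to attach two handlers to the same logger. The sketch below assumes a queue-based handler that the GUI thread drains; the class name, logger name, file name, and size limits are hypothetical and not taken from the project's utils/logger.py.

```python
import logging
import queue
from logging.handlers import RotatingFileHandler
from pathlib import Path


class QueueLogHandler(logging.Handler):
    """Push (level, formatted line) pairs onto a queue the GUI thread can poll."""

    def __init__(self, sink: queue.Queue) -> None:
        super().__init__()
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        # The level name lets the GUI choose a color tag for the line.
        self.sink.put((record.levelname, self.format(record)))


def setup_logging(log_dir: Path, sink: queue.Queue, level: str = "INFO") -> logging.Logger:
    """Attach a rotating file handler and a queue handler to one logger."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("imdb_app")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated setup
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    file_handler = RotatingFileHandler(
        log_dir / "app.log", maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    for handler in (file_handler, QueueLogHandler(sink)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

In a Tkinter app the queue would typically be drained from a callback scheduled with the widget's after() method, so that widgets are only touched from the GUI thread.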

📋 Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.12+ | Tested on 3.12, 3.13, 3.14 |
| pip | Latest | Package installer |
| Tkinter | Built-in | Included with standard Python install |

Note: No external database server needed for the MVP. SQLite is included with Python.


🚀 Setup

1. Clone the Repository

git clone https://github.com/filipetorresdecarvalho/get-imdb-json.git
cd get-imdb-json

2. Create a Virtual Environment (Recommended)

python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

This installs:

  • PyMovieDb: IMDb data acquisition library (scrapes IMDb.com)
  • SQLAlchemy: ORM for database operations

4. Run the Application

python app.py

The app will:

  1. Load configuration from config/default.ini
  2. Run bootstrap health checks
  3. Create data/ and logs/ directories
  4. Initialize the SQLite database
  5. Open the GUI window
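
The health-check step in the startup sequence above might look something like the following. It is a sketch under assumptions: the function name and result keys are invented here, and the database check uses the standard-library sqlite3 module rather than the project's SQLAlchemy engine.

```python
import importlib.util
import sqlite3
from pathlib import Path


def run_health_checks(data_dir: Path, log_dir: Path, db_path: Path) -> dict:
    """Return a name -> passed mapping for each startup check."""
    results = {}
    # find_spec reports whether the package is importable without importing it.
    results["pymoviedb"] = importlib.util.find_spec("PyMovieDb") is not None
    for name, folder in (("data_dir", data_dir), ("log_dir", log_dir)):
        try:
            folder.mkdir(parents=True, exist_ok=True)
            results[name] = True
        except OSError:
            results[name] = False
    try:
        db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("SELECT 1")  # proves the file can be opened and queried
        finally:
            conn.close()
        results["database"] = True
    except (OSError, sqlite3.Error):
        results["database"] = False
    return results
```

Returning a mapping instead of raising on the first failure lets a startup dialog show every failed check at once.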

💻 Usage

Searching by IMDb ID

Enter a title ID (e.g. tt0111161) or person ID (e.g. nm0000151) and click Download.

Searching by Name

Enter a movie title (e.g. The Shawshank Redemption) or a person's name (e.g. Morgan Freeman). If auto-detection picks the wrong type, select the correct one from the dropdown.
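
Auto-detection of this kind can be sketched with two regular expressions. The patterns below assume IMDb IDs are "tt" or "nm" followed by at least seven digits; the function name and return labels are illustrative and may differ from what validation_service.py actually does.

```python
import re

# Assumed ID shapes: "tt" (title) or "nm" (person) followed by 7 or more digits.
TITLE_ID = re.compile(r"tt\d{7,}")
PERSON_ID = re.compile(r"nm\d{7,}")


def classify_input(text: str) -> str:
    """Return 'title_id', 'person_id', or 'query' for a raw search string."""
    value = text.strip().lower()
    if TITLE_ID.fullmatch(value):
        return "title_id"
    if PERSON_ID.fullmatch(value):
        return "person_id"
    # Anything else is treated as a free-text name search.
    return "query"
```

A free-text query cannot be classified as a title or a person by its shape alone, which is why a manual type dropdown is still useful.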

Viewing Details

Double-click any row in the results table to view the full JSON data.

Exporting Data

Click Export to choose format (JSON/CSV), scope (Titles/Persons/Everything), and destination.
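
As a rough illustration of the two export formats, the sketch below writes a list of record dicts to JSON and to a flat CSV. The function names, the column selection, and the "; " joining of list values are assumptions, not the behavior of the project's export_service.py.

```python
import csv
import json
from pathlib import Path


def export_json(records: list, path: Path) -> None:
    """Write the full records, nested data included."""
    path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def export_csv(records: list, path: Path, columns: list) -> None:
    """Flatten each record to the chosen columns; list values become '; '-joined text."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for record in records:
            row = {}
            for col in columns:
                value = record.get(col, "")
                if isinstance(value, list):
                    value = "; ".join(map(str, value))
                row[col] = value
            writer.writerow(row)
```

For example, export_csv(rows, Path("titles.csv"), ["imdb_id", "name", "genre"]) keeps only those three fields, while export_json preserves everything.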


πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Tkinter GUI                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Search Input β”‚  β”‚ Results  β”‚  β”‚  Log Panel    β”‚  β”‚
β”‚  β”‚ + Type Combo β”‚  β”‚ Treeview β”‚  β”‚  (live logs)  β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚              β”‚
   β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚       Services Layer           β”‚
   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
   β”‚  β”‚   Download Service      β”‚   β”‚
   β”‚  β”‚  (orchestration core)   β”‚   β”‚
   β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
   β”‚  β”‚ Validation  β”‚  β”‚History β”‚  β”‚
   β”‚  β”‚  Service    β”‚  β”‚Service β”‚  β”‚
   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”  β”‚
   β”‚  β”‚Export Service β”‚  β”‚ Boot  β”‚  β”‚
   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚strap β”‚  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜β”€β”€β”˜
          β”‚              β”‚
   β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ Integration β”‚  β”‚    Storage      β”‚
   β”‚  (Adapter)  β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
   β”‚  PyMovieDb  β”‚  β”‚  β”‚ Folder    β”‚ β”‚
   β”‚             β”‚  β”‚  β”‚ Manager   β”‚ β”‚
   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚
          β”‚         β”‚  β”‚JSON Writerβ”‚ β”‚
   β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”  β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚
   β”‚  IMDb.com   β”‚  β”‚  β”‚ Snapshot  β”‚ β”‚
   β”‚  (external) β”‚  β”‚  β”‚ Manager   β”‚ β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
   β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚      Database (SQLite)      β”‚
   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
   β”‚  β”‚Actors β”‚  β”‚  Titles    β”‚  β”‚
   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”˜  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
   β”‚             β”‚TitleDetail β”‚  β”‚
   β”‚             β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
   β”‚             β”‚TitleHistoryβ”‚  β”‚
   β”‚             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
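
To make the Titles / TitleHistory relationship concrete, here is a small sketch of archive-on-update. It deliberately uses the standard-library sqlite3 module instead of the project's SQLAlchemy models so that it runs without extra dependencies; the table and column names are simplified assumptions.

```python
import json
import sqlite3

# Simplified, hypothetical schema: one current row per title plus a history table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS titles (
    imdb_id TEXT PRIMARY KEY,
    payload TEXT NOT NULL,
    version INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS title_history (
    imdb_id TEXT NOT NULL,
    payload TEXT NOT NULL,
    version INTEGER NOT NULL
);
"""


def upsert_title(conn: sqlite3.Connection, imdb_id: str, data: dict) -> int:
    """Store data as the current version, archiving any previous row first."""
    payload = json.dumps(data, sort_keys=True)
    with conn:  # one transaction: archive and update succeed or fail together
        row = conn.execute(
            "SELECT payload, version FROM titles WHERE imdb_id = ?", (imdb_id,)
        ).fetchone()
        if row is None:
            conn.execute("INSERT INTO titles VALUES (?, ?, 1)", (imdb_id, payload))
            return 1
        old_payload, version = row
        if old_payload == payload:
            return version  # unchanged, so no new history entry
        conn.execute(
            "INSERT INTO title_history VALUES (?, ?, ?)",
            (imdb_id, old_payload, version),
        )
        conn.execute(
            "UPDATE titles SET payload = ?, version = ? WHERE imdb_id = ?",
            (payload, version + 1, imdb_id),
        )
        return version + 1
```

Because the old row is copied before it is overwritten, every earlier version stays queryable from the history table.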

📁 Folder Structure

get-imdb-json/
├── app.py                          # Entry point
├── README.md                       # This file
├── requirements.txt                # Python dependencies
│
├── config/
│   ├── settings.py                 # Config loader (INI + env vars)
│   └── default.ini                 # Default configuration
│
├── gui/
│   ├── main_window.py              # Main Tkinter window (dark theme)
│   ├── dialogs.py                  # Bootstrap, export, detail dialogs
│   └── status_panel.py             # Status bar + log viewer widget
│
├── services/
│   ├── bootstrap_service.py        # Startup health checks
│   ├── download_service.py         # Download orchestration pipeline
│   ├── export_service.py           # JSON/CSV export
│   ├── validation_service.py       # Input validation & classification
│   └── history_service.py          # Record versioning / snapshots
│
├── db/
│   ├── engine.py                   # SQLAlchemy engine + session
│   └── models.py                   # ORM models (Actor, Title, etc.)
│
├── storage/
│   ├── folder_manager.py           # Directory hierarchy management
│   ├── json_writer.py              # Atomic JSON read/write
│   └── snapshot_manager.py         # Content-hash deduplication
│
├── integrations/
│   └── pymoviedb_adapter.py        # PyMovieDb wrapper (safe API)
│
├── utils/
│   ├── logger.py                   # Logging setup + Tkinter handler
│   ├── hashing.py                  # SHA-256 utilities
│   ├── paths.py                    # Path constants & helpers
│   └── errors.py                   # Custom exception hierarchy
│
├── logs/                           # Runtime logs (gitignored)
├── data/                           # Downloaded data (gitignored)
└── tests/                          # Future test suite
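
The tree describes json_writer.py as handling atomic JSON writes. A usual way to get that property is to write to a temporary file in the same directory and then rename it over the target; the sketch below shows that pattern with a hypothetical function name and is not the project's actual code.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write to a temp file in the same folder, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the partial temp file, then let the error propagate.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

A reader therefore sees either the old file or the complete new one, never a half-written data.json.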

⚙️ Configuration

config/default.ini

[database]
url = sqlite:///data/imdb_data.db

[storage]
data_dir = data
log_dir = logs

[app]
log_level = INFO
window_title = IMDb Data Downloader & Manager
request_timeout = 30

Environment Variable Overrides

| Variable | Config Key | Example |
|---|---|---|
| IMDB_APP_DB_URL | database.url | mysql+pymysql://user:pass@localhost/imdb |
| IMDB_APP_DATA_DIR | storage.data_dir | D:\imdb_data |
| IMDB_APP_LOG_DIR | storage.log_dir | D:\imdb_logs |
| IMDB_APP_LOG_LEVEL | app.log_level | DEBUG |
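
The override behavior could be implemented along these lines with configparser. This is a sketch, not the project's config/settings.py: the function name and the rule that an empty variable is ignored are assumptions. Interpolation is disabled so that a literal "%" in a database URL does not raise an error.

```python
import configparser
import os

# Environment variable -> (section, key), mirroring the table above.
ENV_OVERRIDES = {
    "IMDB_APP_DB_URL": ("database", "url"),
    "IMDB_APP_DATA_DIR": ("storage", "data_dir"),
    "IMDB_APP_LOG_DIR": ("storage", "log_dir"),
    "IMDB_APP_LOG_LEVEL": ("app", "log_level"),
}


def load_settings(ini_text: str, environ=None) -> configparser.ConfigParser:
    """Parse INI text, then let any non-empty environment variable replace its key."""
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for var, (section, key) in ENV_OVERRIDES.items():
        value = environ.get(var)
        if value:
            if not parser.has_section(section):
                parser.add_section(section)
            parser.set(section, key, value)
    return parser
```

Passing the environment as a parameter keeps the loader easy to test without modifying the real process environment.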

🔀 Switching to MariaDB/MySQL

The SQLAlchemy models work with any supported backend. To switch:

  1. Install the MySQL driver:

    pip install pymysql

  2. Update config/default.ini:

    [database]
    url = mysql+pymysql://user:password@localhost:3306/imdb_db

  3. Create the database:

    CREATE DATABASE imdb_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  4. Restart the app. Tables are created automatically.


📦 Dependency Strategy

This project uses PyMovieDb as the primary IMDb data acquisition layer. PyMovieDb provides:

  • get_by_id() / get_by_name(): Movie/TV-series lookup
  • person_by_id() / person_by_name(): Person/celebrity lookup
  • search(): Search with filters (year, TV, person)
  • popular_movies() / popular_tv(): Browse popular content
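
A thin adapter around these calls might look like the sketch below. It assumes the library's lookup methods return JSON strings, and it takes the client as a constructor argument so that it can be exercised with a stub. The class, method, and exception names are hypothetical and do not come from the project's pymoviedb_adapter.py.

```python
import json


class AdapterError(Exception):
    """Raised when a lookup fails or returns an unusable payload."""


class PyMovieDbAdapter:
    """Turn the client's JSON-string responses into dicts, with uniform errors."""

    def __init__(self, client) -> None:
        # `client` is expected to be a PyMovieDb IMDB() instance; it is injected
        # so the wrapper can run against a stub and without network access.
        self._client = client

    def _call(self, method: str, *args) -> dict:
        try:
            raw = getattr(self._client, method)(*args)
            data = json.loads(raw) if isinstance(raw, str) else raw
        except Exception as exc:  # scraping can fail in many different ways
            raise AdapterError(f"{method}{args} failed: {exc}") from exc
        if not isinstance(data, dict) or not data:
            raise AdapterError(f"{method}{args} returned no data")
        return data

    def title_by_id(self, imdb_id: str) -> dict:
        return self._call("get_by_id", imdb_id)

    def person_by_id(self, imdb_id: str) -> dict:
        return self._call("person_by_id", imdb_id)
```

With the real library installed, the adapter would be built as PyMovieDbAdapter(IMDB()) after from PyMovieDb import IMDB. Funneling every failure into one exception type gives the calling service a single error to catch and log.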

The imdb-scraper project served only as an architectural reference for database storage patterns and scraping workflows. It is not a runtime dependency and none of its code is bundled, because no license was detected for it.

What This Project Adds (Beyond the Dependencies)

| Feature | PyMovieDb | imdb-scraper | This Project |
|---|---|---|---|
| IMDb data scraping | ✅ | ✅ | Via PyMovieDb |
| Desktop GUI | ❌ | ❌ | ✅ Tkinter |
| Record versioning | ❌ | ❌ | ✅ Full history |
| Content deduplication | ❌ | ❌ | ✅ SHA-256 |
| Folder-based archive | ❌ | ❌ | ✅ Timestamped |
| JSON/CSV export | ❌ | ❌ | ✅ Multiple formats |
| Startup health checks | ❌ | ❌ | ✅ Bootstrap |
| Embedded log viewer | ❌ | ❌ | ✅ Live GUI panel |
| Error handling | Basic | Basic | ✅ Full hierarchy |
| Database versioning | ❌ | ❌ | ✅ History table |

🔧 Troubleshooting

PyMovieDb not found

pip install PyMovieDb

SQLite "database is locked"

Ensure only one instance of the app is running. The app uses WAL mode for better concurrency.
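
For reference, WAL mode and a busy timeout can be enabled on a plain sqlite3 connection as sketched below. The function name and timeout value are assumptions, and the app itself goes through SQLAlchemy, where the same pragma would normally be issued from a connection event hook.

```python
import sqlite3


def open_wal_connection(db_path: str, busy_timeout_s: float = 10.0) -> sqlite3.Connection:
    """Open SQLite with write-ahead logging and a wait-before-failing timeout."""
    # `timeout` makes a blocked writer wait instead of failing immediately
    # with "database is locked".
    conn = sqlite3.connect(db_path, timeout=busy_timeout_s)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"WAL not enabled, journal_mode is {mode!r}")
    return conn
```

WAL lets readers proceed while a single writer is active, but two writers still serialize, which is why a second running instance can still trigger the lock error.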

No data returned for a valid ID

  • IMDb may be blocking scraping requests. Try again after a few minutes.
  • Check the log panel for specific error details.
  • Verify your internet connection.

GUI looks wrong / fonts not rendering

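Ensure your Python installation includes Tkinter. It is usually bundled on Windows and macOS; on Debian/Ubuntu it is typically a separate package (python3-tk). Running python -m tkinter opens a small test window if Tkinter is available.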

Permission errors on data/ or logs/

Run the app from a directory where you have write permissions, or set custom paths:

# Windows (Command Prompt); on macOS/Linux use export VAR=/path instead of set
set IMDB_APP_DATA_DIR=D:\my_imdb_data
set IMDB_APP_LOG_DIR=D:\my_imdb_logs
python app.py

📄 License

This project is licensed under the MIT License.

Dependency Licenses

  • PyMovieDb: MIT License
  • SQLAlchemy: MIT License
  • imdb-scraper: No license detected (used as reference only, not bundled)

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🗺️ Roadmap (v2)

  • MariaDB/MySQL as configurable backend
  • Auto-clone dependency repos from GitHub on first launch
  • Retry queue for failed downloads with exponential back-off
  • Actor/title history browser panel in GUI
  • Batch download from a list of IDs
  • SQL dump export
  • Alembic database migrations
  • Settings/preferences dialog
  • Response caching (in-memory LRU)
  • Packaging as standalone executable (PyInstaller)
