📦 Add local database management utilities to support remote DB access directly in Python package #16
Open
jeremy-wayland wants to merge 15 commits into main from 15-download-db
…environments

Add comprehensive utilities module to enable downloading and running the US physician referral networks database locally. This addresses connectivity issues and firewall restrictions that prevent users from accessing the remote Datasette instance.

### New Features:
- `download_and_launch_local_datasette()`: Downloads SQLite database and starts local Datasette server
- `download_file()`: Robust file downloader with progress reporting and error handling
- `update_env_file()`: Automatic .env file management for switching between local/remote URLs
- `stop_local_datasette()`: Graceful termination of local Datasette processes by port or PID
- `list_datasette_processes()`: Process discovery and management utilities

### Technical Implementation:
- Comprehensive error handling for missing dependencies (datasette, psutil)
- Configurable Datasette settings (SQL timeout, max rows, CSV streaming)
- Process lifecycle management with graceful shutdown and force-kill fallback
- Progress reporting for large file downloads (~8GB database)
- Cross-platform compatibility with proper subprocess management
- Verbose logging mode for debugging and monitoring

### Testing:
- Complete test suite with 17 test cases covering all functionality
- Mock-based testing for external dependencies (requests, subprocess, psutil)
- Edge case handling (timeouts, missing processes, access denied scenarios)
- Temporary directory isolation for filesystem operations
- Process management testing including graceful and forced termination

### Integration:
- Seamless integration with existing Apparent workflow
- Automatic fallback between remote and local database URLs
- Environment variable management for easy configuration switching
- Compatible with existing data pulling and network analysis functionality

Resolves #15 - Download DB functionality for local development and restricted environments
emsimons reviewed Sep 24, 2025
…ovide alternative launch instructions for Datasette
Summary
This PR addresses Issue #15 by implementing comprehensive local database management utilities in the Python package, enabling users to programmatically download the remote SQL database and launch a local Datasette instance for offline access or environments with restricted connectivity.
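The download step can be sketched roughly as follows. This is a minimal illustration and not the code in `apparent/utils.py`: it uses the standard library's `urllib` instead of `requests`, and the name `download_with_progress`, its parameters, and the `.part` temporary-file convention are all assumptions made for the example.

```python
import urllib.request
from pathlib import Path


def download_with_progress(url, dest, chunk_size=1024 * 1024, report=print):
    """Stream `url` to `dest` in chunks and report progress after each chunk.

    Hypothetical helper, not the PR's download_file(). Returns bytes written.
    """
    dest = Path(dest)
    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete database file.
    tmp = dest.with_name(dest.name + ".part")
    written = 0
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        # Content-Length may be absent; fall back to a plain byte count.
        total = int(response.headers.get("Content-Length") or 0)
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
            if total:
                report(f"{written / total:6.1%} ({written} / {total} bytes)")
            else:
                report(f"{written} bytes")
    tmp.replace(dest)  # rename into place only after the stream finished
    return written
```

Streaming in fixed-size chunks keeps memory use flat regardless of file size, which matters for a database of roughly 8GB.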
🚀 Main Features Implemented

Core Functions Added to `apparent/utils.py`:

- `download_and_launch_local_datasette()` - The primary function, which downloads the SQLite database (https://apparent.topology.rocks/us_physician_referral_networks.db) and starts a local Datasette server
- `stop_local_datasette()` - Process management utility that terminates local Datasette processes by port or PID, using `psutil` for robust cross-platform process management

Supporting utilities:

- `download_file()` - Robust file downloader with progress reporting for large files (~8GB database)
- `update_env_file()` - Automatic .env management for seamless local/remote URL switching
- `list_datasette_processes()` - Process discovery and monitoring utilities

🔧 Technical Implementation Highlights

- Error handling for missing dependencies (`datasette`, `psutil`)

📊 Why This Matters
💡 Usage Examples
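The exact signatures of the new utilities are not shown in this description, so the following is only a sketch of the configuration side of the workflow: building a Datasette launch command with the configurable settings (SQL timeout, max rows, CSV streaming) and pointing a `.env` file at the local server. The function names, default values, and the `DATASETTE_URL` variable name are assumptions for illustration. The `--setting` options used (`sql_time_limit_ms`, `max_returned_rows`, `allow_csv_stream`) are existing Datasette settings.

```python
import tempfile
from pathlib import Path


def build_datasette_command(db_path, port=8001, sql_time_limit_ms=10000,
                            max_returned_rows=5000, allow_csv_stream=True):
    """Return an argv list for `datasette serve` (hypothetical helper)."""
    return [
        "datasette", "serve", str(db_path),
        "--port", str(port),
        "--setting", "sql_time_limit_ms", str(sql_time_limit_ms),
        "--setting", "max_returned_rows", str(max_returned_rows),
        "--setting", "allow_csv_stream", "on" if allow_csv_stream else "off",
    ]


def set_env_value(env_path, key, value):
    """Set KEY=value in a .env file, replacing an existing entry or appending."""
    env_path = Path(env_path)
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the name left of the first "=" so values may contain "=".
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    env_path.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    cmd = build_datasette_command("us_physician_referral_networks.db")
    print(" ".join(cmd))
    # Demonstrate the .env switch in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env"
        set_env_value(env_file, "DATASETTE_URL", "http://127.0.0.1:8001")
        print(env_file.read_text(), end="")
```

Returning the command as a list, not a shell string, lets it be passed straight to `subprocess.Popen` without shell quoting concerns.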
🧪 Testing
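As an illustration of the mock-based approach to the graceful and forced termination paths, here is a small sketch. `stop_process` is a hypothetical stand-in written against `subprocess.Popen`, not the PR's psutil-based `stop_local_datasette()`, and the two tests are examples in the same spirit as the suite and are not copied from `tests/test_utils.py`.

```python
import subprocess
from unittest import mock


def stop_process(process, timeout=5.0):
    """Ask `process` to exit; force-kill it if still alive after `timeout`."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Graceful shutdown did not finish in time: fall back to a hard kill.
        process.kill()
        process.wait()
        return "killed"


def test_graceful_stop():
    # A mock process whose wait() returns immediately never needs kill().
    proc = mock.Mock(spec=subprocess.Popen)
    assert stop_process(proc) == "terminated"
    proc.terminate.assert_called_once()
    proc.kill.assert_not_called()


def test_forced_stop():
    # First wait() times out, second wait() (after kill) succeeds.
    proc = mock.Mock(spec=subprocess.Popen)
    proc.wait.side_effect = [
        subprocess.TimeoutExpired(cmd="datasette", timeout=5),
        0,
    ]
    assert stop_process(proc) == "killed"
    proc.terminate.assert_called_once()
    proc.kill.assert_called_once()
```

Mocking the process object means neither test launches Datasette or waits on a real timeout, so both paths run in milliseconds.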
📈 Impact
This implementation transforms the existing bash-script-only database access (from `tests/run-integration-tests.sh`) into a first-class Python API, making the functionality accessible to:

Files Changed

- `apparent/utils.py` (551 lines added) - New utilities module with all core functionality
- `tests/test_utils.py` (419 lines added) - Comprehensive test suite

Total: 970+ lines of new functionality
Closes #15 - Successfully implements the requested `download_and_launch_local_db()` functionality (implemented as `download_and_launch_local_datasette()` for clarity) along with comprehensive database management utilities.