A robust Python scraper for collecting insider trading data from openinsider.com.
- Multi-threaded data collection for high performance
- Intelligent caching system to minimize server load
- Configurable filters for transaction types and values
- Flexible data export in CSV and Parquet formats
- Comprehensive logging and error handling
- Automatic retry mechanism for failed requests
- Progress tracking with a progress bar
- Docker support for easy deployment
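The multi-threaded collection and automatic retry features listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the scraper's actual code: `fetch_with_retry`, `fetch_all`, and `backoff_seconds` are hypothetical names, and `fetch` stands in for whatever function downloads one page. Only `max_workers` and `retry_attempts` correspond to settings this README documents.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(fetch, url, retry_attempts=3, backoff_seconds=0.5):
    """Call fetch(url), retrying on any exception up to retry_attempts times.

    Hypothetical helper: `fetch` is any callable that takes a URL and
    returns the downloaded content.
    """
    if retry_attempts < 1:
        raise ValueError("retry_attempts must be at least 1")
    last_error = None
    for attempt in range(1, retry_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
            if attempt < retry_attempts:
                # Linear backoff: wait a little longer after each failure.
                time.sleep(backoff_seconds * attempt)
    raise last_error


def fetch_all(fetch, urls, max_workers=10, retry_attempts=3, backoff_seconds=0.5):
    """Fetch every URL on a thread pool; results keep the input order."""
    def task(url):
        return fetch_with_retry(fetch, url, retry_attempts, backoff_seconds)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, urls))
```

Lowering `max_workers` reduces the number of simultaneous requests, which is the usual first step when a server starts rate limiting.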
- Clone the repository:
git clone [email protected]:sd3v/openinsiderData.git
cd openinsiderData
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
All settings are managed through config.yaml:
output:
  directory: data  # Output directory for scraped data
  filename: insider  # Base filename for output files
  format: csv  # Output format (csv or parquet)
scraping:
  start_year: 2024  # Start year
  start_month: 3  # Start month
  max_workers: 10  # Number of parallel downloads
  retry_attempts: 3  # Number of retry attempts
  timeout: 30  # Request timeout in seconds
filters:
  min_transaction_value: 50000  # Minimum transaction value in USD
  transaction_types:  # Transaction types to include
    - P - Purchase
    - S - Sale
    - F - Tax
  exclude_companies: []  # Companies to exclude (by ticker)
  min_shares_traded: 100  # Minimum number of shares
logging:
  level: INFO  # Logging level (DEBUG, INFO, WARNING, ERROR)
  file: scraper.log  # Log file name
  rotate_logs: true  # Enable log rotation
  max_log_size: 10  # Max log size in MB
cache:
  enabled: true  # Enable caching
  directory: .cache  # Cache directory
  max_age: 24  # Cache max age in hours
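To make the filters section more concrete, here is a minimal sketch of how those four settings might be applied to a single trade. It is an assumption about the behavior, not the scraper's real logic: the `passes_filters` function and the trade keys (`ticker`, `trade_type`, `shares`, `value`) are hypothetical, and the `FILTERS` dict simply mirrors the values shown in config.yaml.

```python
# Mirrors the filters section of config.yaml as a plain dict.
FILTERS = {
    "min_transaction_value": 50000,
    "transaction_types": ["P - Purchase", "S - Sale", "F - Tax"],
    "exclude_companies": [],
    "min_shares_traded": 100,
}


def passes_filters(trade, filters):
    """Return True if a trade satisfies every configured filter.

    `trade` is a hypothetical dict with the keys ticker, trade_type,
    shares, and value; the real scraper's record layout may differ.
    """
    if trade["ticker"] in filters["exclude_companies"]:
        return False
    if trade["trade_type"] not in filters["transaction_types"]:
        return False
    # Sales may be stored as negative numbers, so compare magnitudes.
    if abs(trade["shares"]) < filters["min_shares_traded"]:
        return False
    return abs(trade["value"]) >= filters["min_transaction_value"]


# Example records (made up for illustration).
purchase = {"ticker": "ABC", "trade_type": "P - Purchase", "shares": 1000, "value": 75000}
small_sale = {"ticker": "XYZ", "trade_type": "S - Sale", "shares": -200, "value": -10000}

kept = [t for t in (purchase, small_sale) if passes_filters(t, FILTERS)]
```

Here `kept` contains only the purchase, because the sale falls below `min_transaction_value`.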
Run the scraper:
python openinsider_scraper.py
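After a run, a quick sanity check of the CSV output can be done with the standard library. The sketch below makes two assumptions that this README does not confirm: that the default settings write to data/insider.csv, and that the file has a column named `trade_type`. The helper `count_by_column` is hypothetical; adjust the path and column name to match the actual output.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Count how often each value appears in one column of CSV text.

    `lines` is any iterable of CSV lines, such as an open text file.
    """
    reader = csv.DictReader(lines)
    if not reader.fieldnames or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in CSV header")
    return Counter(row[column] for row in reader)
```

Typical usage would be to open the file with `open("data/insider.csv", newline="", encoding="utf-8")` and pass the handle to `count_by_column(handle, "trade_type")`.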
Build the container:
docker build -t openinsider-scraper .
Run the container:
docker run -v $(pwd)/data:/app/data openinsider-scraper
Available transaction types:
- P - Purchase
- S - Sale
- F - Tax
- D - Disposition
- G - Gift
- X - Exercise
- M - Options Exercise
- C - Conversion
- W - Will/Inheritance
- H - Holdings
- O - Other
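The codes above can be kept in a lookup table so that entries in `transaction_types` are validated before a run. This is a sketch only: `TRANSACTION_TYPES` and `parse_transaction_label` are hypothetical names, and the scraper may handle these labels differently.

```python
# Code-to-description table copied from the list of available types.
TRANSACTION_TYPES = {
    "P": "Purchase",
    "S": "Sale",
    "F": "Tax",
    "D": "Disposition",
    "G": "Gift",
    "X": "Exercise",
    "M": "Options Exercise",
    "C": "Conversion",
    "W": "Will/Inheritance",
    "H": "Holdings",
    "O": "Other",
}


def parse_transaction_label(label):
    """Turn 'P - Purchase' (or just 'P') into ('P', 'Purchase').

    Raises ValueError when the code is not one of the known types.
    """
    code = label.partition(" - ")[0].strip().upper()
    if code not in TRANSACTION_TYPES:
        raise ValueError(f"unknown transaction type: {label!r}")
    return code, TRANSACTION_TYPES[code]
```

Validating labels this way surfaces a typo in config.yaml immediately instead of silently filtering out every trade.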
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
- If you encounter rate limiting, lower the max_workers setting
- For memory issues, try using Parquet format for large datasets
- Check the log file for detailed error messages
This tool is for educational purposes only. Ensure you comply with the website's terms of service and local regulations when scraping data.