A Python toolkit for interacting with Dataverse repositories
⚠️ Early Development: This project is in its early development stages. While functional, the API may change and some features are still being implemented. We welcome your feedback and contributions!
dartfx-dataverse is a Python package that facilitates programmatic interactions with Dataverse server installations via their API. The package focuses on discovery and access rather than content management, making it ideal for researchers, data scientists, and developers who need to search and retrieve data from Dataverse repositories.
- 🔍 Powerful Search: Advanced search capabilities with filtering, faceting, and geographic queries
- 🌍 Server Discovery: Retrieve information about known Dataverse installations worldwide
- 🛡️ Type-Safe: Built with Pydantic models for robust data validation
- ⚡ Performance: Built-in request caching for improved performance
- 🔧 Configurable: Flexible error handling, SSL verification, and session management
- 📚 Well-Documented: Comprehensive documentation with examples
- Retrieve server installation information and metadata
- Search datasets, dataverses, and files
- Advanced search with filters, facets, and geographic queries
- Paginated result handling
- Comprehensive error handling
- Request caching support
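To make the paginated result handling above concrete, here is a minimal sketch of walking through every page of a search response. It relies only on the response shape used in this README's search examples (`results['data']['total_count']` and `results['data']['items']`). The `fetch_page` callable, the `iter_search_items` helper, and the fake data are hypothetical names introduced for illustration; they are not part of the dartfx-dataverse API. How a start offset would be passed to `server.search` is left as an assumption.

```python
from typing import Any, Callable, Iterator

# Hypothetical type: any callable that fetches one page of results,
# given a start offset and a page size.
FetchPage = Callable[[int, int], dict[str, Any]]


def iter_search_items(fetch_page: FetchPage, per_page: int = 10) -> Iterator[dict[str, Any]]:
    """Yield every item across all result pages, one page at a time."""
    start = 0
    while True:
        page = fetch_page(start, per_page)["data"]
        items = page["items"]
        yield from items
        start += len(items)
        # Stop on an empty page or once all reported results were seen.
        if not items or start >= page["total_count"]:
            break


# Stand-in for a real server call so the sketch runs offline (fake data).
_FAKE_RESULTS = [{"name": f"Dataset {i}"} for i in range(25)]


def fake_fetch_page(start: int, per_page: int) -> dict[str, Any]:
    return {
        "data": {
            "total_count": len(_FAKE_RESULTS),
            "start": start,
            "items": _FAKE_RESULTS[start:start + per_page],
        }
    }


names = [item["name"] for item in iter_search_items(fake_fetch_page, per_page=10)]
print(len(names))
```

In real use, `fake_fetch_page` would presumably be replaced by a small function that calls the server with the desired offset and page size; the generator keeps memory use flat because it never holds more than one page at a time.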
- Python 3.12 or higher
- uv or pip for package management
Note: This package is not yet published on PyPI. Please use the development installation method below.
To install the package, clone the repository and install locally:
- Clone the Repository:
  git clone https://github.com/DataArtifex/dataverse-toolkit.git
  cd dataverse-toolkit
- Install in Editable Mode:
  Using uv (recommended):
  uv pip install -e ".[dev]"
  Or using pip:
  pip install -e ".[dev]"
- Using Hatch (Recommended for Development):
  # Install Hatch
  uv tool install hatch
  # Activate development environment
  hatch shell
  # Run tests
  hatch run test
Once stable, this package will be released on PyPI. Installation will then be:
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dartfx-dataverse
uv pip install dartfx-dataverse
# Or using pip
pip install dartfx-dataverse
Get a list of known Dataverse installations worldwide:
from dartfx.dataverse import fetch_dataverse_installations
# Fetch all known installations
installations = fetch_dataverse_installations()
# Display first 5
for installation in installations[:5]:
    print(f"{installation.name}: {installation.hostname}")
Create a connection to a specific Dataverse server:
from dartfx.dataverse import DataverseServer, ServerInstallation
# Create server installation object
harvard = ServerInstallation(
    name="Harvard Dataverse",
    hostname="dataverse.harvard.edu",
)
# Create server connection
server = DataverseServer(installation=harvard)
# Get server information
info = server.get_server_info()
print(f"Server version: {info['data']['version']}")
Perform searches with various options:
from dartfx.dataverse import SearchParameters
# Simple search
results = server.search_simple("climate change")
print(f"Found {results['data']['total_count']} results")
# Advanced search with parameters
params = SearchParameters(
    q="climate change",
    type="dataset",
    per_page=20,
    sort="date",
    order="desc",
    show_facets=True,
)
results = server.search(params)
for item in results['data']['items']:
    print(f"- {item['name']}")
# Search with filters
params = SearchParameters(
    q="*",
    type="dataset",
    fq=[
        "publicationDate:[2020 TO *]",  # From 2020 onwards
        "authorName:Smith",  # Author is Smith
    ],
)
# Geographic search
params = SearchParameters(
    q="environment",
    geo_point="42.3601,-71.0589",  # Boston, MA
    geo_radius="50",  # 50 km radius
)
# Search with metadata fields
params = SearchParameters(
    q="health",
    metadata_fields=["citation", "identifier", "subjects"],
)
Comprehensive documentation is available, including:
- Installation Guide: Detailed installation instructions and requirements
- Quick Start: Get up and running in minutes
- Usage Guide: In-depth coverage of all features
- API Reference: Complete API documentation
- Examples: Real-world use cases and code examples
- Contributing Guide: How to contribute to the project
Visit the full documentation for more details.
This is an early development release. The core functionality is working, but APIs may change.
- Pydantic models for search results
- Enhanced error messages and debugging
- Batch operation support
- Progress indicators for long-running operations
- Dataset metadata retrieval (DDI, Dublin Core, DataCite)
- File metadata retrieval
- Support for additional metadata formats (Croissant, schema.org)
- Dataset and file download capabilities
- Download progress tracking
- Resume interrupted downloads
- Stable API
- Complete test coverage
- Performance optimizations
- Full documentation
We welcome contributions! Here's how you can help:
- Fork the repository
- Clone your fork:
  git clone https://github.com/YOUR-USERNAME/dataverse-toolkit.git
- Create a feature branch:
  git checkout -b feature/your-feature-name
- Set up development environment:
  uv tool install hatch
  hatch shell
# Run tests
hatch run test
# Run tests with coverage
hatch run cov
# Type checking
hatch run types:check
# Format code
ruff format .
# Lint code
ruff check . --fix
- Make your changes and add tests
- Ensure all tests pass:
  hatch run test
- Commit your changes:
  git commit -am 'Add some feature'
- Push to your fork:
  git push origin feature/your-feature-name
- Submit a pull request
See the Contributing Guide for detailed guidelines.
This project follows the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.
This project is licensed under the MIT License - see the LICENSE.txt file for details.
- Built with Pydantic for data validation
- Uses Requests and requests-cache for HTTP operations
- Developed using Hatch project manager
- Documentation built with Sphinx
- Documentation: https://dataverse-toolkit.readthedocs.io/
- Source Code: https://github.com/DataArtifex/dataverse-toolkit
- Issue Tracker: https://github.com/DataArtifex/dataverse-toolkit/issues
- PyPI: https://pypi.org/project/dartfx-dataverse/
- Dataverse Project: https://dataverse.org/
If you encounter issues or have questions:
- Check the documentation
- Search existing issues
- Create a new issue if needed
If you use this package in your research, please cite:
@software{dartfx_dataverse,
  author = {Heus, Pascal},
  title = {dartfx-dataverse: A Python toolkit for Dataverse repositories},
  year = {2024},
  url = {https://github.com/DataArtifex/dataverse-toolkit}
}
Maintained by Data Artifex | Author: Pascal Heus