A full-stack application for managing scientific datasets and their providers. It is built with a FastAPI backend and a React frontend, and is containerized with Docker for easy deployment.
- Overview
- Features
- Getting Started
- Architecture
- Environment Setup
- Development Workflow
- Production Deployment
- API Documentation
- Traefik Integration
- Maintenance Mode
- Troubleshooting
- Contributing
- License
Aggregator is designed to catalog and manage scientific datasets and their providers. It allows users to browse datasets, administrators to manage user access, and provides a comprehensive API for integration with other systems.
- User Authentication and Authorization
  - JWT-based authentication
  - Role-based access control
  - Provider-specific permissions
- Provider Management
  - Create, read, update, and delete data providers
  - Associate metadata with providers
- Dataset Management
  - Organize datasets under providers
  - Track dataset sources and landing pages
  - Manage XML archives and useful links
- Modern Web Interface
  - Responsive React-based frontend
  - User-friendly dashboard
- API Integration
  - RESTful API for all operations
  - Legacy harvesting support
  - Comprehensive documentation
As a user of Aggregator, you can:
- Access the Platform:
  - Navigate to the application URL in your web browser
  - Log in with your provided credentials
- Browse Datasets:
  - View available datasets organized by provider
  - Access dataset details and related resources
- Use Dataset Resources:
  - Follow links to dataset landing pages
  - Access XML archives
  - Use the provided useful links for additional information
As an administrator, you have additional capabilities:
- User Management:
  - Create new user accounts
  - Assign roles and permissions
  - Manage provider-specific access
- Provider Administration:
  - Add new data providers to the system
  - Update provider information
  - Remove providers when necessary
- Dataset Administration:
  - Add, update, or remove datasets
  - Manage dataset metadata
  - Organize datasets under appropriate providers
As a developer working with Aggregator:
- Local Development Setup:

      # Clone the repository
      git clone <repository-url>
      cd aggregator
      # Copy and configure environment files
      cp .env.example .env
      cp backend/.env.example backend/.env
      cp frontend/.env.example frontend/.env
      # IMPORTANT: Generate a secure SECRET_KEY for backend/.env:
      # python -c "import secrets; print(secrets.token_hex(32))"
      # Replace the default SECRET_KEY with your generated key
      # Start development environment
      docker-compose up

- API Integration:
  - Use the API documentation at `/docs` to understand available endpoints
  - Authenticate with JWT tokens
  - Make API calls to integrate with your systems
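
To make the API integration steps more concrete, here is a minimal Python sketch of an authenticated call. It assumes the JWT is sent as a `Bearer` token in the `Authorization` header, which is a common convention but is not confirmed here. The base URL and the `/providers` path are placeholders, so check `/docs` for the real endpoints.

```python
import json
import urllib.request

# Assumption: placeholder base URL for a local development setup.
BASE_URL = "http://localhost:8000"


def build_authenticated_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries a JWT as a Bearer token (assumed scheme)."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(request: urllib.request.Request) -> object:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With a running backend and a valid token, `fetch_json(build_authenticated_request("/providers", token))` would return the decoded response, where `/providers` is a hypothetical path.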
Aggregator follows a modern microservices architecture:
- Backend: FastAPI application providing RESTful API endpoints
- Frontend: React single-page application for the user interface
- Database: PostgreSQL database for persistent storage
- Reverse Proxy: Traefik for routing, load balancing, and service discovery
The application uses Traefik as a modern reverse proxy and load balancer:
- Automatic Service Discovery: Traefik automatically discovers services through Docker labels
- Path-Based Routing:
  - `/api/*` routes are directed to the backend service
  - All other routes are directed to the frontend service
- Network Isolation: Services are connected through a dedicated Docker network (`app-network`)
- Security:
  - Only containers explicitly enabled with the `traefik.enable=true` label are exposed
  - SSL/TLS configuration (commented out but ready for production use)
  - API dashboard is disabled by default for security
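
The path-based routing described above can be pictured with a small Python model. This is an illustration only, not Traefik's code. It assumes the backend router uses a `/api` prefix and the frontend router uses `/`, and it picks the longest matching prefix, which resembles Traefik's default behavior of giving longer rules higher priority.

```python
# Illustrative model only: assumed prefixes, not read from the real configuration.
ROUTES = {
    "/api": "backend",
    "/": "frontend",
}


def resolve_service(path: str) -> str:
    """Return the service whose prefix matches the path, preferring the longest prefix."""
    candidates = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not candidates:
        raise ValueError(f"no route matches {path!r}")
    return ROUTES[max(candidates, key=len)]
```

In this model, `resolve_service("/api/datasets")` selects the backend, while any other path such as `/dashboard` falls through to the frontend.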
To add a new service to Traefik:
- Connect the service to the `app-network` in docker-compose
- Add the following labels to your service:

      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.[service-name].rule=PathPrefix(`/your-path`)"
        - "traefik.http.routers.[service-name].entrypoints=web"
        - "traefik.http.services.[service-name].loadbalancer.server.port=[internal-port]"

For production deployments, uncomment and configure the HTTPS sections in traefik/traefik.yml and update the acme.json file permissions to 600.
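
As a small sketch of the permission step, the following Python helper sets mode 600 (read and write for the owner only) on a file and reports the resulting mode. The helper name is hypothetical, and the location of `acme.json` is not stated here, so the path in the usage note is an assumption.

```python
import os
import stat


def restrict_to_owner(path: str) -> str:
    """Set mode 600 on the file and return the resulting mode as octal text."""
    os.chmod(path, 0o600)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return format(mode, "o")
```

For example, `restrict_to_owner("traefik/acme.json")` should return `"600"` on a Unix-like system if the file lives at that assumed path. This is equivalent to running `chmod 600` on the file.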
The application includes a maintenance mode feature that can be enabled during deployments or updates.
When enabled, a maintenance page is displayed to users while the application services are being updated. This is implemented using Traefik's dynamic routing rules:
- A dedicated `maintenance` container serves a static maintenance page
- The container has a higher priority route that intercepts all traffic
- Backend and frontend services are temporarily disabled in Traefik routing
- Using CI/CD Variables:
  - In GitLab, go to Settings > CI/CD > Variables
  - Add variable `MAINTENANCE_MODE` with value `true`
  - Run a deployment to enable maintenance mode
  - Set back to `false` and re-deploy when maintenance is complete
- For a Single Deployment:
  - When manually triggering a pipeline, set variable `MAINTENANCE_MODE=true`
  - After completing maintenance, run another deployment with `MAINTENANCE_MODE=false`
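
How the pipeline consumes `MAINTENANCE_MODE` is not described here. As a hedged illustration, a deployment script might translate the variable into `traefik.enable` values for the three services along these lines. The function names are hypothetical.

```python
def parse_flag(value: str) -> bool:
    """Treat the string 'true' (any case, surrounding spaces ignored) as enabled."""
    return value.strip().lower() == "true"


def traefik_enable_labels(maintenance_mode: bool) -> dict[str, str]:
    """Map each service to the traefik.enable label it would carry."""
    on, off = "traefik.enable=true", "traefik.enable=false"
    return {
        # The maintenance page is exposed only while maintenance is active.
        "maintenance": on if maintenance_mode else off,
        # The application services are exposed only during normal operation.
        "backend": off if maintenance_mode else on,
        "frontend": off if maintenance_mode else on,
    }
```

A script could call `traefik_enable_labels(parse_flag(os.environ.get("MAINTENANCE_MODE", "false")))` and apply the result when rendering the service configuration.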
You can also directly enable/disable maintenance mode on the server:
# Enable maintenance mode
docker-compose exec traefik traefik service update --label-add "traefik.enable=true" maintenance
docker-compose exec traefik traefik service update --label-add "traefik.enable=false" backend
docker-compose exec traefik traefik service update --label-add "traefik.enable=false" frontend
# Disable maintenance mode
docker-compose exec traefik traefik service update --label-add "traefik.enable=false" maintenance
docker-compose exec traefik traefik service update --label-add "traefik.enable=true" backend
docker-compose exec traefik traefik service update --label-add "traefik.enable=true" frontend

The maintenance page is located at `/maintenance/index.html` and can be customized:
- Content: Modify the HTML to change the maintenance message
- Styling: Update the CSS in the style section
- Countdown: By default, the page shows a 30-minute countdown
- Behavior: The page will automatically refresh after the countdown ends
The maintenance page includes:
- GFBio branding
- Informative message about the maintenance
- Visual countdown timer
- Automatic refresh to check if the service is back online
The application uses environment variables for configuration:
- Root `.env` file:

      # Database configuration
      DB_USER=user
      DB_PASSWORD=password
      DB_NAME=dbname
      # Security
      SECRET_KEY=your-secret-key
      # Frontend configuration (for development)
      REACT_APP_API_URL=http://localhost:8000

- Environment-specific configuration:
  - Development: Uses local directories mounted as volumes for hot-reloading
  - Production: Uses built Docker images with optimized settings
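
Because the example `.env` ships with a placeholder `SECRET_KEY`, a quick check before deployment can help. The sketch below parses simple `KEY=VALUE` lines and flags the placeholder. It is a simplified parser written for illustration (no quoting or variable expansion), and the helper names are hypothetical.

```python
import secrets

# Placeholder value from the example .env above; an empty key is also rejected.
PLACEHOLDER_KEYS = {"your-secret-key", ""}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def needs_new_secret(env: dict[str, str]) -> bool:
    """Return True when SECRET_KEY is missing, empty, or still the placeholder."""
    return env.get("SECRET_KEY", "") in PLACEHOLDER_KEYS


def generate_secret_key() -> str:
    """Generate 32 random bytes as 64 hex characters, as in the setup instructions."""
    return secrets.token_hex(32)
```

If `needs_new_secret(parse_env(open(".env").read()))` is true, replace the value with the output of `generate_secret_key()` before starting the services.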
- Start the Development Environment:

      docker-compose up

- Backend Development:
  - Edit files in the `backend/` directory
  - FastAPI hot-reloads changes automatically
  - Access API documentation at `http://localhost:8000/docs`
- Frontend Development:
  - Edit files in the `frontend/` directory
  - React development server hot-reloads changes
  - Access frontend at `http://localhost:3000`
- Database Migrations:

      # Inside the backend container
      alembic revision --autogenerate -m "description"
      alembic upgrade head

- Build and Start Production Services:

      docker-compose -f docker-compose.prod.yml up -d

- Access the Application:
  - Frontend: `http://your-server`
  - Backend API: `http://your-server/api`
- Scaling Considerations:
  - Adjust memory limits in `docker-compose.prod.yml` if needed
  - Consider using a container orchestration platform for larger deployments
Once the application is running, you can access:
- Interactive API documentation: `http://localhost:8000/docs` (development) or `http://your-server/api/docs` (production)
- Alternative API documentation: `http://localhost:8000/redoc` (development) or `http://your-server/api/redoc` (production)
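
Both documentation pages are rendered from an OpenAPI schema, which FastAPI publishes at `/openapi.json` by default. Whether this deployment keeps that default path is an assumption. Given a schema already loaded as a dictionary, the sketch below lists the available operations. The sample schema is hypothetical and does not describe the real Aggregator API.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    operations = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for method in sorted(item):
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append(f"{method.upper()} {path}")
    return operations


# Hypothetical schema fragment for illustration only.
SAMPLE_SCHEMA = {
    "paths": {
        "/providers": {"get": {}, "post": {}},
        "/providers/{provider_id}": {"delete": {}, "parameters": []},
    }
}
```

Calling `list_operations(SAMPLE_SCHEMA)` yields `GET /providers`, `POST /providers`, and `DELETE /providers/{provider_id}`.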
- Database Connection Errors:
  - Verify database credentials in `.env`
  - Ensure PostgreSQL service is running
  - Check network connectivity between containers
- Frontend Not Loading:
  - Check browser console for JavaScript errors
  - Verify API URL configuration
  - Ensure Nginx is properly configured
- API Request Failures:
  - Verify authentication token is valid
  - Check CORS configuration
  - Ensure proper permissions for the requested operation
Docker Issues:
- Run
docker-compose downand thendocker-compose upto rebuild - Check Docker logs with
docker-compose logs - Verify Docker and Docker Compose versions
- Run
- Traefik Routing Issues:
  - Verify container labels are correctly configured
  - Check Traefik logs with `docker-compose logs traefik`
  - Ensure your service is connected to the `app-network`
  - Check that the container has the `traefik.enable=true` label
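
The Traefik checks above can be automated against a service definition loaded from the Compose file. The sketch below assumes labels and networks are written in list form and that the router is named after the service. Both are assumptions about this project's Compose files, and the function name is hypothetical.

```python
def check_traefik_service(name: str, service: dict) -> list[str]:
    """Return a list of problems found in a Compose-style service definition."""
    problems = []
    labels = service.get("labels", [])
    networks = service.get("networks", [])

    if "traefik.enable=true" not in labels:
        problems.append("missing label traefik.enable=true")

    # Assumption: the router name matches the service name.
    rule_prefix = f"traefik.http.routers.{name}.rule="
    if not any(label.startswith(rule_prefix) for label in labels):
        problems.append(f"missing router rule label for '{name}'")

    if "app-network" not in networks:
        problems.append("service is not attached to app-network")
    return problems
```

An empty list means the three checks passed. It does not guarantee that routing works, so the Traefik logs remain the authoritative source.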
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.