|
| 1 | +# AGENTS.md — AmpliconRepository |
| 2 | + |
| 3 | +A genomics data repository (ampliconrepository.org) for storing, browsing, and analysing DNA amplicon results produced by [AmpliconSuiteAggregator](https://github.com/AmpliconSuite/AmpliconSuiteAggregator). Built with Django + Mezzanine CMS. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Critical: Environment Setup Before Any Django Command |
| 8 | + |
| 9 | +**Always** source `caper/config.sh` before running any Django management command. It sets all required env vars (MongoDB URI, OAuth secrets, S3, Neo4j, email). |
| 10 | + |
| 11 | +```bash |
| 12 | +# Required pattern |
| 13 | +source caper/config.sh && cd caper && python manage.py <command> |
| 14 | + |
| 15 | +# Or use the helper script from project root |
| 16 | +./run_django_command.sh <command> |
| 17 | +``` |
| 18 | + |
| 19 | +Never commit `caper/config.sh` or `caper/.env` to version control. |
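A quick preflight check can catch a shell that never sourced `caper/config.sh`. The sketch below is a hypothetical helper, not part of the repository. It checks only the two variable names mentioned in this file (`DB_URI_SECRET` and `NEO4J_PASSWORD_SECRET`); `config.sh` sets more, so the list would need extending.

```python
import os
from typing import Iterable, List, Mapping

# Assumption: only these two names are taken from this document;
# the real config.sh exports additional variables.
REQUIRED_ENV_VARS = ("DB_URI_SECRET", "NEO4J_PASSWORD_SECRET")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_ENV_VARS,
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; a non-empty result suggests `config.sh` was not sourced in this shell.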
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## Architecture Overview |
| 24 | + |
| 25 | +### Dual-Database Design (key non-obvious detail) |
| 26 | +The app uses **two completely separate databases**: |
| 27 | + |
| 28 | +| Database | Purpose | Access | |
| 29 | +|---|---|---| |
| 30 | +| **SQLite** (`caper/caper.sqlite3`) | Django auth, sessions, Mezzanine CMS pages | Django ORM via `models.py` | |
| 31 | +| **MongoDB** (`DB_URI_SECRET` env var) | All project/sample/feature data | PyMongo directly via `utils.py` globals | |
| 32 | + |
| 33 | +**Do not** use Django ORM for project/sample data — all project queries go through `collection_handle` from `caper/caper/utils.py`. The `dbrouters.py` `RunsDBRouter` is a leftover artefact and not actively routing; real MongoDB access bypasses Django's ORM entirely. |
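As a rough illustration of querying through a PyMongo handle rather than the ORM, the sketch below filters on the `current`, `delete`, and `private` fields described under Data Model. The function name is hypothetical, and the filter is a simplification: legacy documents may store `private` as a boolean, which this query would not match.

```python
from typing import Any, Dict, List


def current_public_projects(collection: Any) -> List[Dict[str, Any]]:
    """Return the latest, non-deleted, public project documents.

    `collection` is any object exposing PyMongo's `find(filter)` method;
    in the app this would be `collection_handle` from caper/caper/utils.py.
    """
    # Field names come from this document's Data Model section.
    query = {"current": True, "delete": False, "private": "public"}
    return list(collection.find(query))
```

Taking the collection as a parameter, instead of importing the global inside the function, keeps the helper easy to exercise with a stub in the fast test suite.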
| 34 | + |
| 35 | +### Third Database: Neo4j |
| 36 | +Co-amplification graph data is stored in Neo4j (bolt port 7687). See `caper/caper/neo4j_utils.py`. The driver connects using the `NEO4J_PASSWORD_SECRET` env var. |
| 37 | + |
| 38 | +### Key Global Handles (defined at module level in `utils.py`) |
| 39 | +```python |
| 40 | +collection_handle # MongoDB 'projects' collection (secondary-preferred reads) |
| 41 | +collection_handle_primary # Same collection, read from the primary (use for writes/admin) |
| 42 | +audit_log_handle # MongoDB 'project_audit_log' collection |
| 43 | +fs_handle # GridFS handle (large files / tarballs) |
| 44 | +``` |
| 45 | +These are imported directly across `views.py`, `search.py`, `site_stats.py`, etc. |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## Code Structure |
| 50 | + |
| 51 | +``` |
| 52 | +caper/caper/ # Main Django app |
| 53 | + views.py # ~5000 lines — primary request handlers |
| 54 | + views_admin.py # Admin-only pages (stats, delete, email) |
| 55 | + views_apis.py # REST upload API (FileUploadView, ProjectFileAddView) |
| 56 | + utils.py # MongoDB connection + all shared helpers (1000+ lines) |
| 57 | + models.py # SQLite-backed Django models (auth admin actions only) |
| 58 | + settings.py # All config; reads env vars set by config.sh |
| 59 | + neo4j_utils.py # Co-amplification graph load/query |
| 60 | + search.py # MongoDB-based project/sample search |
| 61 | + extra_metadata.py # CSV/TSV/XLSX metadata attachment to samples |
| 62 | + gridfs_cache.py # Django cache wrapper around GridFS reads |
| 63 | + tar_utils.py # Stream-extract files from GridFS-stored tarballs |
| 64 | + site_stats.py # Aggregated stats stored in MongoDB 'site_statistics' |
| 65 | + context_processor.py # Template context processor; also stores system flags (shutdown, registration) in MongoDB |
| 66 | + schema_validate.py # JSON schema validation for project documents |
| 67 | + management/commands/create_project.py # CLI to create a project from local/HTTP/S3 file |
| 68 | +caper/templates/ # Django templates (Mezzanine host-themes loader) |
| 69 | +caper/schema/ # schema.json for validating MongoDB project documents |
| 70 | +``` |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## Data Model (MongoDB) |
| 75 | + |
| 76 | +Projects live in the `projects` collection. Notable fields: |
| 77 | +- `private`: `"private"` | `"public"` | `"hidden_public"` (use `utils.normalize_visibility_field()` when reading legacy boolean values) |
| 78 | +- `current`: `True` only on the latest version of a renamed/updated project |
| 79 | +- `previous_versions` — list of prior project `_id`s (version chain) |
| 80 | +- `delete: False` — soft-delete flag |
| 81 | +- `runs` — dict of run-name → list of sample dicts |
| 82 | +- `project_members` — comma-separated usernames/emails controlling access |
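To make the `private` and `project_members` fields concrete, here is a minimal sketch of reading them defensively. Both functions are hypothetical stand-ins: the first is NOT the real `utils.normalize_visibility_field()`, and it assumes that a legacy boolean `True` meant private and `False` meant public.

```python
from typing import Any, List

VALID_VISIBILITY = ("private", "public", "hidden_public")


def normalize_visibility_sketch(value: Any) -> str:
    """Map a stored `private` value onto one of the three string states.

    Assumption: legacy booleans used True for private, False for public.
    """
    if isinstance(value, bool):
        return "private" if value else "public"
    if value in VALID_VISIBILITY:
        return value
    raise ValueError(f"Unrecognised visibility value: {value!r}")


def parse_project_members(raw: str) -> List[str]:
    """Split a comma-separated members string, dropping padding and blanks."""
    return [member.strip() for member in raw.split(",") if member.strip()]
```

In real code, prefer the repository's own `normalize_visibility_field()`; the sketch only shows the shape of the problem it solves.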
| 83 | + |
| 84 | +Files (tarballs from AmpliconSuiteAggregator) are stored in **GridFS** and referenced by ObjectId within the project document. Use `tar_utils.extract_from_project_tarfile()` to stream-extract specific paths without writing the full tar to disk. |
| 85 | + |
| 86 | +--- |
| 87 | + |
| 88 | +## Developer Workflows |
| 89 | + |
| 90 | +### Local dev server |
| 91 | +```bash |
| 92 | +source caper/config.sh && cd caper && python manage.py runserver |
| 93 | +# visit http://localhost:8000 |
| 94 | +``` |
| 95 | + |
| 96 | +### Docker dev (simplest for new setup) |
| 97 | +```bash |
| 98 | +mkdir -p logs tmp .aws .git |
| 99 | +docker compose -f docker-compose-dev.yml build --no-cache |
| 100 | +docker compose -f docker-compose-dev.yml up -d |
| 101 | +# visit http://localhost:8000 |
| 102 | +docker compose -f docker-compose-dev.yml down |
| 103 | +``` |
| 104 | + |
| 105 | +### Create a project from CLI |
| 106 | +```bash |
| 107 | +source caper/config.sh && cd caper && \ |
| 108 | + python manage.py create_project <project_name> <username> <path_or_url.tar.gz> \ |
| 109 | + --visibility public --description "My project" |
| 110 | +``` |
| 111 | +Accepts local paths, HTTP URLs, or `s3://` URIs. |
| 112 | + |
| 113 | +### Purge local MongoDB data |
| 114 | +```bash |
| 115 | +python purge-local-db.py |
| 116 | +``` |
| 117 | + |
| 118 | +### Do NOT commit |
| 119 | +- `caper/caper.sqlite3` |
| 120 | +- `caper/config.sh` / `.env` |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## Auth & Social Login |
| 125 | + |
| 126 | +- Uses `django-allauth` with **Google** and **Globus** OAuth2 providers. |
| 127 | +- `CustomAccountAdapter` and `SocialAccountAdapter` (in `utils.py`) prevent username/email cross-collisions and respect the `registration_disabled` flag stored in MongoDB `system_settings`. |
| 128 | +- `ACCOUNT_EMAIL_VERIFICATION = 'none'` — email verification is off. |
| 129 | + |
| 130 | +--- |
| 131 | + |
| 132 | +## Mezzanine CMS Integration |
| 133 | + |
| 134 | +Mezzanine provides the CMS page tree, admin UI (Grappelli), and URL catch-all. **Add all custom URL patterns above** the `path("", include("mezzanine.urls"))` line in `urls.py` — Mezzanine's catch-all will shadow anything placed after it. |
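The ordering rule follows from first-match-wins URL resolution. The toy model below is not Django's resolver; it uses plain regular expressions, a stand-in catch-all for `mezzanine.urls`, and a hypothetical `project/<name>/` route, purely to show why a pattern placed after the catch-all is never reached.

```python
import re
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (regex pattern, handler name)


def resolve(path: str, routes: List[Route]) -> Optional[str]:
    """Return the handler of the first route whose pattern matches `path`."""
    for pattern, handler in routes:
        if re.match(pattern, path):
            return handler
    return None


CATCH_ALL: Route = (r"^.*$", "mezzanine_page")         # stands in for mezzanine.urls
CUSTOM: Route = (r"^project/[^/]+/$", "project_page")  # hypothetical custom route

correct_order = [CUSTOM, CATCH_ALL]  # custom pattern above the catch-all
wrong_order = [CATCH_ALL, CUSTOM]    # custom pattern is shadowed
```

With `correct_order`, a path such as `project/abc/` reaches the custom handler; with `wrong_order`, the same path is swallowed by the catch-all.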
| 135 | + |
| 136 | +--- |
| 137 | + |
| 138 | +## Test-Driven Development |
| 139 | + |
| 140 | +For bug fixes and new features, **start by writing a failing test** before touching production code. This keeps changes focused and verifiable. |
| 141 | + |
| 142 | +### Workflow |
| 143 | + |
| 144 | +1. **Write a failing test** that reproduces the bug or exercises the new behaviour. |
| 145 | +2. Confirm the test fails for the right reason. |
| 146 | +3. Implement the minimal code change to make the test pass. |
| 147 | +4. Verify no existing tests regressed. |
| 148 | + |
| 149 | +### Running tests |
| 150 | + |
| 151 | +Tests live in `tests/`. There are two suites: |
| 152 | + |
| 153 | +```bash |
| 154 | +# Fast suite (mocked DB — no live MongoDB required) |
| 155 | +source caper/config.sh && cd caper && python -m pytest ../tests/ -m "not slow" -v |
| 156 | + |
| 157 | +# Slow suite (requires live MongoDB — the default for new tests) |
| 158 | +source caper/config.sh && cd caper && python -m pytest ../tests/ -m slow -v |
| 159 | + |
| 160 | +# Full suite |
| 161 | +source caper/config.sh && cd caper && python -m pytest ../tests/ -v |
| 162 | +``` |
| 163 | + |
| 164 | +New tests go in the **slow suite** by default (mark with `@pytest.mark.slow`). Only move a test to the fast suite if it genuinely requires no database access and can be fully covered by mocks. |
| 165 | + |
| 166 | +```python |
| 167 | +import pytest |
| 168 | + |
| 169 | +@pytest.mark.slow |
| 170 | +def test_my_feature(client, live_mongo): |
| 171 | + # arrange → act → assert |
| 172 | + ... |
| 173 | +``` |
| 174 | + |
| 175 | +--- |
| 176 | + |
| 177 | +## PR Checklist |
| 178 | + |
| 179 | +- Never include `caper.sqlite3` in commits or PRs. |
| 180 | +- Minimum manual smoke-test: home page, CCLE project page, any CCLE sample page. |
| 181 | +- Versioned releases use tag pattern `v<major>.<minor>.<patch>_<MMDDYY>` (e.g., `v1.0.1_072523`). |
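The tag pattern can be checked mechanically before pushing. The sketch below is a hypothetical helper, not a script from the repository; it assumes each version component is one or more digits and that the suffix must be a real calendar date in `MMDDYY` form.

```python
import re
from datetime import datetime

# v<major>.<minor>.<patch>_<MMDDYY>, e.g. v1.0.1_072523
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)_(\d{6})$")


def is_valid_release_tag(tag: str) -> bool:
    """Return True if `tag` matches the pattern and its date suffix is valid."""
    match = TAG_RE.match(tag)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as month 13 or 30 February.
        datetime.strptime(match.group(4), "%m%d%y")
    except ValueError:
        return False
    return True
```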
| 182 | + |