Commit ffbc36c

adding agents.md with TDD directive
1 parent d9359fb commit ffbc36c

1 file changed

Lines changed: 182 additions & 0 deletions

AGENTS.md

@@ -0,0 +1,182 @@
# AGENTS.md: AmpliconRepository

A genomics data repository (ampliconrepository.org) for storing, browsing, and analysing DNA amplicon results produced by [AmpliconSuiteAggregator](https://github.com/AmpliconSuite/AmpliconSuiteAggregator). Built with Django + Mezzanine CMS.

---

## Critical: Environment Setup Before Any Django Command

**Always** source `caper/config.sh` before running any Django management command. It sets all required env vars (MongoDB URI, OAuth secrets, S3, Neo4j, email).

```bash
# Required pattern
source caper/config.sh && cd caper && python manage.py <command>

# Or use the helper script from project root
./run_django_command.sh <command>
```
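
As a guard against forgetting to source the file, a script could check for the expected variables before invoking Django. The sketch below is illustrative only: `missing_env_vars` is a hypothetical helper, and the variable list contains just the two names this document mentions (`DB_URI_SECRET`, `NEO4J_PASSWORD_SECRET`); the full set defined by `caper/config.sh` is not reproduced here.

```python
import os

# ASSUMPTION: only the two variable names mentioned in this document are
# listed; caper/config.sh sets more than these.
REQUIRED_VARS = ("DB_URI_SECRET", "NEO4J_PASSWORD_SECRET")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Example use: abort early with a hint if anything is missing.
# if missing_env_vars():
#     raise SystemExit("Did you source caper/config.sh?")
```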

Never commit `caper/config.sh` or `caper/.env` to version control.

---

## Architecture Overview

### Dual-Database Design (key non-obvious detail)
The app uses **two completely separate databases**:

| Database | Purpose | Access |
|---|---|---|
| **SQLite** (`caper/caper.sqlite3`) | Django auth, sessions, Mezzanine CMS pages | Django ORM via `models.py` |
| **MongoDB** (`DB_URI_SECRET` env var) | All project/sample/feature data | PyMongo directly via `utils.py` globals |

**Do not** use Django ORM for project/sample data. All project queries go through `collection_handle` from `caper/caper/utils.py`. The `dbrouters.py` `RunsDBRouter` is a leftover artefact and not actively routing; real MongoDB access bypasses Django's ORM entirely.

### Third Database: Neo4j
Co-amplification graph data is stored in Neo4j (bolt port 7687). See `caper/caper/neo4j_utils.py`. The driver connects using the `NEO4J_PASSWORD_SECRET` env var.

### Key Global Handles (defined at module level in `utils.py`)
```python
collection_handle          # MongoDB 'projects' collection (secondary-preferred reads)
collection_handle_primary  # Same collection, primary reads (for writes/admin)
audit_log_handle           # MongoDB 'project_audit_log' collection
fs_handle                  # GridFS handle (large files / tarballs)
```
These are imported directly across `views.py`, `search.py`, `site_stats.py`, etc.
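
To make the access pattern concrete, here is a minimal sketch of a query that goes through a PyMongo-style collection handle rather than the Django ORM. The function name `list_current_projects` and the `FakeCollection` stand-in are assumptions made for illustration; the `current` and `delete` fields are the ones described in this file's data model section. The handle is passed in as a parameter so the sketch runs without a database; in the application it would be `collection_handle`.

```python
def list_current_projects(collection):
    """Return current, non-deleted project documents from a PyMongo-style collection."""
    # The filter uses the `current` and `delete` fields; no Django ORM involved.
    query = {"current": True, "delete": False}
    return list(collection.find(query))


class FakeCollection:
    """Tiny in-memory stand-in for a PyMongo collection (equality filters only)."""

    def __init__(self, docs):
        self._docs = docs

    def find(self, query):
        return [d for d in self._docs if all(d.get(k) == v for k, v in query.items())]


# In the app this would be: list_current_projects(collection_handle)
demo = FakeCollection([
    {"_id": 1, "current": True, "delete": False},
    {"_id": 2, "current": False, "delete": False},
    {"_id": 3, "current": True, "delete": True},
])
```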

---

## Code Structure

```
caper/caper/                 # Main Django app
  views.py                   # ~5000 lines; primary request handlers
  views_admin.py             # Admin-only pages (stats, delete, email)
  views_apis.py              # REST upload API (FileUploadView, ProjectFileAddView)
  utils.py                   # MongoDB connection + all shared helpers (1000+ lines)
  models.py                  # SQLite-backed Django models (auth admin actions only)
  settings.py                # All config; reads env vars set by config.sh
  neo4j_utils.py             # Co-amplification graph load/query
  search.py                  # MongoDB-based project/sample search
  extra_metadata.py          # CSV/TSV/XLSX metadata attachment to samples
  gridfs_cache.py            # Django cache wrapper around GridFS reads
  tar_utils.py               # Stream-extract files from GridFS-stored tarballs
  site_stats.py              # Aggregated stats stored in MongoDB 'site_statistics'
  context_processor.py       # Also stores system flags (shutdown, registration) in MongoDB
  schema_validate.py         # JSON schema validation for project documents
  management/commands/create_project.py  # CLI to create a project from local/HTTP/S3 file
caper/templates/             # Django templates (Mezzanine host-themes loader)
caper/schema/                # schema.json for validating MongoDB project documents
```

---

## Data Model (MongoDB)

Projects live in the `projects` collection. Notable fields:
- `private`: `"private"` | `"public"` | `"hidden_public"` (use `utils.normalize_visibility_field()` when reading legacy boolean values)
- `current: True`: set only on the latest version of a renamed/updated project
- `previous_versions`: list of prior project `_id`s (version chain)
- `delete: False`: soft-delete flag
- `runs`: dict of run-name → list of sample dicts
- `project_members`: comma-separated usernames/emails controlling access
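
A short sketch of how two of these fields might be read. Both helpers are hypothetical stand-ins written for illustration: `normalize_visibility` is not the real `utils.normalize_visibility_field()`, and the mapping of legacy booleans (`True` to `"private"`, `False` to `"public"`) is an assumption, as is treating `project_members` as a single string. Treat `utils.py` as authoritative.

```python
VISIBILITY_VALUES = ("private", "public", "hidden_public")


def normalize_visibility(value):
    """Map a stored `private` value to one of the three visibility strings.

    ASSUMPTION: legacy documents stored a boolean, with True meaning private.
    """
    if isinstance(value, bool):
        return "private" if value else "public"
    if value in VISIBILITY_VALUES:
        return value
    raise ValueError(f"unrecognised visibility: {value!r}")


def parse_members(project_members):
    """Split a comma-separated `project_members` string into clean entries."""
    return [m.strip() for m in project_members.split(",") if m.strip()]
```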

Files (tarballs from AmpliconSuiteAggregator) are stored in **GridFS** and referenced by ObjectId within the project document. Use `tar_utils.extract_from_project_tarfile()` to stream-extract specific paths without writing the full tar to disk.
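
The idea of stream-extracting one member can be illustrated with the standard library alone. This is a sketch of the general technique, not the implementation of `tar_utils.extract_from_project_tarfile()`, whose signature is not shown in this file; `read_member_from_tar_stream` and `make_demo_tarball` are hypothetical names, and an in-memory `BytesIO` stands in for a GridFS file object.

```python
import io
import tarfile


def read_member_from_tar_stream(fileobj, member_path):
    """Return the bytes of `member_path` from a gzipped tar read as a stream.

    Mode "r|gz" reads the archive sequentially, so nothing is written to disk
    and the underlying stream never needs to seek backwards.
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.name == member_path and member.isfile():
                # In stream mode the member must be read before advancing.
                return tar.extractfile(member).read()
    return None  # member not present in the archive


def make_demo_tarball():
    """Build a small in-memory .tar.gz with two files (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in (("results/a.txt", b"alpha"), ("results/b.txt", b"beta")):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```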

---

## Developer Workflows

### Local dev server
```bash
source caper/config.sh && cd caper && python manage.py runserver
# visit http://localhost:8000
```

### Docker dev (simplest for new setup)
```bash
mkdir -p logs tmp .aws .git
docker compose -f docker-compose-dev.yml build --no-cache
docker compose -f docker-compose-dev.yml up -d
# visit http://localhost:8000
docker compose -f docker-compose-dev.yml down
```

### Create a project from CLI
```bash
source caper/config.sh && cd caper && \
  python manage.py create_project <project_name> <username> <path_or_url.tar.gz> \
  --visibility public --description "My project"
```
Accepts local paths, HTTP URLs, or `s3://` URIs.
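
How the three source kinds might be told apart can be sketched with `urllib.parse`. `classify_source` is a hypothetical helper written for this example; it is not the logic used by `create_project.py`.

```python
from urllib.parse import urlparse


def classify_source(path_or_url):
    """Classify a tarball location as 's3', 'http', or 'local' by its URL scheme."""
    scheme = urlparse(path_or_url).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    # No recognised scheme: treat the argument as a filesystem path.
    return "local"
```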

### Purge local MongoDB data
```bash
python purge-local-db.py
```

### Do NOT commit
- `caper/caper.sqlite3`
- `caper/config.sh` / `caper/.env`

---

## Auth & Social Login

- Uses `django-allauth` with **Google** and **Globus** OAuth2 providers.
- `CustomAccountAdapter` and `SocialAccountAdapter` (in `utils.py`) prevent username/email cross-collisions and respect the `registration_disabled` flag stored in MongoDB `system_settings`.
- `ACCOUNT_EMAIL_VERIFICATION = 'none'`: email verification is off.

---

## Mezzanine CMS Integration

Mezzanine provides the CMS page tree, admin UI (Grappelli), and URL catch-all. **Add all custom URL patterns above** the `path("", include("mezzanine.urls"))` line in `urls.py`, because Mezzanine's catch-all will shadow anything placed after it.
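
Why ordering matters can be seen in a toy first-match resolver. This is plain Python written for illustration, not Django's or Mezzanine's resolver; the patterns and view names are made up.

```python
import re


def resolve(url_path, patterns):
    """Return the view name of the first pattern that matches `url_path`."""
    for regex, view_name in patterns:
        if re.match(regex, url_path):
            return view_name
    return None


CATCH_ALL = (r"^.*$", "mezzanine_page")        # stands in for mezzanine.urls
CUSTOM = (r"^project/\w+/$", "project_page")   # hypothetical custom route

good_order = [CUSTOM, CATCH_ALL]  # custom pattern listed first, so it is reachable
bad_order = [CATCH_ALL, CUSTOM]   # catch-all matches first; custom is never reached
```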

---

## Test-Driven Development

For bug fixes and new features, **start by writing a failing test** before touching production code. This keeps changes focused and verifiable.

### Workflow

1. **Write a failing test** that reproduces the bug or exercises the new behaviour.
2. Confirm the test fails for the right reason.
3. Implement the minimal code change to make the test pass.
4. Verify no existing tests regressed.

### Running tests

Tests live in `tests/`. There are two suites, fast and slow, selected with the `slow` pytest marker:

```bash
# Fast suite (mocked DB; no live MongoDB required)
source caper/config.sh && cd caper && python -m pytest ../tests/ -m "not slow" -v

# Slow suite (requires live MongoDB; the default for new tests)
source caper/config.sh && cd caper && python -m pytest ../tests/ -m slow -v

# Full suite
source caper/config.sh && cd caper && python -m pytest ../tests/ -v
```

New tests go in the **slow suite** by default (mark with `@pytest.mark.slow`). Only move a test to the fast suite if it genuinely requires no database access and can be fully covered by mocks.

```python
import pytest

@pytest.mark.slow
def test_my_feature(client, live_mongo):
    # arrange → act → assert
    ...
```
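
For contrast, a fast-suite test might replace the MongoDB handle with a mock so that no live database is needed. In this sketch `count_public_projects` is a hypothetical helper invented for the example (it is not part of `utils.py`), and the filter it builds assumes the `private`, `current`, and `delete` fields described in the data model. `count_documents` is the standard PyMongo collection method.

```python
from unittest.mock import MagicMock


def count_public_projects(collection):
    """Hypothetical helper: count current, non-deleted public projects."""
    return collection.count_documents(
        {"private": "public", "current": True, "delete": False}
    )


def test_count_public_projects_uses_expected_filter():
    # arrange: a mock stands in for the PyMongo collection handle
    fake_collection = MagicMock()
    fake_collection.count_documents.return_value = 3

    # act
    result = count_public_projects(fake_collection)

    # assert: the value is passed through and the filter is the expected one
    assert result == 3
    fake_collection.count_documents.assert_called_once_with(
        {"private": "public", "current": True, "delete": False}
    )
```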

---

## PR Checklist

- Never include `caper.sqlite3` in commits or PRs.
- Minimum manual smoke-test: home page, CCLE project page, any CCLE sample page.
- Versioned releases use tag pattern `v<major>.<minor>.<patch>_<MMDDYY>` (e.g., `v1.0.1_072523`).
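
The tag pattern can be checked mechanically. The following is a minimal sketch, assuming the date suffix is always six digits in `MMDDYY` order as the example suggests; `parse_release_tag` is a hypothetical helper and not part of the repository.

```python
import re
from datetime import datetime

TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)_(\d{6})$")


def parse_release_tag(tag):
    """Parse 'v<major>.<minor>.<patch>_<MMDDYY>' into (version tuple, date)."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"tag does not match v<major>.<minor>.<patch>_<MMDDYY>: {tag!r}")
    major, minor, patch, stamp = match.groups()
    # strptime raises ValueError for stamps that are not a valid MMDDYY date.
    released = datetime.strptime(stamp, "%m%d%y").date()
    return (int(major), int(minor), int(patch)), released
```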