71 changes: 71 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,71 @@
name: Integration Tests

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      mongodb:
        image: mongo:6
        ports:
          - 27017:27017
        options: >-
          --health-cmd "mongosh --eval 'db.runCommand({ ping: 1 })'"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    env:
      # MongoDB — points to the service container above
      DB_URI_SECRET: mongodb://localhost:27017
      DB_NAME: caper-ci-test

      # Django
      DJANGO_SECRET_KEY: ci-test-secret-key-not-for-production
      DJANGO_SETTINGS_MODULE: caper.settings
      ACCOUNT_DEFAULT_HTTP_PROTOCOL: http
      SECURE_SSL_REDIRECT: "FALSE"
      AMPLICON_ENV: ci

      # OAuth providers — placeholder values; OAuth is not exercised in CI tests.
      # To run browser tests that actually log in via OAuth, store real keys as
      # GitHub repository secrets and reference them here with ${{ secrets.NAME }}.
      GOOGLE_SECRET_KEY: ${{ secrets.GOOGLE_SECRET_KEY || 'ci-placeholder' }}
      GLOBUS_SECRET_KEY: ${{ secrets.GLOBUS_SECRET_KEY || 'ci-placeholder' }}
      RECAPTCHA_PRIVATE_KEY: ${{ secrets.RECAPTCHA_PRIVATE_KEY || 'ci-placeholder' }}
      RECAPTCHA_PUBLIC_KEY: ${{ secrets.RECAPTCHA_PUBLIC_KEY || 'ci-placeholder' }}

      # S3 — disabled in CI; downloads stay local
      S3_FILE_DOWNLOADS: "FALSE"
      S3_STATIC_FILES: "FALSE"

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          cache: pip

      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest

      - name: Run fast integration tests
        # Runs only tests marked `integration` that are NOT slow, functional, or browser.
        # This excludes:
        #   slow — full aggregation pipeline (requires AmpliconSuiteAggregator + minutes)
        #   functional — require `loaded_datasets` fixture (depend on slow tests completing)
        #   browser — require a running dev server and playwright
        run: |
          pytest -m "integration and not slow and not functional and not browser" \
            -v --tb=short
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
config.sh*
caper/config.env
*.swp
.env
/caper/caper.sqlite3
187 changes: 187 additions & 0 deletions AGENTS.md
@@ -0,0 +1,187 @@
# AGENTS.md — AmpliconRepository

A genomics data repository (ampliconrepository.org) for storing, browsing, and analysing DNA amplicon results produced by [AmpliconSuiteAggregator](https://github.com/AmpliconSuite/AmpliconSuiteAggregator). Built with Django + Mezzanine CMS.

---

## Critical: Environment Setup Before Any Django Command

**Always** source `caper/config.sh` before running any Django management command. It sets all required env vars (MongoDB URI, OAuth secrets, S3, Neo4j, email).

```bash
# Required pattern
source caper/config.sh && cd caper && python manage.py <command>

# Or use the helper script from project root
./run_django_command.sh <command>
```

Never commit `caper/config.sh` or `caper/.env` to version control.

---

## Architecture Overview

### Dual-Database Design (key non-obvious detail)
The app uses **two completely separate databases**:

| Database | Purpose | Access |
|---|---|---|
| **SQLite** (`caper/caper.sqlite3`) | Django auth, sessions, Mezzanine CMS pages | Django ORM via `models.py` |
| **MongoDB** (`DB_URI_SECRET` env var) | All project/sample/feature data | PyMongo directly via `utils.py` globals |

**Do not** use Django ORM for project/sample data — all project queries go through `collection_handle` from `caper/caper/utils.py`. The `dbrouters.py` `RunsDBRouter` is a leftover artefact and not actively routing; real MongoDB access bypasses Django's ORM entirely.

### Third Database: Neo4j
Co-amplification graph data is stored in Neo4j (bolt port 7687). See `caper/caper/neo4j_utils.py`. The driver connects using `NEO4J_PASSWORD_SECRET` env var.

### Key Global Handles (defined at module level in `utils.py`)
```python
collection_handle # MongoDB 'projects' collection (secondary-preferred reads)
collection_handle_primary # Same collection, primary reads (for writes/admin)
audit_log_handle # MongoDB 'project_audit_log' collection
fs_handle # GridFS handle (large files / tarballs)
```
These are imported directly across `views.py`, `search.py`, `site_stats.py`, etc.
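
As a rough sketch of how a query against these handles might be built, the helper below assembles a filter for live, public projects. It assumes the field names listed under "Data Model (MongoDB)" (`current`, `delete`, `private`); the function name `live_public_projects_filter` is hypothetical and is not part of `utils.py`.

```python
def live_public_projects_filter(extra=None):
    """Build a MongoDB filter dict for current, non-deleted, public projects.

    Hypothetical helper for illustration only; the field names are taken
    from the Data Model section of this document.
    """
    query = {"current": True, "delete": False, "private": "public"}
    if extra:
        # Caller-supplied conditions are merged on top of the defaults.
        query.update(extra)
    return query


# Assumed usage with the real handle (not executed here):
#   projects = collection_handle.find(live_public_projects_filter())
print(live_public_projects_filter({"project_name": "demo"}))
```

Because the filter is a plain dict, it can be unit-tested without a database. Documents that still carry a legacy boolean in `private` would not match the string `"public"`, so a real query may need to account for both forms.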

---

## Code Structure

```
caper/caper/                # Main Django app
  views.py                  # ~5000 lines — primary request handlers
  views_admin.py            # Admin-only pages (stats, delete, email)
  views_apis.py             # REST upload API (FileUploadView, ProjectFileAddView)
  utils.py                  # MongoDB connection + all shared helpers (1000+ lines)
  models.py                 # SQLite-backed Django models (auth admin actions only)
  settings.py               # All config; reads env vars set by config.sh
  neo4j_utils.py            # Co-amplification graph load/query
  search.py                 # MongoDB-based project/sample search
  extra_metadata.py         # CSV/TSV/XLSX metadata attachment to samples
  gridfs_cache.py           # Django cache wrapper around GridFS reads
  tar_utils.py              # Stream-extract files from GridFS-stored tarballs
  site_stats.py             # Aggregated stats stored in MongoDB 'site_statistics'
  context_processor.py      # Also stores system flags (shutdown, registration) in MongoDB
  schema_validate.py        # JSON schema validation for project documents
  management/commands/create_project.py  # CLI to create a project from local/HTTP/S3 file
caper/templates/            # Django templates (Mezzanine host-themes loader)
caper/schema/               # schema.json for validating MongoDB project documents
```

---

## Data Model (MongoDB)

Projects live in the `projects` collection. Notable fields:
- `private`: `"private"` | `"public"` | `"hidden_public"` (use `utils.normalize_visibility_field()` when reading legacy boolean values)
- `current: True` — only the latest version of a renamed/updated project
- `previous_versions` — list of prior project `_id`s (version chain)
- `delete: False` — soft-delete flag
- `runs` — dict of run-name → list of sample dicts
- `project_members` — comma-separated usernames/emails controlling access
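
The two string-valued fields above can be illustrated with a small sketch. It is not the code of `utils.normalize_visibility_field()`, whose behaviour is not shown here; in particular, the mapping of legacy booleans (`True` to `"private"`, `False` to `"public"`) is an assumption, and both function names are hypothetical.

```python
VALID_VISIBILITY = ("private", "public", "hidden_public")


def normalize_visibility_sketch(value):
    """Map a legacy boolean or a string onto one of the three visibility strings.

    Assumption (not confirmed by the source): legacy True meant private
    and legacy False meant public.
    """
    if isinstance(value, bool):
        return "private" if value else "public"
    if value in VALID_VISIBILITY:
        return value
    raise ValueError(f"unknown visibility value: {value!r}")


def split_members(project_members):
    """Split the comma-separated `project_members` string into clean entries."""
    return [m.strip() for m in project_members.split(",") if m.strip()]


print(normalize_visibility_sketch(True))
print(split_members("alice, bob@example.org ,"))
```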

Files (tarballs from AmpliconSuiteAggregator) are stored in **GridFS** and referenced by ObjectId within the project document. Use `tar_utils.extract_from_project_tarfile()` to stream-extract specific paths without writing the full tar to disk.
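
To show what stream-extraction means in practice, here is a minimal standard-library sketch. It does not reproduce `tar_utils.extract_from_project_tarfile()`, whose signature is not shown in this document; `extract_member_from_stream` and `build_demo_archive` are hypothetical names, and an in-memory buffer stands in for the GridFS file object.

```python
import io
import tarfile


def extract_member_from_stream(fileobj, member_path):
    """Return the bytes of one member of a gzipped tar stream, or None if absent."""
    # Mode "r|gz" reads the archive sequentially, so the source does not need
    # to be seekable and nothing is written to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.name == member_path and member.isfile():
                return tar.extractfile(member).read()
    return None


def build_demo_archive(files):
    """Create an in-memory .tar.gz from a {path: bytes} mapping (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


archive = build_demo_archive({"results/run.json": b'{"ok": true}', "README.txt": b"demo"})
print(extract_member_from_stream(archive, "results/run.json"))
```

Sequential reading trades random access for low memory use: members before the target are skipped rather than buffered.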

---

## Developer Workflows

### Local dev server
```bash
source caper/config.sh && cd caper && python manage.py runserver
# visit http://localhost:8000
```

### Docker dev (simplest for new setup)
```bash
mkdir -p logs tmp .aws .git
docker compose -f docker-compose-dev.yml build --no-cache
docker compose -f docker-compose-dev.yml up -d
# visit http://localhost:8000
docker compose -f docker-compose-dev.yml down
```

### Create a project from CLI
```bash
source caper/config.sh && cd caper && \
python manage.py create_project <project_name> <username> <path_or_url.tar.gz> \
--visibility public --description "My project"
```
Accepts local paths, HTTP URLs, or `s3://` URIs.

### Purge local MongoDB data
```bash
python purge-local-db.py
```

### Do NOT commit
- `caper/caper.sqlite3`
- `caper/config.sh` / `.env`

---

## Auth & Social Login

- Uses `django-allauth` with **Google** and **Globus** OAuth2 providers.
- `CustomAccountAdapter` and `SocialAccountAdapter` (in `utils.py`) prevent username/email cross-collisions and respect the `registration_disabled` flag stored in MongoDB `system_settings`.
- `ACCOUNT_EMAIL_VERIFICATION = 'none'` — email verification is off.

---

## Mezzanine CMS Integration

Mezzanine provides the CMS page tree, admin UI (Grappelli), and URL catch-all. **Add all custom URL patterns above** the `path("", include("mezzanine.urls"))` line in `urls.py` — Mezzanine's catch-all will shadow anything placed after it.
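
The ordering rule follows from first-match-wins URL resolution. The sketch below is a simplified stand-in written with plain regular expressions, not Django or Mezzanine code; the pattern list and the view names `project_page` and `mezzanine_page` are hypothetical.

```python
import re

# A custom route and a catch-all, modelled as (regex, view name) pairs.
CUSTOM = (re.compile(r"^project/(?P<project_id>\w+)/$"), "project_page")
CATCH_ALL = (re.compile(r"^.*$"), "mezzanine_page")

CORRECT_ORDER = [CUSTOM, CATCH_ALL]  # custom pattern listed before the catch-all
WRONG_ORDER = [CATCH_ALL, CUSTOM]    # custom pattern is never reached


def resolve(path, patterns):
    """Return the name of the first view whose pattern matches `path`."""
    for regex, view_name in patterns:
        if regex.match(path):
            return view_name
    return None


print(resolve("project/abc123/", CORRECT_ORDER))
print(resolve("project/abc123/", WRONG_ORDER))
```

With the catch-all first, every path resolves to the CMS view, which is why custom patterns have to be listed before it.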

---

## Test-Driven Development

For bug fixes and new features, **start by writing a failing test** before touching production code. This keeps changes focused and verifiable.

### Workflow

1. **Write a failing test** that reproduces the bug or exercises the new behaviour.
2. Confirm the test fails for the right reason.
3. Implement the minimal code change to make the test pass.
4. Verify no existing tests regressed.

### Running tests

Tests live in `tests/`. There are two suites:

```bash
# Fast suite (mocked DB — no live MongoDB required)
source caper/config.sh && cd caper && python -m pytest ../tests/ -m "not slow" -v

# Slow suite (requires live MongoDB — the default for new tests)
source caper/config.sh && cd caper && python -m pytest ../tests/ -m slow -v

# Full suite
source caper/config.sh && cd caper && python -m pytest ../tests/ -v
```

### Test datasets

Additional test data files (tarballs, sample inputs, etc.) are available on Google Drive:
https://drive.google.com/drive/folders/1lp6NUPWg1C-72CQQeywucwX0swnBFDvu?usp=drive_link

New tests go in the **slow suite** by default (mark with `@pytest.mark.slow`). Only move a test to the fast suite if it genuinely requires no database access and can be fully covered by mocks.

```python
import pytest

@pytest.mark.slow
def test_my_feature(client, live_mongo):
    # arrange → act → assert
    ...
```

---

## PR Checklist

- Never include `caper.sqlite3` in commits or PRs.
- Minimum manual smoke-test: home page, CCLE project page, any CCLE sample page.
- Versioned releases use tag pattern `v<major>.<minor>.<patch>_<MMDDYY>` (e.g., `v1.0.1_072523`).
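
A tag in this format can be checked mechanically. The following is a minimal sketch; the helper name `parse_release_tag` is hypothetical, and two-digit years are interpreted by Python's `%y` rule.

```python
import re
from datetime import datetime

# v<major>.<minor>.<patch>_<MMDDYY>
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)_(\d{6})$")


def parse_release_tag(tag):
    """Split a release tag into a version tuple and a release date.

    Raises ValueError if the tag does not match the pattern or the
    MMDDYY part is not a real calendar date.
    """
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"tag does not match v<major>.<minor>.<patch>_<MMDDYY>: {tag!r}")
    version = tuple(int(match.group(i)) for i in (1, 2, 3))
    released = datetime.strptime(match.group(4), "%m%d%y").date()
    return version, released


print(parse_release_tag("v1.0.1_072523"))
```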

71 changes: 60 additions & 11 deletions README.md
@@ -340,7 +340,14 @@ docker inspect -f \

# Running the automated tests <a name="tests"></a>

The test suite exercises project creation and editing end-to-end against a live MongoDB instance.
The test suite is organized into several tiers by speed and prerequisites, all driven by `pytest` from the repository root.

| Marker | Description | Prerequisites |
|--------|-------------|---------------|
| `integration` | Unit-style tests against live MongoDB; fast (< 5 s each) | MongoDB running |
| `slow` | Full aggregation pipeline per test (several minutes each) | MongoDB + AmpliconSuiteAggregator |
| `functional` | End-to-end view tests that depend on pre-loaded datasets | Everything above |
| `browser` | Playwright browser tests; require a running dev server | Everything above + dev server |
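
If these markers are not already declared in `pytest.ini`, one way to register them is from a `conftest.py` hook, so that `-m` expressions do not trigger unknown-marker warnings. This is a sketch under that assumption: the `MARKERS` dict is illustrative, while `pytest_configure` and `config.addinivalue_line` are standard pytest hooks and APIs.

```python
# conftest.py (sketch)
MARKERS = {
    "integration": "tests against live MongoDB; fast",
    "slow": "full aggregation pipeline; needs AmpliconSuiteAggregator",
    "functional": "end-to-end view tests that depend on pre-loaded datasets",
    "browser": "Playwright tests; need a running dev server",
}


def pytest_configure(config):
    """pytest hook: declare each custom marker with a one-line description."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test then opts into a tier with a decorator such as `@pytest.mark.integration`, and several markers can be stacked on one test.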

## Prerequisites

@@ -355,7 +362,9 @@ The same environment you use to run the development server is required:

2. **Install the test runner** (one-time, if not already present):
```bash
pip install pytest pytest-django
pip install pytest
# For browser tests only:
pip install pytest-playwright && playwright install chromium
```

3. **MongoDB running locally** — the tests write to the `caper-dev` database on `localhost:27017`:
@@ -367,30 +376,70 @@ The same environment you use to run the development server is required:
```bash
source caper/config.sh
```
The tests read `caper/config.env` automatically, but `config.sh` must have been sourced at least once in the current shell session so that any shell-level exports your environment depends on are present.
The tests read `caper/config.env` automatically, but `config.sh` must have been sourced
at least once in the current shell so that shell-level exports are present.

5. **AmpliconSuiteAggregator available** (for `slow` and `functional` tests only) — confirm
`AGGREGATOR_DEV_PATH` in `config.sh` points to a valid aggregator installation.
Tests that use sample-name remapping require v6 or later.

5. **AmpliconSuiteAggregator available** — confirm `AGGREGATOR_DEV_PATH` in `config.sh` points to a valid aggregator installation. Tests that use sample-name remapping require v6 or later.
6. **Test datasets present** — place the following files in `test_data/`:
- `one_amprepo_sample.tar.gz` + `one_amprepo_sample.xlsx` — 1 sample, hg19 (tracked in git)
- `Contino_unagg_040423.tar.gz` — 9 samples, hg38 (download from the shared Google Drive)
- `two_hg38_samples_no_ecdna.tar.gz` — 2 hg38 samples, no ecDNA (Google Drive)

The [testing datasets](https://drive.google.com/drive/folders/1lp6NUPWg1C-72CQQeywucwX0swnBFDvu?usp=share_link)
are available on Google Drive.

## Running the tests

From the top-level `caper/` directory (where `pytest.ini` lives):
All commands are run from the **repository root** (where `pytest.ini` lives).

**Fast integration tests only** (no aggregation; safe to run any time):
```bash
pytest -m "integration and not slow and not functional and not browser" -v
```

**Full integration + functional tests** (requires AmpliconSuiteAggregator; ~10 min):
```bash
pytest -m "integration and not browser" -v
```

**End-to-end project lifecycle** (slow, creates and aggregates real projects):
```bash
pytest -m "slow and integration" -v
```

Expected output with a correctly configured environment:
**Browser tests** (requires a running dev server on port 8000):
```bash
# Terminal 1: start the dev server
cd caper && python manage.py runserver

# Terminal 2: run browser tests
pytest -m browser --base-url http://localhost:8000 -v
```

Expected output for fast integration tests with a correctly configured environment:
```
tests/test_create_edit_project.py::test_create_tar_only PASSED
tests/test_create_edit_project.py::test_create_tar_and_metadata_no_remap PASSED
tests/test_create_edit_project.py::test_create_tar_and_metadata_with_remap PASSED
tests/test_create_edit_project.py::test_create_then_edit_with_remap PASSED
tests/test_api.py::test_background_task_status_returns_200 PASSED
tests/test_error_handling.py::test_create_project_without_file PASSED
tests/test_error_handling.py::test_project_page_nonexistent_id PASSED
tests/test_error_handling.py::test_download_nonexistent_project PASSED
```

## Continuous Integration

Pushes and pull requests to `main` automatically run the fast integration tests via
GitHub Actions (`.github/workflows/tests.yml`). The workflow spins up a MongoDB 6
service container and runs `pytest -m "integration and not slow and not functional and not browser"`.

Slow, functional, and browser tests are excluded from CI because they require
AmpliconSuiteAggregator, large test datasets, and a running dev server.
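
The workflow passes switches such as `S3_FILE_DOWNLOADS: "FALSE"` as strings. In Python any non-empty string is truthy, so such values need explicit parsing. How `settings.py` actually reads them is not shown here; `env_flag` below is a hypothetical helper illustrating one safe approach.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean switch.

    Hypothetical helper: accepts 1/true/yes/on (case-insensitive) as True,
    treats any other set value as False, and falls back to `default` when unset.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


ci_env = {"S3_FILE_DOWNLOADS": "FALSE", "S3_STATIC_FILES": "FALSE"}
print(env_flag("S3_FILE_DOWNLOADS", environ=ci_env))
print(bool(ci_env["S3_FILE_DOWNLOADS"]))  # naive check: a non-empty string is truthy
```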

## Cleanup

Each test removes all artifacts it created — MongoDB documents, `tmp/` directories, and S3 objects — in a `finally` block, so cleanup happens even when a test fails.
Each test removes all artifacts it created — MongoDB documents, `tmp/` directories, and S3
objects — in a `finally` block, so cleanup happens even when a test fails.
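
The cleanup guarantee rests on `try`/`finally`. A minimal standard-library sketch of the pattern, limited to a temporary directory, is shown below; `run_with_cleanup` is a hypothetical name, and the real tests also remove MongoDB documents and S3 objects.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_cleanup(test_body):
    """Run test_body(workdir) and always delete workdir afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix="caper-test-"))
    try:
        return test_body(workdir)
    finally:
        # Runs on success and on failure, so no artifacts are left behind.
        shutil.rmtree(workdir, ignore_errors=True)


def failing_body(workdir):
    (workdir / "artifact.txt").write_text("temporary output")
    raise RuntimeError("simulated test failure")


try:
    run_with_cleanup(failing_body)
except RuntimeError as exc:
    print(f"test failed ({exc}), workdir was still removed")
```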

---
