AmpliconSuite · liefeld · May 13, 2026
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -0,0 +1,71 @@
+name: Integration Tests
+
+on:
+  push:
+    branches: [ "main" ]
+  pull_request:
+    branches: [ "main" ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+
+    services:
+      mongodb:
+        image: mongo:6
+        ports:
+          - 27017:27017
+        options: >-
+          --health-cmd "mongosh --eval 'db.runCommand({ ping: 1 })'"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+
+    env:
+      # MongoDB — points to the service container above
+      DB_URI_SECRET: mongodb://localhost:27017
+      DB_NAME: caper-ci-test
+
+      # Django
+      DJANGO_SECRET_KEY: ci-test-secret-key-not-for-production
+      DJANGO_SETTINGS_MODULE: caper.settings
+      ACCOUNT_DEFAULT_HTTP_PROTOCOL: http
+      SECURE_SSL_REDIRECT: "FALSE"
+      AMPLICON_ENV: ci
+
+      # OAuth providers: OAuth is not exercised in CI tests, so each value below falls
+      # back to a placeholder when the repository secret is unset. To run browser tests
+      # that actually log in via OAuth, store real keys as GitHub repository secrets.
+      GOOGLE_SECRET_KEY: ${{ secrets.GOOGLE_SECRET_KEY || 'ci-placeholder' }}
+      GLOBUS_SECRET_KEY: ${{ secrets.GLOBUS_SECRET_KEY || 'ci-placeholder' }}
+      RECAPTCHA_PRIVATE_KEY: ${{ secrets.RECAPTCHA_PRIVATE_KEY || 'ci-placeholder' }}
+      RECAPTCHA_PUBLIC_KEY: ${{ secrets.RECAPTCHA_PUBLIC_KEY || 'ci-placeholder' }}
+
+      # S3 — disabled in CI; downloads stay local
+      S3_FILE_DOWNLOADS: "FALSE"
+      S3_STATIC_FILES: "FALSE"
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+          cache: pip
+
+      - name: Install dependencies
+        run: |
+          pip install --upgrade pip
+          pip install -r requirements.txt
+          pip install pytest
+
+      - name: Run fast integration tests
+        # Runs only tests marked `integration` that are NOT slow, functional, or browser.
+        # This excludes:
+        #   slow       — full aggregation pipeline (requires AmpliconSuiteAggregator + minutes)
+        #   functional — require `loaded_datasets` fixture (depend on slow tests completing)
+        #   browser    — require a running dev server and playwright
+        run: |
+          pytest -m "integration and not slow and not functional and not browser" \
+                 -v --tb=short
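Note on the marker expression in the final workflow step: the `-m` argument is a boolean filter applied to each test's markers. A minimal sketch of how such an expression picks tests, assuming a hypothetical `selects` helper and made-up test names. It illustrates the selection logic only and is not pytest's actual implementation.

```python
import re

# Marker expression used by the workflow's final step.
CI_EXPR = "integration and not slow and not functional and not browser"


def selects(expr: str, markers: set) -> bool:
    """Return True if a test carrying `markers` satisfies `expr` (hypothetical helper)."""
    def to_bool(match):
        word = match.group(0)
        # Keep boolean operators; turn every marker name into True/False.
        return word if word in {"and", "or", "not"} else str(word in markers)

    # Only identifiers, whitespace and parentheses are accepted in this sketch.
    if not re.fullmatch(r"[\w\s()]+", expr):
        raise ValueError(f"unsupported marker expression: {expr!r}")
    return eval(re.sub(r"[A-Za-z_]\w*", to_bool, expr), {"__builtins__": {}})


# Hypothetical tests and the markers they carry.
tests = {
    "test_status_endpoint": {"integration"},
    "test_full_aggregation": {"integration", "slow"},
    "test_project_view": {"integration", "functional"},
    "test_login_page": {"browser"},
}
selected = sorted(name for name, marks in tests.items() if selects(CI_EXPR, marks))
print(selected)
```

Under these assumptions only the test marked `integration` alone is selected; a test that also carries `slow` or `functional`, or that is marked only `browser`, is filtered out.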
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,5 @@
 config.sh*
+caper/config.env
 *.swp
 .env
 /caper/caper.sqlite3

diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,187 @@
+# AGENTS.md — AmpliconRepository
+
+A genomics data repository (ampliconrepository.org) for storing, browsing, and analysing DNA amplicon results produced by [AmpliconSuiteAggregator](https://github.com/AmpliconSuite/AmpliconSuiteAggregator). Built with Django + Mezzanine CMS.
+
+---
+
+## Critical: Environment Setup Before Any Django Command
+
+**Always** source `caper/config.sh` before running any Django management command. It sets all required env vars (MongoDB URI, OAuth secrets, S3, Neo4j, email).
+
+```bash
+# Required pattern
+source caper/config.sh && cd caper && python manage.py <command>
+
+# Or use the helper script from project root
+./run_django_command.sh <command>
+```
+
+Never commit `caper/config.sh` or `caper/.env` to version control.
+
+---
+
+## Architecture Overview
+
+### Dual-Database Design (key non-obvious detail)
+The app uses **two completely separate databases**:
+
+| Database | Purpose | Access |
+|---|---|---|
+| **SQLite** (`caper/caper.sqlite3`) | Django auth, sessions, Mezzanine CMS pages | Django ORM via `models.py` |
+| **MongoDB** (`DB_URI_SECRET` env var) | All project/sample/feature data | PyMongo directly via `utils.py` globals |
+
+**Do not** use Django ORM for project/sample data — all project queries go through `collection_handle` from `caper/caper/utils.py`. The `dbrouters.py` `RunsDBRouter` is a leftover artefact and not actively routing; real MongoDB access bypasses Django's ORM entirely.
+
+### Third Database: Neo4j
+Co-amplification graph data is stored in Neo4j (bolt port 7687). See `caper/caper/neo4j_utils.py`. The driver connects using `NEO4J_PASSWORD_SECRET` env var.
+
+### Key Global Handles (defined at module level in `utils.py`)
+```python
+collection_handle          # MongoDB 'projects' collection (secondary-preferred reads)
+collection_handle_primary  # Same collection, primary reads (for writes/admin)
+audit_log_handle           # MongoDB 'project_audit_log' collection
+fs_handle                  # GridFS handle (large files / tarballs)
+```
+These are imported directly across `views.py`, `search.py`, `site_stats.py`, etc.
+
+---
+
+## Code Structure
+
+```
+caper/caper/           # Main Django app
+  views.py             # ~5000 lines — primary request handlers
+  views_admin.py       # Admin-only pages (stats, delete, email)
+  views_apis.py        # REST upload API (FileUploadView, ProjectFileAddView)
+  utils.py             # MongoDB connection + all shared helpers (1000+ lines)
+  models.py            # SQLite-backed Django models (auth admin actions only)
+  settings.py          # All config; reads env vars set by config.sh
+  neo4j_utils.py       # Co-amplification graph load/query
+  search.py            # MongoDB-based project/sample search
+  extra_metadata.py    # CSV/TSV/XLSX metadata attachment to samples
+  gridfs_cache.py      # Django cache wrapper around GridFS reads
+  tar_utils.py         # Stream-extract files from GridFS-stored tarballs
+  site_stats.py        # Aggregated stats stored in MongoDB 'site_statistics'
+  context_processor.py # Also stores system flags (shutdown, registration) in MongoDB
+  schema_validate.py   # JSON schema validation for project documents
+  management/commands/create_project.py  # CLI to create a project from local/HTTP/S3 file
+caper/templates/       # Django templates (Mezzanine host-themes loader)
+caper/schema/          # schema.json for validating MongoDB project documents
+```
+
+---
+
+## Data Model (MongoDB)
+
+Projects live in the `projects` collection. Notable fields:
+- `private`: `"private"` | `"public"` | `"hidden_public"` (use `utils.normalize_visibility_field()` when reading legacy boolean values)
+- `current: True` — only the latest version of a renamed/updated project
+- `previous_versions` — list of prior project `_id`s (version chain)
+- `delete: False` — soft-delete flag
+- `runs` — dict of run-name → list of sample dicts
+- `project_members` — comma-separated usernames/emails controlling access
+
+Files (tarballs from AmpliconSuiteAggregator) are stored in **GridFS** and referenced by ObjectId within the project document. Use `tar_utils.extract_from_project_tarfile()` to stream-extract specific paths without writing the full tar to disk.
+
+---
+
+## Developer Workflows
+
+### Local dev server
+```bash
+source caper/config.sh && cd caper && python manage.py runserver
+# visit http://localhost:8000
+```
+
+### Docker dev (simplest for new setup)
+```bash
+mkdir -p logs tmp .aws .git
+docker compose -f docker-compose-dev.yml build --no-cache
+docker compose -f docker-compose-dev.yml up -d
+# visit http://localhost:8000
+docker compose -f docker-compose-dev.yml down
+```
+
+### Create a project from CLI
+```bash
+source caper/config.sh && cd caper && \
+  python manage.py create_project <project_name> <username> <path_or_url.tar.gz> \
+    --visibility public --description "My project"
+```
+Accepts local paths, HTTP URLs, or `s3://` URIs.
+
+### Purge local MongoDB data
+```bash
+python purge-local-db.py
+```
+
+### Do NOT commit
+- `caper/caper.sqlite3`
+- `caper/config.sh` / `.env`
+
+---
+
+## Auth & Social Login
+
+- Uses `django-allauth` with **Google** and **Globus** OAuth2 providers.
+- `CustomAccountAdapter` and `SocialAccountAdapter` (in `utils.py`) prevent username/email cross-collisions and respect the `registration_disabled` flag stored in MongoDB `system_settings`.
+- `ACCOUNT_EMAIL_VERIFICATION = 'none'` — email verification is off.
+
+---
+
+## Mezzanine CMS Integration
+
+Mezzanine provides the CMS page tree, admin UI (Grappelli), and URL catch-all. **Add all custom URL patterns above** the `path("", include("mezzanine.urls"))` line in `urls.py` — Mezzanine's catch-all will shadow anything placed after it.
+
+---
+
+## Test-Driven Development
+
+For bug fixes and new features, **start by writing a failing test** before touching production code. This keeps changes focused and verifiable.
+
+### Workflow
+
+1. **Write a failing test** that reproduces the bug or exercises the new behaviour.
+2. Confirm the test fails for the right reason.
+3. Implement the minimal code change to make the test pass.
+4. Verify no existing tests regressed.
+
+### Running tests
+
+Tests live in `tests/`. There are two suites:
+
+```bash
+# Fast suite (mocked DB — no live MongoDB required)
+source caper/config.sh && cd caper && python -m pytest ../tests/ -m "not slow" -v
+
+# Slow suite (requires live MongoDB — the default for new tests)
+source caper/config.sh && cd caper && python -m pytest ../tests/ -m slow -v
+
+# Full suite
+source caper/config.sh && cd caper && python -m pytest ../tests/ -v
+```
+
+New tests go in the **slow suite** by default (mark with `@pytest.mark.slow`). Only move a test to the fast suite if it genuinely requires no database access and can be fully covered by mocks.
+
+```python
+import pytest
+
+@pytest.mark.slow
+def test_my_feature(client, live_mongo):
+    # arrange → act → assert
+    ...
+```
+
+### Test datasets
+
+Additional test data files (tarballs, sample inputs, etc.) are available on Google Drive:
+https://drive.google.com/drive/folders/1lp6NUPWg1C-72CQQeywucwX0swnBFDvu?usp=drive_link
+
+---
+
+## PR Checklist
+
+- Never include `caper.sqlite3` in commits or PRs.
+- Minimum manual smoke-test: home page, CCLE project page, any CCLE sample page.
+- Versioned releases use tag pattern `v<major>.<minor>.<patch>_<MMDDYY>` (e.g., `v1.0.1_072523`).
+
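Note on the release tag pattern in the PR checklist: a tag such as `v1.0.1_072523` can be checked mechanically. A minimal sketch, assuming a hypothetical `parse_release_tag` helper that is not part of the repository and that the date suffix is a real calendar date:

```python
import re
from datetime import datetime

# v<major>.<minor>.<patch>_<MMDDYY>, e.g. v1.0.1_072523
TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)_(\d{6})")


def parse_release_tag(tag: str):
    """Return ((major, minor, patch), date) or raise ValueError (hypothetical helper)."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"tag does not match v<major>.<minor>.<patch>_<MMDDYY>: {tag!r}")
    major, minor, patch, stamp = match.groups()
    # strptime raises ValueError for suffixes that are not a valid MMDDYY date.
    released = datetime.strptime(stamp, "%m%d%y").date()
    return (int(major), int(minor), int(patch)), released


print(parse_release_tag("v1.0.1_072523"))
```

A check like this could run in a release script before tagging; whether the project wants such enforcement is left open here.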
diff --git a/README.md b/README.md
@@ -340,7 +340,14 @@ docker inspect -f \
 
 # Running the automated tests <a name="tests"></a>
 
-The test suite exercises project creation and editing end-to-end against a live MongoDB instance.
+The test suite is organized into several tiers by speed and prerequisites, all driven by `pytest` from the repository root.
+
+| Marker | Description | Prerequisites |
+|--------|-------------|---------------|
+| `integration` | Unit-style tests against live MongoDB; fast (< 5 s each) | MongoDB running |
+| `slow` | Full aggregation pipeline per test (several minutes each) | MongoDB + AmpliconSuiteAggregator |
+| `functional` | End-to-end view tests that depend on pre-loaded datasets | Everything above |
+| `browser` | Playwright browser tests; require a running dev server | Everything above + dev server |
 
 ## Prerequisites
 
@@ -355,7 +362,9 @@ The same environment you use to run the development server is required:
 
 2. **Install the test runner** (one-time, if not already present):
    ```bash
-   pip install pytest pytest-django
+   pip install pytest
+   # For browser tests only:
+   pip install pytest-playwright && playwright install chromium
    ```
 
 3. **MongoDB running locally** — the tests write to the `caper-dev` database on `localhost:27017`:
@@ -367,30 +376,70 @@ The same environment you use to run the development server is required:
    ```bash
    source caper/config.sh
    ```
-   The tests read `caper/config.env` automatically, but `config.sh` must have been sourced at least once in the current shell session so that any shell-level exports your environment depends on are present.
+   The tests read `caper/config.env` automatically, but `config.sh` must have been sourced
+   at least once in the current shell so that shell-level exports are present.
+
+5. **AmpliconSuiteAggregator available** (for `slow` and `functional` tests only) — confirm
+   `AGGREGATOR_DEV_PATH` in `config.sh` points to a valid aggregator installation.
+   Tests that use sample-name remapping require v6 or later.
 
-5. **AmpliconSuiteAggregator available** — confirm `AGGREGATOR_DEV_PATH` in `config.sh` points to a valid aggregator installation. Tests that use sample-name remapping require v6 or later.
+6. **Test datasets present** — place the following files in `test_data/`:
+   - `one_amprepo_sample.tar.gz` + `one_amprepo_sample.xlsx` — 1 sample, hg19 (tracked in git)
+   - `Contino_unagg_040423.tar.gz` — 9 samples, hg38 (download from the shared Google Drive)
+   - `two_hg38_samples_no_ecdna.tar.gz` — 2 hg38 samples, no ecDNA (Google Drive)
+
+   The [testing datasets](https://drive.google.com/drive/folders/1lp6NUPWg1C-72CQQeywucwX0swnBFDvu?usp=share_link)
+   are available on Google Drive.
 
 ## Running the tests
 
-From the top-level `caper/` directory (where `pytest.ini` lives):
+All commands are run from the **repository root** (where `pytest.ini` lives).
+
+**Fast integration tests only** (no aggregation; safe to run any time):
+```bash
+pytest -m "integration and not slow and not functional and not browser" -v
+```
 
+**Full integration + functional tests** (require AmpliconSuiteAggregator; ~10 min):
+```bash
+pytest -m "integration and not browser" -v
+```
+
+**End-to-end project lifecycle** (slow, creates and aggregates real projects):
 ```bash
 pytest -m "slow and integration" -v
 ```
 
-Expected output with a correctly configured environment:
+**Browser tests** (require a running dev server on port 8000):
+```bash
+# Terminal 1: start the dev server
+source caper/config.sh && cd caper && python manage.py runserver
+
+# Terminal 2: run browser tests
+pytest -m browser --base-url http://localhost:8000 -v
+```
 
+Expected output for fast integration tests with a correctly configured environment:
 ```
-tests/test_create_edit_project.py::test_create_tar_only                   PASSED
-tests/test_create_edit_project.py::test_create_tar_and_metadata_no_remap  PASSED
-tests/test_create_edit_project.py::test_create_tar_and_metadata_with_remap PASSED
-tests/test_create_edit_project.py::test_create_then_edit_with_remap       PASSED
+tests/test_api.py::test_background_task_status_returns_200         PASSED
+tests/test_error_handling.py::test_create_project_without_file     PASSED
+tests/test_error_handling.py::test_project_page_nonexistent_id     PASSED
+tests/test_error_handling.py::test_download_nonexistent_project    PASSED
 ```
 
+## Continuous Integration
+
+Pushes and pull requests to `main` automatically run the fast integration tests via
+GitHub Actions (`.github/workflows/tests.yml`). The workflow spins up a MongoDB 6
+service container and runs `pytest -m "integration and not slow and not functional and not browser"`.
+
+Slow, functional, and browser tests are excluded from CI because they require
+AmpliconSuiteAggregator, large test datasets, or a running dev server.
+
 ## Cleanup
 
-Each test removes all artifacts it created — MongoDB documents, `tmp/` directories, and S3 objects — in a `finally` block, so cleanup happens even when a test fails.
+Each test removes all artifacts it created — MongoDB documents, `tmp/` directories, and S3
+objects — in a `finally` block, so cleanup happens even when a test fails.
 
 ---
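Note on the Cleanup section of the README: the guarantee that artifacts are removed even when a test fails comes from the `finally` block. A minimal sketch of that pattern, assuming a hypothetical `run_with_cleanup` helper and a temporary directory standing in for the `tmp/` artifacts. The MongoDB documents and S3 objects that the real tests also remove are not modeled here.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_cleanup(test_body):
    """Run `test_body(workdir)` and always delete `workdir` afterwards (hypothetical helper)."""
    workdir = Path(tempfile.mkdtemp(prefix="caper-test-"))
    try:
        test_body(workdir)
    finally:
        # Runs whether the body passed or raised, so no artifacts are left behind.
        shutil.rmtree(workdir, ignore_errors=True)


seen = []  # records the directory handed to the test body


def failing_body(workdir):
    seen.append(workdir)
    (workdir / "artifact.txt").write_text("scratch data")
    raise AssertionError("simulated test failure")


try:
    run_with_cleanup(failing_body)
except AssertionError:
    # The failure still propagates, but the directory is already gone.
    print("directory still exists:", seen[0].exists())
```

The exception is not swallowed: the failure is still reported to the caller, and the scratch directory has been removed by the time it arrives.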