---
description: "End-to-end caribou demo for MegaDetector-Overhead: download the Zenodo OWL-C weights and test patches, run OWL-C inference on GPU or CPU, and visualize the predictions."
---

# Caribou Demo (download → infer → visualize)

This walkthrough takes you from a fresh clone to visualized OWL-C predictions on real caribou aerial patches. It downloads the weights and test patches of the public Caribou Aerial Survey Dataset from Zenodo, evaluates them with tools/test.py, and renders the detections onto the patches as PNGs.

The demo auto-detects your hardware. It runs on a CUDA GPU when one is available and otherwise falls back to CPU, so no GPU is required.

!!! note "About the weights"
    The Zenodo release labels the checkpoint "HerdNet (DLA-34)". In this repo the same DLA-34 detection branch is registered as OWL-C, so the demo loads it under model.name: OWLC. They are the same network.

## Prerequisites

Install the environment with uv (see Installation):

```bash
uv sync
uv run python -c "import animaloc.models, dinov3; print('OK')"
```

You also need curl and unzip on your PATH (both are standard on Linux/macOS).
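
If you want to check these prerequisites programmatically, for example in CI, a minimal Python sketch follows. The helper name missing_tools is hypothetical and is not part of the repo. It only checks that each executable is on PATH, not which version is installed.

```python
import shutil


def missing_tools(tools=("curl", "unzip")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; the demo script may check differently.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
print("missing tools:", ", ".join(missing) if missing else "none")
```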

## One command

```bash
./tools/demo_caribou.sh
```

This will:

1. Download weights.zip (216 MB) and test.zip (1.2 GB) from Zenodo into demo_data/ (skipped if already present).
2. Verify the weights' SHA-256 against the published checksum.
3. Build a deterministic 50-patch subset (40 annotated + 10 background).
4. Auto-detect the device (GPU if available, else CPU).
5. Run OWL-C inference (tools/test.py) with Weights & Biases disabled.
6. Render predictions onto every patch with tools/visualize_detections.py.
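
The script's subset logic is not reproduced on this page. To show what "deterministic" means here, the sketch below sorts the filenames and draws from them with a fixed seed, which makes the selection independent of filesystem listing order. This is an assumption, and the real selection rule may differ. annotated would be the image names that appear in gt.csv and background the remaining patches. build_subset and its parameters are hypothetical.

```python
import random


def build_subset(annotated, background, n_annotated=40, n_background=10, seed=0):
    """Pick a reproducible subset of annotated and background patch names.

    Hypothetical sketch: sorting first removes any dependence on listing
    order, and a seeded RNG makes the draw repeatable across runs.
    """
    rng = random.Random(seed)
    pos = sorted(set(annotated))
    neg = sorted(set(background) - set(pos))
    picked_pos = rng.sample(pos, min(n_annotated, len(pos)))
    picked_neg = rng.sample(neg, min(n_background, len(neg)))
    return sorted(picked_pos) + sorted(picked_neg)


# Illustrative file names only.
annotated = [f"patch_{i:04d}.jpg" for i in range(100)]
background = [f"bg_{i:04d}.jpg" for i in range(30)]
subset = build_subset(annotated, background)
print(len(subset))
```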

Outputs:

| Path | Contents |
| --- | --- |
| `demo_data/run/metrics_results.csv` | F1 / precision / recall / MAE / RMSE |
| `demo_data/run/detections.csv` | One row per detection (`images, x, y, dscores, …`) |
| `demo_data/viz/*.png` | Patches with green = ground truth, red = predictions |
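
To inspect detections.csv without the visualizer, you can count the detections per image above a score threshold. This is a minimal sketch that assumes the column names shown in the table. It reads only images and dscores and ignores the elided columns. count_detections is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_detections(csv_file, score_threshold=0.2):
    """Count detections per image whose dscores value is >= score_threshold."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if float(row["dscores"]) >= score_threshold:
            counts[row["images"]] += 1
    return counts


# Tiny inline example standing in for demo_data/run/detections.csv.
sample = io.StringIO(
    "images,x,y,dscores\n"
    "a.jpg,10,20,0.9\n"
    "a.jpg,50,60,0.1\n"
    "b.jpg,100,120,0.4\n"
)
print(count_detections(sample))
```

For the real file, open it with `open("demo_data/run/detections.csv", newline="")` and pass the handle in.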

## Options

```bash
./tools/demo_caribou.sh --device cpu        # force CPU
./tools/demo_caribou.sh --device cuda        # force GPU
./tools/demo_caribou.sh --full               # run the full 2,607-patch test set
./tools/demo_caribou.sh --subset-size 100    # larger subset
./tools/demo_caribou.sh --score-threshold 0.3
```
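
The device auto-detection in step 4 and the --device flag can be pictured as the small resolver below. This is a sketch, not the script's code, and resolve_device is a hypothetical name. This page does not say whether the real script errors or silently falls back when --device cuda is forced without a GPU; this sketch assumes it raises an error. The returned string is the kind of value you would pass as ++test.device_name.

```python
def resolve_device(requested="auto", cuda_available=None):
    """Map a --device style choice ("auto", "cpu", "cuda") to "cpu" or "cuda".

    Hypothetical sketch. If cuda_available is None, PyTorch is asked (when it
    is installed); a missing PyTorch is treated as "no CUDA".
    """
    if requested not in ("auto", "cpu", "cuda"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "cpu":
        return "cpu"
    if cuda_available is None:
        try:
            import torch
        except ImportError:
            cuda_available = False
        else:
            cuda_available = torch.cuda.is_available()
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    if not cuda_available:
        raise RuntimeError("--device cuda was requested but no CUDA device is visible")
    return "cuda"


print(resolve_device("auto", cuda_available=False))
```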

## Expected results

On the default 50-patch subset (229 ground-truth points) you should see numbers close to:

recall ≈ 0.98   precision ≈ 0.89   f1 ≈ 0.93
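
As a quick sanity check on these numbers: F1 is the harmonic mean of precision and recall, so 2 × 0.89 × 0.98 / (0.89 + 0.98) ≈ 0.933, which rounds to the reported 0.93. The sketch below shows that arithmetic together with the standard count-based definitions. It is generic code, not taken from the repo.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf_from_counts(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


print(round(f1_score(0.89, 0.98), 3))
```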

These are consistent with the per-patch validation regime reported for the checkpoint (val F1 = 0.937). The paper headline (F1 = 0.965 at τ = 20 px) is reported on the full test set (--full) at a slightly different confidence operating point; see Evaluation operating point and Datasets. GPU and CPU produce identical detections, and only the speed differs: on a Tesla V100 the subset runs about 25× faster than on CPU.
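
To check the CPU/GPU claim on your own machine, you could run the demo once with --device cpu and once with --device cuda, copying demo_data/run/detections.csv aside between runs, and then compare the two files. This is a minimal sketch that assumes the column names from the Outputs table. load_detections and same_detections are hypothetical helpers. The small score tolerance is a precaution in case floating-point scores differ in the last digits.

```python
import csv
import io


def load_detections(csv_file):
    """Map (image, x, y) -> score from a detections.csv-style file."""
    return {
        (row["images"], float(row["x"]), float(row["y"])): float(row["dscores"])
        for row in csv.DictReader(csv_file)
    }


def same_detections(a, b, score_tol=1e-4):
    """True if both runs found the same points with scores within score_tol."""
    if a.keys() != b.keys():
        return False
    return all(abs(a[k] - b[k]) <= score_tol for k in a)


cpu = load_detections(io.StringIO("images,x,y,dscores\na.jpg,10,20,0.91\n"))
gpu = load_detections(io.StringIO("images,x,y,dscores\na.jpg,10,20,0.91003\n"))
print(same_detections(cpu, gpu))
```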

## Manual walkthrough

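If you prefer to run the steps yourself, use the commands below. As written, they evaluate and visualize every patch in test.zip rather than the 50-patch subset, and they skip the checksum verification that the script performs: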

```bash
# 1. Download + extract
mkdir -p demo_data/weights demo_data/test
curl -fL -o demo_data/weights.zip \
    "https://zenodo.org/api/records/20767534/files/weights.zip/content"
curl -fL -o demo_data/test.zip \
    "https://zenodo.org/api/records/20767534/files/test.zip/content"
unzip -q demo_data/weights.zip -d demo_data/weights
unzip -q demo_data/test.zip   -d demo_data/test

# 2. Run OWL-C eval (CPU shown; use ++test.device_name=cuda for GPU)
export OWL_DEMO_DATA="$(pwd)/demo_data"
WANDB_MODE=disabled uv run python tools/test.py test=owlc_caribou_demo \
    ++test.device_name=cpu \
    ++test.model.pth_file="$OWL_DEMO_DATA/weights/best_model.pth" \
    ++test.dataset.root_dir="$OWL_DEMO_DATA/test" \
    ++test.dataset.csv_file="$OWL_DEMO_DATA/test/gt.csv" \
    ++hydra.run.dir="$OWL_DEMO_DATA/run"

# 3. Visualize predictions onto the patches
#    (predictions are saved in the model's down-sampled space; OWL-C uses
#     down_ratio=2, so pass --pred-scale 2 to map them onto the patch)
uv run python tools/visualize_detections.py \
    --detections "$OWL_DEMO_DATA/run/detections.csv" \
    --images-dir "$OWL_DEMO_DATA/test" \
    --output-dir "$OWL_DEMO_DATA/viz" \
    --gt "$OWL_DEMO_DATA/test/gt.csv" \
    --score-threshold 0.2 --pred-scale 2 --all-images
```
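
The manual steps do not verify the download. A minimal Python sketch for doing that yourself follows. sha256_of and verify are hypothetical helpers, and the expected digest must come from the published checksum, which is not reproduced here. This page also does not say whether the checksum covers weights.zip or the extracted best_model.pth, so compare against whichever file it was published for.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return the digest if it matches expected_hex (case-insensitive), else raise."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"{path}: SHA-256 {actual} does not match {expected_hex}")
    return actual


# Self-check on a throwaway file whose digest is a well-known test vector
# (SHA-256 of b"abc"). For the demo, pass demo_data/weights.zip and the
# published checksum instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
try:
    print(verify(tmp.name, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"))
finally:
    os.remove(tmp.name)
```

From the shell, `sha256sum demo_data/weights.zip` (Linux) or `shasum -a 256 demo_data/weights.zip` (macOS) prints the same digest.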

The portable demo config lives at configs/test/owlc_caribou_demo.yaml — unlike the author-specific eval configs, it hardcodes no machine paths (they come from OWL_DEMO_DATA or ++ overrides) and defaults to CPU.

## Evaluation operating point

The demo config (configs/test/owlc_caribou_demo.yaml) evaluates with:

- Match radius: τ = 20 image px. The configured evaluator.threshold: 10 is measured on the half-resolution heatmap (down_ratio: 2, stitcher up: False). Ground truth is down-sampled by the same factor, so 10 heatmap px correspond to 20 original px.
- Confidence (peak selection): adapt_ts: 0.3 (LMDS), with neg_ts: 0.1 and a (3, 3) peak kernel.
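
To make the peak-selection settings concrete, here is a simplified 3×3 local-maximum picker on a small heatmap. It assumes that adapt_ts behaves like a plain score floor. The repo's LMDS may apply adapt_ts and neg_ts differently, and neg_ts is not modelled here, so treat this as an illustration only. find_peaks is a hypothetical name. Ties on a plateau yield several peaks in this sketch.

```python
def find_peaks(heatmap, threshold=0.3, kernel=3):
    """Return (row, col, score) for pixels that are the maximum of their
    kernel x kernel neighbourhood and score at least `threshold`.

    Simplified stand-in for LMDS peak selection, not the repo's implementation.
    """
    half = kernel // 2
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            window = [
                heatmap[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            if value == max(window):
                peaks.append((r, c, value))
    return peaks


heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.4],
    [0.0, 0.0, 0.3, 0.5],
]
# Rows are y and columns are x in heatmap pixels.
print(find_peaks(heatmap))
```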

This mirrors the per-patch validation regime (val F1 ≈ 0.937). The paper's headline F1 = 0.965 is reported at a slightly different operating point (c* = 0.20); see Datasets.
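
The role of the match radius can be illustrated with a small point-matching sketch. Predictions are matched one-to-one to ground truth within τ. Matched predictions count as true positives, unmatched predictions as false positives, and unmatched ground truth as false negatives. The sketch uses greedy closest-first matching with an inclusive radius; both are assumptions, and the repo's evaluator may assign matches differently (for example with the Hungarian algorithm). It also shows why 10 heatmap px at down_ratio 2 is the same test as 20 image px: scaling every coordinate by 2 scales every distance by 2.

```python
import math


def match_points(preds, gts, radius):
    """Greedy one-to-one matching of predicted to ground-truth points.

    Candidate pairs within `radius` are taken closest-first, and each point is
    used at most once. Returns (tp, fp, fn).
    """
    pairs = sorted(
        (math.dist(p, g), i, j)
        for i, p in enumerate(preds)
        for j, g in enumerate(gts)
        if math.dist(p, g) <= radius
    )
    used_p, used_g = set(), set()
    for _, i, j in pairs:
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
    tp = len(used_p)
    return tp, len(preds) - tp, len(gts) - tp


DOWN_RATIO = 2
gts_img = [(100.0, 100.0), (300.0, 300.0), (400.0, 50.0)]  # original 512-px space
preds_heat = [(52.0, 49.0), (150.0, 158.0), (10.0, 10.0)]  # heatmap space

# Heatmap space: ground truth down-sampled, radius 10 heatmap px.
gts_heat = [(x / DOWN_RATIO, y / DOWN_RATIO) for x, y in gts_img]
print(match_points(preds_heat, gts_heat, radius=10))

# Image space: predictions scaled up, radius 20 image px; same counts.
preds_img = [(x * DOWN_RATIO, y * DOWN_RATIO) for x, y in preds_heat]
print(match_points(preds_img, gts_img, radius=10 * DOWN_RATIO))
```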

!!! note "Detection coordinate space"
    With up: False, tools/test.py writes detections.csv in the model's down-sampled space (x, y in 0…255 for a 512-px patch at down_ratio=2). Ground truth in gt.csv is in the original 512-px space. The visualizer's --pred-scale 2 rescales the predictions so that the two overlay correctly.
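
One quick way to tell whether a set of predictions is still in heatmap space is to check whether every point fits inside the down-sampled grid (0…255 for a 512-px patch at down_ratio=2). The sketch below pairs that heuristic with a plain multiplication, which is the rescaling we assume --pred-scale performs. The helper names are hypothetical. The heuristic can be fooled by a patch whose true detections all sit in its top-left quarter.

```python
def likely_downsampled(points, patch_size=512, down_ratio=2):
    """Heuristic: True if every (x, y) point fits inside the down-sampled grid."""
    limit = patch_size / down_ratio
    return bool(points) and all(0 <= x < limit and 0 <= y < limit for x, y in points)


def rescale(points, pred_scale=2):
    """Map heatmap-space (x, y) points to full-resolution patch pixels."""
    return [(x * pred_scale, y * pred_scale) for x, y in points]


preds = [(12.0, 40.0), (255.0, 128.0)]
print(likely_downsampled(preds))
print(rescale(preds))
```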

## Visualizing detections on your own runs

tools/visualize_detections.py works with any detections.csv produced by tools/test.py:

```bash
uv run python tools/visualize_detections.py \
    --detections path/to/detections.csv \
    --images-dir path/to/patches \
    --output-dir path/to/viz \
    --pred-scale 2 \
    [--gt path/to/gt.csv] [--score-threshold 0.2] [--all-images]
```

Predicted points are drawn in red; if --gt is given, ground-truth points are drawn in green. Each patch is captioned with its predicted (and GT) point count. Pass --pred-scale equal to the model's down_ratio (2 for OWL-C) so the down-sampled predictions land on the full-resolution patch; ground truth is never scaled.
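
If you want to draw your own overlays instead of using the visualizer, the per-patch preparation described above might look like the sketch below: filter by score, scale the predictions but not the ground truth, and count points for the caption. prepare_overlay, rows_for and the caption format are hypothetical. The sketch also assumes that gt.csv uses the same images, x, y column names as detections.csv, which this page does not state, and that the score threshold is inclusive.

```python
import csv
import io


def rows_for(csv_file, image):
    """Return the rows of a detections/GT CSV that belong to `image`."""
    return [row for row in csv.DictReader(csv_file) if row["images"] == image]


def prepare_overlay(det_rows, gt_rows=None, pred_scale=2, score_threshold=0.2):
    """Return (red prediction points, green GT points, caption) for one patch.

    Predictions are filtered by score and multiplied by pred_scale; GT points
    are used as-is because gt.csv is already in full-resolution pixels.
    """
    preds = [
        (float(r["x"]) * pred_scale, float(r["y"]) * pred_scale)
        for r in det_rows
        if float(r["dscores"]) >= score_threshold
    ]
    gt_points = [(float(r["x"]), float(r["y"])) for r in (gt_rows or [])]
    caption = f"pred: {len(preds)}"
    if gt_rows is not None:
        caption += f" | gt: {len(gt_points)}"
    return preds, gt_points, caption


dets_a = rows_for(io.StringIO(
    "images,x,y,dscores\n"
    "a.jpg,10,20,0.9\n"
    "a.jpg,30,40,0.1\n"
    "b.jpg,5,5,0.8\n"
), "a.jpg")
gts_a = rows_for(io.StringIO("images,x,y\na.jpg,21,39\n"), "a.jpg")
print(prepare_overlay(dets_a, gts_a))
```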

## Troubleshooting

| Symptom | Cause / Fix |
| --- | --- |
| `wandb: ERROR ...` or a login prompt | The demo script sets `WANDB_MODE=disabled` for you. When running `tools/test.py` by hand, set `WANDB_MODE=disabled` yourself (or run `wandb login`). |
| `CUDA: False` even though `nvidia-smi` shows a GPU | A plain `uv sync` installs the CPU build. Install a GPU build with `uv pip install torch torchvision --torch-backend=auto` (see Installation → GPU support). |
| `RuntimeError: ... unable to find an engine` on an older GPU | Some newer wheels omit kernels for older architectures (e.g. Volta / V100). Use `uv pip install torch torchvision --torch-backend=cu124`, which includes them. |
| Red prediction dots are squeezed into the top-left quarter of the patch | Predictions are in the model's down-sampled space. Pass `--pred-scale 2` (the OWL-C `down_ratio`) to the visualizer. |
| `ImportError: libGL.so.1` / `libgthread-2.0.so.0` | The non-headless OpenCV build needs system glib/GL libraries. The project pins `opencv-python-headless`; re-run `uv sync` if it was replaced. |
| Checksum mismatch on weights | A corrupted or partial download. Delete `demo_data/weights.zip` and `demo_data/weights/`, then re-run. |
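
To see which GPU architectures your installed PyTorch wheel ships kernels for, compare the device's compute capability with torch.cuda.get_arch_list(). A V100 reports capability (7, 0), i.e. sm_70. This is a diagnostic sketch, and gpu_supported is a hypothetical helper. A missing exact sm_XY tag is not always fatal, because wheels that include PTX (compute_XX entries) can JIT-compile for newer GPUs. For older GPUs, however, it strongly suggests that the wheel has no kernels for them.

```python
def arch_tag(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def gpu_supported(capability, arch_list):
    """True if the wheel's arch list contains native kernels for this GPU."""
    return arch_tag(capability) in arch_list


try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(torch.__version__, torch.version.cuda, arch_tag(cap), gpu_supported(cap, archs))
else:
    print("PyTorch not installed or CUDA not available (CPU build?)")
```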
