Bathymetry-first shipwreck discovery and triage pipeline:
- builds training tiles from NOAA BAG surveys,
- trains deep models for wreck-presence detection,
- runs large-area streaming discovery with aggressive download parallelism,
- persists only positive detections plus geospatial metadata,
- and provides a reviewer webapp for rapid analyst validation.
This project is grounded in the workflow described in:
Character et al. (2021), "Archaeologic Machine Learning for Shipwreck Detection Using Lidar and Sonar," Remote Sensing, 13(9), 1759.
Paper PDF in this repo: remotesensing-13-01759-v2.pdf
Key idea we adopt: transform bathymetry into model-consumable imagery, then classify candidate targets with human-in-the-loop review.
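
As a rough illustration of the "bathymetry to imagery" step, the sketch below rescales a depth grid to an 8-bit image and reserves pixel value 0 for missing cells. It is not the project's actual compositing code: the function name depth_to_uint8, the min/max scaling, and the 1.0e6 nodata value (a fill value commonly seen in BAG elevation layers) are all assumptions made for this example.

```python
import numpy as np

BAG_NODATA = 1.0e6  # assumption: fill value commonly used in BAG elevation layers


def depth_to_uint8(depth, nodata=BAG_NODATA):
    """Scale valid cells of a 2-D depth grid to 1..255; nodata cells become 0."""
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.isfinite(depth) & (depth != nodata)
    out = np.zeros(depth.shape, dtype=np.uint8)
    if not valid.any():
        return out  # nothing to scale, image stays all zeros
    lo = depth[valid].min()
    hi = depth[valid].max()
    span = hi - lo
    if span == 0:
        out[valid] = 128  # flat seabed: use a mid-grey value
        return out
    scaled = (depth[valid] - lo) / span  # 0..1 within the tile
    out[valid] = np.round(1 + scaled * 254).astype(np.uint8)
    return out


if __name__ == "__main__":
    grid = [[-10.0, -20.0], [BAG_NODATA, -15.0]]
    print(depth_to_uint8(grid))
```

Per-tile scaling like this keeps local relief visible regardless of absolute depth, at the cost of making pixel values incomparable across tiles.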
The codebase has four major parts:
- Data ingestion + tiling
  - NOAA survey discovery and BAG URL resolution
  - BAG/TIF raster loading and tiling
  - composite tile generation for model input
- Modeling
  - YOLOv8 baseline detector
  - heatmap U-Net point model
  - ViT large-scale binary classifier for tile-level wreck presence
- Streaming discovery at scale
  - concurrent raster download workers
  - bounded producer/consumer queue
  - single inference consumer (stable MPS/CUDA behavior)
  - positive-only output persistence
- Review webapp
  - browse detections
  - inspect lat/lon, confidence, tile scale
  - nearest AWOIS cross-reference in modal
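
The streaming layout listed above (many download workers, a bounded queue, one inference consumer) can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: run_pipeline, download, and infer are hypothetical names, and the real pipeline may use a different concurrency mechanism.

```python
import queue
import threading

_DONE = object()  # sentinel telling the consumer that all producers have finished


def run_pipeline(urls, download, infer, workers=4, max_pending=8):
    """Download with `workers` threads; run `infer` on one consumer thread."""
    todo = queue.Queue()
    for url in urls:
        todo.put(url)
    # Bounded queue: producers block in put() when the consumer falls behind,
    # which caps how many downloaded rasters are held at once.
    ready = queue.Queue(maxsize=max_pending)
    results = []

    def producer():
        while True:
            try:
                url = todo.get_nowait()
            except queue.Empty:
                return
            ready.put(download(url))

    def consumer():
        while True:
            item = ready.get()
            if item is _DONE:
                return
            results.append(infer(item))

    consumer_thread = threading.Thread(target=consumer)
    producers = [threading.Thread(target=producer) for _ in range(workers)]
    consumer_thread.start()
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    ready.put(_DONE)  # safe: every producer has already exited
    consumer_thread.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(["a", "b", "c"], download=str.upper, infer=lambda x: x + "!")
    print(sorted(out))
```

Keeping inference on a single thread means only one caller ever touches the accelerator, which is one plausible reading of the "stable MPS/CUDA behavior" note.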
Prereqs:
- Python 3.11+
- GDAL/raster stack capable of reading .bag (or gdal_translate for BAG->TIF fallback)
cd /Users/warren/workspace/ship-search
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -e .

# Download demo data (NOAA BAG + wreck points)
ship-search download --out data/demo
# Prepare YOLO-style dataset
ship-search prepare --demo data/demo --out data/yolo_h13140
# Train
ship-search train --dataset data/yolo_h13140/dataset.yaml --epochs 50 --imgsz 416
# Predict over raster
ship-search predict \
--weights runs/detect/train/weights/best.pt \
--raster data/demo/H13140_MB_VR_MLLW.bag \
--out artifacts/preds

# Example train
ship-search vit-cls train \
--dataset data/datasets/scale10x_10k/dataset.yaml \
--model vit_giant_patch14_clip_224 \
--project runs/vit_cls \
--name vitg14_10k_balanced \
--device mps \
--balanced-sampling \
--auto-pos-weight

Florida:
ship-search discover florida \
--weights runs/vit_cls/vitg14_10k_balanced_h200/weights/best.pt \
--out artifacts/discovery_florida_mps_085 \
--conf-thresh 0.85 \
--download-workers 90 \
--device mps

Bahamas + Turks and Caicos:
ship-search discover bahamas-tci \
--weights runs/vit_cls/vitg14_10k_balanced_h200/weights/best.pt \
--out artifacts/discovery_bahamas_tci_mps_085 \
--conf-thresh 0.85 \
--download-workers 90 \
--device mps

Important discovery behavior:
- by default, deletes stale out/rasters/* at startup so that runs resume cleanly,
- skips previously seen rasters in the same output root,
- deletes downloaded rasters after processing unless --keep-rasters is set.
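
The "skip previously seen rasters" behavior can be pictured as an append-only JSONL ledger that is consulted before each download. The sketch below is an assumption about how such a ledger might work: load_seen and mark_processed are hypothetical helpers, and the raster_name key is borrowed from the detection fields rather than confirmed for the ledger file.

```python
import json
from pathlib import Path


def load_seen(ledger_path):
    """Return the set of raster names already recorded in the ledger."""
    path = Path(ledger_path)
    if not path.exists():
        return set()
    seen = set()
    with path.open() as fh:
        for line in fh:
            line = line.strip()
            if line:
                # "raster_name" is an assumed key for ledger rows
                seen.add(json.loads(line)["raster_name"])
    return seen


def mark_processed(ledger_path, raster_name):
    """Append one row to the ledger after a raster has been fully processed."""
    path = Path(ledger_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as fh:
        fh.write(json.dumps({"raster_name": raster_name}) + "\n")


if __name__ == "__main__":
    ledger = "artifacts/example/meta/processed_rasters.jsonl"
    seen = load_seen(ledger)
    for name in ["a.bag", "b.bag"]:
        if name in seen:
            continue  # already handled in an earlier run
        mark_processed(ledger, name)
```

Writing the ledger row only after processing finishes means an interrupted raster is retried on the next run instead of being silently skipped.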
Under --out, discovery writes:
- tiles/
  Positive detection PNGs only.
- meta/detections.jsonl
  One row per positive tile with:
  - tile identity/context (tile_id, survey_id, raster_name)
  - geospatial fields (center_lat, center_lon, bbox_lonlat)
  - model confidence (score)
  - tile geometry (tile_row, tile_col)
  - source metadata (source_resolution_m)
  - tile bathymetry stats (tile_depth_min_m, tile_depth_mean_m, tile_depth_max_m)
- meta/detections.geojson and meta/summary.json
- meta/processed_rasters.jsonl
  Resume ledger used to avoid reprocessing imagery already seen in prior runs.
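
Because meta/detections.jsonl holds one JSON object per line, it can be loaded without any project code. The helper below is a hypothetical sketch (load_detections is not part of the CLI); it relies only on the score and tile_id fields listed above.

```python
import json


def load_detections(path, min_score=0.0):
    """Read a detections JSONL file and return rows sorted by descending score."""
    rows = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            if row["score"] >= min_score:
                rows.append(row)
    rows.sort(key=lambda r: r["score"], reverse=True)
    return rows


if __name__ == "__main__":
    import sys

    for row in load_detections(sys.argv[1], min_score=0.9)[:10]:
        print(row["tile_id"], row["score"])
```

Run it with the path to a detections file as the only argument to list the ten highest-scoring tiles at or above 0.9.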
For analyst triage, detections can be joined to nearest known records from:
/Users/warren/Downloads/Wrecks_and_Obstructions_in_AWOIS.geojson
Current enriched files:
- meta/detections_nearest_awois.jsonl
- meta/detections_nearest_awois.csv
These include:
- nearest AWOIS point lat/lon
- distance_m
- key AWOIS fields (OBJECTID, record, depth, yearSunk, etc.)
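
One way to produce a nearest-record join like this is a brute-force haversine search over the AWOIS points. The sketch below is illustrative only: nearest_record is a hypothetical helper, the lat/lon keys on AWOIS records are assumed (a real GeoJSON file stores coordinates under geometry), and the project may use a spatial index instead.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_record(detection, records):
    """Return (record, distance_m) for the record closest to the detection centre."""
    def dist(rec):
        return haversine_m(
            detection["center_lat"], detection["center_lon"], rec["lat"], rec["lon"]
        )

    best = min(records, key=dist)
    return best, dist(best)


if __name__ == "__main__":
    det = {"center_lat": 25.0, "center_lon": -80.0}
    awois = [
        {"OBJECTID": 1, "lat": 25.0, "lon": -80.001},
        {"OBJECTID": 2, "lat": 26.0, "lon": -80.0},
    ]
    rec, distance_m = nearest_record(det, awois)
    print(rec["OBJECTID"], round(distance_m, 1))
```

A linear scan is fine for a few thousand detections; for larger joins a k-d tree or similar index would avoid the quadratic cost.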
Serve discovery outputs directly:
ship-search viewer serve \
--data-root artifacts/discovery_florida_mps_085 \
--host 127.0.0.1 \
--port 8000

In modal view, the app shows:
- detection lat/lon
- tile width/height in meters
- nearest AWOIS match distance and record/id
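
For reference, a tile's ground size can be approximated from its bbox_lonlat. The sketch below assumes the bbox is ordered (min_lon, min_lat, max_lon, max_lat), which is not confirmed by the output description, and uses a simple equirectangular approximation; the viewer may derive these values differently, for example from source_resolution_m.

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude


def tile_size_m(bbox_lonlat):
    """Approximate (width_m, height_m) of a small lon/lat bounding box.

    Assumes bbox_lonlat = (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox_lonlat
    mid_lat = math.radians((min_lat + max_lat) / 2)
    # A degree of longitude shrinks with the cosine of latitude.
    width = (max_lon - min_lon) * M_PER_DEG_LAT * math.cos(mid_lat)
    height = (max_lat - min_lat) * M_PER_DEG_LAT
    return width, height


if __name__ == "__main__":
    print(tile_size_m((-80.0, 25.0, -79.999, 25.001)))
```

The approximation is adequate for tiles a few hundred metres across; it degrades for large boxes and near the poles.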
Copied into a commit-friendly docs path:
docs/assets/positive_ids/H11896_H11896_MB_2m_MLLW_8of8_r00007_c00006.png
Reference row IDs:
- tile_id: H11896_H11896_MB_2m_MLLW_8of8_r00007_c00006
- detection source: artifacts/discovery_florida_mps_085/meta/detections.jsonl
- nearest AWOIS source: artifacts/discovery_florida_mps_085/meta/detections_nearest_awois.jsonl
- Research/prototyping only (not for navigation).
- BAG support depends on GDAL build options.
- If BAG open fails in your environment, convert with gdal_translate and run on GeoTIFF.
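
The GeoTIFF fallback can be scripted by shelling out to gdal_translate. The wrapper below is a hypothetical convenience (bag_to_geotiff_cmd and convert are not part of ship-search); it uses only the standard `-of GTiff` form of the command and assumes gdal_translate is on PATH.

```python
import shutil
import subprocess
from pathlib import Path


def bag_to_geotiff_cmd(bag_path, tif_path=None):
    """Build the gdal_translate argument list for a BAG -> GeoTIFF conversion."""
    bag_path = Path(bag_path)
    tif_path = Path(tif_path) if tif_path else bag_path.with_suffix(".tif")
    return ["gdal_translate", "-of", "GTiff", str(bag_path), str(tif_path)]


def convert(bag_path, tif_path=None):
    """Run the conversion and return the output path; raises if GDAL is missing."""
    if shutil.which("gdal_translate") is None:
        raise RuntimeError("gdal_translate not found on PATH")
    cmd = bag_to_geotiff_cmd(bag_path, tif_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])


if __name__ == "__main__":
    print(" ".join(bag_to_geotiff_cmd("data/demo/H13140_MB_VR_MLLW.bag")))
```

Variable-resolution BAGs may need extra driver open options to export the full-resolution grid; check the GDAL BAG driver documentation for your GDAL version before relying on the default output.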
