Species Distribution Modeling for Asian Elephants in Bhutan
This project implements a governance-compliant Species Distribution Modeling (SDM) pipeline for Asian elephants (Elephas maximus) in Bhutan under CMIP6 climate change scenarios.
This GitHub repository is the code and metadata layer of the project. Large raw data, generated run directories, logs, and working manuscript files are kept out of version control and should be published separately only after reviewing size, licensing, and species-location sensitivity constraints.
- 14-Phase Analytics Pipeline orchestrated by `run_pipeline.R`
- 4-Algorithm Ensemble: GLM, Random Forest, BRT, MaxEnt (AUC-weighted)
- CMIP6 Future Projections: 8 GCMs x 4 SSPs x 3 time periods = 96 scenarios
- Uncertainty Quantification: GCM SD, algorithm SD, combined, agreement maps
- Spatial Cross-Validation: 15 km block CV with 5 folds
- MESS Extrapolation Diagnostics for all future scenarios
- Evaluation Metrics: AUC, TSS, Boyce continuous index, Brier score, calibration slope, Moran's I
- Conservation Overlays: PA analysis, conflict zones, proximity to infrastructure
- ODMAP-Compliant Documentation
- Docker + CI/CD (GitHub Actions)
- CRS: EPSG:32645 (WGS 84 / UTM Zone 45N)
- Occurrence Data: 1,089 station-level presence-absence records (252 presences, 837 absences)
- Background Points: ~10,000 bias-corrected
- Predictors: 19 bioclimatic + DEM + HFI + NDVI + EVI + distance layers (25 total)
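
The AUC-weighted ensemble and the algorithm SD listed above can be illustrated with a short sketch. It is written in Python for brevity and is not the pipeline's R implementation. It assumes each algorithm's weight is its cross-validated AUC divided by the sum of AUCs; the actual weighting scheme may differ (for example, weighting by AUC minus 0.5). The function name and all numbers are made up for illustration and are not project results.

```python
import numpy as np

def auc_weighted_ensemble(predictions, aucs):
    """Combine per-algorithm suitability values with AUC-proportional weights.

    predictions: dict of algorithm name -> array of suitability in [0, 1]
    aucs:        dict of algorithm name -> cross-validated AUC
    Assumption (not confirmed by the pipeline): weight_i = AUC_i / sum(AUC).
    """
    names = sorted(predictions)
    weights = np.array([aucs[n] for n in names], dtype=float)
    weights /= weights.sum()
    stacked = np.stack([np.asarray(predictions[n], dtype=float) for n in names])
    # Weighted mean across the algorithm axis (axis 0).
    ensemble = np.tensordot(weights, stacked, axes=1)
    # Algorithm SD: per-cell spread across the individual algorithms.
    algo_sd = stacked.std(axis=0, ddof=1)
    return ensemble, algo_sd, dict(zip(names, weights))

# Illustrative values only: two cells, four algorithms.
preds = {
    "glm": [0.20, 0.80],
    "rf": [0.40, 0.60],
    "brt": [0.30, 0.70],
    "maxent": [0.10, 0.90],
}
cv_auc = {"glm": 0.72, "rf": 0.81, "brt": 0.79, "maxent": 0.75}
ensemble, algo_sd, weights = auc_weighted_ensemble(preds, cv_auc)
print(ensemble, algo_sd, weights)
```

With equal AUCs the result reduces to a plain mean, which is a convenient sanity check for any weighting scheme.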
Elephas_maximus_SDM_Project_v4/
├── 00_governance/ # Governance documents + config.yaml
├── 00_registry/ # Data/model/run registries (CSV)
├── 01_data_raw/ # Input data (immutable)
├── 03_analysis/ # R pipeline scripts
│ ├── run_pipeline.R # CANONICAL RUNNER (entry point)
│ ├── 00_sdm_helpers.R # Core metric functions
│ ├── 00_spatial_alignment.R # Raster alignment utilities
│ ├── 00_contract_helpers.R # Path/validation utilities
│ ├── 00_logging.R # Structured logging
│ ├── 00_seed_propagation.R # Centralized seed management
│ ├── 00_data_validation.R # Pre-flight validation
│ ├── 01_data_ingest_harmonize.R
│ ├── 02_accessible_area_M.R
│ ├── 03_occurrence_processing.R
│ ├── 04_predictor_engine.R
│ ├── 05_background_bias.R
│ ├── 06_spatial_cv.R
│ ├── 07_model_training.R
│ ├── 08_model_evaluation.R
│ ├── 09_future_projections.R
│ ├── 10_change_metrics.R
│ ├── 10_figures.R
│ ├── 11_uncertainty.R
│ ├── 12_conservation_overlays.R
│ └── 13_synthesis_reporting.R
├── 04_outputs/runs/ # Run outputs (RUN_<timestamp>_<hash>/)
├── 06_logs/ # pipeline.log, errors.log, warnings.log
├── tests/testthat/ # Unit + integration tests
├── Dockerfile
├── docker-compose.yml
└── Makefile
- R >= 4.3 with packages: `terra`, `sf`, `yaml`, `ggplot2`, `viridis`, `ranger`, `maxnet`, `gbm`, `rlang`
- On Windows, run via PowerShell (not MSYS2 bash) to avoid `ggsave()` segfaults

```r
# From R console:
source("03_analysis/setup.R")
```

```powershell
# Windows PowerShell (recommended):
powershell.exe -Command "& 'C:\Program Files\R\R-4.4.0\bin\Rscript.exe' 03_analysis/run_pipeline.R 00_governance/config.yaml"
# Or via the convenience script:
.\rerun.ps1
```

```bash
# Linux / macOS:
Rscript 03_analysis/run_pipeline.R 00_governance/config.yaml
```

```bash
# Run the test suite:
make test
# or directly:
Rscript -e "testthat::test_dir('tests/testthat')"
```

```bash
# Build and run the Docker image:
docker build -t elephas-sdm:2.1 .
docker run -v ${PWD}:/app elephas-sdm:2.1
```

| Phase | Module | Description |
|---|---|---|
| 0 | Bootstrap + Validation | Run ID, config snapshot, pre-flight checks |
| 0b | Vector CRS Fix | Assign EPSG:32645 to shapefiles lacking valid CRS |
| 1 | Data Ingest | Harmonize rasters to unified template grid |
| 2 | Accessible Area (M) | Generate M mask from AOI polygon |
| 3 | Occurrence Processing | Clean, thin, and spatially validate presence records |
| 4 | Background Sampling | Bias-corrected background point generation |
| 5 | Spatial CV | 15 km block assignment, 5-fold spatial cross-validation |
| 6 | Predictor Engine | Distance rasters, collinearity diagnostics, predictor selection |
| 7 | Model Training | GLM + RF + BRT + MaxEnt with 5-fold CV; ensemble prediction |
| 8 | Model Evaluation | AUC, TSS, Boyce, Brier, calibration slope, Moran's I |
| 9 | Future Projections | CMIP6 multi-GCM projections with MESS diagnostics (optional) |
| 10 | Change Metrics | Delta suitability, gain/loss/persistence maps |
| 11 | Uncertainty | GCM SD, algorithm SD, combined uncertainty, agreement maps |
| 12 | Conservation Overlays | PA analysis, conflict zones, proximity indicators |
| 13 | Figures + Reports | 13 publication figures, ODMAP summary, management brief |
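
Phase 5 (15 km blocks, 5 folds) can be sketched as follows. This is a Python illustration, not the code in `06_spatial_cv.R`. It relies on the project CRS (EPSG:32645) being in meters, so a block index is simply the coordinate divided by 15,000 and floored. Dealing shuffled blocks to folds round-robin is an assumption; the pipeline may assign blocks systematically or balance folds by presence counts. The function name and coordinates are hypothetical.

```python
import numpy as np

def assign_block_folds(x, y, block_size_m=15_000, n_folds=5, seed=123456):
    """Assign each point to a CV fold via the square block it falls in.

    x, y: projected coordinates in meters (e.g. UTM Zone 45N).
    Assumption: whole blocks are shuffled, then dealt to folds round-robin.
    """
    col = np.floor(np.asarray(x, dtype=float) / block_size_m).astype(np.int64)
    row = np.floor(np.asarray(y, dtype=float) / block_size_m).astype(np.int64)
    keys = list(zip(col.tolist(), row.tolist()))
    unique_blocks = sorted(set(keys))
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(unique_blocks))
    # Folds are numbered 1..n_folds; every point in a block shares its fold.
    fold_of_block = {
        unique_blocks[i]: (rank % n_folds) + 1 for rank, i in enumerate(order)
    }
    return np.array([fold_of_block[k] for k in keys])

# Hypothetical station coordinates: the first two share a 15 km block.
xs = [700_500, 701_000, 731_000]
ys = [2_990_000, 2_990_500, 2_990_000]
print(assign_block_folds(xs, ys))
```

Keeping all points of a block in one fold is what separates training and test data spatially, which reduces the optimistic bias that spatial autocorrelation introduces into random CV.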
| Metric | FATAL Floor | Target |
|---|---|---|
| AUC-ROC (per algorithm) | < 0.65 | >= 0.70 |
| TSS (per algorithm) | < 0.30 | >= 0.40 |
| Boyce Index (per algorithm) | < 0.10 | >= 0.70 |
| Brier Score (per algorithm) | > 0.25 | < 0.20 |
| Ensemble AUC | — (aspirational) | >= 0.75 |
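
The per-algorithm thresholds in the table can be read as a three-way gate. The Python sketch below encodes them directly. The status labels (`FATAL`, `BELOW_TARGET`, `PASS`), the dictionary layout, and the function name are hypothetical and are not taken from `targets.md` or the R code. Ensemble AUC is left out because the table gives it no FATAL floor.

```python
# Thresholds copied from the table above; Brier score is lower-is-better.
THRESHOLDS = {
    "auc": {"higher_is_better": True, "fatal": 0.65, "target": 0.70},
    "tss": {"higher_is_better": True, "fatal": 0.30, "target": 0.40},
    "boyce": {"higher_is_better": True, "fatal": 0.10, "target": 0.70},
    "brier": {"higher_is_better": False, "fatal": 0.25, "target": 0.20},
}

def classify_metric(metric, value):
    """Return 'FATAL', 'BELOW_TARGET', or 'PASS' for one per-algorithm metric."""
    t = THRESHOLDS[metric]
    if t["higher_is_better"]:
        if value < t["fatal"]:
            return "FATAL"
        return "PASS" if value >= t["target"] else "BELOW_TARGET"
    # Lower-is-better metrics: FATAL above the floor, PASS strictly below target.
    if value > t["fatal"]:
        return "FATAL"
    return "PASS" if value < t["target"] else "BELOW_TARGET"

# Illustrative values only, not project results.
for name, value in [("auc", 0.68), ("tss", 0.45), ("brier", 0.27)]:
    print(name, value, classify_metric(name, value))
```

Values between the FATAL floor and the target do not stop a run under this reading; they only flag that the target was missed.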
- Global seed: Configurable via `config.yaml` (`reproducibility.global_seed`, default 123456)
- Per-module seeds: Centralized via `00_seed_propagation.R` for background sampling, spatial CV, model training
- Package versions: Snapshotted to `package_versions.csv` in each run manifest
- Input hashes: MD5 checksums of all input files recorded per run
- Session info: Full `sessionInfo()` captured in each run manifest
- Config snapshot: `config.yaml` copied into run directory at start
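
Two of the reproducibility items above, input hashing and per-module seeds, can be sketched in a few lines. This is a Python illustration under stated assumptions, not the logic in `00_seed_propagation.R`. The MD5 part follows the list directly. The seed derivation (hashing the global seed together with a module name) is one possible scheme chosen for illustration, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path

def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_inputs(input_dir):
    """Map each file under input_dir (relative POSIX path) to its MD5 digest."""
    root = Path(input_dir)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def module_seed(global_seed, module_name):
    """Derive a stable 32-bit seed per module (hypothetical scheme)."""
    payload = f"{global_seed}:{module_name}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest()[:4], "big")

# Module names below are illustrative labels only.
for module in ("background_sampling", "spatial_cv", "model_training"):
    print(module, module_seed(123456, module))
```

Comparing the hash manifest of two runs shows whether any input file changed between them, and deriving module seeds from one global seed means a single config value controls every stochastic step.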
The test suite (tests/testthat/) includes:
- test-core-functions.R: 40+ unit tests for AUC, TSS, Brier, Boyce, MESS, VIF, calibration, Moran's I, logging
- test-pipeline-integration.R: Structural integration tests (config validation, source graph, function availability, phase numbering)
- test-governance.R: Governance compliance checks
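
To give a sense of what the metric unit tests check, here is a Python sketch of two of the listed metrics, TSS and the Brier score, with the kind of known-answer cases such tests typically use. The project's tests are written in R with `testthat`; the function names, the 0.5 threshold, and the data below are illustrative assumptions.

```python
import numpy as np

def tss(y_true, y_prob, threshold=0.5):
    """True Skill Statistic: sensitivity + specificity - 1 at a given threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(sensitivity + specificity - 1.0)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

# Known-answer cases: well-separated predictions vs. a constant 0.5 prediction.
y = [1, 1, 0, 0]
print(tss(y, [0.9, 0.8, 0.2, 0.1]), brier_score(y, [0.9, 0.8, 0.2, 0.1]))
print(tss(y, [0.5, 0.5, 0.5, 0.5]), brier_score(y, [0.5, 0.5, 0.5, 0.5]))
```

A constant prediction of 0.5 always yields a Brier score of 0.25, so a model at that level is no better calibrated than an uninformative guess.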

```bash
make test              # Run all tests
make lint              # Code linting with lintr
make check-governance  # Verify governance files exist
make all               # lint + test + governance check
```

| Document | Purpose |
|---|---|
| 00_governance/config.yaml | Single source of truth for all parameters |
| 00_governance/governance.md | Constitutional governance rules |
| 00_governance/methods.md | Scientific methodology |
| 00_governance/targets.md | Performance targets and FATAL thresholds |
If you use this software, cite the archived release:
Wangdi, W., & Laxmi, S. (2026). Elephas maximus SDM Pipeline (v0.1.0). Zenodo.
https://doi.org/10.5281/zenodo.19659305
Concept DOI for all versions:
https://doi.org/10.5281/zenodo.19659304
Related methodological references:
Zurell, D., et al. (2020). A standard protocol for reporting species distribution models.
Ecography, 43(9), 1261-1277. https://doi.org/10.1111/ecog.04960
O'Neill, B.C., et al. (2016). The Scenario Model Intercomparison Project (ScenarioMIP)
for CMIP6. Geoscientific Model Development, 9, 3461-3482. https://doi.org/10.5194/gmd-9-3461-2016
MIT License - see LICENSE file.
Last Updated: 2026-04-20 | Public Release: v0.1.0