5 changes: 1 addition & 4 deletions .gitignore
@@ -145,7 +145,4 @@ r2d2_credentials.yaml
# Miscellaneous
GEOS_mksi/
jedi_bundle/
output/


*.md
output/
2 changes: 2 additions & 0 deletions docs/_sidebar.md
@@ -23,11 +23,13 @@
- [3DVAR](examples/soca/3dvar.md)
- [3DFGAT_cycle](examples/soca/3dfgat_cycle.md)
- **R2D2 - Storing Data**
- [Understanding R2D2](examples/r2d2_intro.md)
- [Storing Observations to R2D2](examples/ingest_obs.md)

- **Configuration files in swell**

- [Observation configuration](configs/observation_configuration.md)
- [R2D2 v3 credentials](configs/r2d2_v3_credentials.md)
- [SLURM configuration](configs/slurm_configuration.md)
- Model configuration:
- [CICE6](configs/model_configurations/cice6.md)
4 changes: 4 additions & 0 deletions docs/creating_an_experiment.md
@@ -4,12 +4,16 @@ Once you have installed `swell` and configured `cylc` you should be able to crea

A useful command when using swell is `swell --help`, which walks you through all the options within swell. The help traverses the applications, so you can similarly issue `swell create --help`.

- Make sure you've configured `~/.swell/r2d2_credentials.yaml` as described in [R2D2 v3 credentials](configs/r2d2_v3_credentials.md).

The first step is to create an experiment which is done with

```bash
swell create <suite> <options>
```

**During `swell create`**: credentials are loaded and the experiment is registered in R2D2 automatically. The experiment ID is stored in `experiment.yaml` and used by STORE operations such as SaveRestart and SaveObsDiags.

This will create a directory with your experiment ID in the experiment root.

- If you specify no options the resulting experiment will be configured the way that suite is run in the tier 1 testing.
289 changes: 289 additions & 0 deletions docs/examples/r2d2_intro.md
@@ -0,0 +1,289 @@
# R2D2: Research Repository for Data and Diagnostics

## Table of Contents

1. [What is R2D2?](#what-is-r2d2)
2. [How R2D2 Works](#how-r2d2-works)
3. [R2D2 Concepts](#r2d2-concepts)
4. [How Swell Uses R2D2](#how-swell-uses-r2d2)
5. [Store & Fetch Quick Reference](#store--fetch-quick-reference)
6. [Storing Observations to R2D2](examples/ingest_obs.md)

---

## What is R2D2?

**R2D2** is a metadata + storage system for scientific data: it keeps a **MySQL database** of what files exist and where they live, while the **actual files** go in S3 or local storage. When you `fetch` or `store`, you talk to the R2D2 API for metadata; file transfers go **directly** to/from storage. Swell uses R2D2 to fetch observations, store backgrounds, and manage experiment data.

Think of R2D2 as a **central database for scientific data** that:
- Knows exactly where every file is stored
- Tracks what type of data each file contains (observations, forecasts, analyses, etc.)
- Remembers when data was created and by whom
- Can quickly retrieve the right file when you need it

**Swell + R2D2**: When you run a Swell experiment, it uses R2D2 to fetch observations, store/retrieve background and analysis files, and manage experiment metadata.

---

### Why R2D2

R2D2 serves as the centralized source for managing and accessing scientific data.

With R2D2 you can:
- Retrieve specific files easily:
```python
r2d2.fetch(
item='observation',
provider='nasa',
observation_type='airs',
window_start='20240103T120000Z',
window_length='PT6H',
target_file='obs.nc4'
)
```
- Store new data and make it accessible:
```python
r2d2.store(
item='analysis',
model='geos',
experiment='my_exp',
file_extension='nc4',
date='20240103T120000Z',
source_file='./an.nc'
)
```
- Automatically track data versions and timestamps
- Share data securely with authorized users across locations
- Prevent duplicate storage

---

## How R2D2 Works

### Architecture Example:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ R2D2 Server (metadata only) │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ R2D2 API │ MySQL / Database │ │
│ │ (HTTP) │ (what exists) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Answers: "What files match? Where are they stored?" │
└──────────────┼──────────────────────────────────────────────────────────────┘
│ Client does NOT send files through the server.
│ Client talks to server for metadata, then transfers
│ files directly to/from storage (S3, local, etc.).
┌───────┴────────────────────────────────────────────────────┐
│ Compute client │
│ (HPC/Discover, cloud etc.) │
│ │
│ import r2d2 │
│ r2d2.fetch(item='observation', provider='nasa', ...) │
│ r2d2.store(item='observation', source_file='obs.nc', ...)│
└────────────────────────────────────────────────────────────┘
│ ▲
│ Fetch: get metadata and │ Direct transfer
│ download from storage │ to/from storage
▼ │
┌─────────────────────────────────────────────────────────────┐
│ Data storage (S3, local disk, etc.) │
│ observation/ forecast/ analysis/ bias_correction/ ... │
└─────────────────────────────────────────────────────────────┘
```


1. **R2D2 Server**: Only handles metadata queries
- "What observations exist for this window?"
- "Where is this file stored?" (returns S3 path or local path)

2. **S3 / local storage**: Stores the actual data files
- File transfers go **directly** between your client and S3; *not through the R2D2 server*

Even with a small EC2 instance, R2D2 can serve metadata for terabytes of data. The server doesn't proxy file I/O.
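The two-phase flow can be sketched in plain Python. This is a toy model only: the dictionaries below stand in for the metadata database and the object store, and `query_metadata`/`fetch` are illustrative names, not the real `r2d2` API.

```python
# Toy model of R2D2's two-phase design: metadata lookup first,
# then a transfer that goes directly to/from storage.

# Stand-in for the MySQL metadata catalog on the R2D2 server.
CATALOG = {
    ("observation", "nasa", "airs", "20240103T120000Z"):
        "observation/nasa/airs/20240103T120000Z.nc4",
}

# Stand-in for the object store (S3 or local disk in real life).
STORAGE = {
    "observation/nasa/airs/20240103T120000Z.nc4": b"...netCDF bytes...",
}

def query_metadata(item, provider, observation_type, window_start):
    """Phase 1: metadata-only round trip to the server.

    Answers 'what matches, and where is it stored?' -- no file bytes.
    """
    return CATALOG[(item, provider, observation_type, window_start)]

def fetch(target_file, **query):
    """Phase 2: transfer directly between client and storage."""
    key = query_metadata(**query)
    with open(target_file, "wb") as f:
        f.write(STORAGE[key])  # bytes never pass through the metadata server
    return target_file
```

Because the server only answers the phase-1 question, it stays cheap to run no matter how large the stored files are.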

---

## R2D2 Concepts

### Data Hub
A **Data Hub** is a storage platform or cloud region where data can be stored.

| Property | Description | Example Values |
|----------|-------------|----------------|
| `name` | Unique identifier | `aws-us-east-1`, `discover-local`, `azure-eastus` |
| `platform` | Storage platform type | `aws`, `local`, `azure`, `gcloud` |
| `region` | Geographic region | `us-east-1`, `us-west-2` |

**Why it exists**: You may access data from different cloud providers or on-premise storage. A data hub tells R2D2 which storage system to use.

### Data Store
A **Data Store** is a specific storage location within a Data Hub, such as an S3 bucket or a file system path.

| Property | Description | Example Values |
|----------|-------------|----------------|
| `name` | Unique identifier (often the bucket name) | `r2d2-experiments-prod-us-east-1` |
| `data_hub` | Which Data Hub this belongs to | `aws-us-east-1` |
| `data_store_type` | Category of data | `experiments`, `archive`, `skylab` |
| `basedir` | Base directory path | `/data/r2d2/` or empty for S3 root |
| `read_only` | Whether writes are allowed | `true` or `false` |


### Compute Host
A **Compute Host** represents a computing environment where scientists run their code.

| Property | Description | Example Values |
|----------|-------------|----------------|
| `name` | Unique identifier | `discover-intel`, `localhost-gnu`, `aws-graviton-gnu` |
| `hostname` | Machine identifier | `discover`, `localhost`, `ip179-99-99-99` |
| `compiler` | Compiler used to build software | `intel`, `gnu`, `nvhpc` |


### How They Connect

```
┌───────────────────┐
│ Compute Host │
│ (discover-intel) │
└───────────────────┘
│ "Where should I store/fetch data?"
┌─────────────────────────────────┐
│ compute_host_register │
│ (links hosts to data hubs) │
│ │
│ discover-intel → aws-us-east-1 │
│ localhost-gnu → aws-us-east-1 │
└─────────────────┬───────────────┘
┌─────────────────┐
│ Data Hub │
│ (aws-us-east-1)│
└────────┬────────┘
│ "Which bucket within this hub?"
┌─────────────────┐
│ Data Store │
│ (r2d2-bucket) │
└─────────────────┘
```
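The chain above amounts to a two-hop lookup. The sketch below is a toy illustration using the example values from this section; the dictionary names mirror the registry described here, not actual r2d2 code.

```python
# Toy resolution chain: compute host -> data hub -> data store.

# compute_host_register: links hosts to data hubs.
COMPUTE_HOST_REGISTER = {
    "discover-intel": "aws-us-east-1",
    "localhost-gnu": "aws-us-east-1",
}

# Data stores, keyed by the data hub they belong to.
DATA_STORES = {
    "aws-us-east-1": {
        "name": "r2d2-experiments-prod-us-east-1",
        "data_store_type": "experiments",
        "read_only": False,
    },
}

def resolve_data_store(compute_host):
    """Answer 'where should this host store/fetch data?' in two hops."""
    data_hub = COMPUTE_HOST_REGISTER[compute_host]   # hop 1: host -> hub
    return DATA_STORES[data_hub]                     # hop 2: hub -> store
```

Note that two different compute hosts can resolve to the same data hub, which is what lets data stored from one environment be fetched from another.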

---

## How Swell Uses R2D2

When you run a Swell experiment, R2D2 is used behind the scenes in several tasks:

| Swell Task | What it does with R2D2 |
|------------|------------------------|
| **Get Observations** | Fetches observation files from R2D2 by `provider`, `observation_type`, `window_start`, `window_length`; falls back to empty observations if not found |
| **Store Background** | Stores forecast/background files so they can be reused by later cycles |
| **Get Background** | Fetches background files for the current cycle from R2D2 |
| **Ingest Obs** | Ingest suite that stores newly processed observations into R2D2 |
| **Save Obs Diags** | Stores feedback/diagnostic files (`item='feedback'`) |
| **Save Restart** | Stores forecast and analysis restart files for model components |

> **Note**: R2D2 adaptation in Swell is under active development. Task behavior and configuration may change as implementation continues.

---

## Store & Fetch Quick Reference

### Observation (shared input data — no experiment)

```python
# Fetch
r2d2.fetch(item='observation',
provider='ncdiag',
observation_type='airs',
file_extension='nc4',
window_start='20240103T120000Z',
window_length='PT6H',
target_file='obs.nc4')

# Store
r2d2.store(item='observation',
provider='ncdiag',
observation_type='airs',
file_extension='nc4',
window_start='20240103T120000Z',
window_length='PT6H',
source_file='./obs.nc4')
```

**Required:** `provider`, `observation_type`, `file_extension`, `window_start`, `window_length`
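The required-key rule above can be expressed as a small check. This is a toy helper for illustration, not part of the r2d2 package.

```python
# Required metadata keys for item='observation', per the list above.
REQUIRED_OBSERVATION_KEYS = {
    "provider", "observation_type", "file_extension",
    "window_start", "window_length",
}

def missing_observation_keys(**kwargs):
    """Return (sorted) required keys absent from a fetch/store call."""
    return sorted(REQUIRED_OBSERVATION_KEYS - kwargs.keys())
```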

---

### Analysis & forecast/background (experiment-specific)

**Required:** `model`, `experiment`, `file_extension`, `date`. For forecast also: `resolution`, `step`.

```python
# Fetch analysis
r2d2.fetch(item='analysis',
model='geos',
experiment='my_exp',
file_extension='nc4',
date='20240103T120000Z',
target_file='an.nc4')

# Fetch forecast (background)
r2d2.fetch(item='forecast',
model='geos',
experiment='my_exp',
file_extension='nc4',
resolution='c90',
step='PT6H',
date='20240103T120000Z',
target_file='bkg.nc4')

# Store analysis
r2d2.store(item='analysis',
model='geos',
experiment='my_exp',
file_extension='nc4',
date='20240103T120000Z',
source_file='./an.nc4')

# Store forecast
r2d2.store(item='forecast',
model='geos',
experiment='my_exp',
file_extension='nc4',
resolution='c90',
step='PT6H',
date='20240103T120000Z',
source_file='./bkg.nc4')
```

**Note:** `experiment` must be registered in R2D2 first.

---

### Bias correction (experiment-specific)

**Required:** `model`, `experiment`, `provider`, `observation_type`, `file_extension`, `file_type`, `date`

```python
r2d2.fetch(item='bias_correction',
model='geos',
experiment='my_exp',
provider='gsi',
observation_type='airs',
file_extension='satbias',
file_type='satbias',
date='20240103T120000Z',
target_file='satbias.nc')
```
3 changes: 3 additions & 0 deletions docs/installing_swell.md
@@ -32,3 +32,6 @@ pip install --prefix=/path/to/install/swell/ .
To make the software usable ensure `/path/to/install/swell/bin` is in the `$PATH`. Also ensure that `/path/to/install/swell/lib/python<version>/site-packages` is in the `$PYTHONPATH`, where `<version>` denotes the version of Python used for the install, e.g. `3.9`.

Swell makes use of additional packages which are located in shared directories on Discover, such as under `/discover/nobackup/projects/gmao`. When installed correctly, many of these libraries should be visible in the `$PYTHONPATH`.


Configure `~/.swell/r2d2_credentials.yaml` as described in [R2D2 v3 credentials](configs/r2d2_v3_credentials.md).
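For orientation only, the credentials file follows a simple key-value YAML layout. The keys shown below are placeholder assumptions, not the actual schema; consult [R2D2 v3 credentials](configs/r2d2_v3_credentials.md) for the real field names and values.

```yaml
# ~/.swell/r2d2_credentials.yaml
# Illustrative placeholders only; the actual keys are documented in
# the R2D2 v3 credentials page linked above.
api_url: https://r2d2.example.gov
access_key: YOUR_ACCESS_KEY
secret_key: YOUR_SECRET_KEY
```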