289 changes: 284 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,289 @@
# Deployment of SPO data processing to VM
# Deployment of SPO Data Processing Pipeline

The project uses Ansible to deploy a data processing pipeline for Smartphone Pressure Observations (SPO) to a virtual machine (VM). The pipeline is designed to process and analyze data from the SPO project, which involves collecting pressure data from smartphones.
This project deploys a data processing pipeline for **Smartphone Pressure Observations (SPO)** to a virtual machine using **Ansible** and **Docker**.

To manually deploy the pipeline, run the following command:
The pipeline retrieves pressure observations from **Firebase**, processes them, and:

- Sends observations to **ESOH**
- Stores raw JSON data in **ECMWF Object Storage (S3-compatible)**
- Deletes processed data from **Firebase after successful storage**

The pipeline is executed inside **Docker containers** and scheduled via **cron** to run periodically.

---

# Architecture Overview

The deployed pipeline performs the following steps:

1. Fetch SPO observations from **Firebase**
2. Convert observations into **ESOH-compatible features**
3. Upload features to **ESOH**
4. Store raw observation data in **ECMWF Object Storage**
5. Delete processed data from Firebase after successful storage

If the ESOH upload fails, the pipeline retries it up to **5 times** and then continues with the remaining steps.
Firebase data is deleted only after it has been **stored successfully in object storage**, so a raw copy always exists before anything is removed from Firebase.
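
The ordering above can be sketched in Python as follows. This is a minimal illustration, not the fetcher's actual code: `send_to_esoh`, `store_raw`, and `delete_from_firebase` are hypothetical callables standing in for the real ESOH, object storage, and Firebase clients, the feature conversion step is omitted, and the retry limit is assumed to mean five attempts in total.

```python
from typing import Callable

MAX_ESOH_ATTEMPTS = 5  # retry limit stated in this README (assumed: attempts in total)


def run_cycle(
    observations: list[dict],
    send_to_esoh: Callable[[list[dict]], None],
    store_raw: Callable[[list[dict]], None],
    delete_from_firebase: Callable[[list[dict]], None],
) -> dict:
    """Run one pipeline cycle; all three callables are hypothetical stand-ins."""
    result = {"esoh_ok": False, "stored": False, "deleted": False}

    # ESOH upload: try up to MAX_ESOH_ATTEMPTS times, then continue regardless.
    for _attempt in range(MAX_ESOH_ATTEMPTS):
        try:
            send_to_esoh(observations)
            result["esoh_ok"] = True
            break
        except Exception:
            continue

    # Raw data goes to object storage; a failure here blocks the deletion step.
    try:
        store_raw(observations)
        result["stored"] = True
    except Exception:
        return result

    # Only reached when the raw copy has been stored successfully.
    delete_from_firebase(observations)
    result["deleted"] = True
    return result
```

The point of the sketch is the gating: a failed ESOH upload does not stop the cycle, but a failed storage step returns before the deletion call is ever made.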

---

# Deployment Requirements

The deployment requires:

- An **EWC VM**
- **Docker** installed on the VM (handled by Ansible)
- **Ansible** installed locally
- **SSH access** to the VM
- **Firebase credential files**
- **Object storage credentials**
- **ESOH credentials**

---

# Repository Structure

```
ansible/
├── ewc.yml
├── inventory
└── roles/
    └── spo-pipeline/
        ├── tasks/
        │   └── main.yml
        └── vars/
            ├── main.yml
            └── credentials.yml

spo_firebase_fetcher/
├── fetcher.py
└── firebase_config.py

Dockerfile
pyproject.toml
README.md
```

---

# Configuration

## Main configuration

`ansible/roles/spo-pipeline/vars/main.yml`

Example:

```yaml
fetcher_image: "your-registry/spo-firebase-fetcher:latest"

fetcher_apps:
- dmi
- sfs

spo_base_path: "/opt/spo"

object_store_endpoint: "https://object-store.os-api.cci1.ecmwf.int"
object_store_bucket: "spo-firebase-data"
object_store_region: "default"
```

---

## Credentials

Credentials should be stored in:

```
ansible/roles/spo-pipeline/vars/credentials.yml
```

Example:

```yaml
object_store_access_key: "YOUR_ACCESS_KEY"
object_store_secret_key: "YOUR_SECRET_KEY"

esoh_username: "YOUR_ESOH_USERNAME"
esoh_password: "YOUR_ESOH_PASSWORD"
```

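For security, this file should be encrypted using **Ansible Vault**. Once it is encrypted, the playbook must be run with `--ask-vault-pass` (or `--vault-password-file`) so that Ansible can decrypt it.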

Example:

```bash
ansible-vault encrypt ansible/roles/spo-pipeline/vars/credentials.yml
```

---

# Firebase Credential Files

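The deployment expects one Firebase credential file per app on the VM, named `<app>app_firebase.json` and located under `spo_base_path`. With `spo_base_path` set to `/opt/spo`, these are: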

```
/opt/spo/dmiapp_firebase.json
/opt/spo/sfsapp_firebase.json
```

Each file is mounted read-only into the container for its app as:

```
/creds/credentials.json
```
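
The naming convention can be checked with a few lines of Python. This is a sketch only: `credential_mounts` is a hypothetical helper that mirrors the volume specification used by the playbook, and `/opt/spo` is the example base path from this README rather than a fixed value.

```python
from pathlib import PurePosixPath

# Path at which every container sees its credential file.
CONTAINER_CRED_PATH = "/creds/credentials.json"


def credential_mounts(spo_base_path: str, fetcher_apps: list[str]) -> dict[str, str]:
    """Map each app to its read-only Docker volume specification."""
    mounts = {}
    for app in fetcher_apps:
        # Host file name follows the <app>app_firebase.json convention.
        host_file = PurePosixPath(spo_base_path) / f"{app}app_firebase.json"
        mounts[app] = f"{host_file}:{CONTAINER_CRED_PATH}:ro"
    return mounts


for app, spec in credential_mounts("/opt/spo", ["dmi", "sfs"]).items():
    print(app, spec)
```

Each app gets its own host file, while the path inside the container is the same for all of them, which is why the fetcher can always be started with `--cred_path /creds/credentials.json`.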

---

# Docker Image

The Docker image runs the SPO fetcher:

```
python -m spo_firebase_fetcher.fetcher
```

The container receives configuration through environment variables provided by Ansible.
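
As an illustration of this mechanism, the sketch below reads the variables that the playbook sets on the container and fails early if any is missing. The variable names are taken from the playbook's `env` section; the `FetcherEnv` class, the `load_env` function, and the assumption that every variable is mandatory are hypothetical and may not match how the fetcher actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Names as set in the playbook's `env:` section.
REQUIRED_VARS = (
    "OBJECT_STORE_ENDPOINT",
    "OBJECT_STORE_BUCKET",
    "OBJECT_STORE_REGION",
    "OBJECT_STORE_ACCESS_KEY",
    "OBJECT_STORE_SECRET_KEY",
    "ESOH_USERNAME",
    "ESOH_PASSWORD",
)


@dataclass(frozen=True)
class FetcherEnv:
    """Hypothetical container for the injected settings."""
    endpoint: str
    bucket: str
    region: str
    access_key: str
    secret_key: str
    esoh_username: str
    esoh_password: str


def load_env(environ: Mapping[str, str] = os.environ) -> FetcherEnv:
    """Build a FetcherEnv, raising if a variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return FetcherEnv(
        endpoint=environ["OBJECT_STORE_ENDPOINT"],
        bucket=environ["OBJECT_STORE_BUCKET"],
        region=environ["OBJECT_STORE_REGION"],
        access_key=environ["OBJECT_STORE_ACCESS_KEY"],
        secret_key=environ["OBJECT_STORE_SECRET_KEY"],
        esoh_username=environ["ESOH_USERNAME"],
        esoh_password=environ["ESOH_PASSWORD"],
    )
```

Validating all variables at startup makes a misconfigured deployment visible in the first log lines instead of partway through a run.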

---

# Deploying the Pipeline

To deploy the pipeline to the VM:

```bash
uv run ansible-playbook \
ansible/ewc.yml \
--inventory ansible/inventory \
--user <ewc_user> \
--private-key <private_key> \
--become
```

Example:

```bash
uv run ansible-playbook \
ansible/ewc.yml \
--inventory ansible/inventory \
--user elbadmin \
--private-key ~/.ssh/id_ewc \
--become
```

If sudo requires a password, include:

```bash
--ask-become-pass
```

---

# What the Playbook Does

The playbook performs the following actions:

1. Installs Docker on the VM
2. Pulls the SPO fetcher Docker image
3. Runs one container per SPO app (`dmi`, `sfs`)
4. Mounts Firebase credential files
5. Injects object storage and ESOH credentials
6. Configures cron jobs to run the fetcher every 10 minutes

---

# Running Containers

After deployment, the following containers will run:

```
firebase-fetcher-dmi
firebase-fetcher-sfs
```

---

# Monitoring the Pipeline

Check running containers:

```bash
docker ps
```

Check logs:

```bash
docker logs firebase-fetcher-dmi
```

Example log output:

```
Sending 42 feature(s) to ESOH
ESOH upload successful
Uploaded spo_data_dmi_20240601T120000.json
Deleted data from Firebase
```
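
The object name in the example log suggests a pattern of app name plus a compact timestamp. The sketch below reproduces that pattern; the `raw_object_name` function is hypothetical, and both the exact format string and the use of UTC are assumptions inferred from the single example above.

```python
from datetime import datetime, timezone


def raw_object_name(app: str, when: datetime) -> str:
    """Build an object name like the one in the example log (assumed pattern)."""
    # Assumption: timestamps are expressed in UTC.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"spo_data_{app}_{stamp}.json"


print(raw_object_name("dmi", datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)))
```

Knowing the pattern helps when searching the bucket for the output of a specific run.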

---

# Cron Scheduling

The pipeline runs every **10 minutes** via cron.

Example cron entry:

```
*/10 * * * * docker restart firebase-fetcher-dmi
```

---

# Local Testing

To run the fetcher locally without object storage:

```bash
pdm run python -m spo_firebase_fetcher.fetcher \
--cred_path path/to/firebase.json \
--app dmi \
--time_limit 1 \
--no_s3
```
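
For orientation, the flags used above could be declared with `argparse` roughly as follows. This is a sketch of the command-line surface only, not the fetcher's real parser: the flag names come from this README, while the types, defaults, and help texts are assumptions (the unit of `--time_limit` is not stated here).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="spo_firebase_fetcher.fetcher")
    parser.add_argument("--cred_path", required=True,
                        help="path to the Firebase credential JSON file")
    parser.add_argument("--app", required=True,
                        help="SPO app to fetch for, for example dmi or sfs")
    # Assumption: an integer limit; the unit is not documented in this README.
    parser.add_argument("--time_limit", type=int, default=None,
                        help="limit on the fetch window (unit assumed)")
    parser.add_argument("--no_s3", action="store_true",
                        help="skip the upload to object storage")
    return parser


args = build_parser().parse_args(
    ["--cred_path", "path/to/firebase.json", "--app", "dmi", "--time_limit", "1", "--no_s3"]
)
print(args)
```

Only `--cred_path` and `--app` are passed by the deployed containers, so the sketch treats the other two flags as optional.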

---

# Data Safety

The pipeline is designed to prevent data loss:

- ESOH upload is retried up to **5 times**
- Raw data is stored in object storage
- Firebase data is deleted **only after successful storage**

---

# Troubleshooting

### Check container logs

```bash
docker logs firebase-fetcher-dmi
```

### Verify object storage connectivity

Check that environment variables are present inside the container:

```bash
docker inspect firebase-fetcher-dmi
```
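
The full `docker inspect` output is long, and it prints secrets in clear text. A small helper can reduce it to the environment variables while masking sensitive values. The sketch below assumes the standard `docker inspect` JSON layout, in which each container object carries a `Config.Env` list of `NAME=value` strings; `summarize_env` and the masking rule are illustrative, not part of this repository.

```python
import json

# Variable names containing these markers are treated as secrets.
SECRET_MARKERS = ("KEY", "PASSWORD")


def summarize_env(inspect_json: str) -> dict[str, str]:
    """Return the first container's environment with secret values masked."""
    containers = json.loads(inspect_json)
    env_entries = containers[0]["Config"]["Env"]
    summary = {}
    for entry in env_entries:
        name, _, value = entry.partition("=")
        if any(marker in name for marker in SECRET_MARKERS):
            # Report only whether the secret is set, never its value.
            value = "<set>" if value else "<empty>"
        summary[name] = value
    return summary
```

Pass it the text produced by `docker inspect firebase-fetcher-dmi`, for example captured with `subprocess.run(..., capture_output=True, text=True).stdout`, and check that the object storage and ESOH variables are all present.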

### Verify Firebase credential mounting

```bash
uv run ansible-playbook ansible/ewc.yml --inventory ansible/inventory --user <ewc_user> --private-key <private_key>
docker exec -it firebase-fetcher-dmi ls /creds
```
This command will execute the Ansible playbook defined in `playbook.yml` using the inventory file located in `inventory/hosts`. The playbook will set up the necessary environment and dependencies for the SPO data processing pipeline on the VM.

---
2 changes: 1 addition & 1 deletion ansible/inventory
@@ -3,4 +3,4 @@
# Static inventory of servers
##############################################
[spo-dev]
spo-data-processor ansible_host=<ewc_ip_adress> ansible_port=22 ansible_python_interpreter=/usr/bin/python3
spo-data-processor ansible_host=136.156.128.75 ansible_port=22 ansible_python_interpreter=/usr/bin/python3 ansible_user=elbadmin
67 changes: 62 additions & 5 deletions ansible/roles/spo-pipeline/tasks/main.yml
@@ -1,12 +1,69 @@
---

#####################
# Include vars
#####################

- name: Include role environment variables
- name: Include role variables
  include_vars:
    file: "{{ role_path }}/vars/main.yml"

- name: Include role credentials
  include_vars:
    dir: "vars"
    extensions:
      - "yml"
    file: "{{ role_path }}/vars/credentials.yml"

- name: Install Docker engine
  apt:
    name: docker.io
    state: present
    update_cache: yes
  become: yes

- name: Ensure Docker service is running
  service:
    name: docker
    state: started
    enabled: yes
  become: yes

- name: Pull the Firebase Fetcher image
  community.docker.docker_image:
    name: "{{ fetcher_image }}"
    source: pull
  become: yes

- name: Show which creds file we will mount
  debug:
    msg: "→ {{ spo_base_path }}/{{ item }}app_firebase.json"
  loop: "{{ fetcher_apps }}"

- name: Run one fetcher container per app
  community.docker.docker_container:
    name: "firebase-fetcher-{{ item }}"
    image: "{{ fetcher_image }}"
    state: started
    restart_policy: "no"
    volumes:
      - "{{ spo_base_path }}/{{ item }}app_firebase.json:/creds/credentials.json:ro"
    env:
      OBJECT_STORE_ENDPOINT: "{{ object_store_endpoint }}"
      OBJECT_STORE_BUCKET: "{{ object_store_bucket }}"
      OBJECT_STORE_REGION: "{{ object_store_region }}"
      OBJECT_STORE_ACCESS_KEY: "{{ object_store_access_key }}"
      OBJECT_STORE_SECRET_KEY: "{{ object_store_secret_key }}"
      ESOH_USERNAME: "{{ esoh_username }}"
      ESOH_PASSWORD: "{{ esoh_password }}"
    command:
      - "--cred_path"
      - "/creds/credentials.json"
      - "--app"
      - "{{ item }}"
  loop: "{{ fetcher_apps }}"
  become: yes

- name: Set up cron job to restart fetcher containers every 10 minutes
  cron:
    name: "Restart firebase-fetcher-{{ item }}"
    minute: "*/10"
    job: "docker restart firebase-fetcher-{{ item }}"
  loop: "{{ fetcher_apps }}"
  become: yes
15 changes: 15 additions & 0 deletions ansible/roles/spo-pipeline/vars/main.yml
@@ -1,2 +1,17 @@
---
user_home: '/home/{{ ansible_user }}'

# Docker built locally
fetcher_image: "spo-firebase-fetcher:latest"

# Which Firebase fetchers to run
fetcher_apps:
- dmi
- sfs

# Base path on the target host where JSON creds live
spo_base_path: "/home/elbadmin/spo_firebase_fetcher/spo_firebase_fetcher"

object_store_endpoint: "https://object-store.os-api.cci1.ecmwf.int"
object_store_bucket: "spo-firebase-data"
object_store_region: "default"