diff --git a/README.md b/README.md
index 6c82cd8..23be1d0 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,289 @@
-# Deployment of SPO data processing to VM
+# Deployment of SPO Data Processing Pipeline

-The project uses ansible to deploy a data processing pipeline for Smartphone Pressure Obsersvations (SPO) to a virtual machine (VM). The pipeline is designed to process and analyze data from the SPO project, which involves collecting pressure data from smartphones.
+This project deploys a data processing pipeline for **Smartphone Pressure Observations (SPO)** to a virtual machine using **Ansible** and **Docker**.

-To manually deploy the pipeline, run the following command:
+The pipeline retrieves pressure observations from **Firebase**, processes them, and:
+
+- Sends observations to **ESOH**
+- Stores raw JSON data in **ECMWF Object Storage (S3-compatible)**
+- Deletes processed data from **Firebase after successful storage**
+
+The pipeline is executed inside **Docker containers** and scheduled via **cron** to run periodically.
+
+---
+
+# Architecture Overview
+
+The deployed pipeline performs the following steps:
+
+1. Fetch SPO observations from **Firebase**
+2. Convert observations into **ESOH-compatible features**
+3. Upload features to **ESOH**
+4. Store raw observation data in **ECMWF Object Storage**
+5. Delete processed data from Firebase after successful storage
+
+If ESOH upload fails, the pipeline retries up to **5 times** before continuing.
+Firebase deletion depends on **successful storage in object storage**, ensuring that no data is lost.
+
+---
+
+# Deployment Requirements
+
+The deployment requires:
+
+- An **EWC VM**
+- **Docker** installed on the VM (handled by Ansible)
+- **Ansible** installed locally
+- **SSH access** to the VM
+- **Firebase credential files**
+- **Object storage credentials**
+- **ESOH credentials**
+
+---
+
+# Repository Structure
+
+```
+ansible/
+ ├── ewc.yml
+ ├── inventory
+ └── roles/
+     └── spo-pipeline/
+         ├── tasks/
+         │   └── main.yml
+         └── vars/
+             ├── main.yml
+             └── credentials.yml
+
+spo_firebase_fetcher/
+ ├── fetcher.py
+ └── firebase_config.py
+
+Dockerfile
+pyproject.toml
+README.md
+```
+
+---
+
+# Configuration
+
+## Main configuration
+
+`ansible/roles/spo-pipeline/vars/main.yml`
+
+Example:
+
+```yaml
+fetcher_image: "your-registry/spo-firebase-fetcher:latest"
+
+fetcher_apps:
+  - dmi
+  - sfs
+
+spo_base_path: "/opt/spo"
+
+object_store_endpoint: "https://object-store.os-api.cci1.ecmwf.int"
+object_store_bucket: "spo-firebase-data"
+object_store_region: "default"
+```
+
+---
+
+## Credentials
+
+Credentials should be stored in:
+
+```
+ansible/roles/spo-pipeline/vars/credentials.yml
+```
+
+Example:
+
+```yaml
+object_store_access_key: "YOUR_ACCESS_KEY"
+object_store_secret_key: "YOUR_SECRET_KEY"
+
+esoh_username: "YOUR_ESOH_USERNAME"
+esoh_password: "YOUR_ESOH_PASSWORD"
+```
+
+For security, this file should be encrypted using **Ansible Vault**.
+
+Example:
+
+```bash
+ansible-vault encrypt ansible/roles/spo-pipeline/vars/credentials.yml
+```
+
+---
+
+# Firebase Credential Files
+
+The deployment expects Firebase credential files on the VM:
+
+```
+/opt/spo/dmiapp_firebase.json
+/opt/spo/sfsapp_firebase.json
+```
+
+These files are mounted inside the container as:
+
+```
+/creds/credentials.json
+```
+
+---
+
+# Docker Image
+
+The Docker image runs the SPO fetcher:
+
+```
+python -m spo_firebase_fetcher.fetcher
+```
+
+The container receives configuration through environment variables provided by Ansible.
+
+---
+
+# Deploying the Pipeline
+
+To deploy the pipeline to the VM:
+
+```bash
+uv run ansible-playbook \
+  ansible/ewc.yml \
+  --inventory ansible/inventory \
+  --user \
+  --private-key \
+  --become
+```
+
+Example:
+
+```bash
+uv run ansible-playbook \
+  ansible/ewc.yml \
+  --inventory ansible/inventory \
+  --user elbadmin \
+  --private-key ~/.ssh/id_ewc \
+  --become
+```
+
+If sudo requires a password, include:
+
+```bash
+--ask-become-pass
+```
+
+---
+
+# What the Playbook Does
+
+The playbook performs the following actions:
+
+1. Installs Docker on the VM
+2. Pulls the SPO fetcher Docker image
+3. Runs one container per SPO app (`dmi`, `sfs`)
+4. Mounts Firebase credential files
+5. Injects object storage and ESOH credentials
+6. Configures cron jobs to run the fetcher every 10 minutes
+
+---
+
+# Running Containers
+
+After deployment, the following containers will run:
+
+```
+firebase-fetcher-dmi
+firebase-fetcher-sfs
+```
+
+---
+
+# Monitoring the Pipeline
+
+Check running containers:
+
+```bash
+docker ps
+```
+
+Check logs:
+
+```bash
+docker logs firebase-fetcher-dmi
+```
+
+Example log output:
+
+```
+Sending 42 feature(s) to ESOH
+ESOH upload successful
+Uploaded spo_data_dmi_20240601T120000.json
+Deleted data from Firebase
+```
+
+---
+
+# Cron Scheduling
+
+The pipeline runs every **10 minutes** via cron.
+
+Example cron entry:
+
+```
+*/10 * * * * docker restart firebase-fetcher-dmi
+```
+
+---
+
+# Local Testing
+
+To run the fetcher locally without object storage:
+
+```bash
+pdm run python -m spo_firebase_fetcher.fetcher \
+  --cred_path path/to/firebase.json \
+  --app dmi \
+  --time_limit 1 \
+  --no_s3
+```
+
+---
+
+# Data Safety
+
+The pipeline is designed to prevent data loss:
+
+- ESOH upload is retried **5 times**
+- Raw data is stored in object storage
+- Firebase data is deleted **only after successful storage**
+
+---
+
+# Troubleshooting
+
+### Check container logs
+
+```bash
+docker logs firebase-fetcher-dmi
+```
+
+### Verify object storage connectivity
+
+Check that environment variables are present inside the container:
+
+```bash
+docker inspect firebase-fetcher-dmi
+```
+
+### Verify Firebase credential mounting

 ```bash
-uv run ansible-playbook ansible/ewc.yml --inventory ansible/inventory --user --private-key
+docker exec -it firebase-fetcher-dmi ls /creds
 ```
-This command will execute the Ansible playbook defined in `playbook.yml` using the inventory file located in `inventory/hosts`. The playbook will set up the necessary environment and dependencies for the SPO data processing pipeline on the VM.
+
+---
diff --git a/ansible/inventory b/ansible/inventory
index 6439cb5..3f2a08e 100644
--- a/ansible/inventory
+++ b/ansible/inventory
@@ -3,4 +3,4 @@
 # Static inventory of servers
 ##############################################
 [spo-dev]
-spo-data-processor ansible_host= ansible_port=22 ansible_python_interpreter=/usr/bin/python3
+spo-data-processor ansible_host=136.156.128.75 ansible_port=22 ansible_python_interpreter=/usr/bin/python3 ansible_user=elbadmin
diff --git a/ansible/roles/spo-pipeline/tasks/main.yml b/ansible/roles/spo-pipeline/tasks/main.yml
index dcb653c..7e70f71 100644
--- a/ansible/roles/spo-pipeline/tasks/main.yml
+++ b/ansible/roles/spo-pipeline/tasks/main.yml
@@ -1,12 +1,69 @@
 ---
-
 #####################
 # Include vars
 #####################
-- name: Include role environment variables
+- name: Include role variables
+  include_vars:
+    file: "{{ role_path }}/vars/main.yml"
+
+- name: Include role credentials
   include_vars:
-    dir: "vars"
-    extensions:
-      - "yml"
+    file: "{{ role_path }}/vars/credentials.yml"
+
+- name: Install Docker engine
+  apt:
+    name: docker.io
+    state: present
+    update_cache: yes
+  become: yes
+
+- name: Ensure Docker service is running
+  service:
+    name: docker
+    state: started
+    enabled: yes
+  become: yes
+
+- name: Pull the Firebase Fetcher image
+  community.docker.docker_image:
+    name: "{{ fetcher_image }}"
+    source: pull
+  become: yes
+
+- name: Show which creds file we will mount
+  debug:
+    msg: "→ {{ spo_base_path }}/{{ item }}app_firebase.json"
+  loop: "{{ fetcher_apps }}"
+
+- name: Run one fetcher container per app
+  community.docker.docker_container:
+    name: "firebase-fetcher-{{ item }}"
+    image: "{{ fetcher_image }}"
+    state: started
+    restart_policy: "no"
+    volumes:
+      - "{{ spo_base_path }}/{{ item }}app_firebase.json:/creds/credentials.json:ro"
+    env:
+      OBJECT_STORE_ENDPOINT: "{{ object_store_endpoint }}"
+      OBJECT_STORE_BUCKET: "{{ object_store_bucket }}"
+      OBJECT_STORE_REGION: "{{ object_store_region }}"
+      OBJECT_STORE_ACCESS_KEY: "{{ object_store_access_key }}"
+      OBJECT_STORE_SECRET_KEY: "{{ object_store_secret_key }}"
+      ESOH_USERNAME: "{{ esoh_username }}"
+      ESOH_PASSWORD: "{{ esoh_password }}"
+    command:
+      - "--cred_path"
+      - "/creds/credentials.json"
+      - "--app"
+      - "{{ item }}"
+  loop: "{{ fetcher_apps }}"
+  become: yes


+- name: Set up cron job to restart fetcher containers every 10 minutes
+  cron:
+    name: "Restart firebase-fetcher-{{ item }}"
+    minute: "*/10"
+    job: "docker restart firebase-fetcher-{{ item }}"
+  loop: "{{ fetcher_apps }}"
+  become: yes
\ No newline at end of file
diff --git a/ansible/roles/spo-pipeline/vars/main.yml b/ansible/roles/spo-pipeline/vars/main.yml
index 2e476d7..b9374b2 100644
--- a/ansible/roles/spo-pipeline/vars/main.yml
+++ b/ansible/roles/spo-pipeline/vars/main.yml
@@ -1,2 +1,17 @@
 ---
 user_home: '/home/{{ ansible_user }}'
+
+# Docker built locally
+fetcher_image: "spo-firebase-fetcher:latest"
+
+# Which Firebase fetchers to run
+fetcher_apps:
+  - dmi
+  - sfs
+
+# Base path on the target host where JSON creds live
+spo_base_path: "/home/elbadmin/spo_firebase_fetcher/spo_firebase_fetcher"
+
+object_store_endpoint: "https://object-store.os-api.cci1.ecmwf.int"
+object_store_bucket: "spo-firebase-data"
+object_store_region: "default"
\ No newline at end of file
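
Note on the data-safety ordering described in the README part of this diff: the ESOH upload is retried up to 5 times, raw data is then stored in object storage, and Firebase data is deleted only after that storage succeeds. The Python sketch below illustrates that ordering only. It is not the code in `spo_firebase_fetcher/fetcher.py`, which this diff does not show. `run_cycle` and its three callable parameters are hypothetical names, and reading "retries up to 5 times" as 5 attempts in total is an assumption.

```python
import time
from typing import Callable, Dict


def run_cycle(
    upload_to_esoh: Callable[[], None],
    store_raw: Callable[[], None],
    delete_from_firebase: Callable[[], None],
    max_attempts: int = 5,       # assumption: 5 attempts in total
    delay_seconds: float = 0.0,  # hypothetical pause between attempts
) -> Dict[str, bool]:
    """Hypothetical sketch of one fetch cycle, following the README's ordering."""
    # Step 1: try the ESOH upload, giving up after max_attempts failures.
    esoh_ok = False
    for attempt in range(1, max_attempts + 1):
        try:
            upload_to_esoh()
            esoh_ok = True
            break
        except Exception:
            if attempt < max_attempts:
                time.sleep(delay_seconds)

    # Step 2: store the raw data whether or not the ESOH upload succeeded.
    stored = False
    try:
        store_raw()
        stored = True
    except Exception:
        stored = False

    # Step 3: delete from Firebase only if the raw data is safely stored.
    deleted = False
    if stored:
        delete_from_firebase()
        deleted = True

    return {"esoh_ok": esoh_ok, "stored": stored, "deleted": deleted}
```

In this sketch a failed ESOH upload does not block storage or deletion, which matches "retries up to 5 times before continuing". A failed `store_raw` call leaves the Firebase data untouched, so the next cron-triggered run can pick it up again.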