feat: mylearn with airflow, mlflow and poetry on python 3.11 (#1)
* chore: initialize mylearn with airflow, mlflow and poetry

* build(airflow): extend conf via pyproject.toml + misc

* build: upgrade required python version to 3.11

* ci: setup ci github action + disable tests as mlflow not on python 3.11

* docs: update readme badges
MichaelKarpe authored Oct 30, 2022
1 parent fedf417 commit 62944b0
Showing 30 changed files with 7,045 additions and 54 deletions.
25 changes: 0 additions & 25 deletions .circleci/config.yml

This file was deleted.

36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: ci

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v3
      - name: Set up python
        id: setup-python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
      - name: Install dependencies
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: poetry install --no-interaction --no-root
      - name: Install project
        run: poetry install --no-interaction
      - name: Run tests
        run: |
          source .venv/bin/activate
          poe checks
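The `Load cached venv` step keys the cache on the runner OS, the Python version reported by `setup-python`, and a hash of every `poetry.lock`, so the cached `.venv` is reused only while the lock file is unchanged, and `poetry install --no-root` runs only on a cache miss. The Python sketch below illustrates that keying idea; the function names are hypothetical, and the hash is plain SHA-256 chosen for illustration, not a reimplementation of GitHub's `hashFiles()`.

```python
import hashlib
from pathlib import Path


def lockfile_digest(contents_by_path: dict[str, bytes]) -> str:
    """Hash each lock file's bytes, then hash the per-file digests in sorted path order."""
    combined = hashlib.sha256()
    for path in sorted(contents_by_path):
        combined.update(hashlib.sha256(contents_by_path[path]).digest())
    return combined.hexdigest()


def venv_cache_key(runner_os: str, python_version: str, contents_by_path: dict[str, bytes]) -> str:
    """Same layout as the workflow key: venv-<os>-<python version>-<lock hash>."""
    return f"venv-{runner_os}-{python_version}-{lockfile_digest(contents_by_path)}"


def read_lock_files(root: Path) -> dict[str, bytes]:
    """Collect every poetry.lock at or below root, like the '**/poetry.lock' pattern."""
    return {str(path): path.read_bytes() for path in root.glob("**/poetry.lock")}
```

For example, `venv_cache_key("Linux", "3.11.0", read_lock_files(Path(".")))` yields a new key as soon as any `poetry.lock` changes, which is what forces a fresh dependency install.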
17 changes: 17 additions & 0 deletions .gitignore
@@ -1,5 +1,22 @@
.env
.python-version
.idea/
.venv/
build/
data/
dist/
mylearn.egg-info/
**/__pycache__/**
**/.mypy_cache/**
**/.pytest_cache/**

airflow/dags/
metadata/
mlruns/
notebooks/.ipynb_checkpoints/

airflow/airflow.cfg
airflow/airflow.db
airflow/airflow-webserver.pid
airflow/webserver_config.py
airflow/logs
33 changes: 33 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,33 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: name-tests-test
      - id: requirements-txt-fixer
  - repo: https://github.com/asottile/add-trailing-comma
    rev: v2.2.3
    hooks:
      - id: add-trailing-comma
        args: [--py36-plus]
  - repo: https://github.com/asottile/pyupgrade
    rev: v2.37.3
    hooks:
      - id: pyupgrade
        args: [--py37-plus]
  - repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
      - id: black
        args: [-l 120, --check]
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
        args: [--config=config/flake8.ini]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.971
    hooks:
      - id: mypy
        additional_dependencies: [types-all]
97 changes: 93 additions & 4 deletions README.md
@@ -1,8 +1,97 @@
<h2 align="center">mylearn: my Machine Learning toolkit</h2>
<h2 align="center">mylearn: my Machine Learning framework</h2>

<p align="center">
<a href="https://circleci.com/gh/MichaelKarpe/mylearn"><img alt="Build Status" src="https://circleci.com/gh/MichaelKarpe/mylearn.svg?style=shield"></a>
<a href="https://github.com/psf/black/blob/master/LICENSE"><img alt="License: MIT" src="https://black.readthedocs.io/en/stable/_static/license.svg"></a>
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
<a href="https://pypi.org/project/mylearn"><img src="https://img.shields.io/pypi/v/mylearn.svg"></a>
<a href="https://pypi.org/project/mylearn"><img src="https://img.shields.io/pypi/pyversions/mylearn.svg"></a>
<a href="https://github.com/MichaelKarpe/mylearn/blob/main/LICENSE"><img src="https://img.shields.io/pypi/l/mylearn.svg"></a>
<a href="https://github.com/MichaelKarpe/mylearn/actions"><img src="https://github.com/MichaelKarpe/mylearn/workflows/ci/badge.svg"></a>
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
</p>

___

[mylearn](https://github.com/MichaelKarpe/mylearn) is a Machine Learning framework based on
[Airflow](https://github.com/apache/airflow) and [MLflow](https://github.com/mlflow/mlflow) for designing machine
learning systems from a production perspective.

**Work in progress... Stay tuned!**

# Index

1. [Prerequisites](#prerequisites)
2. [Installation & Setup](#installation--setup)
3. [Usage](#usage)

# Prerequisites

## pyenv

To be completed with how to install and set up pyenv

## poetry

To be completed with how to install and set up poetry

# Installation & Setup

mylearn leverages [poetry](https://github.com/python-poetry/poetry) and [poethepoet](https://github.com/nat-n/poethepoet)
to make its installation and setup surprisingly simple.

## Installation

Installing the requirements in a virtualenv located at the project root is recommended, though not required:
```commandline
poetry config virtualenvs.in-project true
```

Then install the project and its dependencies with:
```commandline
poetry install
```

## Airflow Setup

Initialize the Airflow setup with a `poe` command:
```commandline
poe airflow-init
```

The Airflow Scheduler and Webserver can then be run with:
```commandline
poe airflow-scheduler
poe airflow-webserver
```

The Airflow UI is then available at [localhost:8080](http://localhost:8080); log in with username `admin` and password `admin`.
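Once both processes are running, a scripted check can confirm that the webserver sees a healthy metadata database and scheduler. This is a minimal sketch assuming the Airflow 2 webserver's `/health` endpoint, which returns JSON with a `status` field for `metadatabase` and `scheduler`; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Components whose status the check requires (assumed keys of Airflow 2's /health JSON).
COMPONENTS = ("metadatabase", "scheduler")


def airflow_is_healthy(payload: dict) -> bool:
    """True only when every required component reports status 'healthy'."""
    return all(payload.get(name, {}).get("status") == "healthy" for name in COMPONENTS)


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """GET <base_url>/health and decode the JSON body."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

`airflow_is_healthy(fetch_health())` returns `False` if, for instance, the scheduler heartbeat is stale, which usually means `poe airflow-scheduler` is not running.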

To clean your Airflow setup before rerunning `poe airflow-init`, first stop the Airflow Scheduler and Webserver, then
run:
```commandline
poe airflow-clean
```

## MLflow Setup

The MLflow UI is available at [localhost:5000](http://localhost:5000) after running the following command:
```commandline
poe mlflow-ui
```

# Usage

## MLflow Pipelines Regression Template

The *mlflow-template* pipeline, based on the
[MLflow Pipelines Regression Template](https://github.com/mlflow/mlp-regression-template), can be run independently with
```commandline
poe mlflow-run
```

or via an Airflow Directed Acyclic Graph (DAG), by triggering the *mlflow-template* DAG from the Airflow UI or with
```commandline
TO BE COMPLETED
```

## Other examples

**Work in progress... Stay tuned!**
Empty file added airflow/dags/__init__.py
Empty file.
99 changes: 99 additions & 0 deletions airflow/dags/pipeline.py
@@ -0,0 +1,99 @@
from datetime import datetime, timedelta

# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG

# Operators; we need this to operate!
from airflow.operators.bash import BashOperator

with DAG(
    "mlflow",
    # These args will get passed on to each operator
    # You can override them on a per-task basis during operator initialization
    default_args={
        "depends_on_past": False,
        "email": ["[email protected]"],
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
        "retry_delay": timedelta(seconds=5),
        # 'queue': 'bash_queue',
        # 'pool': 'backfill',
        # 'priority_weight': 10,
        # 'end_date': datetime(2016, 1, 1),
        # 'wait_for_downstream': False,
        # 'sla': timedelta(hours=2),
        # 'execution_timeout': timedelta(seconds=300),
        # 'on_failure_callback': some_function,
        # 'on_success_callback': some_other_function,
        # 'on_retry_callback': another_function,
        # 'sla_miss_callback': yet_another_function,
        # 'trigger_rule': 'all_success'
    },
    description="MLflow DAG",
    schedule_interval=timedelta(days=1),
    start_date=datetime(2022, 8, 28),
    catchup=False,
    tags=["mlflow"],
) as dag:

    # t1 to t6 are tasks created by instantiating operators, one per MLflow Pipelines step
    t1 = BashOperator(
        task_id="ingest",
        bash_command="""
        cd ${AIRFLOW_HOME};
        cd ..;
        mlflow pipelines run --step ingest;
        """,
    )

    t2 = BashOperator(
        task_id="split",
        bash_command="""
        cd ${AIRFLOW_HOME};
        cd ..;
        mlflow pipelines run --step split;
        """,
    )

    t3 = BashOperator(
        task_id="transform",
        bash_command="""
        cd ${AIRFLOW_HOME};
        cd ..;
        mlflow pipelines run --step transform;
        """,
    )

    t4 = BashOperator(
        task_id="train",
        bash_command="""
        cd ${AIRFLOW_HOME};
        cd ..;
        mlflow pipelines run --step train;
        """,
    )

    t5 = BashOperator(
        task_id="evaluate",
        bash_command="""
        cd ${AIRFLOW_HOME};
        cd ..;
        mlflow pipelines run --step evaluate;
        """,
    )

    t6 = BashOperator(
        task_id="register",
        bash_command="""
        cd ${AIRFLOW_HOME};
        cd ..;
        mlflow pipelines run --step register;
        """,
    )

    t1 >> t2
    t2 >> t3
    t3 >> t4
    t4 >> t5
    t5 >> t6
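The six `BashOperator` tasks above differ only in the MLflow Pipelines step name, and their dependencies form a straight line. The stdlib-only sketch below derives the equivalent shell commands and upstream/downstream pairs from a single step list; `STEPS`, `step_command` and `linear_edges` are hypothetical names, not part of the commit.

```python
# Step names in the order the DAG runs them (taken from the task_ids above).
STEPS = ["ingest", "split", "transform", "train", "evaluate", "register"]


def step_command(step: str) -> str:
    """Equivalent shell command: move from AIRFLOW_HOME to the repo root, run one step."""
    return f"cd ${{AIRFLOW_HOME}}; cd ..; mlflow pipelines run --step {step};"


def linear_edges(steps: list[str]) -> list[tuple[str, str]]:
    """(upstream, downstream) pairs matching t1 >> t2, t2 >> t3, and so on."""
    return list(zip(steps, steps[1:]))
```

Inside the `with DAG(...)` block, a loop could create one `BashOperator(task_id=step, bash_command=step_command(step))` per entry, and Airflow 2's `airflow.models.baseoperator.chain(*tasks)` would wire the same linear order as the five `>>` statements. Adding a step would then mean editing one list instead of copying an operator.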
2 changes: 2 additions & 0 deletions config/flake8.ini
@@ -0,0 +1,2 @@
[flake8]
max-line-length = 120
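This limit matches the `-l 120` passed to black in `.pre-commit-config.yaml`, so the formatter and the linter agree on line length. As a rough illustration of what flake8's E501 check enforces under this setting, here is a simplified sketch; `long_lines` is a hypothetical helper that ignores `# noqa` comments and pycodestyle's special cases.

```python
# Mirrors max-line-length in config/flake8.ini (and black's -l 120).
MAX_LINE_LENGTH = 120


def long_lines(source: str, limit: int = MAX_LINE_LENGTH) -> list[tuple[int, int]]:
    """Return (1-based line number, length) for every line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]
```

A line of exactly 120 characters passes; 121 characters is reported.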
Binary file added data/sample.parquet
Binary file not shown.
Empty file added mlflow/__init__.py
Empty file.