This repository is no longer actively maintained. Due to shifting priorities and limited resources, we have decided to archive the repository and discontinue further development and maintenance.
What this means:
- No new features or updates will be added.
- Issues and pull requests will no longer be reviewed or responded to.
- You are welcome to fork the project and continue development under your own maintenance.
This tool is an extension for the Python framework luigi that helps to build reproducible and complex data pipelines for batch jobs. Visit our docs to learn more!
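To make the idea of a dependency-driven batch pipeline concrete, here is a minimal plain-Python sketch. It does not use the luigi or luisy API: the `Task` base class, the `build` helper, and the file naming are hypothetical and exist only for illustration. It shows the general pattern such frameworks rely on: each task declares what it requires, dependencies run first, and a task whose output already exists is skipped on a rerun.

```python
import tempfile
from pathlib import Path


class Task:
    """Hypothetical stand-in for a pipeline task; not the luigi or luisy API."""

    def __init__(self, workdir, label):
        self.workdir = Path(workdir)
        self.label = label

    def requires(self):
        # Tasks that must be finished before this one can run.
        return []

    def output(self):
        # Each task writes exactly one file, named after its class and label.
        return self.workdir / f"{type(self).__name__}_{self.label}.txt"

    def run(self):
        raise NotImplementedError


class Raw(Task):
    def run(self):
        self.output().write_text(f"raw data {self.label}\n")


class Processed(Task):
    def requires(self):
        return [Raw(self.workdir, self.label)]

    def run(self):
        text = self.requires()[0].output().read_text()
        self.output().write_text(text.upper())


class Merged(Task):
    def requires(self):
        return [Processed(self.workdir, label) for label in "abcd"]

    def run(self):
        parts = [dep.output().read_text() for dep in self.requires()]
        self.output().write_text("".join(parts))


def build(task, executed):
    """Run dependencies first, then the task; skip tasks whose output exists."""
    for dep in task.requires():
        build(dep, executed)
    if not task.output().exists():
        task.run()
        executed.append(task.output().name)


with tempfile.TemporaryDirectory() as workdir:
    first_run, second_run = [], []
    build(Merged(workdir, "all"), first_run)   # executes every task once
    build(Merged(workdir, "all"), second_run)  # nothing left to do
    merged_text = Merged(workdir, "all").output().read_text()
```

On the second call to `build`, every output file already exists, so no task is executed again. That skip-if-done behaviour is what makes reruns of a batch pipeline cheap and repeatable.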
This is what an end-to-end luisy pipeline may look like:

    import luisy
    import pandas as pd


    @luisy.raw
    @luisy.csv_output(delimiter=',')
    class InputFile(luisy.ExternalTask):
        label = luisy.Parameter()

        def get_file_name(self):
            return f"file_{self.label}"


    @luisy.interim
    @luisy.requires(InputFile)
    class ProcessedFile(luisy.Task):
        def run(self):
            df = self.input().read()
            # Some more preprocessings
            # ...
            # Write to disk
            self.write(df)


    @luisy.final
    class MergedFile(luisy.ConcatenationTask):
        def requires(self):
            for label in ['a', 'b', 'c', 'd']:
                yield ProcessedFile(label=label)

- Stable Branch: main
- Minimum Python version: 3.8

Install luisy with

    pip install luisy

To run all unit tests inside the tests directory, use the following command:

    pytest

Please have a look at our contribution guide.

| Name | License | Type |
|---|---|---|
| numpy | BSD-3-Clause License | Dependency |
| pandas | BSD 3-Clause License | Dependency |
| networkx | BSD-3-Clause License | Dependency |
| luigi | Apache License 2.0 | Dependency |
| distlib | Python license | Dependency |
| matplotlib | Other | Dependency |
| azure-storage-blob | MIT License | Dependency |
| tables | BSD license | Dependency |
| pipdeptree | MIT License | Dependency |
| requirements-parser | Apache License 2.0 | Dependency |
| pyarrow | Apache License 2.0 | Dependency |
| spark | Apache License 2.0 | Dependency |

Development and documentation dependencies:

| Name | License | Type |
|---|---|---|
| sphinx | BSD-2-Clause | Dependency |
| sphinx_rtd_theme | MIT License | Dependency |
| flake8 | MIT License | Dependency |
| pytest | MIT License | Dependency |
| pytest-flake8 | BSD License | Dependency |
| pytest-cov | MIT License | Dependency |
| pip-tools | BSD 3-Clause License | Dependency |