
SpiliosDimakopoulos/Gnosis-Analytics


GNOSIS Analytics

A production-ready Python library for end-to-end data analysis — preprocessing, statistical testing, machine learning, and automated PDF report generation.

Python License: MIT Code style: black


Features

| Module | Capability |
|---|---|
| DataLoader | CSV / Excel / JSON from a local path or URL, with automatic encoding detection |
| DataCleaner | Missing-value imputation, IQR outlier handling, normalisation, categorical encoding |
| StatisticsAnalysis | Descriptive statistics, correlation matrix, Shapiro-Wilk normality test, linear regression, category-specific metrics |
| MachineLearningAnalysis | 13 algorithms (7 classifiers, 6 regressors), cross-validation, MSE / accuracy reporting |
| ReportGeneration | Multi-page PDF: cover, dataset overview, heatmap, distributions, ML charts |
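
DataCleaner's IQR outlier handling corresponds to the `cap` and `remove` values of `outlier_strategy` in the programmatic API. The minimal standard-library sketch below shows what those two options conventionally mean, using Tukey fences at Q1 - 1.5·IQR and Q3 + 1.5·IQR. The 1.5 multiplier, the quartile interpolation method, and the helper names `iqr_bounds` and `handle_outliers` are assumptions for illustration. This is not the library's own implementation.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a numeric sequence."""
    # 'inclusive' uses the same linear interpolation as numpy.percentile's default
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def handle_outliers(values, strategy="cap", k=1.5):
    """Apply a 'cap' or 'remove' outlier strategy to a list of numbers."""
    lower, upper = iqr_bounds(values, k)
    if strategy == "cap":
        # Clip every value into [lower, upper]; the list length is unchanged
        return [min(max(v, lower), upper) for v in values]
    if strategy == "remove":
        # Drop values that fall outside the fences
        return [v for v in values if lower <= v <= upper]
    raise ValueError(f"unknown outlier strategy: {strategy!r}")


if __name__ == "__main__":
    column = [10, 12, 11, 13, 12, 95]
    print(iqr_bounds(column))                 # (9.0, 15.0)
    print(handle_outliers(column, "cap"))     # [10, 12, 11, 13, 12, 15.0]
    print(handle_outliers(column, "remove"))  # [10, 12, 11, 13, 12]
```

Capping keeps the row count stable, which helps when the row's other columns are still informative. Removal shrinks the dataset instead.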

Quick Start

git clone https://github.com/SpiliosDimakopoulos/gnosis-analytics.git
cd gnosis-analytics
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python -m gnosis_analytics.cli

Project Structure

gnosis-analytics/
├── src/
│   └── gnosis_analytics/
│       ├── __init__.py
│       ├── cli.py                       # CLI entry point + run_pipeline()
│       ├── config.py                    # Environment-based AppConfig
│       ├── exceptions.py                # Custom exception hierarchy
│       ├── dataset_loader.py            # DataLoader + DataCategorizer
│       ├── data_cleaner.py              # Full cleaning pipeline
│       ├── data_overview.py             # PDF dataset overview pages
│       ├── statistical_analysis.py      # Statistical analysis
│       ├── machine_learning_analysis.py # ML training & evaluation
│       ├── visualization.py             # Matplotlib/Seaborn charts
│       └── report_generation.py         # PDF report assembly
├── tests/
│   └── test_pipeline.py                 # pytest unit tests
├── examples/
│   ├── README.md
│   ├── ai_product_manager/              # AI PM portfolio example (2,500 records)
│   │   ├── run_analysis.py
│   │   └── ai_product_manager_dataset.csv
│   └── telco_churn/                     # Telco churn example (7,043 records)
│       ├── run_analysis.py
│       └── telco_customer_churn.csv
├── config/
│   └── logging.yaml
├── .env.example
├── pyproject.toml
└── requirements.txt

Examples

AI Product Manager Portfolio

cd examples/ai_product_manager
python run_analysis.py

Analyses 2,500 AI product initiative records across 8 products, 6 teams, and 5 development stages. Tracks model accuracy, inference latency, MAU, NPS, OKR score, bias score, deploy frequency, and revenue.

Telco Customer Churn

cd examples/telco_churn
python run_analysis.py

Analyses 7,043 telecom customer records to predict churn based on contract type, internet service, billing, and tenure.

See examples/README.md for full dataset documentation.


Programmatic API

from gnosis_analytics.cli import run_pipeline

report = run_pipeline(
    source="data.csv",             # local path or https:// URL
    category=2,                    # 1=Financial  2=Business  3=Scientific
    algorithm_ids=[1, 3, 5],       # see algorithm table below
    target_column="Churn",
    report_path="reports/report.pdf",
    missing_strategy="median",     # mean | median | mode | drop
    outlier_strategy="cap",        # cap | remove
    normalization_method="minmax", # minmax | zscore
)
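
The `missing_strategy` and `normalization_method` arguments select standard preprocessing steps. The sketch below shows what each option conventionally does to a single numeric column, with missing values represented as `None`. `impute` and `normalize` are hypothetical helpers written with the standard library only, not functions exported by gnosis_analytics. Details such as using the population standard deviation for `zscore` are assumptions.

```python
import statistics


def impute(values, strategy="median"):
    """Fill None entries in a numeric column; 'drop' removes them instead."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mode":
        fill = statistics.mode(present)  # most common value (first one wins on ties)
    else:
        raise ValueError(f"unknown missing strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


def normalize(values, method="minmax"):
    """Rescale a numeric column with min-max scaling or z-score standardisation."""
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column maps to 0.0 to avoid dividing by zero
        return [0.0 if span == 0 else (v - lo) / span for v in values]
    if method == "zscore":
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)  # population std, as scikit-learn's StandardScaler uses
        return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    raise ValueError(f"unknown normalization method: {method!r}")


if __name__ == "__main__":
    print(impute([2, None, 4, 9], "median"))  # [2, 4, 4, 9]
    print(normalize([0, 5, 10]))              # [0.0, 0.5, 1.0]
```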

Available ML Algorithms

| ID | Algorithm | Type |
|---|---|---|
| 1 | Logistic Regression | Classification |
| 2 | Linear Regression | Regression |
| 3 | Random Forest Classifier | Classification |
| 4 | Random Forest Regressor | Regression |
| 5 | Gradient Boosting Classifier | Classification |
| 6 | Gradient Boosting Regressor | Regression |
| 7 | AdaBoost Classifier | Classification |
| 8 | Decision Tree Classifier | Classification |
| 9 | Decision Tree Regressor | Regression |
| 10 | Support Vector Classifier (SVC) | Classification |
| 11 | Support Vector Regressor (SVR) | Regression |
| 12 | KNN Classifier | Classification |
| 13 | KNN Regressor | Regression |
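
A categorical target such as `Churn` calls for the classification IDs, while a continuous target calls for the regression ones. The sketch below encodes the table as a dictionary and checks that an `algorithm_ids` list does not mix the two task types before it is passed to `run_pipeline`. `ALGORITHMS` and `task_of` are hypothetical names. This README does not say whether the library already performs such a check.

```python
# ID -> (algorithm, task type), transcribed from the table above
ALGORITHMS = {
    1: ("Logistic Regression", "classification"),
    2: ("Linear Regression", "regression"),
    3: ("Random Forest Classifier", "classification"),
    4: ("Random Forest Regressor", "regression"),
    5: ("Gradient Boosting Classifier", "classification"),
    6: ("Gradient Boosting Regressor", "regression"),
    7: ("AdaBoost Classifier", "classification"),
    8: ("Decision Tree Classifier", "classification"),
    9: ("Decision Tree Regressor", "regression"),
    10: ("Support Vector Classifier (SVC)", "classification"),
    11: ("Support Vector Regressor (SVR)", "regression"),
    12: ("KNN Classifier", "classification"),
    13: ("KNN Regressor", "regression"),
}


def task_of(algorithm_ids):
    """Return the task type shared by all IDs; raise if empty, unknown, or mixed."""
    if not algorithm_ids:
        raise ValueError("no algorithm IDs given")
    unknown = [i for i in algorithm_ids if i not in ALGORITHMS]
    if unknown:
        raise ValueError(f"unknown algorithm IDs: {unknown}")
    tasks = {ALGORITHMS[i][1] for i in algorithm_ids}
    if len(tasks) > 1:
        raise ValueError(f"algorithm IDs mix task types: {sorted(tasks)}")
    return tasks.pop()


if __name__ == "__main__":
    print(task_of([1, 3, 5]))  # classification
    print(task_of([2, 4, 6]))  # regression
```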

Configuration

Copy .env.example to .env:

| Variable | Default | Description |
|---|---|---|
| APP_ENV | development | development or production |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
| REPORT_OUTPUT_DIR | ./reports | PDF output directory |
| REPORT_FILENAME | report.pdf | Default report filename |
| DEFAULT_MISSING_STRATEGY | median | mean / median / mode / drop |
| DEFAULT_OUTLIER_STRATEGY | cap | cap / remove |
| DEFAULT_NORMALIZATION | minmax | minmax / zscore |
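
The project describes config.py only as an environment-based AppConfig. The minimal sketch below shows how such a class could read the variables in the table, applying the documented defaults and rejecting values outside the documented options. `AppConfigSketch`, `from_env`, and `report_path` are hypothetical names, not the real AppConfig interface. The sketch assumes `.env` has already been loaded into the process environment (for example with python-dotenv) before `from_env` is called.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Allowed values, taken from the configuration table above
_CHOICES = {
    "app_env": {"development", "production"},
    "log_level": {"DEBUG", "INFO", "WARNING", "ERROR"},
    "default_missing_strategy": {"mean", "median", "mode", "drop"},
    "default_outlier_strategy": {"cap", "remove"},
    "default_normalization": {"minmax", "zscore"},
}


@dataclass(frozen=True)
class AppConfigSketch:
    """Hypothetical stand-in for config.AppConfig; defaults mirror the table."""

    app_env: str = "development"
    log_level: str = "INFO"
    report_output_dir: Path = Path("./reports")
    report_filename: str = "report.pdf"
    default_missing_strategy: str = "median"
    default_outlier_strategy: str = "cap"
    default_normalization: str = "minmax"

    def __post_init__(self):
        # Reject any value outside the documented options
        for name, allowed in _CHOICES.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r}; expected one of {sorted(allowed)}")

    @classmethod
    def from_env(cls, env=None):
        """Build a config from a mapping of variables (defaults to os.environ)."""
        env = os.environ if env is None else env
        return cls(
            app_env=env.get("APP_ENV", "development"),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            report_output_dir=Path(env.get("REPORT_OUTPUT_DIR", "./reports")),
            report_filename=env.get("REPORT_FILENAME", "report.pdf"),
            default_missing_strategy=env.get("DEFAULT_MISSING_STRATEGY", "median"),
            default_outlier_strategy=env.get("DEFAULT_OUTLIER_STRATEGY", "cap"),
            default_normalization=env.get("DEFAULT_NORMALIZATION", "minmax"),
        )

    @property
    def report_path(self) -> Path:
        """Full path of the default PDF report."""
        return self.report_output_dir / self.report_filename


if __name__ == "__main__":
    cfg = AppConfigSketch.from_env({"APP_ENV": "production", "LOG_LEVEL": "debug"})
    print(cfg.app_env, cfg.log_level, cfg.report_path)
```

Because `from_env` accepts any mapping, tests can pass a plain dict instead of modifying `os.environ`.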

Testing

pytest                                                     # run the test suite
pytest -v                                                  # verbose output
pytest --cov=gnosis_analytics --cov-report=term-missing    # coverage report (requires pytest-cov)

License

MIT © 2024 Spilios Dimakopoulos

About

End-to-end Python library for data analysis — preprocessing, statistical testing, 13 ML algorithms, and automated PDF report generation.
