A production-ready Python library for end-to-end data analysis — preprocessing, statistical testing, machine learning, and automated PDF report generation.
| Module | Capability |
|---|---|
| DataLoader | CSV / Excel / JSON from local path or URL, with automatic encoding detection |
| DataCleaner | Missing-value imputation, IQR outlier handling, normalisation, categorical encoding |
| StatisticsAnalysis | Descriptive stats, correlation matrix, Shapiro-Wilk normality test, linear regression, category-specific metrics |
| MachineLearningAnalysis | 13 classifiers and regressors, cross-validation, MSE / accuracy reporting |
| ReportGeneration | Multi-page PDF: cover, dataset overview, heatmap, distributions, ML charts |
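The cleaning strategies listed for DataCleaner (IQR outlier capping followed by min-max normalisation) can be sketched in plain pandas. The helper names below are illustrative only, not the library's actual API:

```python
import pandas as pd

def cap_iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (the 'cap' strategy)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

def minmax_normalize(series: pd.Series) -> pd.Series:
    """Scale values linearly into [0, 1]."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo) if hi > lo else series * 0.0

df = pd.DataFrame({"latency_ms": [10, 12, 11, 13, 500]})
df["latency_ms"] = cap_iqr_outliers(df["latency_ms"])   # 500 is capped to Q3 + 1.5*IQR = 16
df["latency_ms"] = minmax_normalize(df["latency_ms"])   # all values now in [0, 1]
print(df["latency_ms"].tolist())
```

Capping before normalising matters: a single extreme value would otherwise compress every other observation into a narrow band near zero.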
```bash
git clone https://github.com/SpiliosDimakopoulos/gnosis-analytics.git
cd gnosis-analytics
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python -m gnosis_analytics.cli
```

```
gnosis-analytics/
├── src/
│   └── gnosis_analytics/
│       ├── __init__.py
│       ├── cli.py                         # CLI entry point + run_pipeline()
│       ├── config.py                      # Environment-based AppConfig
│       ├── exceptions.py                  # Custom exception hierarchy
│       ├── dataset_loader.py              # DataLoader + DataCategorizer
│       ├── data_cleaner.py                # Full cleaning pipeline
│       ├── data_overview.py               # PDF dataset overview pages
│       ├── statistical_analysis.py        # Statistical analysis
│       ├── machine_learning_analysis.py   # ML training & evaluation
│       ├── visualization.py               # Matplotlib/Seaborn charts
│       └── report_generation.py           # PDF report assembly
├── tests/
│   └── test_pipeline.py                   # pytest unit tests
├── examples/
│   ├── README.md
│   ├── ai_product_manager/                # AI PM portfolio example (2,500 records)
│   │   ├── run_analysis.py
│   │   └── ai_product_manager_dataset.csv
│   └── telco_churn/                       # Telco churn example (7,043 records)
│       ├── run_analysis.py
│       └── telco_customer_churn.csv
├── config/
│   └── logging.yaml
├── .env.example
├── pyproject.toml
└── requirements.txt
```
```bash
cd examples/ai_product_manager
python run_analysis.py
```

Analyses 2,500 AI product initiative records across 8 products, 6 teams, and 5 development stages. Tracks model accuracy, inference latency, MAU, NPS, OKR score, bias score, deploy frequency, and revenue.
```bash
cd examples/telco_churn
python run_analysis.py
```

Analyses 7,043 telecom customer records to predict churn based on contract type, internet service, billing, and tenure.
See examples/README.md for full dataset documentation.
```python
from gnosis_analytics.cli import run_pipeline

report = run_pipeline(
    source="data.csv",              # local path or https:// URL
    category=2,                     # 1=Financial, 2=Business, 3=Scientific
    algorithm_ids=[1, 3, 5],        # see algorithm table below
    target_column="Churn",
    report_path="reports/report.pdf",
    missing_strategy="median",      # mean | median | mode | drop
    outlier_strategy="cap",         # cap | remove
    normalization_method="minmax",  # minmax | zscore
)
```

| ID | Algorithm | Type |
|---|---|---|
| 1 | Logistic Regression | Classification |
| 2 | Linear Regression | Regression |
| 3 | Random Forest Classifier | Classification |
| 4 | Random Forest Regressor | Regression |
| 5 | Gradient Boosting Classifier | Classification |
| 6 | Gradient Boosting Regressor | Regression |
| 7 | AdaBoost Classifier | Classification |
| 8 | Decision Tree Classifier | Classification |
| 9 | Decision Tree Regressor | Regression |
| 10 | Support Vector Classifier (SVC) | Classification |
| 11 | Support Vector Regressor (SVR) | Regression |
| 12 | KNN Classifier | Classification |
| 13 | KNN Regressor | Regression |
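An ID table like this is commonly backed by a registry mapping each ID to a scikit-learn estimator. The `ALGORITHMS` dict below is a hypothetical sketch covering a subset of the table, not the library's internal implementation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative registry: ID -> (name, estimator class, task type).
ALGORITHMS = {
    1: ("Logistic Regression", LogisticRegression, "classification"),
    2: ("Linear Regression", LinearRegression, "regression"),
    3: ("Random Forest Classifier", RandomForestClassifier, "classification"),
    5: ("Gradient Boosting Classifier", GradientBoostingClassifier, "classification"),
}

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

results = {}
for alg_id in (1, 3, 5):  # mirrors algorithm_ids=[1, 3, 5] above
    name, estimator_cls, task = ALGORITHMS[alg_id]
    scores = cross_val_score(estimator_cls(), X, y, cv=5)  # 5-fold CV accuracy
    results[alg_id] = scores.mean()
    print(f"[{alg_id}] {name}: mean CV accuracy {results[alg_id]:.3f}")
```

Keeping classification and regression entries in one registry means the pipeline can pick the right scoring metric (accuracy vs. MSE) from the task type.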
Copy .env.example to .env:
| Variable | Default | Description |
|---|---|---|
| APP_ENV | development | development or production |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
| REPORT_OUTPUT_DIR | ./reports | PDF output directory |
| REPORT_FILENAME | report.pdf | Default report filename |
| DEFAULT_MISSING_STRATEGY | median | mean / median / mode / drop |
| DEFAULT_OUTLIER_STRATEGY | cap | cap / remove |
| DEFAULT_NORMALIZATION | minmax | minmax / zscore |
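Environment-based configuration of this kind is typically read once at startup via os.getenv, with fallbacks matching the defaults above. The AppConfig dataclass below is an illustrative sketch; the real class in config.py may differ:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AppConfig:
    """Immutable snapshot of environment-driven settings (hypothetical sketch)."""
    app_env: str
    log_level: str
    report_output_dir: str
    report_filename: str
    missing_strategy: str

    @classmethod
    def from_env(cls) -> "AppConfig":
        # Each getenv fallback mirrors the default column of the table above.
        return cls(
            app_env=os.getenv("APP_ENV", "development"),
            log_level=os.getenv("LOG_LEVEL", "INFO"),
            report_output_dir=os.getenv("REPORT_OUTPUT_DIR", "./reports"),
            report_filename=os.getenv("REPORT_FILENAME", "report.pdf"),
            missing_strategy=os.getenv("DEFAULT_MISSING_STRATEGY", "median"),
        )

os.environ["LOG_LEVEL"] = "DEBUG"  # e.g. set by the shell or a loaded .env file
config = AppConfig.from_env()
print(config.log_level, config.report_filename)
```

Freezing the dataclass guarantees the configuration cannot drift after startup, so every module sees the same values.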
```bash
pytest
pytest -v
pytest --cov=gnosis_analytics --cov-report=term-missing
```

MIT © 2024 Spilios Dimakopoulos