A production-ready Python library for end-to-end data analysis — preprocessing, statistical testing, machine learning, and automated PDF report generation.
| Module | Capability |
|---|---|
| DataLoader | CSV / Excel / JSON from local path or URL, with automatic encoding detection |
| DataCleaner | Missing-value imputation, IQR outlier handling, normalisation, categorical encoding |
| StatisticsAnalysis | Descriptive stats, correlation matrix, Shapiro-Wilk normality test, linear regression, category-specific metrics |
| MachineLearningAnalysis | 13 classifiers and regressors, cross-validation, MSE / accuracy reporting |
| ReportGeneration | Multi-page PDF: cover, dataset overview, heatmap, distributions, ML charts |
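The cleaning strategies listed for DataCleaner (IQR outlier capping followed by min-max normalisation) can be sketched in plain pandas. The helper names below are illustrative only, not the library's actual API:

```python
import pandas as pd

def cap_iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (the 'cap' strategy)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

def minmax_normalize(series: pd.Series) -> pd.Series:
    """Scale values linearly into [0, 1]."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo) if hi > lo else series * 0.0

df = pd.DataFrame({"latency_ms": [10, 12, 11, 13, 500]})
df["latency_ms"] = cap_iqr_outliers(df["latency_ms"])   # 500 is capped to Q3 + 1.5*IQR = 16
df["latency_ms"] = minmax_normalize(df["latency_ms"])   # all values now in [0, 1]
print(df["latency_ms"].tolist())
```

Capping before normalising matters: a single extreme value would otherwise compress every other observation into a narrow band near zero.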
```bash
git clone https://github.com/SpiliosDimakopoulos/gnosis-analytics.git
cd gnosis-analytics
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python -m gnosis_analytics.cli
```

```
gnosis-analytics/
├── src/
│   └── gnosis_analytics/
│       ├── __init__.py
│       ├── cli.py                         # CLI entry point + run_pipeline()
│       ├── config.py                      # Environment-based AppConfig
│       ├── exceptions.py                  # Custom exception hierarchy
│       ├── dataset_loader.py              # DataLoader + DataCategorizer
│       ├── data_cleaner.py                # Full cleaning pipeline
│       ├── data_overview.py               # PDF dataset overview pages
│       ├── statistical_analysis.py        # Statistical analysis
│       ├── machine_learning_analysis.py   # ML training & evaluation
│       ├── visualization.py               # Matplotlib/Seaborn charts
│       └── report_generation.py           # PDF report assembly
├── tests/
│   └── test_pipeline.py                   # pytest unit tests
├── examples/
│   ├── README.md
│   ├── ai_product_manager/                # AI PM portfolio example (2,500 records)
│   │   ├── run_analysis.py
│   │   └── ai_product_manager_dataset.csv
│   └── telco_churn/                       # Telco churn example (7,043 records)
│       ├── run_analysis.py
│       └── telco_customer_churn.csv
├── config/
│   └── logging.yaml
├── .env.example
├── pyproject.toml
└── requirements.txt
```
```bash
cd examples/ai_product_manager
python run_analysis.py
```

Analyses 2,500 AI product initiative records across 8 products, 6 teams, and 5 development stages. Tracks model accuracy, inference latency, MAU, NPS, OKR score, bias score, deploy frequency, and revenue.
```bash
cd examples/telco_churn
python run_analysis.py
```

Analyses 7,043 telecom customer records to predict churn based on contract type, internet service, billing, and tenure.
See examples/README.md for full dataset documentation.
```python
from gnosis_analytics.cli import run_pipeline

report = run_pipeline(
    source="data.csv",              # local path or https:// URL
    category=2,                     # 1=Financial, 2=Business, 3=Scientific
    algorithm_ids=[1, 3, 5],        # see algorithm table below
    target_column="Churn",
    report_path="reports/report.pdf",
    missing_strategy="median",      # mean | median | mode | drop
    outlier_strategy="cap",         # cap | remove
    normalization_method="minmax",  # minmax | zscore
)
```

| ID | Algorithm | Type |
|---|---|---|
| 1 | Logistic Regression | Classification |
| 2 | Linear Regression | Regression |
| 3 | Random Forest Classifier | Classification |
| 4 | Random Forest Regressor | Regression |
| 5 | Gradient Boosting Classifier | Classification |
| 6 | Gradient Boosting Regressor | Regression |
| 7 | AdaBoost Classifier | Classification |
| 8 | Decision Tree Classifier | Classification |
| 9 | Decision Tree Regressor | Regression |
| 10 | Support Vector Classifier (SVC) | Classification |
| 11 | Support Vector Regressor (SVR) | Regression |
| 12 | KNN Classifier | Classification |
| 13 | KNN Regressor | Regression |
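An ID table like this is commonly backed by a registry mapping each ID to a scikit-learn estimator. The `ALGORITHMS` dict below is a hypothetical sketch covering a subset of the table, not the library's internal implementation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative registry: ID -> (name, estimator class, task type).
ALGORITHMS = {
    1: ("Logistic Regression", LogisticRegression, "classification"),
    2: ("Linear Regression", LinearRegression, "regression"),
    3: ("Random Forest Classifier", RandomForestClassifier, "classification"),
    5: ("Gradient Boosting Classifier", GradientBoostingClassifier, "classification"),
}

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

results = {}
for alg_id in (1, 3, 5):  # mirrors algorithm_ids=[1, 3, 5] above
    name, estimator_cls, task = ALGORITHMS[alg_id]
    scores = cross_val_score(estimator_cls(), X, y, cv=5)  # 5-fold CV accuracy
    results[alg_id] = scores.mean()
    print(f"[{alg_id}] {name}: mean CV accuracy {results[alg_id]:.3f}")
```

Keeping classification and regression entries in one registry means the pipeline can pick the right scoring metric (accuracy vs. MSE) from the task type.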
Copy .env.example to .env:
| Variable | Default | Description |
|---|---|---|
| APP_ENV | development | development or production |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
| REPORT_OUTPUT_DIR | ./reports | PDF output directory |
| REPORT_FILENAME | report.pdf | Default report filename |
| DEFAULT_MISSING_STRATEGY | median | mean / median / mode / drop |
| DEFAULT_OUTLIER_STRATEGY | cap | cap / remove |
| DEFAULT_NORMALIZATION | minmax | minmax / zscore |
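Environment-based configuration of this kind is typically read once at startup via os.getenv, with fallbacks matching the defaults above. The AppConfig dataclass below is an illustrative sketch; the real class in config.py may differ:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AppConfig:
    """Immutable snapshot of environment-driven settings (hypothetical sketch)."""
    app_env: str
    log_level: str
    report_output_dir: str
    report_filename: str
    missing_strategy: str

    @classmethod
    def from_env(cls) -> "AppConfig":
        # Each getenv fallback mirrors the default column of the table above.
        return cls(
            app_env=os.getenv("APP_ENV", "development"),
            log_level=os.getenv("LOG_LEVEL", "INFO"),
            report_output_dir=os.getenv("REPORT_OUTPUT_DIR", "./reports"),
            report_filename=os.getenv("REPORT_FILENAME", "report.pdf"),
            missing_strategy=os.getenv("DEFAULT_MISSING_STRATEGY", "median"),
        )

os.environ["LOG_LEVEL"] = "DEBUG"  # e.g. set by the shell or a loaded .env file
config = AppConfig.from_env()
print(config.log_level, config.report_filename)
```

Freezing the dataclass guarantees the configuration cannot drift after startup, so every module sees the same values.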
```bash
pytest
pytest -v
pytest --cov=gnosis_analytics --cov-report=term-missing
```

MIT © 2024 Spilios Dimakopoulos