Merged
Binary file removed .coverage
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@

# Python cache
__pycache__/
*.pyc
.coverage
.pytest_cache/
109 changes: 109 additions & 0 deletions CODE_REVIEW_RESPONSE.md
@@ -0,0 +1,109 @@
# Code Review Response

This document summarizes the changes made in response to the code review feedback.

## Review Comments Addressed

### 1. TimeSeriesModule.py - Line 82: Missing Sampling Rate Parameter

**Issue**: The `rfftfreq` function needs a sampling rate parameter (d) to produce correct frequency values. Without it, frequencies are calculated assuming unit sampling rate, which is incorrect for time series with specific time intervals.

**Fix Applied** (Commit: ae2fe0d):
- Added automatic detection of sampling interval from series index
- For DatetimeIndex/TimedeltaIndex: extracts frequency from index.freq or calculates from first two points
- For explicit timestamps: uses timestamp deltas
- Falls back to default of 1.0 for non-temporal data
- Fixed deprecation warning by using `pd.Timedelta()` instead of `.delta`

**Code Added**:
```python
# Determine sampling interval for correct frequency calculation
sampling_interval = 1.0
if isinstance(series.index, (pd.DatetimeIndex, pd.TimedeltaIndex)):
if hasattr(series.index, "freq") and series.index.freq is not None:
sampling_interval = pd.Timedelta(series.index.freq).total_seconds()
elif len(series.index) > 1:
delta = series.index[1] - series.index[0]
sampling_interval = delta.total_seconds()
# ... then uses: freqs = np.fft.rfftfreq(len(series), d=sampling_interval)
```
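A quick standalone check of what the `d` argument changes (plain numpy, independent of the module):

```python
import numpy as np

# 8 samples taken once per day (86400 s); without d, rfftfreq assumes unit spacing
n, day = 8, 86400.0
freqs_unit = np.fft.rfftfreq(n)         # cycles per sample
freqs_real = np.fft.rfftfreq(n, d=day)  # cycles per second

# The first non-zero bin is 1/(n*d): one cycle over the full 8-day window
```

With the default `d=1.0`, the first bin reads 0.125 cycles per sample; with the real interval it reads one cycle per eight days, which is what downstream period estimates need.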

### 2. TimeSeriesModule.py - Lines 139-142: Same Sampling Rate Issue in get_seasonality()

**Issue**: The same sampling rate parameter issue exists in the `get_seasonality()` method.

**Fix Applied** (Commit: ae2fe0d):
- Applied identical sampling interval detection logic
- Used `pd.Timedelta()` to avoid deprecation warnings
- Ensures period detection is accurate for time series with explicit time scales
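To illustrate why the interval matters here, below is a minimal standalone sketch of peak-based period detection. It assumes `get_seasonality()` picks the dominant peak of the magnitude spectrum; the function name `dominant_period` is illustrative, not the module's API.

```python
import numpy as np

def dominant_period(values, sampling_interval=1.0):
    """Period (in index units) of the strongest non-DC frequency component."""
    spectrum = np.abs(np.fft.rfft(values - np.mean(values)))
    freqs = np.fft.rfftfreq(len(values), d=sampling_interval)
    peak = np.argmax(spectrum[1:]) + 1  # skip the DC bin at index 0
    return 1.0 / freqs[peak]

# Hourly samples (3600 s apart) of a signal with a 24-hour cycle
t = np.arange(240)
signal = np.sin(2 * np.pi * t / 24)
period_seconds = dominant_period(signal, sampling_interval=3600.0)  # ≈ 86400
```

With the default interval of 1.0 the same call would report a period of 24 samples; passing the true 3600-second interval converts that into the correct 86400-second (daily) period.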

### 3. TimeSeriesModule.py - Line 134: Unused Variable

**Issue**: The `acf` variable (autocorrelation) is calculated but never used in the `get_seasonality()` method.

**Fix Applied** (Commit: ae2fe0d):
- Removed the unused line: `acf = pd.Series(series).autocorr()`
- Improves performance by eliminating unnecessary computation

### 4. FrequenceModule.py - Lines 92-100: Logic Issue with normalize=True

**Issue**: When `process()` is called with `normalize=True`, the result DataFrame has "Fréquence Relative" columns instead of "Fréquence". This causes `get_frequence_relative()` to fail because it expects the "Fréquence" column to exist.

**Fix Applied** (Commit: ae2fe0d):
- Modified `process()` to always compute and store absolute frequencies in `self.result`
- When `normalize=True`, returns relative frequencies as a separate DataFrame
- Internal `self.result` always has "Fréquence" column, ensuring `get_frequence_relative()` works correctly

**Updated Logic**:
```python
# Always store absolute frequencies
freq = series.value_counts(normalize=False)
cum_freq = freq.cumsum()
self.result = pd.DataFrame({"Fréquence": freq, "Fréquence Cumulée": cum_freq})

if normalize:
# Return relative frequencies separately
rel_freq = self.result["Fréquence"] / self.result["Fréquence"].sum()
rel_cum_freq = rel_freq.cumsum()
return pd.DataFrame({
"Fréquence Relative": rel_freq,
"Fréquence Relative Cumulée": rel_cum_freq,
}, index=self.result.index)
```
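The fixed behavior can be demonstrated with a standalone stand-in that mirrors the logic above; `FrequenceSketch` and its methods are illustrative names, not the real FrequenceModule API.

```python
import pandas as pd

class FrequenceSketch:
    # Minimal stand-in mirroring the fixed logic (not the real module)
    def process(self, series, normalize=False):
        freq = series.value_counts(normalize=False)
        self.result = pd.DataFrame(
            {"Fréquence": freq, "Fréquence Cumulée": freq.cumsum()}
        )
        if normalize:
            rel_freq = self.result["Fréquence"] / self.result["Fréquence"].sum()
            return pd.DataFrame({
                "Fréquence Relative": rel_freq,
                "Fréquence Relative Cumulée": rel_freq.cumsum(),
            }, index=self.result.index)
        return self.result

    def get_frequence_relative(self):
        # Works after normalize=True too: self.result kept "Fréquence"
        return self.result["Fréquence"] / self.result["Fréquence"].sum()

m = FrequenceSketch()
rel = m.process(pd.Series(["a", "a", "b"]), normalize=True)
```

Before the fix, calling `get_frequence_relative()` after `process(..., normalize=True)` raised a `KeyError` because `self.result` only held "Fréquence Relative" columns; now the absolute column is always present.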

## Additional Improvements

### Housekeeping
- Removed accidentally committed cache files (`__pycache__`, `.coverage`)
- Updated `.gitignore` to prevent future commits of cache files

## Testing

All 12 existing tests pass:
```
tests/test_basic_imports.py::TestBasicImports::test_matplotlib PASSED
tests/test_basic_imports.py::TestBasicImports::test_numpy PASSED
tests/test_basic_imports.py::TestBasicImports::test_pandas PASSED
tests/test_basic_imports.py::TestBasicImports::test_sklearn PASSED
tests/test_correlation.py::TestCorrelationAnalysis::test_analyze_dataframe PASSED
tests/test_correlation.py::TestCorrelationAnalysis::test_analyze_univariate PASSED
tests/test_descriptives.py::TestDescriptiveStatistics::test_analyze_dataframe PASSED
tests/test_descriptives.py::TestDescriptiveStatistics::test_analyze_list PASSED
tests/test_regression_module.py::TestRegressionModule::test_linear_regression_fit PASSED
tests/test_regression_module.py::TestRegressionModule::test_linear_regression_predict PASSED
tests/test_regression_module.py::TestRegressionModule::test_regression_coefficients PASSED
tests/test_regression_module.py::TestRegressionModule::test_regression_metrics PASSED
```

## Commits Made

1. **25715b3**: Apply code review feedback: fix sampling rate, remove unused var, fix normalize logic
2. **ae2fe0d**: Remove cache files and update gitignore

## Impact

These fixes ensure:
- ✅ Correct frequency and period calculations for time series with real-world time scales
- ✅ Consistent API behavior for FrequenceModule regardless of normalize parameter
- ✅ Better code quality with no unused variables
- ✅ Cleaner repository without cache files
- ✅ Modern pandas API usage (pd.Timedelta instead of deprecated .delta)
68 changes: 68 additions & 0 deletions py_stats_toolkit/stats/correlation/CorrelationModule.py

```python
"""
=====================================================================
tags : module, stats
=====================================================================
"""
from typing import List, Tuple, Union

import numpy as np
import pandas as pd
from scipy import stats

from ...utils.parallel import ParallelProcessor
from ..core.AbstractClassBase import StatisticalModule


class CorrelationModule(StatisticalModule):
    """Correlation analysis module."""

    def __init__(self, n_jobs: int = -1):
        super().__init__()
        self.method = None
        self.parallel_processor = ParallelProcessor(n_jobs=n_jobs)

    def process(self, data: Union[pd.DataFrame, pd.Series], method: str = "pearson", **kwargs):
        """
        Computes the correlation between variables.

        Args:
            data: Input data (pandas DataFrame)
            method: Correlation method ('pearson', 'spearman', 'kendall')
            **kwargs: Additional arguments

        Returns:
            Correlation matrix
        """
        self.validate_data(data)
        self.method = method

        if not isinstance(data, pd.DataFrame):
            raise TypeError("Data must be a pandas DataFrame")

        # Compute the correlation matrix directly: pandas/numpy already use
        # optimized algorithms. Chunking the computation would produce
        # incorrect results because correlation requires all data points to
        # compute proper covariance and variance statistics.
        self.result = data.corr(method=method)
        return self.result

    def get_correlation_matrix(self):
        """Returns the correlation matrix."""
        return self.result

    def get_correlation_pairs(self, threshold: float = 0.5) -> List[Tuple[str, str, float]]:
        """
        Returns the pairs of variables whose correlation exceeds the threshold.

        Args:
            threshold: Correlation threshold

        Returns:
            List of (var1, var2, correlation) tuples
        """
        if self.result is None:
            raise ValueError("No analysis performed. Call process() first.")

        # numpy-based extraction of the unique (upper-triangle) pairs
        corr_matrix = self.result.to_numpy()
        n = len(self.result.columns)

        i, j = np.triu_indices(n, k=1)
        corr_values = corr_matrix[i, j]

        # Keep the pairs whose absolute correlation meets the threshold
        mask_indices = np.where(np.abs(corr_values) >= threshold)[0]

        return [
            (self.result.columns[i[idx]], self.result.columns[j[idx]], corr_values[idx])
            for idx in mask_indices
        ]
```
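The upper-triangle pair-extraction approach used by `get_correlation_pairs` (triangle indices plus a threshold mask) can be exercised standalone:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with x
    "z": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with x
})
corr = df.corr(method="pearson")

# Unique pairs from the upper triangle (k=1 excludes the diagonal)
i, j = np.triu_indices(len(corr.columns), k=1)
vals = corr.to_numpy()[i, j]
keep = np.where(np.abs(vals) >= 0.5)[0]
pairs = [(corr.columns[i[k]], corr.columns[j[k]], vals[k]) for k in keep]
# pairs: (x, y, 1.0), (x, z, -1.0), (y, z, -1.0)
```

Using `np.abs` in the mask is what lets strong negative correlations, like (x, z) here, pass the threshold alongside positive ones.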