Create xi_multipoles.py to compress original auto-cf to its multipoles#140
Conversation
Codecov Report
Attention: Patch coverage is 0% — none of the 68 added lines are covered by tests (Hits unchanged at 1454, Misses +68 in the diff below).
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
- Coverage   39.89%   39.15%   -0.74%
==========================================
  Files          30       31       +1
  Lines        3645     3713      +68
  Branches      678      687       +9
==========================================
  Hits         1454     1454
- Misses       2047     2115      +68
  Partials      144      144
```

☔ View full report in Codecov by Sentry.
Pull request overview
This PR introduces a new module xi_multipoles.py that provides functionality to compress correlation function data into multipole representations using Legendre polynomial decomposition.
Key Changes
- Adds `XiMultipoles` class for fitting correlation function data to a Legendre multipole basis
- Implements automatic overfitting detection with a chi-squared criterion
- Provides Monte Carlo error estimation capabilities
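For context, the compression this PR implements can be sketched in isolation: at a fixed separation r, the correlation function xi(r, mu) is expanded as a sum of Legendre polynomials in mu, and the multipole coefficients are recovered by least squares. This is a standalone illustration of the idea, not the PR's actual code; the toy values are made up.

```python
import numpy as np
from numpy.polynomial import legendre as npleg

# Idea behind the module: xi(r, mu) = sum_ell xi_ell(r) * P_ell(mu),
# so at fixed r the mu-dependence can be compressed into a few
# Legendre coefficients via least squares.
mu = np.linspace(-0.999, 0.999, 41)

# Toy "data" at one r bin: a monopole plus a quadrupole P_2(mu) term.
xi0, xi2 = 1e-2, 4e-3
xi_data = xi0 + xi2 * npleg.legval(mu, [0, 0, 1])   # [0, 0, 1] selects P_2

# Design matrix whose columns are P_0..P_{nl-1} evaluated at each mu.
nl = 4
A = npleg.legvander(mu, nl - 1)                     # shape (len(mu), nl)
coeffs, *_ = np.linalg.lstsq(A, xi_data, rcond=None)
print(np.round(coeffs, 5))                          # recovers [0.01, 0, 0.004, 0]
```

In the PR, the coefficients additionally carry an r-dependence through interpolation over a grid of knots, but the mu-direction fit is this Legendre least-squares problem.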
Comments suppressed due to low confidence (1)
vega/xi_multipoles.py:57
- Variable idx is not used.
```python
idx = ((rr - self.rmin) / self.dr).astype(int)
```
```python
        self.xi_knots_fit = np.zeros((nl, nr))
        self.std_xi_mc = np.zeros((nl, nr))

    def __call__(self, r, mu, xi_knots=None):
```
Missing docstring for the __call__ method. The method should include documentation explaining parameters (r, mu, xi_knots) and return value.
Suggested change:

```python
    def __call__(self, r, mu, xi_knots=None):
        """
        Evaluate the multipole expansion of the correlation function at given (r, mu) values.

        Parameters
        ----------
        r : array_like
            Radial distances at which to evaluate the correlation function.
        mu : array_like
            Cosine of the angle to the line of sight for each r.
        xi_knots : array_like, optional
            Precomputed multipole coefficients to use for evaluation. If None, uses self.xi_knots_fit.

        Returns
        -------
        results : ndarray
            The evaluated correlation function at the specified (r, mu) values.
        """
```
```python
        return results

    def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False):
```
Missing docstring for the fit method. The method should include documentation for all parameters (rdata, mudata, xi_data, cov_data, mumax, nmc, seed, return_mcs) and return value, following the NumPy/SciPy docstring convention used elsewhere in the codebase.
Suggested change:

```python
    def fit(self, rdata, mudata, xi_data, cov_data, mumax=0.999, nmc=0, seed=0, return_mcs=False):
        """
        Fit the multipole model to the input data using least squares and optionally Monte Carlo sampling.

        Parameters
        ----------
        rdata : array_like
            Array of radial distances of the data points.
        mudata : array_like
            Array of cosine of angles (mu) for the data points.
        xi_data : array_like
            Array of measured correlation function values at (rdata, mudata).
        cov_data : array_like
            Covariance matrix of the measured correlation function values.
        mumax : float, optional
            Maximum value of mu to include in the fit (default is 0.999).
        nmc : int, optional
            Number of Monte Carlo samples to draw for error estimation (default is 0).
        seed : int, optional
            Random seed for Monte Carlo sampling (default is 0).
        return_mcs : bool, optional
            If True, return the Monte Carlo samples of the fitted multipoles (default is False).

        Returns
        -------
        chi2 : float
            The chi-squared value of the fit.
        xi_mcs : ndarray, optional
            If `return_mcs` is True, returns the Monte Carlo samples of the fitted multipoles
            as an array of shape (nmc, nl, nr).
        """
```
```python
            if d > 0:
                n2 = min(self.nr - 1, n1 + 1)
            else:
                n2 = max(0, n1 - 1)

            for ell in range(self.nl):
                matrix_A[i, n1 + ell * self.nr] += (1.0 - d) * self.leg_polys[ell](mumu[i])
                matrix_A[i, n2 + ell * self.nr] += d * self.leg_polys[ell](mumu[i])
```
Logic error in interpolation weights: when `d <= 0`, the code sets `n2 = max(0, n1 - 1)` and then applies weight `d` to `n2`. For `d == 0` the weight `1.0 - d` correctly gives full weight to `n1`, and the zero weight on `n2` is harmless; but if `d` were ever negative (which the index calculation should prevent, though nothing enforces it), both weights would be wrong. Consider handling `d == 0` separately, asserting `d >= 0`, or using `abs(d)` for the second weight.
```python
            for ell in range(self.nl):
                matrix_A[i, n1 + ell * self.nr] += (1.0 - d) * self.leg_polys[ell](mumu[i])
                matrix_A[i, n2 + ell * self.nr] += d * self.leg_polys[ell](mumu[i])
```
[nitpick] Performance concern: Legendre polynomials are evaluated inside a loop for each data point. Since self.leg_polys[ell] is a callable that computes the polynomial, calling it repeatedly for the same mu values (lines 69-70) can be inefficient. Consider pre-computing the Legendre polynomial values for all unique mumu values before the loop, especially if there are repeated values or if the dataset is large.
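The vectorization the nitpick suggests can be sketched standalone. This assumes `self.leg_polys[ell]` is a callable Legendre polynomial of degree `ell` (as the snippet above suggests); if so, the whole basis can be evaluated for every mu value in a single `legvander` call instead of one callable invocation per (point, multipole) pair:

```python
import numpy as np
from numpy.polynomial import legendre as npleg

nl = 6
mumu = np.linspace(-0.99, 0.99, 1000)

# Loop version (as in the review snippet): one callable evaluation
# per (point, ell) pair.
leg_polys = [npleg.Legendre.basis(ell) for ell in range(nl)]
slow = np.array([[leg_polys[ell](m) for ell in range(nl)] for m in mumu])

# Vectorized version: evaluate P_0..P_{nl-1} at all mu in one call.
fast = npleg.legvander(mumu, nl - 1)   # shape (len(mumu), nl)

assert np.allclose(slow, fast)
```

The `(1.0 - d)` and `d` interpolation weights could then scale rows of this precomputed matrix rather than re-evaluating the polynomials inside the loop.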
```python
class XiMultipoles():
    @classmethod
    def getOverfit(
        cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8
    ):
        """ Find the nl that overfits the data such that chi2 < chi_c.
        """
```
[nitpick] Magic number without explanation: The value 0.8 is used as a default for chi_c (chi-squared criterion) without documentation. Consider documenting what this threshold represents (e.g., reduced chi-squared target) or defining it as a named constant.
Suggested change:

```python
# Default reduced chi-squared target for overfitting criterion
DEFAULT_CHI_SQUARED_CRITERION = 0.8


class XiMultipoles():
    @classmethod
    def getOverfit(
        cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20,
        chi_c=DEFAULT_CHI_SQUARED_CRITERION
    ):
        """
        Find the nl that overfits the data such that chi2 < chi_c.

        Parameters
        ----------
        chi_c : float
            Reduced chi-squared target for overfitting criterion (default: 0.8).
        """
```
```python
        return xi_multipoles

    def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6):
```
Missing docstring for the __init__ method. Following the pattern in other classes in this codebase, the __init__ method should include a docstring documenting all parameters (r1, r2, nr, nl).
Suggested change:

```python
    def __init__(self, r1=42.0, r2=178.0, nr=35, nl=6):
        """
        Initialize the XiMultipoles object.

        Parameters
        ----------
        r1 : float, optional
            Minimum value of the separation grid (default: 42.0).
        r2 : float, optional
            Maximum value of the separation grid (default: 178.0).
        nr : int, optional
            Number of separation grid points (default: 35).
        nl : int, optional
            Number of Legendre multipoles (default: 6).
        """
```
```python
        Nell = self.nr * self.nl
        Nr = rr.size
        matrix_A = np.zeros((Nr, Nell))
        idx = ((rr - self.rmin) / self.dr).astype(int)
```
Potential bug: variable idx is computed but never used. Line 57 calculates idx = ((rr - self.rmin) / self.dr).astype(int) but this value is not referenced anywhere in the subsequent code. The actual indexing uses a different calculation starting at line 60 with variable d. Either this variable should be removed, or there's a logic error in how indices are calculated.
Suggested change (remove the unused line):

```diff
-        idx = ((rr - self.rmin) / self.dr).astype(int)
```
```python
class XiMultipoles():
    @classmethod
    def getOverfit(
        cls, rdata, mudata, xi_data, cov_data, mumax=0.999, nlmin=6, nlmax=20, chi_c=0.8
    ):
```
[nitpick] Magic number without explanation: The value 0.999 is used as a default for mumax without documentation. Consider either documenting why this specific value is chosen in the method docstring, or defining it as a named constant with an explanatory comment (e.g., DEFAULT_MU_MAX = 0.999 # Exclude extreme mu values close to 1).
| """ | ||
| for nl in range(nlmin, nlmax): | ||
| xi_multipoles = cls(nl=nl) | ||
| chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax) | ||
| if chi2 < chi_c: | ||
| print("Converged for nl=", nl) | ||
| break | ||
|
|
Missing error handling: if the loop completes without finding an nl that satisfies chi2 < chi_c, the method silently returns the xi_multipoles from the last iteration, i.e. a model that failed the criterion. This could hand back a poorly fitted model without warning. Consider raising an exception or logging a warning if convergence is not achieved within the range [nlmin, nlmax).
| """ | |
| for nl in range(nlmin, nlmax): | |
| xi_multipoles = cls(nl=nl) | |
| chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax) | |
| if chi2 < chi_c: | |
| print("Converged for nl=", nl) | |
| break | |
| """ | |
| converged = False | |
| for nl in range(nlmin, nlmax): | |
| xi_multipoles = cls(nl=nl) | |
| chi2 = xi_multipoles.fit(rdata, mudata, xi_data, cov_data, mumax=mumax) | |
| if chi2 < chi_c: | |
| print("Converged for nl=", nl) | |
| converged = True | |
| break | |
| if not converged: | |
| raise RuntimeError( | |
| f"getOverfit: No nl in range [{nlmin}, {nlmax}) found with chi2 < {chi_c}. " | |
| f"Last chi2={chi2} for nl={nl}." | |
| ) |
```python
            np.zeros(rdata.size), cov=cov_data, size=nmc
        )[:, w]
```
Potential memory issue: The Monte Carlo simulation generates nmc samples of size rdata.size, which could be memory-intensive for large datasets. The random samples are generated with full size rdata.size and then filtered with [:, w], which is inefficient. Consider generating the random samples only for the filtered data points to reduce memory usage: rng.multivariate_normal(np.zeros(w.sum()), cov=cov_in, size=nmc).
Suggested change:

```python
            np.zeros(w.sum()), cov=cov_in, size=nmc
        )
```
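The suggestion relies on a standard property of the multivariate normal: the marginal distribution over a subset of components is again multivariate normal, with the covariance sliced to that subset. A standalone sketch, where `w` and `cov_in` are placeholders mirroring the review's names:

```python
import numpy as np

n, nmc = 50, 200
cov_data = np.diag(np.linspace(0.5, 2.0, n))   # toy covariance
w = (np.arange(n) % 2 == 0)                    # selection mask, standing in for mu < mumax
cov_in = cov_data[np.ix_(w, w)]                # covariance restricted to kept points

rng = np.random.default_rng(42)

# Wasteful: draw all n components, then discard the masked ones.
full = rng.multivariate_normal(np.zeros(n), cov=cov_data, size=nmc)[:, w]

# Cheaper: draw only the w.sum() kept components with the sliced
# covariance. Statistically equivalent by the marginalization property,
# but the sampler works with a (w.sum(), w.sum()) matrix instead of (n, n).
cut = rng.multivariate_normal(np.zeros(w.sum()), cov=cov_in, size=nmc)

print(full.shape, cut.shape)                   # both (200, 25)
```

The individual draws differ (fewer random numbers are consumed), but their distribution is the same, which is all the Monte Carlo error estimate needs.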
No description provided.