Skip to content

Introducing group to curve analysis #715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nkanazawa1989
Copy link
Collaborator

@nkanazawa1989 nkanazawa1989 commented Mar 4, 2022

Summary

The main purpose of this PR is introduction of group in the CurveAnalysis. Now the data model of the fitting has following structure.

- Group: This is top level component of the fitting. If an analysis defines
  multiple groups, it performs multiple independent optimization
  and generates results for every optimization group.

- Series: This is a collection of curves to form a multi-objective optimization task.
  The fit entries in the same series share the fit parameters,
  and multiple experimental results are simultaneously fit to generate a single fit result.

- Curve: This is a single entry of analysis. Every curve may take unique filter keywords
  to extract corresponding (x, y) data from the whole experimental results,
  along with the callback function used for the curve fitting.

In addition, module documentation and unittest are overhauled to increase the coverage of the logic.

Details and comments

This refactoring is necessary to fix performance issue of CR Hamiltonian and complexity in HEAT experiment implementation in #625 . The common feature of these experiments is performing the same curve analysis with different control qubit state. Thus estimated parameters may be different though they have the same function shape. However, current CurveAnalysis only supports multi-objective optimization. This means, if a single fit model j has parameters of (p0j, p1j, p2j), the fit model that the curve analysis can handle is F(p00, p10, p20, p01, p11, p21). Since init guess P0j = {P0_j_n} may different for each j, this should repeat fitting for {P0_0_0, P0_1_0}, {P0_0_0, P0_1_1}, ... ~ O(N^2) which is quite inefficient.

To overcome this, we should implement each experiment j as batch experiment however still this is really heavy coding overhead. You can check my draft code here https://github.com/nkanazawa1989/qiskit-experiments/pull/7/files. This is working, but you need to implement (1) single child experiment (2) child analysis (3) batch experiment wrapper (3) composite analysis wrapper. The same thing happened to the current HEAT implementation. This pattern is not quite limited to these two experiments, but we will see more in future since this is very conventional in two qubit calibrations and characterization.

This complexity will be drastically alleviated by supporting multi-group fitting, i.e.

__series__ = [
    SeriesDef(fit_func=myfunc(x, p00, p10, p20), group="1"),
    SeriesDef(fit_func=myfunc(x, p10, p11, p21), group="2"),
]

There will be no breaking API change as you can see there is no unittest modification except for CurveAnalysis itself``

Planned future refactoring:

  • remove dependency on experiment metadata
  • more tight coupling of drawing method, e.g. CueveDrawingMixIn.
  • removal of data extraction and offloading to data processor, e.g. CurveDataProcessor involving post data processing and filtering.

This PR add new feature
- CurveAnalysis.curve_fit: this is class method, user can directly call fit function
- CompositeFitFunction: a function-like object for fitting

In addition unittest of curve analysis and class documentation are overhauled.

Following is deprecated
- CurveAnalysis.options.curve_fitter: This object is not serializable
The composite_func is no longer class attribute. It is converted into protected member and replaced with property method. This property method returns a copy of function to avoid conflict in the multithread execution. The parameter ordering is also fixed to match with original fit function signature.
Copy link
Collaborator

@yaelbh yaelbh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code I wrote one long comment about an alternative solution, which does not require groups nor batch experiments, and uses the existing curve analysis class in a simple way. Let me know if there's a problem with this solution. If not then I don't know - it's hard for me to tell - whether introducing groups is more conveneient to the user.

That long comment also discusses the difference in parallelization between the three options (batch/group/alternative). Please carefully read the comment. Whether my alternative solution is correct or now, it still remains to compare batch with group with regard to parallelization.

Additional comments:

  1. This is an important comment: the diffs are rendered in a way that makes it impossible to review most of the PR. Do you have an idea how to overcome it? Because of this I didn't check most of the changes.
  2. All the grammar fixes - I'm not an English expert so some of them are probably wrong.
  3. The long explanation is excellent. After this PR it will be great if you can write a tutorial about curve analysis. The tutorial can consist of the very same text, accompanied with a concrete example.
  4. I think it will improve code & review quality if you split to several PRs. For example one PR that introduces composite functions, followed by a PR that introduces fixed parameters, following by a PR that introduces groups. I know that splitting is a lot of tedious work, and I won't insist on it, but be aware of the consequences of not splitting.

nkanazawa1989 and others added 2 commits March 7, 2022 09:32
If there are multiple groups, the second group may have curve indices not starting from zero. Explicit index mapping is added to the CompositeFitFunction constructor. In addition, data filtering for group is added to the curve analysis.
Co-authored-by: Yael Ben-Haim <[email protected]>
@nkanazawa1989 nkanazawa1989 force-pushed the upgrade/curve_analysis_with_group_fit branch from 62aa67e to 4f73766 Compare March 7, 2022 00:50
@nkanazawa1989
Copy link
Collaborator Author

nkanazawa1989 commented Mar 7, 2022

Thanks @yaelbh for careful reading of docs and suggestions.

I understand your suggestion might be good for performance, since curve fit for each group can run on separate thread (indeed this was the approach I implemented in the draft PR above).

However, this requires us to write/manage many analysis classes. Usually curves belong to different group has unique filter_kwargs to extract corresponding (x, y) data from the entire experiment results, i.e. they take different experiment data. Since __series__ is class attribute, you cannot override it at run time (i.e. it causes conflict issue in multithread since it overrides values of all instances). This means you need to define separate analysis class for group1 and group2, also a subclass of new batch analysis (which you suggest) to implement a logic to compute new quantity from fit result. In the case of CrossResonanceHamiltonianAnalysis, the filter_kwargs is something like

filter_kwargs={"control_state": 0, "meas_basis": "y"}

and we need this extra logic to compute Hamiltonian coefficients from fit data (fit params are discarded once these values are computed)
https://github.com/Qiskit/qiskit-experiments/blob/be322e3a8d709613716232bec4aabbaeb1ec97b3/qiskit_experiments/library/characterization/analysis/cr_hamiltonian_analysis.py#L297-L321

Managing three analysis classes for a single experiment is indeed a heavy code management overhead. Note that the CR Hamiltonian analysis can be written very simply with group. See this for complement. With my local tox test, the execution time is drastically reduced from >100s to <10s.

https://github.com/Qiskit/qiskit-experiments/pull/718/files#diff-13eb266502b0862a86f0eec5ada014f4d95fa057b3ce598df37cce1a2c8cc8c7

FYI (on the CI)
image

This is an important comment: the diffs are rendered in a way that makes it impossible to review most of the PR. Do you have an idea how to overcome it? Because of this I didn't check most of the changes.

I agree the diff is not quite smart. Unfortunately I don't have good hack for reviewing. Usually I try to carefully understand the new logic and check if the test covers all edge cases.

All the grammar fixes - I'm not an English expert so some of them are probably wrong.

Thanks these look great. Me neither (you know it!) so hopefully other can review too.

The long explanation is excellent. After this PR it will be great if you can write a tutorial about curve analysis. The tutorial can consist of the very same text, accompanied with a concrete example.

Yes, indeed this is intended. We can move this to new rst file once we prepare the developer tutorials.

I think it will improve code & review quality if you split to several PRs. For example one PR that introduces composite functions, followed by a PR that introduces fixed parameters, following by a PR that introduces groups. I know that splitting is a lot of tedious work, and I won't insist on it, but be aware of the consequences of not splitting.

The parameter fix is already existing feature so cannot be split. Probably CompositeFitFunction can be split from this PR but it takes group in the constructor so PR may look weird. Feel free to ping me on slack or whatever if you think talking offline would help to understand new logic.

@nkanazawa1989 nkanazawa1989 force-pushed the upgrade/curve_analysis_with_group_fit branch from 5f6d837 to 6a8b9c0 Compare March 7, 2022 04:59
Copy link
Collaborator

@yaelbh yaelbh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is fine. I'm not approving only to give space to the other reviewers to submit their feedback.

def __post_init__(self):
"""Implicitly parse fit function signature for fit function."""
# The first argument is x, which is not a fit parameter
sig = list(inspect.signature(self.fit_func).parameters.keys())[1:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to modify the line in the code to something more readable. Shorter is not necessarily simpler, it's usually the opposite. Another option is to add inline comments which describe the data structure in each step, and what each step does exactly.

Copy link
Contributor

@eggerdj eggerdj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that the logic of curve analysis is becoming too complex because it is trying to accommodate many different cases. Would it help to have a GroupedCurveAnalysis class which inherits from CurveAnalysis instead?

Overview
========

The base class :class:`CurveAnalysis` supports multi-objective optimization on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific here? More specifically to define what an object is in the context of curve fitting. The term multi-object optimization is rather generic. Would it help to add some math notation? I would suggest something along the following lines.

The base class :class:`CurveAnalysis` supports multi-objective optimization on different sets of 
experiment results, and you can also define multiple independent optimization tasks in the same class.
More specifically :class:`CurveAnalysis` can fit multiple groups :math:`G_i=\{y_{i1}(x), y_{i2}(x), ...\}` 
of several  series of data :math:`y_{ij}(x)`. Here, a data series :math:`y_{ij}(x)` represents a single 
curve that depends on :math:`x`, called the x-value, i.e. :code:`xval`. The series in the group :math:`G_i` are
fit to a common function :math:`f_i(x, \boldsymbol{a}_i)` where :math:`\boldsymbol{a}_i` are the
fit parameters of group i.

Comment on lines +33 to +35
- Group: This is top level component of the fitting. If an analysis defines
multiple groups, it performs multiple independent optimizations
and generates results for every optimization group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered having a code structure where the group is implemented by a class? E.g.

class GroupedCurveAnalysis(CurveAnalysis)

Would something like that work/be useful? It might make the code easier to digest.

Comment on lines +37 to +39
- Series: This is a collection of curves to form a multi-objective optimization task.
The fit entries in the same series share the fit parameters,
and multiple experimental results are simultaneously fit to generate a single fit result.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think that some notation would help make this easier to digest.

Comment on lines +45 to +46
To manage this structure, curve analysis provides a special dataclass :class:`SeriesDef`
that represents an optimization configuration for a single curve data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does group fit into this?

model_description="p0 * exp(-p1 * x) + p2",
)

The minimum field you must fill with is the ``fit_func``, which is a callback function used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The minimum field you must fill with is the ``fit_func``, which is a callback function used
The minimum field you must provide is ``fit_func``. It is the callback function used

Comment on lines +127 to +134
# Let's keep order of parameters rather than using set, though code is bit messy.
# It is better to match composite function signature with the func in series definition.
fit_args = []
for func in composite_funcs:
for param in func.signature:
if param not in fit_args:
fit_args.append(param)
cls._fit_params = fit_args
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

"different function signature. They should receive "
"the same parameter set for multi-objective function fit."
"CurveAnalysis subclass requires CompositeFitFunction instance to perform fitting. "
"Standard callback function is not acceptable due to missing signature metadata."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a bit more explanation. What is this missing metadata issue?

Comment on lines +191 to +193
lower = [bounds[p][0] for p in func.signature]
upper = [bounds[p][1] for p in func.signature]
scipy_bounds = (lower, upper)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a method of func? E.g. func.format_scipy_bounds(bounds)?

def __post_init__(self):
"""Implicitly parse fit function signature for fit function."""
# The first argument is x, which is not a fit parameter
sig = list(inspect.signature(self.fit_func).parameters.keys())[1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the benefit of having a pythonic one-liner when you then need seven lines of comments to explain it.


@property
def data_index(self) -> np.ndarray:
"""Return current data index mapping."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this?

@nkanazawa1989 nkanazawa1989 mentioned this pull request Apr 6, 2022
4 tasks
@nkanazawa1989 nkanazawa1989 deleted the upgrade/curve_analysis_with_group_fit branch October 27, 2022 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants