Introducing `group` to curve analysis #715

nkanazawa1989 · 2022-03-04T19:33:50Z

Summary

The main purpose of this PR is to introduce groups in CurveAnalysis. The data model for fitting now has the following structure:

- Group: This is the top-level component of the fitting. If an analysis defines
  multiple groups, it performs multiple independent optimizations
  and generates results for every optimization group.

- Series: This is a collection of curves to form a multi-objective optimization task.
  The fit entries in the same series share the fit parameters,
  and multiple experimental results are simultaneously fit to generate a single fit result.

- Curve: This is a single entry of analysis. Every curve may take unique filter keywords
  to extract corresponding (x, y) data from the whole experimental results,
  along with the callback function used for the curve fitting.
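A minimal sketch of the grouping idea, assuming a hypothetical `CurveDef` dataclass (not the library's `SeriesDef`). Each curve carries its own `filter_kwargs` and a group label. Curves that share a label are meant to be fit together with shared parameters, and each distinct label is a separate optimization task.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import numpy as np


@dataclass
class CurveDef:
    """Hypothetical stand-in for a single curve entry (not the qiskit-experiments SeriesDef)."""

    name: str
    fit_func: Callable[..., np.ndarray]
    # Metadata keys/values used to pick this curve's (x, y) data out of all results.
    filter_kwargs: Dict[str, object] = field(default_factory=dict)
    # Curves with the same group label are fit together with shared parameters.
    group: str = "default"


def organize_by_group(curves: List[CurveDef]) -> Dict[str, List[CurveDef]]:
    """Collect curves per group, keeping the order in which they were defined."""
    groups: Dict[str, List[CurveDef]] = defaultdict(list)
    for curve in curves:
        groups[curve.group].append(curve)
    return dict(groups)


def decay(x, amp, tau, base):
    return amp * np.exp(-x / tau) + base


curves = [
    CurveDef("c0_x", decay, {"control_state": 0, "meas_basis": "x"}, group="0"),
    CurveDef("c0_y", decay, {"control_state": 0, "meas_basis": "y"}, group="0"),
    CurveDef("c1_x", decay, {"control_state": 1, "meas_basis": "x"}, group="1"),
]
# Two independent optimization tasks: group "0" with two curves, group "1" with one.
groups = organize_by_group(curves)
```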

In addition, the module documentation and unit tests are overhauled to increase the coverage of the logic.

Details and comments

This refactoring is necessary to fix the performance issue of the CR Hamiltonian analysis and to reduce the complexity of the HEAT experiment implementation in #625. The common feature of these experiments is that they perform the same curve analysis for different control qubit states. Thus the estimated parameters may differ even though the fit functions have the same shape. However, the current CurveAnalysis only supports multi-objective optimization. This means that if a single fit model j has parameters (p0j, p1j, p2j), the fit model that the curve analysis can handle is F(p00, p10, p20, p01, p11, p21). Since the initial guesses P0j = {P0_j_n} may differ for each j, the fit must be repeated for {P0_0_0, P0_1_0}, {P0_0_0, P0_1_1}, ..., i.e. ~O(N^2) fits, which is quite inefficient.
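To make the scaling concrete, here is a small counting sketch with N = 3 hypothetical initial-guess candidates per model. The labels only mirror the P0_j_n notation above; no fitting is performed.

```python
from itertools import product

# Hypothetical initial-guess candidates P0_j_n for two fit models j = 0, 1.
guesses = {
    0: ["P0_0_0", "P0_0_1", "P0_0_2"],
    1: ["P0_1_0", "P0_1_1", "P0_1_2"],
}

# Joint fit of F(p00, p10, p20, p01, p11, p21): every combination must be tried.
joint_trials = list(product(guesses[0], guesses[1]))

# Independent group fits: each model only iterates over its own candidates.
group_trials = [(j, g) for j, cands in guesses.items() for g in cands]

print(len(joint_trials), len(group_trials))  # 9 6
```

With N candidates per model the joint approach needs N^2 fits, while independent groups need only 2N.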

To overcome this, we could implement each experiment j as a batch experiment, but this still adds a heavy coding overhead. You can check my draft code here https://github.com/nkanazawa1989/qiskit-experiments/pull/7/files. This works, but you need to implement (1) a single child experiment, (2) a child analysis, (3) a batch experiment wrapper and (4) a composite analysis wrapper. The same thing happened to the current HEAT implementation. This pattern is not limited to these two experiments; we will see more of it in the future, since it is very common in two-qubit calibration and characterization.

This complexity will be drastically alleviated by supporting multi-group fitting, i.e.

__series__ = [
    SeriesDef(fit_func=myfunc(x, p00, p10, p20), group="1"),
    SeriesDef(fit_func=myfunc(x, p01, p11, p21), group="2"),
]
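A minimal sketch of what independent group fitting amounts to, calling `scipy.optimize.curve_fit` directly rather than going through CurveAnalysis. The model `myfunc`, the synthetic data and the initial guess are assumptions for illustration. Each group gets its own parameter vector, matching (p00, p10, p20) and (p01, p11, p21) above.

```python
import numpy as np
from scipy.optimize import curve_fit


def myfunc(x, p0, p1, p2):
    """Example model with the same shape for both groups."""
    return p0 * np.exp(-p1 * x) + p2


# Hypothetical noiseless data: same function shape, different parameters per group.
x = np.linspace(0, 5, 50)
data = {
    "1": myfunc(x, 1.0, 0.5, 0.1),
    "2": myfunc(x, 0.8, 1.5, -0.2),
}

# Each group is an independent optimization; nothing is shared between them.
results = {}
for group, y in data.items():
    popt, _ = curve_fit(myfunc, x, y, p0=[1.0, 1.0, 0.0])
    results[group] = popt
```

Because the groups do not share parameters, these loop iterations could also run concurrently.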

There will be no breaking API change: as you can see, no unit test is modified except for those of CurveAnalysis itself.

Planned future refactoring:

- Remove the dependency on experiment metadata.
- Tighter coupling of the drawing method, e.g. CurveDrawingMixIn.
- Removal of data extraction, offloading it to the data processor, e.g. CurveDataProcessor involving post data processing and filtering.

This PR adds new features:
- CurveAnalysis.curve_fit: a class method, so users can call the fit function directly.
- CompositeFitFunction: a function-like object for fitting.

In addition, the unit tests of curve analysis and the class documentation are overhauled. The following is deprecated:
- CurveAnalysis.options.curve_fitter: this object is not serializable.


The composite_func is no longer a class attribute. It is converted into a protected member and exposed through a property method. This property method returns a copy of the function to avoid conflicts in multithreaded execution. The parameter ordering is also fixed to match the original fit function signature.
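A minimal sketch of the "property returns a copy" pattern described in this commit. `AnalysisSketch` and `_FitFunc` are hypothetical names; the point is only that state assigned on the returned object does not reach the stored original.

```python
import copy


class _FitFunc:
    """Hypothetical fit-function object with per-run state."""

    def __init__(self):
        self.data_index = None


class AnalysisSketch:
    """Hypothetical analysis keeping the composite function as a protected member."""

    def __init__(self, composite_func):
        self._composite_func = composite_func

    @property
    def composite_func(self):
        # Return a shallow copy so that per-run attributes assigned on it
        # (e.g. a new data_index) do not leak into other threads or runs.
        # Note: in-place mutation of shared containers would still leak.
        return copy.copy(self._composite_func)


analysis = AnalysisSketch(_FitFunc())
f1 = analysis.composite_func
f1.data_index = [0, 0, 1]
f2 = analysis.composite_func
```

A shallow copy is cheap; a deep copy would be needed only if runs mutated shared containers in place.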

yaelbh

In the code I wrote one long comment about an alternative solution, which requires neither groups nor batch experiments, and uses the existing curve analysis class in a simple way. Let me know if there's a problem with this solution. If not, I find it hard to tell whether introducing groups is more convenient for the user.

That long comment also discusses the difference in parallelization between the three options (batch/group/alternative). Please read the comment carefully. Whether or not my alternative solution is correct, batch and group still need to be compared with regard to parallelization.

Additional comments:

- This is an important comment: the diffs are rendered in a way that makes it impossible to review most of the PR. Do you have an idea how to overcome it? Because of this I didn't check most of the changes.
- All the grammar fixes - I'm not an English expert so some of them are probably wrong.
- The long explanation is excellent. After this PR it will be great if you can write a tutorial about curve analysis. The tutorial can consist of the very same text, accompanied with a concrete example.
- I think it will improve code & review quality if you split to several PRs. For example one PR that introduces composite functions, followed by a PR that introduces fixed parameters, followed by a PR that introduces groups. I know that splitting is a lot of tedious work, and I won't insist on it, but be aware of the consequences of not splitting.


nkanazawa1989 · 2022-03-07T01:57:48Z

Thanks @yaelbh for the careful reading of the docs and for the suggestions.

I understand your suggestion might be good for performance, since the curve fit for each group can run on a separate thread (indeed, this was the approach I implemented in the draft PR above).

However, this requires us to write and manage many analysis classes. Curves belonging to different groups usually have unique filter_kwargs to extract the corresponding (x, y) data from the entire experiment results, i.e. they take different experiment data. Since __series__ is a class attribute, you cannot override it at run time (doing so causes conflicts in multithreaded execution, because it overrides the values of all instances). This means you need to define separate analysis classes for group1 and group2, plus a subclass of the new batch analysis (which you suggest) to implement the logic that computes new quantities from the fit results. In the case of CrossResonanceHamiltonianAnalysis, the filter_kwargs is something like

filter_kwargs={"control_state": 0, "meas_basis": "y"}

and we need extra logic to compute the Hamiltonian coefficients from the fit data (the fit parameters are discarded once these values are computed):
https://github.com/Qiskit/qiskit-experiments/blob/be322e3a8d709613716232bec4aabbaeb1ec97b3/qiskit_experiments/library/characterization/analysis/cr_hamiltonian_analysis.py#L297-L321
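To make the role of `filter_kwargs` concrete, here is a minimal sketch of metadata-based filtering. The record layout (a `"metadata"` dict with an `"xval"` key plus a `"y"` value) and the `filter_data` helper are assumptions for illustration, not the library's data format.

```python
from typing import Dict, List, Tuple

import numpy as np


def filter_data(records: List[Dict], **filter_kwargs) -> Tuple[np.ndarray, np.ndarray]:
    """Return (x, y) of the records whose metadata matches every filter keyword."""
    xs, ys = [], []
    for rec in records:
        meta = rec["metadata"]
        # A record is kept only if all requested key/value pairs match.
        if all(meta.get(key) == value for key, value in filter_kwargs.items()):
            xs.append(meta["xval"])
            ys.append(rec["y"])
    return np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)


records = [
    {"metadata": {"xval": 0.0, "control_state": 0, "meas_basis": "y"}, "y": 0.1},
    {"metadata": {"xval": 0.0, "control_state": 1, "meas_basis": "y"}, "y": 0.2},
    {"metadata": {"xval": 1.0, "control_state": 0, "meas_basis": "y"}, "y": 0.3},
    {"metadata": {"xval": 1.0, "control_state": 0, "meas_basis": "x"}, "y": 0.4},
]
x, y = filter_data(records, control_state=0, meas_basis="y")
```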

Managing three analysis classes for a single experiment is indeed a heavy code management overhead. Note that the CR Hamiltonian analysis can be written very simply with groups; see the following PR for comparison. In my local tox test, the execution time is drastically reduced from >100 s to <10 s.

https://github.com/Qiskit/qiskit-experiments/pull/718/files#diff-13eb266502b0862a86f0eec5ada014f4d95fa057b3ce598df37cce1a2c8cc8c7

FYI (on the CI)

This is an important comment: the diffs are rendered in a way that makes it impossible to review most of the PR. Do you have an idea how to overcome it? Because of this I didn't check most of the changes.

I agree the diff is not very readable. Unfortunately, I don't have a good trick for reviewing it. Usually I try to carefully understand the new logic and check whether the tests cover all edge cases.

All the grammar fixes - I'm not an English expert so some of them are probably wrong.

Thanks, these look great. Me neither (you know it!), so hopefully others can review too.

The long explanation is excellent. After this PR it will be great if you can write a tutorial about curve analysis. The tutorial can consist of the very same text, accompanied with a concrete example.

Yes, this is indeed intended. We can move this to a new rst file once we prepare the developer tutorials.

I think it will improve code & review quality if you split to several PRs. For example one PR that introduces composite functions, followed by a PR that introduces fixed parameters, followed by a PR that introduces groups. I know that splitting is a lot of tedious work, and I won't insist on it, but be aware of the consequences of not splitting.

Fixed parameters are an already existing feature, so they cannot be split out. CompositeFitFunction could probably be split from this PR, but it takes the group in its constructor, so that PR may look weird. Feel free to ping me on Slack or elsewhere if you think talking offline would help you understand the new logic.

yaelbh

I think the PR is fine. I'm not approving only to give space to the other reviewers to submit their feedback.

yaelbh · 2022-03-07T11:10:26Z

qiskit_experiments/curve_analysis/curve_data.py

+    def __post_init__(self):
+        """Implicitly parse fit function signature for fit function."""
+        # The first argument is x, which is not a fit parameter
+        sig = list(inspect.signature(self.fit_func).parameters.keys())[1:]


I think it's better to modify the line in the code to something more readable. Shorter is not necessarily simpler, it's usually the opposite. Another option is to add inline comments which describe the data structure in each step, and what each step does exactly.
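One possible more readable version of that line, written as a standalone helper. `fit_parameter_names` is a hypothetical name; it relies only on `inspect.signature`, whose `parameters` mapping preserves declaration order.

```python
import inspect
from typing import Callable, List


def fit_parameter_names(fit_func: Callable) -> List[str]:
    """Names of the fit parameters of ``fit_func``."""
    signature = inspect.signature(fit_func)
    # Ordered names of all arguments, e.g. ["x", "p0", "p1", "p2"].
    names = list(signature.parameters)
    if not names:
        raise ValueError("fit_func must take the x-value as its first argument.")
    # The first argument is the x-value, not a fit parameter, so drop it.
    return names[1:]


def example(x, amp, tau, base):
    return amp * x / tau + base
```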

eggerdj

I'm a bit concerned that the logic of curve analysis is becoming too complex because it is trying to accommodate many different cases. Would it help to have a GroupedCurveAnalysis class which inherits from CurveAnalysis instead?

eggerdj · 2022-03-08T10:09:08Z

qiskit_experiments/curve_analysis/__init__.py

+Overview
+========
+
+The base class :class:`CurveAnalysis` supports multi-objective optimization on


Can we be more specific here? In particular, can we define what an objective is in the context of curve fitting? The term multi-objective optimization is rather generic. Would it help to add some math notation? I would suggest something along the following lines.

The base class :class:`CurveAnalysis` supports multi-objective optimization on different sets of experiment results, and you can also define multiple independent optimization tasks in the same class. More specifically :class:`CurveAnalysis` can fit multiple groups :math:`G_i=\{y_{i1}(x), y_{i2}(x), ...\}` of several series of data :math:`y_{ij}(x)`. Here, a data series :math:`y_{ij}(x)` represents a single curve that depends on :math:`x`, called the x-value, i.e. :code:`xval`. The series in the group :math:`G_i` are fit to a common function :math:`f_i(x, \boldsymbol{a}_i)` where :math:`\boldsymbol{a}_i` are the fit parameters of group i.
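One way to write the objective that this notation implies. This assumes a weighted least-squares cost of the kind `scipy.optimize.curve_fit` minimizes. Here $x_{ijk}$ are the sampled x-values of series $j$ in group $i$, and $\sigma_{ijk}$ is an assumed per-point uncertainty.

```latex
\hat{\boldsymbol{a}}_i = \mathop{\arg\min}_{\boldsymbol{a}_i}
  \sum_{j} \sum_{k}
  \left( \frac{y_{ij}(x_{ijk}) - f_i(x_{ijk}, \boldsymbol{a}_i)}{\sigma_{ijk}} \right)^2,
  \qquad i = 1, 2, \ldots
```

The total cost over all groups is a sum of terms, each depending on a single $\boldsymbol{a}_i$. Minimizing it jointly is therefore equivalent to minimizing each group separately, which is why groups can be treated as independent optimizations.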

eggerdj · 2022-03-08T10:11:23Z

qiskit_experiments/curve_analysis/__init__.py

+- Group: This is top level component of the fitting. If an analysis defines
+  multiple groups, it performs multiple independent optimizations
+  and generates results for every optimization group.


Have you considered having a code structure where the group is implemented by a class? E.g.

class GroupedCurveAnalysis(CurveAnalysis)

Would something like that work/be useful? It might make the code easier to digest.

eggerdj · 2022-03-08T10:12:44Z

qiskit_experiments/curve_analysis/__init__.py

+- Series: This is a collection of curves to form a multi-objective optimization task.
+  The fit entries in the same series share the fit parameters,
+  and multiple experimental results are simultaneously fit to generate a single fit result.


I really think that some notation would help make this easier to digest.

eggerdj · 2022-03-08T10:13:38Z

qiskit_experiments/curve_analysis/__init__.py

+To manage this structure, curve analysis provides a special dataclass :class:`SeriesDef`
+that represents an optimization configuration for a single curve data.


How does group fit into this?

eggerdj · 2022-03-08T10:16:40Z

qiskit_experiments/curve_analysis/__init__.py

+        model_description="p0 * exp(-p1 * x) + p2",
+    )
+
+The minimum field you must fill with is the ``fit_func``, which is a callback function used


Suggested change

The minimum field you must fill with is the ``fit_func``, which is a callback function used

The minimum field you must provide is ``fit_func``. It is the callback function used
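For reference, a minimal `fit_func` matching the `model_description` in the diff above could look like the following. This is an illustrative assumption. The layout shown, with x first and then the named fit parameters, is the one that the signature-parsing `__post_init__` in curve_data.py relies on.

```python
import numpy as np


def exp_decay(x, p0, p1, p2):
    """Callback matching model_description="p0 * exp(-p1 * x) + p2"."""
    return p0 * np.exp(-p1 * np.asarray(x, dtype=float)) + p2
```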

eggerdj · 2022-03-08T10:48:24Z

qiskit_experiments/curve_analysis/curve_analysis.py

+        # Let's keep order of parameters rather than using set, though code is bit messy.
+        # It is better to match composite function signature with the func in series definition.
+        fit_args = []
+        for func in composite_funcs:
+            for param in func.signature:
+                if param not in fit_args:
+                    fit_args.append(param)
+        cls._fit_params = fit_args


why is this needed?
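My reading of the code comment in this hunk is that a plain `set` has no guaranteed order, so the positional parameter order passed to the optimizer could drift away from the signature in the series definition. A compact order-preserving equivalent of the loop, as a hypothetical helper, relies on dict keys keeping insertion order:

```python
from typing import List, Sequence


def ordered_union(signatures: Sequence[Sequence[str]]) -> List[str]:
    """Union of parameter names in first-seen order."""
    return list(dict.fromkeys(name for sig in signatures for name in sig))


params = ordered_union([["amp", "freq", "phase"], ["amp", "freq", "base"]])
```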

eggerdj · 2022-03-08T10:49:44Z

qiskit_experiments/curve_analysis/curve_analysis.py

-                "different function signature. They should receive "
-                "the same parameter set for multi-objective function fit."
+                "CurveAnalysis subclass requires CompositeFitFunction instance to perform fitting. "
+                "Standard callback function is not acceptable due to missing signature metadata."


This needs a bit more explanation. What is this missing metadata issue?

eggerdj · 2022-03-08T10:51:15Z

qiskit_experiments/curve_analysis/curve_analysis.py

+        lower = [bounds[p][0] for p in func.signature]
+        upper = [bounds[p][1] for p in func.signature]
+        scipy_bounds = (lower, upper)


Should this be a method of func? E.g. func.format_scipy_bounds(bounds)?
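A sketch of the method suggested above. `FitFuncWithBounds` and its `signature` attribute are hypothetical. The method only reorders a `{name: (min, max)}` mapping into the `(lower, upper)` pair that `scipy.optimize.curve_fit` accepts for `bounds`.

```python
from typing import Dict, List, Sequence, Tuple


class FitFuncWithBounds:
    """Hypothetical fit-function wrapper carrying an ordered parameter signature."""

    def __init__(self, signature: Sequence[str]):
        self.signature = list(signature)

    def format_scipy_bounds(
        self, bounds: Dict[str, Tuple[float, float]]
    ) -> Tuple[List[float], List[float]]:
        """Convert {name: (min, max)} into scipy's (lower, upper) lists in signature order."""
        missing = [p for p in self.signature if p not in bounds]
        if missing:
            raise KeyError(f"No bounds given for parameters: {missing}")
        lower = [bounds[p][0] for p in self.signature]
        upper = [bounds[p][1] for p in self.signature]
        return lower, upper


func = FitFuncWithBounds(["amp", "tau"])
scipy_bounds = func.format_scipy_bounds({"tau": (0.0, float("inf")), "amp": (0.0, 1.0)})
```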

eggerdj · 2022-03-08T10:59:33Z

qiskit_experiments/curve_analysis/curve_data.py

+    def __post_init__(self):
+        """Implicitly parse fit function signature for fit function."""
+        # The first argument is x, which is not a fit parameter
+        sig = list(inspect.signature(self.fit_func).parameters.keys())[1:]


I don't see the benefit of having a pythonic one-liner when you then need seven lines of comments to explain it.

eggerdj · 2022-03-08T11:03:29Z

qiskit_experiments/curve_analysis/curve_data.py

+
+    @property
+    def data_index(self) -> np.ndarray:
+        """Return current data index mapping."""


what is this?

nkanazawa1989 added 3 commits March 4, 2022 07:54

Merge branch 'main' of github.com:Qiskit/qiskit-experiments into upgr…

f40e02c

…ade/curve_analysis_with_group_fit

nkanazawa1989 requested review from eggerdj, chriseclectic, yaelbh and wshanks March 4, 2022 19:33

yaelbh reviewed Mar 6, 2022


nkanazawa1989 and others added 2 commits March 7, 2022 09:32

Fix bugs

a25043d

If there are multiple groups, the second group may have curve indices not starting from zero. Explicit index mapping is added to the CompositeFitFunction constructor. In addition, data filtering for group is added to the curve analysis.
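A minimal sketch of the index mapping this commit describes, assuming curve indices are stored per data point in a NumPy array. `to_local_index` is a hypothetical helper, not the function added in the PR. It selects the points of one group and renumbers that group's curves from zero.

```python
from typing import Sequence, Tuple

import numpy as np


def to_local_index(
    data_index: np.ndarray, group_curve_indices: Sequence[int]
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (mask, local_index) for the data points belonging to one group."""
    group_curve_indices = list(group_curve_indices)
    # Global curve index -> position within this group (0, 1, ...).
    mapping = {g: local for local, g in enumerate(group_curve_indices)}
    data_index = np.asarray(data_index)
    mask = np.isin(data_index, group_curve_indices)
    local = np.array([mapping[i] for i in data_index[mask]], dtype=int)
    return mask, local


# Curves 0 and 1 belong to the first group, curves 2 and 3 to the second.
data_index = np.array([0, 0, 1, 2, 2, 3, 3])
mask, local = to_local_index(data_index, [2, 3])
```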

Documentation fixes

4f73766

Co-authored-by: Yael Ben-Haim <[email protected]>

nkanazawa1989 force-pushed the upgrade/curve_analysis_with_group_fit branch from 62aa67e to 4f73766 Compare March 7, 2022 00:50

add mod docs

2fc0d18

nkanazawa1989 mentioned this pull request Mar 7, 2022

CR Hamiltonian tomo with group fit #718

Closed

lint fix

6a8b9c0

nkanazawa1989 force-pushed the upgrade/curve_analysis_with_group_fit branch from 5f6d837 to 6a8b9c0 Compare March 7, 2022 04:59

yaelbh reviewed Mar 7, 2022


mode docs and comments

2e7d283

eggerdj suggested changes Mar 8, 2022


nkanazawa1989 mentioned this pull request Mar 8, 2022

Add fit model to CurveAnalysis #726

Closed

nkanazawa1989 mentioned this pull request Apr 6, 2022

CurveAnalysis base class #765

Merged


nkanazawa1989 closed this Jun 9, 2022

nkanazawa1989 deleted the upgrade/curve_analysis_with_group_fit branch October 27, 2022 06:57
