-
Notifications
You must be signed in to change notification settings - Fork 131
Introducing group
to curve analysis
#715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing group
to curve analysis
#715
Conversation
This PR add new feature - CurveAnalysis.curve_fit: this is class method, user can directly call fit function - CompositeFitFunction: a function-like object for fitting In addition unittest of curve analysis and class documentation are overhauled. Following is deprecated - CurveAnalysis.options.curve_fitter: This object is not serializable
…ade/curve_analysis_with_group_fit
The composite_func is no longer class attribute. It is converted into protected member and replaced with property method. This property method returns a copy of function to avoid conflict in the multithread execution. The parameter ordering is also fixed to match with original fit function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code I wrote one long comment about an alternative solution, which does not require groups nor batch experiments, and uses the existing curve analysis class in a simple way. Let me know if there's a problem with this solution. If not then I don't know - it's hard for me to tell - whether introducing groups is more conveneient to the user.
That long comment also discusses the difference in parallelization between the three options (batch/group/alternative). Please carefully read the comment. Whether my alternative solution is correct or now, it still remains to compare batch with group with regard to parallelization.
Additional comments:
- This is an important comment: the diffs are rendered in a way that makes it impossible to review most of the PR. Do you have an idea how to overcome it? Because of this I didn't check most of the changes.
- All the grammar fixes - I'm not an English expert so some of them are probably wrong.
- The long explanation is excellent. After this PR it will be great if you can write a tutorial about curve analysis. The tutorial can consist of the very same text, accompanied with a concrete example.
- I think it will improve code & review quality if you split to several PRs. For example one PR that introduces composite functions, followed by a PR that introduces fixed parameters, following by a PR that introduces groups. I know that splitting is a lot of tedious work, and I won't insist on it, but be aware of the consequences of not splitting.
Co-authored-by: Yael Ben-Haim <[email protected]>
62aa67e
to
4f73766
Compare
Thanks @yaelbh for careful reading of docs and suggestions. I understand your suggestion might be good for performance, since curve fit for each group can run on separate thread (indeed this was the approach I implemented in the draft PR above). However, this requires us to write/manage many analysis classes. Usually curves belong to different group has unique filter_kwargs={"control_state": 0, "meas_basis": "y"} and we need this extra logic to compute Hamiltonian coefficients from fit data (fit params are discarded once these values are computed) Managing three analysis classes for a single experiment is indeed a heavy code management overhead. Note that the CR Hamiltonian analysis can be written very simply with
I agree the diff is not quite smart. Unfortunately I don't have good hack for reviewing. Usually I try to carefully understand the new logic and check if the test covers all edge cases.
Thanks these look great. Me neither (you know it!) so hopefully other can review too.
Yes, indeed this is intended. We can move this to new rst file once we prepare the developer tutorials.
The parameter fix is already existing feature so cannot be split. Probably |
5f6d837
to
6a8b9c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is fine. I'm not approving only to give space to the other reviewers to submit their feedback.
def __post_init__(self): | ||
"""Implicitly parse fit function signature for fit function.""" | ||
# The first argument is x, which is not a fit parameter | ||
sig = list(inspect.signature(self.fit_func).parameters.keys())[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to modify the line in the code to something more readable. Shorter is not necessarily simpler, it's usually the opposite. Another option is to add inline comments which describe the data structure in each step, and what each step does exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned that the logic of curve analysis is becoming too complex because it is trying to accommodate many different cases. Would it help to have a GroupedCurveAnalysis
class which inherits from CurveAnalysis
instead?
Overview | ||
======== | ||
|
||
The base class :class:`CurveAnalysis` supports multi-objective optimization on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific here? More specifically to define what an object is in the context of curve fitting. The term multi-object optimization
is rather generic. Would it help to add some math notation? I would suggest something along the following lines.
The base class :class:`CurveAnalysis` supports multi-objective optimization on different sets of
experiment results, and you can also define multiple independent optimization tasks in the same class.
More specifically :class:`CurveAnalysis` can fit multiple groups :math:`G_i=\{y_{i1}(x), y_{i2}(x), ...\}`
of several series of data :math:`y_{ij}(x)`. Here, a data series :math:`y_{ij}(x)` represents a single
curve that depends on :math:`x`, called the x-value, i.e. :code:`xval`. The series in the group :math:`G_i` are
fit to a common function :math:`f_i(x, \boldsymbol{a}_i)` where :math:`\boldsymbol{a}_i` are the
fit parameters of group i.
- Group: This is top level component of the fitting. If an analysis defines | ||
multiple groups, it performs multiple independent optimizations | ||
and generates results for every optimization group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered having a code structure where the group is implemented by a class? E.g.
class GroupedCurveAnalysis(CurveAnalysis)
Would something like that work/be useful? It might make the code easier to digest.
- Series: This is a collection of curves to form a multi-objective optimization task. | ||
The fit entries in the same series share the fit parameters, | ||
and multiple experimental results are simultaneously fit to generate a single fit result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think that some notation would help make this easier to digest.
To manage this structure, curve analysis provides a special dataclass :class:`SeriesDef` | ||
that represents an optimization configuration for a single curve data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does group
fit into this?
model_description="p0 * exp(-p1 * x) + p2", | ||
) | ||
|
||
The minimum field you must fill with is the ``fit_func``, which is a callback function used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum field you must fill with is the ``fit_func``, which is a callback function used | |
The minimum field you must provide is ``fit_func``. It is the callback function used |
# Let's keep order of parameters rather than using set, though code is bit messy. | ||
# It is better to match composite function signature with the func in series definition. | ||
fit_args = [] | ||
for func in composite_funcs: | ||
for param in func.signature: | ||
if param not in fit_args: | ||
fit_args.append(param) | ||
cls._fit_params = fit_args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?
"different function signature. They should receive " | ||
"the same parameter set for multi-objective function fit." | ||
"CurveAnalysis subclass requires CompositeFitFunction instance to perform fitting. " | ||
"Standard callback function is not acceptable due to missing signature metadata." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a bit more explanation. What is this missing metadata issue?
lower = [bounds[p][0] for p in func.signature] | ||
upper = [bounds[p][1] for p in func.signature] | ||
scipy_bounds = (lower, upper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a method of func
? E.g. func.format_scipy_bounds(bounds)
?
def __post_init__(self): | ||
"""Implicitly parse fit function signature for fit function.""" | ||
# The first argument is x, which is not a fit parameter | ||
sig = list(inspect.signature(self.fit_func).parameters.keys())[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the benefit of having a pythonic one-liner when you then need seven lines of comments to explain it.
|
||
@property | ||
def data_index(self) -> np.ndarray: | ||
"""Return current data index mapping.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this?
Summary
The main purpose of this PR is introduction of
group
in theCurveAnalysis
. Now the data model of the fitting has following structure.In addition, module documentation and unittest are overhauled to increase the coverage of the logic.
Details and comments
This refactoring is necessary to fix performance issue of CR Hamiltonian and complexity in HEAT experiment implementation in #625 . The common feature of these experiments is performing the same curve analysis with different control qubit state. Thus estimated parameters may be different though they have the same function shape. However, current
CurveAnalysis
only supports multi-objective optimization. This means, if a single fit model j has parameters of (p0j, p1j, p2j), the fit model that the curve analysis can handle is F(p00, p10, p20, p01, p11, p21). Since init guess P0j = {P0_j_n} may different for each j, this should repeat fitting for {P0_0_0, P0_1_0}, {P0_0_0, P0_1_1}, ... ~ O(N^2) which is quite inefficient.To overcome this, we should implement each experiment j as batch experiment however still this is really heavy coding overhead. You can check my draft code here https://github.com/nkanazawa1989/qiskit-experiments/pull/7/files. This is working, but you need to implement (1) single child experiment (2) child analysis (3) batch experiment wrapper (3) composite analysis wrapper. The same thing happened to the current HEAT implementation. This pattern is not quite limited to these two experiments, but we will see more in future since this is very conventional in two qubit calibrations and characterization.
This complexity will be drastically alleviated by supporting multi-group fitting, i.e.
There will be no breaking API change as you can see there is no unittest modification except for CurveAnalysis itself``
Planned future refactoring:
CueveDrawingMixIn
.CurveDataProcessor
involving post data processing and filtering.