Remove the type ParamSpaceSGD
#205
base: main
Conversation
Hi @yebai, could you check that this is what you asked for? Personally, I feel the […]
Thanks @Red-Portal -- I left some comments above. In addition, let's simplify the folder structure a bit for clarity:

- move all files in `paramspacesgd` to `algorithms`, e.g., "algorithms/paramspacesgd/constructors.jl" to "algorithms/constructors.jl"
- keep each algorithm in its own file

Also, I'd suggest we consider renaming `paramspacesgd.jl` to `interface.jl` or something along those lines:

- "algorithms/paramspacesgd/paramspacesgd.jl" to "algorithms/interface.jl"
Hi Hong, I planned to do the restructuring in a separate PR to keep things simple in this one. Though:
After the release of v0.5, we'll be adding algorithms that don't conform to the original […]

It is okay to keep all algorithms under […]

You can add more algorithms to […]

I am saying that these new algorithms can't be grouped in […]

"grouping" refers to grouping interface code together for similar VI algorithms in the proposed […]
If I understand right, this PR flattens the existing […]. Before we drop it, may I understand what concrete benefits the flattening delivers? In particular, are we planning to add other algorithm families alongside the current […]? At the moment, I haven't quite convinced myself that the flattening of the type hierarchy is necessary.
I suggested the removal of the […]
@yebai @sunxd3 Thanks for chiming in. Actually, I have a new idea. So I believe the main complaint at the moment is that the term […]. In a nutshell, the nice thing about the current […]
or something along these lines? Would that resolve your concern?
I think I see your point. But I am not sure that helps.

That probably includes every learning algorithm in ML.
Yes, that is indeed almost true! But the point is that there are a couple of important algorithms that don't quite conform to this formalism, as they result in a custom update rule. They don't fall out of a gradient estimator, but modify the parameter update step too. So this is the reason I wish to allow for two different abstraction levels. But as you said, most algorithms only require defining a gradient estimator. So the lower-level interface helps unify the code for all those algorithms.
We are at risk of premature abstraction and introducing heuristic terminology here. It is better to work with concrete algorithms, and define a union type if sharing code is needed (e.g., […]). There might be some insights we can learn by taking a unifying view of parameter-space gradient descent VI, but that is a discussion we should have offline for a review paper.
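For concreteness, the union-type route suggested here could look like the following Julia sketch (the type and function names are hypothetical, not the actual AdvancedVI.jl API):

```julia
# Two concrete algorithm types with no dedicated abstract supertype
# (hypothetical names for illustration only).
struct KLMinRepGradDescent end
struct KLMinScoreGradDescent end

# Group them with a Union only where code actually needs to be shared.
const SGDBasedAlgorithms = Union{KLMinRepGradDescent,KLMinScoreGradDescent}

# A single method then serves both members of the Union.
shared_update(::SGDBasedAlgorithms, state) = state  # shared code path would go here
```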
My main beef with using Unions here is the following: […]

With that said, do you find the solution below still unsatisfying? At least I hope that it resolves your concern that the terminology is non-standard.

If you think we should still go with an implicit interface, then I'll follow for the sake of moving forward.
Hi @yebai, what is your final verdict on what we should do here, given my last comment above?
Hi @Red-Portal, I don't think the proposed interfaces are better. For clarity, I'd suggest we remove […]. Happy to continue the discussion offline and explore whether parameter space can be useful for novel algorithmic insights.
Hi @yebai, I applied all the changes you requested. Let me know if you're satisfied with the code now. Updating the docs will be separate, since it will be a lot. (And painful on my end 😅)
Many thanks, @Red-Portal, for the contributions. I'm happy with the changes; @penelopeysm / @mhauru might want to review this too.
Benchmark Results
| Benchmark suite | Current: 9b2eabb | Previous: fb69a43 | Ratio |
|---|---|---|---|
| normal/RepGradELBO + STL/meanfield/Zygote | 3990679915 ns | 4038685577 ns | 0.99 |
| normal/RepGradELBO + STL/meanfield/ReverseDiff | 1124955886 ns | 1125086941 ns | 1.00 |
| normal/RepGradELBO + STL/meanfield/Mooncake | 1261092892.5 ns | 1251364823 ns | 1.01 |
| normal/RepGradELBO + STL/fullrank/Zygote | 3971814683.5 ns | 4006929376 ns | 0.99 |
| normal/RepGradELBO + STL/fullrank/ReverseDiff | 1621498433.5 ns | 1627490017.5 ns | 1.00 |
| normal/RepGradELBO + STL/fullrank/Mooncake | 1285149814.5 ns | 1271144046.5 ns | 1.01 |
| normal/RepGradELBO/meanfield/Zygote | 2859724373.5 ns | 2867089524.5 ns | 1.00 |
| normal/RepGradELBO/meanfield/ReverseDiff | 777886417 ns | 799659066 ns | 0.97 |
| normal/RepGradELBO/meanfield/Mooncake | 1137178738 ns | 1121597958 ns | 1.01 |
| normal/RepGradELBO/fullrank/Zygote | 2864869952 ns | 2844951835.5 ns | 1.01 |
| normal/RepGradELBO/fullrank/ReverseDiff | 954006564 ns | 954966722.5 ns | 1.00 |
| normal/RepGradELBO/fullrank/Mooncake | 1144634418 ns | 1159699376 ns | 0.99 |
| normal + bijector/RepGradELBO + STL/meanfield/Zygote | 5743469386 ns | 5572987635 ns | 1.03 |
| normal + bijector/RepGradELBO + STL/meanfield/ReverseDiff | 2419710523 ns | 2398829013 ns | 1.01 |
| normal + bijector/RepGradELBO + STL/meanfield/Mooncake | 4227822021.5 ns | 4131189735.5 ns | 1.02 |
| normal + bijector/RepGradELBO + STL/fullrank/Zygote | 5787110379 ns | 5649130276 ns | 1.02 |
| normal + bijector/RepGradELBO + STL/fullrank/ReverseDiff | 3116964174 ns | 3043240221 ns | 1.02 |
| normal + bijector/RepGradELBO + STL/fullrank/Mooncake | 4366946807.5 ns | 4230203948.5 ns | 1.03 |
| normal + bijector/RepGradELBO/meanfield/Zygote | 4464663738.5 ns | 4276506885.5 ns | 1.04 |
| normal + bijector/RepGradELBO/meanfield/ReverseDiff | 2102610294 ns | 1991487412 ns | 1.06 |
| normal + bijector/RepGradELBO/meanfield/Mooncake | 4074397467 ns | 3939831565.5 ns | 1.03 |
| normal + bijector/RepGradELBO/fullrank/Zygote | 4560198939 ns | 4343225391 ns | 1.05 |
| normal + bijector/RepGradELBO/fullrank/ReverseDiff | 2331323017 ns | 2275989579 ns | 1.02 |
| normal + bijector/RepGradELBO/fullrank/Mooncake | 4175468965.5 ns | 4061135640 ns | 1.03 |
This comment was automatically generated by a workflow using github-action-benchmark.
This PR removes the type `ParamSpaceSGD`, which provided a unifying implementation of VI algorithms that run SGD in parameter space. Instead, each parameter-space SGD-based VI algorithm becomes its own `AbstractVariationalAlgorithm`, and the code implementing `step` is shared by dispatching over their `Union`.

This addresses #204
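For readers unfamiliar with the pattern, the Union-dispatch approach described above can be sketched as follows (all type names are hypothetical placeholders, not the actual AdvancedVI.jl API):

```julia
abstract type AbstractVariationalAlgorithm end

# Each SGD-based algorithm is its own subtype; there is no shared
# ParamSpaceSGD supertype anymore (hypothetical names for illustration).
struct RepGradELBOAlgorithm <: AbstractVariationalAlgorithm end
struct ScoreGradELBOAlgorithm <: AbstractVariationalAlgorithm end

# A Union groups the algorithms that share the parameter-space SGD update.
const ParamSpaceSGDAlgorithms = Union{RepGradELBOAlgorithm,ScoreGradELBOAlgorithm}

# One method of `step` covers every member of the Union...
step(::ParamSpaceSGDAlgorithms, state) = state  # shared SGD update would go here

# ...while an algorithm with a custom update rule defines its own method.
struct CustomUpdateAlgorithm <: AbstractVariationalAlgorithm end
step(::CustomUpdateAlgorithm, state) = state  # custom update would go here
```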