Formalise AD integration status, rewrite AD page correspondingly #595
Conversation
23469c6 to e594515
Preview the changes: https://turinglang.org/docs/pr-previews/595
Some thoughts.
**Tier 3** is the same as Tier 2, but in addition to that, we formally also take responsibility for ensuring that the backend works with Turing models.
If you submit an issue about using Turing with a Tier 3 library, we will actively try to make it work.
Realistically, this is only possible for AD backends that are actively maintained by somebody on the Turing team, such as Mooncake.
What are the limits to how far we're willing to take this? Per our discussion yesterday, if someone does something really non-differentiable (e.g. a custom `ccall`), we're not going to try and add support for their proposal.
Maybe "we will actively try to make it work" could be extended to say how we'll try to make it work? e.g. if someone encounters a bug, we'll fix it, but if they're doing something unusual we might suggest a more standard way to go about it that avoids the problem they're seeing entirely.
Maybe we could add a paragraph above or below the list of tiers explaining what it means that a backend works with Turing? It's mostly relevant for Tier 3, but it could be a good clarification more generally too. Something like
When we say that an AD backend works with Turing, we mean that it is able to differentiate any Turing model that depends only on Turing and some common Julia standard library modules such as LinearAlgebra. Note that a Turing model can include arbitrary Julia code, which can involve code dependencies on other packages such as differential equation solvers or external calls using `ccall` (should something else be added to the list of exclusions?). If a Tier 2 or 3 AD backend fails on a Turing model because of such features, we may still be able to help you out in some cases, but we may also consider the problem to be outside our control or scope.
Good to also keep in mind that while it's nice to be clear and explicit about our thinking, it's not a legal contract and we don't have to be suuuuper precise about our wording on what we commit to fixing. It's all still subject to the usual uncertainties of academic funding and time anyway.
Firstly, you could broaden the type of the container:
```{julia}
@model function forwarddiff_working1()
    x = Real[0.0, 1.0]
    a ~ Normal()
    x[1] = a
    b ~ MvNormal(x, I)
end
sample(forwarddiff_working1(), NUTS(; adtype=AutoForwardDiff()), 10)
```
Or, you can pass a type as a parameter to the model:
```{julia}
@model function forwarddiff_working2(::Type{T}=Float64) where T
    x = T[0.0, 1.0]
    a ~ Normal()
    x[1] = a
    b ~ MvNormal(x, I)
end
sample(forwarddiff_working2(), NUTS(; adtype=AutoForwardDiff()), 10)
```
Would it be helpful for users to make it clear that the second option here is highly preferable to the first in general, and that the first should only be used if the second doesn't work for some reason?
Yes, definitely, and a link to https://discourse.julialang.org/t/vector-real-vector-float64-methoderror/25926/5 might also be helpful (it helped me back in the day)
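To make the difference concrete, here is a minimal sketch (plain Julia, no packages needed, variable names invented for illustration) of why the typed-parameter version is generally preferable: `Real[...]` creates a container with an abstract element type, which boxes every element and makes downstream code type-unstable, whereas a concrete element type (like the `T` passed to the model) keeps storage flat and fast. The abstract container is only a fallback for cases where a concrete `Vector{Float64}` cannot hold the dual/tracked number types that some AD backends need to insert.

```julia
# Hypothetical illustration of abstract vs concrete container element types;
# the variable names are made up for this example, not from the Turing docs.
x_abstract = Real[0.0, 1.0]      # eltype Real: boxed elements, type-unstable,
                                 # but can later hold e.g. a ForwardDiff.Dual
x_concrete = Float64[0.0, 1.0]   # eltype Float64: flat, fast storage,
                                 # but restricted to Float64 values

println(eltype(x_abstract))   # Real
println(eltype(x_concrete))   # Float64

# isconcretetype distinguishes the two situations the docs describe:
println(isconcretetype(eltype(x_abstract)))   # false
println(isconcretetype(eltype(x_concrete)))   # true
```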
Co-authored-by: Will Tebbutt <[email protected]>
### Usable AD Backends

Turing.jl uses the functionality in [DifferentiationInterface.jl](https://github.com/JuliaDiff/DifferentiationInterface.jl) ('DI') to interface with AD libraries in a unified way.
Thus, in principle, any AD library that has integrations with DI can be used with Turing; you should consult the [DI documentation](https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/) for an up-to-date list of compatible AD libraries.
The plural in "integrations with DI" feels funny to me.
| 1 | Yes | No | 'You're on your own' | Enzyme, Zygote |
| 0 | No | No | 'You can't use this' | |

**Tier 0** means that the AD library is not integrated with DI, and thus will not work with Turing.
**Tier 0** means that the AD library is not integrated with DI, and thus will not work with Turing.
**Tier 0** means that the AD library is not integrated with DI, and thus will not work with Turing, or is known to have serious enough issues when used with Turing to render it useless.
To cover cases like what might happen with Zygote, where we know that it won't work with any Turing models, so we don't bother trying.
**Tier 1** means that the AD library is integrated with DI, and you can try to use it with Turing if you like; however, we provide no guarantee that it will work correctly.
If you submit an issue about using Turing with a Tier 1 library, it is unlikely that we will be able to help you, unless the issue is very simple to fix.

**Tier 2** indicates some level of confidence on our side that the AD library will work, because it is included as part of DynamicPPL's continuous integration (CI) tests.
**Tier 2** indicates some level of confidence on our side that the AD library will work, because it is included as part of DynamicPPL's continuous integration (CI) tests.
**Tier 2** indicates some level of confidence on our side that the AD library will work, because it is included as part of Turing's continuous integration (CI) tests.
Since these are user-facing docs, I think we can't assume that the reader knows what DPPL is. If Turing sounds too much like the Turing.jl repo then we could also be more ambiguous with something like "our CI tests". Also nice not to have to reedit this if we just move some tests around.
This may be either due to upstream bugs / limitations (which exist even for ForwardDiff), or simply because of time constraints.
However, if there are workarounds that can be implemented in Turing to make the backend work, we will try to do so.

**Tier 3** is the same as Tier 2, but in addition to that, we formally also take responsibility for ensuring that the backend works with Turing models.
**Tier 3** is the same as Tier 2, but in addition to that, we formally also take responsibility for ensuring that the backend works with Turing models.
**Tier 3** is the same as Tier 2, but in addition to that, we also take responsibility for ensuring that the backend works with Turing models.
I felt like the word "formally" wasn't adding anything.
This is an initial attempt to put down in words what @willtebbutt and I think our approach to integrating AD backends should be going forward.