In current Trio, if one task in a nursery raises an exception, then that exception directly propagates out of the nursery. If multiple tasks in a nursery raise exceptions, then they are wrapped into a single MultiError at the nursery boundary.
This is error-prone because it means except FooError: will catch a FooError across a nursery boundary, but only if there was just a single exception raised. As soon as another error appears in tandem, except FooError: won't catch anything, because MultiError doesn't inherit from FooError. This can be especially surprising if the other error is Cancelled:
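```python
import trio

async def main():
    with trio.move_on_after(1):
        try:
            async with trio.open_nursery() as nursery:
                nursery.start_soon(trio.sleep_forever)
                try:
                    await trio.sleep_forever()
                finally:
                    raise trio.BrokenResourceError("oh no, issue during finalization")
        except trio.BrokenResourceError:
            # Never reached: at the deadline the child task raises Cancelled, so the
            # nursery wraps [Cancelled, BrokenResourceError] in a MultiError.
            print("handler doesn't execute!")

trio.run(main)
```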
It has been previously proposed (#611) that we should fix this by always wrapping exceptions in a MultiError/ExceptionGroup at a nursery boundary, even if there is only one exception being raised. That way, using except FooError: will always fail to catch the thing, instead of failing only when multiple exceptions occur simultaneously, which is theoretically easier to debug and understand. This is certainly the way Trio would have been designed if we had been able to predict this issue several years ago.
The problem: by Hyrum's Law, there is probably a lot of existing Trio code that does rely on the ability to write except FooError: and have that catch FooErrors even if they have crossed a nursery boundary since they were raised. This becomes especially tricky when you consider that library code might hide a nursery in an innocuous-seeming async context manager -- so it's hard to even audit for places where such reliance occurs. Maybe such code is technically broken now... but in practice it almost always works, so it's a bit hard to swallow the idea that we should turn it into code that never works. And since there are not really any hooks in the exception-catching mechanism, it would be basically impossible to provide a deprecation warning.
Let's call the desired new behavior (where nurseries always add a layer of ExceptionGroup wrapping, whether there's one exception or multiple) "strict exception semantics", versus the current "loose exception semantics".
Once we switch to ExceptionGroups (#2213), users on Python 3.11 will have an easy way out of this quandary: they can write except* FooError instead of except FooError (and deal with the fact that the actual error they're catching is now an ExceptionGroup[FooError], not a direct FooError). This is forward-compatible: it works with the current loose semantics, and will also work once we switch to strict semantics. Unfortunately, users not on 3.11 yet (which will include most libraries for a while) must convert their exception handlers to functions plus a with exceptiongroup.catch(): context manager, which is significantly clunkier (and is probably the best we can do with the available language features).
This issue is intended to collect potential solutions to the above-described quandary.
My basic proposal is as follows:
- Merge the ExceptionGroups change backwards-compatibly; do whatever we do about single-exception wrapping under separate cover.
- Document the issue, encouraging people to use the new except* syntax on 3.11 and later.
- Provide a trio.run() keyword argument (strict_exception_groups=True?) to opt into strict behavior (which will become the future default), and an open_nursery() keyword argument to override the default for that particular nursery. (Not sure whether the nursery one should nest or not. Probably yes?)
- Wait until 3.11 is in widespread use.
- Make targeted changes to encourage libraries to support strict exception semantics. I'm not sure what exactly this would be; perhaps "default strict_exception_groups to True when running under pytest" or something like that.
- Wait some more time.
- Make the default be strict_exception_groups=True when running on 3.11+.
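One way the run-level setting, the per-nursery override, and the open question of nesting could fit together is sketched below. Everything here is hypothetical: scope() stands in for both trio.run() and open_nursery(), effective_strict() and platform_default() are illustrative helpers, and the setting is stored in a contextvars.ContextVar so that a nursery with no explicit value inherits from the innermost enclosing one (the "probably yes" answer to nesting).

```python
# Hypothetical model of resolving the proposed strict_exception_groups setting.
# scope(), effective_strict() and platform_default() are illustrative names only.
import contextvars
import sys
from contextlib import contextmanager

# Innermost explicit setting wins; None means nothing has been set yet.
_strict = contextvars.ContextVar("strict_exception_groups", default=None)


def platform_default():
    # Final step of the plan: strict by default when running on 3.11+.
    return sys.version_info >= (3, 11)


def effective_strict():
    value = _strict.get()
    return platform_default() if value is None else value


@contextmanager
def scope(strict_exception_groups=None):
    """Stand-in for run() or open_nursery(); None inherits from the parent."""
    if strict_exception_groups is None:
        strict_exception_groups = effective_strict()
    token = _strict.set(strict_exception_groups)
    try:
        yield strict_exception_groups
    finally:
        _strict.reset(token)


with scope(strict_exception_groups=False):  # like a run-level opt-out
    with scope() as inner:  # nursery without an override: inherits False
        print(inner)
    with scope(strict_exception_groups=True) as override:  # explicit override
        print(override)
```

Using a context variable rather than a global keeps the inherited value scoped to the enclosing block, so an override in one nursery does not leak into sibling nurseries.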
I'm not sure we ever need to remove support for strict_exception_groups=False. In particular, it is probably useful indefinitely on an individual-nursery basis, in order to support libraries that "hide" a nursery and expect their background tasks to never raise errors. Users should probably be able to write
```python
try:
    async with open_websocket(...) as ws:
        raise RuntimeError
except RuntimeError:
    print("and have this clause execute")
```
We might instead want to add a nursery mode more targeted at such cases, that passes through exceptions from the nested child but wraps any background-task exception in some new error like trio.HelperTaskCrashed. We could use this for the system nursery too, instead of the current norm of using trio.TrioInternalError when a system task crashes. This could profitably be combined with the "service nurseries" concept discussed in #1521.
Some optional mix-ins for the above:
- Default to strict exception semantics on 3.11+ immediately. Assume any libraries affected by this will notice the problem when they test for 3.11 compatibility in the lead-up to the release. Benefits: new users (who are more likely to be using new Python versions, and are definitely not going to know to toggle this obscure trio.run argument) will get the "better" semantics sooner; existing deployments will notice the issue as part of their general "does our thing still work with the latest Python version?" testing. Drawbacks: libraries might be left scrambling.
- Default to strict exception semantics everywhere rather than limiting it to 3.11+. Benefits: it's a little confusing when a Python upgrade breaks something, because it's not Python's fault and users might not think to check Trio release notes if they didn't also upgrade Trio. Drawbacks: the necessary workarounds to support strict semantics before 3.11 are cumbersome and are likely to annoy our users.
- Add a deprecation period where users who don't explicitly specify strict_exception_groups=True get a deprecation warning explaining the issue. Benefits: user education. Drawbacks: requires two code changes at different times for users who don't want to see the warning.
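The deprecation-period mix-in amounts to warning whenever the caller leaves the setting unspecified. This is a minimal sketch using the standard warnings module; resolve_strict_setting() is a hypothetical helper, not part of Trio, and it assumes that an explicit strict_exception_groups=False also silences the warning, since the text suggests keeping that option available indefinitely.

```python
# Hypothetical sketch of the deprecation-period mix-in; not Trio's real code.
import warnings


def resolve_strict_setting(strict_exception_groups=None):
    """Return the effective setting, warning callers who rely on the default."""
    if strict_exception_groups is None:
        warnings.warn(
            "strict_exception_groups will default to True in a future release; "
            "pass it explicitly to choose the behavior you want",
            DeprecationWarning,
            stacklevel=2,
        )
        return False  # loose semantics stay the default during the deprecation period
    return strict_exception_groups


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolve_strict_setting()  # warns: relying on the default
    resolve_strict_setting(True)  # silent: explicit opt-in
print(len(caught))
```

stacklevel=2 attributes the warning to the caller's line rather than to the helper itself, which is what makes such a warning actionable.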
I don't feel strongly about what default we choose for strict vs loose semantics and when we change it, but I do feel strongly that users should be able to override whatever we choose.