-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trait versioning design sketch #90
base: main
Are you sure you want to change the base?
Conversation
ae7b405
to
f3e0225
Compare
f3e0225
to
3e22e9a
Compare
Great work folks - quick question - I wondered if you had considered exposing the version of the traits themselves in the class hierarchy, vs at the package level? So you could see more easily in code how the versions work. Eg (or similar): openassetio_example.traits.example.Unchanged_v1
openassetio_example.traits.example.Updated # Could be added as a automatic/versionless wrapper later
openassetio_example.traits.example.Updated_v1
openassetio_example.traits.example.Updated_v2 So it is a little more explicit. It might be counterintuitive that You could leave multiple version declarations in the YAML and deprecate them as-and when, which would give any specific package useful information about the history for migrations etc. [edit] I think this leaves the logic described largely unchanged, and could also be used to version specifications more explicitly at the code level so they behave consistently with traits - which is one less thing to explain. Updated:
- version: 1
description: An example.
usage:
- entity
- locale
- relationship
properties:
propertyToKeep:
type: string
description: A property that is unchanged between versions.
propertyToRename:
type: boolean
description: >
A property that has an inappropriate name and should be renamed in the
next version.
propertyToRemove:
type: boolean
description: A defunct property that should be removed in the next version.
- version: 2
description: An updated example.
...
properties:
propertyToKeep:
type: string
description: A property that is unchanged between versions.
propertyThatWasRenamed:
type: boolean
description: A property that has been renamed.
propertyThatWasAdded:
type: float
description: A new property added in the latest version. |
Thanks for the feedback! I think its worth at least considering this alternate formulation. Will spend some time sketching it out. Some initial thoughts:
As a general point, not addressing your comment, but worth bearing in mind - I don't think there's any way around having to manually discover changes between versions (in the release notes/library). At some point a human is going to have to look at the docs/code and explicitly write bespoke handler code to work with multiple versions. That said, putting the version in the class name could help users discover that there are multiple versions available, e.g. via IDE code completion. Would perhaps make the code slightly more readable too, though I'd imagine any code being conditional on trait versions would be pretty clear. One advantage of versioning at the namespace level is that multiple subpackages can coexist, potentially from multiple sources. Not really explored in the sketch, but v1 and v2 could be separate PyPI (namespace) packages. Removing the need for a schema version is alluring, though without further changes it removes the ability to have/use two different Specification versions in the same global symbol space (relates to your final point, below). There's nothing stopping us doing both. We can keep the top-level schema namespace and add duplicates/aliases of lower schema trait views with
This is something we have vaguely considered, but haven't properly sketched out. The example presented vaguely assumes multiple YAML files are passed to An infinitely growing schema feels like a bad idea, but we could mark deprecations and cull old versions periodically. There is a small danger that changes could be made to an old version (by accident), making it incompatible with a previous release.
Would the idea here be to have e.g. |
^ (1fbaee9) updated the sketch to add support for versioned suffixes on trait view classes, e.g.
|
Thanks so much for fleshing that out @feltech. Sorry a little short on time so a quick reply, please forgive any brevity.
Yeah - there would still be need to read the docs, but as you say - least it's visible in practice in the code. For me the programming time experience is much clearer/readable if the versions are in the classes not the package. Probably the starkest example would be that you don't have to from openassetio_mediacreation.traits.content import LocatableContent_v1, LocatableContent_v2 vs from openassetio_mediacreation.v1.traits.content import LocatableContent as LocatableContent_v1
from openassetio_mediacreation.v2.traits.content import LocatableContent as LocatableContent_Two Then when you use them at the call site you're using the actual classes vs aliases which may be defined inconsistently across a code base/projects. The other situation being if someone has done the following, which makes it less clear in the subsequent code: from openassetio_mediacreation.v1.traits.content import LoctableContent Granted you'd have more places to change if there is a compatible but differently versioned update, but if they have a consistent class name hopefully thats easier, and when the version changes it needs consideration anyway?
What's the use case for that one? Feels like you say it would need to be in the actual PyPi package namespace vs the modules to allow different release versions to be installed side-by-side?
Could that potentially ambiguous as to whether
Yeah, definitely - you'd probably want good management policy (lol), is one benefit though, that if say, you have 10 version where you just add traits, you don't have to worry about 10 copies of the 99 existing traits in 10 files? Feels a little more manageable.
Any more than any other change in the rest of the code base we self-regulate with semver?
Yeah - was just highlighting that some concerns were mentioned that was is no outwardly visible versioning of specifications - the same mechanism as for traits could be used for them. Let's say you changed a trait in
Nice one - thanks! I guess my only question is - what is the real concrete benefit of having the two version numbers that outweighs the maintenance complexity and potential ambiguity?
I'd prefer that approach (single file with multiple versions) as it feels much easier to manager at source and makes it easier to catch if you have inadvertently changed a pervious version. as the git diff will show it, vs you missing eyeballing a difference against a new file and an old one, that wouldn't show up in the diff unless you manually generate it. Ultimately - what I like about that the most is that it allows you to be explicit about the life-cycles of these traits, as it is very clear in each "release" of trait/spec package, which ones are considered current, and/or deprecated. It would also allow you to more easily add a deprecation warning mechanism to the older traits, without having to go back and update many many files.
I guess I'm struggling to follow the relevance of a schema version, if all that matters at the data layer is the traits and their properties - which are now individually versioned. Where do you see the schema version being used? The main thing that matters with specs is which traits it is composing, and the schema version wouldn't tell you that as it doesn't tell you if they have changed or not. Having the version number as per traits gets you the same thing at a per-spec level? And would only change when the actual traits of the spec change, which make it clearer. Eg, ImageSpec may be identical in schemas v1-27, which isn't apparent, unlike ImageSpec_v1 -> ImageSpec_v2 in v24 of the package. For side-by-side visibility, having it in the class name seems the most concrete? Otherwise you have to use full paths, and/or aliased imports? As per the trait example above? |
{ | ||
"metadata": {}, | ||
"cell_type": "markdown", | ||
"source": "However, a workaround to this is to use the fallback versioned trait view classes in the newer schema subpackage", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Consider] This seems too clever for my liking. There are two other options here that I can see, both which I prefer.
- Don't allow traits views of different versions to be cross compatible at all, force folk to do fallback behaviour to the correct type
- Be permissive with imbue/isImbuedTo behaviour, ignoring the version section of the traitID and making best attempts to perform the operation on the data we have.
I prefer permissiveness, I wouldn't even hard error if the trait cannot be populated fully, although i would try to warn. Furthermore, i suspect if we're not permissive, folk are going to end up using the low level API to gain permissiveness anyhow, although that's just speculation.
Maybe eventually a configuration option at generation time, whether you want permissive trait loading or would rather receive errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Don't allow traits views of different versions to be cross compatible at all, force folk to do fallback behaviour to the correct type
In the context of a solution that has both schema version namespace as well as versioned class names, do you mean that older schema versions shouldn't be duplicated (or aliased) under a newer schema version namespace, i.e. no _vX
-suffixed classes?
- Be permissive with imbue/isImbuedTo behaviour, ignoring the version section of the traitID and making best attempts to perform the operation on the data we have.
I think this is a much bigger change that we'll need to consider in isolation. I.e. given the trinary state
- Trait not imbued
- Trait imbued and version matches
- Trait imbued but version mismatch so properties are not retrievable
... we need to start by considering if that third case even has much value - you usually need to know the correct version so you can request and extract property values. I'd also worry about hard-to-diagnose bugs if we didn't give appropriate feedback.
Permissiveness is much easier to add than to remove later, so I would definitely prefer starting out more restrictive.
I agree with Tom on the naming conversation, having the version in the class name itself is preferred.
I echo this, having read the examples. I understand there are potential benefits to having a schema version as well as a trait specific version number, but I'm thinking now that the complexity we bring on far outweighs them. At this stage of play, i'd rather versioning be difficult than complex.
Maybe that is the best approach, on reflection. A similar motive around trying not to be complicated about things.
Echoing this as well. Now that we're putting data in the trait itself, the need to a version outside of it eludes me, save for some speculative compatibility patterns, unless I'm missing something? Is the primary motivation to tie specific trait versions to specifications? |
Thanks, all good points. Definitely worth updating the sketch to remove any concept of a schema version, to see what shakes out.
Python namespace packages allow you to install two PyPI packages that both have the same top-level namespace. So we could have That way, a host could depend on It's a very Python-specific packaging technicality, so isn't the most important consideration, but worth noting that removing a top-level schema namespace would prevent us from making use of Python namespace packages in this way in the future.
The Host sketch shows that Specifically in the sketch a
However, the relevant logic could probably be rewritten to make better use of Specifications instead.
Yeah, versioning Specifications is a headache... should a Specification's version really change when a constituent trait version changes? I argued in the Notebook that, conceptually, Specifications are trait version agnostic, but practically, the Specification classes won't work unless you use/expect the trait versions that the Specification class was built against. Having a top-level schema version seems to solve that conflict between the conceptual and practical, since the Specification version and the versions of its constituent traits are all given by the schema version (notwithstanding cross-schema Specifications).
Yeah full paths or aliases would be needed. Having a version tag in the Specification class name instead could be easier to recognize. The problem is if Specifications are independently versioned, rather than linked to the schema version. Losing the schema namespace/tag means you lose the ability to know at-a-glance what schema version a Specfication's constituent traits come from. If you detect a trait set satisfies a particular Specification, then knowing the schema version of that Specification allows you to use the trait view classes of its constituent traits, without having to look it up in the YAML/code. This all relies on Specifications being kept up to date with the trait IDs in its schema version, which I think is reasonable. |
70e7e36
to
b1d9925
Compare
^ (b1d9925)
I quite like it. My main concern with removing a schema version was for Specifications.
However, in the sketch these issues aren't so bad. For the latter issue in particular, making proper use of the Specification view class ameliorates the problem of "just knowing" the correct trait view class to use. Removing the top-level schema does of course simplify things, as reflected in the amount of text cut from the sketch. Adding multi-version support to the YAML means a fairly large breaking change to our YAML (JSON-) schema. But so be it. The tweak to ensure fixed-width version tags in trait IDs means that a future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I know you're thinking about putting these options into a DR, but I think the current one is the right approach. The callout in the docs about specification views being potentially dangerous is a good one (I still think we should maybe just not generate them for this reason, at least until the thinking around versionless interfaces develops).
The back of my mind is screaming at me that 999 versions may not be enough thirty years from now, but I mean, surely not that would be crazy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a lot simpler now - nice one! ❤️
|
||
Updated: | ||
versions: | ||
- description: An example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Do we need to explicitly version the traits/specifications in the YAML? So that we can delete them in the future without resetting the index? You'd mentioned a concern before for an ever growing list. If they were versioned you could remove deprecated ones after some period of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, very good point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added version numbers in 86b3680
"\n", | ||
"Specifications are a way to document well-known sets of traits that categorize entities, relationships, locales, or policies. Agreement on these as an industry is crucial for effective interop.\n", | ||
"\n", | ||
"However, they do not have an independent version - their version is implicit in the (versioned) traits that they compose. A major consequence of this is that no specification version is embedded in the data itself.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Does this need updating now they do have a version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated that paragraph in 0d25667
Closes OpenAssetIO#88. As due diligence before going ahead with an implementation, we need to sketch out what a versioned trait package would look like, and how hosts and managers would interact with it. So add a contrived trait package with two versions. The "generated" view classes have been manually created to reflect what the `traitgen` output should be, once implemented. These view classes are then used in a tutorial notebook that describes how one would go about working with versioned traits. This work may be abandoned or adapted in favour of more concrete documentation as the feature is developed. Signed-off-by: David Feltell <[email protected]>
Signed-off-by: David Feltell <[email protected]>
e3da0b5
to
0d25667
Compare
^ added a WIP DR to summarise the journey through this sketch. It's still in progress, no need to review yet. Also not sure where such a DR should go.
I reasoned, without much conviction, that the applicability is to the ecosystem as a whole, and we've previously discussed that MediaCreation is best placed as a "landing page" for the OpenAssetIO ecosystem 🤷♂️ |
Signed-off-by: David Feltell <[email protected]>
Version 1 should be a special case, where the ID is not suffixed with a version tag, and the unversioned class name continues to be available. This then means the introduction of versioning is a non-breaking change to people currently using the legacy library. Signed-off-by: David Feltell <[email protected]>
^ After discussion, we've determined that we really want to avoid a breaking change in MediaCreation. So I've updated the sketch to say that, rather than the non-suffixed classes aliasing the "latest" version, they instead alias version 1, and in addition flag a deprecation warning. This means MediaCreation should be entirely source compatible, even once we introduce versioning support. |
d3086ca
to
2eeaab0
Compare
DR is ready to review. Still not sure if it's in the correct repo, or if it should be pulled out to a separate PR? |
b3a11d9
to
578dbe3
Compare
Sorry not had time to catch up on all the changes - might TraitGen make sense? As ultimately, if I understand correctly, this is all an emergent phenomenon from there? And isn’t actually a core thing? |
You're technically correct, but I can't imagine anyone going to TraitGen to find project planning information. I say we put it here, this is where people will probably come. |
I think even if the DR is put in this repo, we should separate it from this notebook, since we can't merge this notebook until the implementation is done. Whereas, ideally, we should merge the DR before embarking on the implementation. As for which repo to put the DR in... Perhaps, if we had the resource, we'd split things up better and set up CI to run Jupyter in all repos, move DRs and notebooks to the repos that are most associated with them, and have a central ecosystem landing page/repo that links to them all. Alas, in the meantime, I'd also propose to put the DR in MediaCreation, since its not a core library concern, there is precedent for DRs in MediaCreation, and it keeps the DR next to the design notebook. There is perhaps something in the point that the DR will affect the YAML structure, which is a concern for MediaCreation, since it's a document as much as an input to codegen. I wonder if we should move the previous DR on trait versioning (in-data vs. out-of-data) into MediaCreation, since they're obviously closely related? - I know there's an argument that the previous DR might have led to core library changes, so its reasonable to leave where it is. |
Part of OpenAssetIO#88. Consolidate the discussion, provoked by iterations of the design proposal in OpenAssetIO#90, into a decision record. Signed-off-by: David Feltell <[email protected]>
578dbe3
to
928408a
Compare
Dropped the DR from this PR and created a new PR in this repo: #95 |
Hence set this PR back to Draft status, to prevent accidental merging. |
Part of OpenAssetIO#88. Consolidate the discussion, provoked by iterations of the design proposal in OpenAssetIO#90, into a decision record. Signed-off-by: David Feltell <[email protected]>
Part of OpenAssetIO#88. Consolidate the discussion, provoked by iterations of the design proposal in OpenAssetIO#90, into a decision record. Signed-off-by: David Feltell <[email protected]>
Part of OpenAssetIO#88. Consolidate the discussion, provoked by iterations of the design proposal in OpenAssetIO#90, into a decision record. Signed-off-by: David Feltell <[email protected]>
Merging this is blocked on OpenAssetIO/OpenAssetIO-TraitGen#80 |
Closes #88. As due diligence before going ahead with an implementation, we need to sketch out what a versioned trait package would look like, and how hosts and managers would interact with it.
So add a contrived trait package with two versions. The "generated" view classes have been manually created to reflect what the
traitgen
output should be, once implemented.These view classes are then used in a tutorial notebook that describes how one would go about working with versioned traits.
This work may be abandoned or adapted in favour of more concrete documentation as the feature is developed.