Design Doc: Schemas, Intents, and Delegation Groups #2312
- old SchemaId -> SchemaVersionId - protocol -> schema - namespace -> protocol
Additional question/discussion point: (Thinking mostly about on-chain storage like Currently, on-chain storage is stored in a storage child trie with the specific schema version id as part of the key. This made complete sense given that, at the time, there was no way to associate one schema version with another as being representative of minor variations of the same data format. However, with the advent of an encapsulating Schema for which individual Schema Versions are merely minor format changes, would it make sense to reorganize on-chain storage such that the on-chain storage references the Schema, rather than the Schema Version, and simply includes the Schema Version as a record header? This would have the following effects/benefits/drawbacks:
Thoughts? |
|
||
- New schemas must be approved by governance | ||
- Minor updates (new schema version) may be published by the protocol owner | ||
- Major updates require a new schema (e.g., change in semantic intent) |
Is the decision of what qualifies as a major update entirely at the owner’s discretion, with no on-chain enforcement or challenge mechanism?
Since this doc is being revised to defer the concept of owner publishing capabilities and make everything a governance action (for now), the major/minor distinction will be validated by governance.
Whether a programmatic determination can ever be made in this regard may determine the fate of an ownership publishing (without governance) capability.
### 👤 Ownership and Control | ||
|
||
Protocols are envisioned to be owned by an entity, but there is an open question as to whether this entity should be an | ||
MSA (i.e., a Provider) or a raw account (i.e., a public key). This decision will impact how access is managed, |
Thought: One of the advantages of MSAs is that, in theory, we can attach permissions to them. This would allow governance to add keys with limited access, which is especially valuable for organizations that need to delegate responsibilities. Being able to grant keys with restricted functionality becomes even more important in these contexts.
Interesting. Probably the subject of a separate design or issue, but the idea of adding a control key with explicit/limited rights is intriguing... though, delegations could accomplish the same thing.
Great work but it's hard to capture all the side effects and issues that might come with this by only reading this proposal. It might make sense to do a quick and dirty POC and check the weights and see some code.
- A new schema version ID is automatically assigned the next version | ||
- Older versions are preserved permanently | ||
- Delegations may apply to the schema, not a specific schema version ID | ||
- _Question: can versions be deprecated?_ |
I think we should mention the max number of versions we support to be stored on chain.
I believe that's currently a config value & likely will not change, but I'll add a mention of it.
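A minimal sketch of the versioning rules quoted above (next version assigned automatically, older versions preserved, a bounded number of versions). Everything here is hypothetical: the `SchemaVersions` type, the 1-based version numbering, and the `MAX_VERSIONS` cap of 4 are illustration only, since the thread says just that the real limit is a config value.

```rust
/// Hypothetical stand-in for the on-chain id type (a `u16` per the thread).
pub type SchemaVersionId = u16;

/// Assumed cap for illustration; the real limit is a runtime config value.
pub const MAX_VERSIONS: usize = 4;

#[derive(Debug, PartialEq)]
pub enum VersionError {
    TooManyVersions,
}

/// Append-only list of the schema version ids published for one schema.
#[derive(Debug, Default)]
pub struct SchemaVersions {
    ids: Vec<SchemaVersionId>,
}

impl SchemaVersions {
    /// Appends `id` and returns the version number it was assigned.
    /// Versions are numbered from 1 here, which is an assumption.
    pub fn publish(&mut self, id: SchemaVersionId) -> Result<u32, VersionError> {
        if self.ids.len() >= MAX_VERSIONS {
            return Err(VersionError::TooManyVersions);
        }
        self.ids.push(id);
        Ok(self.ids.len() as u32)
    }

    /// Older versions are never removed, so they stay resolvable.
    pub fn resolve(&self, version: u32) -> Option<SchemaVersionId> {
        let index = (version as usize).checked_sub(1)?;
        self.ids.get(index).copied()
    }
}
```

Because the list is append-only, a delegation that targets the schema as a whole would not need to change when a new version is published.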
|
||
```rust | ||
pub enum DelegationTarget { | ||
SchemaVersion(SchemaVersionId), |
The current `SchemaVersionId` type is not a well-defined structure; it's just an array of `SchemaId`s and doesn't have any id attached to it. I'm not sure how it would work if we don't have any canonical registration or identifier for them.
Actually, `SchemaVersionId` is what current Frequency calls `SchemaId`, and is just a `u16`.
pub type Delegations<T> = StorageDoubleMap< | ||
_, Blake2_128Concat, MsaId, // Delegator | ||
Blake2_128Concat, ProviderId, // Provider | ||
BoundedVec<DelegationInfo, MaxDelegationsPerProvider> |
We would need to know the max encoded size of `DelegationInfo`, since that affects how many entries we can fit in a `BoundedVec`; we might need a different structure if it's too big.
There are 2 things to consider here:
- If we ignore `DelegationInfo` and simply store delegations the way we currently do, using a `BoundedBTreeMap` with a `DelegationTarget` enum key instead of a `SchemaId` (`u16`), here are some back-of-napkin calculations:
  - current max storage for 30 schema delegations (current per-delegation max): ~180 bytes
  - max storage for 30 delegations of `DelegationTarget`: ~211 bytes
- However, our current delegation storage has some deficiencies; namely, we can know that a Schema is currently delegated, or when it was revoked, but we cannot know when that delegation started (because we only store a revoked block number or zero). The new `DelegationInfo` struct allows us to store that information. Storing 30 delegations (we would probably need to use a `BoundedVec` instead of a `BoundedBTreeMap`, due to the way we would need to do lookups) would cost ~361 bytes, so effectively double the storage we currently use for delegations.
|
||
### 🧪 Additional Considerations | ||
|
||
- In the future, extrinsics may support intent-based authorization **without referencing a schema** at all. These would |
Intent-based authorization would solve some issues and cause others.
Cons:
- It would allow actions to be taken on behalf of the user even if the user didn't specifically give permission for them. (Imagine a new schema gets registered under an already-delegated intent. The user may not want to delegate for that new schema, but since they already granted intent-based authorization, they cannot do anything about it.)
Pros:
- It would allow the protocol to evolve smoothly without asking for extra permissions for each change.
Some additional thoughts about intents... I think probably we need to have a discussion dedicated to the topic, since there seem to be as many problems created as solved:
- Namespaced intents mitigate the "con" above... somewhat. However, they would still involve an element of trust, that the owner of a namespaced intent does not evolve the protocol in ways that are not aligned with the original purpose
- The next version of this doc (in progress) will remove/defer the concept of "ownership" and revert all actions regarding Schemas and Intents to governance approval. While this also mitigates the concern above, (1) it may put too much burden/trust on governance, and (2) still leaves an open question of how to handle things if/when "ownership" of intents/schemas is later implemented
- Delegation via intents has implications both on-chain and off-chain:
  - on-chain: looking up a delegation via intents has storage read implications that would affect weights. Mitigating this would require new extrinsics that include the `IntentId` in addition to `SchemaVersionId`
  - off-chain: validating posted (or to-be-posted) content against delegations requires additional work, though this can be mitigated by providing custom runtime calls or client library functions to facilitate the lookup
- Intents have an added benefit of being able to represent arbitrary permissions, whether related to a Schema or not. The current delegation model represents ONLY a singular "allow delegated WRITE to `schema_id`" permission. Intents could represent READ permissions as well, such as "allow access to user's graph key" that an off-chain wallet or other software could query.
I think the question of whether or not to implement Intents comes down to answering the following questions:
- Is it reasonable to ask a user to grant an Intent delegation if there is no way (other than chain governance controls) to prevent the Intent from being evolved in unknown ways?
- Are we willing to deprecate the existing chain API for the `messages` and `stateful_storage` pallets such that `IntentId` becomes an input to extrinsics that formerly required only a `SchemaId`?
And a follow-on:
- If we decide to implement Intents, how are they managed? The choice has implications for pallet storage and delegation lookup weight impact:
- Intent->Schema associations are managed through the Intent itself
- Schema->Intent associations are managed through the Schema (and therefore immutable from the Schema's perspective)
- SchemaVersion->Intent associations are managed as metadata in each published SchemaVersion (and therefore mutable between SchemaVersions)
We're going to defer the concept of `Intents` for now; I'll split it out into a separate issue for tracking.
I've re-thought the concept of `Intents`. The problematic part of Intents was the ability to associate an Intent with a group of related (not versioned) schemas, ie `['broadcast', 'reply', 'react']`. If we restrict Intents to a list of versions of the same basic schema, then the problematic aspect goes away, and we can still avoid delegation churn.
TL;DR:
Previous version of this doc: `protocol` -> `schema` -> `[schema@v1, schema@v2,...]`; `intent` a separate entity where `intent` -> `[schema A, schema B, ...]`
New version: `protocol` -> `intent` -> `[schema@v1, schema@v2, ...]`
New version makes a lot more sense to me, too.
@JoeCap08055 I can see that might be helpful if we consider the abstraction and generic usecase of stateful storage but I can not see why we might want to change the graph related schemas. My 2 cents are that if we don't have to do it we probably should not try. The reason is that even though it might be beneficial for generic case it's problematic for graph which is 99% of the actual use of the stateful storage for us. We would need a deeper discussion about this. |
I really like the direction we’re heading with minor schema updates. It should make the update process more efficient and easier for providers to manage. That said, I think the concept of Intents could use some additional discussion. It might bring back some of the same challenges we previously encountered with updates. For instance, if a provider wants to add a new schema to an existing intent, it could unintentionally trigger an update. There are a few potential ways to address this, but each comes with trade-offs that might not be worth it. I am looking forward to hearing other thoughts on how we want to approach intent. |
- **protocol** - the top level of the tripartite nomenclature _protocol.schema@version_ (ex: _dsnp_ is the protocol in | ||
_dsnp.broadcast@v2_) | ||
- **schema** - the second level of _protocol.schema@version_ (ex: _dsnp.broadcast_ resolves to a specific schema). | ||
Schemas should always be referred to by their fully-qualified name (ie, _dsnp.broadcast_, not just _broadcast_) | ||
- **version** - the third level of _protocol.schema@version_, resolves to a specific _minor_ iteration of a schema | ||
- **intent** - an on-chain primitive representing an abstract action or operation |
Looking for some comments on naming.
I was initially in favor of: Namespace->Protocol->Schema; however, apparently "Protocol" for the 2nd level was rejected...?
@wilwade suggested Protocol->Schema->SchemaVersion. However, already in conversations it's clear that Schema vs. SchemaVersion is very confusing.
I think perhaps we should keep the third level as it currently is, `Schema`, to avoid renaming/cognitive shift. I've got some suggestions here for the other levels:
Level 1: Namespace, Domain, Package, Module, Scope, Protocol
Level 2: Model, Template, Component
Happy to consider others...
naming is hard 😬
Hmm... what if we did:
- Domain -> analogous to ENS Domain; will be an easy conversion if we ever implement/move to ENS
- MetaSchema
- Schema
This eliminates the confusion whereby L2 points to "versions" (it would be confusing if "versions" of a `Model`, for instance, were called `Schemas`). So a `MetaSchema` points to "versions" that are `Schemas`.
Idea based on a re-read:
Protocol -> Intent -> Schema
Effectively remove "version" as a concept (force it to be a part of the name) and instead just have deprecated schemas.
So for example something might look like this eventually:
Protocol: DSNP
Intent: Graph
Schema: `dsnp-private-connections` (deprecated)
Schema: `dsnp-private-connections-v2`
Permission: `dsnp-private-graph-read` (future support for permissions, not part of it for now)
I think this works for all existing mainnet schemas.
- Delegation by `SchemaId` (existing/legacy, for future deprecation) | ||
- Delegation by `IntentId` (implies all schema versions within the Intent) |
I would be ok with just migrating all the Schemas that are currently in use to "Intents" that have just one schema. That way we don't have to handle the dual setup
🤔 I think that might work...
- `RegisteredAt`: Block number | ||
- `Versions`: [optional] Ordered list of `SchemaId`s (e.g., `[7, 15, 27]`, for versions 0..2) | ||
|
||
### 🧬 Versioning Rules |
What if we dropped versions and instead just did schema deprecation tagging?
Not sure I understand. Elaborate?
I have a bit more here: #2312 (comment)
Versions imply a semantic that we might not want to imply. Also some protocols might use other ways of versioning than int incrementing.
Effectively with Intents, we can just say "here are all the schemas for this intent and these ones are flagged as deprecated"
The schema naming we do could also be updated. With named Protocols and Intents, the only reason to name schemas is for ease of developer integration across mainnet and testnets.
// Q: Is this necessary as a separate map, or can be folded into the `IntentInfo` struct? | ||
pub type IntentVersions<T> = StorageMap< | ||
_, Blake2_128Concat, IntentId, BoundedVec<SchemaId, MaxSchemaVersionsPerIntent> | ||
>; |
any reverse mapping needed here?
Depends. I think we would store the parent `IntentId` in the `SchemaInfo` struct. So to get the reverse mapping of `SchemaId` -> `IntentId` would require 1 read; it's really a question of whether it's worth the optimization to read just a `u16` vs. an entire struct, vs. the extra storage map.
I see, so we are storing a `BoundedVec<SchemaId, MaxSchemaVersionsPerIntent>`, whose entries are schema ids. This could use some renaming from `IntentVersions`; it could be a common state call for schema management.
3. **Submit by IntentId only (No Schema):** | ||
|
||
- Enables non-schema-based actions (e.g., future custom actions) | ||
- Runtime must validate delegation by `IntentId` |
From a blockchain perspective, keeping things minimal, why should we support multiple ways to publish other than by `SchemaId`? That is more granular for the chain, and the rest can be handled by provider-side management or by a library from us for schema management.
Edit: On second thought, if we want to enable publishing against an intent, these could be good APIs to add to the Gateway service, which can internally resolve the schema id to use, reducing additional work here.
For clarification:
You publish to a `SchemaId`; you delegate to an `IntentId`. `Schemas` implement `Intents`. After thinking about it some more, I realized that any extrinsic that publishes to a Schema already has to read the `SchemaInfo` metadata to do some basic validation. So if the `IntentId` is part of that metadata, we don't even need an extra read to look it up to validate the delegation. Thus our existing extrinsic model of just passing `SchemaId` can remain.
Publishing to a schema on-chain, then, would look like:
- Read `SchemaInfo` by `SchemaId`
- Basic validation (ie, storage location)
- Get `IntentId` from `SchemaInfo`
- Query delegations for `IntentId` just as we currently would for `SchemaId`
I'll update the doc to make ☝🏻 more clear
Thanks for the clarification, so there will be some changes on schemas pallet and subsequent migration of schemas to record intent_id
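The publish flow clarified in this thread (read `SchemaInfo` by `SchemaId`, validate, take the `IntentId` from it, check the delegation for that `IntentId`) can be sketched as follows. This is an illustration using in-memory maps, not the pallet code: `Chain`, `check_publish`, `PublishError`, the two-variant `PayloadLocation`, and the shape of the delegation map are all hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the on-chain id types.
pub type SchemaId = u16;
pub type IntentId = u16;
pub type MsaId = u64;

/// Simplified for illustration; not the pallet's actual list of locations.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PayloadLocation {
    OnChain,
    Ipfs,
}

/// Only the fields needed for the flow; the real struct holds more.
pub struct SchemaInfo {
    pub intent_id: IntentId,
    pub payload_location: PayloadLocation,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    UnknownSchema,
    WrongPayloadLocation,
    NotDelegated,
}

#[derive(Default)]
pub struct Chain {
    pub schemas: HashMap<SchemaId, SchemaInfo>,
    /// (delegator, provider) -> intents the delegator has granted.
    pub delegations: HashMap<(MsaId, MsaId), HashSet<IntentId>>,
}

impl Chain {
    /// The caller passes only a `SchemaId`; the `IntentId` comes from the
    /// `SchemaInfo` read that validation already needs.
    pub fn check_publish(
        &self,
        delegator: MsaId,
        provider: MsaId,
        schema_id: SchemaId,
        expected_location: PayloadLocation,
    ) -> Result<IntentId, PublishError> {
        // 1. Read SchemaInfo by SchemaId.
        let info = self
            .schemas
            .get(&schema_id)
            .ok_or(PublishError::UnknownSchema)?;
        // 2. Basic validation (storage location).
        if info.payload_location != expected_location {
            return Err(PublishError::WrongPayloadLocation);
        }
        // 3. Get the IntentId from SchemaInfo, then
        // 4. query delegations by IntentId instead of SchemaId.
        let delegated = self
            .delegations
            .get(&(delegator, provider))
            .map_or(false, |intents| intents.contains(&info.intent_id));
        if delegated {
            Ok(info.intent_id)
        } else {
            Err(PublishError::NotDelegated)
        }
    }
}
```

In this sketch, two schemas that share an intent are both covered by a single delegation, which is the "avoid delegation churn" property discussed earlier in the conversation.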
Co-authored-by: Shannon Wells <[email protected]>
Additional simplification: Since we've already removed the concept of ownership (deferred until future implementation of ENS and DAO on Frequency), the concept of a "Protocol" or "Namespace" as a primary entity seems superfluous. It can easily be introduced later with minimal effort if we want, but for now it seems sufficient to keep a model similar to the current on-chain storage for schema names; ie a double map Thoughts? |
### Schema versions
I think there needs to be a way to reject messages for unsupported versions, i.e. ones that are past "deprecation". If app developers don't have to upgrade, it's guaranteed that some never will, and it seems that for social graph in particular, that will cause problems. I would want to see another field indicating whether the schema version is still valid for the protocol, and Gateway could even reject messages for that version if it's no longer supported. Maybe that feature doesn't need to be part of this design doc though. I thought we had discussed schema states at some point, like current --> deprecated --> readonly.
In your example, what if Provider A just insists on updating the graph using Schema v1 while everyone else is using Schema v2? Then I guess it's left to the user to figure out how to pressure Provider A to upgrade to the new version, or withdraw their delegation & stop using the app?
### Governance trust
This is getting even more off-piste, but I think it would be a good idea to push out announcements for these types of changes to end users. At first I thought there should be a DSNP message type that is basically a system announcement (e.g. ProtocolChange, SchemaUpgrade, etc), but really, a Broadcast would be fine, and wouldn't require Providers to make any code changes. So Frequency Foundation can have a Provider account and post messages that can be picked up by any other Provider. Those users can go and vote on it (if they have rewards) or at least look at it, and decide what to do, if anything.
To mitigate these issues, we will store data indexed by `IntentId` instead of `SchemaId`. This will require the | ||
encapsulated storage payload to contain an additional piece of meta-information; specifically, the concrete `SchemaId` | ||
that was used to format the encapsulated data. |
I'm not sure I understand this correctly. Would you mind elaborating on this?
For the `stateful-storage` pallet, it would be necessary to re-write all storage pages to include the concrete | ||
`SchemaId` in the page header. How to accomplish this for ~1M user graphs and graph keys on-chain requires further | ||
analysis. | ||
|
I'm still not sure why we need to do this for graph schemas. How often do they really change?
It's not specifically graph data; that's merely the main data that currently uses that pallet. In future, I imagine other sorts of data will also use `stateful-storage`; the point is on-chain data is expensive to migrate, so minimizing the impact of minor storage format changes so that migrations aren't required seems like a valuable change to make early on.
- Intents must be approved by governance, and are immutable except for appending new SchemaIds to the version list | ||
- New Schemas must be approved by governance | ||
- Minor Schema updates (semantic-preserving format changes, etc) may be approved for the same Intent | ||
- Major updates (change in meaning or semantics, or significant breaking format change) require publishing under a new |
How is the "major updates" rule enforced, through governance? Are we assuming, for example, that the Council has the ability to reject proposals for a radically different schema version?
StorageMap<_, Twox64Concat, DelegationGroupId, DelegationGroup, OptionQuery>; | ||
``` | ||
|
||
#### Name Registry |
One note for discussion:
Instead of a monolithic name registry, we could define `Protocol` as a first-level storage entity with an ID and a name mapping name->ProtocolId; then define separate maps (ProtocolId, Descriptor) -> IntentId, and (ProtocolId, Descriptor) -> DelegationGroupId
Not sure what that buys us, but it's an option.
One thing it buys is smaller, cheaper extrinsic payloads when creating/updating Intents and DelegationGroups--instead of passing both payload and descriptor names, could pass just `ProtocolId` and descriptor name.
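The alternative floated in this thread (a `Protocol` entity with name->ProtocolId, plus separate (ProtocolId, Descriptor) maps) might look roughly like the sketch below. It uses plain `HashMap`s rather than pallet storage, and the `Registry` type, its method names, and the sequential id assignment are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the on-chain id types.
pub type ProtocolId = u16;
pub type IntentId = u16;
pub type DelegationGroupId = u16;

#[derive(Default)]
pub struct Registry {
    /// name -> ProtocolId
    protocols: HashMap<String, ProtocolId>,
    /// (ProtocolId, Descriptor) -> IntentId
    intents: HashMap<(ProtocolId, String), IntentId>,
    /// (ProtocolId, Descriptor) -> DelegationGroupId
    groups: HashMap<(ProtocolId, String), DelegationGroupId>,
}

impl Registry {
    /// Returns the existing id for `name`, or assigns the next one.
    /// Sequential assignment starting at 1 is an assumption.
    pub fn register_protocol(&mut self, name: &str) -> ProtocolId {
        let next = self.protocols.len() as ProtocolId + 1;
        *self.protocols.entry(name.to_string()).or_insert(next)
    }

    /// Callers pass the short `ProtocolId` instead of the protocol name.
    pub fn register_intent(&mut self, protocol: ProtocolId, descriptor: &str, id: IntentId) {
        self.intents.insert((protocol, descriptor.to_string()), id);
    }

    pub fn register_group(&mut self, protocol: ProtocolId, descriptor: &str, id: DelegationGroupId) {
        self.groups.insert((protocol, descriptor.to_string()), id);
    }

    /// Resolving a full name takes two lookups: the name, then the pair.
    pub fn resolve_intent(&self, protocol_name: &str, descriptor: &str) -> Option<IntentId> {
        let protocol = *self.protocols.get(protocol_name)?;
        self.intents.get(&(protocol, descriptor.to_string())).copied()
    }

    pub fn resolve_group(&self, protocol_name: &str, descriptor: &str) -> Option<DelegationGroupId> {
        let protocol = *self.protocols.get(protocol_name)?;
        self.groups.get(&(protocol, descriptor.to_string())).copied()
    }
}
```

The trade-off visible here is the one raised in the comments: registration calls carry a small id rather than a name, at the cost of an extra lookup when resolving by full name.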
structure that will require migration: | ||
|
||
* `payload_location` & `settings` will migrate to `IntentInfo` | ||
* addition of `intent_id` |
👍🏽
nit: a separate section for migrations would be useful. I'm assuming this is mostly migrating `SchemaInfo`, but is there anything on the delegation side?
Looks good!
nit: a section for rpcs or runtime api, if any, would be useful
Edit: noticed a separate section for runtime calls, looks good 👍🏽
This seems to cover all the bases well, looks good now!
1. For each `SchemaNamespace` '\<protocol_name>' | ||
1. For each `SchemaDescriptor` '\<name>' at index `n` belonging to a '\<protocol_name>' | ||
* Create a new name mapping in the `NameRegistry` as `<protocol_name>.<name>_n` to `Intent(id)` | ||
3. Store the `messages` pallet cutover block number |
Optionally could seed the chain with some pre-defined Delegation Groups as part of the migration.
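One possible reading of the name-mapping step in the migration excerpt above, sketched with in-memory maps. The assumptions are significant and labeled in the code: the excerpt is truncated, so treating `n` as the position of each existing schema under its name is a guess, and reusing the legacy numeric id as the new intent id is purely illustrative. `LegacyNames` and `migrate_names` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the on-chain id type.
pub type IntentId = u16;

/// Hypothetical snapshot of existing schema-name storage:
/// protocol name -> descriptor name -> ids registered under it, in order.
pub type LegacyNames = BTreeMap<String, BTreeMap<String, Vec<u16>>>;

/// Builds `<protocol_name>.<name>_n` -> intent id entries.
pub fn migrate_names(legacy: &LegacyNames) -> BTreeMap<String, IntentId> {
    let mut registry = BTreeMap::new();
    for (protocol_name, descriptors) in legacy {
        for (name, ids) in descriptors {
            // Assumption: `n` is the index of the entry under this name.
            for (n, id) in ids.iter().enumerate() {
                // Assumption: the new intent id reuses the legacy id.
                registry.insert(format!("{}.{}_{}", protocol_name, name, n), *id);
            }
        }
    }
    registry
}
```

Under this reading, every pre-existing schema ends up with its own uniquely named intent, which matches the earlier suggestion in the conversation to migrate current schemas to intents that contain a single schema.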
Description
In the current implementation, schemas are registered with immutable numeric identifiers (`SchemaId`) and describe the layout and storage semantics (e.g., Avro/Parquet formats, on-chain/off-chain storage). These schema IDs are used as references by clients and runtime modules alike, particularly in the delegation system defined by the `msa` pallet.

Delegations currently allow a user to authorize a provider (e.g., an app or service) to act on their behalf, but this authorization is tightly bound to a specific `SchemaId`. This model has proven limiting in several ways:

These limitations have motivated a re-architecture of the schema and delegation systems to introduce the concepts of:
Closes #2265