178 changes: 178 additions & 0 deletions text/0069-schema-libraries.md
@@ -0,0 +1,178 @@
# Schema libraries

## Related issues and PRs

- Reference Issues: Inspired by [RFC 58], but includes new material not present in RFC 58
- Implementation PR(s):

## Timeline

- Started: 2024-06-07

## Summary

Allow Cedar schemas to include/import "libraries" of definitions from remote
URLs.

This RFC does not propose that the Cedar team would build or maintain any such
libraries; it only proposes the mechanism for importing libraries from URLs.

## Basic example

Human schema format:
```
import "https://raw.githubusercontent.com/cedar-policy/cedar-examples/release/3.2.x/cedar-example-use-cases/document_cloud/document_cloud.cedarschema"
import "https://example.com/cedar_schemas/oidc.cedarschema"
import "https://example.com/cedar_schemas_json/foobar.cedarschema.json"

namespace "MyApp" {
...
}
```

JSON schema format:
```
{
"imports" : [
"https://raw.githubusercontent.com/cedar-policy/cedar-examples/release/3.2.x/cedar-example-use-cases/document_cloud/document_cloud.cedarschema",
"https://example.com/cedar_schemas/oidc.cedarschema",
"https://example.com/cedar_schemas_json/foobar.cedarschema.json"
],
"MyApp": {
...
}
}
```

## Motivation

Some data sources are common across many applications and useful to many Cedar
users, either within the same organization or even across organizations.

### 1. Within the same organization

This RFC would allow an organization to define its own libraries of schema
definitions, which could be reused across many different schemas (say, for
different webapps owned by the organization).
For instance, the organization might have common definitions of `User` or
`Account` that apply in many different applications, and although those
applications may not want to share entire Cedar schemas, with this RFC they
could share just the definitions of `User` or `Account`, which could be defined
once in a central location (in the same or separate libraries).

### 2. Across organizations

For another motivating example, consider identity providers (IdPs) which comply
with the OpenID Connect standard (OIDC).
The OIDC standard includes a list of attributes that exist on a user type; this
is naturally declared as a Cedar entity type.
With this RFC, anyone could provide a "library" representing Cedar definitions for
OIDC types, and provide that library as a Cedar schema file at some URL; and then
other Cedar users could use those definitions simply by importing the file from that URL.
This would allow the Cedar community to gradually coalesce on the "best" way to
represent an OIDC user in Cedar.

### Motivations common to both scenarios

> Comment:
>
> Could we also have a mechanism to use a particular namespace? For instance, say I'm building an app where everything is under the AWS namespace. It would be nice if I could just use `AWS::IdentityCenter` and then policy authors only need to specify the sub-entity, e.g. `User` vs `AWS::IdentityCenter::User`.

> Comment (Contributor Author):
>
> I can totally see this being useful, but I'm torn on whether it should be part of this RFC or a separate RFC.
>
> It would be harder to envision what a `use` would look like for policies (as opposed to schemas); I'm not sure users would be happy with a `use` mechanism for schemas but not for policies.

> Comment:
>
> Yeah, it's most useful for policies IMO. Perhaps a separate RFC is better.


In both of the above scenarios (within-organization and cross-organization),
we obtain three key benefits:
1. Saving each Cedar user the effort of writing common declarations themselves.
This facilitates code reuse in schemas, and makes it easier to get started
with Cedar.
2. Providing a way to define common types and actions in a centralized way,
which ensures many schemas agree on the "correct" or "best practices"
definitions, and provides a single place to make updates if updates are
required.
3. Facilitating code reuse for Cedar authorization calls, not just schemas.
When everyone shares a common definition of `OIDC::User`, the community could
conceivably converge on a reusable library function for, e.g., converting an
OIDC token into Cedar entity data.
This would further make it easier to get started with Cedar.
(Note that this RFC does not propose the Cedar team writing or maintaining
either library definitions for use in schemas or library functions for use in
Cedar authorization calls. It only points out that the community could
converge on these things.)

## Detailed design

Import statements are only allowed at the top level, outside of all namespace
declarations.
(In the future, another RFC could propose allowing imports in other positions.)

The target of the import must be a raw file containing a valid Cedar schema.

> Comment:
>
> Should they instead be a namespace? And then we leave providing the schemas from a location up to an upstream process?

> Comment:
>
> Sort of like how Smithy does it: https://smithy.io/1.0/spec/core/idl.html

> Comment (Contributor Author):
>
> > Should they instead be a namespace?
>
> I might be misunderstanding something -- this RFC as written allows importing items in a namespace; in fact, all declared items in the library (import target) must be in a (possibly empty) namespace, as otherwise it wouldn't be a valid Cedar schema.
>
> You might be looking for a mechanism for "opening" a namespace, i.e., to use items from a namespace unqualified in the rest of the schema/policies? (Akin to Rust's `use`.) I think it makes sense to separate that into a different RFC; this one is about providing definitions from reusable libraries, while another one could be about making namespaced definitions usable without qualification.
>
> (I realize that Java's `import`, and maybe imports in other languages, provide both functionalities -- introducing the library definitions and also making them available for use unqualified. The `import` being proposed here provides only the first functionality. For that reason maybe we should consider a different keyword to avoid confusion?)

> Comment:
>
> I was thinking about how I would use this for a specific use-case I have now. But after thinking a bit more, I think the answer is: I wouldn't. In my case, I want to define my schema dependencies in "packages" and then use a package/dependency manager tool to declare my dependencies and deal with retrieving the artifacts. All I really need from the Cedar library is the ability to load a schema from a set of files. (You can already mostly do this, but there are some sharp edges.) I do think the "opening" a namespace feature would be useful, but I agree that's a separate RFC.
>
> But that raises the question: do we want "import" in the schema spec at all? Are we bringing unnecessary complexity into Cedar that is best left to some upstream system more suited to deal with it (e.g. a package manager)? Thoughts?

> Comment (Contributor Author):
>
> > you can already mostly do this but there are some sharp edges
>
> Can you talk more about the sharp edges? We'd like to track these as issues :)
>
> > Are we bringing unnecessary complexity into Cedar that is best left to some upstream system more suited to deal with it (e.g. a package manager)?
>
> Valid point; this kind of question was one reason for posting the RFC. Interested to see if a majority of folks feel this way, or if there are folks who feel that `import` is solving a problem for them. Probably this will become clear as this RFC generates more feedback.

> Comment:
>
> The biggest sharp edge was that I wanted to load each schema file as a `SchemaFragment` and then produce the final schema and validate. However, there were cases where it would fail to validate the fragment since it referenced things in other fragments. So I ended up having to convert all the schemas to the human-readable format and append them together, then load and validate that.

> Comment (Contributor Author):
>
> Interesting. In my understanding, referencing things in other fragments should "just work", and if it doesn't, that is a bug. If you have any specific examples, it would be great if we could get reproducers so we can fix the bugs.

This schema may contain any definitions that are valid today in Cedar schemas,
including namespaces, entity type definitions, common type definitions, and
action declarations.

> Comment (Contributor):
>
> Since the actions in a schema can impact authorization, a schema library adding or removing an action from a group could result in unexpected authorization decisions.

> Comment (Contributor Author):
>
> Would you propose that libraries cannot include action declarations, and can only include entity type definitions and common type definitions?

> Comment (Contributor):
>
> I don't think that's worth it. Presumably, if you're using a schema library, you're consuming entity data from the IdP as well. The issue is just that you might not expect a schema to impact authorization. Sub-resource integrity plus documentation should mitigate this concern.

The validator will essentially concatenate all of these definitions into the
schema at the location of the `import` statement.
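
As a rough illustration of these concatenation semantics, the sketch below expands imports in a human-format schema by textual substitution. It is not the proposed implementation: `parse_import`, `expand_imports`, and the `fetched` map (standing in for whatever actually downloads the URLs) are hypothetical names, and the sketch assumes every library is itself in the human format and contains no further imports.

```rust
use std::collections::HashMap;

/// Returns the URL if `line` has the form `import "<url>"` (hypothetical helper).
fn parse_import(line: &str) -> Option<&str> {
    let rest = line.trim().strip_prefix("import")?;
    // Require whitespace between the keyword and the quoted URL.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    rest.trim().strip_prefix('"')?.strip_suffix('"')
}

/// Replaces each import line with the library text recorded for its URL.
/// `fetched` is an assumption: a map from URL to already-downloaded contents.
fn expand_imports(schema: &str, fetched: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    for line in schema.lines() {
        match parse_import(line) {
            Some(url) => {
                let body = fetched
                    .get(url)
                    .ok_or_else(|| format!("unresolved import: {url}"))?;
                // Splice the library in at the location of the import statement.
                out.push_str(body.trim_end());
            }
            None => out.push_str(line),
        }
        out.push('\n');
    }
    Ok(out)
}
```

A utility along these lines is also roughly what the "display the schema with all imports expanded" mitigation in the Drawbacks section would need.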

Cedar will autodetect whether the imported schema is a human-format or
JSON-format schema.
(Today, there are no strings that are both valid human-format and valid
JSON-format schemas; this RFC proposes encoding that as a design principle in
perpetuity.)
In particular, all valid JSON-format schemas must have `{` as their first
non-whitespace character, and no valid human-format schemas have `{` as their
first non-whitespace character.
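
Under that design principle, autodetection reduces to inspecting the first non-whitespace character. A minimal sketch, where `SchemaFormat` and `detect_format` are illustrative names rather than existing Cedar APIs:

```rust
/// The two schema formats an import target may use (illustrative type).
#[derive(Debug, PartialEq)]
enum SchemaFormat {
    Human,
    Json,
}

/// Classifies schema text by its first non-whitespace character:
/// `{` means JSON format, anything else (including empty input) means human format.
fn detect_format(text: &str) -> SchemaFormat {
    match text.trim_start().chars().next() {
        Some('{') => SchemaFormat::Json,
        _ => SchemaFormat::Human,
    }
}
```

This check only selects which parser to run; whether the text is actually a valid schema in that format is still decided by the parser.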

This RFC does not propose any mechanism for versioning libraries.
Instead, it proposes that versioning would be done _above_ the Cedar layer,
i.e., should be the responsibility of library authors.
For instance, library authors could provide a different URL for different
versions of their library, avoiding changing the contents of the URL for the

> Comment (Contributor):
>
> We should provide something akin to sub resource integrity so users can guard against a changing import.

> Comment (Contributor Author):
>
> Bringing here the summary of an offline conversation -- it makes a lot of sense to have some feature like this, but since including hashes directly in the schema file is a little ugly, maybe we'd want a separate lockfile, and at that point we might want to consider some separate `Cedar.toml` file (analogous to `Cargo.toml`) where you could declare all your dependencies, and possibly other global configuration. Does the long-term path see Cedar having something more and more equivalent to cargo, complete with dependency/lockfile manager, package repository, etc?

existing versions of the library.
This RFC doesn't preclude later adding a versioning feature, in which case the
syntax proposed in this RFC would be interpreted as "import the latest version
of this library".

In the JSON format, we do not need to reserve the namespace named `"imports"`:
if `"imports"` maps to a JSON object, it represents the namespace `"imports"`,
while if `"imports"` maps to a JSON array, it represents import statements as
defined in this RFC.
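
The disambiguation can be sketched as a match on the shape of the JSON value. The `Json` enum here is a minimal stand-in for a real JSON library's value type, and `ImportsKey` and `classify_imports_key` are hypothetical names. Treating non-string array elements (or any other value shape) as an error is an assumption, since this RFC does not specify it.

```rust
/// Minimal stand-in for a parsed JSON value (a real tool would use a JSON library).
enum Json {
    Object(Vec<(String, Json)>),
    Array(Vec<Json>),
    Str(String),
}

/// How a top-level `"imports"` key is interpreted (illustrative type).
#[derive(Debug, PartialEq)]
enum ImportsKey {
    /// `"imports"` maps to an object: an ordinary namespace named `imports`.
    Namespace,
    /// `"imports"` maps to an array: a list of import URLs.
    ImportList(Vec<String>),
    /// Any other shape; rejecting it is an assumption of this sketch.
    Invalid,
}

fn classify_imports_key(value: &Json) -> ImportsKey {
    match value {
        Json::Object(_) => ImportsKey::Namespace,
        Json::Array(items) => {
            let mut urls = Vec::new();
            for item in items {
                match item {
                    Json::Str(url) => urls.push(url.clone()),
                    _ => return ImportsKey::Invalid,
                }
            }
            ImportsKey::ImportList(urls)
        }
        Json::Str(_) => ImportsKey::Invalid,
    }
}
```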

## Drawbacks

1. The Cedar validator, and other tools that rely on schemas, will have to make
network calls in order to perform their jobs. This has availability and latency
implications which may not be acceptable for some users. Of course, those users
could simply not use this feature.
2. Cedar schemas would no longer be self-contained: today, a single (hopefully
readable) file contains all of the relevant definitions. To mitigate this, we
could provide a utility that displays the schema with all imports expanded.
3. The Cedar Rust code would have to bring in substantial new dependencies, so
that it could download libraries from remote URLs. To mitigate this for users
who are concerned about this and don't need/want this feature (e.g., in
resource-constrained environments, offline environments, Wasm, etc), we could
put this RFC's functionality behind a Cargo feature, so that it and its

> Comment (Contributor, @john-h-kastner-aws, Jun 10, 2024):
>
> Possibly local file system dependency enabled by default, with network dependencies behind a flag? Or maybe we could consider not including network dependencies at all if we think this is a large concern. Though that option would doubtlessly lead us to wanting to build a separate dependency management tool.

> Comment (Contributor Author):
>
> Your comment makes me realize I should definitely explicitly call out allowing local file system URIs in addition to URLs, which is not currently made explicit in the RFC. I'll amend.

> Comment:
>
> Take a look at what IonSchema does with their "SchemaAuthority": https://amazon-ion.github.io/ion-schema/docs/isl-2-0/spec. Essentially decoupling the loading of dependencies from their declaration.
>
> But I'm wondering if we really need/want to bring this into Cedar. (See my other comment.)

dependencies could be opted-into / opted-out-of at compile time. (This RFC
proposes it would be enabled by default, but the Cargo feature would allow users
to compile-time disable it.)
4. Implementation complexity for the Cedar validator and other tools that rely
on schemas.

This is not a breaking change for any existing Cedar users.
All existing valid Cedar schemas remain valid.

## Alternatives

### Alternative A: Distribute libraries without the `import` mechanism

Cedar already supports schemas spread over multiple files, through APIs like
[`Schema::from_schema_fragments()`].
So, users could reasonably easily distribute and use libraries today, without
any `import` mechanism.
When calling Cedar APIs, they would provide library schema files in addition to
the rest of their schema.

### Alternative B: Explicit declaration of human/JSON format, not autodetection

Instead of the autodetection mechanism described above, we could require schema
authors to explicitly indicate whether they are importing a human-format or
JSON-format library.
For instance, in the human schema format, this could look like
```import [json] "https://..."```
(where the absence of `[json]` would indicate the human format, since Cedar
positions that as the default format).

[RFC 58]: https://github.com/cedar-policy/rfcs/pull/58
[`Schema::from_schema_fragments()`]: https://docs.rs/cedar-policy/latest/cedar_policy/struct.Schema.html#method.from_schema_fragments