-
Notifications
You must be signed in to change notification settings - Fork 713
[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds an optional subject query parameter to the GET /schemas/ids/{id} endpoint to enable context-aware schema lookups. When provided, the subject name (which can include context in the format :.context:subject) is used to determine the context for retrieving the schema, rather than always using the default context.
Changes:
- Modified the endpoint handler to parse and use the optional
subjectparameter to extract context information - Updated the Python test client to support passing the
subjectparameter - Added comprehensive test coverage verifying the new parameter works correctly with default and non-default contexts
- Updated API documentation to describe the new query parameter
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/v/pandaproxy/schema_registry/handlers.cc | Added logic to parse the optional subject query parameter, extract context from it, and use the context when looking up schemas by ID |
| tests/rptest/tests/schema_registry_test.py | Updated test client method to accept subject parameter and added comprehensive test case covering various scenarios |
| src/v/pandaproxy/api/api-doc/schema_registry.json | Added documentation for the new subject query parameter |
6118d1f to
fa5356c
Compare
|
Force push to rebase on latest dev. |
CI test resultstest results on build#79806
|
| auto subject_param = parse::query_param<std::optional<ss::sstring>>( | ||
| *rq.req, "subject"); | ||
|
|
||
| // Extract context from subject, or use default context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the behaviour is a bit trickier here unfortunately:
# Register the same schema in two contexts
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.prod:Ad
dress/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.shared:Address/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%
# Query the schemas
% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:"
{"subject":":.shared:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688260232,"deleted":false}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%
% curl "http://localhost:8081/schemas/ids/1?subject=Address"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.:"
{"error_code":40403,"message":"Schema 1 not found"}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%I think we might need to treat the subject parameter differently depending on whether it contains only a context (empty subject) or a real subject.
To be honest, we might be able to get away with partial support of the parameter, by throwing if it contains a real subject, and not just a context. But if we can implement it fully, that would be great.
| "required": false, | ||
| "type": "string", | ||
| "description": "Redpanda version 25.2 or later. For Avro and Protobuf schemas only. Supported values: an empty string `''` returns the schema in its current format (default), and `serialized` (Protobuf only) returns the schema in its Base64-encoded wire binary format. Unsupported values return a 501 error." | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have been clearer in the ticket, but can you please also implement this for GET /schemas/ids/{id}/versions and GET /schemas/ids/{id}/schema too, in addition to GET /schemas/ids/{id}?
3a8f4d2 to
b6d4e37
Compare
| // The schema ID is not associated with the given subject in the | ||
| // given context. | ||
| schema_subjects = {}; | ||
| co_return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make sure we AuthZ even if there is no match found. Perhaps we could just return the schema_subjects from this function and do the authZ one function call up.
| std::optional<request_auth_result>& auth_result, | ||
| schema_id id, | ||
| context_subject ctx_sub) { | ||
| const context& ctx = ctx_sub.ctx().empty() ? default_context : ctx_sub.ctx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this bit? How could we get here with ctx_sub.ctx().empty()? I think we should make sure that ctx_sub.ctx() is valid (e.g. the default_context) further up the chain, instead of special casing for an invalid context down here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea it's not really a case that would happen in the current flow, but I had it for general defensive programming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually try to avoid this pattern in internal functions. IMO, defensive checks are most useful at system boundaries (for example when validating raw user input). Internally, I think it’s better to rely on the guarantees of our types and invariants, otherwise we risk misleading future readers about what inputs are actually expected or valid.
Update context_subject::from_string to handle context-only strings.
Rename is_default_context() to is_default_context_only() to better reflect its behavior: it returns true only when the context is the default context AND the subject is empty (context-only). Add a new method is_non_default_context() that checks if a subject is in a non-default context. This will be used in future changes to handle subject query parameters.
Add an optional `subject` query parameter to control schema lookup context and subject restriction. The parameter value is parsed using context_subject::from_string(), which extracts a context substring and a subject substring from the input. Lookup behavior: - No parameter: search default context without subject restriction (existing behavior) - Context only (e.g., ":.ctx:"): search the specified context without subject restriction - Qualified (e.g., ":.ctx:sub"): search the specified context, restricted to the subject substring - Unqualified (e.g., "sub" or ":.:sub"): search the default context restricted to the subject substring; if not found, search all other contexts; if still not found, fall back to the default context without subject restriction A subject parameter is "unqualified" if it resolves to the default context, either implicitly (no context substring) or explicitly (context substring is ".").
Add test for the `subject` query parameter on GET /schemas/ids/{id},
verifying that it correctly extracts context for schema lookup.
Also update the test client to support the new parameter.
b6d4e37 to
26fe7f5
Compare
|
Force push to address PR comments. |
pgellert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the AuthZ logic is not quite correct yet. I think the simplest behaviour we could implement here that is consistent with the earlier "allow the lookup if we have access through any subject" behaviour is that resolve_schema_across_contexts could look up both the schema definition as well as the list of subjects that would provide access to the schema, and then call the AuthZ handler only once from the handler for the full list of subjects that would provide access.
| // Ensure requester is authorized to access at least one of the subjects | ||
| // associated with the schema ID in the given context. | ||
| enterprise::handle_get_schemas_ids_id_authz( | ||
| rq, auth_result, schema_subjects); | ||
|
|
||
| if (schema_subjects.empty()) { | ||
| // The schema ID is not associated with any subject that the requester | ||
| // is authorized to access. | ||
| co_return std::nullopt; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle_get_schemas_ids_id_authz throws if either schema_subjects is empty as an input, or if all the subjects in schema_subjects get filtered out. So the L189-190 is not quite true. It rather corresponds to the case when AuthZ is disabled and we didn't find any matching subjects to look up the schema id under.
|
|
||
| // Ensure requester is authorized to access at least one of the subjects | ||
| // associated with the schema ID in the given context. | ||
| enterprise::handle_get_schemas_ids_id_authz( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current logic will fail when:
- I have access to all subjects
- There are two contexts, ctx1 and ctx2; with ctx1 empty and ctx2 has a single subject, sub2 with version 1 having schema id 1
- I call
/schemas/ids/1?subjects=sub1 - This will fail on AuthZ because the first
try_get_schema_definitioncall will be in the default context under:.:sub1, which doesn't exist.
That seems incorrect because I should be able to look up the schema under :ctx2:sub1 given that I have access to all subjects.
| // Parse optional subject query parameter to extract context | ||
| auto subject_param = parse::query_param<std::optional<ss::sstring>>( | ||
| *rq.req, "subject") | ||
| .value_or(""); | ||
|
|
||
| // With deferred schema validation, there might be a schema that | ||
| // had invalid references. These might have already been posted, so | ||
| // we need to sync | ||
| co_await rq.service().writer().read_sync(); | ||
| auto ctx_sub = context_subject::from_string(subject_param); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the code would be clearer if we did not use value_or this early, but instead had ctx_sub as a std::optional<context_subject>.
Add an optional
subjectquery parameter to theGET /schemas/ids/{id}endpoint. This allows specifying the context for schema lookup by extracting the context from the provided subject name (e.g.,:.myctx:mysubject).Fixes CORE-15194
Backports Required
Release Notes