[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451

nguyen-andrew · 2026-01-28T22:53:09Z

Add an optional subject query parameter to the GET /schemas/ids/{id} endpoint. This allows specifying the context for schema lookup by extracting the context from the provided subject name (e.g., :.myctx:mysubject).

Fixes CORE-15194

Backports Required

Release Notes

none

Copilot

Pull request overview

This PR adds an optional subject query parameter to the GET /schemas/ids/{id} endpoint to enable context-aware schema lookups. When provided, the subject name (which can include context in the format :.context:subject) is used to determine the context for retrieving the schema, rather than always using the default context.

Changes:

- Modified the endpoint handler to parse and use the optional subject parameter to extract context information
- Updated the Python test client to support passing the subject parameter
- Added comprehensive test coverage verifying the new parameter works correctly with default and non-default contexts
- Updated API documentation to describe the new query parameter

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/v/pandaproxy/schema_registry/handlers.cc | Added logic to parse the optional `subject` query parameter, extract context from it, and use the context when looking up schemas by ID |
| tests/rptest/tests/schema_registry_test.py | Updated test client method to accept `subject` parameter and added comprehensive test case covering various scenarios |
| src/v/pandaproxy/api/api-doc/schema_registry.json | Added documentation for the new `subject` query parameter |

nguyen-andrew · 2026-01-28T23:22:36Z

Force push to rebase on latest dev.

vbotbuildovich · 2026-01-29T00:49:32Z

CI test results

test results on build#79806

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ScalingUpTest | test_fast_node_addition | null | integration | https://buildkite.com/redpanda/redpanda/builds/79806#019c06fb-10a4-47cf-a3ce-73980f76c3be | FLAKY | 19/21 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0206, p0=0.3402, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3917, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition |

pgellert · 2026-01-29T12:15:43Z

src/v/pandaproxy/schema_registry/handlers.cc

+    auto subject_param = parse::query_param<std::optional<ss::sstring>>(
+      *rq.req, "subject");
+
+    // Extract context from subject, or use default context


I think the behaviour is a bit trickier here unfortunately:

# Register the same schema in two contexts
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.prod:Address/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.shared:Address/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}

# Query the schemas
% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:"
{"subject":":.shared:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688260232,"deleted":false}
% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}
% curl "http://localhost:8081/schemas/ids/1?subject=Address"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}
% curl "http://localhost:8081/schemas/ids/1?subject=:.:"
{"error_code":40403,"message":"Schema 1 not found"}
% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}
% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}

I think we might need to treat the subject parameter differently depending on whether it contains only a context (empty subject) or a real subject.

To be honest, we might be able to get away with partial support of the parameter, by throwing if it contains a real subject, and not just a context. But if we can implement it fully, that would be great.
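A minimal Python sketch of the "partial support" option mentioned above: accept only context-only values such as `:.shared:` and reject anything that carries a real subject. The function name, the exception type, and the treatment of an empty value as the default context are assumptions made for illustration; they are not taken from the PR.

```python
from typing import Optional

DEFAULT_CONTEXT = "."  # assumption: the default context is spelled "."


class UnsupportedSubjectParam(ValueError):
    """Hypothetical error for values that name a real subject."""


def context_from_param(raw: Optional[str]) -> str:
    """Return the context named by a context-only `subject` value."""
    if not raw:
        # Missing or empty parameter: keep the existing default-context lookup.
        return DEFAULT_CONTEXT
    # A context-only value looks like ":.ctx:" (exactly two colons, empty subject).
    if raw.startswith(":.") and raw.endswith(":") and raw.count(":") == 2:
        return raw[1:-1]
    raise UnsupportedSubjectParam(f"subject must be context-only, got {raw!r}")
```

With this shape, `:.prod:` and `:.:` select a context, while `Address` or `:.prod:Address` are rejected up front instead of being silently ignored.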

pgellert · 2026-01-29T12:16:30Z

src/v/pandaproxy/api/api-doc/schema_registry.json

            "required": false,
            "type": "string",
            "description": "Redpanda version 25.2 or later. For Avro and Protobuf schemas only. Supported values: an empty string `''` returns the schema in its current format (default), and `serialized` (Protobuf only) returns the schema in its Base64-encoded wire binary format. Unsupported values return a 501 error."
+          },


Sorry, I should have been clearer in the ticket, but can you please also implement this for GET /schemas/ids/{id}/versions and GET /schemas/ids/{id}/schema too, in addition to GET /schemas/ids/{id}?

nguyen-andrew · 2026-02-02T04:28:02Z

Force pushes:

src/v/pandaproxy/schema_registry/types.cc

src/v/pandaproxy/schema_registry/types.h

src/v/pandaproxy/schema_registry/handlers.cc

pgellert · 2026-02-02T10:21:00Z

src/v/pandaproxy/schema_registry/handlers.cc

+            // The schema ID is not associated with the given subject in the
+            // given context.
+            schema_subjects = {};
+            co_return std::nullopt;


We need to make sure we run AuthZ even when no match is found. Perhaps this function could just return schema_subjects, and we could do the AuthZ check one function call up.

pgellert · 2026-02-02T10:25:25Z

src/v/pandaproxy/schema_registry/handlers.cc

+  std::optional<request_auth_result>& auth_result,
+  schema_id id,
+  context_subject ctx_sub) {
+    const context& ctx = ctx_sub.ctx().empty() ? default_context : ctx_sub.ctx;


Why do we need this bit? How could we get here with ctx_sub.ctx().empty()? I think we should make sure that ctx_sub.ctx() is valid (e.g. the default_context) further up the chain, instead of special casing for an invalid context down here.

Yeah, it's not really a case that would happen in the current flow, but I had it there as general defensive programming.

I usually try to avoid this pattern in internal functions. IMO, defensive checks are most useful at system boundaries (for example when validating raw user input). Internally, I think it’s better to rely on the guarantees of our types and invariants, otherwise we risk misleading future readers about what inputs are actually expected or valid.
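The point about validating at the boundary and trusting invariants internally can be sketched in Python as follows. `Context`, `context_from_request`, and `lookup_key` are hypothetical names chosen for this example, and the real C++ types may enforce their invariants differently.

```python
from dataclasses import dataclass

DEFAULT_CONTEXT_NAME = "."  # assumption: the default context is spelled "."


@dataclass(frozen=True)
class Context:
    """A context whose name is guaranteed non-empty once constructed."""

    name: str

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("context name must not be empty")


def context_from_request(raw: str) -> Context:
    """System boundary: map empty user input to the default context."""
    return Context(raw or DEFAULT_CONTEXT_NAME)


def lookup_key(ctx: Context, schema_id: int) -> str:
    """Internal helper: relies on the Context invariant, with no re-check."""
    return f"{ctx.name}/{schema_id}"
```

Because an empty `Context` cannot exist, an internal function that special-cased it would only suggest to readers that such a value is a valid input.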

tests/rptest/tests/schema_registry_test.py

src/v/pandaproxy/schema_registry/handlers.cc

src/v/pandaproxy/api/api-doc/schema_registry.json

Update context_subject::from_string to handle context-only strings.
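As an illustration of the parsing this commit describes, here is a small Python stand-in for `context_subject::from_string`. It assumes the `:.ctx:subject` syntax used elsewhere in this PR and that the context substring keeps its leading dot; the actual C++ implementation may handle edge cases differently.

```python
from typing import NamedTuple

DEFAULT_CONTEXT = "."  # assumption: the default context is spelled "."


class ContextSubject(NamedTuple):
    ctx: str
    sub: str


def parse_context_subject(raw: str) -> ContextSubject:
    """Split ':.ctx:subject' into its context and subject parts."""
    if raw.startswith(":."):
        end = raw.find(":", 2)  # colon that closes the context prefix
        if end != -1:
            return ContextSubject(ctx=raw[1:end], sub=raw[end + 1:])
    # No context prefix: the whole string is a subject in the default context.
    return ContextSubject(ctx=DEFAULT_CONTEXT, sub=raw)
```

Under these assumptions a context-only string such as `:.shared:` yields an empty subject, which is the case the commit adds support for.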

Rename is_default_context() to is_default_context_only() to better reflect its behavior: it returns true only when the context is the default context AND the subject is empty (context-only). Add a new method is_non_default_context() that checks if a subject is in a non-default context. This will be used in future changes to handle subject query parameters.
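The two predicates can be pictured with the following Python sketch. It mirrors only the behavior stated in the commit message; the dataclass itself is a hypothetical stand-in for the C++ `context_subject` type.

```python
from dataclasses import dataclass

DEFAULT_CONTEXT = "."  # assumption: the default context is spelled "."


@dataclass(frozen=True)
class ContextSubject:
    ctx: str
    sub: str

    def is_default_context_only(self) -> bool:
        """True only for the default context with an empty subject."""
        return self.ctx == DEFAULT_CONTEXT and self.sub == ""

    def is_non_default_context(self) -> bool:
        """True when the subject lives in any context other than the default."""
        return self.ctx != DEFAULT_CONTEXT
```

Note that a subject in the default context, such as `(".", "sub")`, satisfies neither predicate, which is why the rename matters.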

Add an optional `subject` query parameter to control schema lookup context and subject restriction. The parameter value is parsed using context_subject::from_string(), which extracts a context substring and a subject substring from the input.

Lookup behavior:
- No parameter: search default context without subject restriction (existing behavior)
- Context only (e.g., ":.ctx:"): search the specified context without subject restriction
- Qualified (e.g., ":.ctx:sub"): search the specified context, restricted to the subject substring
- Unqualified (e.g., "sub" or ":.:sub"): search the default context restricted to the subject substring; if not found, search all other contexts; if still not found, fall back to the default context without subject restriction

A subject parameter is "unqualified" if it resolves to the default context, either implicitly (no context substring) or explicitly (context substring is ".").
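The lookup rules in this commit message can be modeled over an in-memory store, as in the Python sketch below. The store layout, the function names, and the iteration order over "other contexts" are assumptions; the sketch also assumes that the cross-context search stays restricted to the same subject name, which the commit message does not state explicitly.

```python
from typing import Dict, List, Optional, Set, Tuple

DEFAULT = "."  # assumption: the default context is spelled "."
Store = Dict[str, Dict[str, Set[int]]]  # context -> subject -> schema ids


def split(raw: str) -> Tuple[str, str]:
    """Split ':.ctx:sub' into (ctx, sub); plain names use the default context."""
    if raw.startswith(":."):
        end = raw.find(":", 2)
        if end != -1:
            return raw[1:end], raw[end + 1:]
    return DEFAULT, raw


def find(store: Store, ctx: str, schema_id: int,
         sub: Optional[str] = None) -> List[str]:
    """Subjects in ctx that reference schema_id, optionally limited to sub."""
    return sorted(s for s, ids in store.get(ctx, {}).items()
                  if schema_id in ids and (sub is None or s == sub))


def resolve(store: Store, schema_id: int,
            raw: Optional[str] = None) -> Optional[Tuple[str, List[str]]]:
    """Return (context, matching subjects), or None when nothing matches."""
    ctx, sub = (DEFAULT, "") if raw is None else split(raw)
    if not sub:  # no parameter, or context only
        hits = find(store, ctx, schema_id)
        return (ctx, hits) if hits else None
    hits = find(store, ctx, schema_id, sub)
    if hits:
        return ctx, hits
    if ctx != DEFAULT:  # qualified: no fallback
        return None
    for other in sorted(c for c in store if c != DEFAULT):
        hits = find(store, other, schema_id, sub)
        if hits:
            return other, hits
    hits = find(store, DEFAULT, schema_id)  # final unrestricted fallback
    return (DEFAULT, hits) if hits else None
```

With `Address` registered under id 1 in both `.prod` and `.shared`, this model returns `.prod` for `subject=Address` and nothing for `subject=:.:`, in line with the curl session earlier in the review.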

…{id}

Add test for the `subject` query parameter on GET /schemas/ids/{id}, verifying that it correctly extracts context for schema lookup. Also update the test client to support the new parameter.
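For reference, a test client could build the request URL roughly as follows. This is a standalone sketch using only the Python standard library; `schema_by_id_url` is a hypothetical helper and not the method in schema_registry_test.py.

```python
from typing import Optional
from urllib.parse import urlencode


def schema_by_id_url(base_url: str, schema_id: int,
                     subject: Optional[str] = None) -> str:
    """Build the GET /schemas/ids/{id} URL, adding `subject` only when given."""
    url = f"{base_url}/schemas/ids/{schema_id}"
    if subject is not None:
        # urlencode percent-encodes the colons in ':.ctx:subject'.
        url += "?" + urlencode({"subject": subject})
    return url
```

Leaving the parameter out entirely when `subject` is `None` keeps the existing default-context request unchanged.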

nguyen-andrew · 2026-02-02T20:31:44Z

Force push to address PR comments.

pgellert

I think the AuthZ logic is not quite correct yet. I think the simplest behaviour we could implement here that is consistent with the earlier "allow the lookup if we have access through any subject" behaviour is that resolve_schema_across_contexts could look up both the schema definition as well as the list of subjects that would provide access to the schema, and then call the AuthZ handler only once from the handler for the full list of subjects that would provide access.
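One way to picture the suggested shape is the Python sketch below: the resolver returns both the definition and the subjects that grant access, and the handler calls AuthZ exactly once, before deciding whether to report "not found". The callables and exception types here are hypothetical stand-ins for resolve_schema_across_contexts and the enterprise AuthZ handler.

```python
from typing import Callable, List, Optional, Tuple


class NotFound(Exception):
    """Stand-in for the 40403 'Schema not found' response."""


def handle_get_schema_by_id(
    resolve: Callable[[int], Tuple[Optional[str], List[str]]],
    authorize: Callable[[List[str]], None],
    schema_id: int,
) -> str:
    """Resolve, authorize once on the full subject list, then answer."""
    definition, subjects = resolve(schema_id)
    # AuthZ runs even when nothing matched, so an unauthorized caller
    # cannot tell a missing schema apart from a forbidden one.
    authorize(subjects)
    if definition is None:
        raise NotFound(f"Schema {schema_id} not found")
    return definition
```

Keeping the AuthZ call in the handler means the lookup function has no early return that could skip it.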

pgellert · 2026-02-03T13:36:32Z

src/v/pandaproxy/schema_registry/handlers.cc

+    // Ensure requester is authorized to access at least one of the subjects
+    // associated with the schema ID in the given context.
+    enterprise::handle_get_schemas_ids_id_authz(
+      rq, auth_result, schema_subjects);
+
+    if (schema_subjects.empty()) {
+        // The schema ID is not associated with any subject that the requester
+        // is authorized to access.
+        co_return std::nullopt;
+    }


handle_get_schemas_ids_id_authz throws if schema_subjects is empty on input, or if all the subjects in schema_subjects get filtered out. So the comment at L189-190 is not quite accurate: the empty case instead corresponds to AuthZ being disabled and no matching subjects being found to look up the schema id under.
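The semantics described in this comment can be modeled in a few lines of Python. This is only a model built from the comment: the function name, the `authz_enabled` flag, and returning a filtered list (rather than filtering in place) are assumptions.

```python
from typing import Iterable, List, Set


class Forbidden(Exception):
    """Stand-in for the error the real AuthZ handler raises."""


def authorize_subjects(subjects: Iterable[str], allowed: Set[str],
                       authz_enabled: bool = True) -> List[str]:
    """Keep the subjects the requester may access; raise if none remain."""
    candidates = list(subjects)
    if not authz_enabled:
        # With AuthZ disabled nothing is filtered, so an empty result here
        # can only mean that the lookup found no matching subjects.
        return candidates
    permitted = [s for s in candidates if s in allowed]
    if not permitted:
        # Covers both an empty input and a fully filtered list.
        raise Forbidden("no accessible subject for this schema id")
    return permitted
```

In this model, an empty list after the call is reachable only through the `authz_enabled=False` branch, which is the case the comment points out.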

pgellert · 2026-02-03T13:44:28Z

src/v/pandaproxy/schema_registry/handlers.cc

+
+    // Ensure requester is authorized to access at least one of the subjects
+    // associated with the schema ID in the given context.
+    enterprise::handle_get_schemas_ids_id_authz(


I think the current logic will fail when:

- I have access to all subjects.
- There are two contexts, ctx1 and ctx2: ctx1 is empty, and ctx2 has a single subject, sub2, whose version 1 has schema id 1.
- I call /schemas/ids/1?subject=sub1.

This will fail on AuthZ because the first try_get_schema_definition call will be in the default context under :.:sub1, which doesn't exist.

That seems incorrect because I should be able to look up the schema under :ctx2:sub1 given that I have access to all subjects.

pgellert · 2026-02-03T13:56:16Z

src/v/pandaproxy/schema_registry/handlers.cc

+    // Parse optional subject query parameter to extract context
+    auto subject_param = parse::query_param<std::optional<ss::sstring>>(
+                           *rq.req, "subject")
+                           .value_or("");

-    // With deferred schema validation, there might be a schema that
-    // had invalid references. These might have already been posted, so
-    // we need to sync
-    co_await rq.service().writer().read_sync();
+    auto ctx_sub = context_subject::from_string(subject_param);


I'm wondering if the code would be clearer if we did not use value_or this early, but instead had ctx_sub as a std::optional<context_subject>.
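The distinction this suggestion preserves, an absent parameter versus an empty one, can be shown with a short Python sketch. `subject_param` is a hypothetical helper built on the standard library's `parse_qs`; it is not the C++ `parse::query_param` API.

```python
from typing import Optional
from urllib.parse import parse_qs


def subject_param(query_string: str) -> Optional[str]:
    """Return the `subject` value, or None when the parameter is absent."""
    # keep_blank_values=True keeps 'subject=' as an empty string
    # instead of dropping it from the result.
    values = parse_qs(query_string, keep_blank_values=True).get("subject")
    return values[0] if values else None
```

Collapsing `None` to `""` early would make "no parameter" and "empty parameter" indistinguishable to every later branch, whereas keeping the optional lets the handler name the "no parameter" case explicitly.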

nguyen-andrew requested a review from pgellert January 28, 2026 22:53

nguyen-andrew self-assigned this Jan 28, 2026

Copilot AI review requested due to automatic review settings January 28, 2026 22:53

nguyen-andrew requested a review from a team as a code owner January 28, 2026 22:53

github-actions bot added the area/redpanda label Jan 28, 2026

Copilot AI reviewed Jan 28, 2026


nguyen-andrew force-pushed the sr/subject-query-param branch from 6118d1f to fa5356c Compare January 28, 2026 23:22

pgellert reviewed Jan 29, 2026


nguyen-andrew force-pushed the sr/subject-query-param branch 4 times, most recently from 3a8f4d2 to b6d4e37 Compare February 2, 2026 04:27

pgellert reviewed Feb 2, 2026


nguyen-andrew added 5 commits February 2, 2026 18:48

sr/types: Update context_subject::from_string

81282af

Update context_subject::from_string to handle context-only strings.

schema_registry/swagger: document subject param for GET /schemas/ids/…

3c45591

…{id}

schema_registry/dt: test context lookup via subject param

26fe7f5

Add test for the `subject` query parameter on GET /schemas/ids/{id}, verifying that it correctly extracts context for schema lookup. Also update the test client to support the new parameter.

nguyen-andrew force-pushed the sr/subject-query-param branch from b6d4e37 to 26fe7f5 Compare February 2, 2026 20:31

pgellert reviewed Feb 3, 2026

