Skip to content

Conversation

@nguyen-andrew
Copy link
Member

@nguyen-andrew nguyen-andrew commented Jan 28, 2026

Add an optional subject query parameter to the GET /schemas/ids/{id} endpoint. This allows specifying the context for schema lookup by extracting the context from the provided subject name (e.g., :.myctx:mysubject).

Fixes CORE-15194

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

@nguyen-andrew nguyen-andrew self-assigned this Jan 28, 2026
Copilot AI review requested due to automatic review settings January 28, 2026 22:53
@nguyen-andrew nguyen-andrew requested a review from a team as a code owner January 28, 2026 22:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an optional subject query parameter to the GET /schemas/ids/{id} endpoint to enable context-aware schema lookups. When provided, the subject name (which can include context in the format :.context:subject) is used to determine the context for retrieving the schema, rather than always using the default context.

Changes:

  • Modified the endpoint handler to parse and use the optional subject parameter to extract context information
  • Updated the Python test client to support passing the subject parameter
  • Added comprehensive test coverage verifying the new parameter works correctly with default and non-default contexts
  • Updated API documentation to describe the new query parameter

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/v/pandaproxy/schema_registry/handlers.cc Added logic to parse the optional subject query parameter, extract context from it, and use the context when looking up schemas by ID
tests/rptest/tests/schema_registry_test.py Updated test client method to accept subject parameter and added comprehensive test case covering various scenarios
src/v/pandaproxy/api/api-doc/schema_registry.json Added documentation for the new subject query parameter

@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch from 6118d1f to fa5356c Compare January 28, 2026 23:22
@nguyen-andrew
Copy link
Member Author

Force push to rebase on latest dev.

@vbotbuildovich
Copy link
Collaborator

CI test results

test results on build#79806
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ScalingUpTest test_fast_node_addition null integration https://buildkite.com/redpanda/redpanda/builds/79806#019c06fb-10a4-47cf-a3ce-73980f76c3be FLAKY 19/21 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0206, p0=0.3402, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3917, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition

auto subject_param = parse::query_param<std::optional<ss::sstring>>(
*rq.req, "subject");

// Extract context from subject, or use default context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behaviour is a bit trickier here unfortunately:

# Register the same schema in two contexts
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.prod:Ad
dress/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%

% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.shared:Address/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%


# Query the schemas

% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:"
{"subject":":.shared:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688260232,"deleted":false}%                                                                                                                                                                                                                                    

% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%                                                                                                                                                                                                                                    

% curl "http://localhost:8081/schemas/ids/1?subject=Address"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%                                                                                                                                                                                                                                    

% curl "http://localhost:8081/schemas/ids/1?subject=:.:"
{"error_code":40403,"message":"Schema 1 not found"}%

% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%

% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%

I think we might need to treat the subject parameter differently depending on whether it contains only a context (empty subject) or a real subject.

To be honest, we might be able to get away with partial support of the parameter, by throwing if it contains a real subject, and not just a context. But if we can implement it fully, that would be great.

"required": false,
"type": "string",
"description": "Redpanda version 25.2 or later. For Avro and Protobuf schemas only. Supported values: an empty string `''` returns the schema in its current format (default), and `serialized` (Protobuf only) returns the schema in its Base64-encoded wire binary format. Unsupported values return a 501 error."
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been clearer in the ticket, but can you please also implement this for GET /schemas/ids/{id}/versions and GET /schemas/ids/{id}/schema too, in addition to GET /schemas/ids/{id}?

@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch 4 times, most recently from 3a8f4d2 to b6d4e37 Compare February 2, 2026 04:27
@nguyen-andrew
Copy link
Member Author

Comment on lines 177 to 180
// The schema ID is not associated with the given subject in the
// given context.
schema_subjects = {};
co_return std::nullopt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure we AuthZ even if there is no match found. Perhaps we could just return the schema_subjects from this function and do the authZ one function call up.

std::optional<request_auth_result>& auth_result,
schema_id id,
context_subject ctx_sub) {
const context& ctx = ctx_sub.ctx().empty() ? default_context : ctx_sub.ctx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this bit? How could we get here with ctx_sub.ctx().empty()? I think we should make sure that ctx_sub.ctx() is valid (e.g. the default_context) further up the chain, instead of special casing for an invalid context down here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's not really a case that would happen in the current flow, but I had it for general defensive programming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually try to avoid this pattern in internal functions. IMO, defensive checks are most useful at system boundaries (for example when validating raw user input). Internally, I think it’s better to rely on the guarantees of our types and invariants, otherwise we risk misleading future readers about what inputs are actually expected or valid.

Update context_subject::from_string to handle context-only strings.
Rename is_default_context() to is_default_context_only() to better
reflect its behavior: it returns true only when the context is the
default context AND the subject is empty (context-only).

Add a new method is_non_default_context() that checks if a subject is
in a non-default context. This will be used in future changes to handle
subject query parameters.
Add an optional `subject` query parameter to control schema lookup
context and subject restriction. The parameter value is parsed using
context_subject::from_string(), which extracts a context substring and
a subject substring from the input.

Lookup behavior:
- No parameter: search default context without subject restriction
  (existing behavior)
- Context only (e.g., ":.ctx:"): search the specified context without
  subject restriction
- Qualified (e.g., ":.ctx:sub"): search the specified context,
  restricted to the subject substring
- Unqualified (e.g., "sub" or ":.:sub"): search the default context
  restricted to the subject substring; if not found, search all other
  contexts; if still not found, fall back to the default context
  without subject restriction

A subject parameter is "unqualified" if it resolves to the default
context, either implicitly (no context substring) or explicitly
(context substring is ".").
Add test for the `subject` query parameter on GET /schemas/ids/{id},
verifying that it correctly extracts context for schema lookup.
Also update the test client to support the new parameter.
@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch from b6d4e37 to 26fe7f5 Compare February 2, 2026 20:31
@nguyen-andrew
Copy link
Member Author

Force push to address PR comments.

Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the AuthZ logic is not quite correct yet. I think the simplest behaviour we could implement here that is consistent with the earlier "allow the lookup if we have access through any subject" behaviour is that resolve_schema_across_contexts could look up both the schema definition as well as the list of subjects that would provide access to the schema, and then call the AuthZ handler only once from the handler for the full list of subjects that would provide access.

Comment on lines +183 to +192
// Ensure requester is authorized to access at least one of the subjects
// associated with the schema ID in the given context.
enterprise::handle_get_schemas_ids_id_authz(
rq, auth_result, schema_subjects);

if (schema_subjects.empty()) {
// The schema ID is not associated with any subject that the requester
// is authorized to access.
co_return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_get_schemas_ids_id_authz throws if either schema_subjects is empty as an input, or if all the subjects in schema_subjects get filtered out. So the L189-190 is not quite true. It rather corresponds to the case when AuthZ is disabled and we didn't find any matching subjects to look up the schema id under.


// Ensure requester is authorized to access at least one of the subjects
// associated with the schema ID in the given context.
enterprise::handle_get_schemas_ids_id_authz(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current logic will fail when:

  1. I have access to all subjects
  2. There are two contexts, ctx1 and ctx2; with ctx1 empty and ctx2 has a single subject, sub2 with version 1 having schema id 1
  3. I call /schemas/ids/1?subjects=sub1
  4. This will fail on AuthZ because the first try_get_schema_definition call will be in the default context under :.:sub1, which doesn't exist.

That seems incorrect because I should be able to look up the schema under :ctx2:sub1 given that I have access to all subjects.

Comment on lines +578 to +583
// Parse optional subject query parameter to extract context
auto subject_param = parse::query_param<std::optional<ss::sstring>>(
*rq.req, "subject")
.value_or("");

// With deferred schema validation, there might be a schema that
// had invalid references. These might have already been posted, so
// we need to sync
co_await rq.service().writer().read_sync();
auto ctx_sub = context_subject::from_string(subject_param);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the code would be clearer if we did not use value_or this early, but instead had ctx_sub as a std::optional<context_subject>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants