Skip to content

webhook: Admission metrics are not reported by status code #2972

@ahmetb

Description

@ahmetb

This is several bugs in one bug since it appears multiple things are broken. Apologies if it gets too complicated.

Problem 1: It appears that controller-runtime webhooks always respond with HTTP status code 200 OK even if the request is completely malformed.

Example (sending a malformed request to a webhook endpoint):

curl -k -v https://localhost:9443/validate-ship-example-org-v1beta1-frigate
< HTTP/2 200
< ... other response headers ...
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}

Problem 2: controller-runtime webhooks do not use the metav1.Status.Code and always respond with HTTP status code 200 OK.

Example code I used to register my custom handler at /test:

webhookServer.Register("/test", &webhook.Admission{Handler: &myHandler{}})
...

type myHandler struct{}
func (*myHandler) Handle(context.Context, admission.Request) admission.Response {
	return admission.Response{
		AdmissionResponse: admissionv1.AdmissionResponse{
			Allowed: false,
			Result: &metav1.Status{
				Code:    403,
				Message: "things suck",
				Status:  "Failure",
				Reason:  metav1.StatusReasonForbidden}}}
}

And here's the query I am making:

curl -XPOST -k -v -H "Content-Type: application/json" https://localhost:9443/test

< HTTP/2 200
...
<
{"kind":"AdmissionReview","apiVersion":"admission.k8s.io/v1","response":{"uid":"","allowed":false,"status":{"metadata":{},"status":"Failure","message":"things suck","reason":"Forbidden","code":403}}}

Problem 3: controller-runtime doesn't expose any metrics about rejected requests and their .status.codes. This is likely because

  1. we always report code=200 no matter the response (Problem 1 & 2)
  2. it appears we're capping controller_runtime_webhook_requests_total{code=...} dimension to only 200 and 500 here:
    lat := RequestLatency.MustCurryWith(lbl)
    cnt := RequestTotal.MustCurryWith(lbl)
    gge := RequestInFlight.With(lbl)
    // Initialize the most likely HTTP status codes.
    cnt.WithLabelValues("200")
    cnt.WithLabelValues("500")

so even if I make a faulty request, only the code=200 metric goes up:

controller_runtime_webhook_requests_total{code="200",webhook="/test"} 4

In an ideal state, I'd like to see controller_runtime_webhook_requests_total metric actually let me understand things like how many requests I'm denying (non-200 codes, such as 403, 400, 429, ...).

/kind bug

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.lifecycle/frozenIndicates that an issue or PR should not be auto-closed due to staleness.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions