Skip to content

Correct tryTimeoutAndWriteError to write timeout regardless of prior writes #15900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 5, 2025

Conversation

Fedosin
Copy link
Contributor

@Fedosin Fedosin commented May 26, 2025

Fixes #15486

Proposed Changes

Previously, the function comment suggested it would only write errors if nothing had been written, but the implementation correctly only checks the timedOut flag. This allows timeout errors to be written even after a response has started, which is the desired behavior for handling slow responses.

  • Fixed misleading function comment
  • Updated test to match actual behavior
  • Added comprehensive test coverage

Release Note

NONE

@knative-prow knative-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 26, 2025
@knative-prow knative-prow bot requested review from dprotaso, gauron99 and skonto May 26, 2025 16:14
Copy link

codecov bot commented May 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.93%. Comparing base (c36383e) to head (2ce75a8).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15900      +/-   ##
==========================================
- Coverage   80.94%   80.93%   -0.01%     
==========================================
  Files         210      210              
  Lines       16769    16769              
==========================================
- Hits        13573    13572       -1     
- Misses       2844     2845       +1     
  Partials      352      352              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

//
// If this writes an error, all subsequent calls to Write will
// result in http.ErrHandlerTimeout.
func (tw *timeoutWriter) tryTimeoutAndWriteError(msg string) bool {
tw.mu.Lock()
defer tw.mu.Unlock()

if tw.lastWriteTime.IsZero() {
if !tw.timedOut {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this always true when nothing has been written?

Looking at the changes to the comment and the removal of lastWriteTime. (Sounded to me the "has not been written" part was added b/c of the lastWriteTime ?

Just wondering 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there are two situations when we set the timeout flag:
1. When we reach the response start timeout: https://github.com/knative/serving/blob/main/pkg/http/handler/timeout.go#L248
2. When we hit the regular timeout: https://github.com/knative/serving/blob/main/pkg/http/handler/timeout.go#L230

Currently, both functions (tryResponseStartTimeoutAndWriteError and tryTimeoutAndWriteError) are identical and do the following:

Timeout (write an error header and set the timeout flag to true) if nothing has been written to the response.

This means that we never get inside the condition on line 228: either we write something and lastWriteTime.IsZero() is false, or we don’t write anything and fail earlier by response start timeout.

In my vision, tryTimeoutAndWriteError should always write the error, ignoring whether anything was written before or not. Checking the timedOut flag is needed to make this function idempotent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fedosin thanks for the further explanation. It makes sense to me even though I don't a deep knowledge of Serving's timeout handling.

Copy link
Member

@dprotaso dprotaso Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my vision, tryTimeoutAndWriteError should always write the error, ignoring whether anything was written before or not.

That would be unexpected for end-users - so we shouldn't do this.

If something is written on the wire already and a timeout occurs we should just close the connection.

@Fedosin Fedosin force-pushed the tryTimeoutAndWriteError branch from bad6248 to d2f583e Compare May 27, 2025 08:22
…writes

Previously, the function comment suggested it would only write errors if nothing
had been written, but the implementation correctly only checks the timedOut flag.
This allows timeout errors to be written even after a response has started,
which is the desired behavior for handling slow responses.

- Fixed misleading function comment
- Updated test to match actual behavior
- Added comprehensive test coverage
@Fedosin Fedosin force-pushed the tryTimeoutAndWriteError branch from d2f583e to 2ce75a8 Compare May 27, 2025 08:22
Copy link
Contributor

@dsimansk dsimansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2025
@dsimansk
Copy link
Contributor

PTAL
/assign @dprotaso

Copy link
Member

@matzew matzew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks for the details, @Fedosin

@dsimansk
Copy link
Contributor

dsimansk commented Jun 5, 2025

/approve

Copy link

knative-prow bot commented Jun 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dsimansk, Fedosin, matzew

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2025
@knative-prow knative-prow bot merged commit 80d7335 into knative:main Jun 5, 2025
68 checks passed
@dprotaso
Copy link
Member

dprotaso commented Jun 5, 2025

One odd thing I find weird about this package is if a time out happens - we should exit the for loop and close the connection - but we only do that conditionally for some reason which doesn't seem right. The idle timeout should be the only one that is looping.

for {
	select {
  		// ...
		case <-timeout.C():
			timeoutDrained = true
			if tw.tryTimeoutAndWriteError(h.body) {
				return
			}
	}
}

norman465 pushed a commit to src2img/knative-serving that referenced this pull request Jun 17, 2025
…writes (knative#15900)

Previously, the function comment suggested it would only write errors if nothing
had been written, but the implementation correctly only checks the timedOut flag.
This allows timeout errors to be written even after a response has started,
which is the desired behavior for handling slow responses.

- Fixed misleading function comment
- Updated test to match actual behavior
- Added comprehensive test coverage
SaschaSchwarze0 pushed a commit to src2img/knative-serving that referenced this pull request Jun 17, 2025
* Fix flakes in TestIdleTimeoutHandler (knative#15918)

* Correct tryTimeoutAndWriteError to write timeout regardless of prior writes (knative#15900)

Previously, the function comment suggested it would only write errors if nothing
had been written, but the implementation correctly only checks the timedOut flag.
This allows timeout errors to be written even after a response has started,
which is the desired behavior for handling slow responses.

- Fixed misleading function comment
- Updated test to match actual behavior
- Added comprehensive test coverage

* Fix request hanging after response start timeout expires (knative#15899)

When a response starts before the responseStartTimeout but the timeout
still fires, the timeout handler would not properly continue processing
the request. This caused requests to hang indefinitely if they started
responding just before the responseStartTimeout expired.

The issue occurred because after handling the responseStartTimeout case,
the select loop would continue but without properly waiting for the
handler to complete. Setting responseStartTimeoutDrained ensures the
timer is properly cleaned up and the loop continues to process other
events (completion, overall timeout, or idle timeout).

Fixes requests that start responding before responseStartTimeout but
take longer than responseStartTimeout to complete.

---------

Co-authored-by: Mike Fedosin <[email protected]>
isibeni pushed a commit to src2img/knative-serving that referenced this pull request Jun 23, 2025
* Fix flakes in TestIdleTimeoutHandler (knative#15918)

* Correct tryTimeoutAndWriteError to write timeout regardless of prior writes (knative#15900)

Previously, the function comment suggested it would only write errors if nothing
had been written, but the implementation correctly only checks the timedOut flag.
This allows timeout errors to be written even after a response has started,
which is the desired behavior for handling slow responses.

- Fixed misleading function comment
- Updated test to match actual behavior
- Added comprehensive test coverage

* Fix request hanging after response start timeout expires (knative#15899)

When a response starts before the responseStartTimeout but the timeout
still fires, the timeout handler would not properly continue processing
the request. This caused requests to hang indefinitely if they started
responding just before the responseStartTimeout expired.

The issue occurred because after handling the responseStartTimeout case,
the select loop would continue but without properly waiting for the
handler to complete. Setting responseStartTimeoutDrained ensures the
timer is properly cleaned up and the loop continues to process other
events (completion, overall timeout, or idle timeout).

Fixes requests that start responding before responseStartTimeout but
take longer than responseStartTimeout to complete.

---------

Co-authored-by: Mike Fedosin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Receiving response even after the timeoutSeconds is reached
4 participants