Skip to content

fix(test): rework reload_nginx #197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

dmehala
Copy link
Contributor

@dmehala dmehala commented Apr 16, 2025

This change comes after a deep investigation into why certain jobs were failing when upgrading to the latest dd-trace-cpp version. The root cause turned out to be increased contention in the default curl HTTP client due to telemetry refactoring. This caused NGINX to take significantly longer to shut down, which in turn led to timeouts during reload_nginx.

While investigating, I also realized the logic in reload_nginx was flawed. It was only checking whether a worker process was running, rather than verifying that the old worker (by PID) had exited. Since NGINX spawns a new worker and sends a SIGQUIT to the old one during reloads, our previous check was resulting in false positives. I've updated the logic to explicitly wait for the old worker PID to terminate and confirm the new worker is up before proceeding.

Additionally, while fixing this, I encountered issues with some AppSec tests. These tests were intentionally causing the worker to fail during initialization (e.g. by using invalid config paths), but this left NGINX in a broken state for subsequent tests. To resolve this, the tests for
datadog_appsec_http_blocked_template_json,
datadog_appsec_http_blocked_template_html, and
datadog_appsec_ruleset_file now validate the existence of their required files during the configuration process instead of at worker startup.

@dmehala dmehala requested review from a team as code owners April 16, 2025 09:21
@dmehala dmehala requested review from cataphract and removed request for a team April 16, 2025 09:21
Comment on lines +34 to +35
if span.get("meta", {}).get("_dd.appsec.json"):
return json.loads(span["meta"]["_dd.appsec.json"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Code Quality Violation

too many nesting levels (...read more)

Avoid to nest too many loops together. Having too many loops make your code harder to understand.
Prefer to organize your code in functions and unit of code you can clearly understand.

Learn More

View in Datadog  Leave us feedback  Documentation

@dmehala dmehala force-pushed the dmehala/rework-tests branch from b08e007 to 9fe4f15 Compare April 16, 2025 09:25
@dmehala dmehala marked this pull request as draft April 16, 2025 09:51
@@ -380,8 +387,9 @@ def header_args():
)
fields_json, headers_json, body_json, *rest = result.stdout.split("\n")
if any(line for line in rest):
raise Exception("Unexpected trailing output to curljson.sh: " +
json.dumps(rest))
raise Exception(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Exception is too generic (...read more)

Do not raise Exception and BaseException. These are too generic. Having generic exceptions makes it difficult to differentiate errors in a program. Use a specific exception, for example, ValueError, or create your own instead of using generic ones.

Learn More

View in Datadog  Leave us feedback  Documentation

@@ -691,8 +695,7 @@ def sync_nginx_access_log(self):
token = str(uuid.uuid4())
status, _, body = self.send_nginx_http_request(f"/sync?token={token}")
if status != 200:
raise Exception(
f"nginx returned error (status, body): {(status, body)}")
raise Exception(f"nginx returned error (status, body): {(status, body)}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Exception is too generic (...read more)

Do not raise Exception and BaseException. These are too generic. Having generic exceptions makes it difficult to differentiate errors in a program. Use a specific exception, for example, ValueError, or create your own instead of using generic ones.

Learn More

View in Datadog  Leave us feedback  Documentation

@dmehala dmehala force-pushed the dmehala/rework-tests branch from 210c0e6 to bf39393 Compare April 17, 2025 11:17
This change comes after a deep investigation into why certain jobs were
failing when upgrading to the latest dd-trace-cpp version.

The root cause turned out to be increased contention in the default curl HTTP
client due to telemetry refactoring. This caused NGINX to take significantly
longer to shut down, which in turn led to timeouts during reload_nginx.

While investigating, I also realized the logic in `reload_nginx` was flawed.
It was only checking whether a worker process was running, rather than verifying
that the old worker (by PID) had exited. Since NGINX spawns a new worker and
sends a SIGQUIT to the old one during reloads, our previous check was resulting
in false positives. I've updated the logic to explicitly wait for the old worker
PID to terminate and confirm the new worker is up before proceeding.

Additionally, while fixing this, I encountered issues with some AppSec tests.
These tests were intentionally causing the worker to fail during initialization
(e.g. by using invalid config paths), but this left NGINX in a broken state for
subsequent tests. To resolve this, the tests for
`datadog_appsec_http_blocked_template_json`,
`datadog_appsec_http_blocked_template_html`, and
`datadog_appsec_ruleset_file` now validate the existence of their required files
during the configuration process instead of at worker startup.
@dmehala dmehala force-pushed the dmehala/rework-tests branch from bf39393 to 50fec9b Compare April 17, 2025 11:22
@codecov-commenter
Copy link

codecov-commenter commented Apr 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.20%. Comparing base (4e94d09) to head (824e033).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
- Coverage   70.26%   70.20%   -0.06%     
==========================================
  Files          47       48       +1     
  Lines        6224     6233       +9     
  Branches      882      883       +1     
==========================================
+ Hits         4373     4376       +3     
- Misses       1423     1429       +6     
  Partials      428      428              
Files with missing lines Coverage Δ
src/common/directives.cpp 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dmehala dmehala marked this pull request as ready for review April 17, 2025 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants