fix(test): rework reload_nginx
#197
base: master
if span.get("meta", {}).get("_dd.appsec.json"):
    return json.loads(span["meta"]["_dd.appsec.json"])
🔴 Code Quality Violation
Too many nesting levels
Avoid nesting too many loops. Deeply nested loops make your code harder to understand.
Prefer organizing your code into functions and units of code you can clearly understand.
b08e007 to 9fe4f15
test/cases/orchestration.py
Outdated
@@ -380,8 +387,9 @@ def header_args():
     )
     fields_json, headers_json, body_json, *rest = result.stdout.split("\n")
     if any(line for line in rest):
-        raise Exception("Unexpected trailing output to curljson.sh: " +
-                        json.dumps(rest))
+        raise Exception(
⚪ Code Quality Violation
Exception is too generic
Do not raise `Exception` and `BaseException`. These are too generic. Having generic exceptions makes it difficult to differentiate errors in a program. Use a specific exception, for example, `ValueError`, or create your own instead of using generic ones.
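As an illustration of the suggestion above, here is a minimal sketch of a dedicated exception type for the trailing-output check. The names `CurlJsonOutputError` and `check_no_trailing_output` are hypothetical and do not appear in this PR; only the error message is taken from the diff.

```python
import json


class CurlJsonOutputError(ValueError):
    """Hypothetical error type: curljson.sh printed more than expected."""


def check_no_trailing_output(rest):
    """Raise a specific error if any trailing output line is non-empty.

    `rest` is the list of lines left over after the three expected JSON
    lines have been unpacked (hypothetical helper, mirroring the diff).
    """
    if any(line for line in rest):
        raise CurlJsonOutputError(
            "Unexpected trailing output to curljson.sh: " + json.dumps(rest))
```

Because the class derives from `ValueError`, callers that already catch `ValueError` keep working, while tests can match on the narrower type.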
test/cases/orchestration.py
Outdated
@@ -691,8 +695,7 @@ def sync_nginx_access_log(self):
     token = str(uuid.uuid4())
     status, _, body = self.send_nginx_http_request(f"/sync?token={token}")
     if status != 200:
-        raise Exception(
-            f"nginx returned error (status, body): {(status, body)}")
+        raise Exception(f"nginx returned error (status, body): {(status, body)}")
⚪ Code Quality Violation
Exception is too generic
Do not raise `Exception` and `BaseException`. These are too generic. Having generic exceptions makes it difficult to differentiate errors in a program. Use a specific exception, for example, `ValueError`, or create your own instead of using generic ones.
210c0e6 to bf39393
This change comes after a deep investigation into why certain jobs were failing when upgrading to the latest `dd-trace-cpp` version. The root cause turned out to be increased contention in the default curl HTTP client due to telemetry refactoring. This caused NGINX to take significantly longer to shut down, which in turn led to timeouts during `reload_nginx`.

While investigating, I also realized the logic in `reload_nginx` was flawed. It was only checking whether a worker process was running, rather than verifying that the old worker (by PID) had exited. Since NGINX spawns a new worker and sends a SIGQUIT to the old one during reloads, our previous check was resulting in false positives. I've updated the logic to explicitly wait for the old worker PID to terminate and confirm the new worker is up before proceeding.

Additionally, while fixing this, I encountered issues with some AppSec tests. These tests were intentionally causing the worker to fail during initialization (e.g. by using invalid config paths), but this left NGINX in a broken state for subsequent tests. To resolve this, the tests for `datadog_appsec_http_blocked_template_json`, `datadog_appsec_http_blocked_template_html`, and `datadog_appsec_ruleset_file` now validate the existence of their required files during the configuration process instead of at worker startup.
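The reload check described above can be sketched roughly as follows. This is not the code from this PR: `wait_for_reload` and the `list_worker_pids` callable are hypothetical names, and how worker PIDs are actually discovered in the test harness is left as an assumption.

```python
import time


def wait_for_reload(old_worker_pids, list_worker_pids, timeout=10.0, poll=0.1,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until every old worker PID is gone and a new worker exists.

    `list_worker_pids` is a hypothetical callable returning the PIDs of the
    nginx worker processes that are running right now.
    """
    old = set(old_worker_pids)
    deadline = clock() + timeout
    while True:
        current = set(list_worker_pids())
        old_gone = not (current & old)  # no pre-reload worker is still alive
        new_up = bool(current - old)    # at least one worker with a new PID
        if old_gone and new_up:
            return current
        if clock() >= deadline:
            raise TimeoutError(
                f"reload did not complete: old={sorted(old)}, "
                f"current={sorted(current)}")
        sleep(poll)
```

Checking only that some worker is running would succeed as soon as the new worker appears, even while the old worker is still draining after SIGQUIT. Comparing PID sets before and after the reload avoids that false positive.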
bf39393 to 50fec9b
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
- Coverage   70.26%   70.20%   -0.06%
==========================================
  Files          47       48       +1
  Lines        6224     6233       +9
  Branches      882      883       +1
==========================================
+ Hits         4373     4376       +3
- Misses       1423     1429       +6
  Partials      428      428