Perf release gate #9068

Open
wants to merge 6 commits into
base: master
48 changes: 48 additions & 0 deletions .gitlab/benchmarks/bp-runner.fail-on-breach.yml
@@ -0,0 +1,48 @@
# Thresholds set based on guidance in https://datadoghq.atlassian.net/wiki/x/LgI1LgE#How-to-choose-thresholds-for-pre-release-gates%3F

experiments:
  - name: Run SLO breach check
    steps:
      - name: SLO breach check
        run: fail_on_breach
        # https://datadoghq.atlassian.net/wiki/x/LgI1LgE#How-to-choose-a-warning-range-for-pre-release-gates%3F
        warning_range: 10
        # File spec
        # https://datadoghq.atlassian.net/wiki/x/LgI1LgE#Specification
        # Measurements
        # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario
        scenarios:
Contributor

Based on the results of https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-java/-/jobs/1031460223, I suggest updating the SLOs to the following values:


          # Standard macrobenchmarks
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fonly-tracing&trendsType=scenario
          - name: normal_operation/only-tracing
            thresholds:
              - agg_http_req_duration_p50 < 2.36 ms
              - agg_http_req_duration_p99 < 7.89 ms
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fotel-latest&trendsType=scenario
          - name: normal_operation/otel-latest
            thresholds:
              - agg_http_req_duration_p50 < 2.5 ms
              - agg_http_req_duration_p99 < 10 ms

          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fonly-tracing&trendsType=scenario
          - name: high_load/only-tracing
            thresholds:
              - throughput > 1100.0 op/s
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fotel-latest&trendsType=scenario
          - name: high_load/otel-latest
            thresholds:
              - throughput > 1100.0 op/s

          # Startup macrobenchmarks
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Atracing%3AGlobalTracer&trendsType=scenario
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aappsec%3AGlobalTracer&trendsType=scenario
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aiast%3AGlobalTracer&trendsType=scenario
          - name: "startup:petclinic:(tracing|appsec|iast):GlobalTracer"
            thresholds:
              - execution_time < 280 ms
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aprofiling%3AGlobalTracer&trendsType=scenario
          - name: "startup:petclinic:profiling:GlobalTracer"
            thresholds:
              - execution_time < 420 ms

          # Note that the thresholds are chosen based on the confidence interval, with a 10% adjustment.

          # Standard macrobenchmarks
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fonly-tracing&trendsType=scenario
          - name: normal_operation/only-tracing
            thresholds:
              - agg_http_req_duration_p50 < 2.36 ms
              - agg_http_req_duration_p99 < 7.89 ms
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fotel-latest&trendsType=scenario
          - name: normal_operation/otel-latest
            thresholds:
              - agg_http_req_duration_p50 < 2.34 ms
              - agg_http_req_duration_p99 < 9.50 ms

          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fonly-tracing&trendsType=scenario
          - name: high_load/only-tracing
            thresholds:
              - throughput > 1100.0 op/s
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fotel-latest&trendsType=scenario
          - name: high_load/otel-latest
            thresholds:
              - throughput > 1100.0 op/s

          # Startup macrobenchmarks
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Atracing%3AGlobalTracer&trendsType=scenario
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aappsec%3AGlobalTracer&trendsType=scenario
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aiast%3AGlobalTracer&trendsType=scenario
          - name: "startup:petclinic:(tracing|appsec|iast):GlobalTracer"
            thresholds:
              - execution_time < 260 ms
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aprofiling%3AGlobalTracer&trendsType=scenario
          - name: "startup:petclinic:profiling:GlobalTracer"
            thresholds:
              - execution_time < 368 ms
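
The note in this file says the thresholds come from a confidence interval with a 10% adjustment. A minimal sketch of one possible reading follows, assuming the adjustment loosens the bound by 10% in the permissive direction (upper bound times 1.10 for `<` thresholds, lower bound times 0.90 for `>` thresholds). The function name `adjusted_threshold` and the input values are hypothetical; the linked guidance page remains the authority on how the actual numbers were derived.

```shell
# Hypothetical helper: applies a 10% margin to a confidence-interval bound.
# Usage: adjusted_threshold <upper|lower> <ci_bound>
#   upper -> for "metric < X" thresholds (latency, startup time)
#   lower -> for "metric > X" thresholds (throughput)
adjusted_threshold() {
  direction=$1
  bound=$2

  case "$direction" in
    upper) factor=1.10 ;;  # assumption: allow 10% above the upper CI bound
    lower) factor=0.90 ;;  # assumption: allow 10% below the lower CI bound
    *)
      echo "direction must be 'upper' or 'lower'" >&2
      return 1
      ;;
  esac

  # awk handles the floating-point multiplication; output has two decimals.
  awk -v b="$bound" -v f="$factor" 'BEGIN { printf "%.2f\n", b * f }'
}

# Illustrative inputs only, not measurements from this PR.
adjusted_threshold upper 2.00   # prints 2.20
adjusted_threshold lower 1000   # prints 900.00
```

Under this reading, a tighter interval from a later benchmark run would translate directly into a tighter gate.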
81 changes: 75 additions & 6 deletions .gitlab/macrobenchmarks.yml
@@ -1,12 +1,18 @@
include:
  project: 'DataDog/benchmarking-platform-tools'
  file: 'images/templates/gitlab/notify-slo-breaches.template.yml'
  ref: '925e0a3e7dd628885f6fc69cdaea5c8cc9e212bc'

.macrobenchmarks:
  stage: macrobenchmarks
  rules:
    - if: $POPULATE_CACHE
      when: never
    - if: ($NIGHTLY_BENCHMARKS || $CI_PIPELINE_SOURCE != "schedule") && $CI_COMMIT_REF_NAME == "master"
      when: always
    - when: manual
      allow_failure: true
    # - if: $POPULATE_CACHE
    # when: never
    # - if: ($NIGHTLY_BENCHMARKS || $CI_PIPELINE_SOURCE != "schedule") && $CI_COMMIT_REF_NAME == "master"
    # when: always
    # - when: manual
    # allow_failure: true
    - when: on_success # TODO: PLEASE revert before merging the PR
Contributor Author

todo:

  • To revert before merging

  tags: ["runner:apm-k8s-same-cpu"]
  needs: ["build"]
  interruptible: true
@@ -68,3 +74,66 @@ otel-latest:
    BP_BENCHMARKS_CONFIGURATION: otel-latest
    TRACER_OPTS: -javaagent:/app/otel-java-agent.jar -Ddd.env=otel-latest -Ddd.service=bp-java-petclinic
    JAVA_OPTS: -javaagent:/app/memcheck/stability-testing-memwatch.jar -Xmx128M


check-slo-breaches:
  stage: macrobenchmarks
  interruptible: true
  tags: ["arch:amd64"]
  image: registry.ddbuild.io/images/benchmarking-platform-tools-ubuntu:latest
  when: on_success
  needs:
    - job: baseline
      artifacts: true
    - job: only-tracing
      artifacts: true
    - job: otel-latest
      artifacts: true
    - job: benchmarks-startup
      artifacts: true
    - job: benchmarks-load
      artifacts: true
    - job: benchmarks-dacapo
      artifacts: true
  script:
    # macrobenchmarks are located here, files are already in "converted" format
    - export ARTIFACTS_DIR="$(pwd)/platform/artifacts/" && mkdir -p "${ARTIFACTS_DIR}"

    # Need to move the artifacts of the benchmarks-* jobs
    - |
      export BENCHMARKS_ARTIFACTS_DIR="$(pwd)/reports" && mkdir -p "${BENCHMARKS_ARTIFACTS_DIR}"
      for benchmarkType in startup load dacapo; do
        find "$BENCHMARKS_ARTIFACTS_DIR/$benchmarkType" -name "benchmark-baseline.json" -o -name "benchmark-candidate.json" | while read file; do
          relpath="${file#$BENCHMARKS_ARTIFACTS_DIR/$benchmarkType/}"
          prefix="${relpath%/benchmark-*}" # Remove the trailing /benchmark-(baseline|candidate).json
          prefix="${prefix#./}" # Remove any leading ./
          prefix="${prefix//\//-}" # Replace / with -
          case "$file" in
            *benchmark-baseline.json) type="baseline" ;;
            *benchmark-candidate.json) type="candidate" ;;
          esac
          echo "Moving $file to $ARTIFACTS_DIR/${type}-${benchmarkType}-${prefix}.converted.json"
          cp "$file" "$ARTIFACTS_DIR/${type}-${benchmarkType}-${prefix}.converted.json"
        done
      done
    - ls -lah "$ARTIFACTS_DIR"
    - bp-runner .gitlab/benchmarks/bp-runner.fail-on-breach.yml
  artifacts:
    name: "artifacts"
    when: always
    paths:
      - platform/artifacts/
    expire_in: 1 week
  variables:
    UPSTREAM_PROJECT_ID: $CI_PROJECT_ID # The ID of the current project. This ID is unique across all projects on the GitLab instance.
    UPSTREAM_PROJECT_NAME: $CI_PROJECT_NAME # "dd-trace-java"
    UPSTREAM_BRANCH: $CI_COMMIT_REF_NAME # The branch or tag name for which project is built.
    UPSTREAM_COMMIT_SHA: $CI_COMMIT_SHA # The commit revision the project is built for.

notify-slo-breaches:
  extends: .notify-slo-breaches
  stage: macrobenchmarks
  needs: ["check-slo-breaches"]
  when: always
  variables:
    CHANNEL: "apm-release-platform"
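
The renaming loop in the `check-slo-breaches` script maps each `benchmark-baseline.json` or `benchmark-candidate.json` under `reports/<type>/` to a flat `<kind>-<type>-<prefix>.converted.json` name. The sketch below isolates that mapping so it can be checked without running the pipeline. The function name `converted_name` and the sample path `petclinic/tracing` are hypothetical, and `tr` stands in for the bash-only `${prefix//\//-}` substitution so the snippet also runs under a plain POSIX `sh`.

```shell
# Hypothetical helper mirroring the name mapping done in the CI loop.
# Usage: converted_name <reports_dir> <benchmark_type> <file>
converted_name() {
  reports_dir=$1
  benchmark_type=$2
  file=$3

  # Path relative to <reports_dir>/<benchmark_type>/
  relpath="${file#$reports_dir/$benchmark_type/}"
  # Remove the trailing /benchmark-(baseline|candidate).json
  prefix="${relpath%/benchmark-*}"
  # Remove any leading ./
  prefix="${prefix#./}"
  # Replace / with - (POSIX stand-in for the bash ${prefix//\//-} form)
  prefix=$(printf '%s' "$prefix" | tr '/' '-')

  case "$file" in
    *benchmark-baseline.json) kind="baseline" ;;
    *benchmark-candidate.json) kind="candidate" ;;
    *) return 1 ;;  # not a benchmark result file
  esac

  printf '%s-%s-%s.converted.json\n' "$kind" "$benchmark_type" "$prefix"
}

# Illustrative path only; real report directories may be laid out differently.
converted_name reports startup reports/startup/petclinic/tracing/benchmark-baseline.json
# prints: baseline-startup-petclinic-tracing.converted.json
```

Flattening the directory path into the file name keeps every converted report in a single `platform/artifacts/` directory while preserving which scenario it came from.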