Benchmark latency script #6025

yuki-stripe · 2026-01-28T23:06:56Z

Summary

Measures and compares iOS app latency between two git commits.

This script:

- Checks out each commit and runs StripePaymentSheet latency tests multiple times
- Parses SYNTHETIC_LATENCY_RESULT output from test logs
- Performs statistical analysis to compare latency differences
- Reports mean delta with 95% confidence intervals and statistical significance (p < 0.05)

Examples:
  # Compare two different commits
  ./measure_latency_difference.rb --base-commit abc123 --commit def456

  # A/A test (same commit twice to verify no false positives)
  ./measure_latency_difference.rb --base-commit abc123 --commit abc123

Options:
        --base-commit COMMIT         Base commit to compare against (required)
        --commit COMMIT              New commit to compare (required)
        --iterations N               Number of test iterations to run (default: 20)
    -h, --help                       Prints this help

See https://docs.google.com/document/d/1nQdXDF4tzWEWCSGcy0XEq6pzDl6NuHz_0_8O3soCkj0/edit?tab=t.0#heading=h.50gu6nbhfvxf

Testing

Manually tested with an A/A run (the same commit as both base and new). Example output:

================================================================================
LATENCY DELTA REPORT (vs Base Commit)
================================================================================

Test                                     |   Base commit |   New commit | Mean Delta (95% CI)                           | Significant Difference?
----------------------------------------------------------------------------------------------------------------------------------------------
test_customer_customersession            |         678ms |        662ms | Δ -2.3% ± 3.2%; -16ms ± 22ms                  | No
test_customer_customersession_and_link   |         920ms |        954ms | Δ +3.8% ± 5.6%; +35ms ± 52ms                  | No
test_customer_legacy                     |         552ms |        546ms | Δ -1.2% ± 3.0%; -6ms ± 17ms                   | No
test_default                             |         390ms |        372ms | Δ -4.7% ± 5.3%; -18ms ± 21ms                  | No
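
One way to read the "Significant Difference?" column: for a two-sided test at the 5% level, a delta is significant exactly when its 95% confidence interval excludes zero. The sketch below re-checks the four rows above under that assumption. `significant?` is a hypothetical helper written for illustration, not a function taken from the script.

```ruby
# Rows copied from the A/A report above:
# [test name, mean delta in ms, 95% CI half-width in ms].
ROWS = [
  ["test_customer_customersession", -16, 22],
  ["test_customer_customersession_and_link", 35, 52],
  ["test_customer_legacy", -6, 17],
  ["test_default", -18, 21]
].freeze

# Hypothetical helper: flag a delta only when the interval
# [delta - margin, delta + margin] does not contain zero.
def significant?(delta_ms, margin_ms)
  delta_ms.abs > margin_ms
end

ROWS.each do |name, delta, margin|
  puts format("%-40s %s", name, significant?(delta, margin) ? "Yes" : "No")
end
```

Every interval in the A/A run straddles zero, which is the expected outcome when both sides are the same commit.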

Example output from running TEST_STATS=1 ruby ./ci_scripts/measure_latency_difference.rb:

================================================================================
STATISTICS TEST MODE
================================================================================

Test data:
  Base: [1.0, 1.1, 1.2, 1.1, 1.0, 1.0, 1.1, 1.2, 1.1, 1.0]
  Base mean: 1.08
  New: [1.1, 1.2, 1.3, 1.2, 1.1, 1.1, 1.2, 1.3, 1.2, 1.1]
  New mean: 1.18

Computed statistics:
  Percentage delta: 9.3% ± 6.9%
  Absolute delta: 100ms ± 74ms
  Statistically significant: true
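
The numbers above can be reproduced with a short independent-samples calculation. This is a minimal sketch, not the script's code: it assumes Welch's unequal-variance t-test, samples given in seconds, and a small illustrative critical-value table. The names `T_CRIT_95`, `sample_variance`, and `welch_compare` are hypothetical.

```ruby
# Two-sided 95% critical values of Student's t by degrees of freedom.
# Abbreviated, illustrative table; the script's own table is not shown in this PR.
T_CRIT_95 = {
  1 => 12.706, 2 => 4.303, 5 => 2.571, 10 => 2.228,
  15 => 2.131, 18 => 2.101, 20 => 2.086, 30 => 2.042
}.freeze

BASE = [1.0, 1.1, 1.2, 1.1, 1.0, 1.0, 1.1, 1.2, 1.1, 1.0].freeze
NEW_COMMIT = [1.1, 1.2, 1.3, 1.2, 1.1, 1.1, 1.2, 1.3, 1.2, 1.1].freeze

def mean(xs)
  xs.sum / xs.size.to_f
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

# Compares two independent samples (in seconds) and returns deltas in ms and percent.
def welch_compare(base, new_samples)
  vb = sample_variance(base) / base.size
  vn = sample_variance(new_samples) / new_samples.size
  std_error = Math.sqrt(vb + vn)
  # Welch-Satterthwaite degrees of freedom.
  df = (vb + vn)**2 / (vb**2 / (base.size - 1) + vn**2 / (new_samples.size - 1))
  # Use the largest tabulated df not above the computed one (conservative);
  # the epsilon guards against floating-point error just below an integer.
  key = T_CRIT_95.keys.select { |k| k <= df + 1e-9 }.max
  margin = T_CRIT_95.fetch(key) * std_error
  delta = mean(new_samples) - mean(base)
  base_mean = mean(base)
  {
    delta_ms: delta * 1000,
    margin_ms: margin * 1000,
    pct: delta / base_mean * 100,
    pct_margin: margin / base_mean * 100,
    significant: delta.abs > margin
  }
end

r = welch_compare(BASE, NEW_COMMIT)
puts format("Percentage delta: %.1f%% ± %.1f%%", r[:pct], r[:pct_margin])
puts format("Absolute delta: %.0fms ± %.0fms", r[:delta_ms], r[:margin_ms])
puts "Statistically significant: #{r[:significant]}"
```

With ten samples per side and equal variances, the degrees of freedom come out to 18, so the critical value used is 2.101. The sketch does not handle zero-variance input.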

Changelog

Not user facing.


xiner-stripe

lgtm!

yuki-stripe requested a review from davidme-stripe January 28, 2026 23:06

yuki-stripe assigned davidme-stripe Jan 28, 2026

yuki-stripe requested review from a team as code owners January 28, 2026 23:06

yuki-stripe added 3 commits January 29, 2026 13:53

Benchmark latency script

394bbb6

Fixed some problems found by DS

208f06f

1. Expand critical values for df to cover a wider and more realistic range
2. Recalculate standard error using the formula for independent samples, not paired
3. Calculate df using the formula for independent samples, not paired
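
For context on items 2 and 3, these are the textbook formulas for the two cases. The commit message does not say which independent-samples variant the script uses; the unequal-variance (Welch) form is assumed here. Subscripts B and N denote the base and new samples, and s_d is the standard deviation of per-pair differences.

```latex
% Independent samples (Welch, assumed)
\mathrm{SE} = \sqrt{\frac{s_B^2}{n_B} + \frac{s_N^2}{n_N}},
\qquad
\nu = \frac{\left(\dfrac{s_B^2}{n_B} + \dfrac{s_N^2}{n_N}\right)^2}
           {\dfrac{\left(s_B^2/n_B\right)^2}{n_B - 1} + \dfrac{\left(s_N^2/n_N\right)^2}{n_N - 1}}

% Paired samples, for contrast
\mathrm{SE}_{\text{paired}} = \frac{s_d}{\sqrt{n}},
\qquad
\nu_{\text{paired}} = n - 1
```

Runs on two different commits are not naturally paired, so the paired formulas would understate df (n - 1 instead of roughly n_B + n_N - 2 when the variances are similar) and base the standard error on differences between unrelated runs.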

add simple test mode

0a44720

yuki-stripe force-pushed the yuki/latency-benchmark-script branch from f42c35d to 0a44720 Compare January 29, 2026 21:53

yuki-stripe requested a review from xiner-stripe January 29, 2026 21:55

xiner-stripe approved these changes Feb 2, 2026
