@yuki-stripe yuki-stripe commented Jan 28, 2026

Summary

Measures and compares iOS app latency between two git commits.

This script:

  1. Checks out each commit and runs StripePaymentSheet latency tests multiple times
  2. Parses SYNTHETIC_LATENCY_RESULT output from test logs
  3. Performs statistical analysis to compare latency differences
  4. Reports mean delta with 95% confidence intervals and statistical significance (p < 0.05)
Examples:
  # Compare two different commits
  ./measure_latency_difference.rb --base-commit abc123 --commit def456

  # A/A test (same commit twice to verify no false positives)
  ./measure_latency_difference.rb --base-commit abc123 --commit abc123

Options:
        --base-commit COMMIT         Base commit to compare against (required)
        --commit COMMIT              New commit to compare (required)
        --iterations N               Number of test iterations to run (default: 20)
    -h, --help                       Prints this help

See https://docs.google.com/document/d/1nQdXDF4tzWEWCSGcy0XEq6pzDl6NuHz_0_8O3soCkj0/edit?tab=t.0#heading=h.50gu6nbhfvxf

Testing

Manually tested A/A. Example output:

================================================================================
LATENCY DELTA REPORT (vs Base Commit)
================================================================================

Test                                     |   Base commit |   New commit | Mean Delta (95% CI)                           | Significant Difference?
----------------------------------------------------------------------------------------------------------------------------------------------
test_customer_customersession            |         678ms |        662ms | Δ -2.3% ± 3.2%; -16ms ± 22ms                  | No
test_customer_customersession_and_link   |         920ms |        954ms | Δ +3.8% ± 5.6%; +35ms ± 52ms                  | No
test_customer_legacy                     |         552ms |        546ms | Δ -1.2% ± 3.0%; -6ms ± 17ms                   | No
test_default                             |         390ms |        372ms | Δ -4.7% ± 5.3%; -18ms ± 21ms                  | No

Example output running TEST_STATS=1 ruby ./ci_scripts/measure_latency_difference.rb:

================================================================================
STATISTICS TEST MODE
================================================================================

Test data:
  Base: [1.0, 1.1, 1.2, 1.1, 1.0, 1.0, 1.1, 1.2, 1.1, 1.0]
  Base mean: 1.08
  New: [1.1, 1.2, 1.3, 1.2, 1.1, 1.1, 1.2, 1.3, 1.2, 1.1]
  New mean: 1.18

Computed statistics:
  Percentage delta: 9.3% ± 6.9%
  Absolute delta: 100ms ± 74ms
  Statistically significant: true
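
For reviewers who want to sanity-check the numbers above, here is a minimal sketch of an independent-samples (Welch-style) comparison applied to the same test data. It is not the script's actual code: `compare`, `sample_variance`, and `T_CRITICAL_95` are hypothetical names, the critical-value table is only a small excerpt of standard two-sided 95% t values, and rounding df to the nearest integer is an assumption.

```ruby
# Sketch only: helper names and the critical-value excerpt are assumptions,
# not taken from measure_latency_difference.rb. Samples are in seconds.

# Two-sided 95% t critical values for a few degrees of freedom (excerpt).
T_CRITICAL_95 = { 9 => 2.262, 18 => 2.101, 38 => 2.024 }.freeze

def mean(xs)
  xs.sum / xs.size.to_f
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

def compare(base, new_samples)
  vb = sample_variance(base) / base.size
  vn = sample_variance(new_samples) / new_samples.size

  # Standard error of the difference of two independent means.
  std_error = Math.sqrt(vb + vn)

  # Welch-Satterthwaite degrees of freedom, rounded to the nearest integer
  # (an assumption; flooring is the more conservative convention).
  df = ((vb + vn)**2 /
        (vb**2 / (base.size - 1) + vn**2 / (new_samples.size - 1))).round

  delta = mean(new_samples) - mean(base)
  margin = T_CRITICAL_95.fetch(df) * std_error

  {
    delta_ms: (delta * 1000).round,
    margin_ms: (margin * 1000).round,
    delta_pct: (delta / mean(base) * 100).round(1),
    margin_pct: (margin / mean(base) * 100).round(1),
    # Significant when the 95% CI of the delta excludes zero.
    significant: delta.abs > margin
  }
end

base = [1.0, 1.1, 1.2, 1.1, 1.0, 1.0, 1.1, 1.2, 1.1, 1.0]
new_samples = [1.1, 1.2, 1.3, 1.2, 1.1, 1.1, 1.2, 1.3, 1.2, 1.1]

result = compare(base, new_samples)
puts "Percentage delta: #{result[:delta_pct]}% ± #{result[:margin_pct]}%"
puts "Absolute delta: #{result[:delta_ms]}ms ± #{result[:margin_ms]}ms"
puts "Statistically significant: #{result[:significant]}"
```

With this data the sketch gives df = 18 and a delta of 100ms ± 74ms (9.3% ± 6.9%), which agrees with the test-mode output above.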

Changelog

Not user facing.

1. Expand the critical-value table to cover a wider, more realistic range of degrees of freedom (df)
2. Recalculate the standard error using the formula for independent samples, not paired samples
3. Calculate df using the formula for independent samples, not paired samples
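
For reference, the standard independent-samples (Welch) formulas that items 2 and 3 presumably refer to are shown below. The symbols are assumptions introduced here: $s_b^2, s_n^2$ are the sample variances and $n_b, n_n$ the sample counts for the base and new commits. The script's exact implementation may differ.

$$
\mathrm{SE} = \sqrt{\frac{s_b^2}{n_b} + \frac{s_n^2}{n_n}},
\qquad
\mathrm{df} \approx \frac{\left(\dfrac{s_b^2}{n_b} + \dfrac{s_n^2}{n_n}\right)^2}{\dfrac{(s_b^2/n_b)^2}{n_b - 1} + \dfrac{(s_n^2/n_n)^2}{n_n - 1}}
$$

With equal variances and $n_b = n_n = n$, this reduces to $\mathrm{df} = 2(n - 1)$, versus $n - 1$ for a paired test. For two samples of 10 runs each, that is 18 rather than 9.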
@yuki-stripe yuki-stripe force-pushed the yuki/latency-benchmark-script branch from f42c35d to 0a44720 on January 29, 2026 21:53
@xiner-stripe xiner-stripe left a comment


lgtm!
