You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/glossary.md
+6-6
Original file line number
Diff line number
Diff line change
@@ -25,20 +25,20 @@ The following is a glossary of domain specific terminology. Although benchmarks
25
25
***test case**: a combination of a benchmark, a profile, and a scenario.
26
26
***test**: the act of running an artifact under a test case. Each test result is composed of many iterations.
27
27
***test iteration**: a single iteration that makes up a test. Note: we currently normally run 2 test iterations for each test.
28
-
***test result**: the result of the collection of all statistics from running a test. Currently the minimum of the statistics.
28
+
***test result**: the result of the collection of all statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used.
29
29
***statistic**: a single value of a metric in a test result
30
30
***statistic description**: the combination of a metric and a test case which describes a statistic.
31
31
***statistic series**: statistics for the same statistic description over time.
32
32
***run**: a collection of test results for all currently available test cases run on a given artifact.
33
-
***test result delta**: the delta between two test results for the same test case but (optionally) different artifacts. The [comparison page](https://perf.rust-lang.org/compare.html) lists all the test result deltas as percentages comparing two runs.
34
33
35
34
## Analysis
36
35
37
-
***test result delta**: the difference between two test results for the same metric and test case.
38
-
***significance threshold**: the threshold at which a test result delta is considered "significant" (i.e., a real change in performance and not just noise). This is calculated using [the upper IQR fence](https://www.statisticshowto.com/upper-and-lower-fences/#:~:text=Upper%20and%20lower%20fences%20cordon,%E2%80%93%20(1.5%20*%20IQR)) as seen [here](https://github.com/rust-lang/rustc-perf/blob/8ba845644b4cfcffd96b909898d7225931b55557/site/src/comparison.rs#L935-L941).
39
-
***significant test result delta**: a test result delta above the significance threshold. Significant test result deltas can be thought of as "statistically significant".
36
+
***test result comparison**: the delta between two test results for the same test case but (optionally) different artifacts. The [comparison page](https://perf.rust-lang.org/compare.html) lists all the test result comparisons as percentages between two runs.
37
+
***significance threshold**: the threshold at which a test result comparison is considered "significant" (i.e., a real change in performance and not just noise). This is calculated using [the upper IQR fence](https://www.statisticshowto.com/upper-and-lower-fences/#:~:text=Upper%20and%20lower%20fences%20cordon,%E2%80%93%20(1.5%20*%20IQR)) as seen [here](https://github.com/rust-lang/rustc-perf/blob/8ba845644b4cfcffd96b909898d7225931b55557/site/src/comparison.rs#L935-L941).
38
+
***significant test result comparison**: a test result comparison above the significance threshold. Significant test result comparisons can be thought of as being "statistically significant".
39
+
***relevant test result comparison**: a test result comparison can be significant but still not be relevant (i.e., worth paying attention to). Relevance is a factor of the test result comparison's *magnitude*. Comparisons are considered relevant if they have a small magnitude or more. This term is often used to mean "significant *and* relevant" since relevant changes are necessarily also significant.
40
+
***test result comparison magnitude**: how "large" the delta is between the two test result's under comparison. This is determined by the average of two factors: the absolute size of the change (i.e., a change of 5% is larger than a change of 1%) and the amount above the significance threshold (i.e., a change that is 5x the significance threshold is larger than a change 1.5x the significance threshold).
40
41
***dodgy test case**: a test case for which the significance threshold is significantly large indicating a high amount of variability in the test and thus making it necessary to be somewhat skeptical of any results too close to the significance threshold.
41
-
***relevant test result delta**: a synonym for *significant test result delta* in situations where the term "significant" might be ambiguous and readers may potentially interpret *significant* as "large" or "statistically significant". For example, in try run results, we use the term relevant instead of significant.
0 commit comments