Commit 3eff591

Merge pull request #5 from WebAssembly/inlining-hints
Add inlining hints section draft
2 parents 5a2fa45 + 5ad4363 commit 3eff591

1 file changed

+41
-9
lines changed

proposals/compilation-hints/Overview.md

Lines changed: 41 additions & 9 deletions
@@ -19,46 +19,78 @@ One interesting aspect to other environments where profiling feedback is to be i
1919
Based on the [branch hinting proposal](https://github.com/WebAssembly/branch-hinting), we extend this mechanism by adding additional custom sections for extra functionality. This can be integrated with the [annotations proposal](https://github.com/WebAssembly/annotations) so that these sections can be generated from the text format and preserved across round-trips to the text format.
2020

2121
Each family of hints is bundled in a respective custom section following the example of branch hints. These sections all have the naming convention `metadata.code.*` and follow the structure
22-
2322
* *function index* |U32|
2423
* a vector of hints with entries
2524
* *byte offset* |U32| of the hinted instruction from the beginning of the function body (0 for function level hints),
26-
* *hint length* |U32| indicating the number of values each hint requires,
25+
* *hint length* |U32| indicating the number of bytes each hint requires,
2726
* *values* |U32| with the actual hint information
2827

28+
If not specified otherwise, all numeric values are encoded using the [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer encoding either in its signed or unsigned variant.
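
As an illustration of the unsigned variant of the encoding, here is a minimal Python sketch. The helper names `encode_uleb128` and `decode_uleb128` are hypothetical and not part of the proposal; the signed variant is not shown.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7
```

Small values such as a byte offset of 0 occupy a single byte, which keeps sections compact when most hints are function-level.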
29+
30+
*Note: If custom annotations support a `metadata.function.*` namespace, the byte offset could be dropped for function-level annotations.*
31+
2932
The following contains a list of hints to be included in the first version of the proposal. Future extensions can be added as necessity arises. This also includes annotations outside of function or instruction level like annotations for memories, etc.
3033

3134

3235
### Compilation order
3336

3437
The section `metadata.code.compilation_order` contains the order in which functions should be compiled in order to minimize wait times until the compilation is completed. This is especially relevant during instantiation and startup but might also be relevant later.
35-
3638
* *byte offset* |U32| with value 0 (function level only)
37-
* *hint length* |U32| with value 2
39+
* *hint length* |U32| in bytes
3840
* *compilation order* |U32| starting at 0 (functions with the same order value will be compiled in an arbitrary order but before functions with a higher order value)
3941
* *hotness* |U32| defining how often this function is called
4042

41-
If a length larger than 2 is present, only the first two values of the following hint data are evaluated while the rest is ignored. This leaves space for future extensions, e.g. grouping functions. Similarly, the *hotness* can be dropped if a length of 1 is given.
43+
If the length is larger than required to store two values, only the first two values of the hint data are evaluated and the rest is ignored. This leaves space for future extensions, e.g. grouping functions. Similarly, the *hotness* can be dropped if the given length corresponds to only one value.
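
The length handling described above could be sketched in Python as follows. This is only an illustration under assumptions: the function name `parse_compilation_order_hint`, the returned dictionary layout, and the use of unsigned LEB128 for every field are choices made for this example, not something the proposal prescribes.

```python
def decode_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def parse_compilation_order_hint(data: bytes, pos: int = 0) -> tuple[dict, int]:
    """Parse one hint entry; return (hint, position after the entry)."""
    byte_offset, pos = decode_uleb128(data, pos)
    hint_length, pos = decode_uleb128(data, pos)   # length in bytes
    end = pos + hint_length
    if end > len(data):
        raise ValueError("hint length exceeds available data")
    payload = data[pos:end]

    order, used = decode_uleb128(payload, 0)
    hotness = None                                  # hotness may be dropped
    if used < len(payload):
        hotness, used = decode_uleb128(payload, used)
    # Any bytes after the second value are ignored (room for future extensions).
    hint = {"byte_offset": byte_offset, "order": order, "hotness": hotness}
    return hint, end
```

Because the parser always resumes at `end`, an older engine can skip trailing bytes it does not understand without losing its place in the section.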
4244

4345
The *hotness* attribute has no pre-defined meaning. The larger the value, the more often a function is expected to run. So an engine can simply order the functions by hotness and tier up the ones with the largest *hotness* until the compilation budget is exceeded. The compilation budget might depend on the engine, compiler, available resources, how long the program has been running, etc. The special value of 0 is reserved for functions that only run once (e.g. initialization functions). An engine can decide to interpret those functions only or to free up code space by removing the compiled code after execution. Applications can run such a function multiple times, but they should not because this might come with severe performance penalties, e.g. for repeated recompilation, not ever getting tiered up, etc.
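
One way an engine might act on *hotness* is sketched below in Python. The function `select_for_tier_up`, the per-function cost table, and the single numeric budget are all hypothetical simplifications for this example; a real engine would derive its budget from the factors listed above.

```python
def select_for_tier_up(
    hotness_by_function: dict[int, int],
    cost_by_function: dict[int, int],
    budget: int,
) -> list[int]:
    """Pick functions to tier up, hottest first, until the budget is exceeded."""
    # Hotness 0 marks run-once functions, so they are never tier-up candidates.
    candidates = sorted(
        (func for func, hotness in hotness_by_function.items() if hotness > 0),
        key=lambda func: (-hotness_by_function[func], func),
    )
    selected = []
    spent = 0
    for func in candidates:
        cost = cost_by_function[func]  # assumed compile cost, arbitrary units
        if spent + cost > budget:
            break                      # budget would be exceeded: stop here
        selected.append(func)
        spent += cost
    return selected
```

Ties are broken by function index here only to make the result deterministic; the proposal does not define an order among functions with equal hotness.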
4446

4547
It is expected and even desired that not all functions are annotated to keep this section small. It is up to the engine if and when the unannotated functions are compiled. It's recommended that these functions get compiled last or lazily on demand.
4648

49+
*Note: This should be moved to `metadata.function.compilation_order` without the byte offset if such a namespace will be supported by custom annotations.*
50+
51+
52+
### Inlining
53+
54+
An engine might decide to inline certain call targets based on its own feedback collection or other hints (e.g. *call targets* section), but explicit hints can be added per call target and per function using the following annotations.
55+
56+
The `metadata.code.inline` section contains instruction level annotations for all affected call sites.
57+
* *byte offset* |U32| from the beginning of the function to the wire byte index of the call instruction (this must be a `call`, `call_ref` or a `call_indirect`, otherwise the hint will be ignored)
58+
* *hint length* |U32| in bytes (always 1 for now, might be higher for future extensions)
59+
* *log call frequency* |U8| determining the estimated number of times the callee gets called per call of the caller.
60+
61+
The call frequency can be thought of as the estimated number of times a callee gets called during one call of the caller. It is a logarithmic value based on the formula $f = \max(1, \min(126, 10 \log_{10} \frac{n}{N} + 32))$ where $n$ is the number of callee calls from this call site and $N$ is the number of caller calls.
62+
63+
The actual decision about which function should be inlined can be based on runtime data that the engine collected, additional heuristics and available resources. There is no guarantee that a function is or is not inlined, but it should roughly be expected that functions of higher call frequency are preferred over ones with lower frequency.
64+
Special values of 0 and +127 indicate that a function should never or always be inlined respectively. Engines should respect such annotations over their own heuristics and toolchains should therefore avoid generating such annotations unless there is a good reason for it (e.g. "no inline" annotations in the source).
65+
66+
|log call frequency|calls per parent call|
67+
|-----------------:|:-------------------:|
68+
| 0| *never inline*|
69+
| 1| <0.0008|
70+
| 22| 0.1 |
71+
| 32| 1 |
72+
| 42| 10 |
73+
| 52| 100 |
74+
| 62| 1,000 |
75+
| 126| >2,511,886,432 |
76+
| 127| *always inline*|
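
To make the mapping in the table concrete, the formula can be evaluated with a short Python sketch. The function names are hypothetical, and two details are assumptions of this example rather than part of the proposal: the result is rounded to the nearest integer, and a call site with no recorded calls maps to the lowest regular value 1.

```python
import math

NEVER_INLINE = 0     # special value from the table
ALWAYS_INLINE = 127  # special value from the table


def log_call_frequency(callee_calls: int, caller_calls: int) -> int:
    """Compute f = max(1, min(126, 10 * log10(n / N) + 32))."""
    if callee_calls <= 0 or caller_calls <= 0:
        return 1  # assumption: no data maps to the lowest regular value
    raw = 10 * math.log10(callee_calls / caller_calls) + 32
    return max(1, min(126, round(raw)))  # assumption: round to nearest


def calls_per_caller_call(frequency: int) -> float:
    """Invert the formula for the regular range 1..126."""
    if not 1 <= frequency <= 126:
        raise ValueError("0 and 127 are special values, not frequencies")
    return 10 ** ((frequency - 32) / 10)
```

For instance, a callee invoked 10 times per caller call yields 42 and one invoked once per 10 caller calls yields 22, matching the table rows.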
77+
78+
If the *byte offset* is 0, the hint applies to all call sites where the function is the **target**. It serves as a shorthand notation unless explicitly overridden. In this case, the call frequency should be a rough estimate of the average call frequency of all potential sites. *Note: This should likely be moved to a dedicated section for clearer separation, e.g. `metadata.function.inline` if such a namespace will be supported by custom annotations.*
79+
4780

4881
### Call targets
4982

50-
When dealing with `call_indirect` or `call_ref`, often inefficient code is generated, because inlining is not possible. With code that e.g. uses virtual function calls, there are often very few commonly called targets which a compiler could optimize for. It still needs to have the ability to handle other call targets, but that can then happen at a much lower performance in favor of optimizing for the more commonly called target.
83+
When dealing with `call_indirect` or `call_ref`, often inefficient code is generated, because the call target is unknown. With code that e.g. uses virtual function calls, there are often very few commonly called targets which a compiler could optimize for. It still needs to have the ability to handle other call targets, but that can then happen at a much lower performance in favor of optimizing for the more commonly called target.
5184

5285
This is especially interesting if functions need to be compiled to the top tier early on, either because they're annotated with a low compilation order or because eager or even AOT compilation is desired.
5386

5487
The `metadata.code.call_targets` section contains instruction level annotations for all relevant call targets identified by their function indexes.
55-
5688
* *byte offset* |U32| from the beginning of the function to the wire byte index of the call instruction (this must be a `call_ref` or a `call_indirect`, otherwise the hint will be ignored)
57-
* *hint length* |U32| (always even, 2 entries for each call target)
89+
* *hint length* |U32| in bytes
5890
* call target information
5991
* *function index* |U32|
6092
* *call frequency* |U32| in percent
6193

62-
The accumulated call frequency must add up to 100 or less. If it is less than 100, then other call targets that are not listed are responsible for the missing calls.
94+
The accumulated call frequency must add up to 100 or less. If it is less than 100, then other call targets that are not listed are responsible for the missing calls. Together with the inline hints on call frequency, this information can be used to inline function calls as well. The effective call frequency for each call target is then the inlining call frequency multiplied by the fractional call frequency encoded in this section.
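
The combination of both sections might look like the following Python sketch. It assumes that the logarithmic inline hint is first converted back to a linear calls-per-caller-call value by inverting the formula from the inlining section; the function name `effective_call_frequencies` and the dictionary input are hypothetical.

```python
def effective_call_frequencies(
    log_call_frequency: int,
    percent_by_target: dict[int, int],
) -> dict[int, float]:
    """Return estimated calls per caller call for each listed call target."""
    if not 1 <= log_call_frequency <= 126:
        raise ValueError("0 and 127 are special values, not frequencies")
    if sum(percent_by_target.values()) > 100:
        raise ValueError("accumulated call frequency must not exceed 100")

    # Assumption: invert f = 10 * log10(n / N) + 32 to get the linear ratio.
    site_frequency = 10 ** ((log_call_frequency - 32) / 10)
    return {
        target: site_frequency * percent / 100
        for target, percent in percent_by_target.items()
    }
```

With an inline hint of 42 (about 10 calls per caller call) and targets at 80 and 15 percent, the estimates are about 8 and 1.5 calls per caller call, and the remaining 5 percent belongs to unlisted targets.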
6395

6496
Similarly to the compilation order section, not all call sites need to be annotated and not all call targets be listed. However, if other call targets are known but not emitted, then the frequency must be below 100 to inform the engine of the missing information.
