Skip to content
This repository has been archived by the owner on Apr 4, 2022. It is now read-only.

Dealing with the Effects of Global Optimizations

Paul Bowen-Huggett edited this page Oct 27, 2020 · 1 revision

“You have 3 independent functions: a(), b(), c(). The outliner notices that they all share a few common instructions and synthesises z() as a new function called by a(), b() and c(). The user modifies those common instructions in both b() and c() but not in a(). The outliner notices that b() and c() share a few common instructions and synthesises z() as a new function which is called by b() and c().”

Let’s translate that into code. For the purposes of keeping everything simple, assume that c1, c2, and s1…3 are defined elsewhere. They may be functions, global variables, or something more complex. It doesn’t matter.

Version 1 Version 2
int a(void) { return c1 + c2 + s1; }
int b(void) { return c1 + c2 + s2; }
int c(void) { return c1 + c2 + s3; }
int a(void) { return c1 + c2 + s1; }
int b(void) { return c1 - c2 + s2; }
int c(void) { return c1 - c2 + s3; }

Version 1

Starting with the version 1 code above, let’s examine how this will be represented after the optimization described above. The resulting optimized code, if written in (not quite) C might look something like:

static int z.1(void) { return c1 + c2; }
int a(void) { return z.1() + s1; }
int b(void) { return z.1() + s2; }
int c(void) { return z.1() + s3; }

(Notice that the newly created z.1() includes a disambiguating “.1” suffix. The reasons for this should become clear when we consider the results of the incremental compile in version 2. For further information, see this page.)

This is an approximation of how we’ll store this code in the repository.

Definition Digest Fragment Linked Definitions
a H(a1) return z.1() + s1; { H(z1) }
b H(b1) return z.1() + s2; { H(z1) }
c H(c1) return z.1() + s3; { H(z1) }
z.1 H(z1) return c1 + c2;

This table aspires to show an overview of the data structures in the repository focussing just on the data that’s relevant to this discussion. The columns are:

Definition
The named definition of an individual function. (Glossary entry.)
Digest
A symbolic description of a function’s hash digest. The digest of each function is given as H(nx). Read that as “the digest of version x of the function named n”. This appears as a key in the fragment index. The construction of the digest is discussed in detail on this page.
Fragment
A value in the fragment index containing the binary code and data for a function/variable shown here as (not quite) C code. (Glossary entry.)
Linked Definitions
A set of associated “linked” definitions. These were not present in the user’s source code but were generated by the optimizer and must appear whenever the owning definition is copied to the output by a linker or tool such as repo2obj. (Glossary entry.)

Version 2

Modify the source code as per version 2 in the above table and re-compile.

It’s important to remember that this is an incremental compiler, so we’re looking to compile things that have changed and avoid re-compiling things that haven’t. We notice that the source code for a() is unchanged from version 1: this means that its hash digest also does not change (H(a1)). The code for the remaining functions (b() and c()) is modified, so new hashes are generated as H(b2) and H(c2) respectively.

The pruning pass notices that H(a1) is already present in the fragment index, so the code for a() can be deleted.

Let’s once again see the code not-quite-C that the optimizer might produce:

static int z.2(void) { return c1 - c2; }
int b(void) { return z.2() + s2; }
int c(void) { return z.2() + s3; }

This code passes through each of the compiler’s code-generation passes before reaching the “repo object writer” stage to emit the final binary. This phase writes the newly generated fragments and builds a compilation record re-assembling the definitions present in the original compilation along with any linked definitions. Here, a() and z.1() are added.

Now to view the representation of this code in the repository:

Definition Digest Fragment Linked Definitions
a H(a1) return z.1() + s1; { H(z1) }
b H(b2) return z.2() + s2; { H(z2) }
c H(c2) return z.2() + s3; { H(z2) }
z.1 H(z1) return c1 + c2;
z.2 H(z2) return c1 - c2;