notes/notes/Copy-and-Patch Compilation.md
2024-07-03 11:43:25 -06:00

https://dl.acm.org/doi/pdf/10.1145/3485513

Copy-and-patch is a compilation technique, introduced in the paper above, that works (at a broad level) by stitching together code from a large library of pre-built binary implementation variants.

The authors provide two example use cases: a compiler for a C-like language and a WebAssembly compiler. They show promising results for both startup time and execution performance.

Performance Results

(Bottom of Page 2):

Our compiler achieves both lower startup delay and better execution performance than prior baseline compilers. Figure 2 shows the performance of six WebAssembly compilers on the PolyBenchC benchmark, normalized to our performance. Our compiler has 6.5× lower startup delay than Liftoff, while generating on average 63% better-performing code.

Their C-like implementation is feature complete, and even includes some C++-like functionality (classes, destructor semantics).

One use case proposed in the paper is SQL query compilation: the authors believe they have built the first baseline compiler for an SQL query engine. Using the C-like DSL above, they built a simple SQL query engine and report impressive performance gains over both interpreters and LLVM -O0:

(Halfway through Page 3):

The compilation time of our compiler is so low that it is less than the time it takes to construct the AST of the program. Compared with interpreters, both have negligible startup delay (since constructing ASTs takes longer), but our execution performance is an order of magnitude faster. Compared with LLVM -O0, our implementation compiles two orders of magnitude faster and generates code that performs on average 14% better. Therefore, we conclude that copy-and-patch renders both interpreters and LLVM -O0 compilation obsolete in this use case.

Implementation Details

At a broad level, copy-and-patch compilation relies on a pre-built library of composable binary code snippets, referred to as binary stencils. Each binary stencil performs the operation of a single AST node or bytecode instruction. This makes both code generation and optimization simple: the generator performs a lookup in a data table to select the stencil, copies it to the output, and patches in the missing values.

First, prior to compilation, MetaVar generates a stencil library.

That stencil library is used as input for a copy-and-patch code generator, alongside a bytecode sequence or AST. In the copy step, the stencils that implement each bytecode instruction or AST node are copied from the stencil library into the output. In the patch step, pre-determined places in the copied binary code (operands of machine instructions, jump addresses, the values of constants) are overwritten with the missing values.
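The copy and patch steps described above can be illustrated with a small, portable sketch. It is not the paper's implementation: the `Stencil` and `Hole` types, the 32-bit hole width, and the `copyAndPatch` function are assumptions made for illustration, and the stencil bytes are placeholders rather than real machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of one hole: where it is and which value fills it.
struct Hole {
    std::size_t offset;   // byte offset of the hole inside the stencil
    std::size_t operand;  // index of the runtime value to write there
};

// Hypothetical stencil: pre-built bytes plus the list of holes to patch.
// Assumption: every hole is a 32-bit slot.
struct Stencil {
    std::vector<std::uint8_t> code;  // placeholder bytes, not real machine code
    std::vector<Hole> holes;
};

// Copy step: append the stencil bytes to the output buffer.
// Patch step: overwrite each hole with the matching runtime value.
inline void copyAndPatch(const Stencil& s,
                         const std::vector<std::uint32_t>& operands,
                         std::vector<std::uint8_t>& out) {
    const std::size_t base = out.size();
    out.insert(out.end(), s.code.begin(), s.code.end());
    for (const Hole& h : s.holes) {
        assert(h.operand < operands.size());
        assert(h.offset + sizeof(std::uint32_t) <= s.code.size());
        const std::uint32_t value = operands[h.operand];
        std::memcpy(out.data() + base + h.offset, &value, sizeof value);
    }
}
```

Calling `copyAndPatch` once per AST node or bytecode instruction, in order, appends each patched stencil to the same output buffer. Note that the patching code only needs offsets and values; it never has to decode the instructions around the hole.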

Even though machine code is patched, a copy-and-patch compiler doesn't need knowledge of platform-specific machine instructions, and is portable across the architectures for which a stencil library can be built.

MetaVar

Overview

MetaVar is the system that generates binary stencils. It lets the user systematically generate the binary stencil variants in clean and pure C++, and leverages the Clang + LLVM compiler infrastructure to hide all platform-specific low-level detail.

MetaVar can generate binary stencils at different optimization levels for every bytecode instruction or AST node, and the generator selects among them at generation time. As an example, if the instruction adds a constant to a literal, the generator can pick the most optimized variant among the available addition implementations. It can also make register allocation decisions by keeping track of register availability and picking between stack instructions and register instructions.

Stencil Library Construction

MetaVar constructs the stencil library from programmer-defined stencil generators. One stencil generator is defined for every AST node type, using C++ template meta-variables to express variants and special macros to express missing values to be patched at runtime.

At compile time, the compiler iterates over the values of the meta-variables and creates a library entry for every valid combination.
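The enumeration of meta-variable combinations can be sketched in plain C++17. This is only an illustration of the idea, not MetaVar's actual interface: the `Type` and `Operand` meta-variables, the `isValidAdd` constraint, and the `Entry` record are hypothetical, and where MetaVar would compile each instantiation into a binary stencil, this sketch only records which combinations exist.

```cpp
#include <cassert>
#include <vector>

// Hypothetical meta-variables for an "add" node.
enum class Type { I32, I64, F64 };
enum class Operand { Constant, Register, Stack };

// One library entry per valid combination (a real entry would hold bytes).
struct Entry {
    Type type;
    Operand lhs;
    Operand rhs;
};

// Assumed validity rule: adding two constants is folded earlier, so skip it.
template <Type T, Operand L, Operand R>
constexpr bool isValidAdd() {
    return !(L == Operand::Constant && R == Operand::Constant);
}

// Each instantiation stands for one stencil variant.
template <Type T, Operand L, Operand R>
void addVariant(std::vector<Entry>& lib) {
    if constexpr (isValidAdd<T, L, R>()) {
        lib.push_back(Entry{T, L, R});
    }
}

// Fold expressions iterate over the values of each meta-variable.
template <Type T, Operand L, Operand... Rs>
void forEachRhs(std::vector<Entry>& lib) {
    (addVariant<T, L, Rs>(lib), ...);
}

template <Type T, Operand... Ls>
void forEachLhs(std::vector<Entry>& lib) {
    (forEachRhs<T, Ls, Operand::Constant, Operand::Register, Operand::Stack>(lib), ...);
}

template <Type... Ts>
void forEachType(std::vector<Entry>& lib) {
    (forEachLhs<Ts, Operand::Constant, Operand::Register, Operand::Stack>(lib), ...);
}

inline std::vector<Entry> buildAddLibrary() {
    std::vector<Entry> lib;
    forEachType<Type::I32, Type::I64, Type::F64>(lib);
    assert(!lib.empty());
    return lib;
}
```

With three types and three operand kinds per side, there are 27 combinations, of which the assumed constraint rejects the three constant-plus-constant cases, leaving 24 entries.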

Compile time

Stencil generators are templated C++ functions whose template instantiations produce stencils. At "runtime" (that is, when the copy-and-patch compiler itself runs), the generator performs tree pattern matching to determine the correct variant, performs a hash table lookup to retrieve the stencil, and copies it to the output.
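A rough sketch of the lookup path described above, under simplifying assumptions: tree pattern matching is reduced to classifying the shape of each operand, and the `Op`, `Shape`, `Node`, and `StencilLibrary` names are hypothetical. The stored bytes are placeholders, not real stencils.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

enum class Op : std::uint8_t { Add, Mul };
enum class Shape : std::uint8_t { Const, Local };

// Simplified AST node: the operator plus the shape of each operand.
struct Node {
    Op op;
    Shape lhs;
    Shape rhs;
};

using Bytes = std::vector<std::uint8_t>;

// Pack the matched pattern into a single hash table key.
constexpr std::uint32_t makeKey(Op op, Shape lhs, Shape rhs) {
    return (static_cast<std::uint32_t>(op) << 16) |
           (static_cast<std::uint32_t>(lhs) << 8) |
           static_cast<std::uint32_t>(rhs);
}

class StencilLibrary {
public:
    // Register the stencil bytes for one variant.
    void add(Op op, Shape lhs, Shape rhs, Bytes code) {
        assert(!code.empty());
        table_[makeKey(op, lhs, rhs)] = std::move(code);
    }

    // Look up the variant matching the node and copy it to the output.
    // Returns false (and leaves the output untouched) if no variant exists.
    bool emit(const Node& n, Bytes& out) const {
        const auto it = table_.find(makeKey(n.op, n.lhs, n.rhs));
        if (it == table_.end()) {
            return false;
        }
        out.insert(out.end(), it->second.begin(), it->second.end());
        return true;
    }

private:
    std::unordered_map<std::uint32_t, Bytes> table_;
};
```

The per-node work is one key computation, one hash lookup, and one copy, which is consistent with the notes' point that code generation needs no instruction-level reasoning at this stage.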

Terminology

| Phrase | Definition |
| --- | --- |
| Stencil | A binary implementation that has holes where missing values must be inserted during codegen. |
| Full Compiler | A compiler that compiles from a high-level language to machine code. |
| Bytecode Assembler | An assembler that converts low-level bytecode to machine code. |
| Baseline Compiler | In tiered compilation, the baseline compiler is the first compiler tier. It is meant to compile the fastest, with the lowest priority on generating performant code. |
| Pareto Frontier | In multi-objective optimization, the set of solutions that represent "ideal" tradeoffs between the objectives: no solution on the frontier can be improved in one objective without getting worse in another, and all other solutions are rejected as suboptimal. |
| MetaVar | A system developed by the researchers for generating binary stencils from C++ through an LLVM backend. |