Copy-and-patch is a novel compilation technique introduced in the above paper. Broadly, it works by stitching together code from a large library of binary implementation variants.
The authors provide two example use cases, a compiler for a C-like language and a WebAssembly compiler, and report promising results for both startup time and execution performance.
> Our compiler achieves both lower startup delay and better execution performance than prior baseline compilers. Figure 2 shows the performance of six WebAssembly compilers on the PolyBenchC benchmark, normalized to our performance. Our compiler has 6.5× lower startup delay than Liftoff, while generating on average 63% better-performing code.
One use case proposed in the paper is an SQL query engine; the authors believe they have built the first baseline compiler for one. Using the above C-like language, they built a simple SQL query engine and report impressive performance gains over previous optimizing compilers or interpreters:
(Halfway through Page 3):
> The compilation time of our compiler is so low that it is less than the time it takes to construct the AST of the program. Compared with interpreters, both have negligible startup delay (since constructing ASTs takes longer), but our execution performance is an order of magnitude faster. Compared with LLVM -O0, our implementation compiles two orders of magnitude faster and generates code that performs on average 14% better. Therefore, we conclude that copy-and-patch renders both interpreters and LLVM -O0 compilation obsolete in this use case.
At a broad level, copy-and-patch compilation works from a pre-built library of composable binary code snippets, referred to as binary stencils. Code generation and optimization both reduce to a lookup in a data table to select a stencil, followed by copying it to the output and patching in the missing values.
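The lookup, copy, and patch steps can be illustrated with a small sketch. This is not the paper's implementation: the `Stencil`, `Op`, `make_demo_library`, and `copy_and_patch` names are hypothetical, the three stencils are hand-written x86-64 byte sequences chosen for illustration, and the output buffer is only inspected, never executed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical stencil: pre-built bytes plus the offsets of the holes
// that must be filled in at code-generation time.
struct Stencil {
    std::vector<std::uint8_t> code;         // binary template, holes zeroed
    std::vector<std::size_t> hole_offsets;  // byte offset of each 32-bit hole
};

// Hypothetical operation in the input program.
struct Op {
    std::string name;                    // key into the stencil library
    std::vector<std::int32_t> operands;  // one value per hole
};

using StencilLibrary = std::map<std::string, Stencil>;

// A tiny hand-written library (assumption: illustrative x86-64 encodings).
StencilLibrary make_demo_library() {
    StencilLibrary lib;
    lib["load_const"] = Stencil{{0xB8, 0, 0, 0, 0}, {1}};  // mov eax, imm32
    lib["add_const"] = Stencil{{0x05, 0, 0, 0, 0}, {1}};   // add eax, imm32
    lib["ret"] = Stencil{{0xC3}, {}};                      // ret
    return lib;
}

std::vector<std::uint8_t> copy_and_patch(const StencilLibrary& lib,
                                         const std::vector<Op>& program) {
    std::vector<std::uint8_t> out;
    for (const Op& op : program) {
        const Stencil& s = lib.at(op.name);  // 1. look up the stencil
        const std::size_t base = out.size();
        out.insert(out.end(), s.code.begin(), s.code.end());  // 2. copy it
        for (std::size_t i = 0; i < s.hole_offsets.size(); ++i) {
            // 3. patch the missing value into the hole (host byte order)
            const std::int32_t value = op.operands.at(i);
            std::memcpy(&out[base + s.hole_offsets[i]], &value, sizeof value);
        }
    }
    return out;
}
```

Calling `copy_and_patch` on a program of `load_const 40`, `add_const 2`, `ret` yields an 11-byte buffer in which the two immediates have been written into the holes. No instruction selection or register allocation happens at this point, which is why this stage can be so cheap.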
First, prior to compilation, MetaVar generates a stencil library.
That stencil library is used as input for a
## MetaVar
MetaVar generates binary stencils. It lets the user systematically generate the binary stencil variants in clean, pure C++, and leverages the Clang + LLVM compiler infrastructure to hide all platform-specific low-level detail.
The MetaVar compiler generates binary stencils of different optimization levels for every bytecode or AST node.
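To give a feel for what "systematically generating variants in C++" might look like, here is a minimal sketch that uses templates to enumerate specializations of a single "add" node. It is an assumption-laden illustration, not MetaVar's actual API: `Source`, `add_node`, `Variant`, `enumerate_add_variants`, and `select_variant` are all hypothetical names, and ordinary function pointers stand in for the binary stencils that would be extracted from compiled output.

```cpp
#include <cstdint>
#include <vector>

// Where an operand comes from. Each combination gives a different variant.
enum class Source { Literal, Stack };

// One templated definition of an "add" node (hypothetical). The compiler
// specializes the body for each <Lhs, Rhs> combination, so a variant with
// literal operands never touches the stack.
template <Source Lhs, Source Rhs>
std::int64_t add_node(const std::int64_t* stack,
                      std::int64_t lit_lhs, std::int64_t lit_rhs) {
    const std::int64_t a = (Lhs == Source::Literal) ? lit_lhs : stack[0];
    const std::int64_t b = (Rhs == Source::Literal) ? lit_rhs : stack[1];
    return a + b;
}

using AddFn = std::int64_t (*)(const std::int64_t*, std::int64_t, std::int64_t);

// One library entry: the configuration it covers and its implementation.
struct Variant {
    Source lhs;
    Source rhs;
    AddFn fn;
};

// Enumerate every variant of the node ahead of time.
std::vector<Variant> enumerate_add_variants() {
    return {
        Variant{Source::Literal, Source::Literal,
                &add_node<Source::Literal, Source::Literal>},
        Variant{Source::Literal, Source::Stack,
                &add_node<Source::Literal, Source::Stack>},
        Variant{Source::Stack, Source::Literal,
                &add_node<Source::Stack, Source::Literal>},
        Variant{Source::Stack, Source::Stack,
                &add_node<Source::Stack, Source::Stack>},
    };
}

// Pick the variant matching the operand sources; nullptr if none matches.
AddFn select_variant(const std::vector<Variant>& lib, Source lhs, Source rhs) {
    for (const Variant& v : lib) {
        if (v.lhs == lhs && v.rhs == rhs) return v.fn;
    }
    return nullptr;
}
```

With a stack holding `10` and `20`, selecting the `Literal`/`Stack` variant and calling it with literals `1` and `2` adds the left literal to the second stack slot. The design point is that the variants are written once as a template and the choice between them is made by a plain table lookup at code-generation time.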

| Term | Definition |
| --- | --- |
| Stencil | A binary implementation that has holes where missing values must be inserted during codegen. |
| Full Compiler | A compiler that compiles from a high-level language to machine code. |
| Bytecode Assembler | An assembler that converts low level bytecode to machine code. |
| Baseline compiler | In tiered compilation, a baseline compiler is the first compiler. It's meant to be the fastest, with the lowest priority on generating performant code. |
| Pareto Frontier | In multi-objective optimization, the Pareto frontier is the set of solutions for which no objective can be improved without worsening another. Solutions off the frontier are suboptimal and rejected. |
| MetaVar | A system developed by the researchers for generating binary stencils using C++ through an LLVM backend. |