\section{OSR in LLVM}

\label{se:osr-llvm}

In this section we discuss one possible embodiment of the OSR approach of Section \ref{se:overview} in LLVM. Our discussion is based on a simple running example that illustrates a profile-driven optimization scenario. We start from a simple base function (\textsf{isord}) that checks whether an array of numbers is ordered according to some criterion specified by a comparator (see Figure \ref{fi:isord-example}). Our goal is to instrument \textsf{isord} so that, whenever the number of loop iterations exceeds a certain threshold, control is dynamically diverted to a faster version generated on the fly by inlining the comparator.
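For concreteness, the base function can be pictured as the following C sketch. It is only illustrative: the actual code is the one shown in Figure \ref{fi:isord-example}, and the comparator signature assumed here (two elements, non-zero result when they are in the desired order) is an assumption made for the sake of the example.

\begin{verbatim}
/* Illustrative C sketch of the base function (the actual code is
   shown in the figure). The comparator c returns non-zero when its
   two arguments are in the desired order. */
typedef int (*cmp_t)(long, long);

int isord(long *v, long n, cmp_t c) {
    for (long i = 0; i + 1 < n; ++i)      /* n-1 adjacent comparisons */
        if (!c(v[i], v[i + 1]))
            return 0;                     /* out of order */
    return 1;                             /* ordered */
}
\end{verbatim}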

The IR code shown in this section\footnote{Virtual register names and labels in the LLVM-produced IR code shown in this paper have been refactored to make the code more readable.} has been generated with clang and instrumented with OSRKit, a library we prototyped to help VM builders deploy OSR in LLVM. OSRKit provides a number of useful abstractions that include open and resolved OSR instrumentation of IR base functions without breaking the SSA (Static Single Assignment) form, liveness analysis, generation of OSR continuation functions, and mapping of LLVM values between different versions of a program along with compensation code generation\footnote{An accompanying artifact will allow the interested reader to get acquainted with OSRKit and repeat the sample scenario described in this section.}.

\paragraph{OSR Instrumentation in IR.}

To defer the compilation of the continuation function until the comparator is known at run time, we used OSRKit to instrument \textsf{isord} with an open OSR point at the beginning of the loop body, as shown in Figure \ref{fig:isordfrom}. Portions added to the original code by the OSR instrumentation are highlighted in grey. New instructions are placed at the beginning of the loop body to increment a hotness counter \textsf{p.osr} and to jump to an OSR-firing block if the counter reaches the threshold (1000 iterations in this example). The OSR block contains a tail call to the target generation stub, which receives as parameters the four live variables at the OSR point (\textsf{v}, \textsf{n}, \textsf{c}, \textsf{i}). OSRKit also allows the stub to receive the run-time value \textsf{val} of an IR object that can be used to produce the continuation function: in our example, the pointer to the comparator function to be inlined.

The stub (see Figure \ref{fig:isordstub}) calls a code generator that: 1) builds an optimized version of \textsf{isord} by inlining the comparator, and 2) uses it to create the continuation function \(\textsf{isord}_{\textsf{to}}\) shown in Figure \ref{fig:isordascto}. The stub passes four parameters to the code generator: 1) a pointer to the \textsf{isord} IR code, 2) a pointer to the basic block in \textsf{isord} from which the OSR is fired, 3) a user-defined object to support code generation in MCJIT, and 4) the stub's \textsf{val} parameter (the first three are hard-wired by OSRKit). The stub terminates with a tail call to \(\textsf{isord}_{\textsf{to}}\). To generate the continuation function from the optimized version created by the inliner, OSRKit replaces the function entry point, removes dead code, replaces live variables with the function parameters, and fixes \(\phi\)-nodes accordingly. In Figure \ref{fig:isordascto}, additions resulting from the IR instrumentation are shown in grey, while removals are struck-through.
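At the source level, the combined effect of the instrumentation and of the stub can be pictured roughly as follows. This is only an illustrative C sketch of what the instrumented IR computes; the names \texttt{isord\_stub} and \texttt{gen\_continuation}, as well as their signatures, are hypothetical and do not correspond to OSRKit's actual interface.

\begin{verbatim}
/* Illustrative C sketch of what the instrumented IR computes.
   isord_stub and gen_continuation are hypothetical names, not
   OSRKit symbols. */
typedef int (*cmp_t)(long, long);
typedef int (*cont_t)(long *, long, long);

/* Builds the continuation function at run time (hypothetical). */
extern cont_t gen_continuation(void *isord_ir, void *osr_block,
                               void *jit_ctx, cmp_t val);

static int isord_stub(long *v, long n, cmp_t c, long i) {
    /* val is the run-time value used to specialize the code: here,
       the comparator to be inlined. The first three arguments are
       hard-wired by the instrumentation. */
    cont_t isord_to = gen_continuation(0, 0, 0, c);
    return isord_to(v, n, i);             /* tail call into isord_to */
}

int isord_from(long *v, long n, cmp_t c) {
    long p_osr = 0;                       /* hotness counter */
    for (long i = 0; i + 1 < n; ++i) {
        if (++p_osr == 1000)              /* threshold reached: fire OSR */
            return isord_stub(v, n, c, i);   /* live vars: v, n, c, i */
        if (!c(v[i], v[i + 1]))
            return 0;
    }
    return 1;
}
\end{verbatim}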

\paragraph{x86-64 Lowering.}

\label{se:ir-x86-lowering}

Figure \ref{fig:isordx86-64} shows the x86-64 code generated by the LLVM back-end for \(\textsf{isord}_{\textsf{from}}\) and \(\textsf{isord}_{\textsf{to}}\). For the sake of comparison with the native code that would be generated for the original non-OSR versions, additions resulting from the IR instrumentation are shown in grey, while removals are struck-through. Notice that the OSR intrusiveness in \(\textsf{isord}_{\textsf{from}}\) is minimal, consisting of just two assembly instructions with register and immediate operands. As a result of induction variable canonicalization in the LLVM back-end, the loop index \textsf{i} and the hotness counter \textsf{p.osr} are fused into register \textsf{\%r12}. We also note that tail call optimization is applied in the OSR-firing block, resulting in no stack growth during an OSR. The continuation function \(\textsf{isord}_{\textsf{to}}\) is identical to the specialized version of \textsf{isord} with the inlined comparator, except that the loop index is passed as a parameter in \textsf{\%rdx} and no preamble is needed, since the OSR transition jumps directly into the loop body.
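In source-level terms, the continuation function has roughly the following shape (an illustrative C sketch assuming a less-than-or-equal comparator; the actual IR and x86-64 code are shown in Figures \ref{fig:isordascto} and \ref{fig:isordx86-64}):

\begin{verbatim}
/* Illustrative C-level shape of the continuation function: the
   comparator is inlined and the live state, including the loop
   index, enters as parameters, so no preamble is needed and
   execution resumes directly inside the loop. */
int isord_to(long *v, long n, long i) {
    for (; i + 1 < n; ++i)
        if (!(v[i] <= v[i + 1]))   /* inlined comparator (assumed <=) */
            return 0;
    return 1;
}
\end{verbatim}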

\paragraph{Comparison with McOSR.}

McOSR \cite{lameed2013modular} is a library for inserting open OSR points in the legacy LLVM JIT, encoding the OSR machinery entirely in IR as OSRKit does. When an OSR is fired, live variables are stored into a pool of globals allocated by the library. McOSR then invokes a user-defined method to transform \textsf{f} into \(\textsf{f}'\) and calls \textsf{f} again with dummy parameters. The new entry point of \textsf{f} checks a global flag to determine whether it is being invoked as part of an OSR transition or as a regular call: in the first case, the state is restored from the pool of global variables before jumping to the OSR landing pad (this pattern is sketched below). As the new entry point can disrupt LLVM optimizations and lead to poorer performance on subsequent invocations of \textsf{f}, McOSR promptly recompiles \textsf{f} after an OSR. However, lessons from the Jikes RVM \cite{fink2003design} suggest that generating a dedicated function to resume the execution (as OSRKit does with \(\textsf{f}\textsf{'}_{\textsf{to}}\)) is likely to yield better performance. OSRKit improves upon McOSR in a number of respects, including: 1) support for multiple versions of a function active in memory at the same time; 2) support for resolved OSR points and compensation code; 3) insertion of OSR points at arbitrary locations (e.g., at function calls), and not only at loop headers; 4) compatibility with the design of the new MCJIT, in which, e.g., a JIT-compiled function or module cannot be updated; 5) a simpler design that does not require a pool of globals to transfer live variables.
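For reference, the entry-point pattern that McOSR introduces in \textsf{f} can be pictured as follows. This is an illustrative C sketch of the mechanism as described above; the names are not McOSR's actual symbols.

\begin{verbatim}
/* Illustrative sketch of the entry-point pattern McOSR adds to f
   (names are not McOSR's actual symbols): live variables are parked
   in a pool of globals and a flag tells an OSR re-entry apart from
   a regular call. */
static long *g_live_v;                   /* pool of globals for the */
static long  g_live_n, g_live_i;         /* live variables          */
static int   g_osr_pending;              /* set when an OSR fires   */

int f(long *v, long n) {
    long i;
    if (g_osr_pending) {                 /* re-entered with dummy args */
        v = g_live_v; n = g_live_n; i = g_live_i;  /* restore state   */
        g_osr_pending = 0;
        goto landing_pad;                /* skip prologue, resume here */
    }
    i = 0;                               /* regular call: normal start */
landing_pad:
    for (; i + 1 < n; ++i)
        if (v[i] > v[i + 1]) return 0;
    return 1;
}
\end{verbatim}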