\section{Introduction}
\label{se:intro}

The LLVM compiler infrastructure \cite{lattner2004llvm} provides a Just-In-Time (JIT) compiler called MCJIT that is currently used to generate optimized code at run time in virtual machines for dynamic languages. MCJIT is employed in both industrial and research projects, including WebKit's JavaScript engine, the open-source Python implementation Pyston, the Rubinius project for Ruby, Julia for high-performance technical computing, McVM for MATLAB, CXXR for the R language, Terra for Lua, and the Pure functional programming language. MCJIT shares its optimization pipeline with static compilers such as clang, and it provides dynamic features such as native code loading and linking, as well as a customizable memory manager.

A piece currently missing in MCJIT is a feature to enable on-the-fly transitions between different versions of a running program's function. This feature is commonly known as On-Stack Replacement (OSR) and is typically used in high-performance virtual machines, such as HotSpot and the Jikes RVM for Java, to interrupt a long-running function and recompile it at a higher optimization level. OSR can be a powerful tool for dynamic languages, for which the most effective optimization decisions can typically be made only at run time, when critical information such as the types and shapes of objects becomes available. In this scenario, OSR is also useful for performing deoptimization: when the running code has been speculatively optimized and an assumption behind that optimization no longer holds, the optimized function is interrupted and execution continues in a safe version of the code.
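As a concrete illustration of the deoptimization scenario (a sketch, not code from any specific engine), a speculative optimization typically protects its assumption with a guard in the IR; when the guard fails, control leaves the optimized code through an exit that resumes execution in a safe version. All function names below (\texttt{expected\_callee}, \texttt{generic\_version}) are hypothetical:

```llvm
; Hypothetical guard protecting a speculative assumption: the callee of a
; dynamic call site is assumed to be a specific function.
define i32 @specialized(i8* %callee, i32 %arg) {
entry:
  %ok = icmp eq i8* %callee, bitcast (i32 (i32)* @expected_callee to i8*)
  br i1 %ok, label %fast_path, label %osr_exit

fast_path:            ; speculation holds: run the specialized code
  %r = call i32 @expected_callee(i32 %arg)
  ret i32 %r

osr_exit:             ; speculation failed: fall back to a safe version
  %r2 = call i32 @generic_version(i8* %callee, i32 %arg)
  ret i32 %r2
}

declare i32 @expected_callee(i32)
declare i32 @generic_version(i8*, i32)
```

The interesting part of OSR is what happens along the \texttt{osr\_exit} path: the live state of the optimized frame must be transferred to the safe version so that execution can resume seamlessly.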

Currently, VM builders using MCJIT are required to have a deep knowledge of the internals of LLVM in order to mimic a transitioning mechanism. In particular, they can rely on two experimental intrinsics, Stackmap and Patchpoint, to inspect the details of the compiled code generated by the back-end and to patch it manually with a sequence of assembly instructions. A Stackmap records the locations of live values at a particular instruction address and is emitted during compilation into a designated section of the object code; a Patchpoint, instead, reserves space at an instruction address for run-time patching and can be used, for example, to implement an inline caching mechanism \cite{webkit14}.
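For reference, the two intrinsics are exposed at the IR level roughly as follows; this is a sketch based on LLVM's stack map documentation, with illustrative IDs and live values:

```llvm
declare void @llvm.experimental.stackmap(i64, i32, ...)
declare void @llvm.experimental.patchpoint.void(i64, i32, i8*, i32, ...)

define void @example(i32 %x, i64 %y) {
entry:
  ; Record the locations of %x and %y at this point under ID 7, reserving
  ; 0 shadow bytes; the back-end emits the resulting location mapping into
  ; a dedicated stack-map section of the object file.
  call void (i64, i32, ...) @llvm.experimental.stackmap(i64 7, i32 0, i32 %x, i64 %y)

  ; Reserve 15 bytes of patchable instruction space under ID 8; here no
  ; call target is provided (i8* null) and no call arguments are passed.
  call void (i64, i32, i8*, i32, ...) @llvm.experimental.patchpoint.void(i64 8, i32 15, i8* null, i32 0)
  ret void
}
```

A runtime can then look up the recorded locations by ID and rewrite the reserved bytes in place, which is the low-level mechanism that OSR implementations built directly on these intrinsics must orchestrate by hand.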

Lameed and Hendren propose McOSR \cite{lameed2013modular}, an OSR technique that stores the live values in a global buffer, recompiles the current function, and reloads the saved state when execution resumes. McOSR was designed for the legacy JIT, which is no longer included in LLVM as of release 3.6, and it has some limitations that we discuss in Section \ref{se:osr-llvm}.

\paragraph{Contributions.}
In this paper we propose a general-purpose, target-independent implementation of on-stack replacement. Specific goals of our approach include:

\begin{itemize}
\item The ability for a function reached via OSR to fire an OSR itself: this allows switching from a base function \(f\) to an optimized function \(f'\), and later on to a further optimized version \(f''\), and so on.
\item Supporting deoptimization, i.e., transitions from an optimized function to a less optimized function from which it was derived.
\item Supporting transitions at arbitrary function locations.
\item Supporting OSR targets either generated at run time (e.g., using profiling information) or already known at compilation time.
\item Hiding from the front-end, which generates the different optimized versions of a function, all the implementation details of how on-the-fly transitions between them are handled at specific OSR points.
\end{itemize}

We implemented the proposed approach in OSRKit\footnote{OSRKit is available at \url{https://github.com/dcdelia/tinyvm}.}, a prototype library for LLVM IR manipulation based on MCJIT, with the following design goals:

\begin{itemize}
\item Encoding OSR transitions in terms of pure IR code only, avoiding manipulations at the machine-code level.
\item Incurring a minimal level of intrusiveness, in terms of both the instrumentation of the code generated by the front-end and the impact of OSR points on native code quality.
\item Relying on LLVM's compilation pipeline to generate the most efficient native code for an instrumented function.
\end{itemize}

While the general ideas we propose have been prototyped in LLVM, we believe that they could be applied to other toolchains as well. To investigate the potential of our approach, we show how to optimize the \texttt{feval} construct -- a major source of inefficiency in MATLAB execution engines \cite{lameed2013feval, radpour2013refactoring}. We present an extension of the MATLAB McVM runtime \cite{chevalier2010mcvm}, based on OSRKit, that enables aggressive specialization mechanisms for \texttt{feval} not supported by extant techniques \cite{lameed2013feval}. An experimental evaluation of our technique reveals that the OSR machinery injected by OSRKit has a small level of intrusiveness, and that the optimizations enabled by our approach can yield significant speedups in practical scenarios.

\paragraph{Structure of the paper.}
The remainder of this paper is organized as follows. In Section \ref{se:overview} we present our OSR technique and in Section \ref{se:osr-llvm} we outline its implementation in LLVM. Section \ref{se:case-study} illustrates our feval case study in McVM. In Section \ref{se:experiments}, we present our experimental study and discuss implications of injecting OSR points in LLVM IR programs. Related work is discussed in Section \ref{se:related} and concluding remarks are given in Section \ref{se:conclusions}.