Authorea

Daniele Cono D'Elia edited intro.tex over 8 years ago

Commit id: 456ebf3c1dd68c85f128a6ffd0b1f02a9aa14623

deletions | additions

The LLVM compiler infrastructure provides a Just-In-Time compiler called MCJIT that is currently being used for generating optimized code at run-time in virtual machines for dynamic languages. MCJIT is employed in both industrial and research projects, including Webkit's Javascript engine, the open-source Python implementation Pyston, the Rubinius project for Ruby, Julia for high-performance technical computing, McVM for MATLAB, CXXR for the R language, Terra for Lua, and the Pure functional programming language. The MCJIT compiler shares the same optimization pipeline with static compilers such as clang, and it provides dynamic features such as native code loading and linking, as well as a customizable memory manager. A piece that is currently missing in the environment is a feature to enable on-the-fly transitions between different versions of a running program's function. This feature is commonly known as On-Stack-Replacement (OSR) and is typically used in high-performance virtual machines, such as HotSpot and the Jikes RVM for Java, to interrupt a long-running function and recompile it at a higher optimization level. OSR can be a powerful tool for dynamic languages, for which most effective optimization decisions can typically be made only at run-time, when critical information such as type and shape of objects becomes available. In this scenario, OSR becomes useful also to perform optimization, deoptimization, i.e. when the running code has been speculatively optimized and the assumption used for the optimization does not hold anymore, the optimized function is interrupted and the execution continues in a safe version of the code. Currently VM builders using MCJIT are required to have a deep knowledge of the internals of LLVM in order to mimic a transitioning mechanism. In particular, they can rely on two experimental intrinsics, {\em Stackmap} and {\em Patchpoint}, to inspect the details of the compiled code generated by the back-end and to patch it manually with a sequence of assembly instructions. In particular, a Stackmap records the location of live values at a particular instruction address and during the compilation it is emitted into the object code within a designated section; a Patchpoint instead allows to reserve space at an instruction address for run-time patching and can be used to implement an inline caching mechanism~\cite{deutsch1984inlinecaching}. In a 2013 paper Lameed and Hendren propose McOSR~\cite{lameed2013modular}, a technique for OSR that essentially stores the live values in a global buffer, recompiles the current function and then loads in it the saved state when the execution is resumed. Their approach shows a few limitations that we discuss later on in this paper, and because of some relevant design choices it can work only with the legacy JIT which has been dropped from recent LLVM releases. \paragraph{Contributions.} In this paper we present our novel, platform-independent framework for LLVM to enable switching between different versions of a function at run-time. Specific goals of our approach include: \begin{itemize} \item The ability for a function reached via OSR to fire an OSR itself: this would allow switching from a base function $f_0$ to an optimized function $f_1$, and later on to a further optimized version $f_2$, etc. \item Supporting deoptimization, i.e., transitions from an optimized function to a less optimized function from which it was derived. \item Supporting transitions at arbitrary locations within a function. function locations. \item Supporting OSR targets either generated at run-time (e.g., using profiling compilation) information) or already known at compilation time. \item Hiding from the front-end that generates the different optimized versions of a function all the implementation details of how on-the-fly transitions between them are handled at specific OSR points. \end{itemize} \noindent Design goals of our implementation include: \begin{itemize} \item Supporting Encoding OSR transitions in terms ofinstrumentation of pure IR code only, avoiding manipulations at machine-code level. \item Incurring a minimal level of intrusiveness in terms of both the instrumentation of the code generated by the front-end and the degree of optimization opportunities influenced by the presence impact of OSR points. points on native code quality. \item Relying on LLVM's compilation pipeline to generate the most efficient native code for an instrumented function. %\item Providing support for the redirection of future invocations of a function to the latest compiled version without recompiling the callers or performing linking again. \end{itemize} \noindent Our implementation ships as a library for IR manipulation, and we present a preliminary experimental study of our technique in TinyVM, \TinyVM\, a proof-of-concept virtual machine for run-time IR manipulation and compilation based on MCJIT. We then also present a case study on the integration of our technique in McVM~\cite{chevalier2010mcvm}: we show the potential of our approach by enabling an aggressive specialization mechanism for the {\tt feval} construct - a source of bottlenecks in many MATLAB programs - that could not have been implemented using extant OSR techniques. \paragraph{Structure of the paper.} The remainder of this paper is organized as follows. In \mysection\ref{se:overview} we present our OSR technique and in \mysection\ref{se:osr-llvm} we outline its implementation and Application Program Interface (API) in LLVM. \mysection\ref{case-study} illustrates a case study in McVM. In \mysection\ref{se:experiments}, we present the our experimental studyin TinyVM and discuss implications of inserting OSR points in a LLVM function, while in Section~\ref{se:related} \mysection\ref{se:related} addresses related work. \mysection\ref{se:conclusions} concludes the paper and presents some ideas for future research directions.