% !TEX root = article.tex
On-Stack Replacement (OSR) is a technique for suspending the execution of a function and continuing with a different version of the code. OSR is often used in virtual machines to interrupt a long-running function and recompile it at a higher optimization level, or to replace it with a different one when a speculative assumption made during its compilation no longer holds. In this paper we present a framework for OSR that introduces novel ideas and combines features of extant techniques that no previous solution provided simultaneously. New features include OSR with compensation code to adjust the program state during a transition and the ability to fire an OSR from arbitrary locations in the code. Our approach is platform-independent, as the OSR machinery is entirely encoded at a compiler's intermediate representation level; it also allows a function version reached via OSR to fire an OSR itself, either to a more optimized version or to a less optimized one from which it was derived. We implement and evaluate our technique in the LLVM compiler infrastructure, which is increasingly used as a Just-In-Time (JIT) compiler in virtual machines for dynamic languages such as JavaScript, MATLAB, Python, and Ruby. We also present a case study on the integration of our technique in the McVM virtual machine for MATLAB, showing significant speedups from the run-time optimization of a typical construct of the language.

\keywords On-stack replacement, just-in-time compilation, code optimization, deoptimization, LLVM.

\input{intro}
\input{overview}

Once a state mapping object has been constructed, the optimizer calls our OSR library to generate the continuation function for the OSR transition and eventually compiles it. A pointer to the compiled function is stored in the code cache and returned to the stub, which invokes it through an indirect call passing the live state saved at the OSR point.
\else
\noindent
Once a state mapping has been constructed, the optimizer asks \osrkit\ to generate the continuation function for the OSR transition and eventually executes it.
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
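To make the control flow concrete, the following C++ sketch mirrors the steps described above; the type and helper names ({\tt StateMapping}, {\tt generateContinuation}, {\tt jitCompile}, {\tt codeCache}, {\tt handleOSRTransition}) are hypothetical stand-ins for illustration and do not reflect the actual library API.

\begin{verbatim}
#include <map>
#include <string>

// Hypothetical placeholders for the entities discussed in the text.
struct StateMapping;                 // live-value mapping built by the optimizer
struct Function;                     // IR-level function (e.g., an llvm::Function)
typedef void (*ContFn)(void *live);  // native entry point of a continuation

// Assumed helpers (sketch only):
Function *generateContinuation(StateMapping &m, Function *from, Function *to);
void     *jitCompile(Function *f);   // e.g., through the JIT back-end

std::map<std::string, void *> codeCache;

// Invoked when an OSR point fires and no cached continuation exists.
void *handleOSRTransition(StateMapping &m, Function *from, Function *to,
                          const std::string &key) {
  Function *cont = generateContinuation(m, from, to); // build continuation
  void *entry = jitCompile(cont);                     // compile it to native code
  codeCache[key] = entry;                             // cache the entry point
  return entry;                                       // returned to the stub
}

// Stub side (conceptually):
//   ContFn target = (ContFn) handleOSRTransition(m, from, to, "f_cont");
//   target(savedLiveState);  // indirect call; execution resumes in the new version
\end{verbatim}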

\noindent
We generate the IR modules for our experiments with {\tt clang}, starting from the C version of the \shootout\ suite. To cover scenarios where the OSR machinery is inserted in programs with different optimization levels, we consider two versions: 1) {\em unoptimized}, where the only LLVM optimization we perform is {\tt mem2reg} to promote stack references to registers and construct the SSA form; 2) {\em optimized}, where we apply {\tt opt} {\tt -O1} to the unoptimized version.
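For reference, a preparation pipeline along these lines could look as follows; file names are illustrative, and the commands assume the legacy pass-manager syntax accepted by {\tt opt} in the LLVM versions we target.

\begin{verbatim}
# emit plain LLVM IR from the C sources
clang -O0 -emit-llvm -S program.c -o program.ll

# "unoptimized" variant: mem2reg only (promotes stack slots, builds SSA form)
opt -mem2reg -S program.ll -o program.unopt.ll

# "optimized" variant: standard -O1 pipeline applied to the unoptimized IR
opt -O1 -S program.unopt.ll -o program.opt.ll
\end{verbatim}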

\noindent
While the general ideas we propose have been prototyped in LLVM, we believe that they could be applied to other toolchains as well. To investigate the potential of our approach, we show how to optimize the {\tt feval} construct -- a major source of inefficiency in MATLAB execution engines~\cite{lameed2013feval, radpour2013refactoring}. We present an extension of the MATLAB McVM runtime~\cite{chevalier2010mcvm} based on \osrkit\ to enable aggressive specialization mechanisms for {\tt feval} that were not supported by extant techniques~\cite{lameed2013feval}. An experimental evaluation of our technique reveals that the OSR machinery injected by \osrkit\ has a small level of intrusiveness, and that the optimizations enabled by our approach yield significant speedups in practical scenarios.