WORKING DRAFT authorea.com/66046
Main Data History
Export
Show Index Toggle 0 comments
  •  Quick Edit
  • Flexible On-Stack Replacement in LLVM

    Abstract

    On-Stack Replacement (OSR) is a technique for dynamically transferring execution between different versions of a function at run time. OSR is typically used in virtual machines to interrupt a long-running function and recompile it at a higher optimization level, or to replace it with a different one when a speculative assumption made during its compilation no longer holds.

    In this paper we present a framework for OSR that introduces novel ideas and combines features of existing techniques that no previous solution provided simultaneously. New features include OSR with compensation code to adjust the program state during a transition and the ability to fire an OSR from arbitrary locations in the code. Our approach is platform-independent as the OSR machinery is entirely encoded at a compiler’s intermediate representation level.

    We implement and evaluate our technique in the LLVM compiler infrastructure, which is gaining popularity as Just-In-Time (JIT) compiler in virtual machines for dynamic languages such as Javascript, MATLAB, Python, and Ruby. As a case study of our approach, we show how to improve the state of the art in the optimization of the feval instruction, a performance-critical construct of the MATLAB language.

    Introduction

    \label{se:intro}

    The LLVM compiler infrastructure (Lattner 2004) provides a Just-In-Time compiler called MCJIT that is currently being used for generating optimized code at run-time in virtual machines for dynamic languages. MCJIT is employed in both industrial and research projects, including Webkit’s Javascript engine, the open-source Python implementation Pyston, the Rubinius project for Ruby, Julia for high-performance technical computing, McVM for MATLAB, CXXR for the R language, Terra for Lua, and the Pure functional programming language. The MCJIT compiler shares the same optimization pipeline with static compilers such as clang, and it provides dynamic features such as native code loading and linking, as well as a customizable memory manager.

    A piece that is currently missing in MCJIT is a feature to enable on-the-fly transitions between different versions of a running program’s function. This feature is commonly known as On-Stack-Replacement (OSR) and is typically used in high-performance virtual machines, such as HotSpot and the Jikes RVM for Java, to interrupt a long-running function and recompile it at a higher optimization level. OSR can be a powerful tool for dynamic languages, for which most effective optimization decisions can typically be made only at run-time, when critical information such as type and shape of objects becomes available. In this scenario, OSR becomes useful also to perform deoptimization, i.e. when the running code has been speculatively optimized and the assumption used for the optimization does not hold anymore, the optimized function is interrupted and the execution continues in a safe version of the code.

    Currently VM builders using MCJIT are required to have a deep knowledge of the internals of LLVM in order to mimic a transitioning mechanism. In particular, they can rely on two experimental intrinsics, Stackmap and Patchpoint, to inspect the details of the compiled code generated by the back-end and to patch it manually with a sequence of assembly instructions. In particular, a Stackmap records the location of live values at a particular instruction address and during the compilation it is emitted into the object code within a designated section; a Patchpoint instead allows to reserve space at an instruction address for run-time patching and can be used to implement an inline caching mechanism (Pizlo).

    Lameed and Hendren propose McOSR (Lameed 2013), a technique for OSR that stores the live values in a global buffer, recompiles the current function, and then loads in it the saved state when the execution is resumed. McOSR was designed for the legacy JIT that is no longer included in LLVM since release 3.6 and has some limitations that we discuss in Section \ref{se:osr-llvm}.

    Contributions.

    In this paper we propose a general-purpose, target-independent implementation of on-stack replacement. Specific goals of our approach include:

    • The ability for a function reached via OSR to fire an OSR itself: this would allow switching from a base function \(f\) to an optimized function \(f^{\prime}\), and later on to a further optimized version \(f^{\prime\prime}\), and so on.

    • Supporting deoptimization, i.e., transitions from an optimized function to a less optimized function from which it was derived.

    • Supporting transitions at arbitrary function locations.

    • Supporting OSR targets either generated at run-time (e.g., using profiling information) or already known at compilation time.

    • Hiding from the front-end that generates the different optimized versions of a function all the implementation details of how on-the-fly transitions between them are handled at specific OSR points.

    We implemented the proposed approach in OSRKit11OSRKit is available at https://github.com/dcdelia/tinyvm., a prototype library for LLVM IR manipulation based on MCJIT with the following design goals:

    • Encoding OSR transitions in terms of pure IR code only, avoiding manipulations at machine-code level.

    • Incurring a minimal level of intrusiveness in terms of both the instrumentation of the code generated by the front-end and the impact of OSR points on native code quality.

    • Relying on LLVM’s compilation pipeline to generate the most efficient native code for an instrumented function.

    While the general ideas we propose have been prototyped in LLVM, we believe that they could be applied to other toolchains as well. To investigate the potential of our approach, we show how to optimize the feval construct – a major source of inefficiency in MATLAB execution engines (Lameed 2013a, Radpour 2013). We present an extension of the MATLAB McVM runtime (Chevalier-Boisvert 2010) based on OSRKit to enable aggressive specialization mechanisms for feval that were not supported by extant techniques (Lameed 2013a). An experimental evaluation of our technique reveals that the OSR machinery injected by OSRKit has a small level of intrusiveness and the optimizations enabled by our approach can yield significant speedups in practical scenarios.

    Structure of the paper.

    The remainder of this paper is organized as follows. In Section \ref{se:overview} we present our OSR technique and in Section \ref{se:osr-llvm} we outline its implementation in LLVM. Section \ref{se:case-study} illustrates our feval case study in McVM. In Section \ref{se:experiments}, we present our experimental study and discuss implications of injecting OSR points in LLVM IR programs. Related work is discussed in Section \ref{se:related} and concluding remarks are given in Section \ref{se:conclusions}.