Daniele Cono D'Elia edited case-study.tex  over 8 years ago

Commit id: bc7df7657ef0a96be00588434cdc4f743dc2cbe8

strengths of both approaches". In the remainder of this section, we extend McVM by implementing a novel optimization mechanism for \feval\ based on our OSR technique: we will show that our mechanism matches their JIT-based approach in the quality of the generated code, and is more general than their OSR-based approach, as it can also optimize \feval\ calls not enclosed in a loop.

\subsection{Extending McVM}
The McVM virtual machine is a complex research project developed at McGill~\cite{mcvm} and made of several software components, including: a front-end for lowering MATLAB programs to an intermediate representation called IIR that captures the high-level features of the language; an interpreter for running MATLAB functions and scripts in IIR format; a manager component to perform analyses on IIR; a JIT compiler based on LLVM that generates native code for a function, thus lowering McVM IIR to LLVM IR; and a set of helper components to perform fast vector and matrix operations using optimized libraries such as ATLAS, BLAS, and LAPACK.
%The architecture of McVM is illustrated in Figure [...]
McVM implements a function versioning mechanism based on type specialization: for each IIR representation of a function, different IR versions are generated according to the types of the arguments at each call site. The number of generated versions per function is small on average (i.e., less than two), as in most cases functions are always called with the same argument types. Type specialization is the main driver for generating efficient code in McVM~\cite{chevalier2010mcvm}.
The source code of McVM is publicly available~\cite{mcvm}; after porting it from the legacy LLVM JIT to MCJIT, we extended it with the following components to enable the optimization of \feval\ instructions:
\begin{enumerate}
\item An analysis pass for \feval\ instructions in the IIR representation of a function
\item An extension for the IIR compiler to track the correspondence between IIR and IR objects at \feval\ calls
\item An inserter component to insert OSR points in the IR generated for IIR locations annotated during the analysis pass
\item A callback optimizer triggered at OSR points, which in turn is made of:
\begin{enumerate}
\item A profile-driven IIR generator to replace \feval\ calls with direct calls

\end{enumerate}
\end{enumerate}
The analysis pass is integrated in McVM's analysis manager and identifies optimization opportunities for functions containing \feval\ instructions. A function becomes a candidate for optimization when at least one of its \feval\ calls is inside a loop. We then group \feval\ instructions whose first argument is reached by the same definition, using reaching-definition information already computed by the analysis manager for previous optimizations. For each group we mark for instrumentation only the instructions not dominated by others, so that the function can be optimized as early as possible at run time. The analysis pass can also determine whether the value of the argument may change across two executions of the same \feval\ instruction, thus discriminating when a run-time guard must be inserted during the run-time optimization phase. Compared to the OSR-based approach by Lameed and Hendren, our solution is cheaper because the types of the other arguments do not need to be cached or guarded: as we will see later, the type inference engine computes the most accurate yet sound type information when analyzing the optimized IIR where direct calls are used.

When the IIR compiler processes an annotated \feval\ instruction, it stores in the metadata of the function version being compiled a copy of its variable map (i.e., a map between IIR and IR objects), the current {\tt llvm::BasicBlock*} created for the call, and the {\tt llvm::Value*} object corresponding to the first argument of the \feval. The last two objects are used by the inserter component as source label and profiling value for inserting an open OSR point, with the copy of the variable map being passed (along with other information) in the {\tt extra} field. The open-OSR stub will in turn invoke the callback optimizer component described in the next subsection.
\subsection{Generating Optimized Code}
The core of our optimization pipeline is the callback optimizer component, which is responsible for generating optimized code for the current function $f$ using the profiling information (i.e., the object containing the first argument of \feval) and the contextual information passed from the open-OSR stub. As a first step, the optimizer processes the profiling object to resolve the target of the call, which we call $g$, and checks whether a previously compiled optimized function is available from the code cache. If not, a new function $f_{opt}$ is generated by cloning the IIR representation $f^{IIR}$ of $f$ into $f^{IIR}_{opt}$ and replacing all the \feval\ calls in the same group as the instrumented one with direct calls to $g$. As a next step, the optimizer asks the IIR compiler to analyze $f^{IIR}_{opt}$ and generate optimized LLVM IR $f^{IR}_{opt}$, also making a copy of the variable map between IIR and IR objects when compiling the direct call corresponding to the \feval\ instruction that triggered the OSR.