\input{osr-llvm}
\input{case-study}
\input{prev-eval-sol}
\input{eval-new-approach}
\input{experim}
\input{related}
\input{conclusions}

A previous study by Lameed and Hendren~\cite{lameed2013feval} shows that the overhead of an \feval\ call is significantly higher than that of a direct call, especially in JIT-based execution environments such as McVM~\cite{chevalier2010mcvm} and the proprietary MATLAB JIT accelerator by MathWorks. In fact, the presence of an \feval\ instruction can disrupt the results of intra- and inter-procedural type and array shape inference analyses, which are key factors for efficient code generation. Furthermore, since \feval\ invocations typically require a fallback to an interpreter, parameters passed to an \feval\ are usually boxed to make them more generic.

Our case study presents a novel technique for optimizing \feval\ in the McVM virtual machine, a complex research project developed at McGill University. McVM is publicly available~\cite{mcvm} and includes: a front-end for lowering MATLAB programs to an intermediate representation called IIR that captures the high-level features of the language; an interpreter for running MATLAB functions and scripts in IIR format; a manager component to perform analyses on IIR; a JIT compiler based on LLVM that generates native code for a function by lowering McVM IIR to LLVM IR; and a set of helper components to perform fast vector and matrix operations using optimized libraries such as ATLAS, BLAS, and LAPACK.
%The architecture of McVM is illustrated in Figure [...]

McVM implements a function versioning mechanism based on type specialization, which is the main driver for generating efficient code~\cite{chevalier2010mcvm}: for each IIR representation of a function, different IR versions are generated according to the types of the arguments at each call site.
\ifdefined\fullver
The number of generated versions per function is on average small (i.e., less than two), as in most cases functions are always called with the same argument types.
\fi
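The versioning mechanism described above can be sketched as a cache keyed by the argument-type signature observed at a call site. The following C++ fragment is a minimal illustration under assumed names ({\tt VersionCache}, {\tt CompiledVersion}, the {\tt Type} enumeration are ours, not McVM internals); in McVM the cache miss path would lower the function's IIR to LLVM IR specialized for the signature.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of type-based function versioning: one compiled
// specialization per (function, argument-type signature) pair.
enum class Type { Double, DoubleMatrix, CharArray };

using Signature = std::vector<Type>;
using Key = std::pair<std::string, Signature>;

struct CompiledVersion { std::string label; };

class VersionCache {
    std::map<Key, CompiledVersion> cache_;
public:
    // Returns the cached version for (fn, sig), "compiling" one on a miss.
    const CompiledVersion& lookup(const std::string& fn, const Signature& sig) {
        auto it = cache_.find({fn, sig});
        if (it == cache_.end()) {
            // In McVM: lower the IIR of fn to LLVM IR specialized for sig.
            // Here we just record a fresh label for the new version.
            CompiledVersion v{fn + "/v" + std::to_string(cache_.size())};
            it = cache_.emplace(Key{fn, sig}, v).first;
        }
        return it->second;
    }
    std::size_t size() const { return cache_.size(); }
};
```

Repeated calls with the same signature reuse the cached version, which is why the average number of versions per function stays small when call sites are monomorphic in their argument types.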

% !TEX root = article.tex
\subsection{A New Approach}
\label{ss:eval-opt-mcvm}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{Discussion.}
The ideas presented in this section advance the state of the art of \feval\ optimization in MATLAB runtimes, combining the flexibility of OSR-based specialization with the efficiency of the JIT-based method. Similarly to OSR-based specialization, we do not place restrictions on the functions that can be optimized. On the other hand, we work at the IIR (rather than IR) level as in JIT-based specialization, which allows us to perform type inference on the specialized code. Working at the IIR level eliminates the two main sources of inefficiency of OSR-based specialization: 1) we can replace generic instructions with specialized instructions, and 2) the types of $g$'s arguments do not need to be cached or guarded as they are statically inferred. These observations are confirmed in practice by experiments on real MATLAB programs, as we show in \mysection\ref{ss:experim-results}.
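The second source of inefficiency can be made concrete with a small sketch. Under OSR-based specialization, the direct call compiled into $f'$ is only valid for the callee cached at specialization time, so every invocation pays a runtime guard and may fall back to a generic path; when types and the callee binding are inferred statically at the IIR level, the compiled code is simply the direct call. The names below are illustrative, not McVM code.

```cpp
#include <functional>

// OSR-style dispatch: a guard compares the current callee against the one
// cached when f' was specialized; a mismatch deoptimizes to generic code.
double call_with_guard(double (*g)(double), double x,
                       double (*cached_g)(double),
                       const std::function<double(double)>& generic_fallback) {
    if (g == cached_g)               // runtime guard, paid on every call
        return cached_g(x);          // specialized direct call
    return generic_fallback(x);      // deoptimized, boxed/generic path
}

// IIR-level specialization: callee and argument types are known statically,
// so the generated code is an unguarded direct (possibly inlined) call.
double call_direct(double x) {
    return x * x;                    // body of g, no guard, no boxing
}
```

The guard itself is cheap, but it blocks inlining and forces the generic fallback to be kept alive, which is what the IIR-level approach avoids.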

Starting from this version, we also generate an {\em optimized} version using the LLVM optimizer {\tt opt -O1}.
\fi
For question Q4, we analyze the impact of the optimization technique presented in \mysection\ref{ss:eval-opt-mcvm} on the running time of a few numeric benchmarks, namely {\tt odeEuler}, {\tt odeMidpt}, {\tt odeRK4}, and {\tt sim\_anl}. The first three benchmarks~\cite{recktenwald2000numerical} solve an ODE for heat treating simulation using the Euler, midpoint, and Runge-Kutta methods, respectively; the last benchmark minimizes the six-hump camelback function with the method of simulated annealing\footnote{\url{http://www.mathworks.com/matlabcentral/fileexchange/33109-simulated-annealing-optimization}}.
%\footnote{\url{http://web.cecs.pdx.edu/~gerry/nmm/mfiles/}}

\subsection{Current Approaches}
\label{ss:prev-eval-sol}
Lameed and Hendren~\cite{lameed2013feval} proposed two dynamic techniques for optimizing \feval\ instructions in McVM: {\em JIT-based} and {\em OSR-based} specialization. Both attempt to optimize a function $f$ that contains instructions of the form \feval$(g,...)$, leveraging information about $g$ and the types of its arguments observed at run-time. The optimization produces a specialized version $f'$ in which \feval$(g,x,y,z,...)$ instructions are replaced with direct calls of the form $g(x,y,z,...)$. The two approaches differ in the point where code specialization is performed: in JIT-based specialization, $f'$ is generated when $f$ is called; in contrast, the OSR-based method interrupts $f$ as it executes, generates a specialized version $f'$, and resumes execution from it.