Authorea

Camil Demetrescu over 8 years ago

Commit id: ec9677bdf7c3abb6cbf23521b1aa8c82ccc2c06e

deletions | additions

A previous study by Lameed and Hendren~\cite{lameed2013feval} shows that the overhead of an \feval\ call is significantly higher than a direct call, especially in JIT-based execution environments such as McVM~\cite{chevalier2010mcvm} and the proprietary MATLAB JIT accelerator by Mathworks. In fact, the presence of an \feval\ instruction can disrupt the results of intra- and inter-procedural level for type and array shape inference analyses, which are key factors for efficient code generation. Furthermore, since \feval\ invocations typically require a fallback to an intepreter, parameters passed to an \feval\ are generally boxed to make them more generic. Lameed and Hendren propose and implement in McVM proposed two dynamic techniques for optimizing \feval\ instructions: instructions in McVM: {\em OSR-based} and {\em JIT-based} specialization, which work as follows. \begin{itemize} \item {\em OSR-based specialization:} when \paragraph{OSR-based specialization.} When a loop containing an \feval\ becomes hot in a function $f$, an OSR is triggered: the OSR handler invokes a code optimizer triggered. At that creates time, a specialized version $f'$ of $f$, $f$ is generated anddiverts control is diverted to $f'$. The $f'$ is generated by an optimizer works at IR level and that attempts to replace an \feval$(g,x,y,z...)$ instruction with a direct call $g(x,y,z,...)$. $g(x,y,z,...)$ with unboxed parameters, leveraging the fact that the actual values of $g$ and its arguments are known at OSR time. The direct call is guarded by a condition that checks if $g$ and the type of its parameters remain the same as observed when the OSR was fired. If the guard fails, the code falls back to executing the original \feval\ instruction. guarded \paragraph{JIT-based specialization.} [...] \noindent Although the OSR-based approach is more general, it generates much less efficient code compared to the JIT-based one for two main reasons: \begin{enumerate} \item when a function $f$ is first compiled from MATLAB to IR by condition that checks if . McVM, the functions it calls via \feval\ are unknown. Hence, the type inference engine is unable to infer types for their returned values, and these values must be kept boxed in heap-allocated objects and handled with slow generic instructions in the IR representation of $f$ (suitable for handling different types). For this reason, the optimized continuation functions $f'$ generated at OSR points in McVM inherit the slow generic instructions of $f$. \item {\em JIT-based specialization:} [...] \end{itemize} guard computation in $f'$ can be rather expensive, as it may require checking many parameters; \end{enumerate} \ifdefined\fullver The first one is based on OSR: using the McOSR library~\cite{lameed2013modular}, \feval\ calls inside loops are instrumented with an OSR point and profiling code to cache the last-known types for the arguments of each \feval\ instruction. When an OSR is fired at run-time, a code generator modifies the original function by inserting a guard to choose between a fast path containing a direct call and a slow path with the original \feval\ call. The second technique is less general and uses value-based JIT compilation: when the first argument of an \feval\ call is an argument of the enclosing function, the compiler replaces each call to this function in all of its callers with a call to a special dispatcher. At run-time, the dispatcher evaluates the value of the argument to use for the \feval\ and executes either a previously compiled cached code or generates and JIT-compiles a version of the function optimized for the current value.

Although the OSR-based approach is more general, it generates much less efficient code compared to the JIT-based one for three reasons: \begin{enumerate} \item since the function called through \feval\ is unknown at compile time, the type inference engine is unable to infer types for the returned values, so the compiler has to generate generic instructions (suitable for handling different types) for the remainder of the code; \item guard computation is expensive,because not only because the value of the first argument, but also the types of the remaining arguments have to be checked to choose between the fast and the slow path; \item since an \feval\ is executed through the interpreter, arguments are boxed to make them more generic before the call. \end{enumerate}