Once a state mapping object has been constructed, the optimizer calls our OSR library to generate the continuation function for the OSR transition and eventually compiles it. A pointer to the compiled function is stored in the code cache and returned to the stub, which invokes it through an indirect call, passing the live state saved at the OSR point; a minimal sketch of this dispatch path is given below.

\subsection{Experimental Results}

We evaluated the effectiveness of our technique on four benchmarks, namely {\tt odeEuler}, {\tt odeMidpt}, {\tt odeRK4}, and {\tt sim\_anl}. The first three benchmarks solve an ODE for heat treating simulation using the Euler, midpoint, and Runge-Kutta methods, respectively; the last benchmark minimizes the six-hump camelback function with the method of simulated annealing. We report the speed-ups enabled by our technique in \mytable\ref{tab:feval}, using the running times for McVM's default \feval\ dispatcher as the baseline. As the dispatcher typically JIT-compiles the invoked function, we also analyzed running times when the dispatcher calls a previously compiled function. In the last column, we report the speed-ups from a modified version of the benchmarks in which each \feval\ call is replaced by hand with a direct call to the function in use for the specific benchmark.

% Table \ref{tab:feval}: speed-ups for the four benchmarks (table body truncated in this revision).
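As anticipated above, the following minimal C++ sketch illustrates the dispatch path for OSR transitions. All names ({\tt LiveState}, {\tt ContFn}, {\tt genContinuation}, {\tt codeCache}) are illustrative placeholders rather than McVM's actual API, and the code-generation step is stubbed out:

\begin{verbatim}
#include <map>
#include <string>

struct LiveState { void** values; int count; }; // live values saved at the OSR point
typedef void* (*ContFn)(LiveState*);            // signature of a compiled continuation

// Stand-in for the real pipeline: ask the OSR library to generate the
// continuation function for the transition, JIT-compile it, and return
// a pointer to the resulting machine code.
static void* dummyContinuation(LiveState*) { return 0; }
static ContFn genContinuation(const std::string&) { return &dummyContinuation; }

static std::map<std::string, ContFn> codeCache; // compiled continuations, one per OSR site

// Invoked by the stub when an OSR transition fires at a given site.
void* osrDispatch(const std::string& site, LiveState* live) {
    ContFn& fn = codeCache[site];
    if (!fn)
        fn = genContinuation(site);             // generate, compile, and cache
    return fn(live);                            // indirect call with the saved live state
}
\end{verbatim}

Here a string key and a flat array of pointers stand in for the call-site identity and the live state that the stub materializes at the OSR point.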

Unfortunately, we are unable to compute direct performance metrics for the solution by Lameed and Hendren, since its source code has not been released. The numbers in their paper~\cite{lameed2013feval} show that on these benchmarks the speed-up of their OSR-based approach is on average $30.1\%$ of the speed-up from hand-coded direct calls (ranging from $9.2\%$ to $73.9\%$); for their JIT-based approach, the average ratio grows to $84.7\%$ (ranging from $75.7\%$ to $96.5\%$).

Our optimization technique yields speed-ups that are very close to the upper bound set by by-hand optimization: in the worst case, the {\tt odeRK4} benchmark, the speed-up is $94.1\%$ of this bound when the optimized code is generated on the fly, and $97.5\%$ when a cached version is available. Compared to their OSR-based approach, the compensation entry block is a key driver of the improved performance: the benefits of a function body that is type-specialized as a whole outweigh those of performing a direct call, with boxed arguments and return values, in place of the original \feval.
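To make the last point concrete, the following self-contained C++ toy, which is illustrative and unrelated to McVM's implementation, contrasts a call that passes boxed arguments and return values with a type-specialized direct call inside an Euler-style integration loop:

\begin{verbatim}
#include <cstdio>

struct Boxed { double d; };  // stand-in for a boxed (type-generic) MATLAB value

// Direct call through a boxed interface: operands are wrapped and unwrapped.
Boxed rhs_boxed(Boxed t, Boxed y) { Boxed r = { -2.0 * t.d * y.d }; return r; }

// Type-specialized call: plain doubles, amenable to further inlining.
double rhs_spec(double t, double y) { return -2.0 * t * y; }

int main() {
    double y1 = 1.0, y2 = 1.0, h = 1e-3;
    for (double t = 0.0; t < 1.0; t += h) {
        Boxed bt = { t }, by = { y1 };
        y1 += h * rhs_boxed(bt, by).d;  // boxing/unboxing on every iteration
        y2 += h * rhs_spec(t, y2);      // unboxed call on specialized types
    }
    printf("%f %f\n", y1, y2);
    return 0;
}
\end{verbatim}

The boxed variant pays a wrapping cost on every iteration even though the \feval\ has been turned into a direct call; specializing the whole loop body on unboxed types removes that cost, which is consistent with the gap observed in the measurements above.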