
\subsection{Setup}
We generated the IR modules for our experiments with {\tt clang}, starting from the C version of the shootout suite. In the version of the code we will refer to as {\em unoptimized}, no LLVM optimization passes were performed on the code other than {\em mem2reg}, which promotes memory references to register references and constructs the SSA (Static Single Assignment) form. Starting from this version, we then generate an {\em optimized} version using the LLVM IR optimizer {\tt opt} at the {\tt -O1} optimization level.

Experiments were performed on an octa-core 2.3 GHz Intel Xeon E5-4610 v2 with 256+256 KB of L1 cache, 2 MB of L2 cache, 16 MB of shared L3 cache, and 128 GB of DDR3 main memory, running 64-bit Debian Wheezy 7 with Linux kernel 3.2.0 and LLVM 3.6.2 (release build, compiled with gcc 4.7.2).
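As an illustration, the two IR versions can be produced along the following lines, using standard {\tt clang}/{\tt opt} flags (the file names here are hypothetical, and the exact invocation used in our setup may differ):

```shell
# Emit unoptimized LLVM IR from the C source, then run only mem2reg
# to promote memory references to SSA registers ("unoptimized" version).
clang -O0 -S -emit-llvm benchmark.c -o benchmark.ll
opt -mem2reg -S benchmark.ll -o benchmark-unopt.ll

# Derive the "optimized" version by running opt at -O1.
opt -O1 -S benchmark-unopt.ll -o benchmark-O1.ll
```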

For iterative benchmarks, we insert an OSR point in the body of their hottest loops. We classify a loop as hottest when its body is executed for a very high cumulative number of iterations (e.g., from a few thousand up to billions) and it either calls the method with the highest {\em self} time in the program, or performs the most computationally intensive operations of the program in its own body. These loops are natural candidates for OSR point insertion, as they can be used, as in the Jikes RVM, to enable additional dynamic inlining opportunities, with the benefits of several control-flow (e.g., dead code elimination) and data-flow (e.g., constant propagation) optimizations based on the run-time values of the live variables. In the shootout benchmarks, the number of such loops is typically 1 (2 for {\tt spectral-norm}).

For {\tt b-trees}, the only benchmark in our suite showing a recursive pattern, we insert an OSR point in the body of the method that accounts for the largest {\em self} execution time of the program. Such an OSR point might be useful to trigger recompilation of the code at a higher degree of optimization, or to enable some form of dynamic optimization (for instance, in a recursive search algorithm we might want to inline the user-provided comparator method at its call site).

Results for the unoptimized and optimized versions of the benchmarks are reported in Figures~\ref{fig:code-quality-base} and~\ref{fig:code-quality-O1}, respectively.

\paragraph{Overhead of OSR transitions}

\begin{figure}[t]
\begin{center}
\includegraphics[width=0.95\columnwidth]{figures/code-quality-noBB/code-quality-noBB.eps}
\caption{\label{fig:code-quality-base} \protect\input{figures/code-quality-noBB/caption}}
\end{center}
\end{figure}

\begin{figure}[t]
\begin{center}
\includegraphics[width=0.95\columnwidth]{figures/code-quality-O1-noBB/code-quality-O1-noBB.eps}
\caption{\label{fig:code-quality-O1} \protect\input{figures/code-quality-O1-noBB/caption}}
\end{center}
\end{figure}