Authorea

Daniele Cono D'Elia edited experim.tex over 8 years ago

Commit id: 40209e8c3df7a11cd1e9b2c9936433a082cde0eb

deletions | additions

For {\tt b-trees} - the only benchmark showing a recursive pattern - we insert an OSR point in the body of the method that accounts for the largest {\em self} execution time of the program. Such an OSR point might be useful to trigger recompilation of the code at a higher degree of optimization, or to enable some form of dynamic optimization (for instance, in a recursive search algorithm we might want to inline the comparator method provided by the user at the call). Results for the unoptimized and optimized versions of the benchmarks are reported in \myfigure\ref{fig:code-quality-base} and \myfigure\ref{fig:code-quality-O1}, respectively. For both scenarios we observe that the overhead is very small, i.e. less than $1\%$ for most benchmarks and less than $2\%$ in the worst case. For some benchmarks, code might run slightly faster after OSR point have been inserted due tocache and instruction alignment cache effects. We analyzed the code produced by the x86\_64 back-end: the OSR machinery is lowered into three native instructions that load a counter in a register, compare it against a constant value and jump to the OSR block accordingly. The number of times the OSR condition is checked for each benchmark is the same as in the experiments reported in \mytable\ref{tab:sameFun}. \paragraph{Overhead of OSR transitions.}

Hot code portions for OSR point insertion have been identified as in the experiments for code quality. However, as for hot loops we want to perform an OSR transition at each iteration, inserting an always-firing OSR point in the enclosing function is not an option, because the function we OSR into should then fire an OSR itself, leading eventually to a very large number of active stack frames. Depending on the characteristics of the hot loop, we either transform it into a separate function and instrument its entrypoint, or, when the loop calls a method with a high self time, we insert an OSR point at the beginning of that method. Normalized differences reported in the table represent a reasonable estimate of the average cost of firing a single OSR transition, which in other words is the cost of performing a function call passing the live variables as arguments. Reported numbers are in the order of nanoseconds, and might be negative due to the machine-level effects discussed for Message 1. instruction cache effects. \begin{table*} \begin{center}