Authorea

dcdelia Restoring previous changes deleted due to a mistake from Authorea interface over 8 years ago

Commit id: cc6930a19c8d75da0fec0635000734e0e7114c01

deletions | additions

\end{tabular} \end{small} \end{center} \caption{\label{tab:sameFun}Average cost \caption{\label{tab:sameFun}Cost ofan OSR transition transitions to the same function. For Overhead is assessed w.r.t. running time of the never-firing version. %For each benchmark we report the number of fired OSR transitions, the number of live values passed at the OSR point, the average time for performing a transition, and the slowdown of the always-firing w.r.t. the never-firing version calculated on total CPU time. } \end{table*} \ifauthorea{\newline}{} Results for the unoptimized and optimized versions of the benchmarks are reported in \myfigure\ref{fig:code-quality-base} and \myfigure\ref{fig:code-quality-O1}, respectively. For both scenarios we observe that the overhead is very small, i.e. less than $1\%$ for most benchmarks and less than $2\%$ in the worst case. For some benchmarks, code might run slightly faster after OSR point insertion due to instruction cache effects. %We analyzed the code produced by the x86-64 back-end: the OSR machinery is lowered into three native instructions that load a counter in a register, compare it against a constant value and jump to the OSR block accordingly. The number of times the OSR condition is checked for each benchmark is the same as in the experiments reported in \mytable\ref{tab:sameFun}. \ifauthorea{\newline}{} \paragraph{Overhead of OSR Transitions.} \mytable\ref{tab:sameFun} reports estimates of the average cost of performing an OSR transition to a clone of the running function. For each benchmark we compute the time difference between the scenarios in which an always- and a never-firing resolved OSR point is inserted in the code, respectively; we then normalize this difference against the number of fired OSR transitions.

\paragraph{Early Approaches.} %OSR has been pioneered in the SELF language~\cite{holzle1992self} to enable source-level debugging of optimized code, which required deoptimizing the code back to the original version. To reconstruct the source-level state, the compiler generates {\em scope descriptors} recording for each scope the locations or values of its arguments and locals. Execution can be interrupted only at certain interrupt points (i.e., method prologues and backward branches in loops) where its state is guaranteed to be consistent, allowing optimizations between interrupt points. It is worth mentioning also the {\em deferred compilation} mechanism~\cite{chambers1991self} implemented in SELF for branches that are unlikely to occur at run-time: the system generates a stub which invokes the compiler to generate a code object that can reuse the stack frame of the original code. OSR has been pioneered in the SELF language~\cite{holzle1992self} to enable source-level debugging of optimized code, which requires deoptimizing the code back to the original version. To reconstruct the source-level state, the compiler generates {\em scope descriptors} recording locations or values of arguments and locals. Execution can be interrupted only at certain interrupt points where its state is guaranteed to be consistent (i.e., method prologues and backward branches in loops), allowing optimizations between interrupt points. \ifdefined \fullver SELF also implements a deferred compilation mechanism~\cite{chambers1991self} for branches that are unlikely to occur at run-time: the system generates a stub that invokes the compiler to generate a code object that can reuse the stack frame of the original code. \else SELF also implements a deferred compilation mechanism~\cite{chambers1991self} for branches that are unlikely to occur at run-time: the system generates a stub that invokes the compiler to generate a code object that can reuse the current stack frame. \fi \paragraph{Java Virtual Machines.} The success of the Java language has drawn more attention to the design and implementation of OSR techniques, as bytecode interpreters began to work along with JIT compilers. In the high-performance HotSpot Server JVM~\cite{paleczny2001hotspot} performance-critical methods are identified using method-entry and backward-branches counters; when the OSR threshold is reached, the runtime transfers the execution from the interpreter frame to an OSR frame and thus to compiled code. Deoptimization is performed when class loading invalidates inlining or other optimization decisions: execution is rolled forward to a safe point, at which the native frame is converted into an interpreter frame.