
dcdelia over 8 years ago

Commit id: bd0eeeea870fdbdd9f72d1a0003164a2aee59459


\begin{small}
\begin{verbatim}
.lr.ph:                                   ; preds = %2, %0
  %i.01 = phi i64 [ %10, %2 ], [ 1, %0 ]
  %4 = getelementptr inbounds i64* %v, i64 %i.01
  %.sum = add nsw i64 %i.01, -1

\end{verbatim}
\end{small}
\noindent \tinyvm\ will {\tt UPDATE} the function as follows: an {\tt ALWAYS}-true OSR condition is checked before executing instruction {\tt \%4}, firing an {\tt OPEN} OSR transition to the {\tt DYN\_INLINE} code generator, which will inline any indirect call made through the function pointer {\tt \%c}. We choose {\tt \%4} as the location for the OSR because it is the first non-$\phi$ instruction in the loop body, and we hint to the LLVM back-end through profiling metadata that the OSR firing is {\tt 100}\%-likely. The IR will now look like:
\begin{small}
\begin{verbatim}
.lr.ph:                                   ; preds = %2, %0
  %i.01 = phi i64 [ %10, %2 ], [ 1, %0 ]
  %alwaysOSR = fcmp true double 0.000000e+00, 0.000000e+00
  br i1 %alwaysOSR, label %OSR_fire, label %OSR_split, !prof !1

OSR_split:                                ; preds = %.lr.ph
  %4 = getelementptr inbounds i64* %v, i64 %i.01
  %.sum = add nsw i64 %i.01, -1
  [...]

OSR_fire:                                 ; preds = %.lr.ph
  %OSRCast = bitcast i32 (i8*, i8*)* %c to i8*
  %OSRRet = call i32 @isord_stub(i8* %OSRCast, i64* %v, i64 %n, i32 (i8*, i8*)* %c, i64 %i.01)
  ret i32 %OSRRet
\end{verbatim}
\end{small}
\noindent \osrkit\ has split the {\tt \%.lr.ph} basic block at the OSR point, also adding an {\tt OSR\_fire} block that transfers the execution state to {\tt isord\_stub} and then returns the {\tt OSRRet} value. We can now let {\tt isord} run on a dynamically initialized array through the {\tt driver} method, which takes the array length as its argument. The method populates the array with elements ordered according to the comparator in use (see {\small\tt inline.c}).
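\noindent To make the discussion concrete, the following is a minimal C sketch of what an {\tt isord}-like routine could look like, inferred only from the IR signature shown above (an {\tt i64*} array, an {\tt i64} length, and a comparator of type {\tt i32 (i8*, i8*)*}). The names {\tt isord\_sketch} and {\tt cmp\_sketch} are hypothetical, and the actual code in {\small\tt inline.c} may differ.

```c
#include <stdint.h>

/* Hypothetical comparator: returns a negative, zero, or positive value
   according to the order of *a and *b, without risking overflow. */
int cmp_sketch(const void *a, const void *b) {
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if v[0..n-1] is ordered under comparator c, 0 otherwise.
   The loop starts at i = 1, mirroring the phi node in the IR. */
int isord_sketch(const int64_t *v, int64_t n,
                 int (*c)(const void *, const void *)) {
    for (int64_t i = 1; i < n; i++) {
        /* Indirect call through c: the kind of call that a dynamic
           inliner would replace with the comparator's body. */
        if (c(&v[i - 1], &v[i]) > 0)
            return 0;
    }
    return 1;
}
```

\noindent Because the comparator is only known at run time, a static compiler cannot inline the call inside the loop; this is the situation that the OSR transition described above is meant to address.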
For instance, we can ask {\tt driver} to set up an array of $100000$ elements and run {\tt isord} on it:
\begin{small}
\begin{verbatim}
TinyVM> driver(100000)
Time spent in creating continuation function: 0.000252396 seconds
Address of invoked function: 140652750196768
Function being inlined: cmp
Elapsed CPU time: 0 m 0 s 3 ms 417 us 157 ns
(that is: 0.003417157 seconds)

Evaluated to: 1
\end{verbatim}
\end{small}
\noindent The method returns $1$, which means that the array is ordered. Compared to \myfigure\ref{fig:isordascto}, the IR code generated for the OSR continuation function {\tt isordto} ({\tt DUMP isordto}) is slightly different, as the MCJIT compiler detects that additional optimizations (e.g., loop strength reduction) are possible and performs them. We expect the code generated for {\tt isord\_stub} to be identical, up to renaming, to the IR reported in \myfigure\ref{fig:isordstub}.

To show the native code generated by the MCJIT back-end, we can run \tinyvm\ under {\tt gdb} and leverage the debugging interface of MCJIT. For instance, once {\tt driver} has been invoked, we can switch to the debugger with {\tt CTRL-Z} and display the x86-64 code for any compiled method with:
\begin{small}
\begin{verbatim}
(gdb) disas isordto
Dump of assembler code for function isordto:
[Base address: 0x00007ffff7ff2000]
   <+0>:     mov    -0x8(%rdi,%rcx,8),%edx
   <+4>:     sub    (%rdi,%rcx,8),%edx
   <+7>:     xor    %eax,%eax
   <+9>:     test   %edx,%edx
   <+11>:    jg     0x7ffff7ff201a
   <+13>:    inc    %rcx
   <+16>:    mov    $0x1,%eax
   <+21>:    cmp    %rsi,%rcx
   <+24>:    jl     0x7ffff7ff2000
   <+26>:    retq
End of assembler dump.
\end{verbatim}
\end{small}
Assuming that the steps described above are executed
Native code [...] {\tt gdb} [...]
%In a usage scenario in which input arrays are large, we might want to perform dynamic inlining as early as possible. We can thus insert
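\noindent Read back into C, the x86-64 listing for {\tt isordto} corresponds roughly to the loop sketched below. This is an interpretation rather than the generated source: it assumes the System V AMD64 calling convention, with {\tt \%rdi} holding the array, {\tt \%rsi} its length, and {\tt \%rcx} the loop index at the OSR point, and it drops the comparator argument, which is no longer needed once the call has been inlined. The name {\tt isordto\_sketch} is hypothetical.

```c
#include <stdint.h>

/* Sketch of the loop in the isordto listing. Assumes i >= 1 on entry
   and that the 32-bit differences do not overflow. */
int isordto_sketch(const int64_t *v, int64_t n, int64_t i) {
    do {
        /* mov/sub: 32-bit difference of the low halves of v[i-1], v[i] */
        int32_t diff = (int32_t)v[i - 1] - (int32_t)v[i];
        if (diff > 0)       /* test + jg: pair out of order, return 0 */
            return 0;
        i++;                /* inc %rcx */
    } while (i < n);        /* cmp + jl: jump back to the loop head */
    return 1;               /* mov $0x1,%eax; retq */
}
```

\noindent Note that the listing contains no call instruction: the comparison appears as a subtraction inside the loop, and the loop exit test sits at the bottom because execution resumes in the middle of an iteration.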