dcdelia  over 8 years ago

Commit id: bd0eeeea870fdbdd9f72d1a0003164a2aee59459


\begin{small}
\begin{verbatim}
.lr.ph: ; preds = %2, %0
  %i.01 = phi i64 [ %10, %2 ], [ 1, %0 ]
  %4 = getelementptr inbounds i64* %v, i64 %i.01
  %.sum = add nsw i64 %i.01, -1
\end{verbatim}
\end{small}
\noindent \tinyvm\ will {\tt UPDATE} the function as follows: an {\tt ALWAYS}-true OSR condition is checked before executing instruction {\tt \%4}, firing an {\tt OPEN} OSR transition to the {\tt DYN\_INLINE} code generator, which will inline any indirect call through the function pointer {\tt \%c}. We choose {\tt \%4} as the OSR location since it is the first non-$\phi$ instruction in the loop body, and we use profiling metadata to hint to the LLVM back-end that the OSR is {\tt 100}\% likely to fire. The IR now looks as follows:
\begin{small}
\begin{verbatim}
.lr.ph: ; preds = %2, %0
  %i.01 = phi i64 [ %10, %2 ], [ 1, %0 ]
  %alwaysOSR = fcmp true double 0.000000e+00,
                                0.000000e+00
  br i1 %alwaysOSR, label %OSR_fire,
                    label %OSR_split, !prof !1

OSR_split: ; preds = %.lr.ph
  %4 = getelementptr inbounds i64* %v, i64 %i.01
  %.sum = add nsw i64 %i.01, -1
  [...]

OSR_fire: ; preds = %.lr.ph
  %OSRCast = bitcast i32 (i8*, i8*)* %c to i8*
  %OSRRet = call i32 @isord_stub(i8* %OSRCast,
                                 i64* %v, i64 %n,
                                 i32 (i8*, i8*)* %c,
                                 i64 %i.01)
  ret i32 %OSRRet
\end{verbatim}
\end{small}
\noindent\osrkit\ has split the {\tt \%.lr.ph} basic block at the OSR point, also adding an {\tt OSR\_fire} block that transfers the execution state to {\tt isord\_stub} and eventually returns the {\tt \%OSRRet} value.

We can now let {\tt isord} run on a dynamically initialized array through the {\tt driver} method, which takes the array length as its argument. The method populates the array with elements that are ordered with respect to the comparator in use (see {\small\tt inline.c}).
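The control flow encoded by the instrumented IR above can be rendered in C roughly as follows. This is a minimal sketch under stated assumptions, not the code in {\small\tt inline.c} or in \osrkit. We assume that {\tt isord} scans the array with the comparator {\tt c} (whose type mirrors {\tt i32 (i8*, i8*)*}) and returns $0$ at the first pair of elements that is out of order. Every name in the sketch ({\tt cmp\_i64}, {\tt isord\_cont}, {\tt osr\_stub}, {\tt isord\_osr}) is hypothetical. The stand-in stub simply resumes the loop through a generic continuation, whereas {\tt isord\_stub} presumably invokes the {\tt DYN\_INLINE} generator first to build a continuation with the callee inlined.
\begin{small}
\begin{verbatim}
```c
#include <stdint.h>

/* Comparator type matching the IR signature i32 (i8*, i8*)*. */
typedef int (*cmp_fn)(void *, void *);

/* Hypothetical comparator on int64_t elements (not taken from inline.c). */
static int cmp_i64(void *a, void *b) {
    int64_t x = *(int64_t *)a, y = *(int64_t *)b;
    return (x > y) - (x < y);
}

/* Continuation: resumes the scan at iteration i with the live state
   (v, n, c, i) transferred by the OSR point. */
static int isord_cont(int64_t *v, int64_t n, cmp_fn c, int64_t i) {
    for (; i < n; ++i)
        if (c(&v[i - 1], &v[i]) > 0)
            return 0;
    return 1;
}

/* Stand-in for isord_stub. The first argument mirrors %OSRCast (the
   callee address handed to the code generator); in this sketch no code
   is generated and the generic continuation is called instead. */
static int osr_stub(cmp_fn callee, int64_t *v, int64_t n, cmp_fn c,
                    int64_t i) {
    (void)callee;
    return isord_cont(v, n, c, i);
}

/* Instrumented scan: returns 1 if v[0..n) is ordered w.r.t. c, else 0. */
static int isord_osr(int64_t *v, int64_t n, cmp_fn c) {
    for (int64_t i = 1; i < n; ++i) {
        /* .lr.ph: OSR point before the first non-phi instruction */
        const int always_osr = 1;      /* %alwaysOSR = fcmp true ... */
        if (always_osr)                /* br ... label %OSR_fire */
            return osr_stub(c, v, n, c, i);
        /* OSR_split: original loop body, never reached while the
           guard is always true */
        if (c(&v[i - 1], &v[i]) > 0)
            return 0;
    }
    return 1;
}
```
\end{verbatim}
\end{small}
\noindent Because the guard is always true, the first loop iteration always diverts to the stub with $i = 1$, and the original body in {\tt OSR\_split} is never reached, which matches the {\tt 100}\% branch hint given to the back-end. For example, with {\tt cmp\_i64} the sketch returns $1$ on the array $\{1, 2, 3, 4, 5\}$ and $0$ on $\{1, 3, 2, 4\}$.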
To run {\tt isord} on an array of $100000$ elements, for instance, we can invoke {\tt driver} as follows:
\begin{small}
\begin{verbatim}
TinyVM> driver(100000)
Time spent in creating continuation function: 0.000252396 seconds
Address of invoked function: 140652750196768
Function being inlined: cmp
Elapsed CPU time: 0 m 0 s 3 ms 417 us 157 ns
(that is: 0.003417157 seconds)
Evaluated to: 1
\end{verbatim}
\end{small}
\noindent The method returns $1$, which means that the array is ordered. Compared to \myfigure\ref{fig:isordascto}, the IR code generated for the OSR continuation function {\tt isordto} (which can be displayed with {\tt DUMP isordto}) is slightly different, as the MCJIT compiler detects that additional optimizations (e.g., loop strength reduction) are possible and performs them. By contrast, we expect the code generated for {\tt isord\_stub} to be identical, up to renaming, to the IR reported in \myfigure\ref{fig:isordstub}.

To inspect the native code generated by the MCJIT back-end, we can run \tinyvm\ under {\tt gdb} and leverage the debugging interface of MCJIT. For instance, once {\tt driver} has been invoked, we can switch to the debugger with {\tt CTRL-Z} and display the x86-64 code for any compiled method with:
\begin{small}
\begin{verbatim}
(gdb) disas isordto
Dump of assembler code for function isordto:
[Base address: 0x00007ffff7ff2000]
<+0>: mov -0x8(%rdi,%rcx,8),%edx
<+4>: sub (%rdi,%rcx,8),%edx
<+7>: xor %eax,%eax
<+9>: test %edx,%edx
<+11>: jg 0x7ffff7ff201a
<+13>: inc %rcx
<+16>: mov $0x1,%eax
<+21>: cmp %rsi,%rcx
<+24>: jl 0x7ffff7ff2000
<+26>: retq
End of assembler dump.
\end{verbatim}
\end{small}
Assuming that the steps described above are executed
Native code [...] {\tt gdb} [...]

%In a usage scenario in which input arrays are large, we might want to perform dynamic inlining as early as possible. We can thus insert
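To relate the disassembly of {\tt isordto} to a source-level computation, the loop below is a hand-written C approximation in which the indirect call through {\tt c} has been replaced by the body of {\tt cmp}. It rests on assumptions rather than on the actual source. We assume that {\tt cmp} returns the difference of two 64-bit elements truncated to {\tt int}, which is what the 32-bit {\tt mov}/{\tt sub} pair on {\tt \%edx} suggests, and the name {\tt isordto\_approx} is hypothetical. Under the System~V x86-64 calling convention the first four integer arguments are passed in {\tt \%rdi}, {\tt \%rsi}, {\tt \%rdx}, and {\tt \%rcx}. This is consistent with {\tt isordto} receiving {\tt v}, {\tt n}, {\tt c}, and {\tt i} in this order, the same live values passed to {\tt isord\_stub}.
\begin{small}
\begin{verbatim}
```c
#include <stdint.h>

/* Comparator type matching the IR signature i32 (i8*, i8*)*. */
typedef int (*cmp_fn)(void *, void *);

/* Hypothetical C approximation of the native isordto, with cmp inlined.
   Assumes cmp(a, b) returns the int64_t difference *a - *b truncated
   to int; the actual inline.c may differ. */
int isordto_approx(int64_t *v, int64_t n, cmp_fn c, int64_t i) {
    (void)c; /* passed in %rdx but unused once cmp is inlined */
    do {
        /* <+0>, <+4>: 32-bit difference of v[i-1] and v[i], wrapping
           like the machine sub instruction */
        int32_t d = (int32_t)((uint32_t)v[i - 1] - (uint32_t)v[i]);
        /* <+7>..<+11>: out-of-order pair, return 0 */
        if (d > 0)
            return 0;
        ++i;             /* <+13> */
    } while (i < n);     /* <+21>, <+24> */
    return 1;            /* <+16>, <+26> */
}
```
\end{verbatim}
\end{small}
\noindent The bound check appears only at the bottom of the loop (offsets {\tt +21} and {\tt +24}), presumably because the continuation is entered from inside the loop body, where $i < n$ already holds. The {\tt do}/{\tt while} form reflects this. For instance, resuming at $i = 2$ on $\{5, 1, 2, 3\}$ yields $1$, since only the pairs from index $1$ onwards are checked.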