Authorea

Nuwan Wajirasena(110607D) edited section_Introduccton_To_Apache_Derby__.tex about 8 years ago

Commit id: 517c0119253874db249eebf90bff7d73f510d8b3

deletions | additions

\item Generation of a Java class (directly to byte code) to represent the statement plan \item Loading of the class and creation of an instance to represent that connection’s state of the query \end{enumerate} The generated statement plan is cached and can be shared by multiple connections. DDL statements (e.g. CREATE TABLE) use a common statement plan to avoid generation of a Java class file. This implementation was driven by the original goal to have a small footprint. Using the JVM’s interepter interpreter was thought to be less code than having an internal one. It does mean that the first couple of times the statement plan is executed,it would be interpreted. After a number of executions, the Just-In-Time (JIT) compiler will decide to compile it into native code. Thus, running performance tests will see a boost after a number of iterations. In addition, calling into Java user-supplied methods (functions and procedures) is direct, rather than through reflection.

SQL statements, open containers and their file descriptors, string conversions, table descriptors and maybe more. The interface/implementation split ensures that the cache algorithm is not known by the users of the cache and can be changed with affecting them. The current algorithm is a ”clock or ring” based algorithm. \subsection{Derby Optimizer} After introducing the Derby architecture now we can present the Derby optimizer which is used by the SQL compilation under the SQL layer. The optimizer currently only considers left-deep trees. That is, when looking at joins, it doesn’t does not consider join nodes to the right of other join nodes. The join trees it considers are skewed to the left. Bushy trees where the join tree can take any shape are harder to implement but are often useful for big multi-way decision support joins. During optimization the join order is represented as an array of Optimizables. The first element in the array is the leftmost table in the join tree, and the successive elements in the array are the joining tables in the left-deep tree. The optimizer does an exhaustive search of query plans, going through all the join orders and, for each join order, all of the access paths for that order. It remembers the cheapest complete plan it has found so far, and does cost-based search space pruning. That is, it doesn’t does not consider plans that are more expensive that the best plan found so far. The optimizer goes through the search space depth-first. In other words, it adds a table to the join order, then goes through all the access paths for that table (choosing the cheapest one) before going on to the next position in the join order. When it gets all the way to the end of the join order, with all of the tables accounted for, it considers whether the plan it is looking at is the best one found so far. If so, it remembers it. Eventually, it exhausts all the tables to consider at a given depth in the join order, and it has to back up to the previous position in the join order and consider the next table there.

getNextPermutation() does the adding (and deletion) of tables to the join order under consideration by the optimizer. getNextDecoratedPermutation() goes through all the access paths for the current permutation (in the current implementation, it only has to consider the access paths for the innermost table in the join order, as the search is done depth-first). You can think of a decorated permutation as a join order with things like indexes and join strategies ”decorating” the nodes. The Optimizable interface represents anything in the query that can have an access path. In practice an Optimizable is usually a base table, but it can be anything that appears in a FROM list (for example, standard SQL allows a UNION in the FROM list). Different Optimizables have different choices for access paths. The optimizer uses the nextAccessPath() method on Optimizable to step through the different access paths. These include different join strategies (such as nested loop and hash join) and different conglomerates (the base table and the different indexes). Sometimes the Optimizable has to decide whether a given access path is feasible (for example, hash join is only feasible for equijoins) equatoins)