Authorea

Bhathiya edited Section_Apache_Derby_Architecture_Before__.tex about 8 years ago

Commit id: 89b4da10a2bee0fe162364e0d9f771d370458a83


\section{Apache Derby Architecture}

Before going into the details of the Derby Optimizer, let's look at the Apache Derby architecture. From the module view, Derby is a system composed of a monitor and a collection of modules. The monitor is code that maps module requests to implementations based on the request and the environment. For example, when a JDBC driver is requested internally under JDK 1.3, the monitor selects Derby's JDBC 2.0 implementation, while under JDK 1.4 it selects the JDBC 3.0 implementation. This allows Derby to present a single JDBC driver to the application regardless of the JDK, while the correct driver is loaded internally.

A module in Derby is a set of discrete functionality, such as a lock manager, a JDBC driver or an indexing method. A module's interface is typically defined by a set of Java interfaces; for example, the java.sql interfaces define the interface for a JDBC driver. All callers use a module purely through its interface, which separates the API from the implementation. A module's implementation is a set of classes that implement the required behavior and interfaces, so an implementation can be changed or replaced with a different one without affecting the callers' code. Modules are either system-wide (shared) or per-service, where a service corresponds to a database. This architecture allows different modules to be loaded depending on the environment, and in the past it also supported different product configurations built from the same code base.

From the layer (box) view, four main code areas can be identified: JDBC, SQL, Store and Services. JDBC is the only API that Derby presents to applications; it consists of implementations of the java.sql and javax.sql classes for JDBC 2.0 and 3.0.
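To make the monitor/module split more concrete, the following is a minimal Java sketch of a monitor that maps a module request (a Java interface) to an implementation chosen by the environment, mirroring the JDK 1.3 / JDK 1.4 JDBC example above. All names in it (\texttt{MonitorSketch}, \texttt{JdbcDriverModule}, \texttt{Jdbc20Driver}, \texttt{Jdbc30Driver}, \texttt{findModule}) are hypothetical and are not Derby's actual monitor API. The environment is reduced, as an assumption, to a single number: the minor version of JDK 1.x.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a Derby-style monitor: it maps a module request
 * (a Java interface) to an implementation chosen by the environment.
 * None of these names are Derby's real classes or API.
 */
public class MonitorSketch {

    /** Module interface: callers only ever depend on this type. */
    public interface JdbcDriverModule {
        String jdbcVersion();
    }

    /** Two interchangeable implementations of the same module interface. */
    static final class Jdbc20Driver implements JdbcDriverModule {
        @Override public String jdbcVersion() { return "2.0"; }
    }

    static final class Jdbc30Driver implements JdbcDriverModule {
        @Override public String jdbcVersion() { return "3.0"; }
    }

    /** A registered implementation plus the lowest JDK 1.x minor version it needs. */
    private static final class Candidate {
        final int minJdkMinor;
        final Supplier<?> factory;

        Candidate(int minJdkMinor, Supplier<?> factory) {
            this.minJdkMinor = minJdkMinor;
            this.factory = factory;
        }
    }

    // Assumed environment model: 3 means JDK 1.3, 4 means JDK 1.4, and so on.
    private final int jdkMinor;
    private final Map<Class<?>, List<Candidate>> registry = new HashMap<>();

    public MonitorSketch(int jdkMinor) {
        this.jdkMinor = jdkMinor;
        register(JdbcDriverModule.class, 3, Jdbc20Driver::new);
        register(JdbcDriverModule.class, 4, Jdbc30Driver::new);
    }

    /** Record an implementation of iface that requires at least JDK 1.minJdkMinor. */
    private <T> void register(Class<T> iface, int minJdkMinor, Supplier<? extends T> factory) {
        registry.computeIfAbsent(iface, k -> new ArrayList<>())
                .add(new Candidate(minJdkMinor, factory));
    }

    /** Map a module request to the most capable implementation this environment supports. */
    public <T> T findModule(Class<T> iface) {
        Candidate best = null;
        for (Candidate c : registry.getOrDefault(iface, List.of())) {
            boolean supported = c.minJdkMinor <= jdkMinor;
            if (supported && (best == null || c.minJdkMinor > best.minJdkMinor)) {
                best = c;
            }
        }
        if (best == null) {
            throw new IllegalStateException(
                    "no implementation of " + iface.getName() + " for JDK 1." + jdkMinor);
        }
        return iface.cast(best.factory.get());
    }

    public static void main(String[] args) {
        // The caller asks only for the interface; the monitor picks the implementation.
        System.out.println(new MonitorSketch(3).findModule(JdbcDriverModule.class).jdbcVersion()); // 2.0
        System.out.println(new MonitorSketch(4).findModule(JdbcDriverModule.class).jdbcVersion()); // 3.0
    }
}
```

Callers depend only on \texttt{JdbcDriverModule}, so replacing \texttt{Jdbc30Driver} with another implementation requires no change to the calling code, which is the separation of API and implementation described above. A real monitor would consult far more of the environment than a single version number; this sketch only illustrates the request-to-implementation mapping.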
Applications use Derby solely through its implementations of the top-level JDBC interfaces (Driver, DataSource, ConnectionPoolDataSource and XADataSource) and the remaining JDBC interfaces. The JDBC layer sits on top of the SQL layer. The SQL layer is split into two main logical areas: compilation and execution. SQL compilation is a five-step process:
\begin{enumerate}
\item Parse, using a parser generated by JavaCC, which results in a tree of query nodes.
\item Bind, to resolve all objects (e.g.\ table names).
\item Optimize, to determine the best access path.
\item Generate a Java class (directly as byte code) that represents the statement plan.
\item Load the class and create an instance that represents that connection's state of the query.
\end{enumerate}
The generated statement plan is cached and can be shared by multiple connections. DDL statements (e.g.\ CREATE TABLE) use a common statement plan to avoid generating a Java class file. This implementation was driven by the original goal of a small footprint: using the JVM's interpreter was thought to require less code than having an internal one. It does mean that the first few times a statement plan is executed, it is interpreted; after a number of executions, the Just-In-Time (JIT) compiler will decide to compile it into native code. Thus, performance tests will show a speed-up after a number of iterations. In addition, calls into user-supplied Java methods (functions and procedures) are made directly rather than through reflection.

SQL execution consists of calling execute methods on the instance of the generated class, which return a result set object. This result set is a Derby ResultSet class, not a JDBC one; the JDBC layer presents the Derby ResultSet to the application as a JDBC one. For a simple table scan the query would