\section{Related Work}
\label{sec:related}

\textbf{Oracle Golden Gate 12c} \cite{golden-gate} shares similar ideas with Hihooi, although we were not able to find detailed design and implementation specifications. As new or updated data is committed to the source database, it is continuously captured and applied to one or more target systems. It also targets transaction-driven applications and aims at offering a real-time replication platform through log-based data capture, supporting heterogeneous platforms. To reduce latency and scale out, it follows a decoupled architecture based on four modules:
\begin{itemize}
  \item The \textbf{Capture} module grabs committed transactions resulting from insert, update, or delete statements and routes them for distribution. It does not require any changes to the underlying database engine, while it supports data compression and selectivity (at table, row, or column granularity).
  \item \textbf{Trail Files} contain the most recently changed data in a transportable format for consumption by different applications.
  \item The \textbf{Delivery} module takes any changed transactional data placed in a trail file and applies it to the target database. It offers a variety of delivery semantics according to user-defined criteria. Additionally, it can publish data into flat files or stream data to other Big Data platforms.
  \item Finally, the \textbf{Manager} module is the controlling process that performs administrative and reporting activities via monitoring and control of the other modules.
\end{itemize}

\textbf{C-JDBC} \cite{c-jdbc} is an open-source database cluster middleware that gives Java applications transparent access to a cluster of databases through JDBC. A database can be distributed and replicated among several nodes composing a virtual database, and C-JDBC balances the queries among these nodes. C-JDBC also handles node failures and provides support for checkpointing and hot recovery. Like Hihooi, it is compatible with any database engine that provides a JDBC driver and does not require any changes to the database engine to work. It consists of a generic JDBC driver, used by client applications, and a controller that handles load balancing and fault tolerance. The controller also handles the distribution and acts as a proxy between the driver and the database backends. Each controller is able to control multiple virtual databases, each of which has its own Request Manager. As in Hihooi, the replicas are hidden from the application: the controller exposes a single virtual database and enables database backends to be added or removed dynamically and transparently. It also offers early responses to updates, where the controller returns the result as soon as one, a majority, or all backends have executed the operation. \textit{Unlike Hihooi, where the extension databases send their results directly to the client application, in C-JDBC results pass through the controller, where they are serialized and sent back through the communication channel.} C-JDBC offers three degrees of replication in a virtual database, namely full partitioning (zero replication), full replication, and partial replication. The Request Manager decides the scheduling of requests among the nodes. It also contains a query result cache and performs load balancing with one of the implemented algorithms: round-robin, weighted round-robin, and least pending requests first.
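To make the balancing step concrete, the following minimal Java sketch hands out read connections over a set of JDBC backends in round-robin fashion. The class, method names, and replica URLs are illustrative assumptions of ours and are not part of the C-JDBC (or Hihooi) API.

\begin{verbatim}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin balancer over JDBC backends,
// in the spirit of a C-JDBC-style controller.
public class RoundRobinBalancer {
    private final List<String> backendUrls;           // replica JDBC URLs
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBalancer(List<String> backendUrls) {
        this.backendUrls = backendUrls;
    }

    // Pick the next backend in circular order and connect to it.
    public Connection nextConnection(String user, String pass)
            throws SQLException {
        int i = Math.floorMod(next.getAndIncrement(),
                              backendUrls.size());
        return DriverManager.getConnection(backendUrls.get(i),
                                           user, pass);
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical replica URLs; any JDBC-capable engine works.
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of(
            "jdbc:postgresql://replica1/bench",
            "jdbc:postgresql://replica2/bench"));
        // Reads rotate over the replicas; writes would be sent
        // to every backend by the controller (not shown).
        try (Connection c = lb.nextConnection("user", "secret")) {
            c.createStatement().executeQuery("SELECT 1");
        }
    }
}
\end{verbatim}

Weighted round-robin and least-pending-requests-first policies differ only in how the next backend index is chosen.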
Since the controller can potentially become a single point of failure, C-JDBC offers horizontal scalability, such that two controllers can manage the same virtual database. Additionally, vertical scalability is also offered, enabling users to build a tree structure of controllers with the native database drivers connected at the leaves.

\textbf{Distributed Versioning} and conflict-aware scheduling is an approach introduced in \cite{distr-versioning}. The key idea is to use a middleware-based scheduler that accepts transactions from clients and routes them to a set of replicas. Internally, the middleware layer uses lazy replication while offering 1-copy serializability. Consistency is maintained through bookkeeping of \textit{versions} of tables in all the replicas. Every transaction that updates a table increments the corresponding version number. At the beginning of every transaction, clients have to inform the scheduler about the tables they are going to access, and the scheduler uses this information to assign table versions to the transactions. Replicas are kept in sync by sending the full SQL update statements. Since table-level locking reduces concurrency, distributed versioning also introduces the concept of \textit{early version releases}, which allows clients to notify the scheduler when they have used a table for the last time in a transaction. Similar to the C-JDBC approach, SQL statements have to be parsed at the middleware level for locking purposes.

\textbf{Postgres-R} \cite{postgres-r} uses an eager replication model and group communication. It is implemented as an extension to PostgreSQL v6.4 \cite{postgres}. It tries to reduce message and synchronization overhead by bundling writes into a single write-set message and using shadow copies to perform updates. Like most replication protocols, it applies the read-one/write-all approach with localized reads, and it pre-orders transactions to guarantee that all nodes receive the write sets in exactly the same order. Postgres-R has no centralized components and load balancing is left to the clients. This can become a problem during bursts of update traffic, since the system is busy resolving conflicts between the replicas. Another shortcoming of Postgres-R is its rather intrusive implementation, which requires modifications to the underlying database, something that is not always feasible and limits database heterogeneity.

A solution to the problem of high conflict rates in group communication systems is to partition the load \cite{icdcs02-Jimenez}. In this approach, however, update transactions cannot be executed on every replica. Clients have to predeclare for every transaction which elements in the database will be updated (so-called \textit{conflict classes}). From this set of conflict classes, a \textit{compound conflict class} can be derived. Every possible compound conflict class is statically assigned to a replica, and replicas are said to act as the \textit{master site} for their assigned classes. Incoming update transactions are broadcast to all replicas using group communication, leading to a total order. Each replica then decides whether it is the master site for a given transaction. Master sites execute transactions, while the other sites simply install the resulting write sets, using the derived total order.

\textbf{Ganymed} \cite{ganymed-middleware2004}

\section{Experimental Evaluation}
\label{sec:evaluation}