Related Work


Oracle GoldenGate 12c (Oracle Corp. 2015) is oriented towards transaction-driven applications. As new or updated data is committed to the source database, it is continuously captured and applied to one or more target systems. It aims at offering a real-time replication platform through log-based data capture, supporting heterogeneous platforms. To avoid latencies and anticipate scalability needs, it follows a decoupled architecture based on four modules: The Capture module captures committed insert, update, and delete transactions and routes them for distribution. It does not require any changes to the underlying database engine, while it supports data compression and selectivity (at table, row, or column granularity). Trail Files contain the most recently changed data in a transportable format for consumption by different applications. The Delivery module takes any changed transactional data placed in a trail file and applies it to the target database. It offers a variety of delivery semantics according to user-defined criteria. Additionally, it can publish data into flat files or stream data to other Big Data platforms. Finally, the Manager module is the controlling process that performs administrative and reporting activities via monitoring and control of the other modules.

C-JDBC (Cecchet 2004) is an open-source database cluster middleware that provides a Java application transparent access to a cluster of databases through JDBC. A database can be distributed and replicated among several nodes composing a virtual database, and C-JDBC balances the queries among these nodes. C-JDBC also handles node failures and provides support for check-pointing and hot recovery. Like Hihooi, it is compatible with any database engine that provides a JDBC driver and does not require any changes to the database engine to work. It consists of a generic JDBC driver, used by client applications, and a controller that handles load balancing and fault tolerance. The controller also handles the distribution and acts as a proxy between the driver and the database backends. Each controller is able to control multiple virtual databases, but each virtual database has its own Request Manager. As in Hihooi, the replicas are hidden from the application. The controller exposes a single virtual database and enables database backends to be added or removed dynamically and transparently. It also offers early responses to updates, where the controller returns the result as soon as one, a majority, or all backends have executed the operation. C-JDBC offers three degrees of replication in a virtual database, namely full partitioning (zero replication), full replication, and partial replication. The Request Manager decides the scheduling of requests among the nodes. It also contains a query result cache and performs load balancing with one of the implemented algorithms: round-robin, weighted round-robin, and least pending requests first. Since the controller can potentially become a single point of failure, C-JDBC offers horizontal scalability, such that two controllers can manage the same virtual database. Additionally, vertical scalability is also offered, enabling users to build a tree structure of controllers with the native database drivers connected at the leaves.
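
To make the last of these balancing policies concrete, the following minimal sketch shows a least-pending-requests-first balancer; the class and method names are illustrative and do not correspond to C-JDBC's actual API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class Backend {
    final String name;
    final AtomicInteger pending = new AtomicInteger(0); // requests currently in flight

    Backend(String name) { this.name = name; }
}

class LeastPendingBalancer {
    private final List<Backend> backends; // assumed non-empty

    LeastPendingBalancer(List<Backend> backends) { this.backends = backends; }

    /** Pick the backend with the fewest in-flight requests and reserve a slot on it. */
    Backend acquire() {
        Backend best = backends.get(0);
        for (Backend b : backends) {
            if (b.pending.get() < best.pending.get()) best = b;
        }
        best.pending.incrementAndGet();
        return best;
    }

    /** Release the slot once the backend has returned its result. */
    void release(Backend b) {
        b.pending.decrementAndGet();
    }
}
```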

Distributed versioning with conflict-aware scheduling is an approach introduced in (Amza 2003). The key idea is to use a middleware-based scheduler which accepts transactions from clients and routes them to a set of replicas. Internally, the middleware layer uses lazy replication, whilst offering 1-copy-serializability. Consistency is maintained through bookkeeping of the versions of tables in all the replicas. Every transaction that updates a table increases the corresponding version number. At the beginning of every transaction, clients have to inform the scheduler about the tables they are going to access. The scheduler then uses this information to assign versions of tables to the transactions. Also, replicas are kept in sync by sending the full SQL update statements. Since table-level locking reduces concurrency, distributed versioning also introduces the concept of early version releases, which allows clients to notify the scheduler when they have used a table for the last time in a transaction. Similar to the C-JDBC approach, SQL statements have to be parsed at the middleware level for locking purposes.
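
The table-version bookkeeping can be illustrated with the following minimal sketch of a scheduler that assigns versions at transaction start; the names and data structures are ours and only approximate the mechanism described in (Amza 2003).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class VersioningScheduler {
    // Latest version number assigned to each table at the scheduler.
    private final Map<String, Long> tableVersions = new HashMap<>();

    /**
     * A client predeclares the tables a transaction will touch. The scheduler
     * assigns the versions the transaction must observe and bumps the version
     * of every table the transaction will update.
     */
    synchronized Map<String, Long> begin(Set<String> reads, Set<String> writes) {
        Map<String, Long> assigned = new HashMap<>();
        for (String t : reads)  assigned.put(t, tableVersions.getOrDefault(t, 0L));
        for (String t : writes) {
            long next = tableVersions.getOrDefault(t, 0L) + 1;
            tableVersions.put(t, next);
            assigned.put(t, next);
        }
        // A replica may execute the transaction once it has produced these versions.
        return assigned;
    }
}
```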

Postgres-R (Kemme 2000) uses an eager replication model and group communication. It is implemented as an extension to PostgreSQL v6.4 (PostgreSQL 2015). It tries to reduce message and synchronization overhead by bundling writes into a single writeset message, using shadow copies to perform updates. Like most replication protocols, it applies the read-one/write-all approach using localized reads, and it pre-orders transactions to ensure ordering semantics, guaranteeing that all nodes receive the writesets in exactly the same order. An important result extracted from (Kemme 2000) is the insight that the distribution of full SQL update statements, as often done in eager update-everywhere approaches, is not optimal. Performance can be significantly improved by executing SQL statements only once and then propagating the resulting database changes (writesets) to the other replicas. Postgres-R has no centralized components and load balancing is left to the clients. This can become a problem in case of bursts of update traffic, since the system is busy resolving conflicts between the replicas. Another shortcoming of Postgres-R is its rather intrusive implementation, requiring modifications to the underlying database, which is not always feasible and limits database heterogeneity. (As pointed out in (Cecchet 2008), Middle-R was developed to alleviate these constraints and to move the Postgres-R functionality into the middleware.)

A solution to the problem of high conflict rates in group communication systems is to partition the load (Jiménez-Peris 2002). In this approach, however, update transactions cannot be executed on every replica. Clients have to predeclare, for every transaction, which elements in the database will be updated (so-called conflict classes). Depending on this set of conflict classes, a compound conflict class can be deduced. Every possible compound conflict class is statically assigned to a replica; replicas are said to act as master sites for their assigned classes. Incoming update transactions are broadcast to all replicas using group communication, leading to a total order. Each replica then decides whether it is the master site for a given transaction. Master sites execute transactions, while the other sites just install the resulting writesets, using the derived total order.
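
The following rough sketch illustrates how a compound conflict class can be statically mapped to a master replica; the hashing scheme and all names are illustrative assumptions of ours, not taken from (Jiménez-Peris 2002).

```java
import java.util.Set;
import java.util.TreeSet;

class ConflictClassRouter {
    private final int replicaCount;

    ConflictClassRouter(int replicaCount) { this.replicaCount = replicaCount; }

    /** Derive the compound conflict class and map it to a master replica id. */
    int masterFor(Set<String> conflictClasses) {
        // Canonical form: sort the basic classes so the same set always hashes identically.
        String compound = String.join("|", new TreeSet<>(conflictClasses));
        return Math.floorMod(compound.hashCode(), replicaCount);
    }

    /** Every replica receives the transaction; only the master executes the SQL,
     *  the others later install the resulting writeset in total order. */
    boolean isMaster(int myReplicaId, Set<String> conflictClasses) {
        return masterFor(conflictClasses) == myReplicaId;
    }
}
```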

All previous works suffer from a variety of problems. They either reduce response time by giving up consistency, or they enforce limitations via predeclaration of the access patterns or statically partitioned data, thus limiting scale-out possibilities. Ganymed (Plattner 2004) proposed a middleware solution that addresses all of the above, and our work is heavily influenced by it. The main idea behind Ganymed is a transaction scheduling algorithm, called RSI-PC, that separates update and read-only transactions. Ganymed routes update transactions to a main or primary server and queries to a potentially unlimited number of read-only copies. An RSI-PC scheduler can be used with Snapshot Isolation (following (Berenson 1995)) based databases as replicas that offer the transaction isolation levels Serializable and Read Committed. Update transactions are directly forwarded to the master. After a successful commit, the scheduler makes sure the corresponding writeset is sent to the replicas and that every replica applies the writesets of different transactions in the same order in which the corresponding commits occurred on the master database. The scheduler also uses the notion of a global database version number, similar to Hihooi's TRANS-SET. Read-only transactions can be processed by any replica; the scheduler is free to decide where. Ganymed chooses a valid replica based on the least pending requests first rule. If a chosen replica is not updated to the latest global version number, the scheduler must delay the creation of the read-only snapshot. If no replica exists that contains the latest produced writeset, the start of the transaction is delayed. Ganymed chooses to delay the transaction execution, although the underlying algorithm supports two additional options: for clients that are not willing to accept any delays for read-only transactions, the query can be sent directly to the master (which is what Hihooi does by default in this case), or the client can specify a staleness threshold (a maximum age of the requested snapshot). To ensure a consistent state among replicas, the RSI-PC scheduler must ensure that the writesets of update transactions get applied on all replicas in the same order. To achieve this, Ganymed sends COMMIT operations to the master in a serialized manner, while the distribution of writesets to the replicas is handled by a FIFO queue on each replica. The scheduler keeps a thread for each replica that constantly applies the contents of each FIFO queue to its respective replica. Hihooi avoids this complexity by sharing all writesets in the Memcache server, which each replica polls periodically to apply any outstanding updates.
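
The routing logic of an RSI-PC style scheduler can be summarized in the following simplified sketch, which assumes each replica exposes the newest global version it has applied. All names are illustrative, and the fallback of stale reads to the master reflects Hihooi's default behaviour rather than Ganymed's delaying strategy.

```java
import java.util.List;

class Replica {
    final String name;
    volatile long appliedVersion;   // last global version applied on this replica
    volatile int pendingRequests;   // in-flight read-only transactions (bookkeeping omitted here)

    Replica(String name) { this.name = name; }
}

class RsiPcStyleScheduler {
    private final Replica master;
    private final List<Replica> readReplicas;
    private volatile long globalVersion; // bumped on every committed update on the master

    RsiPcStyleScheduler(Replica master, List<Replica> readReplicas) {
        this.master = master;
        this.readReplicas = readReplicas;
    }

    /** Updates always execute on the primary. */
    Replica routeUpdate() {
        return master;
    }

    /** Reads go to the least-loaded replica that already holds the latest writesets;
     *  if none is up to date, fall back to the master instead of delaying. */
    Replica routeReadOnly() {
        Replica best = null;
        for (Replica r : readReplicas) {
            if (r.appliedVersion >= globalVersion
                    && (best == null || r.pendingRequests < best.pendingRequests)) {
                best = r;
            }
        }
        return best != null ? best : master;
    }

    /** Called after a successful commit; the corresponding writeset is then propagated. */
    void onMasterCommit() {
        globalVersion++;
    }
}
```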

(Krikellas 2010) uses a commit delay mechanism to provide strong consistency in a multi-master scheme; a client is guaranteed that the updates of its last transaction are visible globally before any other transaction takes place. In this implementation, all clients communicate with a replica via a load balancer (following the least pending requests first rule). Upon reception of a transaction commit request from a client, the writeset is examined. If it is empty (read-only), the transaction commits locally. If it is an update transaction, it needs to be checked whether it conflicts with any transaction that has committed after the start of this transaction, and if no conflicts exist, the writeset is forwarded to the other replicas. A certifier maintains the total order of committed update transactions and decides whether an update transaction commits.
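
A bare-bones version of such a certification check is sketched below: an update transaction commits only if no transaction that committed after its snapshot wrote an overlapping item. Read-only transactions commit locally and never reach the certifier; all names and data structures are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Certifier {
    private long commitCounter = 0;
    private final List<CommittedWriteset> history = new ArrayList<>();

    private static class CommittedWriteset {
        final long commitOrder;
        final Set<String> items;
        CommittedWriteset(long commitOrder, Set<String> items) {
            this.commitOrder = commitOrder;
            this.items = items;
        }
    }

    /** Returns the assigned commit order, or -1 if certification fails and the transaction aborts. */
    synchronized long certifyUpdate(long snapshotVersion, Set<String> writeset) {
        for (CommittedWriteset c : history) {
            if (c.commitOrder > snapshotVersion && !Collections.disjoint(c.items, writeset)) {
                return -1; // conflicts with a transaction committed after this one started
            }
        }
        long order = ++commitCounter;
        history.add(new CommittedWriteset(order, new HashSet<>(writeset)));
        return order; // the writeset is now forwarded to the other replicas in this order
    }
}
```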

Hihooi implements the lazy fine-grained approach through TRANS-SET propagation via the Memcache server.
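
The polling loop this implies on each replica can be outlined as follows; the shared store is abstracted behind a simple interface rather than a real Memcached client, and all names are illustrative assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

interface SharedWritesetStore {
    long latestWritesetId();   // id of the newest writeset published by the primary
    String writeset(long id);  // the changes recorded for that writeset
}

class ReplicaPoller {
    private final SharedWritesetStore store;
    private long lastApplied;

    ReplicaPoller(SharedWritesetStore store, long lastApplied) {
        this.store = store;
        this.lastApplied = lastApplied;
    }

    /** Apply every writeset newer than the last one applied, in publication order. */
    synchronized void pollOnce() {
        long latest = store.latestWritesetId();
        while (lastApplied < latest) {
            String changes = store.writeset(++lastApplied);
            applyLocally(changes);
        }
    }

    private void applyLocally(String changes) {
        // In a real replica this would execute the changes against the local database.
        System.out.println("applying writeset: " + changes);
    }

    /** Periodic polling, here every 100 ms as a placeholder interval. */
    void start() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::pollOnce, 0, 100, TimeUnit.MILLISECONDS);
    }
}
```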

Hihooi’s design proves to be flexible enough to allow clients to choose between the available transaction isolation levels provided by the underlying databases by setting the SET-TRANSACTION parameter before each START-TRANSACTION request.
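
A plausible client-side analogue of this behaviour is shown below, assuming that the standard JDBC isolation and auto-commit calls intercepted by the driver map onto the SET-TRANSACTION and START-TRANSACTION requests; the connection URL, credentials, and table are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class IsolationLevelExample {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials, not an actual Hihooi endpoint.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hihooi://localhost/mydb", "user", "password")) {
            // Analogue of the SET-TRANSACTION parameter: choose the isolation
            // level offered by the underlying database before the transaction starts.
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            conn.setAutoCommit(false); // analogue of START-TRANSACTION

            try (Statement st = conn.createStatement()) {
                st.executeUpdate("UPDATE accounts SET balance = balance - 10 WHERE id = 1");
                st.executeUpdate("UPDATE accounts SET balance = balance + 10 WHERE id = 2");
            }
            conn.commit();
        }
    }
}
```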

In the terminology of (Cecchet 2008), Hihooi uses a master-slave replication protocol, query interception at the JDBC driver, query-level load balancing, and statement replication.

More recent research in database replication includes Megastore (Baker 2011), Calvin (Thomson 2012), and Spanner (Corbett 2013). Although these works focus on data scales that are orders of magnitude bigger (while Hihooi focuses on scaling the workload), we mention them for completeness. Google’s Megastore uses fine-grained partitioning of data and offers high availability and serializable ACID semantics through synchronous replication, but transactions are limited to small subsets of the database. Calvin uses partitioning and supports both synchronous and asynchronous replication. When multiple nodes need to agree on a plan for how to handle a particular transaction, they do so outside the transaction boundaries, that is, before acquiring locks and beginning to execute the transaction. The plan is saved on persistent storage to facilitate consistent recoveries. Calvin uses a deterministic ordering guarantee to reduce the contention costs of distributed transactions and to achieve synchronous replication, and it supports disk-based storage by employing artificial delays. Finally, Google’s Spanner is a multi-version, globally-distributed, synchronously-replicated database supporting externally-consistent distributed transactions. Spanner’s main goal is managing cross-datacenter replicated data, focused on high availability even in the face of wide-area natural disasters.


  1. Oracle Corp. Oracle GoldenGate 12c: Real-Time Access to Real-Time Information. White Paper (2015).

  2. Emmanuel Cecchet, Julie Marguerite, Willy Zwaenepoel. C-JDBC: Flexible Database Clustering Middleware. 9–18 (2004).

  3. Cristiana Amza, Alan L. Cox, Willy Zwaenepoel. Distributed Versioning: Consistent Replication for Scaling Back-End Databases of Dynamic Content Web Sites. 282–304 (2003).

  4. Bettina Kemme, Gustavo Alonso. Don’t be lazy, be consistent: Postgres-R, A new way to implement Database Replication. 134–143 (2000).

  5. PostgreSQL. (2015).

  6. Emmanuel Cecchet, George Candea, Anastasia Ailamaki. Middleware-based database replication. (2008).

  7. Ricardo Jiménez-Peris, Marta Patiño-Martínez, Bettina Kemme, Gustavo Alonso. Improving the Scalability of Fault-Tolerant Database Clusters. 477–484 (2002).

  8. Christian Plattner, Gustavo Alonso. Ganymed: Scalable Replication for Transactional Web Applications. 155–174 (2004).

  9. Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, Patrick O'Neil. A critique of ANSI SQL isolation levels. (1995).

  10. Konstantinos Krikellas, Sameh Elnikety, Zografoula Vagena, Orion Hodson. Strongly consistent replication for a bargain. (2010).

  11. Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, Vadim Yushprakh. Megastore: Providing Scalable, Highly Available Storage for Interactive Services. 223–234 (2011).

  12. Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi. Calvin: Fast Distributed Transactions for Partitioned Database Systems. (2012).

  13. James C. Corbett, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Jeffrey Dean, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser. Spanner: Google's Globally Distributed Database. ACM Trans. Comput. Syst. 31, 1–22 (2013).

  14. S. Elnikety, W. Zwaenepoel, F. Pedone. Database Replication Using Generalized Snapshot Isolation. (2005).