\textbf{Distributed Versioning} and conflict-aware scheduling is an approach introduced in \cite{distr-versioning}. The key idea is to use a middleware-based scheduler which accepts transactions from clients and routes them to a set of replicas. Internally, the middleware layer uses lazy replication while offering 1-copy serializability. Consistency is maintained through bookkeeping of \textit{versions} of tables in all the replicas. Every transaction that updates a table increases the corresponding version number. At the beginning of every transaction, clients have to inform the scheduler about the tables they are going to access. The scheduler then uses this information to assign table versions to the transactions. Replicas are kept in sync by sending the full SQL update statements. Since table-level locking reduces concurrency, distributed versioning also introduces the concept of \textit{early version releases}, which allows clients to notify the scheduler when they have used a table for the last time in a transaction. Similar to the C-JDBC approach, SQL statements have to be parsed at the middleware level for locking purposes.

\textbf{Postgres-R} \cite{postgres-r} uses an eager replication model and group communication. It is implemented as an extension to PostgreSQL v6.4 \cite{postgres}. It tries to reduce message and synchronization overhead by bundling writes into a single writeset message and by using shadow copies to perform updates. As with most replication protocols, it applies the read-one/write-all approach using localized reads, and it pre-orders transactions to guarantee that all nodes receive the writesets in exactly the same order. An important result from \cite{postgres-r} is the insight that distributing full SQL update statements, as often done in eager update-everywhere approaches, is not optimal.
Performance can be significantly improved by executing SQL statements only once and then propagating the resulting database changes (writesets) to the other replicas. Postgres-R has no centralized components, and load balancing is left to the clients. This can become a problem in case of bursts of update traffic, since the system is busy resolving conflicts between the replicas. Another shortcoming of Postgres-R is its rather intrusive implementation, which requires modifications to the underlying database, something that is not always feasible and limits database heterogeneity.

A solution to the problem of high conflict rates in group communication systems is to partition the load \cite{icdcs02-Jimenez}. In this approach, however, update transactions cannot be executed on every replica. Clients have to predeclare for every transaction which elements in the database will be updated (so-called \textit{conflict classes}). From this set of conflict classes, a \textit{compound conflict class} can be deduced. Every possible compound conflict class is statically assigned to a replica; replicas are said to act as the \textit{master site} for their assigned classes. Incoming update transactions are broadcast to all replicas using group communication, leading to a total order. Each replica then decides whether it is the master site for a given transaction. Master sites execute transactions; other sites just install the resulting writesets, using the derived total order.

All previous works suffer from a variety of problems: they either reduce response time by giving up consistency, or they enforce limitations via predeclaration of access patterns or statically partitioned data, thus limiting scale-out possibilities. \textbf{Ganymed} \cite{ganymed-middleware2004} proposed a solution that addresses all of the above, and our work is heavily influenced by it.
The main idea behind Ganymed is a transaction scheduling algorithm, called RSI-PC, that separates update and read-only transactions. Ganymed is a middleware solution that routes update transactions to a \textit{main} or \textit{primary} server and queries to a potentially unlimited number of read-only copies. More recent research in database replication includes [] [] []. Although these works focus on ..., we review them briefly for context.
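The scheduling idea described above can be sketched in a few lines. The following is a minimal, hypothetical illustration of RSI-PC-style routing (class and server names are our own, not Ganymed's actual API): update transactions always go to the primary, while read-only transactions are balanced round-robin across the read-only copies.

```python
# Hypothetical sketch of primary/replica transaction routing in the
# spirit of Ganymed's RSI-PC scheduler. All names are illustrative.

class Scheduler:
    def __init__(self, primary, replicas):
        self.primary = primary      # single master: executes all updates
        self.replicas = replicas    # read-only copies: serve queries
        self.cursor = 0             # round-robin position for reads

    def route(self, is_update):
        """Return the server that should execute the transaction."""
        if is_update:
            return self.primary     # writes serialize at the primary
        server = self.replicas[self.cursor % len(self.replicas)]
        self.cursor += 1            # spread reads over the replicas
        return server

sched = Scheduler("primary", ["replica-1", "replica-2"])
assert sched.route(True) == "primary"       # update -> primary
assert sched.route(False) == "replica-1"    # reads rotate over copies
assert sched.route(False) == "replica-2"
```

In a real deployment the scheduler would additionally track replica freshness so that reads observe a consistent snapshot; the sketch omits this bookkeeping.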

For the experiments, a group of machines was used to host the different entities of Hihooi. A dedicated machine was used for each component (Manager, Listener, Primary DB, and extension DBs). All machines shared the same configuration (m4.large) and were deployed in AWS in a local LAN. Before starting any experiment, all databases were reset to an initial condition, to ensure that every experiment started from the same, constant state. During the experiments, all transactions involving a \textit{write} were executed within a \emph{START TRANSACTION} -- \emph{COMMIT} block.

\textbf{Database configuration}: default values, no logging, default isolation level (read committed).

\subsection{Part 1. Performance and Scalability}

The first part of the evaluation analyzes performance and scalability. Hihooi was compared to a reference system consisting of a single PostgreSQL instance. We measured the performance of Hihooi in different configurations, from 1 to 8 extension DBs. Each setup was tested with 3 different workload mixes, as mentioned in \ref{wmix}.
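The explicit transaction block used for writes in the experimental setup can be illustrated as follows. This is a minimal sketch using Python's standard \texttt{sqlite3} module as a stand-in for PostgreSQL (sqlite spells the statement \texttt{BEGIN} rather than \texttt{START TRANSACTION}); the table and values are hypothetical.

```python
# Minimal illustration of executing a write inside an explicit
# transaction block, as done for all write transactions in the
# experiments. sqlite3 stands in for PostgreSQL here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # disable implicit transactions
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

conn.execute("BEGIN")        # sqlite's equivalent of START TRANSACTION
conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
conn.execute("COMMIT")       # make the write durable atomically

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 90
```

Wrapping each write in an explicit block ensures that every experiment pays the full commit cost per write transaction, rather than batching several statements under one implicit commit.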