
\section{Related Work}  \label{sec:related} 

A solution to the problem of high conflict rates in group communication systems is to partition the load \cite{icdcs02-Jimenez}. In this approach, however, update transactions cannot be executed on every replica. Clients have to predeclare, for every transaction, which elements in the database will be updated (so-called \textit{conflict classes}). From this set of conflict classes, a \textit{compound conflict class} can be deduced. Every possible compound conflict class is statically assigned to a replica; replicas are said to act as the \textit{master site} for their assigned classes. Incoming update transactions are broadcast to all replicas using group communication, yielding a total order. Each replica then decides whether it is the master site for a given transaction. Master sites execute transactions; the other sites simply install the resulting writesets, following the derived total order.

All the previous works suffer from a variety of problems: they either reduce response time by giving up consistency, or they enforce limitations through predeclared access patterns or statically partitioned data, thus limiting scale-out possibilities.

\textbf{Ganymed} \cite{ganymed-middleware2004} proposed a middleware solution that addresses all of the above, and our work is heavily influenced by it. The main idea behind Ganymed is a transaction scheduling algorithm, called RSI-PC, that separates update and read-only transactions. Ganymed routes update transactions to a \textit{main} or \textit{primary} server and queries to a potentially unlimited number of read-only copies. An RSI-PC scheduler can be used with \textit{Snapshot Isolation} (as defined in \cite{Berenson_1995}) based databases as replicas, offering the transaction isolation levels \textit{Serializable} and \textit{Read Committed}. Update transactions are directly forwarded to the master.
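The RSI-PC routing rule can be sketched as follows. This is a minimal illustration, not Ganymed's actual code; the class and attribute names (\texttt{Scheduler}, \texttt{pending}, \texttt{is\_update}) are our own assumptions.

```python
# Illustrative sketch of RSI-PC-style routing: update transactions go
# to the single primary, read-only transactions go to the replica with
# the fewest pending transactions. All names here are hypothetical.

class Scheduler:
    def __init__(self, primary, replicas):
        self.primary = primary      # master database (executes all updates)
        self.replicas = replicas    # read-only copies

    def route(self, txn):
        if txn.is_update:
            return self.primary     # updates execute on the master only
        # reads may run on any replica; pick the least-loaded one
        return min(self.replicas, key=lambda r: r.pending)
```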
After a successful commit, the scheduler makes sure the corresponding writeset is sent to the replicas and that every replica applies the writesets of different transactions in the same order as the corresponding commits occurred on the master database. The scheduler also uses the notion of a global database version number, similar to Hihooi's TRANS-SET. Read-only transactions can be processed by any replica; the scheduler is free to decide where. Ganymed chooses a valid replica based on a \textit{least pending transactions} rule. If the chosen replica is not updated to the latest global version number, the scheduler must delay the creation of the read-only snapshot. If no replica containing the latest produced writeset exists, the start of the transaction is delayed. Ganymed chooses to delay the transaction execution, although the underlying algorithm supports two additional options: for clients that are not willing to accept any delay for a read-only transaction, the query can be sent directly to the master (which is what Hihooi does by default in this case), or the client can specify a staleness threshold (a maximum age of the requested snapshot).

To ensure a consistent state among replicas, the RSI-PC scheduler must ensure that the writesets of update transactions are applied on all replicas in the same order. To achieve this, Ganymed sends COMMIT operations to the master in a serialized manner, while the distribution of writesets to the replicas is handled by a FIFO queue per replica. The scheduler keeps a thread for each replica that constantly applies the contents of its FIFO queue to the respective replica. Hihooi avoids this burden by sharing all writesets in the Memcache server, which each replica polls periodically to apply any outstanding updates.
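The three options for handling a stale replica snapshot (delay, fall back to the master, or tolerate bounded staleness) can be sketched together. The function and parameter names below are our own illustration, not the actual RSI-PC or Hihooi interface.

```python
# Hypothetical sketch of snapshot-freshness handling for a read-only
# transaction. "lag" counts how many global versions the chosen replica
# is behind; the three branches correspond to the three options
# discussed in the text. Names and structure are assumptions.

def choose_read_target(replica, master, global_version,
                       staleness_threshold=None, allow_delay=True):
    lag = global_version - replica.version   # versions the replica is behind
    if lag == 0:
        return replica, 0                    # fresh snapshot, start immediately
    if staleness_threshold is not None and lag <= staleness_threshold:
        return replica, 0                    # client accepts a slightly stale snapshot
    if allow_delay:
        return replica, lag                  # delay until the writesets are applied
    return master, 0                         # no delay tolerated: read on the master
```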
Both Ganymed and Hihooi can support whatever isolation level the underlying database engine offers, with appropriate changes in the Scheduler and the Manager, respectively. \cite{Krikellas_2010} uses a \textit{global commit delay} to provide strong consistency: a client is guaranteed that the updates of its last transaction are globally visible before any other transaction takes place. Load balancing follows the same rule as Ganymed, \textit{least pending request first}. In this implementation, all clients communicate directly with each replica. Hihooi instead implements a \textit{lazy fine-grained} approach, through transet propagation via the Memcache server. More recent research in database replication includes [] [] []. Although these works focus on ... , we review them briefly for context.
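The lazy, pull-based propagation described above can be sketched as a polling loop run by each replica. This is a simplified stand-in, assuming transets are published in a shared store under version-numbered keys; the key scheme and helper names are not Hihooi's actual implementation.

```python
# Minimal sketch of lazy transet propagation via a shared cache.
# Assumption: the writer publishes each committed transet under the key
# "transet:<version>". Each replica periodically calls poll_and_apply to
# replay, in commit order, every transet newer than its local version.

def poll_and_apply(replica, store):
    """Apply all outstanding transets newer than the replica's version."""
    applied = 0
    while True:
        nxt = replica.version + 1
        transet = store.get(f"transet:{nxt}")  # None if not yet published
        if transet is None:
            break                              # replica is now up to date
        replica.apply(transet)                 # replay SQL in commit order
        replica.version = nxt
        applied += 1
    return applied
```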

\subsection{The TPC-E Benchmark}

Since we are interested in enterprises that involve transaction processing, we used the TPC-E benchmark \cite{tpce}. Other OLTP benchmarks are TPC-C, which is ccc, and TPC-W, which is now obsolete \cite{tpc-homepage}. The TPC-E benchmark uses a database to model a brokerage firm with customers who generate transactions related to trades, account inquiries, and market research. The brokerage firm in turn interacts with financial markets to execute orders on behalf of the customers and updates the relevant account information. The benchmark is {\textquotedblleft}scalable,{\textquotedblright} meaning that the number of customers defined for the brokerage firm can be varied to represent the workloads of different-size businesses. The workload includes several OLTP queries of variable complexity, with different processing and memory demands, similar to real workload characteristics. The benchmark defines the required mix of transactions that must be maintained. The TPC-E benchmark has a fixed mix of reads and writes that does not fit our demonstration goals. For this reason, we created workloads with different read-write ratios: 0\%, 5\%, 10\%, and 30\% writes in the total workload. The TPC-E scaling parameters were chosen as follows: 5000 customers, 1 working day of populated transactions, and a scale-factor of 500, resulting in a database of size 21GB.

\subsection{System Setup}
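The workload mixes above can be produced by sampling the transaction type per request with the target write probability. The generator below is a simplified stand-in for our actual setup, which draws the transactions themselves from the TPC-E transaction set.

```python
# Illustrative sketch of building a workload with a target write ratio
# (0.0, 0.05, 0.10, or 0.30 in our experiments). The real workloads use
# TPC-E transactions; "read"/"write" here are placeholders.
import random

def make_mix(n, write_ratio, seed=0):
    """Return a list of n transaction kinds with ~write_ratio writes."""
    rng = random.Random(seed)          # fixed seed for reproducible mixes
    return ["write" if rng.random() < write_ratio else "read"
            for _ in range(n)]
```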

The first part of the evaluation analyzes performance and scalability. Hihooi was compared to a reference system consisting of a single PostgreSQL instance. We measured the performance of Hihooi in different configurations, from 1 to 8 extension DBs. Each setup was tested with 3 different workload mixes, as mentioned in \ref{wmix}. To measure the \textit{speedup}, we kept the workload fixed and increased the number of extension DBs. To measure the \textit{scaleup}, we increased the number of replicas while proportionally increasing the workload.

% TODO: Figures
% - transactions per second
% - latency (msec)
% For Primary and Extension DBs:
% - CPU utilization [\% vs time] - underutilized ??
% - Network In [Bytes vs time] - balanced or not ??
% - Network Out [Bytes vs time] - balanced or not ??
% - How are the previous affected by the workload mix (read/write ratio)?
% - Listener and Manager (vs Tester = actual requests): network traffic characteristics
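To make the two metrics precise, we can use the standard formulation (here $N$ is the number of extension DBs and $T_N(W)$ is the completion time of workload $W$ with $N$ extension DBs):
\[
\mathit{speedup}(N) = \frac{T_1(W)}{T_N(W)}, \qquad
\mathit{scaleup}(N) = \frac{T_1(W)}{T_N(N \cdot W)},
\]
so ideal behavior corresponds to $\mathit{speedup}(N) \approx N$ and $\mathit{scaleup}(N) \approx 1$.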