\section{Introduction}

The move to non-relational (NoSQL) databases was motivated by a desire for scalability and flexibility. Developers found that by relaxing strong consistency, they could scale services to millions or billions of users while meeting tight performance targets. By forgoing the structure of relational schemas, they gained more direct control over performance, at the cost of the automatic optimizations that such structure enables.
The observation has been that relaxed or eventual consistency is a good fit for many applications built on these NoSQL systems: potential conflicts are rarely observed in practice, and users are likely to accept minor inconsistencies, such as two tweets appearing out of temporal order on their Twitter timeline. However, this leaves much to chance; there is no guarantee that more significant inconsistencies cannot occur.

In situations where consistency truly is crucial, developers can enforce stronger guarantees on top of any platform, but they often get it wrong. Google, recognizing the benefit to programmer productivity, reintroduced serializable transactions in its next-generation database, Spanner \cite{Spanner}. However, this leaves programmers with two extreme choices, each carrying a significant performance impact. If certain parts of an application can tolerate imprecision, then why not capture those properties in the database programming model? What if the programmer could express the semantics they desire succinctly and precisely, in a way that lets the database reason about conflicts and optimize performance, without giving up the flexibility and scalability of their existing systems?

Abstract data types (ADTs) are the solution to this problem. Rather than limiting database records to primitive types such as strings or integers, raising them to more complex data types provides a richer interface, exposing ample optimization opportunities to the database and giving programmers a precise mechanism to express their intentions.

Performance benefits come from understanding the properties of ADT operations: operations that commute with one another can be performed concurrently, even on multiple copies of a record. Transactions whose operations commute therefore abort less often, approaching the performance of running without transactions at all. This cleanly captures the cases described above where conflicts were unobservable or ordering did not matter, but in a safe way: any operations that do not in fact commute are handled by traditional concurrency control. Using insights from multi-core concurrent data structures, we show in Section \ref{sec:commutativity} that it is practical to reason about the matrix of commutativity among operations and to build implementations that strike the right tradeoff between the amount of concurrency allowed and the cost of tracking it.
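To make this concrete, the following is a minimal sketch in Python of what such a commutativity matrix for a set ADT might look like; the \texttt{Op} record and the conservative rules are ours, for illustration, and assume that \texttt{add} and \texttt{remove} return no value:

\begin{verbatim}
from collections import namedtuple

# Hypothetical operation record: a method name plus its argument
# (use None for argument-less operations such as size()).
Op = namedtuple('Op', ['method', 'arg'])

MUTATORS = {'add', 'remove'}

def commutes(a, b):
    """Conservative commutativity matrix for a set ADT,
    assuming void-returning add and remove."""
    # Read-only operations (contains, size) commute with each other.
    if a.method not in MUTATORS and b.method not in MUTATORS:
        return True
    # size() observes every element, so it conflicts with any mutator.
    if 'size' in (a.method, b.method):
        return False
    # add/add always commutes: the final set is the union either way.
    if a.method == b.method == 'add':
        return True
    # Remaining pairs (add/remove, mutator/contains) commute only
    # when they touch different elements.
    return a.arg != b.arg

# Concurrent adds need no coordination, but an add and a contains
# on the same element must be ordered.
assert commutes(Op('add', 1), Op('add', 1))
assert commutes(Op('add', 1), Op('contains', 2))
assert not commutes(Op('add', 1), Op('contains', 1))
\end{verbatim}

A scheduler armed with such a predicate can let commuting operations proceed in parallel and fall back to traditional locking only for the conflicting pairs.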

Selecting the right data type for the job gives programmers a clean, precise way to express their desired behavior. For instance, rather than using a generic integer to generate unique identifiers, a UUID type, which recognizes that contiguity of identifiers is unnecessary, can be trivially parallelized and distributed. Though this is an extremely simple case (and a nearly universally adopted optimization), it fits the same mold as more nuanced decisions, such as choosing to track which users have ``retweeted'' a given post with a HyperLogLog, which efficiently yields the approximate number of unique users, rather than with a costly precise Set. Though selecting data structures for the job at hand is nothing new to programmers, only a handful of databases, such as Redis, MongoDB, or Riak, support this flexibility, and they do not use the abstraction it affords to enforce strongly consistent transactions.
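To ground the retweet example, the sketch below is a minimal HyperLogLog in Python; the class name and parameters are ours, for illustration, and production systems use carefully tuned variants with additional bias corrections. Note that \texttt{add} and \texttt{merge} are built from \texttt{max}, so they commute: replicas can absorb retweets independently and combine their state in any order.

\begin{verbatim}
import hashlib
import math

class HyperLogLog:
    """Approximate distinct-count sketch with 2^b registers."""
    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b                    # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 160-bit hash: the low b bits pick a register, the rest
        # form the bit stream whose leading zeros we measure.
        h = int(hashlib.sha1(repr(item).encode()).hexdigest(), 16)
        idx = h & (self.m - 1)
        rest = h >> self.b
        # rho = position of the leftmost 1-bit in the remaining bits.
        rho = (160 - self.b) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rho)

    def merge(self, other):
        # Element-wise max is commutative and associative, so
        # replica sketches can be combined in any order.
        for i, r in enumerate(other.registers):
            self.registers[i] = max(self.registers[i], r)

    def count(self):
        # Raw estimate, with the standard small-range correction.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        e = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros > 0:
            e = self.m * math.log(self.m / zeros)  # linear counting
        return int(round(e))

# Counting unique retweeters in a few kilobytes per post:
retweets = HyperLogLog()
for user in ['alice', 'bob', 'alice', 'eve']:
    retweets.add(user)
print(retweets.count())    # approximately 3 unique users
\end{verbatim}

Unlike a precise Set, whose size grows with the number of unique users, this sketch answers the question the application actually asks, roughly how many unique users retweeted, in constant space.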