Authorea

Brandon Holt edited Introduction.md about 9 years ago

Commit id: 6d94d50a5ea89dc005171aae2af88b544bb3aea5

deletions | additions

We argue that the correct way to express and enforce these semantics is through the use of an old computer science standby: abstract data types (ADTs). Rather than treating the values in a key/value store simply as strings, making them into more complex data types provides a richer interface, exposing ample opportunities for optimization to the database and a clean mechanism for programmers to express their intentions. Performance benefits come from understanding the properties of ADT operations: those that commute with each other can be performed concurrently, even on multiple copies of the record. This means that transactions whose operations commute abort less, approaching the performance without transactions. This cleanly captures the cases described above where conflicts were unobservable or ordering didn't matter but in a safe way because any operations that don't in fact commute will be handled with traditional concurrency control. Using insights from multi-core concurrent data structures, we show in Section ?? \ref{sec:commutativity} that it is practical to reason about the matrix of commutativity among operations and build implementations that make the right tradeoff between the amount of concurrency allowed and the efficiency of tracking it. Selecting the right data type for the job gives programmers a clean, precise way to express their desired behavior. For instance, rather than using a generic integer to generate unique identifiers, a `UUID` type, realizing that contiguity of ids isn't necessary, can be trivially parallelized and distributed. Though this is an extremely simple case (and nearly universally adopted optimization), it fits the same mold as more nuanced decisions, such as choosing to represent the number of times a given post has been "retweeted" as a `HyperLogLog`, which can efficiently yield the approximate number of unique users, rather than a costly precise `Set`. Though selecting data structures for the job at hand is nothing new to programmers, only a handful of databases, such as Redis, MongoDB or Riak, support this flexibility, and they do not use the abstraction it affords to enforce strongly consistent transactions.