Design choices in a distributed database


A distributed database system, generically designed to work with reasonable performance, on any type of load, will be comparitively inefficient to a distributed database system designed specifically to suit the storage requirements of the applications that will be built on top. In this paper, We present a few design choices which should be considered, not only while designing a distributed database system, but also while choosing a existing distributed database system for the storage needs of your applications. None of these design choices are inherently inferior or superior. The weightage for the individual choices should be dictated by the needs of your application. We have tried to present various design choices and used industry established HBase and Cassandra database systems to contrast these choices.

Load on the Database

The choice of the datastructures and the design of the individual components of a database system depends a lot on the load on the database. The estimated Queries Per Second (QPS) numbers for the load on the database during average load and peak load, the duration for which the peak loads will continue will all come in handy.

More than the number of the calls that the database receives, the ratio of the type of the database calls plays a big factor in choosing a design. We can roughly classify the type of the database calls to the following four buckets:

  • INSERT: To create new records (write)

  • UPDATE: To modify an existing record (write)

  • DELETE: To remove a record (delete)

  • SELECT: To fetch a record, optionally based on a condition (read)

If the database will spend more than 99% of the time on writes (Example: as the backend for a logging application), then a Log Structured Merge Tree (O’Neil 1996) will be effective. Conversely, if 99% of the time will be spent on reads and on a small set of hot keys, then using memory maps to load the pages from disk will be more efficient. If the reads will distributed on a large set of keys with no noticeable hotness in the keys, then memory maps will not give big benefits, as the working set size will be larger. If there is a combination of SELECT calls and UPDATE calls on the same set of keys, memory maps will be slower than doing regular I/O, due to the mixture of read and write where memory maps are not efficient.

The way in which applications access the database and the way in which databases makes design choices, should both be in clear understanding of each other and should know each other’s strength and weaknesses.

Facebook started the Cassandra(Lakshman 2009) distributed database project initially to perform well during parallel writes and later switched to HBase as they started running more data mining queries on the huge bigdata datasets that they have accumulated.

Filesystem && Database :: Husband && Wife

A filesystem and a database is a match made in Heaven. I leave it to you to decide what component does which spousal role, though. Synergistic will be a word that will aptly define the combination of the filesystem and the database. Just like every other marriage, this filesystem and database integeration too needs a lot of time, effort, careful planning and love. If we design badly, we may endup with a system where both the filesystem and the database may be doing the same thing redundantly or both may skip important tasks.

Data Compression


Rake Awareness



Checksums for data

Actual distribution of data

We can ask the database to merely write the files to a distributed filesystem and let the fs take care of the distributedness.

  • Swift from OpenStack

  • Ceph and libceph

  • Lustre etc.