Sankar edited Load on the Database.tex over 9 years ago

Commit id: 9a17eaaa29a039fe54f9afc4745a8bb23748f7aa

\section{Load on the Database} The choice of data structures and the design of the individual components of a database system depend heavily on the load that the database must serve. In addition to a raw estimate of \textbf{Input/Output Operations Per Second (IOPS)}, we also need to know the mix of I/O request types (\textbf{read or write}). A generically designed distributed database may prove inefficient for use cases that would perform better under a design tailored to the application's requirements. For example, if the database will spend more than 99\% of its time on writes (say, in a logging application), then a Log Structured Merge Tree \cite{O_Neil_1996} may be effective; conversely, if 99\% of the time will be spent on reads, then memory maps may prove more efficient. Understanding the application's needs is therefore very important when designing a distributed database. Even when choosing an existing database, it helps to know the nature of the workload that the application(s) on top will generate. Facebook initially started the Cassandra~\cite{Lakshman_2009} distributed database project to perform well under parallel writes, and later switched to HBase as it began running more data mining queries on the large datasets it had accumulated.
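
As a rough illustration of the workload characterization discussed in this section, the read/write mix can be summarized by a single fraction. The symbols $R$, $W$, $f_w$ and $f_r$ are introduced here only for illustration and are assumptions, not notation from the cited works. Let $R$ and $W$ denote the number of read and write requests observed per second, so that the total load is $R + W$ IOPS. Then
\begin{equation}
f_w = \frac{W}{R + W}, \qquad f_r = 1 - f_w = \frac{R}{R + W}.
\end{equation}
For a hypothetical logging service that handles $W = 9900$ writes and $R = 100$ reads per second, the total load is $10000$ IOPS and $f_w = 9900 / 10000 = 0.99$, which places it in the write-dominated regime where a Log Structured Merge Tree may be a reasonable candidate. Swapping the two counts gives $f_r = 0.99$, the read-dominated case. Request counts are only a proxy for the time spent on each kind of operation, so this should be read as a first approximation and not as a complete workload model.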