Sankar edited Load on the Database.tex  over 9 years ago

Commit id: 88bf0eda5c7adc2177f66631032891530c6f1f58


The choice of data structures and the design of the individual components of a database system depend heavily on the load on the database. Estimates of the \textbf{Queries Per Second (QPS)} under average load and peak load, and of how long the peak loads will last, all come in handy. More than the number of calls that the database receives, the ratio between the types of database calls is a big factor in choosing a design. We can roughly classify database calls into the following four buckets:
\begin{itemize}
\item INSERT: To create new records (write)
\item UPDATE: To modify an existing record (write)
\item DELETE: To remove a record (delete)
\item SELECT: To fetch a record, optionally based on a condition (read)
\end{itemize}
If the database will spend more than 99\% of its time on writes (for example, a logging application), then a Log Structured Merge Tree \cite{O_Neil_1996} will be effective. Conversely, if 99\% of the time will be spent on reads of a small set of hot keys, then using memory maps to load the pages from disk will be more efficient. If the reads are distributed over a large set of keys with no noticeable hotness in the keys, then memory maps will not give big benefits, as the working set will be larger. If there is a combination of SELECT calls and UPDATE calls on the same set of keys, memory maps will be slower than regular I/O, because memory maps are not efficient for a mixture of reads and writes. Application developers and database designers should clearly understand each other's side, that is, the way in which applications access the database and the way in which the database makes its design choices, and should know each other's strengths and weaknesses. Facebook initially started the Cassandra~\cite{Lakshman_2009} distributed database project to perform well under parallel writes, and later switched to HBase as they started running more data mining queries on the huge datasets that they had accumulated.
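The rules of thumb above can be turned into a rough classifier over a sample of database calls. The following is a minimal sketch, not a definitive method. The function names, the use of call counts as a stand-in for time spent, the counting of DELETE as a write, and the \texttt{top\_n} and \texttt{hot\_threshold} parameters are all assumptions made for illustration. Only the 99\% cut-offs come from the text.

```python
from collections import Counter

# Assumption: DELETE is counted with the writes when measuring the write share.
WRITE_OPS = {"INSERT", "UPDATE", "DELETE"}


def workload_mix(calls):
    """Return the fraction of calls that fall into each bucket.

    `calls` is a list of (operation, key) pairs, e.g. ("SELECT", "user:42").
    """
    counts = Counter(op for op, _ in calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


def hot_key_share(calls, top_n):
    """Fraction of SELECT calls that hit the top_n most frequently read keys."""
    reads = Counter(key for op, key in calls if op == "SELECT")
    total = sum(reads.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in reads.most_common(top_n)) / total


def suggest(calls, top_n=10, hot_threshold=0.8):
    """Map a sample of calls to one of the design hints discussed in the text.

    Call counts are used as a proxy for time spent (an assumption).
    top_n and hot_threshold are illustrative values, not taken from the text.
    """
    mix = workload_mix(calls)
    write_share = sum(mix.get(op, 0.0) for op in WRITE_OPS)
    read_share = mix.get("SELECT", 0.0)

    if write_share > 0.99:
        return "log-structured merge tree"
    if read_share >= 0.99:
        if hot_key_share(calls, top_n) >= hot_threshold:
            return "memory-mapped pages"
        return "regular I/O (large working set)"
    return "regular I/O (mixed reads and writes)"
```

In practice, the sample would come from a query log or a load estimate, and time per call (rather than call counts) would give a more faithful picture of where the database spends its effort.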