Sankar edited Load on the Database.tex
over 9 years ago
Commit id: 88bf0eda5c7adc2177f66631032891530c6f1f58
diff --git a/Load on the Database.tex b/Load on the Database.tex
index 6fa76bc..bb1bd9c 100644
--- a/Load on the Database.tex
+++ b/Load on the Database.tex
...
The choice of data structures and the design of the individual components of a database system depend heavily on the load on the database. Estimates of the \textbf{Queries Per Second (QPS)} during average load and during peak load, along with the duration for which peak loads will last, all come in handy.
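
As a rough illustration of how such estimates could be derived from a request log, the sketch below computes an average QPS, a peak QPS, and the number of seconds spent near the peak. The function name \texttt{qps\_profile}, the input format (a list of request timestamps in seconds), and the 80\% threshold for ``near peak'' are assumptions made for this example only.

```python
from collections import Counter


def qps_profile(timestamps, peak_fraction=0.8):
    """Summarise load from request timestamps given in seconds.

    Returns (average_qps, peak_qps, seconds_near_peak).
    peak_fraction is an assumed threshold: a second counts as "near peak"
    when its request count is at least peak_fraction * peak_qps.
    """
    if not timestamps:
        return 0.0, 0, 0
    # Bucket every request into the whole second in which it arrived.
    per_second = Counter(int(t) for t in timestamps)
    # Observed window, including idle seconds between first and last request.
    span = max(per_second) - min(per_second) + 1
    average_qps = len(timestamps) / span
    peak_qps = max(per_second.values())
    # Total (not necessarily contiguous) seconds spent close to the peak.
    seconds_near_peak = sum(
        1 for count in per_second.values() if count >= peak_fraction * peak_qps
    )
    return average_qps, peak_qps, seconds_near_peak
```

A real capacity estimate would use longer windows and percentiles; this only shows the three quantities mentioned above.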
More than the number of calls that the database receives, the ratio between the types of database calls is a big factor in choosing a design. We can roughly classify database calls into the following four buckets:
\begin{itemize}
\item INSERT: To create new records (write)
\item UPDATE: To modify an existing record (write)
\item DELETE: To remove a record (delete)
\item SELECT: To fetch a record, optionally based on a condition (read)
\end{itemize}
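
The ratio between these buckets can be measured from a sample of the statements an application issues. The following is a minimal sketch, assuming the statements are available as plain SQL strings; the names \texttt{BUCKETS} and \texttt{workload\_mix} are hypothetical and only the leading keyword of each statement is inspected.

```python
from collections import Counter

# Map the leading SQL keyword to the bucket used in the list above.
BUCKETS = {
    "INSERT": "write",
    "UPDATE": "write",
    "DELETE": "delete",
    "SELECT": "read",
}


def workload_mix(statements):
    """Return the fraction of statements that falls into each bucket."""
    counts = Counter()
    for statement in statements:
        parts = statement.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Anything that is not one of the four calls is counted as "other".
        counts[BUCKETS.get(verb, "other")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: n / total for bucket, n in counts.items()}
```

For instance, a sample with one INSERT, one UPDATE, one DELETE and one SELECT yields half writes, a quarter deletes and a quarter reads.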
If the database will spend more than 99\% of its time on writes (for example, as the backend for a logging application), then a Log Structured Merge Tree \cite{O_Neil_1996} will be effective. Conversely, if 99\% of the time will be spent on reads of a small set of hot keys, then using memory maps to load the pages from disk will be more efficient.
If the reads will be distributed over a large set of keys with no noticeable hot keys, then memory maps will not give big benefits, as the working set will be larger. If there is a combination of SELECT and UPDATE calls on the same set of keys, memory maps will be slower than regular I/O, because memory maps are not efficient for a mixture of reads and writes.
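
The rules of thumb above can be summarised as a small decision procedure. The sketch below is illustrative only: the function name \texttt{suggest\_strategy}, the input format (a list of \texttt{(kind, key)} pairs), and the thresholds for what counts as a ``small set of hot keys'' are assumptions; only the 99\% figure comes from the text. It also simplifies the mixed case by not checking whether reads and writes touch the same keys.

```python
from collections import Counter


def suggest_strategy(ops, hot_keys=10, hot_share=0.9, dominant=0.99):
    """Suggest a storage approach for ops, a list of (kind, key) pairs.

    kind is "read" or "write". hot_keys and hot_share are assumed
    thresholds: reads are "hot" when the hot_keys most frequent keys
    receive at least hot_share of all reads.
    """
    if not ops:
        return "unknown"
    read_keys = [key for kind, key in ops if kind == "read"]
    read_fraction = len(read_keys) / len(ops)
    if 1 - read_fraction >= dominant:
        # Almost only writes, e.g. a logging backend.
        return "lsm-tree"
    if read_fraction >= dominant:
        top = sum(n for _, n in Counter(read_keys).most_common(hot_keys))
        if top / len(read_keys) >= hot_share:
            # Reads concentrated on a few keys: small working set.
            return "mmap"
        # Reads spread over many keys: large working set.
        return "no-mmap-benefit"
    # Mixed reads and writes (simplification: key overlap is not checked).
    return "regular-io"
```

The returned labels are only shorthand for the cases discussed above; any real decision should be backed by measurements on the actual workload.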
The way in which applications access the database and the way in which the database makes its design choices should be clearly understood together, with each side aware of the other's strengths and weaknesses.
Facebook initially started the Cassandra\cite{Lakshman_2009} distributed database project to perform well under parallel writes, and later switched to HBase as they started running more data mining queries on the huge datasets that they had accumulated.