
Paf Paris edited untitled.tex about 8 years ago

Commit id: 64a0cc4556ebc84256b145c8bffe08e228202c64


\title{Extracts from various articles}

Appearing in a rather random order. Will tidy up later...

The problem was initially referred to as Extract, Transform, Load (ETL). The basic methodology was to:
\begin{itemize}
\item{construct a local schema}
\item{write a connector to do the extraction}
\item{write transformations to clean up the data}
\item{load the data into the data warehouse}
\end{itemize}

Zachary G. Ives in \cite{cidr2015-Ives} says that a view \textit{at scale} yields many benefits, and this is evident in \cite{ieee-3-googlers}:
\begin{quote}
Follow the data. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data. Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail.
\end{quote}

Georgia Kapitsaki in \cite{kapitsaki-2015} proposes a technique for extracting context from existing datasets.
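
As an illustration of the four Extract, Transform, Load steps listed earlier, here is a minimal Python sketch. It is not taken from any of the cited articles. The table \texttt{person}, the sample CSV text, and the function names \texttt{extract}, \texttt{transform}, \texttt{load} and \texttt{run\_etl} are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source; in practice this would be a file or a remote system.
RAW_CSV = "id,name,city\n1, Alice ,paris\n2,Bob,\n3, Carol ,ATHENS\n"

# Step 1: the local schema (table and column names are assumptions).
SCHEMA = (
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)


def extract(raw_text):
    """Step 2: the connector. Reads the source into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Step 3: clean-up. Trims whitespace, normalises case, maps '' to None."""
    cleaned = []
    for row in rows:
        city = row["city"].strip()
        cleaned.append(
            (int(row["id"]), row["name"].strip(), city.title() if city else None)
        )
    return cleaned


def load(conn, records):
    """Step 4: create the schema and insert the cleaned records."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO person (id, name, city) VALUES (?, ?, ?)", records
    )
    conn.commit()


def run_etl(raw_text=RAW_CSV):
    """Runs the steps in order against an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_text)))
    return conn
```

Calling \texttt{run\_etl()} returns a connection whose \texttt{person} table holds the cleaned rows. Keeping each step in its own function means the connector or the clean-up rules can be swapped without touching the loading code.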