[section] [theorem] [theorem]Lemma [subsection]

Machine Learning in Practice: Technical Observations

\label{section-technicalObservations} TODO A desarrollar Python, sklearn, pandas, graphlab, etc

Data process raw data by reading in chunks from huge files (compressed filesizes amount to 1TB), applying filters like modulus 10.

On the nature of computational issues such as memory size, disk size, parallelization, multi-core, linear algebra routines.

In general, algorithms will load all data in RAM and execute optimization routines. If KFolds is used, some impelnemntations will run learning routines simultaneously in each fold group and keep the “best” scores at the end.

Joblib, sklearn and Graphlab are all Python modules