
Machine Learning Chapter


**First**, give an introduction to Machine Learning: what it is, what it does, notation, how it is used, supervised learning, classifiers, and some remarks on the "paradigm shift" or "philosophy", or whatever one wants to call it. Comment on the focus on data, training error, predictive power, generalization, etc. Carry everything along with a running example on a dataset of calls and a logistic classifier.

**Then** cross-validation, variance, bias, errors, metrics, scores and the like.

**Finish** with 4/5 common classifiers and explain the concepts that motivate them, the formulas, the hyperparameters and the functions to minimize. Also give an *overview* of how they serve in one case or another, and their advantages and disadvantages according to such-and-such author:

- multinomial Naive Bayes (for its simplicity and computational efficiency)

- "full" logistic regression (unlike the earlier introduction, now with the regularization part, with comments on SGD and its efficiency, etc.)

- Random Forests

- extension to gradient boosting

- comments on Boltzmann machines (without going too deep into neural networks) and Bernoulli Restricted Boltzmann Machines.


Machine learning is a subfield of computer science with broad *theoretical* intersections with statistics and mathematical optimization. At present it has a wide range of applications. A non-exhaustive list includes self-driving cars, spam detection systems, face and voice recognition, temperature prediction in weather forecasting, AI opponents in games, disease prediction in patients, stock pricing, and more. Such machine learning programs are now so widespread that their use has a direct impact on the lives of millions of people. Because of this, machine learning also has *practical* intersections with data and software engineering.

The most widely used definition of machine learning is attributed to Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" (Mitchell 1997). For our purposes it is clear that this definition ^{1} is not formally well-defined. However, it serves to convey the idea of algorithms that automatically *learn* to perform better over time and with more data. Note that the "goodness" of their performance is inherently subject to the evaluation criteria chosen for the task. Because of this, *learning* in this context is associated less with a cognitive definition and more with a performance-based approach.

Machine learning is divided into two main categories: supervised and unsupervised learning. In the first case, algorithms are set to produce outputs, denoted by \(Y\), from input data, denoted by \(X\); i.e. the computer has access to examples of outputs and tries to reproduce them based on the information contained in \(X\). In this context, the algorithm is generally referred to as a *learner*.

In the second type of problem, the output \(Y\) is missing altogether from the data. In this scenario the most common objectives are clustering samples, density estimation and data compression. Linear regression and K-means clustering are examples of algorithms in each of these categories, respectively.
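To make the distinction concrete, here is a minimal sketch, using only NumPy and synthetic data invented for illustration, of one algorithm from each category: a least-squares linear regression that learns from example outputs \(Y\), and a tiny K-means loop that groups points with no outputs at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Supervised: the learner sees example outputs y and tries to reproduce them.
# Fit y = a*x + b by ordinary least squares (linear regression).
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 100)
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [2.0, 1.0]

# Unsupervised: no outputs, only unlabeled points.
# A tiny K-means loop groups them into 2 clusters.
pts = np.concatenate([rng.normal(0.0, 0.5, 50), rng.normal(5.0, 0.5, 50)])
centers = np.array([pts.min(), pts.max()])
for _ in range(10):
    labels = np.abs(pts[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([pts[labels == k].mean() for k in range(2)])
print(centers)  # close to the true cluster means, 0 and 5
```

Note that the regression step needs the column \(y\) of example outputs, while the clustering step never sees a label; this is exactly the supervised/unsupervised divide.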

Supervised learning is further sub-categorized according to the type of information available for the task at hand. When the output variable \(Y\) takes values in a discrete set, the problem is said to be one of supervised classification. On the other hand, when the output takes a continuous (or dense in an open set of \(\mathbb{R}\)) range of *quantitative* values, the problem is said to be one of supervised regression. Note that regression problems can be encoded as classification problems by assigning categories to ranges of outputs. In this work we will focus on the supervised side of machine learning.
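As a sketch of such an encoding (the cut-offs and class names below are invented for illustration), a continuous target such as total call time can be turned into a categorical one by assigning a label to each range of values:

```python
import numpy as np

# Hypothetical continuous regression target: weekly call time in minutes.
minutes = np.array([12.0, 45.5, 89.0, 121.3, 154.2])

# Encode it as a classification target by assigning categories to ranges:
# [0, 60) -> "low", [60, 120) -> "medium", [120, inf) -> "high".
cutoffs = [60.0, 120.0]
names = np.array(["low", "medium", "high"])
classes = names[np.digitize(minutes, cutoffs)]
print(classes)  # ['low' 'low' 'medium' 'high' 'high']
```

The resulting discrete labels can then be fed to any supervised classifier, at the cost of discarding the within-range variation of the original quantity.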

In this type of problem, both the theoretical and the computational aspects are of interest. Algorithms are generally applied in a given setting and, as such, are expected to run in a reasonable amount of time and with a certain level of performance. ^{2}

In essence, a machine learning method is a probabilistic model, and so it is very similar to a statistical model. It differs chiefly in that its focus is generally on the model's predictive ability rather than on the estimates of the model's parameters (Breiman 2001). Algorithms are built and used to replicate a given phenomenon as well as possible, without necessarily identifying the true nature of the mechanisms behind it. As such, most applications will try to generalize a problem rather than identify the system behind it. Computational efficiency and *scalability* are also of great importance when working with these applications, although only brief references to this matter will be given in this work.

These subtle differences in the way of approaching a problem are also reflected in the terminology used; here most terms have equivalent or similar counterparts in statistics. To start off, the *dependent* variable \(Y\) is called the target or label, and the *independent variables*, *covariates* or *input variables* are in this case named *features*.

^{1} Other authors might refer to machine learning as *statistical learning*. See (Hastie 2009) as an example.

^{2} Here the word *reasonable* is used in a broad sense. It will depend entirely on the time constraints, computational capacity, usage and other aspects of each learning application.


For the purposes of this work we will speak of the training set, denoted by \(\mathrm{T}\), as the set of data examples. The objective is to build a probabilistic model with the capacity to correctly predict the class of new data objects, based on having seen information from other data objects.

For concreteness, let us consider a reduced dataset built from Call Detail Records (CDRs), where samples are calls made by users who can belong to any of the following provinces: *Buenos Aires*, *Cordoba* and *Santa Fe*. Four measurements were made on each observation, accounting for the user's number of calls and total duration of calls, on weekends and on weekdays, during a week of measurements. A short example of this dataset can be seen below:

\label{tab:sample_CDR}

| User | CallsWeekend | TimeWeekend | CallsWeekday | TimeWeekday | Province |
|---|---|---|---|---|---|
| BA343E | 15 | 89 | 8 | 24 | Santa Fe |
| 73F169 | 10 | 121 | 2 | 98 | Cordoba |
| EA23AD | 12 | 43 | 5 | 154 | Buenos Aires |

In this form, a row represents the data acquired from each user, and columns represent types of measurements or information. In general, most machine learning problems will be associated with a training set \(\mathrm{T}\) of a form similar to the one shown above, where rows represent objects or *samples* and columns are measurements or *features* of our samples. Here \(\mathrm{T}\) will take the form of a paired couple of datasets \((X, Y)\), where \(X \in \mathbb{R}^{n \times p}\) and \(Y\) is the corresponding vector of \(n\) labels.
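Assembling the sample table above into this paired \((X, Y)\) form might look as follows; this is only a sketch, and the new user's measurements and the nearest-neighbour rule are illustrative stand-ins for the classifiers discussed later in the chapter.

```python
import numpy as np

# The sample CDR table as a paired training set T = (X, Y):
# rows are users (samples), columns are the four call measurements (features),
# in the order CallsWeekend, TimeWeekend, CallsWeekday, TimeWeekday.
X = np.array([
    [15,  89,  8,  24],   # BA343E
    [10, 121,  2,  98],   # 73F169
    [12,  43,  5, 154],   # EA23AD
], dtype=float)
Y = np.array(["Santa Fe", "Cordoba", "Buenos Aires"])

print(X.shape)  # (3, 4): n = 3 samples, p = 4 features

# A minimal stand-in "learner": predict the province of a new user by
# copying the label of the closest training sample (1-nearest-neighbour).
x_new = np.array([14.0, 85.0, 7.0, 30.0])  # hypothetical new user
nearest = np.linalg.norm(X - x_new, axis=1).argmin()
print(Y[nearest])  # Santa Fe
```

Any supervised learner studied later consumes \(\mathrm{T}\) in exactly this shape: a feature matrix \(X\) with one row per sample, paired with a label vector \(Y\) of matching length.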
