Logistic regression is used for classification. The independent variable is quantitative and the dependent variable is binary (0 or 1), i.e., in a class or not. Instead of modeling the response (0 or 1) directly, the logistic model's dependent variable is the probability that a data point belongs to the class.

A *logistic function* is fit to the data. Referring to the plot above, the model (blue curve) takes in the balance and outputs the probability of default. Notice that it does not tell you whether the person defaults; it tells you the probability of default. To perform classification, you must choose a cutoff, say 0.5, where all values above are predicted as default and all below are predicted as not default.

Introduction to Statistical Learning (James 2013) - section 4.3

Advanced Data Analysis from an Elementary Point of View (Shalizi) - chapter 7

An Introduction to Generalized Linear Models (Dobson 2001) - chapter 7

For \(y\in\lbrace0,1\rbrace\), we fit the logistic function,

\[{p(y=1;\beta)} = \frac{1}{1 + e^{-(\beta_0+\beta_1x)}}\]
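As a quick sketch, the logistic function above can be evaluated directly; the coefficient values here are made up purely for illustration:

```python
import math

def logistic(x, beta0, beta1):
    """Probability that y = 1 given predictor x, per the formula above."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -4.0, 0.5

print(logistic(0.0, beta0, beta1))   # far below the boundary: small probability
print(logistic(8.0, beta0, beta1))   # beta0 + beta1*x = 0 here, so exactly 0.5
```

Note that the output is always strictly between 0 and 1, which is why it can be interpreted as a probability.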

This is the probability that \(y=1\) parameterized by \(\beta_i\), also written as \(p(x)\) since the probability depends on \(x\). To fit this function to the data, we must find the parameters \(\beta_i\) that maximize the *likelihood function*, \(L(\beta;y)\). The *likelihood function* is algebraically the same as the joint probability density function \(f(y;\beta)\); the change in notation reflects a shift of emphasis from the random variables \(y_i\), with the parameters \(\beta\) fixed, to the parameters \(\beta\), with the \(y_i\) fixed. The likelihood function is given by

\[{L(\beta)} = \prod_{i=1}^np(x_i)^{y_i}(1-p(x_i))^{1-y_i}\]

We could substitute the expression for \(p(x)\) from above, but we will not do so here. The form of the likelihood function, which is the same as the distribution of the response, reflects the assumption that the response follows a *binomial distribution* (Bernoulli, since each \(y_i\) is a single 0/1 trial). Finding the parameters \(\beta_i\) that maximize the likelihood function requires numerical methods that are outside the scope of this paper and are not a concern for a modeler.
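Although the numerical details are out of scope, a minimal sketch may make the idea concrete. In practice one maximizes the *log* of the likelihood product above (sums are numerically friendlier than products), and here gradient ascent stands in for the more sophisticated methods (e.g. Newton's method / iteratively reweighted least squares) that statistical software actually uses. The data set is made up for illustration:

```python
import math

def p(x, b0, b1):
    """The logistic function from above."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1, xs, ys):
    # Log of the product over i of p(x_i)^y_i * (1 - p(x_i))^(1 - y_i).
    return sum(y * math.log(p(x, b0, b1)) + (1 - y) * math.log(1 - p(x, b0, b1))
               for x, y in zip(xs, ys))

# Tiny made-up data set: y tends toward 1 for larger x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0,   0,   1,   0,   1,   1]

# Gradient ascent on the log-likelihood; its gradient is
# sum over i of (y_i - p(x_i)) * (1, x_i).
b0, b1 = 0.0, 0.0
lr = 0.02  # small step size, kept conservative for stability
for _ in range(20000):
    g0 = sum(y - p(x, b0, b1) for x, y in zip(xs, ys))
    g1 = sum((y - p(x, b0, b1)) * x for x, y in zip(xs, ys))
    b0 += lr * g0
    b1 += lr * g1

print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

The fitted \(\beta_1\) comes out positive, as it should for data where larger \(x\) makes \(y=1\) more likely.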

As stated in the overview, logistic regression gives a model whose output is the probability that \(y=1\). To actually predict whether an instance is in the class, the user must choose a cutoff value, say 0.5; then we predict \(y = 1\) whenever \(p(x)>0.5\). With one predictor variable, this corresponds to choosing a cutoff value of \(x\). With two predictors, it corresponds to a line that divides the plane into in-class and out-of-class regions, and so on in higher dimensions. This dividing set is called the *decision boundary*, and once it is chosen the classification is clear.
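In one dimension the boundary can be written down in closed form: with a 0.5 cutoff, \(p(x)=0.5\) exactly when \(\beta_0+\beta_1x=0\), i.e. at \(x=-\beta_0/\beta_1\). A small sketch, with made-up coefficients:

```python
# Hypothetical coefficients for illustration only.
beta0, beta1 = -4.0, 0.5

# p(x) = 0.5 exactly when beta0 + beta1*x = 0,
# so the one-dimensional decision boundary sits at x = -beta0/beta1.
boundary = -beta0 / beta1

def classify(x):
    """Predict y = 1 above the 0.5 cutoff, y = 0 below it."""
    return 1 if beta0 + beta1 * x > 0 else 0

print(boundary)        # 8.0
print(classify(10.0))  # 1: above the boundary
print(classify(5.0))   # 0: below the boundary
```

Note that the sign of the linear term alone decides the class; the logistic function itself is only needed when you want the probability rather than the label.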

Logistic regression can also be used when there are \(n\) predictor variables, in which case it is called *multiple logistic regression*. The curve fit to the data now takes the form

\[{p(y=1;\beta)} = \frac{1}{1 + e^{-(\beta_0+\beta_1x_1+\cdots+\beta_nx_n)}}\]
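The only change from the single-predictor case is that the exponent becomes a linear combination of all the predictors. A minimal sketch, with hypothetical coefficients for two predictors:

```python
import math

def logistic_multi(x, beta):
    """p(y=1) for predictor vector x and coefficients beta = [b0, b1, ..., bn]."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept, then one weight per predictor.
beta = [-1.0, 2.0, -0.5]

print(logistic_multi([1.0, 2.0], beta))  # z = -1 + 2 - 1 = 0, so exactly 0.5
```

The decision boundary is now the set where the linear combination is zero: a line for two predictors, a plane for three, and a hyperplane in general.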