ROUGH DRAFT authorea.com/103182

# Overview

Beta regression is used when the response $$y$$ lies strictly between 0 and 1, that is $$y\in(0,1)$$. Common examples include the fraction of employees participating in a company 401k plan or any response that is a percentage. Linear regression has often been applied to such data because of its simplicity, but the data violate key assumptions. First, responses bounded by 0 and 1 are not normally distributed. Second, such data are usually heteroscedastic; for example, the variance shrinks as the mean approaches the boundary points 0 and 1 (Liu 2014 p.1). Beta regression instead assumes the response follows a beta distribution, whose density is usually written as (Ferrari 2004) ${f(y;p,q)} = {\frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}}y^{p-1}(1-y)^{q-1}, \quad 0<y<1$
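As a quick check on the density, here is a minimal Python sketch that codes the formula above with `math.gamma` and verifies it integrates to 1; the shape values $$p=2$$, $$q=5$$ are arbitrary choices for illustration.

```python
import math

def beta_pdf(y, p, q):
    """Beta density f(y; p, q), coded directly from the formula above."""
    const = math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
    return const * y ** (p - 1) * (1 - y) ** (q - 1)

# Sanity check with a midpoint-rule integral: a density must integrate
# to 1 over (0, 1).
n = 100_000
total = sum(beta_pdf((i + 0.5) / n, 2.0, 5.0) for i in range(n)) / n
print(total)  # close to 1
```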

where the two parameters are $$p>0$$ and $$q>0$$. Changing the two parameters can alter the shape of the distribution drastically, giving the model a great deal of flexibility. It is easy to show that ${E(y)} = \frac{p}{p+q} \equiv \mu$
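The identity $$E(y)=p/(p+q)$$ can be verified numerically. A minimal sketch, again with the arbitrary shape values $$p=2$$, $$q=5$$:

```python
import math

def beta_pdf(y, p, q):
    # Beta density in the (p, q) parametrization
    return (math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
            * y ** (p - 1) * (1 - y) ** (q - 1))

p, q = 2.0, 5.0  # arbitrary illustrative shape values
n = 100_000
# Midpoint-rule estimate of E(y) = integral over (0, 1) of y * f(y; p, q)
mean_est = sum(((i + 0.5) / n) * beta_pdf((i + 0.5) / n, p, q)
               for i in range(n)) / n
print(mean_est, p / (p + q))  # both approximately 2/7
```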

We now change the parameters from $$p$$ and $$q$$ to $$\mu=\frac{p}{p+q}$$ and $$\phi=p+q$$. This change of variables will be helpful later. The beta distribution now looks like

${f(y;\mu,\phi)} = {\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}}y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0<y<1$
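Since the new parametrization is just the substitution $$p=\mu\phi$$, $$q=(1-\mu)\phi$$, the two forms of the density must agree pointwise. A minimal numerical check, with arbitrary values:

```python
import math

def beta_pdf_pq(y, p, q):
    # Original (p, q) parametrization
    return (math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
            * y ** (p - 1) * (1 - y) ** (q - 1))

def beta_pdf_mu_phi(y, mu, phi):
    # Mean/precision parametrization: p = mu*phi, q = (1 - mu)*phi
    return (math.gamma(phi) / (math.gamma(mu * phi) * math.gamma((1 - mu) * phi))
            * y ** (mu * phi - 1) * (1 - y) ** ((1 - mu) * phi - 1))

p, q = 2.0, 5.0
mu, phi = p / (p + q), p + q
for y in (0.1, 0.3, 0.8):
    print(beta_pdf_pq(y, p, q), beta_pdf_mu_phi(y, mu, phi))  # equal pairs
```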

With the new parameters,

${E(y) = \mu}$

and

${Var(y) = \frac{\mu(1-\mu)}{1+\phi}}$

The parameter $$\phi$$ is known as the precision parameter since, for fixed $$\mu$$, the larger $$\phi$$ is, the smaller the variance of $$y$$; $$\phi^{-1}$$ is the dispersion parameter. The main motivation for using beta regression is that the beta distribution can take on many different shapes through the adjustment of the parameters $$\mu$$ and $$\phi$$. Some examples of beta distributions are shown below.
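Both the variance formula and the role of $$\phi$$ as a precision parameter can be illustrated numerically; a sketch with the arbitrary choice $$\mu=0.3$$:

```python
import math

def beta_pdf(y, mu, phi):
    # Beta density in the mean/precision parametrization
    return (math.gamma(phi) / (math.gamma(mu * phi) * math.gamma((1 - mu) * phi))
            * y ** (mu * phi - 1) * (1 - y) ** ((1 - mu) * phi - 1))

mu, phi = 0.3, 10.0  # arbitrary illustrative values
n = 100_000
# Midpoint-rule estimate of Var(y) = integral of (y - mu)^2 f(y; mu, phi)
var_est = sum(((i + 0.5) / n - mu) ** 2 * beta_pdf((i + 0.5) / n, mu, phi)
              for i in range(n)) / n
print(var_est, mu * (1 - mu) / (1 + phi))  # agree closely

# For fixed mu, increasing the precision phi shrinks the variance:
for phi2 in (1.0, 10.0, 100.0):
    print(phi2, mu * (1 - mu) / (1 + phi2))
```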

For comparison, recall the classical linear model $\textbf{Y} = \textbf{X}\beta + \varepsilon$. Written out for a one-way layout with three groups and two observations per group, the pieces are

$\textbf{Y}= \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}$

$\textbf{X}= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix}$

$\beta= \begin{bmatrix} \mu_{1} \\ \mu_{2} \\ \mu_{3} \\ \end{bmatrix}$

$\varepsilon= \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}$

$Y_{ij}= \beta_0 + \beta_1X_{ij1} + \beta_2X_{ij2} + \dots + \beta_{r-1}X_{ij,r-1} + \varepsilon_{ij}$
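For the indicator design above, the least-squares estimate $\hat\beta = (\textbf{X}'\textbf{X})^{-1}\textbf{X}'\textbf{Y}$ reduces to the three group means, because the columns of $\textbf{X}$ are orthogonal. A minimal sketch with made-up response values:

```python
# Hypothetical responses for the 3-group, 2-replicate design above
Y = [0.62, 0.58, 0.41, 0.45, 0.30, 0.34]
X = [[1, 0, 0], [1, 0, 0],
     [0, 1, 0], [0, 1, 0],
     [0, 0, 1], [0, 0, 1]]

# Normal equations: beta_hat = (X'X)^{-1} X'Y. The indicator columns are
# orthogonal, so X'X is diagonal and each component solves independently.
xty = [sum(X[i][j] * Y[i] for i in range(len(Y))) for j in range(3)]
xtx_diag = [sum(X[i][j] ** 2 for i in range(len(Y))) for j in range(3)]
beta_hat = [xty[j] / xtx_diag[j] for j in range(3)]
print(beta_hat)  # the three group means
```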