Authorea

Awaiting Activation edited untitled.tex about 8 years ago

Commit id: 99abfbd1a6624ab1f4b969c0f5aafa9cfca87f2d

deletions | additions

\section{Overview} Beta regression is used when the output $y$ is between 0 and 1, that is $y\in[0,1]$. Common examples include the fraction of employees participating in a company 401k plan or any response that is a percentage. Such Linear regression has often been applied due to its simplicity but such data violates key assumptions. First, responses bounded by 0 and 1 are not normally distributed. Second, such data is usually heteroskedastic: they show more variation around heteroscedastic. For example, the mean and less variation variance shrinks aswe approach the boundaries 0 and 1. Also, the distributions are usually asymmetric and Gaussian-based mean approaches are often inaccurate. the boundary point 0, 1. Instead, beta regression assumes the response to follow a \textit{beta distribution}. The beta distribution is usually given by \begin{equation} {f(y;p,q)} = {\frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}}y^{p-1}(1-y)^{q-1}, \quad 0 \end{equation}