3.1.3 Root Transform Local Linear Regression

Root Transform Local Linear Regression (RTLLR) is a nonparametric technique that recasts probability density estimation as a nonparametric regression problem. The original method, proposed in the statistics literature \cite{Brown_2009}, aims to reduce the bias that comes from choosing a parametric model and applies a transformation to stabilize the variance of the data. RTLLR also avoids the boundary bias that affects KDE \cite{2006}. In this subsection, we motivate the RTLLR model, explain why it works, and present its implementation.
MOTIVATION
The motivation behind RTLLR is to improve on the histogram estimate of the pdf. Let \(X_1,X_2,\ldots,X_n\) be univariate data with pdf \(f_X\), written \(f\) below. Without loss of generality, we assume that the data have been normalized to \(\left[0,1\right]\). Let \(T\) be a positive integer such that \(T\approx\frac{n}{10}\) \cite{2006}. Partition the unit interval into \(T\) equal-length subintervals \(I_i=\left[\frac{i-1}{T},\frac{i}{T}\right)\), \(i=1,\ldots,T\), and let \(Q_i\) be the number of observations that fall in \(I_i\). Then the joint distribution of the \(Q_i\)'s is multinomial, \(\mathrm{Multi}\left(n,p_1,\ldots,p_T\right)\), where \(p_i = \int^{\frac{i}{T}}_{\frac{i-1}{T}}f(x)\,dx\) is the probability that a data point \(X_k\) falls in \(I_i\). To understand the relationship between the \(Q_i\)'s and the pdf \(f\), we need the marginal distributions of the \(Q_i\)'s. We present two arguments for why \(Q_i\) can be treated as \(\mathrm{Poisson}(np_i)\).
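Before turning to those arguments, the binning step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the helper name `bin_counts`, the Beta-distributed sample, and the choice to place an observation equal to 1 in the last bin are all assumptions made for the example.

```python
import numpy as np

def bin_counts(x, T=None):
    """Count observations on [0, 1] in T equal-length bins (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if T is None:
        T = max(1, round(n / 10))  # rule of thumb T ~ n/10 from the text
    # floor(x * T) sends [(i-1)/T, i/T) to the zero-based index i-1.
    # Assumption: an observation exactly equal to 1 is placed in the last bin.
    idx = np.minimum((x * T).astype(int), T - 1)
    Q = np.bincount(idx, minlength=T)
    return Q, T

# Illustrative data only: a Beta(2, 5) sample already supported on [0, 1].
rng = np.random.default_rng(0)
n = 1000
x = rng.beta(2.0, 5.0, size=n)

Q, T = bin_counts(x)

# Histogram estimate of the pdf on each bin: Q_i / (n * bin width) = Q_i * T / n.
hist_density = Q * T / n
```

The counts `Q` sum to `n`, and `hist_density` is the piecewise-constant histogram estimate that RTLLR sets out to improve on.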